
MobileDet on TFLite #21

Closed
khanhlvg opened this issue Sep 11, 2020 · 36 comments

khanhlvg commented Sep 11, 2020

@sayakpaul, could you help try out the MobileDet model for TFLite?

  1. Convert the pretrained MobileDet TF1 model to TFLite using the existing export script and the latest (v2.3) TFLiteConverter.
  2. Put together an e2e guide on training a MobileDet model with the pet dataset and converting to TFLite.

@farmaker47 FYI

sayakpaul commented Sep 11, 2020

@khanhlvg I tried to convert the pre-trained checkpoints with the latest TFLite exporter script but ran into issues. I tried the different MobileDet variants available here, and all of them failed with the same error:

ValueError: ssd_mobiledet_dsp is not supported. See `model_builder.py` for features extractors compatible with different versions of Tensorflow (the checkpoint name in the error varies with the variant).

Here's the detailed trace: https://pastebin.com/70hgfiTJ.

Here's my Colab Notebook.

khanhlvg commented Sep 11, 2020

MobileDet is implemented in TF1 so you'll need to use the older export script and use %tensorflow_version 1.x instead of tf-nightly.
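
For reference, a rough sketch of that export step as a Colab cell (TF1 runtime; the paths and checkpoint prefix below are illustrative):

# %tensorflow_version 1.x
!python models/research/object_detection/export_tflite_ssd_graph.py \
    --pipeline_config_path=ssd_mobiledet_cpu_coco/pipeline.config \
    --trained_checkpoint_prefix=ssd_mobiledet_cpu_coco/model.ckpt-400000 \
    --output_directory=exported \
    --add_postprocessing_op=true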

sayakpaul commented Sep 11, 2020

@khanhlvg I was able to convert the different variants of MobileDet using the 2.3 TFLiteConverter. A couple of things to note:

  • The exporter scripts generate a frozen graph and not a SavedModel. Hence, I used tf.compat.v1.lite.TFLiteConverter.from_frozen_graph.

  • According to the instructions here, adding the add_postprocessing_op flag during the TFLite-compatible graph export seemed like a good option since it adds optimized postprocessing ops to the graph.

  • Now while doing the actual TFLite conversion, I constructed the converter object in the following way -

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
      graph_def_file='tflite_graph.pb', 
      input_arrays=['normalized_input_image_tensor'],
      output_arrays=['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'],
      input_shapes={'normalized_input_image_tensor': [1, 320, 320, 3]}
    )
    

    But note the graph's original output tensors and its inputs -

    [screenshot of the graph's original input and output tensor names]

    When I tried with the actual output tensors, it resulted in errors.

Size stats:

The original TFLite-compatible graph is about 17.4 MB; with dynamic-range quantization I was able to get it down to 4.3 MB.

All of these have been updated in the Colab Notebook linked above.
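
For reference, here is a minimal sketch of the two post-training quantization settings behind those sizes (dynamic range and fp16), assuming the converter object constructed above; the exact cells are in the Colab Notebook.

# Dynamic-range quantization: quantizes the weights only.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dr_model = converter.convert()

# fp16 quantization: weights are stored as float16.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16_model = converter.convert()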

khanhlvg commented Sep 11, 2020

The conversion code is incorrect. I rewrote it following this guide and it works. Could you retry?

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='tflite_graph.pb', 
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 320, 320, 3]}
)

converter.allow_custom_ops = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
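
As a follow-up, a quick sketch of saving the converted model and inspecting its outputs (the file name is just illustrative); the four TFLite_Detection_PostProcess outputs are boxes, classes, scores and the number of detections:

# Write the converted model to disk and list its output tensors.
with open('ssd_mobiledet_cpu_coco_dr.tflite', 'wb') as f:
    f.write(tflite_model)

interpreter = tf.lite.Interpreter(model_path='ssd_mobiledet_cpu_coco_dr.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'])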

sayakpaul commented Sep 11, 2020

@khanhlvg my bad. I had to carefully read this part -

[screenshot of the relevant section of the conversion guide]

I will modify it and try out inference.

Quick question: Does it make sense to upload it to TF-Hub?

sayakpaul commented Sep 11, 2020

@khanhlvg I updated the Colab Notebook to include more utilities and also inference results. Would be great if you could give it a round of review. Also, the models are pretty fast. I will share the benchmark results soon.

I have uploaded the models here (gs://demo-experiments/MobileDet) and here's the structure -

├── ssd_mobiledet_cpu_coco
│   ├── ssd_mobiledet_cpu_coco_dr.tflite
│   └── ssd_mobiledet_cpu_coco_fp16.tflite
├── ssd_mobiledet_dsp_coco
│   └── fp32
│       ├── ssd_mobiledet_dsp_coco_fp32_dr.tflite
│       └── ssd_mobiledet_dsp_coco_fp32_fp16.tflite
└── ssd_mobiledet_edgetpu_coco
    └── fp32
        ├── ssd_mobiledet_edgetpu_coco_fp32_dr.tflite
        └── ssd_mobiledet_edgetpu_coco_fp32_fp16.tflite

The EdgeTPU and DSP variants of the model come in fp32 and uint8. While the conversion of the fp32 variants went smoothly, that wasn't the case for uint8: the output tensors of the generated graph vary from what's specified in the guide. Could you take a look?

sayakpaul commented Sep 11, 2020

@khanhlvg benchmarking results:

[screenshot of the benchmark results table]

The timing denotes inference_avg (ms). Benchmarking was performed on a Pixel 4.

khanhlvg commented Sep 11, 2020

Quick question: Does it make sense to upload it to TF-Hub?

For the MobileDet models trained on COCO, let me contact the author of the paper and ask them to upload their model to TFHub, as they already have the converted TFLite model included with the checkpoints.

For the MobileDet models trained on other datasets (e.g. the pet detector), definitely yes :)

@khanhlvg I updated the Colab Notebook to include more utilities and also inference results. Would be great if you could give it a round of review.

The notebook mostly LGTM. I have only a few comments:

  • ssd_mobiledet_dsp_coco and ssd_mobiledet_edgetpu_coco are optimized for the Hexagon DSP and EdgeTPU, so their TFLite dr, fp16 and float versions don't have much practical use. As you can see in the benchmark, they're much slower than their CPU counterpart (ssd_mobiledet_cpu_coco).
  • Maybe add a few lines to the notebook to clarify the supported platform details. They are available in the paper but may not be obvious for ML folks who don't work extensively with mobile.

The EdgeTPU and DSP variants of the model come in fp32 and uint8. While the conversion of the fp32 variants went smoothly, that wasn't the case for uint8: the output tensors of the generated graph vary from what's specified in the guide. Could you take a look?

I don't see the int8 conversion in your notebook. Could you point me to where they are?

sayakpaul commented Sep 12, 2020

For the MobileDet models trained on COCO, let me contact the author of the paper and ask them to upload their model to TFHub, as they already have the converted TFLite model included with the checkpoints.

Sure. Their checkpoints contain TFLite models already, so that's covered but my guess is that those models probably weren't generated using the latest TFLiteConverter.

For the MobileDet models trained on other datasets (e.g. the pet detector), definitely yes :)

Thanks for the suggestion. Although that dataset is great for demonstrating results, workflows, etc., I don't think it's going to be very useful in practice for the broader community.

ssd_mobiledet_dsp_coco and ssd_mobiledet_edgetpu_coco are optimized for the Hexagon DSP and EdgeTPU, so their TFLite dr, fp16 and float versions don't have much practical use. As you can see in the benchmark, they're much slower than their CPU counterpart (ssd_mobiledet_cpu_coco).

Got it. So, I am guessing integer quantization should be the way to go here?

Maybe add a few lines to the notebook to clarify the supported platform details. They are available in the paper but may not be obvious for ML folks who don't work extensively with mobile.

Could you expand on this a bit more? How would you envision this section in the notebook?

I don't see the int8 conversion in your notebook. Could you point me to where they are?

That's because I have not performed integer quantization yet. Downloading the entire COCO training dataset (which would be needed for the representative dataset) might be painful, so what would you suggest here?

sayakpaul commented Sep 12, 2020

A number of updates -

I included int8 quantization in the Colab Notebook. Here's how I prepared the representative dataset -

  • Downloaded the train2014 split (~14 GB) from here.
  • Subsampled 100 images (more on this later). I have hosted this subset here so other people can use it.
  • Then I used the same preprocessing utility functions to create the generator (a rough sketch is shown after this list).
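
Roughly, the representative dataset generator looks like the sketch below; the directory path and the [0, 1] scaling are illustrative assumptions, and the actual utilities live in the notebook.

import glob
import numpy as np
from PIL import Image

def representative_dataset():
    # Iterate over the ~100 subsampled COCO train2014 images (path is illustrative).
    for image_path in glob.glob('coco_subset/*.jpg')[:100]:
        image = Image.open(image_path).convert('RGB').resize((320, 320))
        # Assumption: scale to [0, 1] float32, matching the preprocessing used at inference time.
        image = np.asarray(image, dtype=np.float32) / 255.0
        yield [image[np.newaxis, ...]]

converter.representative_dataset = representative_dataset
converter.optimizations = [tf.lite.Optimize.DEFAULT]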

Impact on the different models

  • The EdgeTPU and DSP variants come in fp32 and uint8 formats (see here). For the fp32 format, the integer quantization went seamlessly and the predictions seemed quite good. For uint8 I ran into issues (see here).
  • For the normal CPU variant of MobileDet, integer quantization resulted in a poor-quality model. I experimented with varying numbers of images for the representative dataset and also with the probability threshold, but that did not help much.

Note on enforcing full-integer quantization

It's currently giving a not-supported error. I tried with tf-nightly as well.

I will be back with the benchmarking results for the models I have been able to export -

  • ssd_mobiledet_dsp_coco_fp32_int8.tflite
  • ssd_mobiledet_edgetpu_coco_fp32_int8.tflite

Let me know if anything is unclear @khanhlvg.

sayakpaul commented Sep 12, 2020

Here are the benchmarking results -

ssd_mobiledet_edgetpu_coco_fp32_int8, inference_avg (ms) on Pixel 4 (API 29):

  • CPU w/ 4 threads: 23.7463
  • CPU w/ 4 threads (XNNPACK): 18.7443

ssd_mobiledet_dsp_coco_fp32_int8, inference_avg (ms) on Pixel 4 (API 29):

  • CPU w/ 4 threads: 22.5022
  • CPU w/ 4 threads (XNNPACK): 22.5228

khanhlvg commented Sep 14, 2020

Sure. Their checkpoints contain TFLite models already, so that's covered but my guess is that those models probably weren't generated using the latest TFLiteConverter.

I'm not sure about this. The TFLite model bundled with the checkpoint was converted with the MLIR converter, so it should have come from a fairly recent TFLiteConverter.

Maybe add a few lines to the notebook to clarify the supported platform details. They are available in the paper but may not be obvious for ML folks who don't work extensively with mobile.

I meant to point out in the notebook that ssd_mobiledet_edgetpu_coco and ssd_mobiledet_dsp_coco are supposed to be integer-only quantized to be supported by the EdgeTPU and DSP, as the two hardware accelerators only support integer models.

It's currently giving a not-supported error. I tried with tf-nightly as well.

I took a closer look at the checkpoint and found that the models in ssdlite_mobiledet_dsp_320x320_coco_2020_05_19/uint8/ are actually QAT models using the TF1 QAT mechanism. To convert those models, you'll need a trick:

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file=model_to_be_quantized, 
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 320, 320, 3]}
)
converter.allow_custom_ops = True
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.quantized_input_stats = {"normalized_input_image_tensor": (128, 128)}
tflite_model = converter.convert()

Please note that you must not specify the optimizations property in the converter.

I'll look more into if we can integer-only quantize the ssdlite_mobiledet_dsp_320x320_coco_2020_05_19/fp32/ model.

khanhlvg commented Sep 14, 2020

The output of the EdgeTPU uint8 model (ssd_mobiledet_edgetpu_coco_uint8) is completely different from the other models and I'm confused too. I'll need to investigate more.

khanhlvg commented Sep 14, 2020

I was able to integer quantize the fp32 (no QAT) version of the ssd_mobiledet_edgetpu_coco and ssd_mobiledet_dsp_coco variants. Please see this Notebook for details.

An important point here is that the final postprocessing custom op doesn't support int8, so you CANNOT specify these parameters if you want the conversion to succeed.

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_output_type = tf.uint8
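
For contrast, here is a rough sketch of the settings that do convert, keeping the float interface in place; treat this as an outline (the exact cells are in the notebook linked above), with the representative dataset generator assumed from earlier.

# Integer quantization of the fp32 graph while keeping float input/output,
# so the custom TFLite_Detection_PostProcess op remains supported.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Do NOT set target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# or inference_output_type here; the postprocessing op only produces float outputs.
tflite_model = converter.convert()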

As you seem to have a Pixel 4, I'd suggest benchmarking the two models with their respective hardware accelerators to see the performance boost.

  • For ssd_mobiledet_dsp_coco_fp32_int8 and ssd_mobiledet_dsp_coco_uint8_qat : Specify use_hexagon=true in the benchmark tool
  • For ssd_mobiledet_edgetpu_coco_fp32_int8 and ssd_mobiledet_edgetpu_coco_uint8_qat : Specify use_nnapi=true in the benchmark tool

I think the MobileDet conversion is tricky enough that we should have an e2e blog post walking through it. Unfortunately, as it's implemented in TF1, we can't get it on the TF Blog. WDYT about writing a personal blog post detailing these steps?

sayakpaul commented Sep 15, 2020

@khanhlvg one question I forgot to ask during our discussion today:

Please note that you must not specify the optimizations property in the converter.

I am guessing this is because the fake quantization nodes that get inserted into the graph during QAT are already in int, so setting this might lead to inconsistencies?

khanhlvg commented Sep 15, 2020

I am guessing this is because the fake quantization nodes that get inserted into the graph during QAT are already in int, so setting this might lead to inconsistencies?

Yes that's correct.

khanhlvg commented Sep 15, 2020

The TFLite model bundled with ssdlite_mobiledet_dsp_320x320_coco_2020_05_19/uint8 also uses mean = 128, std = 128 quantization stats, so I suspect that the fp32 and uint8 models were trained with different normalization methods.

[screenshot of the bundled TFLite model's input quantization parameters]

sayakpaul commented Sep 15, 2020

Okay. Also referring back to

ssd_mobiledet_dsp_coco and ssd_mobiledet_edgetpu_coco are optimized for the Hexagon DSP and EdgeTPU, so their TFLite dr, fp16 and float versions don't have much practical use. As you can see in the benchmark, they're much slower than their CPU counterpart (ssd_mobiledet_cpu_coco).

I guess we can skip the fp16 conversion for EdgeTPU and DSP variants? WDYT?

sayakpaul commented Sep 15, 2020

Another question: how did you figure out that the FP32 post-processing operations don't support integer? I see the output types are float32, float32, float32, float32, but it may not be immediately obvious that those operations won't support integer precision. Please correct me if I am wrong.

khanhlvg commented Sep 15, 2020

I guess we can skip the fp16 conversion for EdgeTPU and DSP variants? WDYT?

Yes, they aren't useful so we can skip them.

Another question: how did you figure out that the FP32 post-processing operations don't support integer? I see the output types are float32, float32, float32, float32, but it may not be immediately obvious that those operations won't support integer precision. Please correct me if I am wrong.

I tried converting with these parameters and the converter returned an op not supported error. You need the tf-nightly build to see the error because the error message was broken in v2.3.

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_output_type = tf.uint8

sayakpaul commented Sep 15, 2020

I see. Very useful, thank you!

sayakpaul commented Sep 15, 2020

I incorporated your suggestions into the conversion notebook. The integer quantized models are behaving very unexpectedly and the results are undesirable. Here's my updated Colab Notebook.

Here are the converted TFLite files as a zip.

Let me know your thoughts.

khanhlvg commented Sep 15, 2020

The way you processed the input looks incorrect, so I rewrote that part in this notebook, which includes both types of quantization: integer-only quantizing the fp32 model (model_int8.tflite) and converting the QAT uint8 model (model_qat.tflite).

Here is how to preprocess the input (a rough sketch follows this list):

  • Model with uint8 input: keep the input image as [0..255] uint8.
  • Model with float32 input: normalize the input to [0..1] float32.
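
A minimal sketch of that preprocessing (the helper name and the 320x320 input size are illustrative):

import numpy as np
from PIL import Image

def preprocess(image_path, input_dtype):
    # Resize to the model's 320x320 input.
    image = np.asarray(Image.open(image_path).convert('RGB').resize((320, 320)))
    if input_dtype == np.uint8:
        # uint8-input model: keep the raw [0..255] pixel values.
        return image[np.newaxis, ...].astype(np.uint8)
    # float32-input model: normalize to [0..1].
    return (image[np.newaxis, ...] / 255.0).astype(np.float32)

# The expected dtype can be read from the interpreter:
# input_dtype = interpreter.get_input_details()[0]['dtype']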

Minor thing: you may add a threshold to only show detections with probability above a certain level (e.g. 50%), as the result is quite busy when showing all detections.

sayakpaul commented Sep 15, 2020

Thanks!

Minor thing: you may add a threshold to only show detections with probability above a certain level (e.g. 50%), as the result is quite busy when showing all detections.

I think it's already there (see the threshold argument) -

# Preprocess image and perform inference
resultant_image = display_results("image.png", interpreter, threshold=0.3)
print(resultant_image.shape)
Image.fromarray(resultant_image)  

sayakpaul commented Sep 15, 2020

Also, according to this comment you are setting converter.inference_output_type = tf.uint8, but in the notebook you linked above I don't see it. Could you expand on this?

khanhlvg commented Sep 15, 2020

I think it's already there -

Oh I see. Thanks for pointing out.

It turned out that converter.inference_output_type = tf.uint8 doesn't have any effect, as the final TFLite_Detection_PostProcess op is a custom op that doesn't support uint8 processing. Dequantize ops were automatically added before that op.
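
A quick way to see this is to inspect the output details of the converted model; this is a hedged sketch, but the four detection outputs should remain float32 even when uint8 was requested:

# The Dequantize ops inserted before TFLite_Detection_PostProcess keep the
# detection outputs in float32.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
print([d['dtype'] for d in interpreter.get_output_details()])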

sayakpaul commented Sep 15, 2020

Okay. So, what would be the best way for someone to decide when to set it and when not to? One example I have is the error that you got with the tf-nightly converter, as you mentioned here. But in this case, what would be the approach?

khanhlvg commented Sep 15, 2020

Unfortunately it'll be trial and error. I wouldn't set converter.inference_input_type/inference_output_type unless it's absolutely necessary (e.g. running on a hardware accelerator that only supports integer), to keep the integer quantized model's interface compatible with its float version. Most mobile use cases are fine with this. If I need integer input/output, I'll try setting both the input and output types to integer and adapt accordingly if there's a conversion error.
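
As a sketch of the two options described above (illustrative, not the exact notebook code):

# Option 1 (default): keep the float interface so the quantized model stays a
# drop-in replacement for the float model.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Option 2 (only when the target accelerator requires an integer interface):
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# converter.inference_input_type = tf.uint8
# converter.inference_output_type = tf.uint8
# ...and adapt if the converter reports an unsupported op, as it does here
# because of the custom postprocessing op.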

sayakpaul commented Sep 15, 2020

That makes sense. I think this is practical advice and we should include this commentary in the blog post. WDYT?

khanhlvg commented Sep 15, 2020

SGTM. Let's include the findings we have in the blog post.

sayakpaul commented Sep 15, 2020

Another thing I noticed is that you are not including the representative dataset here. Is this expected?

khanhlvg commented Sep 15, 2020

Another thing I noticed is that you are not including the representative dataset here. Is this expected?

Yes, the output range of each op is already recorded during QAT, so there's no need to provide a representative dataset during conversion.

sayakpaul commented Sep 15, 2020

That makes sense. The performance is way better now -

[screenshot of the improved detection results]

I also looked into why integer-only quantization for the CPU-based MobileDet model was not performing well, and it looks like I have gotten past it. I have updated my notebook to reflect these changes. Could you take a look?

khanhlvg commented Sep 16, 2020

Great, thanks Sayak!

Minor feedback: I think ssd_mobiledet_dsp_coco_uint8_int8 should be ssd_mobiledet_dsp_coco_uint8_qat, to avoid making readers think that you applied integer-only post-training quantization to the uint8 checkpoint.

sayakpaul commented Sep 16, 2020

Alrighty. I will rename it accordingly.

@sayakpaul sayakpaul moved this from In progress to Done in E2E TFLite Tutorials Project Board Sep 26, 2020