Full Integer Quantization Not Working #53

Open

sterlingrpi opened this issue May 22, 2020 · 35 comments
@sterlingrpi

I tried to perform full int8 quantization by running convert_tflite.py with the flag --quantize_mode full_int8. However, I got the following error:

RuntimeError: Quantization not yet supported for op: RESIZE_NEAREST_NEIGHBOR

I provided a representative dataset and have all the requirements installed. Has anyone else been able to do a full int8 quantization of YOLO? Thank you!

sterlingrpi changed the title from "RuntimeError: Quantization not yet supported for op: RESIZE_NEAREST_NEIGHBOR" to "Full Integer Quantization Not Working" on May 24, 2020

mm7721 commented May 27, 2020

I'm attempting the same thing, but mine died with a seg fault before reaching that point. Traced back to line 51 of /site-packages/tensorflow_core/lite/python/optimize/calibrator.py. Did you encounter that issue too?

@in-die-nibelungen

@sterlingrpi

I got the same error, but I was able to resolve it. In my case, the cause was that the image file paths written in val2017.txt (provided with the --dataset argument) were wrong, so no images were fed to the calibration process. Confirm they are correct (absolute paths are a good idea).

When representative_data_gen() is working, i.e. the image files are found during the calibration process, some numbers are printed to the console; they are marked with <== HERE in my log below:

2020-05-28 08:22:36.338159: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-05-28 08:22:36.341075: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-05-28 08:22:36.413729: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: graph_to_optimize
2020-05-28 08:22:36.416676: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   function_optimizer: function_optimizer did nothing. time = 0.017ms.
2020-05-28 08:22:36.422199: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   function_optimizer: function_optimizer did nothing. time = 0.003ms.
2020-05-28 08:22:49.053607: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-05-28 08:22:49.056863: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-05-28 08:22:53.839949: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: graph_to_optimize
2020-05-28 08:22:53.842214: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 1356 nodes (-541), 3100 edges (-541), time = 2270.63403ms.
2020-05-28 08:22:53.846921: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816]   constant_folding: Graph size after: 1356 nodes (0), 3100 edges (0), time = 692.577ms.
0   <== HERE
2  <== HERE
29  <== HERE
50  <== HERE
54  <== HERE
57  <== HERE
60  <== HERE
67  <== HERE
79  <== HERE
82  <== HERE
88  <== HERE
91  <== HERE
95  <== HERE
I0528 08:24:40.245354 140080696526656 convert_tflite.py:83] model saved to: ./data/yolov4-full_int8.tflite
I0528 08:24:40.433028 140080696526656 convert_tflite.py:88] tflite model loaded

Would this help you?
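A quick way to sanity-check the list file before running the conversion (a minimal sketch, not from the repo; the path to val2017.txt is an assumption, substitute your own --dataset value):

import os

# Hypothetical check: confirm every image path listed in val2017.txt exists,
# so representative_data_gen() won't come up empty during calibration.
dataset_file = "/full_path_to/val2017.txt"  # assumed; use your --dataset value
paths = open(dataset_file).read().split()
missing = [p for p in paths if not os.path.exists(p)]
print(f"{len(paths)} entries, {len(missing)} missing")
for p in missing[:10]:
    print("missing:", p)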

@sterlingrpi

I'm attempting the same thing, but mine died with a seg fault before reaching that point. Traced back to line 51 of /site-packages/tensorflow_core/lite/python/optimize/calibrator.py. Did you encounter that issue too?

No, I don't think I ran into that problem. Could you post your error?

I forgot to mention that I originally had an error that LeakyReLU was not supported by full 8-bit quantization. I changed it to regular ReLU, retrained, and got past that error when quantizing. I now think my new error is because the upsampling operation of YOLO is not supported; see core.common.upsample, which uses the function tf.image.resize. Not sure what I can replace that with yet.
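One possible direction (a sketch, not the repo's code) is to express the 2x nearest-neighbor upsample with reshape/tile instead of tf.image.resize, so the converter never emits a RESIZE_NEAREST_NEIGHBOR op; whether the result quantizes cleanly still depends on the TF version:

import tensorflow as tf

def upsample_2x_nearest(x):
    # Hypothetical alternative to core.common.upsample: 2x nearest-neighbor
    # upsampling built from reshape + tile, assuming an NHWC tensor with
    # static height/width.
    _, h, w, c = x.shape
    x = tf.reshape(x, [-1, h, 1, w, 1, c])
    x = tf.tile(x, [1, 1, 2, 1, 2, 1])
    return tf.reshape(x, [-1, h * 2, w * 2, c])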

@sterlingrpi

@sterlingrpi

I got the same error, but I was able to resolve it. In my case, the cause was that the image file paths written in val2017.txt (provided with the --dataset argument) were wrong, so no images were fed to the calibration process. Confirm they are correct (absolute paths are a good idea). [...]

Would this help you?

Thank you. I am providing it with my own dataset, and I created my own .txt file similar to val2017.txt, using hard-coded paths as you suggested. I don't know if it's my images that are causing the problem. Where did you get your dataset? Just to be sure, are you doing full int8 quantization and setting the flag --quantize_mode full_int8?


mm7721 commented May 28, 2020

Thanks, guys. And sorry for distracting from the issue you're facing - hopefully if I can get to that point, I can help figure it out.

I'm pretty sure my seg fault isn't caused by bad paths to the images. I'd already copied the coco/ directory to the same level as the convert_tflite.py script, and separately called the representative_data_gen() function to make sure it's finding the files (and it is). But clearly something is going wrong during representative-dataset calibration, as the seg fault disappears if I remove the line "converter.representative_dataset = representative_data_gen". And as mentioned, it's dying at line 51 of calibrator.py:

Current thread 0x00007f5065f37740 (most recent call first):
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 51 in init
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/tensorflow_core/lite/python/lite.py", line 240 in _calibrate_quantize_model
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/tensorflow_core/lite/python/lite.py", line 469 in convert
File "convert_tflite.py", line 105 in save_tflite
File "convert_tflite.py", line 131 in main
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/absl/app.py", line 299 in run
File "convert_tflite.py", line 136 in
Segmentation fault (core dumped)

(Line numbers in convert_tflite are slightly different because I've added code.)


mm7721 commented May 28, 2020

I think something might be going wrong in the step before calibration (in the script lite.py):

result = _toco_convert_impl(
    input_data=graph_def,
    input_tensors=input_tensors,
    output_tensors=output_tensors,
    **converter_kwargs)

This step is common to int8, float16, and full_int8 quantization, and produces a TFLite flatbuffer result, which in the case of int8 and float16 is saved to a .tflite file. full_int8 has an extra step that runs calibration before the file is saved, and that's where the above failure occurs for me.

I tried int8 and float16 too. Both fail initially, unless I add the line:

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]

Without this line, these two options fail with the error:

Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If those are native TensorFlow operators, you might be able to use the extended runtime by passing --enable_select_tf_ops, or by setting target_ops=TFLITE_BUILTINS,SELECT_TF_OPS when calling tf.lite.TFLiteConverter(). Otherwise, if you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: CONCATENATION, CONV_2D, EXP, LEAKY_RELU, LOG, LOGISTIC, MAX_POOL_2D, MUL, PACK, PAD, RESHAPE, RESIZE_NEAREST_NEIGHBOR, SHAPE, SPLIT_V, STRIDED_SLICE, TANH. Here is a list of operators for which you will need custom implementations: AddV2.

This looks similar to the original issue of this thread. Perhaps you're using int8 instead of full_int8? If so, you could try inserting this supported_ops line so the unsupported ops fall back to select TF ops.
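For reference, a minimal sketch of where that line fits in the conversion (assuming a Keras model object as built in the repo; names and paths are illustrative):

import tensorflow as tf

# Dynamic-range/float16-style conversion with the select-TF-ops fallback enabled,
# so ops without builtin kernels (e.g. AddV2) don't abort the conversion.
converter = tf.lite.TFLiteConverter.from_keras_model(model)  # `model` assumed built elsewhere
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
open("./data/yolov4-int8.tflite", "wb").write(tflite_model)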

Continuing, int8 and float16 produce a .tflite file. But then when the convert_tflite script calls demo(), it fails on the first line:

Traceback (most recent call last):
File "convert_tflite.py", line 137, in
app.run(main)
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/user1/anaconda3/envs/tf2p1/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "convert_tflite.py", line 133, in main
demo()
File "convert_tflite.py", line 112, in demo
interpreter = tf.lite.Interpreter(model_path=FLAGS.output)
File "/home/m/anaconda3/envs/tf2p1/lib/python3.7/site-packages/tensorflow_core/lite/python/interpreter.py", line 209, in init
model_path, self._custom_op_registerers))
ValueError: Input array not provided for operation 'reshape'.

Looks like something is wrong with the .tflite file, which is why I think the failure in all cases is happening in the _toco_convert_impl(...) step.

What do you guys think?

@in-die-nibelungen

@sterlingrpi ,

Where did you get your dataset?

I'm using the MS COCO dataset; I followed the instructions written in the README (here).

Just to be sure, are you doing the full int8 quantization and setting the flag --quantize_mode full_int8?

Yes, I'm doing full integer quantization.
Here is the command (actually, --model yolov4 is not needed):

python convert_tflite.py \
    --weights ./data/yolov4.weights \
    --output ./data/yolov4-full_int.tflite \
    --quantize_mode full_int8 \
    --model yolov4 \
    --dataset /full_path_to/val2017.txt

My env is:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

$ python --version
Python 3.6.9

$ pip freeze | grep tensorflow
tensorflow==2.1.0
tensorflow-addons==0.9.1
tensorflow-estimator==2.1.0

Have you confirmed that representative_data_gen() is called? If it is called, the problem is likely in reading the image files (e.g. the paths in the file are invalid) or in preprocessing the images (e.g. the format of your dataset differs from COCO's).


mm7721 commented May 28, 2020

I'm using the coco dataset too, and my setup is:

Ubuntu 18.04
python 3.7.7
tensorflow 2.1.0
tensorflow-addons 0.9.1

Frustrating that we're all doing essentially the same thing but getting different errors (though that's no big surprise with TensorFlow). Could you try with int8 and/or float16 and see what happens? That takes representative_data_gen completely out of the picture.

@in-die-nibelungen

Hi, @mm7721

Could you try with int8 and/or float16 and see what happens? This takes representative_data_gen completely out of the picture.

Both int8 and float16 quantization worked well in my environment. I forgot to say, but I'm running Ubuntu on WSL.

And this may not help you, but I'm attaching the log (int8_float16.zip).


mm7721 commented May 28, 2020

@in-die-nibelungen
Thanks for the log - it's very helpful. Looks like model.summary() is producing the same result for both of us (based on a quick visual scan). However, one difference is that your code is executing on a CPU, whereas mine is using multiple GPUs. I suspect that's the root of the problem (I've seen several TOCO-related issues on GPUs in the past, in cases that worked fine on CPUs). Will try to disable GPUs for this script and see if that works.

@sterlingrpi
For your original problem, I notice that the latest check-in for convert_tflite.py (16 days ago) mentions an issue with the RESIZE_NEAREST_NEIGHBOR op. Perhaps you could pull the latest version from the repo and give it another try?


mm7721 commented May 28, 2020

@in-die-nibelungen
Unfortunately, the issue isn't GPU vs. CPU - the same failures occur on CPU. I also tried reverting to python 3.6.9 to match your setup, and no luck. I'm starting to think the difference is in the code contained in this repo. Is it possible for you to post your copies of convert_tflite.py and yolov4.py?

@sterlingrpi

@mm7721 I just pulled the latest and can confirm that it does successfully convert and save the full int8 tflite file. But now, just like you, it fails on the first line of demo, although with a different error:

ValueError: Didn't find op for builtin opcode 'LEAKY_RELU' version '2'

This is running TF 2.2. I've been trying all different versions, but not since getting the latest. I might try 2.1 now.


sterlingrpi commented May 28, 2020

I think I'm going to keep trying to replace all the operations that are not supported. This will require me to retrain in another project I am working on.

@in-die-nibelungen

@mm7721 ,

Is it possible for you to post your copies of convert_tflite.py and yolov4.py?

Unfortunately, no; I haven't made any changes to the code in the repo. My HEAD is 9483cafbf541312f70e7ab635e2b7b3ab96d5001.

@sterlingrpi

Well, that was easier than expected. I only had to change one line and the whole thing runs now. I replaced the leaky_relu operation with regular relu in core/common.py on line 36:

before:
conv = tf.nn.leaky_relu(conv, alpha=0.1)
after:
conv = tf.nn.relu(conv)

This is still with TF 2.2; I like using the current version of things. I'm going to test latency on the RPi, which is what my project runs on. I might need to switch to v3 or tiny to get the latency I need. Ultimately I will need to retrain on my own dataset with leaky_relu changed to regular relu.

@sterlingrpi

Tragic. YOLOv4 latency is 16 seconds on the RPi and slows to 18 seconds after the CPU heats up, even with a heat sink. YOLOv3-tiny (is there a v4 tiny?) latency is 1.1 seconds all day, so that's what I'll be going with. I might also try converting the inputs and outputs to int8 to help speed, but I might start another thread on that since it's a separate topic.


mm7721 commented May 30, 2020

@sterlingrpi

Glad you made progress. I did too, by switching to a different machine. There are a few differences in the setup of this machine, but mostly it's the same (ubuntu 18.04, anaconda, TF 2.1/2.2, etc). No clue why, but my seg fault is gone...mysterious, but at least I have something working.

And I see exactly the same thing as you: it fails on leaky relu. As TF's documentation states, they only support relu and relu6 for TF Lite models. So your change to relu enables it to pass - that's great. However, it breaks the model. You can see this by running the original detection on the kite image with the updated common.py:

python detect.py --weights data/yolov4.weights --framework tf --size 608 --image ./data/kite.jpg

Nothing is detected now. Not sure what the best solution is, but one thing I'm going to investigate is training the original Darknet model with relu (or even better, relu6). Will let you know if I can get that working.

Also wanted to point out that "full_int8" quantization as defined in convert_tflite.py is not actually full integer quantization. It's only quantizing the weights (hence why you see the tflite file size at about 25% of the original weights file). But the input and activations are still fp32. There are some extra lines that can force additional quantization, but I haven't tried to get it working yet.

@sterlingrpi

@mm7721 thank you for confirming that changing leaky_relu to relu broke the model. I'm not surprised, though, and anticipated needing to retrain. Hopefully it won't be too far off and will train fairly quickly with similar accuracy.

Do you recommend using the darknet repo for training? What's your process flow like? I looked at the darknet training instructions and it looked intense. You'll have to change the leaky_relu operator there as well.

Currently I am looking to train in this repo. But train.py doesn't seem to support tiny too well. I get an error where the prediction shape doesn't match the model shape. I suspect the prediction shape is different for yolo tiny than regular yolo.

@sterlingrpi

I had to do some really crude hacks, but I got tiny YOLO training in this repo. First I commented out all the layer freezing because that didn't work. Then I changed the strides in the cfg file; I doubled the values there. Tiny YOLO also only predicts at two scales, so for the optimization stages in train.py I changed "for i in range(3):" to "for i in range(2):"; I had to do this in two places. Not ideal, but at least it's working.

@sterlingrpi

Also wanted to point out that "full_int8" quantization as defined in convert_tflite.py is not actually full integer quantization. It's only quantizing the weights (hence why you see the tflite file size at about 25% of the original weights file). But the input and activations are still fp32. There are some extra lines that can force additional quantization, but I haven't tried to get it working yet.

You are right about the inputs and outputs still being float32. I have changed those to int8 before in another project. However, you have to scale the input data to 0-255 instead of 0-1, otherwise there isn't enough dynamic range to learn from. Also, labels should be 0 or 255. Here are the commands I used:
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

It's my understanding that when you set the flag converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] you get 8-bit weights AND 8-bit activations, while regular tflite quantization gives only 8-bit weights, which makes the file smaller. Quantizing the activations has made the models even faster in my experience.
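Putting those pieces together, a hedged sketch of a full-integer conversion (assuming `model` and `representative_data_gen` as in convert_tflite.py; see the note further down about the input/output type settings on TF 2.x):

import tensorflow as tf

# Post-training full-integer quantization: calibrate with the representative
# dataset and restrict the converter to int8 builtin kernels.
converter = tf.lite.TFLiteConverter.from_keras_model(model)            # `model` assumed
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen             # generator assumed
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # may not be honored on all TF 2.x versions
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()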


mm7721 commented May 31, 2020

@sterlingrpi
I'd like to train in this current repo too, but I noted this comment from the author:
"The training performance is not fully reproduced yet, so I recommended to use Alex's Darknet to train your own data, then convert the .weights to tensorflow or tflite." So I'm sticking with darknet for now.

Setting up darknet is a bit of a pain - you have to get a bunch of dependencies in place, like CUDA, the full OpenCV version, etc. I still don't have everything working perfectly, but I can run training and single-image detection (video detection isn't working yet). I've already retrained with relu6 in place of leaky_relu, imported the results into the current repo, and evaluated some images/videos. Performance looks fairly similar on the small set I tested. There are some new issues with "full int8" conversion to TF Lite, which I haven't debugged yet...but I was able to use int8 mode.

Regarding TF Lite conversion, your comments make sense and align with my understanding. What worries me is that there are lots of posts out there suggesting the conversion process doesn't always work as expected...including one post suggesting it's impossible to get everything into int8 if you're doing post-training quantization like we are. Not to mention the fact that it's really hard to debug TF Lite models. If you get a fully int8 quantized version working, could you share your code?

So far I haven't explored yolo-tiny, but I'll keep your comments in mind. Thanks.

@sterlingrpi

@mm7721 I agree the tflite conversion is a bit of a black box, but I think full int8 is doing something. I got 16 sec latency with yolov4 on my RPi, while a regular tflite yolov3 took over 60 sec, so I believe it is replacing quite a few float32 calculations with 8-bit ones.

I'm happy to share any of the code I did on this. But I think I shared all the steps I took to get to this part. Is there anything specific that would help? I even got training to work and loaded the trained weights into the demo. Still have to validate it for accuracy yet.


mm7721 commented May 31, 2020

@sterlingrpi That's a nice speedup factor. So yeah, you're right, full_int8 is definitely doing something different. Viewing the two networks, int8 ends up with no quantize/dequantize operations, whereas full_int8 ends up with lots of them. For example, every mish activation is built as:
int8: -->exp-->add-->log-->tanh-->
full_int8: -->dequantize-->exp-->quantize-->add-->dequantize-->log-->quantize-->tanh-->

Lots of other examples of -->dequantize-->op-->quantize in the full_int8 network too. Not very pretty. My ultimate goal is a model which is integer quantized from input to output, without all these intermediate operations.
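For reference, mish as used in YOLOv4 is x * tanh(softplus(x)), and softplus(x) = log(1 + exp(x)), which is where the exp -> add -> log -> tanh chain above comes from (a MUL completes the activation):

import tensorflow as tf

def mish(x):
    # mish(x) = x * tanh(softplus(x)); softplus expands to log(1 + exp(x)),
    # hence the exp/add/log/tanh ops visible in the converted graph.
    return x * tf.math.tanh(tf.math.softplus(x))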

With my current version that uses relu6, the full_int8 version converts successfully and saves to a tflite file, but it crashes on the interpreter.allocate_tensors() line in demo(). Not sure why yet (it's one of those pesky Fatal Python Error: Aborted messages).


a-rich commented Jun 3, 2020

@sterlingrpi would you mind sharing any changes you made in convert_tflite.py?
I made the tf.nn.leaky_relu -> tf.nn.relu change in an attempt to replicate what you've done, but my output model still takes 60+ seconds -- not sure how you're getting that 4x speed up :/


Namburger commented Jun 4, 2020

Hmm, I'm also having this problem: my yolov4 is clocking in at 55 s per inference after doing a full int8 conversion. :(
[edit] With just a normal tflite conversion, without full int quantization, I can run inference in about 5 seconds.

@sterlingrpi

@a-rich are you sure you are doing full int8 quantization by setting the flag --quantize_mode full_int8?

@Namburger

@sterlingrpi not sure about @a-rich, but here is a quick reference of my ~57 s inference :(
[screenshot: quantized]
FYI, using TF 2.1.


Namburger commented Jun 4, 2020

@mm7721

Also wanted to point out that "full_int8" quantization as defined in convert_tflite.py is not actually full integer quantization. It's only quantizing the weights (hence why you see the tflite file size at about 25% of the original weights file). But the input and activations are still fp32. There are some extra lines that can force additional quantization, but I haven't tried to get it working yet.

FYI, unfortunately,

converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

is deprecated in TF 2.x, at least for now, so it is expected that the I/O stays float, with some quantize/dequantize ops.


Ekta246 commented Jun 9, 2020

I forgot to mention that I originally had an error that LeakyReLU was not supported by full 8-bit quantization. I changed it to regular ReLU, retrained, and got past that error when quantizing. I now think my new error is because the upsampling operation of YOLO is not supported; see core.common.upsample, which uses the function tf.image.resize. Not sure what I can replace that with yet. [...]

Hi,
I see you mentioned replacing the unsupported functions/activations/operations and then retraining.
I wanted to ask: if I replace the operations and use the pre-trained weights, do you think it is possible to avoid retraining the model?

@Ekta246

Ekta246 commented Jun 9, 2020

Also, would any of you mind sharing your representative_data_gen function?


sterlingrpi commented Jun 11, 2020

Also, would any of you mind sharing your representative_data_gen function?

def representative_data_gen():
    fimage = open(FLAGS.dataset).read().split()
    for input_value in range(100):
        if os.path.exists(fimage[input_value]):
            # Load, convert to RGB, and resize/pad to the model's input size.
            original_image = cv2.imread(fimage[input_value])
            original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
            image_data = utils.image_preporcess(np.copy(original_image), [FLAGS.input_size, FLAGS.input_size])
            img_in = image_data[np.newaxis, ...].astype(np.float32)
            print(input_value)
            yield [img_in]
        else:
            continue

sorry, don't know how to make github take the indentations :-(


Ekta246 commented Jun 12, 2020 via email


raryanpur commented Jul 27, 2020

Tragic. YOLOv4 latency is 16 seconds on the RPi and slows to 18 seconds after the CPU heats up, even with a heat sink. YOLOv3-tiny (is there a v4 tiny?) latency is 1.1 seconds all day, so that's what I'll be going with. I might also try converting the inputs and outputs to int8 to help speed, but I might start another thread on that since it's a separate topic.

@sterlingrpi I'm very surprised yolov4 is this slow after quantization running on ARM silicon - which RPi are you running on? Out of curiosity, are you using TF for Python or C++ to run inference? Have you been able to benchmark your application using the TFLite benchmarking tool, or are you using a custom application?

For reference, on a dual-core 1.3 GHz ARMv8-based SoC I'm able to get 20 seconds per inference on the 416 x 416 model without quantization. I'm eager to benchmark this setup with an int8-quantized yolov4, but alas quantization remains broken in this repo, so I haven't been able to generate a model for testing.

Using the benchmarking tool on the float model, ~50% of the inference time is spent in CONV2D ops. I believe these would be quantized, and hence able to take advantage of ARM's accelerations. Because of this, I would also expect a sizeable speedup on the RPi for the quantized vs. the float model.
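If it helps, here is a rough latency check with the Python tf.lite.Interpreter (the model path is an assumption, and Python adds overhead compared with the C++ benchmark_model tool):

import time
import numpy as np
import tensorflow as tf

# Time a single invoke on dummy data to get a ballpark latency figure.
interpreter = tf.lite.Interpreter(model_path="./data/yolov4-full_int8.tflite")  # assumed path
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)

start = time.time()
interpreter.invoke()
print(f"single inference: {time.time() - start:.2f} s")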


ixtiyoruz commented Jul 30, 2020

Also, would any of you mind sharing your representative_data_gen function?

def representative_data_gen():
    [...]

sorry, don't know how to make github take the indentations :-(

You can use the <pre> element in HTML to preserve the indentation.

@ichakroun

I now think my new error is because the upsampling operation of YOLO is not supported; see core.common.upsample, which uses the function tf.image.resize. Not sure what I can replace that with yet. [...]

Did you manage to replace the resize/upsample function? I'm getting this error: "RuntimeError: Quantization not yet supported for op: EXP"
