Slow quantized tflite model inference #138
Hi!
Info from the compiler about ops mapped to the TPU:
Inference time using:
@mieszkokl
Input tensor shape is (1, 224, 224, 3).
@mieszkokl Sorry, but it just seems really odd to me that the tflite model performs worse than the original graph model. Our compiler only delegates ops from the CPU tflite model, so we can't really address performance drops relative to the original graph file.
[Edit] I've followed up with some of my own findings on this issue:
Closing this issue for now since I don't see this as a bug on our side.
Hi Namburger, I am testing the performance/throughput of fp32 and quantized models on my platform. My TF versions are as follows:
The results for FP32 on CPU:
And the results for INT8 on CPU:
Observations: the performance of the FP32 model is almost double that of the INT8 model on CPU, but Google's TensorFlow Lite benchmarks suggest the opposite: https://www.tensorflow.org/lite/guide/hosted_models#quantized_models
I also tried swapping in the models from the hosted location above, but the harness gives similar results. Is this observation expected? Could you let me know where I could be going wrong? Thanks
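For reference, a minimal sketch of the kind of CPU timing harness this comparison implies; the model file names, random input data, and run count are illustrative assumptions, not the actual setup:

```python
# Sketch: time repeated invocations of a tflite model on the CPU.
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, runs=100):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Random data matching the model's input shape and dtype.
    data = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()  # warm-up, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000  # ms per inference

print("FP32:", benchmark("model_fp32.tflite"), "ms")
print("INT8:", benchmark("model_int8.tflite"), "ms")
```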
I have trained a Keras model, a semantic segmentation FPN with a ResNet101 backbone (using https://github.com/qubvel/segmentation_models). I want to deploy it on a Coral Dev Board.
First, I converted it to tflite using:
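(The exact snippet isn't preserved here; below is a representative sketch of full-integer post-training quantization with the TF 2.x converter. The file names, the random calibration data, and the input shape are illustrative assumptions, and loading a segmentation_models network may additionally require the library's custom objects.)

```python
# Illustrative sketch only -- not the exact conversion code from the issue.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; real calibration should iterate over
    # a few hundred genuine training images.
    for _ in range(100):
        yield [np.random.random_sample((1, 224, 224, 3)).astype(np.float32)]

model = tf.keras.models.load_model("fpn_resnet101.h5", compile=False)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so the Edge TPU compiler can map the ops.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("fpn_resnet101_quant.tflite", "wb") as f:
    f.write(converter.convert())
```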
Output from conversion:
Then I've compiled it for the Edge TPU using edgetpu_compiler:
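For reference, a typical invocation looks like the following (wrapped in Python to keep the examples in one language; the file name is an assumption):

```python
# Sketch: run the Edge TPU compiler on the quantized model.
# Equivalent to `edgetpu_compiler fpn_resnet101_quant.tflite` in a shell.
import subprocess

subprocess.run(["edgetpu_compiler", "fpn_resnet101_quant.tflite"], check=True)
# On success this writes fpn_resnet101_quant_edgetpu.tflite plus a .log
# file summarizing how many ops were mapped to the Edge TPU vs. the CPU.
```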
Logs from compiler:
Single-frame inference on the Coral Dev Board using the TPU takes ~4 s. Is it normal that it's so slow? What can I do to make it faster?
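One common way to measure this on the Dev Board is sketched below, using the tflite_runtime API with the Edge TPU delegate; the model path and dummy frame are illustrative assumptions. Note that the very first invoke() is expected to be slow, because the model is loaded onto the Edge TPU at that point, so it's worth timing a warm run separately.

```python
# Sketch: per-frame latency on the Dev Board with the Edge TPU delegate.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="fpn_resnet101_quant_edgetpu.tflite",  # illustrative path
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # dummy input frame

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()  # first invoke includes transferring the model to the TPU

start = time.perf_counter()
interpreter.invoke()
print("per-frame latency: %.1f ms" % ((time.perf_counter() - start) * 1000))
```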
I've also created an issue in the TensorFlow repository, because inference with the converted tflite model on a PC CPU also takes a long time, but they eventually forwarded me here.