
Is it possible to preload a model into the Coral Edge TPU and keep it there for some time? #522

Closed
christianbaun opened this issue Jan 7, 2022 · 10 comments
Labels: comp:model (Model related issues) · Hardware:USB Accelerator (Coral USB Accelerator issues) · subtype:ubuntu/linux (Ubuntu/Linux Build/installation issues) · type:support (Support question or issue)

Comments

@christianbaun

For a software project, I want to do object detection with many individual images over some time (hours/days).

When using the Coral Edge TPU Accelerator (USB) on a Raspberry Pi 4 (4 GB) with Raspberry Pi OS (I tested versions based on Debian 10 and Debian 11), analyzing single images is faster with the CPU alone than with the Coral TPU.

Here are some measurements of this code.

Without the Coral USB Accelerator:

$ time python3 TFLite_detection_image_modified.py \
--modeldir=/home/pi/model_2021_07_08 \
--graph=detect.tflite \
--labels=/home/pi/model_2021_07_08/labelmap.txt \
--image=testimage.jpg

real	0m1,174s
user	0m1,236s
sys	0m0,754s

With the Coral USB Accelerator:

$ time python3 TFLite_detection_image_modified.py \
--modeldir=/home/pi/model_2021_07_08 \
--graph=detect_edgetpu.tflite \
--labels=/home/pi/model_2021_07_08/labelmap.txt \
--edgetpu \
--image=testimage.jpg 

real	0m3,831s
user	0m1,118s
sys	0m0,729s

I also tried a loop over 170 images; the result was a real time of less than 2 minutes without the Coral Edge TPU Accelerator (USB), compared with more than 10 minutes when using the Coral TPU.

The reason is probably as mentioned here and in other places like here: the first inference on the Edge TPU is slow because it includes loading the model into Edge TPU memory. Thus, when using the Coral USB Accelerator on several image files, the model is loaded into TPU memory for every image file. This overhead occurs only once when a video file or stream is processed.

If this is the root cause of the poor performance, I see only two possible solutions:

  1. Copy the model into the Coral TPU memory in advance and avoid loading it for every image (is this possible?) or
  2. Provide a stream instead of single images.
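A way to check whether the one-off loading cost really dominates is to separate the warm-up inference from steady-state latency. A minimal sketch (standard library only; `fn` stands in for a call such as `interpreter.invoke`, so this runs without Edge TPU hardware):

```python
import time

def warmup_then_time(fn, warmup_runs=1, timed_runs=10):
    """Call fn a few times untimed, then return the mean latency of the
    remaining calls. On an Edge TPU, the warm-up call is where the model
    is copied into TPU memory, so the timed calls reflect steady-state
    inference latency rather than the one-off loading cost."""
    for _ in range(warmup_runs):
        fn()
    start = time.perf_counter()
    for _ in range(timed_runs):
        fn()
    return (time.perf_counter() - start) / timed_runs
```

If the mean steady-state latency is far below the first-call latency, the model-loading overhead is confirmed as the bottleneck.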

If these assumptions are correct, is using the Coral TPU useful at all for analyzing single images?

Is it possible to speed up the process of loading the model into Edge TPU memory?

Can I load a model into the Edge TPU memory and keep it there for some time?

Thanks for any help.

@google-coral-bot google-coral-bot bot added comp:model Model related issues Hardware:USB Accelerator Coral USB Accelerator issues labels Jan 7, 2022
@hjonnala
Contributor

hjonnala commented Jan 7, 2022

Yes, it is possible to load a model into the Edge TPU. The script you are using also loads the model into the interpreter only once when an image directory is passed as input. Can you please share the detect.tflite model you are using? Thanks!

@christianbaun
Author

christianbaun commented Jan 7, 2022

But I never execute the script for a group of images, only for a single image: the most recent one.

The workflow is: I create an image with the camera and execute the object detection script for exactly this new image. And I do this in an infinite loop.

The question is: Can I load the model into the Edge TPU memory and keep it there for some time?

And can I do object detection for a single image file without first loading the model into the Edge TPU memory, by telling it to use the already uploaded one?

@hjonnala
Contributor

hjonnala commented Jan 7, 2022

Okay, I think it is not possible to preload a model into the Coral Edge TPU and keep it there for some time.

We can pass images via an image directory or a stream of images to avoid loading the model for every image.

Feel free to check the streaming examples from this repo: https://github.com/google-coral/examples-camera. Thanks!

@hjonnala hjonnala added type:support Support question or issue subtype:ubuntu/linux Ubuntu/Linux Build/installation issues labels Jan 7, 2022
@christianbaun
Author

christianbaun commented Jan 7, 2022

But this way (collecting images and running object detection only once X images are available), it is no longer "real time". I still hope there is another solution that works with single images but still performs well.

If, in the end, object recognition is not possible without uploading the model first, then the Coral TPU likely cannot be used to work efficiently with individual images (one image per process).

BTW: The model I use is here now.

@hjonnala
Contributor

hjonnala commented Jan 10, 2022

The total script execution time is higher for a single image when running the Edge TPU model because of the time taken to create the Edge TPU interpreter compared with the CPU TFLite interpreter.

You can check the inference time difference by calculating time taken for this line.

ref: https://github.com/google-coral/pycoral/blob/master/examples/classify_image.py#L110
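One way to follow this suggestion is a small timing helper around each suspect step. A sketch (standard library only; the `Interpreter`/`load_delegate` calls from the detection script are shown only in comments, since they need the tflite_runtime package and attached hardware):

```python
import time

def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# In the detection script this would wrap the two suspect steps, e.g.:
#   interpreter, t_create = timed(
#       Interpreter, model_path=PATH_TO_CKPT,
#       experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
#   _, t_invoke = timed(interpreter.invoke)
# Comparing t_create with t_invoke shows which step dominates.
```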

@cdrose

cdrose commented Jan 17, 2022

@christianbaun From what I can gather, you have some other process that saves images into a folder, and then you trigger your Python script to run inference on the images in that folder? Instead of creating a new Edge TPU interpreter every time, you need a long-running script that creates the interpreter, loads the model, then loops indefinitely, processing images as they become available.
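The loop described above could be sketched like this (the names `watch_and_detect` and `detect_fn` are hypothetical; `detect_fn` would close over an interpreter created once before the loop, so the model stays resident in Edge TPU memory):

```python
import time
from pathlib import Path

def watch_and_detect(watch_dir, detect_fn, poll_seconds=0.5, max_batches=None):
    """Process each new *.jpg in watch_dir exactly once with detect_fn.
    max_batches bounds the loop for testing; None means run forever."""
    seen = set()
    results = []
    batches = 0
    while max_batches is None or batches < max_batches:
        for image_path in sorted(Path(watch_dir).glob("*.jpg")):
            if image_path not in seen:
                seen.add(image_path)
                results.append(detect_fn(image_path))  # model already loaded
        batches += 1
        if max_batches is None:
            time.sleep(poll_seconds)  # idle until new images appear
    return results
```

The key point is that the expensive interpreter creation happens once, outside the loop, while each camera image is still handled individually as it arrives.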

@christianbaun
Author

@cdrose This is a very good idea. Probably the approach that requires the least change to my existing software. I will try this way.

@christianbaun
Author

@hjonnala As you suggested, I did some timing tests to find out where most time is lost, and the culprit is not the line where the invoke operation happens:

interpreter.invoke()

This operation (here) costs approx. 2.75 seconds on a Raspberry Pi 4:

    interpreter = Interpreter(model_path=PATH_TO_CKPT,
                              experimental_delegates=[load_delegate('libedgetpu.so.1.0')])

This finding does not change anything (I still need to rewrite my code), but it is worth knowing.

@hjonnala
Contributor

Feel free to reopen if you have any questions. Thanks!

