Expected inference time on the object detection model - >4s? #259

Closed
chadwallacehart opened this Issue Jan 19, 2018 · 9 comments

chadwallacehart commented Jan 19, 2018

I did some tests and it takes a little over 4 seconds to process a single frame with the built-in object detection model. I realize it is not exactly the same setup, but I was hoping to get performance somewhat like what Intel published for the Movidius Neural Compute Stick running MobileNets here: https://software.intel.com/en-us/articles/mobilenets-on-intel-movidius-neural-compute-stick-and-raspberry-pi-3

Is this expected behavior? Is there a way to speed up inference with this built-in model?

Also, what does the duration_ms time represent in the inference.run response?

For reference, I used the face_camera_trigger.py demo app to generate an image and the following code to see how quickly I could repeatedly process that image. Here is the main module:

import argparse
from time import time

from PIL import Image, ImageDraw

from aiy.vision.inference import ImageInference
from aiy.vision.models import object_detection

# _crop_center() (not shown) returns a centered crop of the image and its offset.


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', '-i', dest='input', required=True)
    parser.add_argument('--output', '-o', dest='output')
    args = parser.parse_args()

    image = Image.open(args.input)
    image_center, offset = _crop_center(image)
    draw = ImageDraw.Draw(image)

    with ImageInference(object_detection.model()) as inference:
        print("Object detection model loaded")

        # Repeatedly run inference on the same cropped image and time each call.
        while True:
            try:
                start = time()
                result = inference.run(image_center)
                print("Total time %s seconds. Frame time %s ms" % (time() - start, result.duration_ms))

            except KeyboardInterrupt:
                break


if __name__ == '__main__':
    main()

This was after I tried a similar procedure with `CameraInference(object_detection.model())`, which is my real goal.

PeterMalkin commented Jan 19, 2018

Chad,

The inference run itself on the Movidius chip should be slightly faster than on the Neural Compute Stick - we have written the inference engine from scratch and improved the runtime in certain places.

To understand what's happening here, you may need to know a bit more about how the bonnet is wired. We developed the board for the purpose of analyzing the images captured by the camera, so we implemented a MIPI passthrough interface. That means the VisionBonnet is "listening in" on the image traffic between the Raspberry Pi and the Pi Camera v2, and it can run inference on those pictures without slowing down the Raspberry Pi. In fact, the Raspberry Pi does not even know that someone is "listening in" on the pixel traffic between it and the camera.

However, for this to work we also need another channel of communication between the VisionBonnet and the Raspberry Pi - to transmit the network file and to deliver the inference results. The Raspberry Pi is very limited when it comes to fast buses, so we use the SPI bus, which takes a few of the pins on the 40-pin header. This bus is fairly slow compared to PCIe, USB 3, or any other fast interface, but it is the only one available on the Raspberry Pi right now.

So in the case of CameraInference(), the network is transmitted only once, after which we run inference on the pictures coming from the camera. In the example you have provided here, however, you transmit your image over the SPI bus to the Movidius chip on the Vision Bonnet each time you run inference, and I am fairly sure the majority of the time is taken by this transmission.
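
Schematically, the two usage patterns look roughly like this (an illustrative sketch, not code copied from our examples):

from PIL import Image
from picamera import PiCamera
from aiy.vision.inference import CameraInference, ImageInference
from aiy.vision.models import object_detection

# Per-frame transfer: every run() call pushes the whole image over the slow SPI bus.
with ImageInference(object_detection.model()) as inference:
    image = Image.open('frame.jpg')   # hypothetical input file
    result = inference.run(image)     # the image travels Pi -> bonnet on each call

# Streaming: the network is loaded once, frames reach the bonnet over the MIPI
# passthrough, and only the small result structures come back over SPI.
with PiCamera(sensor_mode=4, framerate=30):
    with CameraInference(object_detection.model()) as inference:
        for result in inference.run():   # per-frame results, no image upload
            break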

I am curious what resolution you are using for your image. Have you tried timing the object_detection_camera.py example?

dmitriykovalev commented Jan 19, 2018

Chad,

duration_ms is the real inference time without any overheads (like compute graph transfer time or image transfer time). We are still working hard on the Python API code. Some features are still experimental and not exposed publicly.
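
In other words, if you drop something like this into the loop of your snippet above, the difference between the wall-clock time and duration_ms is roughly the SPI transfer plus Python overhead:

start = time()
result = inference.run(image_center)
total_s = time() - start
overhead_s = total_s - result.duration_ms / 1000.0   # image transfer + Python overhead
print('total %.3f s, inference %.0f ms, overhead %.3f s' % (total_s, result.duration_ms, overhead_s))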

chadwallacehart commented Jan 22, 2018

@PeterMalkin - thanks for the process flow explanation. Everything makes a lot more sense now.

In my static image tests, I was using an image from the face_camera_trigger.py example - it was set to 1640x922.

I ran my tests again, this time using CameraInference(), but I am still getting around 1 frame processed every 2 seconds - still far from real time. Also, the Python process on the Pi Zero W runs at 90% CPU while CameraInference is running - is it normal for that to be so high? I expected less if all the heavy lifting is done on the bonnet.

Here is my code:

from time import time

from picamera import PiCamera

from aiy.vision.inference import CameraInference
from aiy.vision.models import object_detection


def main():

    with PiCamera() as camera:
        camera.sensor_mode = 4
        camera.resolution = (1640, 922)  #  1232 is Full height (Camera v2), 922 is 16:9 
        camera.framerate = 30
        camera.start_preview(fullscreen=True) #so I can see what's going on

        last_time = time()
        with CameraInference(object_detection.model()) as inference:
            print("Object detection model loaded after %s" % (time() - last_time))

            # run until ctrl-c
            while True:
                try:
                    last_time = time()
                    for result in inference.run():
                        for object in object_detection.get_objects(result, 0.3):
                            print(object)
                        now = time()
                        print("Total process time: %s " % (now - last_time))
                        last_time = now

                except KeyboardInterrupt:
                    break

        camera.stop_preview()

if __name__ == '__main__':
    main()

Here is the output:

(env) pi@aiyvisionpi:~/code $ python object_detection_camera_timer.py 
Object detection model loaded after 0.33312249183654785
Total process time: 2.108851194381714 
Total process time: 2.109384298324585 
Total process time: 2.1128971576690674 
Total process time: 2.1531147956848145 
Total process time: 2.0657546520233154

Removing the get_objects call in lines 26 & 27 sped it up by around 300 ms. Dropping the resolution did not make much difference, but that is to be expected if the bonnet is fed directly from the camera. Maybe I should adjust camera.sensor_mode to run at a lower resolution, something like the snippet below? Does the bonnet care what mode the camera is set to?
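
For example (same imports as above, and assuming I have the v2 camera's modes right - mode 7 should be the low-resolution 640x480 mode):

with PiCamera() as camera:
    camera.sensor_mode = 7            # 640x480 partial-FoV mode on the Camera Module v2, if I read the docs right
    camera.resolution = (640, 480)
    camera.framerate = 30
    with CameraInference(object_detection.model()) as inference:
        for result in inference.run():
            print(result.duration_ms)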

I am in the middle of a reinstall after downloading the repo updates from a few days ago - I'll give the new object_detection_camera.py a try tomorrow.

chadwallacehart commented Jan 22, 2018

The update just finished (cloned the repo, deleted ~/AIY-projects-python/, copied the repo to ~/AIY-projects-python/, then ran ~/AIY-projects-python/scripts/install-deps.sh), so I gave it a quick try.

It looks like some improvements were made to the object detector model in the upgrade - now I am getting 2-3 FPS with the same code:

(env) pi@aiyvisionpi:~/code $ python object_detection_camera_timer.py 
Object detection model loaded after 0.32370591163635254
kind=PERSON(1), score=0.672224, bbox=(343, 40, 731, 799)
Total process time: 0.41234612464904785 
kind=PERSON(1), score=0.802710, bbox=(349, 47, 727, 731)
Total process time: 0.3784775733947754 
kind=PERSON(1), score=0.830594, bbox=(349, 50, 722, 725)
Total process time: 0.4440138339996338 
kind=PERSON(1), score=0.821044, bbox=(349, 44, 718, 729)
Total process time: 0.4538760185241699 
kind=PERSON(1), score=0.836154, bbox=(342, 45, 725, 727)
Total process time: 0.37234044075012207 
kind=PERSON(1), score=0.826151, bbox=(351, 49, 710, 724)

The Python process is now running at 75-85% CPU, a little less than last time too.

dmitriykovalev commented Jan 22, 2018

@chadwallacehart, what's the value of duration_ms in your latest test?

chadwallacehart commented Jan 22, 2018

@dmitriykovalev - it is consistently 36 ms, just like before.

mbrooksx pushed a commit that referenced this issue Feb 27, 2018

Optimize python side of code to alleviate issue #259.
This gets average per inference time from 420ms to 155ms.
This also reduces CPU usage from 80% to 55%.
This has comparable performance as using numpy, thanks to Dmitry.

Change-Id: Ie3f2773e2725fc79deaa4ddd8d059941a2be300b

weiranzhao commented Feb 28, 2018

Chad, thanks for bringing this to our attention. We improved our Python-side post-processing to reduce latency to 155 ms per inference. This also reduces CPU usage. We are working on an improved protocol to reduce latency further; it will be available in a later release.
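
(For illustration only - this is not the actual diff, just the general flavor of such a post-processing optimization: decoding the raw output buffers in bulk instead of element by element. All names and values here are made up.)

import struct
import numpy as np

raw = struct.pack('<256f', *range(256))   # stand-in for a raw float32 output buffer from the bonnet

# Per-element decoding - lots of Python-level work per value:
slow = [struct.unpack_from('<f', raw, 4 * i)[0] for i in range(256)]

# Bulk decoding - a single call handles the whole buffer (numpy.frombuffer is similar):
fast = struct.unpack('<256f', raw)
also_fast = np.frombuffer(raw, dtype='<f4')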

chadwallacehart commented Mar 3, 2018

@weiranzhao - thanks for the update. I just loaded your recent commit. Using my code above, I can confirm I see faster inference in line with your measurement:

kind=PERSON(1), score=0.503174, bbox=(0, 133, 444, 781)
Total process time: 0.14915060997009277 seconds. Bonnet inference time: 36 ms 
kind=PERSON(1), score=0.700792, bbox=(156, 132, 707, 774)
Total process time: 0.1789381504058838 seconds. Bonnet inference time: 36 ms 
kind=PERSON(1), score=0.623835, bbox=(166, 120, 695, 787)
Total process time: 0.14641261100769043 seconds. Bonnet inference time: 36 ms 
kind=PERSON(1), score=0.618723, bbox=(166, 119, 697, 789)
Total process time: 0.15200567245483398 seconds. Bonnet inference time: 36 ms

chadwallacehart commented Mar 23, 2018

@weiranzhao, @dmitriykovalev, @PeterMalkin - just for reference, here is a write-up on my project: https://webrtchacks.com/aiy-vision-kit-uv4l-web-server/

And a summary of the performance:

Mode    UV4L  CPU  Inference time  FPS
face    off   20%  0.737           ~13
face    on    80%  0.743           ~13
object  off   61%  0.138           ~7
object  on    94%  0.37            ~3

I still have some work to do to reduce CPU usage. When I run it for a while and the CPU heats up, the subsequent throttling causes a big delay but it works well before that point.
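
(For reference, I check whether the Pi has throttled using the firmware's vcgencmd tool, roughly like this:)

import subprocess

# Ask the firmware for throttling and temperature status; a non-zero
# get_throttled value means under-voltage or thermal throttling has occurred.
print(subprocess.check_output(['vcgencmd', 'get_throttled']).decode().strip())
print(subprocess.check_output(['vcgencmd', 'measure_temp']).decode().strip())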

Thank you for your help here. I will close this issue unless you think there is value in keeping it open.
