how to get the predictions from the output_details #128

Open
azizHakim opened this issue Jul 5, 2020 · 1 comment

Comments

@azizHakim
After converting yolov3.weights into tflite format, the output_details look like this:

[{'name': 'Identity', 'index': 0, 'shape': array([ 1, 22743, 4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}, {'name': 'Identity_1', 'index': 1, 'shape': array([ 1, 22743, 80], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]

It seems like I am getting 2 output lists with shapes [1, 22743, 4] and [1, 22743, 80].
How do I get bounding boxes and labels from this output?
Also, is this the proper output shape? I was expecting 4 output lists: bboxes, labels, scores, and the number of bboxes.

@raryanpur

raryanpur commented Jul 25, 2020

I am not sure where the 22743 is coming from. In my case, the model outputs two tensors of shape [1, 10647, 4] and [1, 10647, 80]. My understanding of these two tensors is as follows.

First tensor

[1] - don't worry about this

[10647] - the model predicts 3 bounding boxes per grid cell across 3 different grid patterns of 13 x 13, 26 x 26, and 52 x 52 cells. The model is therefore generating 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 bounding boxes for a given input image.

[4] - four elements of the bounding box vector: center x, center y, width, height. (0, 0) is the top left corner of the image.
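The box-count arithmetic above can be checked directly. A minimal sketch (the three grid sizes assume a 416 x 416 input image, as in standard YOLOv3):

```python
# Sketch of where 10647 comes from: 3 anchor boxes per grid cell,
# summed over the three YOLOv3 output grids (13x13, 26x26, 52x52).
grids = [13, 26, 52]
boxes_per_cell = 3
total_boxes = boxes_per_cell * sum(g * g for g in grids)
print(total_boxes)  # 10647
```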

Second tensor

[1] - don't worry about this

[10647] - one set of class predictions for each of the 10647 bounding boxes above, i.e. 3 boxes per grid cell across the 13 x 13, 26 x 26, and 52 x 52 grids, so 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 predictions for a given input image.

[80] - number of prediction classes per bounding box, in the case of the coco dataset this repo uses, there are 80 classes.

You can process the second tensor by thinking of it as an 851,760-element vector (10647 * 80). Each "stride" of 80 elements holds the class predictions for one bounding box, whose coordinates sit at the same index in the first tensor.
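In NumPy terms, that stride-of-80 view is just the last axis of the tensor. A minimal sketch, assuming the second output tensor has already been fetched into `class_probs` (random data stands in for real model output here):

```python
import numpy as np

# Stand-in for the second output tensor of shape [1, 10647, 80];
# in practice this would come from interpreter.get_tensor(...).
class_probs = np.random.rand(1, 10647, 80).astype(np.float32)

scores = class_probs[0]             # drop the batch dim -> (10647, 80)
best_class = scores.argmax(axis=1)  # best class index per box, shape (10647,)
best_score = scores.max(axis=1)     # confidence of that class, shape (10647,)
print(best_class.shape, best_score.shape)
```

Each `best_class[i]` / `best_score[i]` pair then belongs to the box coordinates at index `i` of the first tensor.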

To process this output into the familiar "boxes with coordinates and class predictions", one approach is score-threshold filtering followed by non-max suppression, which is what detect.py does in this repo. Score filtering discards class predictions below a score you define; non-max suppression then keeps only the highest-scoring box in a given geometric area, using IoU (Intersection over Union) as the overlap criterion. The resulting set of bounding boxes and class predictions (one class prediction per bounding box) is then the familiar "boxes with coordinates and class predictions" output.
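A minimal sketch of that two-step filtering, not the repo's actual detect.py implementation. It assumes boxes have already been converted from the model's (center x, center y, width, height) format to (x1, y1, x2, y2) corners, and the threshold values are illustrative:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Score filtering + greedy non-max suppression; returns kept box indices."""
    idx = np.where(scores > score_thresh)[0]          # drop low-score boxes
    order = idx[np.argsort(-scores[idx])]             # highest score first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        order = rest[overlaps <= iou_thresh]          # drop heavy overlaps
    return kept

# Toy example: boxes 0 and 1 overlap heavily, so only 0 (higher score) survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```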

See here for a helpful tutorial on how to think about this:
https://towardsdatascience.com/guide-to-car-detection-using-yolo-48caac8e4ded
