how to get the predictions from the output_details #128

Open
azizHakim opened this issue Jul 5, 2020 · 1 comment

Comments

@azizHakim
After converting yolov3.weights into tflite format, the output_details look like this:

[{'name': 'Identity', 'index': 0, 'shape': array([ 1, 22743, 4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}, {'name': 'Identity_1', 'index': 1, 'shape': array([ 1, 22743, 80], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]

It seems like I am getting 2 output lists with shapes [1, 22743, 4] and [1, 22743, 80].
How do I get bounding boxes and labels from this output?
Also, is this the proper output shape? I was expecting 4 output lists: bboxes, labels, scores, and the number of bboxes.

@raryanpur

raryanpur commented Jul 25, 2020

I am not sure where the 22743 is coming from. In my case, the model outputs two tensors of shape [1, 10647, 4] and [1, 10647, 80]. My understanding of these two tensors is as follows.

First tensor

[1] - don't worry about this

[10647] - the model predicts 3 bounding boxes per grid cell across 3 different grid patterns of 13 x 13, 26 x 26, and 52 x 52 cells. The model is therefore generating 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 bounding boxes for a given input image.

[4] - four elements of the bounding box vector: center x, center y, width, height. (0, 0) is the top left corner of the image.
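The box-count arithmetic above can be checked directly. A minimal sketch (the three grid sizes assume a 416 x 416 input image, as in standard YOLOv3):

```python
# Sketch of where 10647 comes from: 3 anchor boxes per grid cell,
# summed over the three YOLOv3 output grids (13x13, 26x26, 52x52).
grids = [13, 26, 52]
boxes_per_cell = 3
total_boxes = boxes_per_cell * sum(g * g for g in grids)
print(total_boxes)  # 10647
```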

Second tensor

[1] - don't worry about this

[10647] - one set of class predictions for each of the 10647 bounding boxes above, i.e. 3 boxes per grid cell across the 13 x 13, 26 x 26, and 52 x 52 grids, so 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 predictions for a given input image.

[80] - number of prediction classes per bounding box, in the case of the coco dataset this repo uses, there are 80 classes.

You can process the second tensor by thinking of it as an 851,760-element vector (10647 * 80). Each "stride" of 80 elements holds the class predictions for one bounding box, whose coordinates sit at the same index in the first tensor.
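In NumPy terms, that stride-of-80 view is just the last axis of the tensor. A minimal sketch, assuming the second output tensor has already been fetched into `class_probs` (random data stands in for real model output here):

```python
import numpy as np

# Stand-in for the second output tensor of shape [1, 10647, 80];
# in practice this would come from interpreter.get_tensor(...).
class_probs = np.random.rand(1, 10647, 80).astype(np.float32)

scores = class_probs[0]             # drop the batch dim -> (10647, 80)
best_class = scores.argmax(axis=1)  # best class index per box, shape (10647,)
best_score = scores.max(axis=1)     # confidence of that class, shape (10647,)
print(best_class.shape, best_score.shape)
```

Each `best_class[i]` / `best_score[i]` pair then belongs to the box coordinates at index `i` of the first tensor.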

To process this output into the familiar "boxes with coordinates and class predictions", one approach is score-threshold filtering followed by non-max suppression, which is what detect.py does in this repo. Score filtering discards class predictions below a score you define; non-max suppression then keeps only the highest-scoring box in a given geometric area, using IoU (Intersection over Union) as the overlap criterion. The resulting set of bounding boxes and class predictions (one class prediction per bounding box) is then the familiar "boxes with coordinates and class predictions" output.
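A minimal sketch of that two-step filtering, not the repo's actual detect.py implementation. It assumes boxes have already been converted from the model's (center x, center y, width, height) format to (x1, y1, x2, y2) corners, and the threshold values are illustrative:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Score filtering + greedy non-max suppression; returns kept box indices."""
    idx = np.where(scores > score_thresh)[0]          # drop low-score boxes
    order = idx[np.argsort(-scores[idx])]             # highest score first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        order = rest[overlaps <= iou_thresh]          # drop heavy overlaps
    return kept

# Toy example: boxes 0 and 1 overlap heavily, so only 0 (higher score) survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```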

See here for a helpful tutorial on how to think about this:
https://towardsdatascience.com/guide-to-car-detection-using-yolo-48caac8e4ded
