It seems like I am getting 2 output lists with shape [1, 22743, 4] and [1, 22743, 80]
How do I get bounding boxes and labels from this output?
Also, is this the proper output shape? I was expecting 4 outputs: bboxes, labels, scores, and number of bboxes.
I am not sure where the 22743 is coming from. In my case, the model outputs two tensors of shape [1, 10647, 4] and [1, 10647, 80]. My understanding of these two tensors is:
First tensor
[1] - the batch dimension (one image per inference); you can ignore it
[10647] - the model predicts 3 bounding boxes per grid cell on 3 different grids of 13 x 13, 26 x 26, and 52 x 52 cells (the grids for a 416 x 416 input). The model therefore generates 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 bounding boxes for a given input image.
[4] - four elements of the bounding box vector: center x, center y, width, height. (0, 0) is the top left corner of the image.
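The box count in the [10647] entry can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the usual YOLOv3 downsampling strides of 32, 16, and 8 and 3 boxes per grid cell, and `num_predictions` is a hypothetical helper name, not something from this repo. Under those assumptions, the 22743 from the question matches a 608 x 608 input (grids of 19 x 19, 38 x 38, and 76 x 76).

```python
def num_predictions(input_size, strides=(32, 16, 8), boxes_per_cell=3):
    """Count predicted boxes for a square input of side `input_size`.

    Assumes one grid per stride, each with (input_size // stride)^2 cells,
    and `boxes_per_cell` boxes predicted in every cell.
    """
    cells = sum((input_size // s) ** 2 for s in strides)
    return boxes_per_cell * cells


print(num_predictions(416))  # 3 * (13*13 + 26*26 + 52*52) = 10647
print(num_predictions(608))  # 3 * (19*19 + 38*38 + 76*76) = 22743
```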
Second tensor
[1] - the batch dimension (one image per inference); you can ignore it
[10647] - one set of class predictions for each of the bounding boxes in the first tensor, again 3 boxes per grid cell on grids of 13 x 13, 26 x 26, and 52 x 52. That gives 3 * (13 * 13 + 26 * 26 + 52 * 52) = 10647 predictions for a given input image.
[80] - number of class scores per bounding box. For the COCO dataset this repo uses, there are 80 classes.
You can process the second tensor by thinking about it as an 851,760-element vector (10647 * 80). Each "stride" of 80 elements holds the class predictions for one bounding box, whose coordinates sit at the same index in the first tensor.
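To make the "stride" view concrete, here is a small NumPy sketch. The array is random stand-in data with the same shape as the second tensor, not real model output, and the variable names are made up for illustration. It shows that slicing 80 elements out of the flat vector is the same as indexing the box directly, and how to pull the best class and its score for every box.

```python
import numpy as np

num_boxes, num_classes = 10647, 80

# Stand-in for the second output tensor (random values, not real predictions).
rng = np.random.default_rng(0)
class_scores = rng.random((1, num_boxes, num_classes), dtype=np.float32)

# Flat view: 10647 * 80 = 851,760 elements.
flat = class_scores.reshape(-1)

# The 80 scores for box i are one contiguous "stride" of the flat vector.
i = 5
stride_i = flat[i * num_classes:(i + 1) * num_classes]
assert np.array_equal(stride_i, class_scores[0, i])

# Best class and its score for every box, dropping the batch dimension.
class_ids = class_scores[0].argmax(axis=-1)    # shape (10647,)
best_scores = class_scores[0].max(axis=-1)     # shape (10647,)
```

In practice it is usually easier to keep the [10647, 80] shape and index by box than to work with the flat vector.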
To process this output into the familiar "boxes with coordinates and class predictions", one approach is score threshold filtering followed by non-max suppression (NMS). This is what detect.py does in this repo. Thresholding gets rid of class predictions below a score that you define. NMS then keeps the box with the highest score in a given geometric area and drops the boxes that overlap it too much, where overlap is measured by IoU (Intersection over Union, the area shared by two boxes divided by the area of their union). The resulting set of bounding boxes and class predictions (one class prediction per bounding box) is then the familiar "boxes with coordinates and class predictions" output.
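The thresholding and NMS steps described above can be sketched in plain NumPy as follows. This is not the code in detect.py, just a minimal illustration under some assumptions: boxes are (center x, center y, width, height) with the batch dimension already removed, each box keeps only its best class, suppression is done per class, and the threshold values 0.25 and 0.45 are arbitrary placeholders. All function names are hypothetical.

```python
import numpy as np


def xywh_to_corners(boxes):
    """Convert (center x, center y, width, height) rows to (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


def iou_one_to_many(box, others):
    """IoU between one corner-format box and an (M, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter + 1e-9)


def postprocess(boxes_xywh, class_scores, score_thresh=0.25, iou_thresh=0.45):
    """boxes_xywh: (N, 4), class_scores: (N, C). Returns boxes, scores, class ids."""
    # Keep only the best class per box, then drop low-scoring boxes.
    class_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)
    keep = scores >= score_thresh
    boxes = xywh_to_corners(boxes_xywh[keep])
    scores, class_ids = scores[keep], class_ids[keep]

    # Greedy NMS: take the highest-scoring box, then discard same-class
    # boxes that overlap it by more than iou_thresh, and repeat.
    order = scores.argsort()[::-1]
    selected = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        selected.append(best)
        overlap = iou_one_to_many(boxes[best], boxes[rest])
        same_class = class_ids[rest] == class_ids[best]
        order = rest[~(same_class & (overlap > iou_thresh))]
    selected = np.array(selected, dtype=int)
    return boxes[selected], scores[selected], class_ids[selected]


# Toy input: two heavily overlapping boxes of class 0 and one box of class 5.
toy_boxes = np.array([[50, 50, 20, 20], [52, 50, 20, 20], [150, 150, 30, 30]], dtype=float)
toy_scores = np.zeros((3, 80))
toy_scores[0, 0], toy_scores[1, 0], toy_scores[2, 5] = 0.9, 0.8, 0.7

kept_boxes, kept_scores, kept_ids = postprocess(toy_boxes, toy_scores)
print(kept_ids)  # [0 5]: the weaker of the two overlapping class-0 boxes is suppressed
```

With real model output you would pass the first tensor with the batch dimension dropped as `boxes_xywh` and the second as `class_scores`. The number of rows returned plays the role of the "number of bbox" output the question expected.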
After converting yolov3.weights into tflite format, the output_details look like this:
[{'name': 'Identity', 'index': 0, 'shape': array([ 1, 22743, 4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}, {'name': 'Identity_1', 'index': 1, 'shape': array([ 1, 22743, 80], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]