
PyramidROIAlign shape question #919

Closed
happynewya opened this issue Sep 10, 2018 · 5 comments

Comments

@happynewya
I have a question about the PyramidROIAlign class in model.py. The first dimension of pooled should be the batch dimension, but I don't get the expected result.
I constructed these test inputs:

# Imports assumed for this snippet (PyramidROIAlign and compose_image_meta
# come from this repo's mrcnn/model.py)
import numpy as np
from keras import backend as K
from mrcnn.model import PyramidROIAlign, compose_image_meta

boxes = np.array([
    [
        [0.1, 0.1, 0.3, 0.5],
        [0.1, 0.2, 0.65, 0.9],
        [0.12, 0.32, 0.5, 0.4]
    ],
    [
        [0.12, 0.12, 0.3,   0.53],
        [0.12, 0.23, 0.365, 0.49],
        [0.1,  0.6,  0.4, 0.9]
    ]
], dtype=np.float32)
image_meta = compose_image_meta(123, [512,512,3], [1024,1024,3], (0,0,1023,1023), 2, [1,2,3])
image_meta = np.stack((image_meta,image_meta), axis=0)
P2 = np.random.rand(32, 32, 64)
P2 = np.stack((P2, P2))
P2 = K.variable(P2)
P3 = P2
P4 = P2
P5 = P2
feature_maps = [P2, P3, P4, P5]
inputs = [K.variable(boxes), K.variable(image_meta)] + feature_maps

and fed them into a PyramidROIAlign layer:

prl = PyramidROIAlign([7, 7], name="roi_align_classifier")
x = prl(inputs)

I checked the output shape and found a mismatch between x.shape and compute_output_shape():

x.shape
>>>(1, 6, 7, 7, 64)
prl.compute_output_shape(prl.input_shape)
>>>(2, 3, 7, 7, 64)

I reviewed the code inside PyramidROIAlign and found the lines below. They don't actually "re-add" the batch dimension information to pooled:

# Re-add the batch dimension
pooled = tf.expand_dims(pooled, 0)

Should this code be changed to something like the line below?

pooled = tf.reshape(pooled, (batchsize, boxesnum_per_img) + pooled.shape[2:])
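
For reference, one way to express that reshape with dynamic shapes inside PyramidROIAlign.call() is sketched below. This is only an illustration (the variable names follow the layer's code; it is not necessarily the exact change that was merged):

# Sketch: re-add the batch dimension with a dynamic reshape instead of
# expand_dims. At this point in call(), `boxes` is [batch, num_boxes, 4]
# and `pooled` is [batch * num_boxes, pool_h, pool_w, channels].
shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
pooled = tf.reshape(pooled, shape)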
@happynewya
Author

I think this is a bug, but due to the order of the bbox loss computation it won't cause an exception or a miscalculation. To be confirmed after reading the code further.

@hnyz979

hnyz979 commented Sep 10, 2018

It is very strange that x is a 5-dimensional tensor. It seems to be caused by the KL.TimeDistributed wrapper in the Keras engine. I am not familiar with Keras, so I rewrote these parts in TensorFlow; there, the first two dimensions are merged.

@hejujie

hejujie commented Sep 11, 2018

I used this function in TensorFlow, but I got an output with shape (?, 7, 7, 64). I replaced the expand_dims with a reshape, pooled = tf.reshape(pooled, [boxes.shape[0], boxes.shape[1]] + list(pooled.shape[1:])), and got an output with shape (2, 200, 7, 7, 64), where 2 is the batch size and 200 is the number of ROIs.

@waleedka
Collaborator

@zhydong I think you're right. Good catch! Instead of returning a tensor of shape [batch, boxes, H, W, C], PyramidROIAlign is returning [1, batch*boxes, H, W, C].

However, the output of this layer is immediately fed to a TimeDistributed layer, which in effect merges the first two dimensions (batch & boxes), applies a convolution, and then separates the batch and boxes dimensions again. It separates the dimensions correctly, which masks the bug from the earlier line.

So, as far as I can tell, yes this is a bug, but it's not causing any issues in training or detection. I think it's better to fix it, though. If you submit a PR I'd be happy to merge it. Alternatively, I'll try to make time to fix and test it.
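
To illustrate the TimeDistributed behavior mentioned above, here is a minimal standalone sketch (not code from this repository) showing that a TimeDistributed convolution is applied per box slice while the two leading dimensions are preserved:

# Minimal sketch (not repository code): TimeDistributed folds the
# (batch, num_boxes) axes together, applies the wrapped layer to each
# slice, and restores the two leading dimensions on the way out.
import numpy as np
from keras import layers as KL
from keras import models as KM

rois = KL.Input(shape=(3, 7, 7, 64))                 # (batch, num_boxes, H, W, C)
x = KL.TimeDistributed(KL.Conv2D(8, (3, 3)))(rois)   # conv applied per box
model = KM.Model(rois, x)

out = model.predict(np.random.rand(2, 3, 7, 7, 64).astype(np.float32))
print(out.shape)  # (2, 3, 5, 5, 8): batch and box dimensions survive intact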

waleedka added a commit that referenced this issue Sep 28, 2018
@waleedka
Collaborator

I pushed a fix for this, so I'm closing this issue.

LexLuc added a commit to LexLuc/Mask_RCNN that referenced this issue Sep 28, 2018
* Small typo fix

* loss weights

* Fix multi-GPU training.

A previous fix to let validation run across more
than one batch caused an issue with multi-GPU
training. The issue seems to be in how Keras
averages loss and metric values, where it expects
them to be scalars rather than arrays. This fix
causes scalar outputs from a model to remain
scalar in multi-GPU training.

* Replace keep_dims with keepdims in TF calls.

TF replaced keep_dims with keepdims a while ago
and now shows a warning when using the old name.

* Headline typo fix in README.md

Fixed the typo in the headline of the README.md file. "Spash" should be "Splash"

* Splash sample: fix filename and link to blog post

* Update utils.py

* Minor cleanup in compute_overlaps_masks()

* Fix: color_splash when no masks are detected

Reported here: matterport#500

* fix typo

fix typo

* fix "No such file or directory" if not use: "keras.callbacks.TensorBoard"

* Allow dashes in model name.
Print a message when re-starting from saved epoch

* Fix problem with argmax on (0,0) arrays.

Fix matterport#170

* Allow configuration of FPN layers size and top-down pyramid size

* Allow custom backbone implementation through Config.BACKBONE

This allows one to set a callable in Config.BACKBONE to use a custom
backbone model.

* modified comment for image augmentation line import to include correct 'pip3 install imgaug' instructions

* Raise clear error if last training weights are not found

If using --weights=last (or --model=last) to resume training but the weights are not found, it now raises a clear error message.

* Fix Keras engine topology to saving

* Fix load_weights() for Keras versions before 2.2

Improve previous commit to not break on older versions of Keras.

* Update README.md

* Add custom callbacks to model training

Add an optional parameter that accepts a list of keras.callbacks to be added to the original list.

* Add no augmentation sources

Add the possibility to exclude some sources from augmentation by passing a list of sources. This is useful when you want to retrain a model having few images.

* Improve previous commit to avoid mutable default arguments

* Updated Coco Example

* edit loss desc

* spellcheck config.py

* doublecheck on config.py

* spellcheck utils.py

* spellcheck visualize.py

* Links to two more projects in README

* Add Bibtex to README

* make pre_nms_limit configurable

* Make pre_nms_limit configurable

* Made compatible to new version of VIA JSON format

VIA changed its JSON formatting in later versions. Now, instead of a dictionary, "regions" is a list; see issue matterport#928.

* Comments to explain VIA 2.0 JSON change

* Fix the comment on output shape in RPN

* Bugfix for MaskRCNN creating empty log_dir that breaks find_last()
- Current implementation creates self.log_dir in set_log_dir() function,
  which creates an empty log directory if none exists. This causes
  find_last() to fail after creating a model because it finds this new
  empty directory instead of the previous training directory.
- New implementation moves log_dir creation to the train() function to
  ensure it is only created when it will be used.

* Added automated epoch recognition for Windows. (matterport#798)

Unified the regex expression for both Linux and Windows.

* Fixed tabbing issue in previous commit

* bug fix: the output_shape of roi_gt_class_ids is incorrect

* Bug fix: inspect_balloon_model.ipynb

Fix bug where boxes are not shown in 1.b RPN Predictions.
TF 1.9 introduces "ROI/rpn_non_max_suppression/NonMaxSuppressionV3:0", so the original code no longer works.

* Apply previous commit to the other notebooks

* Fixed comment on GPU_COUNT (matterport#878)

Fixed comment on GPU_COUNT

* add IMAGE_CHANNEL_COUNT class variable to config to make it easier to use Mask_RCNN for non 3-channel images

* Additional comments for the previous commit

* Link to new projects in README

* Tiny correction in README.

* Adjust PyramidROIAlign layer shape comment

For PyramidROIAlign's output shape, use pool_height and pool_width instead of height and width to avoid confusion with those of feature_maps.

* fix output shape of fpn_classifier_graph

1. fix the comment on output shape in fpn_classifier_graph
2. unify NUM_CLASSES and num_classes to NUM_CLASSES
3. unify boxes, num_boxes, num_rois, roi_count to num_rois
4. use the more specific POOL_SIZE and MASK_POOL_SIZE to replace pool_height and pool_width

* Fix PyramidROIAlign output shape

As discussed in: matterport#919

* Fix comments in Detection Layer

1. fix description on window
2. fix output shape of detection layer

* use smooth_l1_loss() to reduce code duplication

* A wrapper for skimage resize() to avoid warnings

skimage generates different warnings depending on the version. This wrapper function calls skimage.transform.resize() with the right parameters for each version.

* Remove unused method: append_data()
Cpruce pushed a commit to Cpruce/Mask_RCNN that referenced this issue Jan 17, 2019
aneeshchauhan pushed a commit to aneeshchauhan/Mask_RCNN that referenced this issue Jul 9, 2019
withyou53 pushed a commit to withyou53/mask_r-cnn_for_object_detection that referenced this issue Sep 27, 2023