
PyramidROIAlign shape question #919

Closed
happynewya opened this issue Sep 10, 2018 · 5 comments

Comments

@happynewya
I have a question about the PyramidROIAlign class in model.py. The first dimension of pooled should be the batch dimension, but I don't get the expected result.
I constructed these test inputs:

# Imports assumed for this snippet (PyramidROIAlign and compose_image_meta
# come from this repo's mrcnn/model.py)
import numpy as np
from keras import backend as K
from mrcnn.model import PyramidROIAlign, compose_image_meta

boxes = np.array([
    [
        [0.1, 0.1, 0.3, 0.5],
        [0.1, 0.2, 0.65, 0.9],
        [0.12, 0.32, 0.5, 0.4]
    ],
    [
        [0.12, 0.12, 0.3,   0.53],
        [0.12, 0.23, 0.365, 0.49],
        [0.1,  0.6,  0.4, 0.9]
    ]
], dtype=np.float32)
image_meta = compose_image_meta(123, [512,512,3], [1024,1024,3], (0,0,1023,1023), 2, [1,2,3])
image_meta = np.stack((image_meta,image_meta), axis=0)
P2 = np.random.rand(32, 32, 64)
P2 = np.stack((P2, P2))
P2 = K.variable(P2)
P3 = P2
P4 = P2
P5 = P2
feature_maps = [P2, P3, P4, P5]
inputs = [K.variable(boxes), K.variable(image_meta)] + feature_maps

and fed them into a PyramidROIAlign layer:

prl = PyramidROIAlign([7, 7], name="roi_align_classifier")
x = prl(inputs)

I checked the output shape and found a mismatch between x.shape and compute_output_shape():

x.shape
>>>(1, 6, 7, 7, 64)
prl.compute_output_shape(prl.input_shape)
>>>(2, 3, 7, 7, 64)

I reviewed the code inside PyramidROIAlign and found the lines below. They don't actually "re-add" the batch dimension information to pooled:

# Re-add the batch dimension
pooled = tf.expand_dims(pooled, 0)

Should this code be changed to something like the line below?

pooled = tf.reshape(pooled, (batchsize, boxesnum_per_img) + pooled.shape[2:])
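
For reference, one way to express that reshape with dynamic shapes inside PyramidROIAlign.call() is sketched below. This is only an illustration (the variable names follow the layer's code; it is not necessarily the exact change that was merged):

# Sketch: re-add the batch dimension with a dynamic reshape instead of
# expand_dims. At this point in call(), `boxes` is [batch, num_boxes, 4]
# and `pooled` is [batch * num_boxes, pool_h, pool_w, channels].
shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
pooled = tf.reshape(pooled, shape)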
@happynewya
Author

I think this is a bug, but due to the order of the bbox loss computation it won't cause an exception or a miscalculation. To be confirmed after reading the code further.

@hnyz979

hnyz979 commented Sep 10, 2018

It is very strange that x is a 5-dimensional tensor. It seems to be caused by the KL.TimeDistributed wrapper in the Keras engine. I am not familiar with Keras, so I rewrote these parts in TensorFlow; there, the first two dimensions are merged.

@hejujie

hejujie commented Sep 11, 2018

I used this function in TensorFlow, but I got an output with shape (?, 7, 7, 64). I replaced the expand_dims with a reshape, pooled = tf.reshape(pooled, [boxes.shape[0], boxes.shape[1]] + list(pooled.shape[1:])), and got an output with shape (2, 200, 7, 7, 64), where 2 is the batch size and 200 is the number of ROIs.

@waleedka
Collaborator

@zhydong I think you're right. Good catch! Instead of returning a tensor of shape [batch, boxes, H, W, C], PyramidROIAlign is returning [1, batch*boxes, H, W, C].

However, the output of this layer is immediately fed to a TimeDistributed layer, which in effect merges the first two dimensions (batch & boxes), applies a convolution, and then separates the batch and boxes dimensions again. It separates the dimensions correctly, which masks the bug from the earlier line.

So, as far as I can tell, yes this is a bug, but it's not causing any issues in training or detection. I think it's better to fix it, though. If you submit a PR I'd be happy to merge it. Alternatively, I'll try to make time to fix and test it.
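
To illustrate the TimeDistributed behavior mentioned above, here is a minimal standalone sketch (not code from this repository) showing that a TimeDistributed convolution is applied per box slice while the two leading dimensions are preserved:

# Minimal sketch (not repository code): TimeDistributed folds the
# (batch, num_boxes) axes together, applies the wrapped layer to each
# slice, and restores the two leading dimensions on the way out.
import numpy as np
from keras import layers as KL
from keras import models as KM

rois = KL.Input(shape=(3, 7, 7, 64))                 # (batch, num_boxes, H, W, C)
x = KL.TimeDistributed(KL.Conv2D(8, (3, 3)))(rois)   # conv applied per box
model = KM.Model(rois, x)

out = model.predict(np.random.rand(2, 3, 7, 7, 64).astype(np.float32))
print(out.shape)  # (2, 3, 5, 5, 8): batch and box dimensions survive intact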

waleedka added a commit that referenced this issue Sep 28, 2018
@waleedka
Collaborator

I pushed a fix for this, so I'm closing this issue.

LexLuc added a commit to LexLuc/Mask_RCNN that referenced this issue Sep 28, 2018
* Small typo fix

* loss weights

* Fix multi-GPU training.

A previous fix to let validation run across more
than one batch caused an issue with multi-GPU
training. The issue seems to be in how Keras
averages loss and metric values, where it expects
them to be scalars rather than arrays. This fix
causes scalar outputs from a model to remain
scalar in multi-GPU training.

* Replace keep_dims with keepdims in TF calls.

TF replaced keep_dims with keepdims a while ago
and now shows a warning when using the old name.

* Headline typo fix in README.md

Fixed the typo in the headline of the README.md file. "Spash" should be "Splash"

* Splash sample: fix filename and link to blog post

* Update utils.py

* Minor cleanup in compute_overlaps_masks()

* Fix: color_splash when no masks are detected

Reported here: matterport#500

* fix typo

fix typo

* fix "No such file or directory" if not use: "keras.callbacks.TensorBoard"

* Allow dashes in model name.
Print a message when re-starting from saved epoch

* Fix problem with argmax on (0,0) arrays.

Fix matterport#170

* Allow configuration of FPN layers size and top-down pyramid size

* Allow custom backbone implementation through Config.BACKBONE

This allows one to set a callable in Config.BACKBONE to use a custom
backbone model.

* modified comment for image augmentation line import to include correct 'pip3 install imgaug' instructions

* Raise clear error if last training weights are not found

If using --weights=last (or --model=last) to resume training but the weights are not found, it now raises a clear error message.

* Fix Keras engine topology to saving

* Fix load_weights() for Keras versions before 2.2

Improve previous commit to not break on older versions of Keras.

* Update README.md

* Add custom callbacks to model training

Add an optional parameter that accepts a list of keras.callbacks to be added to the original list.

* Add no augmentation sources

Add the possibility to exclude some sources from augmentation by passing a list of sources. This is useful when you want to retrain a model having few images.

* Improve previous commit to avoid mutable default arguments

* Updated Coco Example

* edit loss desc

* spellcheck config.py

* doublecheck on config.py

* spellcheck utils.py

* spellcheck visualize.py

* Links to two more projects in README

* Add Bibtex to README

* make pre_nms_limit configurable

* Make pre_nms_limit configurable

* Made compatible to new version of VIA JSON format

VIA changed its JSON formatting in later versions. Now, instead of a dictionary, "regions" is a list; see issue matterport#928.

* Comments to explain VIA 2.0 JSON change

* Fix the comment on output shape in RPN

* Bugfix for MaskRCNN creating empty log_dir that breaks find_last()
- Current implementation creates self.log_dir in set_log_dir() function,
  which creates an empty log directory if none exists. This causes
  find_last() to fail after creating a model because it finds this new
  empty directory instead of the previous training directory.
- New implementation moves log_dir creation to the train() function to
  ensure it is only created when it will be used.

* Added automated epoch recognition for Windows. (matterport#798)

Unified the regex expression for both Linux and Windows.

* Fixed tabbing issue in previous commit

* bug fix: the output_shape of roi_gt_class_ids is incorrect

* Bug fix: inspect_balloon_model.ipynb

Fix bug where boxes are not shown in 1.b RPN Predictions.
TF 1.9 introduces "ROI/rpn_non_max_suppression/NonMaxSuppressionV3:0", so the original code no longer works.

* Apply previous commit to the other notebooks

* Fixed comment on GPU_COUNT (matterport#878)

Fixed comment on GPU_COUNT

* add IMAGE_CHANNEL_COUNT class variable to config to make it easier to use Mask_RCNN for non 3-channel images

* Additional comments for the previous commit

* Link to new projects in README

* Tiny correction in README.

* Adjust PyramidROIAlign layer shape comment

For PyramidROIAlign's output shape, use pool_height and pool_width instead of height and width to avoid confusion with those of feature_maps.

* fix output shape of fpn_classifier_graph

1. fix the comment on output shape in fpn_classifier_graph
2. unify NUM_CLASSES and num_classes to NUM_CLASSES
3. unify boxes, num_boxes, num_rois, roi_count to num_rois
4. use the more specific POOL_SIZE and MASK_POOL_SIZE to replace pool_height and pool_width

* Fix PyramidROIAlign output shape

As discussed in: matterport#919

* Fix comments in Detection Layer

1. fix description on window
2. fix output shape of detection layer

* use smooth_l1_loss() to reduce code duplication

* A wrapper for skimage resize() to avoid warnings

skimage generates different warnings depending on the version. This wrapper function calls skimage.transform.resize() with the right parameters for each version.

* Remove unused method: append_data()
Cpruce pushed a commit to Cpruce/Mask_RCNN that referenced this issue Jan 17, 2019
aneeshchauhan pushed a commit to aneeshchauhan/Mask_RCNN that referenced this issue Jul 9, 2019
withyou53 pushed a commit to withyou53/mask_r-cnn_for_object_detection that referenced this issue Sep 27, 2023