Hey guys, fantastic work.

I have a question about the paper. You feed the output of ROIAlign into the matching network, but I'm having trouble understanding Figure 4. How is the input to the matching network for a single image an NxNx256 tensor? N is the number of garment classes, correct? The output of ROIAlign is either 7x7x256 or 14x14x256 (depending on whether you take the bbox stream or the mask stream). How are you getting NxN?
Thanks!
N is the spatial size of the feature map of an RoI, not the number of garment classes. Given an RoI, a fixed NxNxC feature map is extracted after ROIAlign to represent the features of that RoI, and this map is then fed to the matching network.
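For anyone else confused by the shapes, here is a minimal sketch (using torchvision's `roi_align`, not the authors' code) of how ROIAlign turns an arbitrarily sized RoI into a fixed NxNxC map; the `MatchingHead` below is purely illustrative, not the paper's exact architecture.

```python
# Sketch: ROIAlign pools every RoI to a fixed N x N grid, so each RoI becomes
# a C x N x N tensor regardless of its original size. Shapes are assumptions.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

# Feature map from the backbone/FPN: batch of 1, 256 channels, stride 16.
features = torch.randn(1, 256, 50, 50)

# One RoI in (batch_index, x1, y1, x2, y2) format, in input-image coordinates.
rois = torch.tensor([[0, 32.0, 48.0, 256.0, 320.0]])

# Here N = 14, so the RoI becomes a 256 x 14 x 14 tensor.
roi_feat = roi_align(features, rois, output_size=(14, 14), spatial_scale=1.0 / 16)
print(roi_feat.shape)  # torch.Size([1, 256, 14, 14])

# Toy matching head (an assumption, not the repo's implementation):
# compare two pooled RoI feature maps and predict match / no-match.
class MatchingHead(nn.Module):
    def __init__(self, channels=256, n=14):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * n * n, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # match / no-match logits
        )

    def forward(self, feat_a, feat_b):
        # Element-wise squared difference of the two N x N x C maps, then classify.
        return self.fc((feat_a - feat_b) ** 2)

head = MatchingHead()
logits = head(roi_feat, roi_feat.clone())
print(logits.shape)  # torch.Size([1, 2])
```

So the "N" in NxNx256 is just the pooled resolution (e.g. 7 or 14), and the 256 is the channel dimension C.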
Still curious about the RoI features fed into the matching net.
In the mask head (link), the procedure is:
backbone -> RoI Pooling -> 4x conv (feature extractor) -> 1x deconv + 1x conv (predictor)
So the RoI features fed into the matching net should be the features right after RoI Pooling. Am I correct?
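To make the tap point concrete, here is a rough PyTorch sketch of that mask-head pipeline; the layer sizes, the class count, and the comment marking where the pooled RoI features would be handed to the matching net are my assumptions, not the repo's actual code.

```python
# Sketch of a Mask R-CNN-style mask head:
#   backbone -> RoIAlign -> 4x conv -> 1x deconv + 1x1 conv.
# The question above is whether the matching net consumes the pooled features
# *before* the 4 convs; the comment in forward() marks that tensor.
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=13):  # 13 is illustrative
        super().__init__()
        # 4x conv feature extractor
        convs = []
        for _ in range(4):
            convs += [nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU()]
        self.convs = nn.Sequential(*convs)
        # 1x deconv + 1x1 conv predictor
        self.deconv = nn.ConvTranspose2d(in_channels, in_channels, 2, stride=2)
        self.predictor = nn.Conv2d(in_channels, num_classes, 1)

    def forward(self, pooled_roi_feats):
        # pooled_roi_feats: [num_rois, C, 14, 14], straight out of RoIAlign.
        # If the matching net takes "features after RoI Pooling", this is the
        # tensor that would be shared with it.
        x = self.convs(pooled_roi_feats)
        return self.predictor(torch.relu(self.deconv(x)))

masks = MaskHead()(torch.randn(3, 256, 14, 14))
print(masks.shape)  # torch.Size([3, 13, 28, 28])
```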