
Question about the design of Match-Net and the features fed in. #31

Open
xwjabc opened this issue Oct 9, 2019 · 6 comments

xwjabc commented Oct 9, 2019

  1. According to the paper, the feature extractor of the match net has 4 conv layers, one pooling layer, and one FC layer. Are these layers:
    -- Conv1: 3x3 conv - 256 channels -> ReLU
    -- Conv2: 3x3 conv - 256 channels -> ReLU
    -- Conv3: 3x3 conv - 1024 channels -> ReLU
    -- Conv4: 3x3 conv - 1024 channels -> ReLU
    -- Pooling: GlobalAvgPool
    -- FC: 1024 to 256 channels (No ReLU)
    Besides, the similarity learning net has:
    -- Subtraction (output 256 channels)
    -- Element-wise square (output 256 channels)
    -- FC: 256 to 1 channel (No ReLU)
    -- Sigmoid function
    Am I correct?

  2. The mask head has the following procedure:
    backbone -> RoI Pooling -> 4x conv (feature extractor) -> 1x deconv + 1x conv (predictor)
    So in the paper, for the experiments using mask features, the RoI features fed into the match net should be the features after RoI Pooling. Am I correct? Is there an individual RoI Pooling for the match net, or do we just re-use the RoI-pooled features from the mask head?

@geyuying
Collaborator

  1. You are correct. We just re-use the RoI-pooled features from the mask head, because after the second stage the features from RoI Align already contain mask information. We tried using features from other layers, but got worse performance.
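
The feature sharing described here can be sketched as follows. This is a minimal NumPy sketch with placeholder heads and assumed shapes (8 RoIs, 256 channels, 14x14 pooled resolution); the actual implementation is a Detectron/Caffe2 model, so everything below is illustrative only:

```python
import numpy as np

# Hypothetical RoI-pooled features from the second stage:
# 8 RoIs, 256 channels, 14x14 spatial resolution (shapes are assumptions).
rng = np.random.default_rng(0)
roi_feats = rng.standard_normal((8, 256, 14, 14))

def mask_head(x):
    # Placeholder for the 4x conv -> deconv -> conv mask predictor.
    return x.mean(axis=1)                # (8, 14, 14) toy "masks"

def match_net_extractor(x):
    # Placeholder for the 4x conv -> global average pool -> FC extractor.
    return x.mean(axis=(2, 3))           # (8, 256) toy embeddings

# Both heads consume the SAME RoI-pooled features; the match net does
# not run a second RoI Pooling of its own.
masks = mask_head(roi_feats)
embeddings = match_net_extractor(roi_feats)
```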

@geyuying
Collaborator

-- Conv1: 3x3 conv - 256 channels -> ReLU
-- Conv2: 3x3 conv - 256 channels -> ReLU
-- Conv3: 3x3 conv - 256 channels -> ReLU
-- Conv4: 3x3 conv - 1024 channels -> ReLU
-- Pooling: GlobalAvgPool
-- ReLU
-- FC: 1024 to 256 channels (No ReLU) + BN
Besides, the similarity learning net has:
-- Subtraction (output 256 channels)
-- Element-wise square (output 256 channels)
-- FC: 256 to 2 channels (No ReLU). The first channel means similarity, the second means difference; a positive pair is labeled (1, 0) and a negative pair (0, 1).
-- Softmax function
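
Put together, the similarity head above can be sketched in NumPy. Only the layer shapes and the subtraction -> square -> FC -> softmax order come from the description; the weights here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 2)) * 0.01   # FC: 256 -> 2, no ReLU
b = np.zeros(2)

def similarity_head(user_feat, shop_feat):
    """user_feat, shop_feat: (N, 256) embeddings from the feature extractor."""
    d = (user_feat - shop_feat) ** 2            # subtraction + element-wise square
    logits = d @ W + b                          # FC: 256 -> 2 channels
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # softmax: [:, 0] similarity, [:, 1] difference

probs = similarity_head(rng.standard_normal((4, 256)),
                        rng.standard_normal((4, 256)))
```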


xwjabc commented Oct 11, 2019

Thank you for your great help! Besides, I have two more questions:

  1. In the first version of your answer about the match network, I noticed that there are several tile operations:
INFO net.py: 263: self1 : (64, 256) => self_user : (8, 8, 256) ------- (op: Reshape)
INFO net.py: 263: self_user : (8, 8, 256) => self_user_ : (8, 8, 256) ------- (op: Transpose)
INFO net.py: 263: self_user_ : (8, 8, 256) => self_user_after : (64, 256) ------- (op: Reshape)
INFO net.py: 263: self_user_after : (64, 256) => self_user_after_ : (512, 256) ------- (op: Tile)
INFO net.py: 263: self2 : (64, 256) => self_shop_before : (64, 2048) ------- (op: Tile)
INFO net.py: 263: self_shop_before : (64, 2048) => self_shop : (512, 256) ------- (op: Reshape)

Could you explain the use of the tile function a bit?
Besides, I see the final output has shape (512, 2). However, according to the discussion, we should have 4096 pairs (512 positive pairs and 3584 negative pairs), which would lead to a shape of (4096, 2). I wonder what the reason for this gap is.
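
For reference, a plausible NumPy reading of those logged shapes (an interpretation, not confirmed in the thread; the transpose/reshape reordering on self1 is omitted) is that the tiles broadcast the two feature sets into row-aligned pairs:

```python
import numpy as np

# Shapes taken from the log: 64 RoIs, 256-d features, tile factor 8.
n, d, k = 64, 256, 8
user = np.arange(n * d, dtype=np.float32).reshape(n, d)  # stand-in for self1
shop = np.arange(n * d, dtype=np.float32).reshape(n, d)  # stand-in for self2

# self_user_after : (64, 256) => (512, 256): the whole user block repeated 8x.
user_tiled = np.tile(user, (k, 1))
# self2 : (64, 256) => (64, 2048) => (512, 256): after the reshape, each
# shop row appears 8x in consecutive rows.
shop_tiled = np.tile(shop, (1, k)).reshape(n * k, d)

# Row i now pairs user[i % 64] with shop[i // 8]; each row is one
# (user, shop) pair for the subtraction + square similarity head,
# giving the 512 pairs behind the (512, 2) output.
pairs = (user_tiled - shop_tiled) ** 2
```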

  2. In the evaluation of the retrieval, does Match R-CNN compare the user instance with all shop instances, or only with the shop instances that have the same predicted class as the user instance?

@geyuying
Collaborator

  1. 4096 is proper. In our experiment, in order to reduce the number of pairs, we do not use all pairs.
  2. We compare the user instance with all shop instances.


xwjabc commented Oct 14, 2019

Thank you for your great help! In my current implementation, I use the mask features after RoIAlign in the mask branch. However, the number of instances in the mask features is limited (1~2 instances per gt garment (unique pair_id + style) in total at the beginning of the training). Thus, I wonder how you can generate 8 instances per image for the retrieval task? Thx!


joppichristian commented Jan 14, 2020

  1. 4096 is proper. In our experiment, in order to reduce the number of pairs, we do not use all pairs.
  2. We compare the user instance with all shop instances.

How did you compare all the user instances with all shop instances? That means an enormous number of comparisons. I have 4x Titan RTX and tqdm estimates 6000 hours to complete the evaluation. Have I missed something?
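
One common way to keep exhaustive user-shop matching tractable (a generic NumPy sketch, not this repository's code) is to precompute all embeddings once, then run only the cheap subtraction/square/FC scoring over batched pairs instead of a full forward pass per pair. All sizes and weights below are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
user_emb = rng.standard_normal((100, 256)).astype(np.float32)  # precomputed queries
shop_emb = rng.standard_normal((500, 256)).astype(np.float32)  # precomputed gallery
W = (rng.standard_normal(256) * 0.01).astype(np.float32)       # toy scoring weights

scores = np.empty((len(user_emb), len(shop_emb)), dtype=np.float32)
batch = 32
for i in range(0, len(user_emb), batch):
    u = user_emb[i:i + batch]
    # Broadcast (B, 1, 256) against (1, 500, 256) to score all pairs at once.
    d = (u[:, None, :] - shop_emb[None, :, :]) ** 2
    scores[i:i + batch] = d @ W                                # (B, 500)

# scores[i, j] ranks shop j for user query i without per-pair forward passes.
```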
