SipMask-VIS tracking implementation #50

ollefager · 2021-05-18T07:54:33Z

Hi,

In your paper you say that you match instances across frames similarly to MaskTrack R-CNN. Where in your code do you perform this matching? And where can I adjust parameters such as how many past frames to consider (beta in the MOTS paper) and the distance threshold (delta in the MOTS paper)?

JialeCao001 · 2021-05-18T10:11:19Z

@ollefager During inference, we only compare the previous frame with the current frame. Please refer to:

SipMask/SipMask-VIS/mmdet/models/anchor_heads/sipmask_head.py

Line 619 in bc63fa9

det_labels = det_bboxes[1]

ollefager · 2021-05-18T13:24:45Z

Thank you! Hmm, maybe I'm misunderstanding you. In my results I see that a specific track_id can disappear for a couple of frames and then appear again. This seems to indicate that tracks are kept for a number of frames.

JialeCao001 · 2021-05-18T13:29:32Z

@ollefager The code saves the most recent feature of each previous object in memory.

ollefager · 2021-05-18T13:48:24Z

Yes, so in each frame you also compare to the features of the objects in memory? If you only compared to the objects in the previous frame, how could an object that appeared earlier, but not in the previous frame, be matched in the current frame?

JialeCao001 · 2021-05-18T13:53:32Z

Yes. You are right. We only save one feature for each existing object.
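The behaviour described here (one stored feature per object, compared against every new detection) can be sketched roughly as follows. This is a simplified illustration, not the code in `sipmask_head.py`: the `TrackMemory` class, the `new_track_threshold` value, and the toy 2-D features are all assumptions made for the example, and the sketch ignores any other matching cues the repository may use.

```python
def dot(a, b):
    """Plain dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


class TrackMemory:
    """Hypothetical memory that keeps one feature per track id."""

    def __init__(self, new_track_threshold=0.5):
        self.features = []  # list index doubles as the track id
        self.threshold = new_track_threshold  # assumed value, for illustration only

    def assign(self, det_feature):
        # Score the detection against every stored object, not just last frame's.
        scores = [dot(det_feature, f) for f in self.features]
        if scores and max(scores) >= self.threshold:
            track_id = scores.index(max(scores))
            self.features[track_id] = det_feature  # keep only the latest feature
        else:
            track_id = len(self.features)  # no good match: start a new track
            self.features.append(det_feature)
        return track_id  # in this sketch, entries are never removed


memory = TrackMemory()
frames = [
    [[1.0, 0.0], [0.0, 1.0]],  # frame 1: objects A and B
    [[0.9, 0.1]],              # frame 2: only A is detected
    [[1.0, 0.0], [0.1, 0.9]],  # frame 3: B reappears
]
ids_per_frame = [[memory.assign(f) for f in frame] for frame in frames]
print(ids_per_frame)
```

Because B's feature stays in memory while B is missing in frame 2, it gets its original id back in frame 3, which is consistent with a track_id disappearing and reappearing.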

ollefager · 2021-05-18T14:03:38Z

Okay. So do you ever remove an object from memory? Like if it hasn't appeared for a number of frames?

ollefager · 2021-05-18T16:47:43Z

If you don't mind, I have another question. When comparing feature vectors you seem to compute their dot product, in this line:

SipMask/SipMask-VIS/mmdet/models/anchor_heads/sipmask_head.py

Line 630 in bc63fa9

prod = torch.mm(det_roi_feats, torch.transpose(self.prev_roi_feats, 0, 1))

I don't really see how this measures their similarity, however: a feature vector with large values would score higher than a more similar feature vector with lower values. Could you explain?
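The concern can be illustrated with a toy example. The vectors below are made up and do not come from the model; the point is only that an unnormalised dot product can rank a large-magnitude vector above one that points in exactly the same direction as the query, while cosine similarity (shown for comparison, not as what the repository does) ranks them the other way.

```python
import math


def dot(a, b):
    """Unnormalised dot product."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by the product of the vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
same_direction_small = [1.0, 0.0]   # identical to the query
other_direction_large = [5.0, 5.0]  # 45 degrees off, but much larger norm

dot_scores = [dot(query, same_direction_small), dot(query, other_direction_large)]
cos_scores = [cosine(query, same_direction_small), cosine(query, other_direction_large)]

print("dot:   ", dot_scores)  # the large vector wins
print("cosine:", cos_scores)  # the identical vector wins
```

Whether this matters in practice depends on how the features are trained; if the embedding is learned with a loss defined on these dot products, the magnitudes are part of what the network learns.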

JialeCao001 · 2021-05-19T01:31:29Z

The dot product is used to compute the feature similarity. L638-L643 select the best match.

ollefager · 2021-05-19T07:02:52Z

Okay, thank you very much for your time!

ollefager closed this as completed May 19, 2021
