Multiple objects when training #128
Comments
At inference time, we process each object independently as a binary mask and merge the results at the end. The one exception is that the sum of all "other" object masks is also fed in as a separate input channel to suppress the response in other objects' areas. This implementation detail follows STM, and I think it generally helps. At training time, we pick at most two random objects for a single video snippet, which allows that extra channel to be learned. While this could technically be extended to any number of objects at training time, the current code is unfortunately written in a way that makes this extension non-trivial (and I take the blame for that). STM uses three objects at training time, while we use two, mainly because of computational and memory constraints. I do think the benefit of going from 1->2 objects in training is much larger than that of going from 2->3, i.e., there are diminishing returns.
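To make the "other objects" channel described above concrete, here is a minimal NumPy sketch. It is not taken from the STCN code base: the function name `others_channel` and the `(K, H, W)` mask layout are assumptions chosen for illustration only.

```python
import numpy as np


def others_channel(masks: np.ndarray) -> np.ndarray:
    """Hypothetical helper: masks has shape (K, H, W), one mask per object.

    Returns an array of the same shape where slice k is the sum of all
    masks except mask k, i.e. the "other objects" channel for object k.
    """
    total = masks.sum(axis=0, keepdims=True)  # (1, H, W): sum over all objects
    return total - masks                      # subtract each object's own mask


# Three toy objects on a 2x2 frame, each covering one pixel.
masks = np.zeros((3, 2, 2), dtype=np.float32)
masks[0, 0, 0] = 1.0
masks[1, 0, 1] = 1.0
masks[2, 1, 0] = 1.0

others = others_channel(masks)
# others[0] marks the pixels owned by objects 1 and 2, and so on.
```

With a single object the extra channel is all zeros, which would be consistent with the point above that at least two objects are needed in training for that channel to be learned.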
Thank you very much, I understand it more clearly now.
@kaylode Hi, I'm particularly interested in your question. What's the difference between training on masks with 2 objects and training with a single mask for each object?
@hkchengrex Hi, what does going from 1->2 objects in training mean? Do you mean that, given enough memory, training STCN with three targets will perform better?
Looking forward to your reply!
@hkchengrex You mean it's better to use a target when training?
@longmalongma What I did was try splitting my multiclass masks into binary masks (a single object plus background) and letting the model learn from those. The results were worse than when sampling TWO objects. In my opinion, using two classes at a time improves the model's ability to distinguish between different objects; it acts like an extra constraint for the model to learn.
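The two setups compared above (one binary mask per object versus sampling up to two objects per snippet) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the STCN data loader: `sample_object_masks` is a hypothetical helper, and it assumes an integer label map in which 0 is background and each other value is one object ID.

```python
import numpy as np


def sample_object_masks(label_map, max_objects=2, rng=None):
    """Hypothetical helper: pick up to `max_objects` object IDs at random
    from an integer label map (0 assumed to be background) and return the
    chosen IDs with one binary mask per chosen object."""
    rng = np.random.default_rng() if rng is None else rng
    ids = np.unique(label_map)
    ids = ids[ids != 0]  # drop the assumed background label
    if len(ids) == 0:
        return [], np.zeros((0,) + label_map.shape, dtype=np.float32)
    k = min(max_objects, len(ids))
    chosen = rng.choice(ids, size=k, replace=False)  # distinct object IDs
    masks = np.stack([label_map == i for i in chosen]).astype(np.float32)
    return chosen.tolist(), masks


label_map = np.array([[0, 1, 1],
                      [2, 2, 3]])

# Single-object setup: one binary mask (object vs. background) per sample.
one_id, one_mask = sample_object_masks(label_map, max_objects=1,
                                       rng=np.random.default_rng(0))

# Two-object setup: two masks per sample, so each object has an "other" object.
two_ids, two_masks = sample_object_masks(label_map, max_objects=2,
                                         rng=np.random.default_rng(0))
```

In the single-object setup there is never a second mask in the sample, so the model sees no competing object; in the two-object setup each sampled object always comes with one.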
@kaylode It's in the appendix -- F.3.2.
@kaylode Ok, thank you very much!
Hello @hkchengrex, first of all, thanks for your dedicated projects; they really inspire me a lot. This is not a bug report, just a small question.
I have been running and modifying most of your code base to adapt it to my problem, and it works acceptably. However, there is one part of the code that is still unclear to me:
As I understand it (from the code and the paper), STCN trains as a binary segmentation task, but I also notice that when training on VOS, the code uses a second, additional class as well (chosen at random from the list of classes, if I understand correctly). May I ask what the purpose of this is? I cannot find where it is mentioned in the paper. Is it possible to use more classes? I have tried training on masks with 2 objects on a different dataset, and the result seems far better than when training with a single mask for each object.
Thanks in advance