
Tracking #6

Closed
rockbottom12 opened this issue Mar 17, 2018 · 1 comment


rockbottom12 commented Mar 17, 2018

1) I am unable to understand how you are tracking an object. I can't get the tracking part.
2) Please describe the WeightedBCECriterion.lua file, as I am unable to understand what its function is.


DjuLee commented Mar 18, 2018

The AAAI paper should help answer both of your questions, but here are some hopefully clarifying points:

  1. There is no explicit tracking in the traditional sense (where you track an object via its bounding-box coordinates). There are no white-circle coordinates to track; the network simply does pixel-wise occupancy prediction (occupancy is rendered in white). The tracking consists of the following: at training time the network learns to capture patterns of occupied cells over several frames (i.e. the dynamics of occupancy), and it can then predict pixel-wise occupancy into the future/occlusion from those learned patterns. Again, there is no explicit notion of objects, but the network does capture patterns of pixels moving together, so even though it performs pixel-wise prediction (rather than object prediction in the form of bounding boxes), it produces coherent occupancy grids, because that is the easiest way for it to make sense of the visible input. A toy sketch of this idea follows after point 2.

  2. We wish to predict pixel-wise occupancy, which is either 0 or 1. So it's a binary output, and a natural distribution for it is the Bernoulli distribution. If we had ground truth for the entire output occupancy, we would use the Binary Cross Entropy (BCE) loss as provided by Torch, calculated for every output pixel against the ground-truth target occupancy. However, we do not have ground truth for the entire output, since we only have partial observations of the scene. We do not know what happens inside natural occlusions, so we cannot calculate a loss on pixels that are occluded.

     In order not to penalise the network for its predictions on occluded cells/pixels, we ignore those cells when calculating the BCE loss. We do this by masking the output prediction with the visibility grid: the visibility grid assigns 1 to pixels that are visible (either occupied or free) and 0 to those that are occluded. In doing so, we only consider the loss from cells that are visible, which is why the file is named WeightedBCE.

     If you look at line 27, that is the BCE loss, where input corresponds to the network prediction and target is the visible part of the ground truth. The masking of the prediction with the visibility mask happens on lines 29 and 31, where the input (i.e. the network prediction) is element-wise multiplied by the visibility mask (weights). The additional eps (epsilon) value is there to make sure we never take the log of zero; it's purely for numerical stability. A similar masking of the gradients occurs in the updateGradInput function. A hedged sketch of both passes follows below.
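A minimal, self-contained toy sketch of point 1, in Torch/Lua since the repository is Torch-based. This is not the repository's architecture: `belief`, `step`, and the decay factor are illustrative choices, not names from the code. It only shows the core idea that "tracking" here means updating a per-pixel occupancy estimate over time, with no bounding boxes or object identities anywhere:

```lua
require 'torch'

local H, W, T = 8, 8, 5
local belief = torch.zeros(H, W)  -- hidden per-pixel occupancy belief

-- Where the scene is visible, trust the observation; where it is
-- occluded, carry over (and slowly decay) the previous belief. The real
-- network learns this update from data instead of hard-coding it.
local function step(visible, occupied)
  belief = torch.cmul(visible, occupied)
         + torch.cmul(1 - visible, belief * 0.9)
  return belief  -- per-pixel occupancy estimate in [0, 1]
end

for t = 1, T do
  local visible  = torch.Tensor(H, W):bernoulli(0.7)  -- 1 = cell observed
  local occupied = torch.Tensor(H, W):bernoulli(0.3)  -- 1 = cell occupied
  local occupancy = step(visible, occupied)
  print(('t=%d  mean predicted occupancy: %.3f'):format(t, occupancy:mean()))
end
```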
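And a minimal sketch of point 2, again hedged: `weightedBCEForward`/`weightedBCEBackward` are hypothetical stand-ins for WeightedBCECriterion's `updateOutput`/`updateGradInput`, not its exact code, and the normalisation by the number of visible pixels is an assumption of this sketch:

```lua
require 'torch'

local eps = 1e-12  -- keeps log() and divisions away from zero

-- input   : network prediction per pixel, values in (0, 1)
-- target  : observed ground-truth occupancy (0 or 1)
-- weights : visibility mask, 1 = visible, 0 = occluded
local function weightedBCEForward(input, target, weights)
  -- standard per-pixel BCE: -( t*log(p) + (1-t)*log(1-p) )
  local loss = torch.cmul(target, torch.log(input + eps))
             + torch.cmul(1 - target, torch.log(1 - input + eps))
  loss:mul(-1)
  loss:cmul(weights)  -- occluded pixels contribute exactly zero loss
  return loss:sum() / math.max(weights:sum(), 1)
end

local function weightedBCEBackward(input, target, weights)
  -- dBCE/dp = -( t/p - (1-t)/(1-p) ), with the same visibility masking
  local grad = torch.cdiv(target, input + eps)
             - torch.cdiv(1 - target, 1 - input + eps)
  grad:mul(-1)
  grad:cmul(weights)  -- no gradient flows through occluded pixels
  grad:div(math.max(weights:sum(), 1))
  return grad
end

-- toy usage
local p = torch.rand(4, 4)                   -- fake predictions in [0, 1)
local t = torch.Tensor(4, 4):bernoulli(0.5)  -- fake occupancy
local v = torch.Tensor(4, 4):bernoulli(0.8)  -- fake visibility
print('masked BCE:', weightedBCEForward(p, t, v))
```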

@rockbottom12 changed the title from "Tracking without coordinates (How)" to "Tracking" on Mar 18, 2018