Training process not utilizing a dynamically updated template #12

Closed
luowyang opened this issue Jun 12, 2021 · 3 comments

Comments


luowyang commented Jun 12, 2021

It seems that STARK doesn't mention anything about a dynamically updated template (DUT for short) during the training procedure. Is this a deliberate design, or am I missing something?

I reckon the DUT is effectively a kind of short-term memory, so the transformer should not treat it the same as the normal template taken from the first frame; the DUT should therefore be explicitly included in training. However, this is not how STARK is implemented.

So I'm curious: what is the intuition or reasoning behind STARK's current training protocol, which dismisses the DUT?

@MasterBin-IIAU
Collaborator

Hi, we use the get_frame_ids_trident function to sample the initial template frame, the search-region frame, and the dynamic template frame. To be specific, we first randomly sample two frames as the initial template and the search region; the interval between them ranges over [0, L] (L is the sequence length). Then we sample an extra frame as the dynamic template; the interval between the search frame and the dynamic frame ranges over [0, max_interval] (here the default max_interval is 200).
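For clarity, here is a minimal sketch of that sampling scheme. The function name echoes the one mentioned above, but the signature, the ordering of the two initial frames, and the boundary handling are assumptions, not the repo's exact implementation:

```python
import random

def get_frame_ids_trident(seq_len, max_interval=200):
    """Sample (template_id, search_id, dynamic_id) from a sequence of
    length seq_len, following the scheme described above (sketch only)."""
    # Initial template and search frame: any two distinct frames, so
    # their interval can range over [0, seq_len - 1].
    template_id, search_id = random.sample(range(seq_len), 2)

    # Dynamic template: a frame within max_interval of the search frame,
    # clamped to the valid index range of the sequence.
    lo = max(0, search_id - max_interval)
    hi = min(seq_len - 1, search_id + max_interval)
    dynamic_id = random.randint(lo, hi)

    return template_id, search_id, dynamic_id
```

For example, `get_frame_ids_trident(500)` might return `(12, 310, 295)`: an initial template that can be far from the search frame, plus a dynamic template constrained to lie near it.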

@luowyang
Author

Got it. So I guess the gradient of the loss function won't be backpropagated through time to the dynamic templates, since they are sampled in a non-differentiable way. This is still confusing: during inference, dynamic templates are chosen by the network, but during training they seem to be sampled independently of the rest of the model. Will such a train/test gap affect accuracy?

@MasterBin-IIAU
Collaborator

Yep, during the training stage the dynamic template is sampled heuristically rather than chosen by the network. Ideally, the training process should be completely consistent with the test process, which runs sequentially over the video. However, due to memory limits, we don't implement backpropagation through time.
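For contrast, here is a minimal sketch of the sequential test-time loop that training does not mirror. Everything in it (`crop`, the `model` call signature, the confidence gate, `update_interval`, `conf_threshold`) is an illustrative assumption rather than the repo's actual API:

```python
def crop(frame, box):
    """Hypothetical helper: extract the patch inside box = (x, y, w, h)."""
    x, y, w, h = (int(v) for v in box)
    return frame[y:y + h, x:x + w]

def track_sequence(model, frames, init_box, update_interval=200, conf_threshold=0.5):
    template = crop(frames[0], init_box)  # initial template, fixed for the whole video
    dynamic_template = template           # dynamic template, updated online
    boxes = [init_box]
    for t in range(1, len(frames)):
        box, conf = model(template, dynamic_template, frames[t])
        boxes.append(box)
        # At test time the network's own confidence gates the update, so the
        # dynamic template depends on past predictions; at training time it is
        # an independently sampled frame, and no gradient flows through this
        # choice (i.e., no backpropagation through time).
        if t % update_interval == 0 and conf > conf_threshold:
            dynamic_template = crop(frames[t], box)
    return boxes
```

This sequential dependence is exactly the train/test gap discussed above.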
