It seems that STARK doesn't mention anything about a dynamically updated template (DUT for short) in its training procedure. Is this a deliberate design choice, or am I missing something?
I reckon the DUT is actually something like a short-term memory, so the transformer should not treat it the same as the normal template from the first frame; the DUT should be explicitly accounted for during training. However, this is not how STARK is implemented.
So I'm curious: what is the intuition or reasoning behind STARK's current training protocol of leaving the DUT out?
Hi, we use the get_frame_ids_trident function to sample the initial template frame, the search region frame, and the dynamic template frame. Specifically, we first randomly sample two frames as the initial template and the search region; the interval between them ranges over [0, L] (L is the sequence length). Then we sample an extra frame as the dynamic template; the interval between the search frame and the dynamic frame ranges over [0, max_interval] (the default max_interval is 200).
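For anyone reading along, the sampling scheme described above can be sketched roughly as follows. This is an illustrative reimplementation, not the actual get_frame_ids_trident code; the function name and signature here are made up for clarity.

```python
import random

def sample_trident_frames(seq_len, max_interval=200):
    """Illustrative sketch of the trident-style sampling described above
    (NOT the real get_frame_ids_trident implementation).

    Returns indices for the initial template frame, the search frame,
    and the dynamic template frame.
    """
    # Initial template and search region: any two distinct frames in the
    # sequence, so their interval can range over [0, L].
    template_id, search_id = random.sample(range(seq_len), 2)
    # Dynamic template: a frame within max_interval of the search frame.
    lo = max(0, search_id - max_interval)
    hi = min(seq_len - 1, search_id + max_interval)
    dynamic_id = random.randint(lo, hi)
    return template_id, search_id, dynamic_id
```

Note that under this scheme the dynamic template is drawn by a fixed heuristic, independently of the network's own predictions.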
Got it. So I guess the gradient of the loss won't be backpropagated through time via the dynamic templates, since they are sampled in a non-differentiable way. This is still confusing: during inference the dynamic templates are chosen by the network, but during training they seem to be independent of the rest of the model. Will such a gap affect accuracy?
Yep, during the training stage the dynamic template is sampled heuristically rather than selected by the network. Ideally, the training process should be completely consistent with the test process, which runs sequentially over the video. However, due to the memory limit, we don't implement backpropagation through time.