We annotate the popular long-term tracking dataset, LTB50, with dense language descriptions. Based on this language-annotated dataset, we extend traditional Long-term visual Tracking (LT) to Long-term Vision-Language Tracking (LVLT).
We also provide an annotation toolkit built with the tkinter package. To launch it, run:
python -m lib.gui
- Text Box: The upper box shows the last description; the lower box is used to annotate the current frame. Fill the lower box with a language description and click the `Save` button (or press the `Enter` key).
- key `Ctrl+Up` and `Ctrl+Down` (button `|<` and `>|`): choose video
- key `Ctrl+Left` and `Ctrl+Right` (button `<` and `>`): choose frame
- key `Shift+Left` and `Shift+Right` (button `<<` and `>>`): fast-backward and fast-forward
- key `Alt+Left` and `Alt+Right` (button `@<<` and `>>@`): jump to the previous description or to the next description
- key `Enter` (button `Save`): save the description of the current frame
- key `Delete` (button `Clear`): clear the description of the current frame
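The shortcuts above could be modeled roughly as follows. This is a minimal sketch and not the toolkit's actual code: `AnnotationState`, `make_bindings`, `bind_all`, and `FAST_STEP` are hypothetical names, the fast-forward stride of 10 frames is an assumption, and video switching (`Ctrl+Up` / `Ctrl+Down`) is left out for brevity. Only the event pattern strings such as `<Control-Left>` follow standard tkinter syntax.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

FAST_STEP = 10  # assumed stride for fast-backward / fast-forward


@dataclass
class AnnotationState:
    """Hypothetical per-video annotation state (not the toolkit's own class)."""
    num_frames: int
    frame: int = 0
    descriptions: dict = field(default_factory=dict)  # frame index -> text

    def step(self, delta):
        # Move by `delta` frames, clamped to the valid frame range.
        self.frame = max(0, min(self.num_frames - 1, self.frame + delta))

    def save(self, text):
        # Store the description for the current frame.
        self.descriptions[self.frame] = text

    def clear(self):
        # Remove the description of the current frame, if any.
        self.descriptions.pop(self.frame, None)

    def jump_to_description(self, forward):
        # Jump to the nearest annotated frame strictly after (forward)
        # or strictly before the current one; stay put if there is none.
        annotated = sorted(self.descriptions)
        if forward:
            i = bisect_right(annotated, self.frame)
            if i < len(annotated):
                self.frame = annotated[i]
        else:
            i = bisect_left(annotated, self.frame)
            if i > 0:
                self.frame = annotated[i - 1]


def make_bindings(state, read_text):
    """Map tkinter event patterns to zero-argument actions.

    `read_text` is any callable returning the text of the lower text box.
    """
    return {
        "<Control-Left>": lambda: state.step(-1),
        "<Control-Right>": lambda: state.step(+1),
        "<Shift-Left>": lambda: state.step(-FAST_STEP),
        "<Shift-Right>": lambda: state.step(+FAST_STEP),
        "<Alt-Left>": lambda: state.jump_to_description(forward=False),
        "<Alt-Right>": lambda: state.jump_to_description(forward=True),
        "<Return>": lambda: state.save(read_text()),
        "<Delete>": state.clear,
    }


def bind_all(root, bindings):
    # `root` is expected to behave like a tkinter widget: tkinter passes an
    # event object to each callback, which is ignored here.
    for sequence, action in bindings.items():
        root.bind(sequence, lambda event, action=action: action())
```

With a real window, something like `bind_all(tk.Tk(), make_bindings(state, entry.get))` would attach the handlers, where `entry` stands in for the lower text box. Keeping the navigation logic separate from the widgets makes it possible to exercise it without opening a GUI.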
If you find this project useful in your research, please consider citing:
@article{DBLP:journals/corr/abs-1804-07056,
author = {Alan Lukezic and
Luka Cehovin Zajc and
Tom{\'{a}}s Voj{\'{\i}}r and
Jiri Matas and
Matej Kristan},
title = {Now you see me: evaluating performance in long-term visual tracking},
eprinttype = {arXiv},
eprint = {1804.07056},
}
- Now you see me: evaluating performance in long-term visual tracking. Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, Matej Kristan. arXiv:1804.07056.
LVLT is released under the GPL-3.0 License.