Awesome-Single-Object-Tracking


Review papers

  • Single Object Tracking Research: A Survey. Chinese Journal of Computers 2022. [Paper]

  • Know Your Surroundings: Exploiting Scene Information for Object Tracking. Arxiv 2020/03. [Paper]

RGB Tracking

RGB Tracking 2022

  • Learning Spatial-Frequency Transformer for Visual Object Tracking. [paper] [code]

    • Chuanming Tang, Xiao Wang, Yuanchao Bai, Zhe Wu, Jianlin Zhang, Yongmei Huang. Arxiv 2022/08
    • Keyword: Gaussian Spatial Prior; High-frequency Emphasis Attention; Siamese-like.
    • Digest We propose a unified Spatial-Frequency Transformer that models the Gaussian spatial Prior and High-frequency emphasis Attention (GPHA) simultaneously. To be specific, Gaussian spatial prior is generated using dual Multi-Layer Perceptrons (MLPs) and injected into the similarity matrix produced by multiplying Query and Key features in self-attention. The output will be fed into a Softmax layer and then decomposed into two components, i.e., the direct signal and high-frequency signal. The low- and high-pass branches are rescaled and combined to achieve all-pass, therefore, the high-frequency features will be protected well in stacked self-attention layers. We further integrate the Spatial-Frequency Transformer into the Siamese tracking framework and propose a novel tracking algorithm, termed SFTransT. The cross-scale fusion based SwinTransformer is adopted as the backbone, and also a multi-head cross-attention module is used to boost the interaction between search and template features. The output will be fed into the tracking head for target localization.
  • AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility. [paper] [code]

    • Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc Van Gool, Fahad Shahbaz Khan. Arxiv 2022/08
    • Keyword: Benchmark; Adverse Visibility; Severe Weather Conditions; Adverse Imaging Effects.
    • Digest We introduce AVisT, a dedicated benchmark for visual tracking in diverse scenarios with adverse visibility. AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios broadly grouped into five attributes with 42 object categories. The key contribution of AVisT is its diverse and challenging scenarios, covering severe weather conditions such as dense fog, heavy rain and sandstorm; obstruction effects including fire, sun glare and splashing water; adverse imaging effects such as low light; and target effects including small targets and distractor objects along with camouflage. We further benchmark 17 popular and recent trackers on AVisT with a detailed analysis of their tracking performance across attributes, demonstrating considerable room for improvement.
  • HighlightNet: Highlighting Low-Light Potential Features for Real-Time UAV Tracking. [paper] [code]

    • Changhong Fu, Haolin Dong, Junjie Ye, Guangze Zheng, Sihang Li, Jilin Zhao. IROS 2022
    • Keyword: UAV Tracking; Low-light Environment.
    • Digest Low-light environments have posed a formidable challenge for robust unmanned aerial vehicle (UAV) tracking even with state-of-the-art (SOTA) trackers, since the potential image features are hard to extract under adverse light conditions. Besides, due to the low visibility, accurate online selection of the object also becomes extremely difficult for human monitors initializing UAV tracking in ground control stations. To solve these problems, this work proposes a novel enhancer, HighlightNet, to light up potential objects for both human operators and UAV trackers. By employing a Transformer, HighlightNet can adjust enhancement parameters according to global features and is thus adaptive to illumination variation. A pixel-level range mask is introduced to make HighlightNet focus on enhancing the tracking object and regions without light sources. Furthermore, a soft truncation mechanism is built to prevent background noise from being mistaken for crucial features.
  • Towards Sequence-Level Training for Visual Tracking. [paper] [code]

    • Minji Kim, Seungkwan Lee, Jungseul Ok, Bohyung Han, Minsu Cho. ECCV 2022
    • Keyword: Sequence-Level Training; Reinforcement Learning; Single Object Tracking.
    • Digest Despite the extensive adoption of machine learning on the task of visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is a sequence-level task in its nature; they rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms.
  • Local Perception-Aware Transformer for Aerial Tracking. [paper] [code]

    • Changhong Fu, Weiyu Peng, Sihang Li, Junjie Ye, Ziang Cao. Arxiv 2022/08
    • Keyword: Aerial Tracking; UAV Tracking; Single Object Tracking.
    • Digest The Transformer structure lacks sufficient inductive bias, and focusing only on encoding global features harms the modeling of local details, which restricts tracking capability on aerial robots. With a local-modeling to global-search mechanism, the proposed tracker replaces the global encoder with a novel local-recognition encoder. In this encoder, a local-recognition attention and a local element correction network are carefully designed to reduce interference from redundant global information and to increase local inductive bias. Meanwhile, the latter models local object details precisely under the aerial view through a detail-inquiry network.
  • AiATrack: Attention in Attention for Transformer Visual Tracking. [paper] [code]

    • Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan. ECCV 2022
    • Keyword: Single Object Tracking; Attention in Attention.
    • Digest The independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references.
  • Towards Grand Unification of Object Tracking. [paper] [code]

    • Bin Yan, Yi Jiang, Peize Sun, Dong Wang, Zehuan Yuan, Ping Luo, Huchuan Lu. ECCV 2022 Oral
    • Keyword: Grand Unification; Object Tracking; Object Segmentation.
    • Digest We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Due to the fragmented definitions of the object tracking problem itself, most existing trackers are developed to address a single or part of tasks and overspecialize on the characteristics of specific tasks. By contrast, Unicorn provides a unified solution, adopting the same input, backbone, embedding, and head across all tracking tasks. For the first time, we accomplish the great unification of the tracking network architecture and learning paradigm.
  • Transformer Tracking with Cyclic Shifting Window Attention. [paper] [code]

    • Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang. CVPR 2022
    • Keyword: Pixel-to-pixel Attention; Visual Object Tracking.
    • Digest Existing transformer-based approaches adopt the pixel-to-pixel attention strategy on flattened image features and unavoidably ignore the integrity of objects. In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from pixel to window level. The cross-window multi-scale attention has the advantage of aggregating attention at different scales and generates the best fine-scale match for the target object. Furthermore, the cyclic shifting strategy brings greater accuracy by expanding the window samples with positional information, and at the same time saves huge amounts of computational power by removing redundant calculations.
  • SparseTT: Visual Tracking with Sparse Transformers. [paper] [code]

    • Zhihong Fu, Zehua Fu, Qingjie Liu, Wenrui Cai, Yunhong Wang. IJCAI 2022
    • Keyword: Sparse Attention.
    • Digest Self-attention lacks focus on the most relevant information in the search regions, making it easy to be distracted by background. In this paper, we relieve this issue with a sparse attention mechanism that focuses on the most relevant information in the search regions, which enables much more accurate tracking. Furthermore, we introduce a double-head predictor to boost the accuracy of foreground-background classification and regression of target bounding boxes, which further improves the tracking performance.
  • Efficient Visual Tracking via Hierarchical Cross-Attention Transformer. [paper] [code]

    • Xin Chen, Dong Wang, Dongdong Li, Huchuan Lu. Arxiv 2022/03
    • Keyword: Efficiency; UAV Tracking.
    • Digest This development is mainly attributed to powerful networks (such as transformers) and additional modules (such as online update and refinement modules). However, less attention has been paid to tracking speed. Most state-of-the-art trackers are satisfied with the real-time speed on powerful GPUs. However, practical applications necessitate higher requirements for tracking speed, especially when edge platforms with limited resources are used. In this work, we present an efficient tracking method via a hierarchical cross-attention transformer named HCAT. Our model runs about 195 fps on GPU, 45 fps on CPU, and 55 fps on the edge AI platform of NVIDIA Jetson AGX Xavier.
  • Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework. [paper] [code]

    • Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan. ECCV 2022
    • Keyword: One-stream; Target-background Discriminability.
    • Digest The current popular two-stream, two-stage tracking framework extracts the template and the search region features separately and then performs relation modeling, thus the extracted features lack the awareness of the target and have limited target-background discriminability. We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance. Since no extra heavy relation modeling module is needed and the implementation is highly parallelized, the proposed tracker runs at a fast speed. To further improve the inference efficiency, an in-network candidate early elimination module is proposed based on the strong similarity prior calculated in the one-stream framework.
  • Robust Visual Tracking by Segmentation. [paper] [code]

    • Matthieu Paul, Martin Danelljan, Christoph Mayer, Luc Van Gool. ECCV 2022
    • Keyword: Segmentation-centric Tracking; Target Representation.
    • Digest Estimating the target extent poses a fundamental challenge in visual object tracking. Typically, trackers are box-centric and fully rely on a bounding box to define the target in the scene. In practice, objects often have complex shapes and are not aligned with the image axis. In these cases, bounding boxes do not provide an accurate description of the target and often contain a majority of background pixels. We propose a segmentation-centric tracking pipeline that not only produces a highly accurate segmentation mask, but also internally works with segmentation masks instead of bounding boxes. Thus, our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content. In order to achieve the necessary robustness for the challenging tracking scenario, we propose a separate instance localization component that is used to condition the segmentation decoder when producing the output mask.
  • MixFormer: End-to-End Tracking with Iterative Mixed Attention. [paper] [code]

    • Yutao Cui, Cheng Jiang, Limin Wang, Gangshan Wu. CVPR 2022 Oral
    • Keyword: Mixed Attention Module.
    • Digest To simplify the tracking pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed MixFormer, built upon transformers. Our core design is to utilize the flexibility of attention operations and propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration. This synchronous modeling scheme allows us to extract target-specific discriminative features and perform extensive communication between target and search area. Based on MAM, we build our MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. (A minimal sketch of the mixed-attention idea appears after this list.)
  • WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking. [paper] [code]

    • Chunhui Zhang, Guanjie Huang, Li Liu, Shan Huang, Yinan Yang, Xiang Wan, Shiming Ge, Dacheng Tao. Arxiv 2022/01
    • Keyword: UAV Tracking.
    • Digest In this work, we propose WebUAV-3M, the largest public UAV tracking benchmark to date, to facilitate both the development and evaluation of deep UAV trackers. WebUAV-3M contains over 3.3 million frames across 4,500 videos and offers 223 highly diverse target categories. Each video is densely annotated with bounding boxes by an efficient and scalable semiautomatic target annotation (SATA) pipeline. Importantly, to take advantage of the complementary superiority of language and audio, we enrich WebUAV-3M by innovatively providing both natural language specifications and audio descriptions. We believe that such additions will greatly boost future research in terms of exploring language features and audio cues for multimodal UAV tracking. In addition, a fine-grained UAV tracking-under-scenario constraint (UTUSC) evaluation protocol and seven challenging scenario subtest sets are constructed to enable the community to develop, adapt and evaluate various types of advanced trackers. We provide extensive evaluations and detailed analyses of 43 representative trackers and envision future research directions in the field of deep UAV tracking and beyond.
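
The mixed-attention design described in the MixFormer entry above lends itself to a compact illustration. The sketch below is a hypothetical simplification, assuming a single symmetric attention layer over concatenated template and search tokens; MixFormer's asymmetric scheme, progressive patch embedding, and heads are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Joint feature extraction and target-search interaction in one attention pass."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, template_tokens, search_tokens):   # (B, Nt, C), (B, Ns, C)
        tokens = torch.cat([template_tokens, search_tokens], dim=1)
        mixed, _ = self.attn(tokens, tokens, tokens)      # every token attends to both sets
        n_t = template_tokens.shape[1]
        return mixed[:, :n_t], mixed[:, n_t:]             # split back into the two streams

# toy usage: 7x7 template tokens and 18x18 search tokens with 256-dim features
t_out, s_out = MixedAttention()(torch.randn(2, 49, 256), torch.randn(2, 324, 256))
```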

RGB Tracking 2021

  • High-Performance Discriminative Tracking with Transformers. [paper]

    • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu. ICCV 2021
    • Keyword: Discriminative tracker.
    • Digest In this paper, we present DTT, a novel single-object discriminative tracker, based on an encoder-decoder Transformer architecture. By self- and encoder-decoder attention mechanisms, our approach is able to exploit the rich scene information in an end-to-end manner, effectively removing the need for hand-designed discriminative models. In online tracking, given a new test frame, dense prediction is performed at all spatial positions. Not only location, but also bounding box of the target object is obtained in a robust fashion, streamlining the discriminative tracking pipeline.
  • FEAR: Fast, Efficient, Accurate and Robust Visual Tracker. [paper] [code]

    • Vasyl Borsuk, Roman Vei, Orest Kupyn, Tetiana Martyniuk, Igor Krashenyi, Jiří Matas. ECCV 2022
    • Keyword: Model Efficiency; Pixel-wise block.
    • Digest We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. We present a novel and efficient way to benefit from dual-template representation for object model adaption, which incorporates temporal information with only a single learnable parameter. We further improve the tracker architecture with a pixel-wise fusion block. Employed with the lightweight backbone, the optimized version FEAR-XS offers more than 10 times faster tracking than current Siamese trackers while maintaining near state-of-the-art results. The FEAR-XS tracker is 2.4x smaller and 4.3x faster than LightTrack with superior accuracy. In addition, we expand the definition of model efficiency by introducing the FEAR benchmark, which assesses energy consumption and execution speed.
  • Learning to Track Objects from Unlabeled Videos. [paper] [code]

    • Jilai Zheng, Chao Ma, Houwen Peng, Xiaokang Yang. ICCV 2021
    • Keyword: Unsupervised learning; Siamese network.
    • Digest In this paper, we propose to learn an Unsupervised Single Object Tracker (USOT) from scratch. We identify that three major challenges, i.e., moving object discovery, rich temporal variation exploitation, and online update, are the central causes of the performance bottleneck of existing unsupervised trackers. To narrow the gap between unsupervised trackers and supervised counterparts, we propose an effective unsupervised learning approach composed of three stages. First, we sample sequentially moving objects with unsupervised optical flow and dynamic programming, instead of random cropping. Second, we train a naive Siamese tracker from scratch using single-frame pairs. Third, we continue training the tracker with a novel cycle memory learning scheme, which is conducted in longer temporal spans and also enables our tracker to update online.
  • HiFT: Hierarchical Feature Transformer for Aerial Tracking. [paper] [code]

    • Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, Yiming Li. ICCV 2021
    • Keyword: Aerial Tracking; UAV Tracking.
    • Digest In this work, we propose an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial cues (shallow layers) and semantic cues (deep layers). Consequently, not only can the global contextual information be raised, facilitating the target search, but our end-to-end architecture with the transformer can also efficiently learn the interdependencies among multi-level features, thereby discovering a tracking-tailored feature space with strong discriminability.
  • Learning Spatio-Temporal Transformer for Visual Tracking. [paper] [code]

    • Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu. ICCV 2021
    • Keyword: Spatio-Temporal; Corner prediction.
    • Digest In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects. Our method casts object tracking as a direct bounding box prediction problem, without using any proposals or predefined anchors. With the encoder-decoder transformer, the prediction of objects just uses a simple fully-convolutional network, which estimates the corners of objects directly. The whole method is end-to-end, does not need any postprocessing steps such as cosine window and bounding box smoothing, thus largely simplifying existing tracking pipelines.
  • Transformer Tracking. [paper] [code]

    • Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu. CVPR 2021
    • Keyword: Attention-based Feature Fusion; Siamese-like.
    • Digest The correlation operation is a simple fusion manner for considering the similarity between the template and the search region. However, the correlation operation itself is a local linear matching process, which tends to lose semantic information and fall into local optima; this may be the bottleneck in designing high-accuracy tracking algorithms. To address this issue, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head. (A minimal sketch of this attention-based fusion appears after this list.)
  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking. [Paper] [Code]

    • Ning Wang, Wengang Zhou, Jie Wang, Houqiang Li. CVPR 2021 Oral
    • Keyword: Temporal Context; Siamese-like.
    • Digest In this work, we bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking. Different from classic usage of the transformer in natural language processing tasks, we separate its encoder and decoder into two parallel branches and carefully design them within the Siamese-like tracking pipelines. The transformer encoder promotes the target templates via attention-based feature reinforcement, which benefits the high-quality tracking model generation. The transformer decoder propagates the tracking cues from previous templates to the current frame, which facilitates the object searching process. Our transformer-assisted tracking framework is neat and trained in an end-to-end manner. With the proposed transformer, a simple Siamese matching approach is able to outperform the current top-performing trackers.
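
The attention-based fusion in the TransT entry above (ego-context augment via self-attention, cross-feature augment via cross-attention) can be sketched in a few lines. This is a minimal illustration under assumed shapes and module sizes, not the released TransT implementation; the repeated fusion layers, positional encodings, and prediction heads are omitted.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, template, search):                  # (B, Nt, C), (B, Ns, C)
        # ego-context augment: each branch attends to its own context
        template = template + self.self_attn(template, template, template)[0]
        search = search + self.self_attn(search, search, search)[0]
        # cross-feature augment: the search branch queries the template
        return search + self.cross_attn(search, template, template)[0]

fused = FeatureFusion()(torch.randn(1, 64, 256), torch.randn(1, 1024, 256))
# `fused` would then feed the classification and regression heads
```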

Robust Tracking

Robust Tracking 2022

  • DIMBA: Discretely Masked Black-Box Attack in Single Object Tracking. [paper]

    • Xiangyu Yin, Wenjie Ruan, Jonathan Fieldsend. Arxiv 2022/08
    • Keyword: Black-Box Attack; Reinforcement Learning.
    • Digest Despite extensive studies on the robustness of image, audio, and NLP models, work on adversarial examples for visual object tracking -- especially in a black-box manner -- is quite lacking. In this paper, we propose a novel adversarial attack method to generate noise for single object tracking under black-box settings, where perturbations are only added to the initial frames of tracking sequences, which is difficult to notice from the perspective of a whole video clip. Specifically, we divide our algorithm into three components and exploit reinforcement learning to localize important frame patches precisely while reducing unnecessary computational query overhead. Compared to existing techniques, our method requires fewer queries on the initial frames of a video to achieve competitive or even better attack performance.
  • Efficient universal shuffle attack for visual object tracking. [paper]

    • Siao Liu, Zhaoyu Chen, Wei Li, Jiwei Zhu, Jiafeng Wang, Wenqiang Zhang, Zhongxue Gan. ICASSP 2022
    • Keyword: Shuffle Attack.
    • Digest Previous work only generates video-specific perturbations, which restricts its application scenarios. In addition, existing attacks are difficult to implement in reality due to the real-time nature of tracking and the re-initialization mechanism. To address these issues, we propose an offline universal adversarial attack called Efficient Universal Shuffle Attack. It takes only one perturbation to cause the tracker to malfunction on all videos. To improve computational efficiency and attack performance, we propose a greedy gradient strategy and a triple loss to efficiently capture and attack model-specific feature representations through the gradients.
  • Ad2Attack: Adaptive Adversarial Attack on Real-Time UAV Tracking. [paper] [code]

    • Changhong Fu, Sihang Li, Xinnan Yuan, Junjie Ye, Ziang Cao, Fangqiang Ding. ICRA 2022
    • Keyword: Adversarial Attack; UAV Tracking.
    • Digest Adding imperceptible perturbations can easily fool the tracker and cause tracking failures. This risk is often overlooked and rarely researched at present. Therefore, to help increase awareness of the potential risk and the robustness of UAV tracking, this work proposes a novel adaptive adversarial attack approach, i.e., Ad2Attack, against UAV object tracking. Specifically, adversarial examples are generated online during the resampling of the search patch image, which leads trackers to lose the target in the following frames. Ad2Attack is composed of a direct downsampling module and a super-resolution upsampling module with adaptive stages. A novel optimization function is proposed for balancing the imperceptibility and efficiency of the attack.
  • Few-Shot Backdoor Attacks on Visual Object Tracking. [paper] [code]

    • Yiming Li, Haoxiang Zhong, Xingjun Ma, Yong Jiang, Shu-Tao Xia. ICLR 2022
    • Keyword: Backdoor Attack; Few-shot Learning.
    • Digest We reveal such a threat where an adversary can easily implant hidden backdoors into VOT models by tampering with the training process. Specifically, we propose a simple yet effective few-shot backdoor attack (FSBA) that optimizes two losses alternately: 1) a feature loss defined in the hidden feature space, and 2) the standard tracking loss. We show that, once the backdoor is embedded into the target model by our FSBA, it can trick the model to lose track of specific objects even when the trigger only appears in one or a few frames. We examine our attack in both digital and physical-world settings and show that it can significantly degrade the performance of state-of-the-art VOT trackers. We also show that our attack is resistant to potential defenses, highlighting the vulnerability of VOT models to potential backdoor attacks.

Robust Tracking 2021

  • IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking. [paper] [code]
    • Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang. CVPR 2021
    • Keyword: IoU Attack; Black-box Attack.
    • Digest We propose a decision-based black-box attack method for visual object tracking. In contrast to existing black-box adversarial attack methods that deal with static images for image classification, we propose IoU attack that sequentially generates perturbations based on the predicted IoU scores from both current and historical frames. By decreasing the IoU scores, the proposed attack method degrades the accuracy of temporal coherent bounding boxes (i.e., object motions) accordingly. In addition, we transfer the learned perturbations to the next few frames to initialize temporal motion attack. We validate the proposed IoU attack on state-of-the-art deep trackers (i.e., detection based, correlation filter based, and long-term trackers).
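
A decision-based, IoU-guided perturbation loop of the kind described in the IoU Attack entry above can be summarized as follows. This is a hedged sketch, assuming a generic `tracker.predict(image) -> [x1, y1, x2, y2]` interface and plain random-search noise; the paper's orthogonal/normal perturbation composition and temporal-transfer details are not reproduced.

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def iou_attack_frame(tracker, frame, prev_box, eps=8.0, steps=20, rng=np.random):
    """Keep only random perturbations that lower the IoU between the tracker's
    prediction and the previous (temporally coherent) box; return the attacked frame."""
    best = frame.astype(np.float32)
    best_iou = iou(tracker.predict(best), prev_box)
    for _ in range(steps):
        cand = np.clip(best + rng.uniform(-eps, eps, size=frame.shape), 0, 255)
        cand_iou = iou(tracker.predict(cand), prev_box)
        if cand_iou < best_iou:                 # accepted: this noise degrades temporal IoU
            best, best_iou = cand, cand_iou
    return best
```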

Robust Tracking 2020

  • Efficient Adversarial Attacks for Visual Object Tracking. [paper]
    • Siyuan Liang, Xingxing Wei, Siyuan Yao, Xiaochun Cao. ECCV 2020
    • Keyword: Adversarial Attack; Efficiency.
    • Digest In this paper, we analyze the weakness of object trackers based on the Siamese network and then extend adversarial examples to visual object tracking. We present an end-to-end network, FAN (Fast Attack Network), that uses a novel drift loss combined with the embedded feature loss to attack Siamese-network-based trackers. On a single GPU, FAN is efficient in training speed and has strong attack performance. FAN can generate an adversarial example in 10 ms, achieving an effective targeted attack (at least 40% drop rate on OTB) and untargeted attack (at least 70% drop rate on OTB).

Point Clouds Tracking

Point Clouds Tracking 2022

  • Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer. [paper] [code]

    • Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu, Yueru Luo, Haiyu Zhao, Ziwei Liu, Shijian Lu. Arxiv 2022/08
    • Keyword: 3D SOT; Bird's-Eye View.
    • Digest PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve points relevant to the given template during subsampling. 2) We propose a Point Relation Transformer for effective feature aggregation and feature matching between the template and search region. 3) Based on the coarse tracking results, we employ a novel Prediction Refinement Module to obtain the final refined prediction through local feature pooling. In addition, motivated by the favorable properties of the Bird's-Eye View (BEV) of point clouds in capturing object motion, we further design a more advanced framework named PTTR++, which incorporates both the point-wise view and the BEV representation to exploit their complementary effect in generating high-quality tracking results.
  • 3D Siamese Transformer Network for Single Object Tracking on Point Clouds. [paper] [code]

    • Le Hui, Lingpeng Wang, Linghua Tang, Kaihao Lan, Jin Xie, Jian Yang. ECCV 2022
    • Keyword: 3D SOT; Point Clouds; Siamese-like.
    • Digest In this paper, we explicitly use Transformer to form a 3D Siamese Transformer network for learning robust cross correlation between the template and the search area of point clouds. Specifically, we develop a Siamese point Transformer network to learn shape context information of the target. Its encoder uses self-attention to capture non-local information of point clouds to characterize the shape information of the object, and the decoder utilizes cross-attention to upsample discriminative point features. After that, we develop an iterative coarse-to-fine correlation network to learn the robust cross correlation between the template and the search area. It formulates the cross-feature augmentation to associate the template with the potential target in the search area via cross attention.
  • Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds. [paper] [code]

    • Chaoda Zheng, Xu Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang Cui, Zhen Li. CVPR 2022 Oral
    • Keyword: 3D SOT; Point Clouds; Motion-centric.
    • Digest LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Besides, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker, M²-Track. In the first stage, M²-Track localizes the target within successive frames via motion transformation; it then refines the target box through motion-assisted shape completion in the second stage.
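
The first, motion-centric stage of the M²-Track entry above amounts to predicting a relative rigid motion between consecutive frames and applying it to the previous box. The helper below is a minimal sketch assuming a 7-DoF box `[x, y, z, w, l, h, yaw]` and a regressed motion `[dx, dy, dz, dyaw]` expressed in the previous box's local frame; the motion regressor and the second-stage shape-completion refinement are omitted.

```python
import numpy as np

def apply_relative_motion(prev_box, motion):
    """Propagate a 3D box [x, y, z, w, l, h, yaw] by a relative motion
    [dx, dy, dz, dyaw] given in the previous box's local coordinates."""
    x, y, z, w, l, h, yaw = prev_box
    dx, dy, dz, dyaw = motion
    # rotate the local translation into the world frame, then apply it
    wx = x + dx * np.cos(yaw) - dy * np.sin(yaw)
    wy = y + dx * np.sin(yaw) + dy * np.cos(yaw)
    return np.array([wx, wy, z + dz, w, l, h, yaw + dyaw])

cur_box = apply_relative_motion(np.array([10.0, 5.0, 0.0, 1.9, 4.5, 1.6, 0.3]),
                                np.array([0.8, 0.1, 0.0, 0.02]))
```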

Point Clouds Tracking 2021

  • Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds. [paper] [code]
    • Chaoda Zheng, Xu Yan, Jiantao Gao, Weibing Zhao, Wei Zhang, Zhen Li, Shuguang Cui. ICCV 2021
    • Keyword: 3D SOT; Point Clouds; Box-aware Feature Enhancement.
    • Digest Due to the common occlusion in LiDAR scans, it is non-trivial to conduct accurate feature comparisons on severely sparse and incomplete shapes. In this work, we exploit the ground truth bounding box given in the first frame as a strong cue to enhance the feature description of the target object, enabling a more accurate feature comparison in a simple yet effective way. In particular, we first propose the BoxCloud, an informative and robust representation, to depict an object using the point-to-box relation. We further design an efficient box-aware feature fusion module, which leverages the aforementioned BoxCloud for reliable feature matching and embedding. Integrating the proposed general components into an existing model P2B, we construct a superior box-aware tracker (BAT).
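
The point-to-box relation used by the BAT entry above can be illustrated with a small helper that, for every point, records its distances to reference points of the target box. Treating the references as the eight corners plus the center of an axis-aligned box, and using plain Euclidean distances, are simplifying assumptions for this sketch.

```python
import numpy as np

def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box given its center (3,) and size (3,)."""
    offsets = np.array([[sx, sy, sz] for sx in (-0.5, 0.5)
                                      for sy in (-0.5, 0.5)
                                      for sz in (-0.5, 0.5)])
    return center + offsets * size                                    # (8, 3)

def boxcloud(points, center, size):
    """points: (N, 3) -> (N, 9) distances to the box corners and center."""
    refs = np.vstack([box_corners(center, size), center[None, :]])    # (9, 3)
    return np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=-1)

feat = boxcloud(np.random.rand(128, 3), np.zeros(3), np.array([4.0, 2.0, 1.5]))
```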

Point Clouds Tracking 2020

  • P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds. [paper] [code]
    • Haozhe Qi, Chen Feng, Zhiguo Cao, Feng Zhao, Yang Xiao. CVPR 2020 Oral
    • Keyword: 3D SOT; Point Clouds; Point-to-box.
    • Digest A novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in template and search area respectively. Then, we execute permutation-invariant feature augmentation to embed target clues from template into search area seeds and represent them with target-specific features. Consequently, the augmented search area seeds regress the potential target centers via Hough voting. The centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage the ensemble power for joint 3D target proposal and verification.

Point Clouds Tracking 2019

  • Leveraging Shape Completion for 3D Siamese Tracking. [paper] [code]
    • Silvio Giancola, Jesus Zarzar, Bernard Ghanem. CVPR 2019
    • Keyword: 3D SOT; Point Clouds; Shape Completion; Siamese-like.
    • Digest In this paper, we investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR point clouds. We design a Siamese tracker that encodes model and candidate shapes into a compact latent representation. We regularize the encoding by enforcing the latent representation to decode into an object model shape. We observe that 3D object tracking and 3D shape completion complement each other. Learning a more meaningful latent representation shows better discriminatory capabilities, leading to improved tracking performance.
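
The latent-space matching described in the shape-completion entry above can be sketched as a shared point encoder plus cosine-similarity scoring of candidates. The tiny PointNet-style encoder and the candidate-generation step are assumptions; the shape-completion decoder that regularizes the latent space is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Permutation-invariant encoder mapping a point set to a compact latent code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, pts):                        # (B, N, 3)
        return self.mlp(pts).max(dim=1).values     # (B, latent_dim)

enc = PointEncoder()
model_code = enc(torch.randn(1, 512, 3))           # aggregated target shape
cand_codes = enc(torch.randn(32, 512, 3))          # candidate crops in the search area
scores = F.cosine_similarity(cand_codes, model_code.expand_as(cand_codes), dim=1)
best_candidate = scores.argmax()                   # highest-similarity candidate wins
```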

Multi-modal Tracking

Vision-Language Tracking

  • Divert More Attention to Vision-Language Tracking. [paper] [code]
    • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing. NeurIPS 2022
    • Keyword: Make ConvNet Great Again; Modality Mixer.
    • Digest In this paper, we demonstrate that the Transformer-reliance is not necessary and the pure ConvNets are still competitive and even better yet more economical and friendly in achieving SOTA tracking. Our solution is to unleash the power of multimodal vision-language (VL) tracking, simply using ConvNets. The essence lies in learning novel unified-adaptive VL representations with our modality mixer (ModaMixer) and asymmetrical ConvNet search. We show that our unified-adaptive VL representation, learned purely with the ConvNets, is a simple yet strong alternative to Transformer visual features, improving a CNN-based Siamese tracker by a remarkable 14.5% in SUC on the challenging LaSOT (50.7% → 65.2%) and even outperforming several Transformer-based SOTA trackers. Besides empirical results, we theoretically analyze our approach to evidence its effectiveness. By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.

RGB-Depth Tracking

  • Visual Prompt Multi-Modal Tracking. [paper] [code]

    • Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu. CVPR 2023
    • Keyword: Prompt; Multi-Modal.
    • Digest To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters. Albeit effective, this manner is not optimal due to the scarcity of downstream data and poor transferability, etc. In this paper, inspired by the recent success of prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns the modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multimodal tracking tasks. ViPT finds a better way to stimulate the knowledge of the RGB-based model that is pre-trained at scale, meanwhile only introducing a few trainable parameters (less than 1% of model parameters). ViPT outperforms the full fine-tuning paradigm on multiple downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event tracking. (A minimal prompt-tuning sketch appears after this list.)
  • RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking. [paper] [project]

    • Xue-Feng Zhu, Tianyang Xu, Zhangyong Tang, Zucheng Wu, Haodong Liu, Xiao Yang, Xiao-Jun Wu, Josef Kittler. AAAI 2023
    • Keyword: Large-scale RGB-D Dataset.
    • Digest To address the dataset deficiency issue, a new RGB-D dataset named RGBD1K is released in this paper. The RGBD1K contains 1,050 sequences with about 2.5M frames in total. To demonstrate the benefits of training on a larger RGB-D data set in general, and RGBD1K in particular, we develop a transformer-based RGB-D tracker, named SPT, as a baseline for future visual object tracking studies using the new dataset.
  • Prompting for Multi-Modal Tracking. [paper] [result]

    • Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song. ACMMM 2022
    • Keyword: Prompt; Multi-Modal; Early Fusion.
    • Digest Multi-modal tracking gains attention due to its ability to be more accurate and robust in complex scenarios compared to traditional RGB-based tracking. Its key lies in how to fuse multi-modal data and reduce the gap between modalities. However, multi-modal tracking still severely suffers from data deficiency, thus resulting in the insufficient learning of fusion modules. Instead of building such a fusion module, in this paper, we provide a new perspective on multi-modal tracking by attaching importance to the multi-modal visual prompts. We design a novel multi-modal prompt tracker (ProTrack), which can transfer the multi-modal inputs to a single modality by the prompt paradigm. By best employing the tracking ability of pre-trained RGB trackers learning at scale, our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data. Extensive experiments on 5 benchmark datasets demonstrate the effectiveness of the proposed ProTrack.
  • DepthTrack : Unveiling the Power of RGBD Tracking. [paper] [result]

    • Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen. ICCV 2021
    • Keyword: Multi-Modal; Late Fusion.
    • Digest The best RGBD trackers are extensions of the state-of-the-art deep RGB trackers. They are trained with RGB data and the depth channel is used as a sidekick for subtleties such as occlusion detection. This can be explained by the fact that there are no sufficiently large RGBD datasets to 1) train deep depth trackers and to 2) challenge RGB trackers with sequences for which the depth cue is essential. This work introduces a new RGBD tracking dataset, DepthTrack, which has twice as many sequences (200) and scene types (40) as the largest existing dataset, and three times more objects (90). In addition, the average length of the sequences (1473), the number of deformable objects (16) and the number of annotated tracking attributes (15) have been increased. Furthermore, by running the SotA RGB and RGBD trackers on DepthTrack, we propose a new RGBD tracking baseline, namely DeT, which reveals that deep RGBD tracking indeed benefits from genuine training data.
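
The prompt-learning recipe behind the ViPT and ProTrack entries above (see the pointer in the ViPT digest) boils down to freezing the RGB foundation tracker and training only a small prompt module driven by the auxiliary modality. The sketch below is a hypothetical minimal version: the 1x1-convolution prompt, the additive injection, and all shapes are assumptions, not either paper's released design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedTracker(nn.Module):
    def __init__(self, rgb_backbone, feat_dim=256):
        super().__init__()
        self.backbone = rgb_backbone
        for p in self.backbone.parameters():            # freeze the pre-trained RGB model
            p.requires_grad = False
        self.prompt = nn.Conv2d(1, feat_dim, kernel_size=1)   # the few trainable parameters

    def forward(self, rgb, aux):                        # aux: depth/thermal/event map (B,1,H,W)
        feat = self.backbone(rgb)                       # frozen RGB features (B,C,h,w)
        prompt = F.adaptive_avg_pool2d(self.prompt(aux), feat.shape[-2:])
        return feat + prompt                            # prompt-injected features for the head

# toy usage with a dummy "foundation" backbone
dummy_backbone = nn.Conv2d(3, 256, kernel_size=3, padding=1)
tracker = PromptedTracker(dummy_backbone)
out = tracker(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))
```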

RGB-Thermal Tracking

  • Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline. [paper] [code]

    • Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan. CVPR 2022
    • Keyword: Large-scale RGB-T Dataset.
    • Digest In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV), including 500 sequences with 1.7 million high-resolution (1920 × 1080 pixels) frame pairs. In addition, comprehensive applications (short-term tracking, long-term tracking and segmentation mask prediction) with diverse categories and scenes are considered for exhaustive evaluation. Moreover, we provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers. In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.

Siamese

    • 2021

      • STMTrack: Fu, Zhihong and Liu, Qingjie and Fu, Zehua and Wang, Yunhong. STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks. In CVPR 2021. [Paper] [Code]
      • LightTrack: Yan, Bin and Peng, Houwen and Wu, Kan and Wang, Dong and Fu, Jianlong and Lu, Huchuan. LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search. In CVPR 2021. [Paper] [Code]
      • Alpha-Refine: Yan, Bin and Zhang, Xinyu and Wang, Dong and Lu, Huchuan and Yang, Xiaoyun. Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation. In CVPR 2021. [Paper] [Code]
    • 2020

      • Siam R-CNN: Paul Voigtlaender, Jonathon Luiten, Philip H.S. Torr, Bastian Leibe. Siam R-CNN: Visual Tracking by Re-Detection. In CVPR 2020. [Paper] [Code]
    • 2019

      • SiamRPN++: Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, Junjie Yan. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In CVPR 2019. [Paper] [Project]
    • 2017

      • CFNet: Jack Valmadre, Luca Bertinetto, João Henriques, Andrea Vedaldi, Philip H. S. Torr. End-to-End Representation Learning for Correlation Filter Based Tracking. In CVPR 2017. [Paper] [Project]
    • 2016

      • SiamFC: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr. Fully-Convolutional Siamese Networks for Object Tracking. In ECCV 2016. [Paper] [Project]

RGB Paper

CVPR2020

  • MAML: Guangting Wang, Chong Luo, Xiaoyan Sun, Zhiwei Xiong, Wenjun Zeng.
    "Tracking by Instance Detection: A Meta-Learning Approach." CVPR (2020 Oral). [paper]

  • Siam R-CNN: Paul Voigtlaender, Jonathon Luiten, Philip H.S. Torr, Bastian Leibe.
    "Siam R-CNN: Visual Tracking by Re-Detection." CVPR (2020). [paper] [code]

  • D3S: Alan Lukežič, Jiří Matas, Matej Kristan.
    "D3S – A Discriminative Single Shot Segmentation Tracker." CVPR (2020). [paper] [code]

  • PrDiMP: Martin Danelljan, Luc Van Gool, Radu Timofte.
    "Probabilistic Regression for Visual Tracking." CVPR (2020). [paper] [code]

  • ROAM: Tianyu Yang, Pengfei Xu, Runbo Hu, Hua Chai, Antoni B. Chan.
    "ROAM: Recurrently Optimizing Tracking Model." CVPR (2020). [paper]

  • AutoTrack: Yiming Li, Changhong Fu, Fangqiang Ding, Ziyuan Huang, Geng Lu.
    "AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization." CVPR (2020). [paper] [code]

  • SiamBAN: Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang, Rongrong Ji.
    "Siamese Box Adaptive Network for Visual Tracking." CVPR (2020). [paper] [code]

  • SiamAttn: Yuechen Yu, Yilei Xiong, Weilin Huang, Matthew R. Scott.
    "Deformable Siamese Attention Networks for Visual Object Tracking." CVPR (2020). [paper]

  • CGACD: Fei Du, Peng Liu, Wei Zhao, Xianglong Tang.
    "Correlation-Guided Attention for Corner Detection Based Visual Tracking." CVPR (2020).

AAAI 2020

  • SiamFC++: Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu.
    "SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines." AAAI (2020). [paper] [code]

ICCV2019

  • DiMP: Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte.
    "Learning Discriminative Model Prediction for Tracking." ICCV (2019 oral). [paper] [code]

  • GradNet: Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu.
    "GradNet: Gradient-Guided Network for Visual Object Tracking." ICCV (2019 oral). [paper] [code]

  • MLT: Janghoon Choi, Junseok Kwon, Kyoung Mu Lee.
    "Deep Meta Learning for Real-Time Target-Aware Visual Tracking." ICCV (2019). [paper]

  • SPLT: Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang.
    "'Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-Term Tracking." ICCV (2019). [paper] [code]

  • ARCF: Ziyuan Huang, Changhong Fu, Yiming Li, Fuling Lin, Peng Lu.
    "Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking." ICCV (2019). [paper] [code]

  • Lianghua Huang, Xin Zhao, Kaiqi Huang.
    "Bridging the Gap Between Detection and Tracking: A Unified Approach." ICCV (2019). [paper]

  • UpdateNet: Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan.
    "Learning the Model Update for Siamese Trackers." ICCV (2019). [paper] [code]

  • PAT: Rey Reza Wiyatno, Anqi Xu.
    "Physical Adversarial Textures That Fool Visual Object Tracking." ICCV (2019). [paper]

  • GFS-DCF: Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler.
    "Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking." ICCV (2019). [paper] [code]

  • CDTB: Alan Lukežič, Ugur Kart, Jani Käpylä, Ahmed Durmush, Joni-Kristian Kämäräinen, Jiří Matas, Matej Kristan.
    "CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark." ICCV (2019). [paper]

  • VOT2019: Kristan, Matej, et al.
    "The Seventh Visual Object Tracking VOT2019 Challenge Results." ICCV workshops (2019). [paper]

CVPR2019

  • SiamMask: Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H.S. Torr.
    "Fast Online Object Tracking and Segmentation: A Unifying Approach." CVPR (2019). [paper] [project] [code]

  • SiamRPN++: Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, Junjie Yan.
    "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks." CVPR (2019 oral). [paper] [project]

  • ATOM: Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg.
    "ATOM: Accurate Tracking by Overlap Maximization." CVPR (2019 oral). [paper] [code]

  • SiamDW: Zhipeng Zhang, Houwen Peng.
    "Deeper and Wider Siamese Networks for Real-Time Visual Tracking." CVPR (2019 oral). [paper] [code]

  • GCT: Junyu Gao, Tianzhu Zhang, Changsheng Xu.
    "Graph Convolutional Tracking." CVPR (2019 oral). [paper] [code]

  • ASRCF: Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, Jianhua Li.
    "Visual Tracking via Adaptive Spatially-Regularized Correlation Filters." CVPR (2019 oral). [paper] [code]

  • UDT: Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li.
    "Unsupervised Deep Tracking." CVPR (2019). [paper] [code]

  • TADT: Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang.
    "Target-Aware Deep Tracking." CVPR (2019). [paper] [project] [code]

  • C-RPN: Heng Fan, Haibin Ling.
    "Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking." CVPR (2019). [paper]

  • SPM: Guangting Wang, Chong Luo, Zhiwei Xiong, Wenjun Zeng.
    "SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking." CVPR (2019). [paper]

  • OTR: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas.
    "Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters." CVPR (2019). [paper] [code]

  • RPCF: Yuxuan Sun, Chong Sun, Dong Wang, Huchuan Lu, You He.
    "ROI Pooled Correlation Filters for Visual Tracking." CVPR (2019). [paper]

  • LaSOT: Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling.
    "LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking." CVPR (2019). [paper] [project]

AAAI2019

  • LDES: Yang Li, Jianke Zhu, Steven C.H. Hoi, Wenjie Song, Zhefeng Wang, Hantang Liu.
    "Robust Estimation of Similarity Transformation for Visual Object Tracking." AAAI (2019). [paper] [code]

NIPS2018

  • DAT: Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang.
    "Deep Attentive Tracking via Reciprocative Learning." NIPS (2018). [paper] [project] [code]

ECCV2018

  • UPDT: Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg.
    "Unveiling the Power of Deep Tracking." ECCV (2018). [paper]

  • DaSiamRPN: Zheng Zhu, Qiang Wang, Bo Li, Wu Wei, Junjie Yan, Weiming Hu.
    "Distractor-aware Siamese Networks for Visual Object Tracking." ECCV (2018). [paper] [github]

  • SACF: Mengdan Zhang, Qiang Wang, Junliang Xing, Jin Gao, Peixi Peng, Weiming Hu, Steve Maybank.
    "Visual Tracking via Spatially Aligned Correlation Filters Network." ECCV (2018). [paper]

  • RTINet: Yingjie Yao, Xiaohe Wu, Lei Zhang, Shiguang Shan, Wangmeng Zuo.
    "Joint Representation and Truncated Inference Learning for Correlation Filter based Tracking." ECCV (2018). [paper]

  • Meta-Tracker: Eunbyung Park, Alexander C. Berg.
    "Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers." [paper] [github]

  • DSLT: Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, Ming-Hsuan Yang.
    "Deep Regression Tracking with Shrinkage Loss." ECCV (2018). [paper] [github]

  • DRL-IS: Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou.
    "Deep Reinforcement Learning with Iterative Shift for Visual Tracking." ECCV (2018). [paper]

  • RT-MDNet: Ilchae Jung, Jeany Son, Mooyeol Baek, Bohyung Han.
    "Real-Time MDNet." ECCV (2018). [paper]

  • ACT: Boyu Chen, Dong Wang, Peixia Li, Huchuan Lu.
    "Real-time 'Actor-Critic' Tracking." ECCV (2018). [paper] [github]

  • StructSiam: Yunhua Zhang, Lijun Wang, Dong Wang, Mengyang Feng, Huchuan Lu, Jinqing Qi.
    "Structured Siamese Network for Real-Time Visual Tracking." ECCV (2018). [paper]

  • MemTrack: Tianyu Yang, Antoni B. Chan.
    "Learning Dynamic Memory Networks for Object Tracking." ECCV (2018). [paper]

  • SiamFC-tri: Xingping Dong, Jianbing Shen.
    "Triplet Loss in Siamese Network for Object Tracking." ECCV (2018). [paper] [github]

  • OxUvA long-term dataset+benchmark: Jack Valmadre, Luca Bertinetto, João F. Henriques, Ran Tao, Andrea Vedaldi, Arnold Smeulders, Philip Torr, Efstratios Gavves.
    "Long-term Tracking in the Wild: a Benchmark." ECCV (2018). [paper] [project]

  • TrackingNet: Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem.
    "TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild." ECCV (2018). [paper] [project]

CVPR2018

  • VITAL: Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, Wangmeng Zuo, Chunhua Shen, Rynson Lau, and Ming-Hsuan Yang. "VITAL: VIsual Tracking via Adversarial Learning." CVPR (2018 Spotlight). [project] [paper] [github]

  • LSART: Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang. "Learning Spatial-Aware Regressions for Visual Tracking." CVPR (2018 Spotlight). [paper]

  • SiamRPN: Bo Li, Wei Wu, Zheng Zhu, Junjie Yan. "High Performance Visual Tracking with Siamese Region Proposal Network." CVPR (2018 Spotlight). [paper]

  • TRACA: Jongwon Choi, Hyung Jin Chang, Tobias Fischer, Sangdoo Yun, Kyuewang Lee, Jiyeoup Jeong, Yiannis Demiris, Jin Young Choi. "Context-aware Deep Feature Compression for High-speed Visual Tracking." CVPR (2018). [project] [paper]

  • RASNet: Qiang Wang, Zhu Teng, Junliang Xing, Jin Gao, Weiming Hu, Stephen Maybank. "Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking." CVPR (2018). [paper]

  • SA-Siam: Anfeng He, Chong Luo, Xinmei Tian, Wenjun Zeng. "A Twofold Siamese Network for Real-Time Object Tracking." CVPR (2018). [paper]

  • STRCF: Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, Ming-Hsuan Yang. "Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking." CVPR (2018). [paper] [github]

  • FlowTrack: Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan. "End-to-end Flow Correlation Tracking with Spatial-temporal Attention." CVPR (2018). [paper]

  • DEDT: Kourosh Meshgi, Shigeyuki Oba, Shin Ishii. "Efficient Diverse Ensemble for Discriminative Co-Tracking." CVPR (2018). [paper]

  • SINT++: Xiao Wang, Chenglong Li, Bin Luo, Jin Tang. "SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation." CVPR (2018). [paper]

  • DRT: Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang. "Correlation Tracking via Joint Discrimination and Reliability Learning." CVPR (2018). [paper]

  • MCCT: Ning Wang, Wengang Zhou, Qi Tian, Richang Hong, Meng Wang, Houqiang Li. "Multi-Cue Correlation Filters for Robust Visual Tracking." CVPR (2018). [paper] [github]

  • MKCF: Ming Tang, Bin Yu, Fan Zhang, Jinqiao Wang. "High-speed Tracking with Multi-kernel Correlation Filters." CVPR (2018). [paper]

  • HP: Xingping Dong, Jianbing Shen, Wenguan Wang, Yu Liu, Ling Shao, and Fatih Porikli. "Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning." CVPR (2018). [paper]

NIPS2017

  • HART: Adam R. Kosiorek, Alex Bewley, Ingmar Posner. "Hierarchical Attentive Recurrent Tracking." NIPS (2017). [paper] [github]

ICCV2017

  • CREST: Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang. "CREST: Convolutional Residual Learning for Visual Tracking." ICCV (2017 Spotlight). [paper] [project] [github]

  • EAST: Chen Huang, Simon Lucey, Deva Ramanan. "Learning Policies for Adaptive Tracking with Deep Feature Cascades." ICCV (2017 Spotlight). [paper] [supp]

  • PTAV: Heng Fan and Haibin Ling. "Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking." ICCV (2017). [paper] [supp] [project] [code]

  • BACF: Hamed Kiani Galoogahi, Ashton Fagg, Simon Lucey. "Learning Background-Aware Correlation Filters for Visual Tracking." ICCV (2017). [paper] [supp] [code] [project]

  • TSN: Zhu Teng, Junliang Xing, Qiang Wang, Congyan Lang, Songhe Feng and Yi Jin. "Robust Object Tracking based on Temporal and Spatial Deep Networks." ICCV (2017). [paper]

  • p-tracker: James Supančič III, Deva Ramanan. "Tracking as Online Decision-Making: Learning a Policy From Streaming Videos With Reinforcement Learning." ICCV (2017). [paper] [supp]

  • DSiam: Qing Guo, Wei Feng, Ce Zhou, Rui Huang, Liang Wan, Song Wang. "Learning Dynamic Siamese Network for Visual Object Tracking." ICCV (2017). [paper] [github]

  • SP-KCF: Xin Sun, Ngai-Man Cheung, Hongxun Yao, Yiluan Guo. "Non-Rigid Object Tracking via Deformable Patches Using Shape-Preserved KCF and Level Sets." ICCV (2017). [paper]

  • UCT: Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang. "UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking." ICCV workshop (2017). [paper]

  • Tobias Bottger, Patrick Follmann. "The Benefits of Evaluating Tracker Performance Using Pixel-Wise Segmentations." ICCV workshop (2017). [paper]

  • CFWCR: Zhiqun He, Yingruo Fan, Junfei Zhuang, Yuan Dong, HongLiang Bai. "Correlation Filters With Weighted Convolution Responses." ICCV workshop (2017). [paper] [github]

  • IBCCF: Feng Li, Yingjie Yao, Peihua Li, David Zhang, Wangmeng Zuo, Ming-Hsuan Yang. "Integrating Boundary and Center Correlation Filters for Visual Tracking With Aspect Ratio Variation." ICCV workshop (2017). [paper] [github]

  • RFL: Tianyu Yang, Antoni B. Chan. "Recurrent Filter Learning for Visual Tracking." ICCV workshop (2017). [paper]

CVPR2017

  • ECO: Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg. "ECO: Efficient Convolution Operators for Tracking." CVPR (2017). [paper] [supp] [project] [github]

  • CFNet: Jack Valmadre, Luca Bertinetto, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr. "End-to-end representation learning for Correlation Filter based tracking." CVPR (2017). [paper] [supp] [project] [github]

  • CACF: Matthias Mueller, Neil Smith, Bernard Ghanem. "Context-Aware Correlation Filter Tracking." CVPR (2017 oral). [paper] [supp] [project] [code]

  • RaF: Le Zhang, Jagannadan Varadarajan, Ponnuthurai Nagaratnam Suganthan, Narendra Ahuja and Pierre Moulin. "Robust Visual Tracking Using Oblique Random Forests." CVPR (2017). [paper] [supp] [project] [code]

  • MCPF: Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang. "Multi-Task Correlation Particle Filter for Robust Object Tracking." CVPR (2017). [paper] [project] [code]

  • ACFN: Jongwon Choi, Hyung Jin Chang, Sangdoo Yun, Tobias Fischer, Yiannis Demiris, and Jin Young Choi. "Attentional Correlation Filter Network for Adaptive Visual Tracking." CVPR (2017). [paper] [supp] [project] [test code] [training code]

  • LMCF: Mengmeng Wang, Yong Liu, Zeyi Huang. "Large Margin Object Tracking with Circulant Feature Maps." CVPR (2017). [paper] [zhihu]

  • ADNet: Sangdoo Yun, Jongwon Choi, Youngjoon Yoo, Kimin Yun, Jin Young Choi. "Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning." CVPR (2017 Spotlight). [paper] [supp] [project]

  • CSR-DCF: Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas, Matej Kristan. "Discriminative Correlation Filter with Channel and Spatial Reliability." CVPR (2017). [paper] [supp] [code]

  • BranchOut: Bohyung Han, Jack Sim, Hartwig Adam. "BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks." CVPR (2017). [paper]

  • AMCT: Donghun Yeo, Jeany Son, Bohyung Han, Joonhee Han. "Superpixel-based Tracking-by-Segmentation using Markov Chains." CVPR (2017). [paper]

  • SANet: Heng Fan, Haibin Ling. "SANet: Structure-Aware Network for Visual Tracking." CVPRW (2017). [paper] [project] [code]

ECCV2016

  • SiameseFC: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H.S. Torr. "Fully-Convolutional Siamese Networks for Object Tracking." ECCV workshop (2016). [paper] [project] [github]

  • GOTURN: David Held, Sebastian Thrun, Silvio Savarese. "Learning to Track at 100 FPS with Deep Regression Networks." ECCV (2016). [paper] [project] [github]

  • C-COT: Martin Danelljan, Andreas Robinson, Fahad Khan, Michael Felsberg. "Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking." ECCV (2016). [paper] [project] [github]

  • CF+AT: Adel Bibi, Matthias Mueller, and Bernard Ghanem. "Target Response Adaptation for Correlation Filter Tracking." ECCV (2016). [paper] [project] [github]

  • Yao Sui, Ziming Zhang, Guanghui Wang, Yafei Tang, Li Zhang. "Real-Time Visual Tracking: Promoting the Robustness of Correlation Filter Learning." ECCV (2016). [paper]

  • Yao Sui, Guanghui Wang, Yafei Tang, Li Zhang. "Tracking Completion." ECCV (2016). [paper]

CVPR2016

  • MDNet: Hyeonseob Nam, Bohyung Han. "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking." CVPR (2016). [paper] [VOT_presentation] [project] [github]

  • SINT: Ran Tao, Efstratios Gavves, Arnold W.M. Smeulders. "Siamese Instance Search for Tracking." CVPR (2016). [paper] [project]

  • SCT: Jongwon Choi, Hyung Jin Chang, Jiyeoup Jeong, Yiannis Demiris, and Jin Young Choi. "Visual Tracking Using Attention-Modulated Disintegration and Integration." CVPR (2016). [paper] [project]

  • STCT: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "STCT: Sequentially Training Convolutional Networks for Visual Tracking." CVPR (2016). [paper] [github]

  • SRDCFdecon: Martin Danelljan, Gustav Häger, Fahad Khan, Michael Felsberg. "Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking." CVPR (2016). [paper] [project]

  • HDT: Yuankai Qi, Shengping Zhang, Lei Qin, Hongxun Yao, Qingming Huang, Jongwoo Lim, Ming-Hsuan Yang. "Hedged Deep Tracking." CVPR (2016). [paper] [project]

  • Staple: Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, Philip H.S. Torr. "Staple: Complementary Learners for Real-Time Tracking." CVPR (2016). [paper] [project] [github]

  • EBT: Gao Zhu, Fatih Porikli, and Hongdong Li. "Beyond Local Search: Tracking Objects Everywhere with Instance-Specific Proposals." CVPR (2016). [paper] [exe]

  • DLSSVM: Jifeng Ning, Jimei Yang, Shaojie Jiang, Lei Zhang and Ming-Hsuan Yang. "Object Tracking via Dual Linear Structured SVM and Explicit Feature Map." CVPR (2016). [paper] [code] [project]

NIPS2016

  • Learnet: Luca Bertinetto, João F. Henriques, Jack Valmadre, Philip H. S. Torr, Andrea Vedaldi. "Learning feed-forward one-shot learners." NIPS (2016). [paper]

ICCV2015

  • FCNT: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV (2015). [paper] [project] [github]

  • SRDCF: Martin Danelljan, Gustav Häger, Fahad Khan, Michael Felsberg. "Learning Spatially Regularized Correlation Filters for Visual Tracking." ICCV (2015). [paper] [project]

  • CF2: Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang. "Hierarchical Convolutional Features for Visual Tracking." ICCV (2015). [paper] [project] [github]

  • Naiyan Wang, Jianping Shi, Dit-Yan Yeung and Jiaya Jia. "Understanding and Diagnosing Visual Tracking Systems." ICCV (2015). [paper] [project] [code]

  • DeepSRDCF: Martin Danelljan, Gustav Häger, Fahad Khan, Michael Felsberg. "Convolutional Features for Correlation Filter Based Visual Tracking." ICCV workshop (2015). [paper] [project]

  • RAJSSC: Mengdan Zhang, Junliang Xing, Jin Gao, Xinchu Shi, Qiang Wang, Weiming Hu. "Joint Scale-Spatial Correlation Tracking with Adaptive Rotation Estimation." ICCV workshop (2015). [paper] [poster]

CVPR2015

  • MUSTer: Zhibin Hong, Zhe Chen, Chaohui Wang, Xue Mei, Danil Prokhorov, Dacheng Tao. "MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking." CVPR (2015). [paper] [project]

  • LCT: Chao Ma, Xiaokang Yang, Chongyang Zhang, Ming-Hsuan Yang. "Long-term Correlation Tracking." CVPR (2015). [paper] [project] [github]

  • DAT: Horst Possegger, Thomas Mauthner, and Horst Bischof. "In Defense of Color-based Model-free Tracking." CVPR (2015). [paper] [project] [code]

  • RPT: Yang Li, Jianke Zhu and Steven C.H. Hoi. "Reliable Patch Trackers: Robust Visual Tracking by Exploiting Reliable Patches." CVPR (2015). [paper] [github]

ICML2015

  • CNN-SVM: Seunghoon Hong, Tackgeun You, Suha Kwak and Bohyung Han. "Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network." ICML (2015). [paper] [project]

BMVC2014

  • DSST: Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan and Michael Felsberg. "Accurate Scale Estimation for Robust Visual Tracking." BMVC (2014). [paper] [PAMI] [project]

ECCV2014

  • MEEM: Jianming Zhang, Shugao Ma, and Stan Sclaroff. "MEEM: Robust Tracking via Multiple Experts using Entropy Minimization." ECCV (2014). [paper] [project]

  • TGPR: Jin Gao, Haibin Ling, Weiming Hu, Junliang Xing. "Transfer Learning Based Visual Tracking with Gaussian Process Regression." ECCV (2014). [paper] [project]

  • STC: Kaihua Zhang, Lei Zhang, Ming-Hsuan Yang, David Zhang. "Fast Tracking via Spatio-Temporal Context Learning." ECCV (2014). [paper] [project]

  • SAMF: Yang Li, Jianke Zhu. "A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration." ECCV workshop (2014). [paper] [github]

NIPS2013

  • DLT: Naiyan Wang and Dit-Yan Yeung. "Learning A Deep Compact Image Representation for Visual Tracking." NIPS (2013). [paper] [project] [code]

PAMI & IJCV & TIP

  • AOGTracker: Tianfu Wu, Yang Lu and Song-Chun Zhu. "Online Object Tracking, Learning and Parsing with And-Or Graphs." TPAMI (2017). [paper] [project] [github]

  • MCPF: Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang. "Learning Multi-task Correlation Particle Filters for Visual Tracking." TPAMI (2017). [paper] [project] [code]

  • RSST: Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang. "Robust Structural Sparse Tracking." TPAMI (2017). [paper] [project] [code]

  • fDSST: Martin Danelljan, Gustav Häger, Fahad Khan, Michael Felsberg. "Discriminative Scale Space Tracking." TPAMI (2017). [paper] [project] [code]

  • KCF: João F. Henriques, Rui Caseiro, Pedro Martins, Jorge Batista. "High-Speed Tracking with Kernelized Correlation Filters." TPAMI (2015). [paper] [project]

  • CLRST: Tianzhu Zhang, Si Liu, Narendra Ahuja, Ming-Hsuan Yang, Bernard Ghanem. "Robust Visual Tracking Via Consistent Low-Rank Sparse Learning." IJCV (2015). [paper] [project] [code]

  • DNT: Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang. "Dual Deep Network for Visual Tracking." TIP (2017). [paper]

  • DRT: Junyu Gao, Tianzhu Zhang, Xiaoshan Yang, Changsheng Xu. "Deep Relative Tracking." TIP (2017). [paper]

  • BIT: Bolun Cai, Xiangmin Xu, Xiaofen Xing, Kui Jia, Jie Miao, Dacheng Tao. "BIT: Biologically Inspired Tracker." TIP (2016). [paper] [project] [github]

  • CNT: Kaihua Zhang, Qingshan Liu, Yi Wu, Ming-Hsuan Yang. "Robust Visual Tracking via Convolutional Networks Without Training." TIP (2016). [paper] [code]

ArXiv

  • MLT: Janghoon Choi, Junseok Kwon, Kyoung Mu Lee. "Deep Meta Learning for Real-Time Visual Tracking based on Target-Specific Feature Space." arXiv (2017). [paper]

  • STECF: Yang Li, Jianke Zhu, Wenjie Song, Zhefeng Wang, Hantang Liu, Steven C. H. Hoi. "Robust Estimation of Similarity Transformation for Visual Object Tracking with Correlation Filters." arXiv (2017). [paper]

  • PAWSS: Xiaofei Du, Alessio Dore, Danail Stoyanov. "Patch-based adaptive weighting with segmentation and scale (PAWSS) for visual tracking." arXiv (2017). [paper]

  • SFT: Zhen Cui, Youyi Cai, Wenming Zheng, Jian Yang. "Spectral Filter Tracking." arXiv (2017). [paper]

  • Re3: Daniel Gordon, Ali Farhadi, Dieter Fox. "Re3: Real-Time Recurrent Regression Networks for Object Tracking." arXiv (2017). [paper]

  • DCFNet: Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu. "DCFNet: Discriminant Correlation Filters Network for Visual Tracking." arXiv (2017). [paper] [code]

  • TCNN: Hyeonseob Nam, Mooyeol Baek, Bohyung Han. "Modeling and Propagating CNNs in a Tree Structure for Visual Tracking." arXiv (2016). [paper] [code]

  • RDT: Janghoon Choi, Junseok Kwon, Kyoung Mu Lee. "Visual Tracking by Reinforced Decision Making." arXiv (2017). [paper]

  • MSDAT: Xinyu Wang, Hanxi Li, Yi Li, Fumin Shen, Fatih Porikli. "Robust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation." arXiv (2017). [paper]

  • RLT: Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang. "Deep Reinforcement Learning for Visual Object Tracking in Videos." arXiv (2017). [paper]

  • SCF: Wangmeng Zuo, Xiaohe Wu, Liang Lin, Lei Zhang, Ming-Hsuan Yang. "Learning Support Correlation Filters for Visual Tracking." arXiv (2016). [paper] [project]

  • CRT: Kai Chen, Wenbing Tao. "Convolutional Regression for Visual Tracking." arXiv (2016). [paper]

  • BMR: Kaihua Zhang, Qingshan Liu, and Ming-Hsuan Yang. "Visual Tracking via Boolean Map Representations." arXiv (2016). [paper]

  • YCNN: Kai Chen, Wenbing Tao. "Once for All: a Two-flow Convolutional Neural Network for Visual Tracking." arXiv (2016). [paper]

  • ROLO: Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, Haohong Wang. "Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking." arXiv (2016). [paper] [project] [github]

  • RATM: Samira Ebrahimi Kahou, Vincent Michalski, Roland Memisevic. "RATM: Recurrent Attentive Tracking Model." arXiv (2015). [paper] [github]

  • SO-DLT: Naiyan Wang, Siyi Li, Abhinav Gupta, Dit-Yan Yeung. "Transferring Rich Feature Hierarchies for Robust Visual Tracking." arXiv (2015). [paper] [code]

  • DMSRDCF: Susanna Gladh, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg. "Deep Motion Features for Visual Tracking." ICPR Best Paper (2016). [paper]

Multi-Modal

    • RGB-D
      • 2021
        • DeT: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen. DepthTrack: Unveiling the Power of RGBD Tracking. In ICCV, 2021. [Paper]
        • TSDM: Pengyao Zhao, Quanli Liu, Wei Wang and Qiang Guo. TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator. In ICPR, 2021. [Paper] [Code]
        • 3s-RGBD: Feng Xiao, Qiuxia Wu, Han Huang. Single-scale siamese network based RGB-D object tracking with adaptive bounding boxes. In Neurocomputing, 2021. [Paper]
      • 2020
        • DAL: Yanlin Qian, Alan Lukezic, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas. DAL: A deep depth-aware long-term tracker. In ICPR, 2020. [Code]
        • RF-CFF: Yong Wang, Xian Wei, Hao Shen, Lu Ding, Jiuqing Wan. Robust fusion for RGB-D tracking using CNN features. In Applied Soft Computing Journal, 2020. [Paper]
        • SiamOC: Wenli Zhang, Kun Yang, Yitao Xin, Rui Meng. An Occlusion-Aware RGB-D Visual Object Tracking Method Based on Siamese Network. In ICSP, 2020. [Paper]
        • WCO: Weichun Liu, Xiaoan Tang, Chengling Zhao. Robust RGBD Tracking via Weighted Convolution Operators. In Sensors 20(8), 2020. [Paper]
      • 2019
        • 3DMS: Alexander Gutev, Carl James Debono. Exploiting Depth Information to Increase Object Tracking Robustness. In ICST 2019. [Paper]
        • CA3DMS: Ye Liu, Xiao-Yuan Jing, Jianhui Nie, Hao Gao, Jun Liu, Guo-Ping Jiang. Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos. In TMM 21(3), 2019. [Paper] [Code]
        • Depth-CCF: Guanqun Li, Lei Huang, Peichang Zhang, Qiang Li, YongKai Huo. Depth Information Aided Constrained Correlation Filter for Visual Tracking. In GSKI 2018. [Paper]
        • ECO_TA: Yangliu Kuai, Gongjian Wen, Dongdong Li, Jingjing Xiao. Target-Aware Correlation Filter Tracking in RGBD Videos. In Sensors 19(20), 2019. [Paper]
        • H-FCN: Ming-xin Jiang, Chao Deng, Jing-song Shan, Yuan-yuan Wang, Yin-jie Jia, Xing Sun. Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking. In Information Fusion, 2019. [Paper]
        • OTR: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas. Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters. In CVPR 2019. [Paper] [Code]
        • RGBD-OD: Yujun Xie, Yao Lu, Shuang Gu. RGB-D Object Tracking with Occlusion Detection. In CIS 2019. [Paper]
      • 2018
        • CSR-rgbd: Uğur Kart, Joni-Kristian Kämäräinen, Jiří Matas. How to Make an RGBD Tracker? In ECCV Workshop 2018. [Paper] [Code]
        • DM-DCF: Uğur Kart, Joni-Kristian Kämäräinen, Jiří Matas, Lixin Fan, Francesco Cricri. Depth Masked Discriminative Correlation Filter. In ICPR 2018. [Paper]
        • OACPF: Yayu Zhai, Ping Song, Zonglei Mou, Xiaoxiao Chen, Xiongjun Liu. Occlusion-Aware Correlation Particle Filter Target Tracking Based on RGBD Data. In Access (6), 2018. [Paper]
        • RT-KCF: Han Zhang, Meng Cai, Jianxun Li. A Real-time RGB-D tracker based on KCF. In CCDC 2018. [Paper]
        • SEOH: Jiaxu Leng, Ying Liu. Real-Time RGB-D Visual Tracking With Scale Estimation and Occlusion Handling. In Access (6), 2018. [Paper]
        • STC: Jingjing Xiao, Rustam Stolkin, Yuqing Gao, Aleš Leonardis. Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints. In TC 48(8) 2018. [Paper] [Code]
      • 2017
        • ODIOT: Wei-Long Zheng, Shan-Chun Shen, Bao-Liang Lu. Online Depth Image-Based Object Tracking with Sparse Representation and Object Detection. In Neural Process Letters, 2017. [Paper]
        • ROTSL: Zi-ang Ma, Zhi-yu Xiang. Robust Object Tracking with RGBD-based Sparse Learning. In ITEE (18), 2017. [Paper]
      • 2016
        • DLS: Ning An, Xiao-Guang Zhao, Zeng-Guang Hou. Online RGB-D Tracking via Detection-Learning-Segmentation. In ICPR 2016. [Paper]
        • DS-KCF_shape: Sion Hannuna, Massimo Camplani, Jake Hall, Majid Mirmehdi, Dima Damen, Tilo Burghardt, Adeline Paiement, Lili Tao. DS-KCF: A Real-time Tracker for RGB-D Data. In RTIP (16), 2016. [Paper] [Code]
        • 3D-T: Adel Bibi, Tianzhu Zhang, Bernard Ghanem. 3D Part-Based Sparse Tracker with Automatic Synchronization and Registration. In CVPR 2016. [Paper] [Code]
        • OAPF: Kourosh Meshgi, Shin-ichi Maeda, Shigeyuki Oba, Henrik Skibbe, Yu-zhe Li, Shin Ishii. Occlusion Aware Particle Filter Tracker to Handle Complex and Persistent Occlusions. In CVIU (150), 2016. [Paper]
      • 2015
        • CDG: Huizhang Shi, Changxin Gao, Nong Sang. Using Consistency of Depth Gradient to Improve Visual Tracking in RGB-D sequences. In CAC, 2015. [Paper]
        • DS-KCF: Massimo Camplani, Sion Hannuna, Majid Mirmehdi, Dima Damen, Adeline Paiement, Lili Tao, Tilo Burghardt. Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling. In BMVC, 2015. [Paper] [Code]
        • DOHR: Ping Ding, Yan Song. Robust Object Tracking Using Color and Depth Images with a Depth Based Occlusion Handling and Recovery. In FSKD, 2015. [Paper]
        • ISOD: Yan Chen, Yingju Shen, Xin Liu, Bineng Zhong. 3D Object Tracking via Image Sets and Depth-Based Occlusion Detection. In SP (112), 2015. [Paper]
        • OL3DC: Bineng Zhong, Yingju Shen, Yan Chen, Weibo Xie, Zhen Cui, Hongbo Zhang, Duansheng Chen, Tian Wang, Xin Liu, Shujuan Peng, Jin Gou, Jixiang Du, Jing Wang, Wenming Zheng. Online Learning 3D Context for Robust Visual Tracking. In Neurocomputing (151), 2015. [Paper]
      • 2014
        • MCBT: Qi Wang, Jianwu Fang, Yuan Yuan. Multi-Cue Based Tracking. In Neurocomputing (131), 2014. [Paper]
      • 2013
        • PT: Shuran Song, Jianxiong Xiao. Tracking Revisited using RGBD Camera: Unified Benchmark and Baselines. In ICCV, 2013. [Paper] [Project]
      • 2012
        • AMCT: Germán Martín García, Dominik Alexander Klein, Jörg Stückler, Simone Frintrop, Armin B. Cremers. Adaptive Multi-cue 3D Tracking of Arbitrary Objects. In JDOS, 2012. [Paper]
    • RGB-NL
      • 2021
        • TNL2K: Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu. Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark. In CVPR 2021. [Paper] [Project]
        • SNLT: Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff. Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers. In CVPR 2021. [Paper] [Code]
      • 2020
        • GTI: Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jingsong Su, Jiebo Luo. Grounding-Tracking-Integration. In TCSVT 2020. [Paper]
      • 2018
        • DAT: Xiao Wang, Chenglong Li, Rui Yang, Tianzhu Zhang, Jin Tang, Bin Luo. Describe and attend to track: Learning natural language guided structural representation and visual attention for object tracking. In arXiv 2018. [Paper]
      • 2017
        • TNLS: Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, Arnold W. M. Smeulders. Tracking by Natural Language Specification. In CVPR 2017. [Paper]
    • RGB-Event
      • Survey
        • Guillermo Gallego, Tobi Delbrück, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, et al. Event-based Vision: A Survey. In TPAMI 2020. [Paper]
      • 2022
        • VisEvent: Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, Feng Wu. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. In AAAI 2022. [Paper] [Project]
      • 2021
        • JEFE: Jiqing Zhang, Xin Yang, Yingkai Fu, Xiaopeng Wei, Baocai Yin, Bo Dong. Object Tracking by Jointly Exploiting Frame and Event Domain. In ICCV 2021. [Paper]
