Multiple Planar Object Tracking

Zhicheng Zhang, Shengzhe Liu, Jufeng Yang


Key motivation: Tracking both the location and pose of multiple planar objects (MPOT) is of great significance to numerous real-world applications, spanning industry, education, geometry, art, and daily life.

This repository contains the official implementation of our ICCV 2023 work. The MPOT-3K dataset and the PyTorch code for our tracking-by-reasoning framework, PRTrack, are released. More details can be found in our paper.

Publication

Multiple Planar Object Tracking
Zhicheng Zhang, Shengzhe Liu, Jufeng Yang
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[PDF] [Video] [Project Page] [Github] [MPOT-3K Dataset] [Demo]


ABSTRACT

Tracking both location and pose of multiple planar objects (MPOT) is of great significance to numerous real-world applications. The greater degree-of-freedom of planar objects compared with common objects makes MPOT far more challenging than well-studied object tracking, especially when occlusion occurs. To address this challenging task, we are inspired by amodal perception that humans jointly track visible and invisible parts of the target, and propose a tracking framework that unifies appearance perception and occlusion reasoning. Specifically, we present a dual-branch network to track the visible part of planar objects, including vertexes and mask. Then, we develop an occlusion area localization strategy to infer the invisible part, i.e., the occluded region, followed by a two-stream attention network finally refining the prediction. To alleviate the lack of data in this field, we build the first large-scale benchmark dataset, namely MPOT-3K. It consists of 3,717 planar objects from 356 videos and contains 148,896 frames together with 687,417 annotations. The collected planar objects have 9 motion patterns and the videos are shot in 6 types of indoor and outdoor scenes.

DEPENDENCY

Recommended Environment

  • CUDA 11.1
  • Python 3.7
  • PyTorch 1.8.1
  • numpy 1.19.5
  • apex 0.1

You can prepare your environment by running the commands below.

Automatic Install

We provide a frozen conda environment (env.yaml) that can be installed directly.

conda env create -f ./env.yaml

If it doesn't work, try installing the environment manually.

Manual Install

  1. Create a virtual environment:
conda create -n prtrack python=3.6.13
  2. Install PyTorch and the other dependencies:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
  3. Install apex; please refer to their GitHub page here for more details:
conda install -c conda-forge cudatoolkit-dev
git clone https://github.com/NVIDIA/apex
cd apex
git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
python setup.py install --cuda_ext --cpp_ext
  4. Install DALI:
pip install nvidia-pyindex
pip install nvidia-dali

We also provide our environment as a reference at env.
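
After installation, you can verify that the core dependencies are importable and CUDA is visible with a quick check like the one below (a minimal sketch; it assumes only the standard torch, apex, and nvidia.dali import paths).

import torch
import apex                  # NVIDIA apex, built above
import nvidia.dali as dali   # NVIDIA DALI

print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('dali', dali.__version__)
print('apex imported OK')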

MPOT-3K DATASET

If you need the MPOT-3K dataset for academic purposes, please download the application form, fill out the request information, and send it to gloryzzc6@sina.com. We will process your application as soon as possible. Please make sure that the email you use comes from your educational institution.

Data Source

The collected scenes include lib, gallery, house, streetview, buildings, and village. Moreover, the videos are shot under nine motion patterns that involve camera motion and target movement, as follows:

Id  Motion Pattern
1   Far-near Movement
2   In-plane Rotation
3   Out-plane Rotation
4   In-plane Movement
5   Motion Blur
6   Camera Occlusion
7   Unconstrained
8   Moving Objects
9   Moving Occlusions

Data Format

MPOT-3k
├── list #splits
│   ├── test.txt
│   ├── train.txt
│   └── val.txt
├── train 
│   ├── buildings1-1
│   ├── ...
│   └── village7-9
├── test
│   ├── buildings5-1
│   ├── ...
│   └── village2-9
└── val 
    ├── buildings4-1
    ├── ...
    └── gallery2-9
        ├── gobjs #planar objects
        ├── gt #annotations
        │   ├── gt_init.txt #instances (initial frame)
        │   ├── gt_obj_init.txt #objects (initial frame)
        │   ├── gt_obj.txt #ground truth
        │   └── objects.txt
        ├── seq1 #images
        └── seqinfo.ini #video information
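
As an illustration of this layout, the snippet below lists one split and reads a sequence's seqinfo.ini (a minimal sketch: the local path is hypothetical, and the ini section/key names, which typically follow the MOT convention, are an assumption).

import configparser
from pathlib import Path

ROOT = Path('/path/to/MPOT-3k')  # hypothetical local path

# Each split is enumerated in list/<split>.txt
train_seqs = (ROOT / 'list' / 'train.txt').read_text().split()
print(len(train_seqs), 'training sequences, e.g.', train_seqs[:3])

# Per-video metadata lives in seqinfo.ini (assumed to follow the MOT style)
info = configparser.ConfigParser()
info.read(ROOT / 'train' / train_seqs[0] / 'seqinfo.ini')
print(dict(info['Sequence']))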

Annotations

In gt_obj.txt, the annotation in each line includes:

  1:    frame id
  2:    instance id
  3-10: 4 points (x, y for each corner)
  11:   class id
  12:   object id

A simple example is listed below.

Frame Instance point1_x point1_y point2_x point2_y point3_x point3_y point4_x point4_y Class Object
1     1        90.0     197.3    196.0    191.3    217.3    426.0    112.0    441.3    6     1
2     1        87.6     197.4    193.6    191.4    214.9    426.1    109.6    441.4    6     1
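
A minimal line parser is sketched below; it assumes the whitespace-separated field order documented above, and parse_gt_line is an illustrative helper rather than part of the released code.

def parse_gt_line(line):
    # Fields: frame id, instance id, 4 (x, y) corner points, class id, object id
    vals = line.split()
    frame_id, instance_id = int(vals[0]), int(vals[1])
    points = [(float(vals[i]), float(vals[i + 1])) for i in range(2, 10, 2)]
    class_id, object_id = int(vals[10]), int(vals[11])
    return frame_id, instance_id, points, class_id, object_id

# Example: the first annotation row from the table above
print(parse_gt_line('1 1 90.0 197.3 196.0 191.3 217.3 426.0 112.0 441.3 6 1'))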



SCRIPTS of PRTrack

Preparation

Dataset: After obtaining MPOT-3K, run the commands below to merge the split zip archives and extract the data files.

zip -s 0 mpot-3k.zip --out mpot_single.zip
unzip -d ./ mpot_single.zip

Adjust the local data directory in data.py by replacing ROOT with the parent directory of MPOT-3K. For example, adjust the following line:

ROOT = '/mnt/sda/zzc/data/track'

Pre-trained Model: Download the pretrained model from Google Drive (checkpoint) and place it under './ckpt/'.

Train

You can easily train the model by running the script below.

Further options such as epoch, milestone, learning_rate, etc. can be adjusted; please refer to config_train.yaml.

python train.py --cfg ./configs/config_train.yaml
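
For orientation, the tunable fields in config_train.yaml might look like the sketch below; the key names and values here are assumptions for illustration, so consult the shipped config_train.yaml for the authoritative schema.

# Hypothetical excerpt of config_train.yaml; exact keys may differ
epoch: 20               # total training epochs
milestone: [10, 15]     # epochs at which the learning rate decays
learning_rate: 0.0001   # initial learning rate
resume: ''              # path to an intermediate checkpoint (see Resume below)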

Resume

To resume training, set the path of the intermediate checkpoint in config_train.yaml:

resume: './ckpt/prtrack_r50/recurrent_epoch_10.pth.tar'

Test

The provided script applies the trained model to track planar objects and produces the corresponding prediction results.

python test.py --cfg ./configs/config_test.yaml

Evaluation

You can evaluate the model by running the commands below. The trained model can be downloaded via Baidu Netdisk (baidu) and Google Drive (google). More details can be viewed in eval.

cd evaluation/MPOT
python evalMPOT.py

Demo

We built an online demo on Gradio here.

REFERENCE

Our code references the repos below.

CITATION

If you find this repo useful in your project or research, please consider citing the relevant publication.

@inproceedings{zhang2023multiple,
  title={Multiple Planar Object Tracking},
  author={Zhang, Zhicheng and Liu, Shengzhe and Yang, Jufeng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}
