This is a PyTorch implementation of Video Text Tracking With a Spatio-Temporal Complementary Model.
Part of the code is inherited from DB and SiamMask.
- Release code
- Document for Installation
- Document for training and testing
- Python 3.6
- PyTorch >= 1.2
- GCC 5.5
- CUDA 9.2
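The requirements above can be checked with a short script before going further. This is a minimal sketch and not part of the repository: the helper names `version_tuple` and `check_environment` are hypothetical, and treating the listed Python and PyTorch versions as minimums is an assumption. PyTorch is imported lazily so the script also runs before PyTorch is installed.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.5.0+cu92' into a tuple like (1, 5, 0)."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError("no numeric version found in {!r}".format(version))
    return tuple(int(part) for part in numeric.group(0).split("."))


def check_environment():
    """Return human-readable warnings; an empty list means nothing looked wrong."""
    warnings = []
    # Assumption: the listed Python 3.6 is treated as a minimum version.
    if sys.version_info[:2] < (3, 6):
        warnings.append("Python 3.6 or newer is expected")
    try:
        import torch  # imported lazily: it may not be installed yet
    except ImportError:
        warnings.append("PyTorch is not installed")
        return warnings
    if version_tuple(torch.__version__) < (1, 2):
        warnings.append(
            "PyTorch >= 1.2 is expected, found {}".format(torch.__version__)
        )
    # torch.version.cuda is None for CPU-only builds.
    if torch.version.cuda is None:
        warnings.append("this PyTorch build has no CUDA support")
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```

The script does not check GCC, which only matters when building the deformable convolution operator.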
conda create --name scm python=3.6
conda activate scm
# install PyTorch with cuda-9.2
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=9.2 -c pytorch
# clone repo
git clone https://github.com/lsabrinax/VideoTextSCM
cd VideoTextSCM/
# python dependencies
pip install -r requirement.txt
# build deformable convolution operator
cd assets/ops/dcn/
python setup.py build_ext --inplace
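After the build step, it can be useful to confirm that a compiled extension was actually produced. `build_ext --inplace` places the compiled shared libraries next to the sources, so a recursive search of the build directory is enough. The sketch below is an illustration only: `find_built_extensions` is a hypothetical helper, and the exact file names of the compiled modules are not assumed.

```python
from pathlib import Path


def find_built_extensions(dcn_dir="assets/ops/dcn"):
    """List compiled extension files (.so on Linux, .pyd on Windows) under dcn_dir."""
    root = Path(dcn_dir)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix in {".so", ".pyd"}
    )


if __name__ == "__main__":
    # Run from the repository root (VideoTextSCM/).
    built = find_built_extensions()
    if built:
        for path in built:
            print("found:", path)
    else:
        print("no compiled extension found; the build may have failed")
```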
The root of the dataset directory can be VideoTextSCM/datasets/.
Download the converted ground truth and data list from Baidu Drive (download code: 0e8b) or Google Drive. The images of each dataset can be obtained from its official website.
Run the command below to get the tracking results, then submit the results to the official website to get the performance.
CUDA_VISIBLE_DEVICES=0 python demo_textboxPP.py --input-root path-to-test-dataset --output-root path-to-save-result --sub-res --dataset icdar --weight-path path-to-embedding-model --scm-config path-to-scm-config --scm-weight-path path-to-scm-model
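When the demo has to be run for several test sets or checkpoints, it can be convenient to assemble the same command from Python. The following is a minimal sketch, assuming it is run from the repository root: `build_demo_command` and `run_demo` are hypothetical helpers, not part of the repository, and they only pass through the flags shown in the command above.

```python
import os
import subprocess
import sys


def build_demo_command(input_root, output_root, weight_path,
                       scm_config, scm_weight_path, dataset="icdar"):
    """Assemble the argument list for demo_textboxPP.py from the flags above."""
    return [
        sys.executable, "demo_textboxPP.py",
        "--input-root", input_root,
        "--output-root", output_root,
        "--sub-res",
        "--dataset", dataset,
        "--weight-path", weight_path,
        "--scm-config", scm_config,
        "--scm-weight-path", scm_weight_path,
    ]


def run_demo(command, gpu="0"):
    """Run the command with CUDA_VISIBLE_DEVICES set; raises if the demo fails."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    subprocess.run(command, env=env, check=True)


if __name__ == "__main__":
    # All paths below are placeholders, as in the command above.
    command = build_demo_command(
        input_root="path-to-test-dataset",
        output_root="path-to-save-result",
        weight_path="path-to-embedding-model",
        scm_config="path-to-scm-config",
        scm_weight_path="path-to-scm-model",
    )
    print(" ".join(command))
    # run_demo(command)  # uncomment once the placeholder paths are replaced
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.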
# download the pre-trained model
cd VideoTextSCM/scm/experiments/siammask_sharp
wget http://www.robots.ox.ac.uk/~qwang/SiamMask_VOT.pth
# train the model
cd VideoTextSCM
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_scm.py --save-dir path-to-save-scm-model --pretrained \
./scm/experiments/siammask_sharp/SiamMask_VOT.pth --config ./scm/experiments/siammask_sharp/config_icdar.json \
--batch 256 --epochs 20
Download totaltext_resnet50 from Baidu Drive (download code: p6u3) or Google Drive.
cd db_model && mkdir weights # put totaltext_resnet50 in db_model/weights
# train embedding
cd VideoTextSCM
CUDA_VISIBLE_DEVICES=0 python train_embedding.py --exp_name model-name --batch_size 3 --num_workers 8 --lr 0.0005
Please cite the related work in your publications if it helps your research:
@article{gao2021video,
title={Video Text Tracking With a Spatio-Temporal Complementary Model},
author={Gao, Yuzhe and Li, Xing and Zhang, Jiajian and Zhou, Yu and Jin, Dian and Wang, Jing and Zhu, Shenggao and Bai, Xiang},
journal={IEEE Transactions on Image Processing},
volume={30},
pages={9321--9331},
year={2021},
publisher={IEEE}
}