PyTorch Implementation of paper:
LocVTP: Video-Text Pre-training for Temporal Localization (ECCV 2022)
Meng Cao, Tianyu Yang, Junwu Weng, Can Zhang, Jue Wang and Yuexian Zou*.
- We released the uncleaned code to meet ECCV requirements, i.e. the code should be presented before the camera ready DDL. The cleaned up code will be provided asap.
- Download HowTo100M and put them in the 'datasets/howto' directory.
# 64 V100 GPUs required
master_addr=127.0.0.3
master_port=29502
model=howto
config_file=configs/$model\.yaml
output_dir=outputs/$model
python -m torch.distributed.launch \
--nproc_per_node=$gpun \
--master_addr $master_addr \
--master_port $master_port \
train_net.py \
--skip-test \
--config-file $config_file \
OUTPUT_DIR $output_dir \
SOLVER.BATCH_SIZE 128 \
TEST.BATCH_SIZE 4 \
SOLVER.TEST_PERIOD -1 \
DATALOADER.NUM_WORKERS 4 \
INPUT.NUM_SEGMENTS 5 \
MODEL.CLIP.LOSS.SHIFTWEIGHT 1 \
MODEL.CLIP.LOSS.FGWEIGHT 10 \
MODEL.CLIP.LOSS.FGPOSWEIGHT 1000 \
SOLVER.LR 0.001
# find all configs in configs/
model=howto
config_file=configs/$model.yaml
# select weight to evaluate
weight_file=outputs/$model/model_10e.pth
# set your gpu id
gpus=0,1,2,3,4,5,6,7
# number of gpus
gpun=8
# batch size
batch_size=16
master_addr=127.0.0.1
master_port=29501
# 1. Create feature directory
if [ ! -d "outputs/$model/vidfeats" ]; then
mkdir outputs/$model/vidfeats
fi
# 2. This command generate hdf5 file for each single process
python -m torch.distributed.launch \
--nproc_per_node=$gpun --master_addr $master_addr --master_port $master_port \
featExtract.py --config-file $config_file --ckpt $weight_file \
--feat_dir outputs/$model/vidfeats --feat_name $model"_c3d_features.hdf5" TEST.BATCH_SIZE $batch_size
# 3. Then conduct concatHDF5.py to geneate the final file.
Python concatHDF5.py
- [ ] More dataset support
- [ ] More backbone support
- [x] C3D
- [ ] S3D
- [ ] ViT + TimesFormer
- [ ] Downstream task support
Please [★star] this repo and [cite] the following paper if you feel our work useful to your research:
@inproceedings{cao2022locvtp,
title = {LocVTP: Video-Text Pre-training for Temporal Localization},
author = {Cao, Meng and Yang, Tianyu and Weng, Junwu and Zhang, Can and Wang, Jue and Zou, Yuexian},
booktitle = {European Conference on Computer Vision},
year = {2022}
}
@article{cao2022locvtp,
title={LocVTP: Video-Text Pre-training for Temporal Localization},
author={Cao, Meng and Yang, Tianyu and Weng, Junwu and Zhang, Can and Wang, Jue and Zou, Yuexian},
journal={arXiv preprint arXiv:2207.10362},
year={2022}
}