BERT and Frame-Level Models for Large-scale Video Segment Classification with Test-time Augmentation

This repo contains the code for our 9th-place solution to The 3rd YouTube-8M Video Understanding Challenge on Kaggle.

Our best final submission model is a weighted ensemble of 9 models, which reaches a MAP of 0.7871 on the private leaderboard. The weights are tuned by Bayesian Optimization. For the code of the Bayesian Optimization and the ensemble method, please refer to the Ensemble.ipynb file. The details of each model can be found below:

| Model | Weight (%) | Public MAP | Private MAP |
| --- | --- | --- | --- |
| GatedNetVLAD(H1024C16) TTA[-1, 1] | 0.22 | 0.7629 | 0.7556 |
| BERT(L2H12) TTA[-1, 1] | 2.81 | 0.7729 | 0.7680 |
| BERTCrossMean(L2H8) TTA[-1, 1] | 4.90 | 0.7748 | 0.7680 |
| BERT(L3H12) TTA[-1, 1] | 13.88 | 0.7751 | 0.7688 |
| BERTCrossAttn(L2H8) TTA[-1, 1] | 5.97 | 0.7758 | 0.7692 |
| BERTFirst(L2H12) TTA[-2, 1] | 19.67 | 0.7792 | 0.7707 |
| MixNeXtVLAD(Iter 300) TTA[-1, 1] | 20.03 | 0.7809 | 0.7711 |
| MixNeXtVLAD(Iter 60) TTA[0, 2] | 17.26 | 0.7796 | 0.7721 |
| BERTMean(L2H12) TTA[-2, 1] | 15.46 | 0.7802 | 0.7725 |
| Ensemble of all above | 100.00 | 0.7944 | 0.7871 |
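
For intuition, the snippet below sketches what the weighted ensemble and the weight tuning look like. It is not the exact code from Ensemble.ipynb: the bayes_opt package, the dummy out-of-fold score arrays, and the MAP stand-in are assumptions for illustration only.

import numpy as np
from sklearn.metrics import average_precision_score
from bayes_opt import BayesianOptimization  # assumed package: pip install bayesian-optimization

# Dummy stand-ins for the real data: in the notebook, oof_scores would hold one
# (num_segments, num_classes) array of out-of-fold predictions per model, and
# labels the matching binary ground truth from the 300 holdout validation files.
rng = np.random.RandomState(0)
labels = rng.randint(0, 2, size=(500, 100))
oof_scores = {f"model_{i}": rng.rand(500, 100) for i in range(9)}

def blend(weights):
    # Weighted average of the per-model scores, weights normalized to sum to 1.
    total = sum(weights.values())
    return sum((w / total) * oof_scores[name] for name, w in weights.items())

def evaluate_map(preds):
    # Rough stand-in for the challenge's segment MAP metric.
    return average_precision_score(labels, preds, average="macro")

def objective(**weights):
    return evaluate_map(blend(weights))

optimizer = BayesianOptimization(
    f=objective,
    pbounds={name: (0.0, 1.0) for name in oof_scores},  # one weight per model
    random_state=42,
)
optimizer.maximize(init_points=10, n_iter=50)
print(optimizer.max)  # best local MAP and the corresponding (unnormalized) weights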

The code is based on the Google YouTube-8M starter code. Please refer to their README for the full list of dependencies. In general, you need tensorflow 1.4, numpy 1.14, and pandas 0.24 installed. The model is trained and evaluated on the YouTube-8M dataset.

Hardware Requirements: The model has been run on a single NVIDIA RTX 2080 Ti (11 GB) GPU locally, and on a standard_p100 machine (NVIDIA Tesla P100, 16 GB) on Google Cloud AI Platform.

Train a Single Model

The following are examples of training a single model locally or on Google Cloud AI Platform. The model shown below is BERTMean(L2H12) TTA[-2, 1], which reached a MAP of 0.7725 on the final private leaderboard. For the training details of the other models, please refer to the shell_script directory.

There are four main steps to get the final prediction of a single model: Pre-training, Fine-tuning, Evaluation, and Inference.

We first pre-train the model on the video-level labels, using the full training partition plus almost all of the validation partition (about 6M videos in total, excluding 300 holdout files), for 200k steps with batch size 128 and an initial learning rate of 1e-4. The learning rate is exponentially decayed by a factor of 0.9 every 2M examples. After pre-training, we fine-tune the model on the segment-level labels of the validation partition, again excluding the same 300 holdout files, which are kept for local validation. We fine-tune for another 20k steps and pick the checkpoint with the highest MAP on the local validation data. Finally, we evaluate the picked checkpoint and run inference with it to produce the predictions.
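
For concreteness, the learning rate schedule above works out roughly as sketched below (a hypothetical illustration, assuming the staircase exponential decay used by the starter code):

def learning_rate(step, base_lr=1e-4, batch_size=128, decay=0.9, decay_examples=2_000_000):
    # Staircase exponential decay: drop the rate by a factor of `decay` each
    # time another `decay_examples` training examples have been seen.
    examples_seen = step * batch_size
    return base_lr * decay ** (examples_seen // decay_examples)

print(learning_rate(200_000))  # rate at the end of pre-training, roughly 2.8e-5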

Google Cloud AI Platform

We recommend using Google Cloud AI Platform to run the code due to the large size of the dataset. On AI Platform, you can access the dataset directly from the public Google Cloud bucket. You can initialize the parameters as follows:

# Since this example is for BERTMean(L2H12) TTA[-2, 1], we choose MODEL=Bert, bert_hidden=2, bert_head=12, pool=mean
BUCKET_NAME=YOUR_BUCKET_NAME

MODEL=Bert
bert_hidden=2
bert_head=12
pool='mean'
ckpt=200000

train_data_pattern='gs://youtube8m-ml-us-east1/2/frame/train/train*.tfrecord'
valid_data_pattern='gs://youtube8m-ml-us-east1/3/frame/validate/validate*.tfrecord'
train_valid_data_pattern='gs://youtube8m-ml-us-east1/2/frame/validate/validate*.tfrecord'
test_data_pattern='gs://youtube8m-ml/3/frame/test/test*.tfrecord'

TRAIN_DIR="${MODEL}_bhidden${bert_hidden}_bhead${bert_head}_${pool}"

1 Pre-Training

JOB_NAME=yt8m_train_$(date +%Y%m%d_%H%M%S); 
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.train \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --train_data_pattern=${train_data_pattern} --train_all --train_valid_data_pattern=${train_valid_data_pattern} --google_cloud=True \
--model=${MODEL} \
--bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --pooling_strategy=${pool} --sample_random_frames=False \
--frame_features --feature_names="rgb,audio" --feature_sizes='1024,128' \
--batch_size=128 --base_learning_rate=0.0001 --moe_l2=1e-6 --iterations=5 --learning_rate_decay=0.9 \
--export_model_steps=1000 --num_epochs=10000000 --max_steps=200000 \
--bert_config_file=$BUCKET_NAME/bert_config.json \
--train_dir=$BUCKET_NAME/${TRAIN_DIR}
# Append --start_new_model to the command above to ignore any existing checkpoint in TRAIN_DIR and train from scratch.

2.a Fine-tuning

JOB_NAME=yt8m_train_$(date +%Y%m%d_%H%M%S); 
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.train \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --train_data_pattern=${valid_data_pattern} \
--train_dir=$BUCKET_NAME/${TRAIN_DIR} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels=True \
--pooling_strategy=${pool} --sample_random_frames=False \
--frame_features --feature_names="rgb,audio" --feature_sizes='1024,128' \
--batch_size=128 --base_learning_rate=0.0001 --moe_l2=1e-6 --iterations=5 --learning_rate_decay=0.9 \
--export_model_steps=200 --num_epochs=10000000 --max_steps=220000

2.b Evaluation during Fine-tuning

ckpt=200000
JOB_TO_EVAL=${TRAIN_DIR}
JOB_NAME=yt8m_eval_$(date +%Y%m%d_%H%M%S); 
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.eval \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --eval_data_pattern='gs://youtube8m-ml-us-east1/3/frame/validate/validate*.tfrecord' --train_dir=$BUCKET_NAME/${JOB_TO_EVAL} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels \
--pooling_strategy=${pool} --sample_random_frames=False \
--norun_once --top_k=1000 --batch_size=512 --moe_l2=1e-6 --iterations=5 --checkpoint_file=${ckpt}

3 Evaluation for Inference

ckpt=YOUR_BEST_CKPT
JOB_TO_EVAL=${TRAIN_DIR}
JOB_NAME=yt8m_eval_$(date +%Y%m%d_%H%M%S); 
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.eval \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --eval_data_pattern='gs://youtube8m-ml-us-east1/3/frame/validate/validate*.tfrecord' --train_dir=$BUCKET_NAME/${JOB_TO_EVAL} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels \
--pooling_strategy=${pool} --sample_random_frames=False \
--run_once=True --top_k=1000 --batch_size=512 --moe_l2=1e-6 --iterations=5 --checkpoint_file=${ckpt}

4.a Inference for OOF (Out-of-Fold) Predictions

# Since we use TTA, inference has to be run several times with different shifts. For example, for a TTA[-2, 1] model you need to run inference 4 times: once for --shift=-2, once for --shift=-1, once for --shift=0 (the default, i.e. no --shift flag), and once for --shift=1.
JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-2 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift-2.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift-1.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift1.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof.csv
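
The per-shift prediction files then have to be combined into a single TTA prediction, typically by averaging the scores. Below is a hypothetical sketch of that step, assuming the per-shift files have been copied locally and can be parsed into (segment_id, class_id, score) columns; the exact CSV layout written by src/inference.py may differ, so adapt the parsing to the real format.

import pandas as pd

# Per-shift OOF prediction files produced by the four inference jobs above.
shift_files = [
    "predictions_finetune_oof_shift-2.csv",
    "predictions_finetune_oof_shift-1.csv",
    "predictions_finetune_oof.csv",        # the run without --shift, i.e. shift = 0
    "predictions_finetune_oof_shift1.csv",
]

# Assumed column names; adjust to the actual file format.
frames = [pd.read_csv(path, names=["segment_id", "class_id", "score"]) for path in shift_files]

tta = (
    pd.concat(frames)
    .groupby(["segment_id", "class_id"], as_index=False)["score"]
    .mean()                                # simple average over the four shifts
)
tta.to_csv("predictions_finetune_oof_tta.csv", index=False)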

4.b Inference for Test

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-2 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift-2.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift-1.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift1.csv

JOB_TO_EVAL=${TRAIN_DIR} \
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); \
gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=src --module-name=src.inference \
--staging-bucket=$BUCKET_NAME --region=us-central1 \
--config=src/cloudml-gpu.yaml \
-- --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune.csv

Locally

In order to train the model locally, you need to download all of the required files from the YouTube-8M dataset. You can then initialize the parameters as follows:

# Since this example is for BERTMean(L2H12) TTA[-2, 1], we choose MODEL=Bert, bert_hidden=2, bert_head=12, pool=mean
# For local runs, BUCKET_NAME is simply the local directory that holds bert_config.json, segment_label_ids.csv, and the training output.
BUCKET_NAME=YOUR_BUCKET_NAME
MODEL=Bert
bert_hidden=2
bert_head=12
pool=mean

# Specify your data paths. We use both the train and validate partitions for pre-training, so you also need to set train_valid_data_pattern, which is the validation data used during pre-training.
train_data_pattern='./inputs/data/frame/2/train/train*.tfrecord'
train_valid_data_pattern='./inputs/data/frame/2/validate/validate*.tfrecord'
valid_data_pattern='./inputs/data/frame/3/validate_data/validate*.tfrecord'
test_data_pattern='./inputs/data/frame/3/test/test*.tfrecord'

TRAIN_DIR=${MODEL}_${bert_hidden}_${bert_head}_${pool}
# The evaluation and inference commands below refer to ${JOB_TO_EVAL}.
JOB_TO_EVAL=${TRAIN_DIR}

1 Pre-Training

python src/train.py --train_data_pattern=${train_data_pattern} --train_all --train_valid_data_pattern=${train_valid_data_pattern} --google_cloud=True \
--model=${MODEL} \
--bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --pooling_strategy=${pool} --sample_random_frames=False \
--frame_features --feature_names="rgb,audio" --feature_sizes='1024,128' \
--batch_size=128 --base_learning_rate=0.0001 --moe_l2=1e-6 --iterations=5 --learning_rate_decay=0.9 \
--export_model_steps=1000 --num_epochs=10000000 --max_steps=200000 \
--bert_config_file=$BUCKET_NAME/bert_config.json \
--train_dir=$BUCKET_NAME/${TRAIN_DIR}

2.a Fine-tuning

python src/train.py --train_data_pattern=${valid_data_pattern} \
--train_dir=$BUCKET_NAME/${TRAIN_DIR} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels=True \
--pooling_strategy=${pool} --sample_random_frames=False \
--frame_features --feature_names="rgb,audio" --feature_sizes='1024,128' \
--batch_size=128 --base_learning_rate=0.0001 --moe_l2=1e-6 --iterations=5 --learning_rate_decay=0.9 \
--export_model_steps=200 --num_epochs=10000000 --max_steps=220000

2.b Evaluation during Fine-tuning

ckpt=200000
python src/eval.py --eval_data_pattern=${valid_data_pattern} --train_dir=$BUCKET_NAME/${JOB_TO_EVAL} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels \
--pooling_strategy=${pool} --sample_random_frames=False \
--norun_once --top_k=1000 --batch_size=512 --moe_l2=1e-6 --iterations=5 --checkpoint_file=${ckpt}

3 Evaluation for Inference

ckpt=YOUR_BEST_CKPT
python src/eval.py --eval_data_pattern=${valid_data_pattern} --train_dir=$BUCKET_NAME/${JOB_TO_EVAL} --google_cloud=True \
--model=${MODEL} --bert_hidden_layer=${bert_hidden} --bert_attention_heads=${bert_head} --bert_config_file=$BUCKET_NAME/bert_config.json --segment_labels \
--pooling_strategy=${pool} --sample_random_frames=False \
--run_once=True --top_k=1000 --batch_size=512 --moe_l2=1e-6 --iterations=5 --checkpoint_file=${ckpt}

4.a Inference for OOF (Out-of-Fold) Predictions

# Since we use TTA, inference has to be run several times with different shifts. For example, for a TTA[-2, 1] model you need to run inference 4 times: once for --shift=-2, once for --shift=-1, once for --shift=0 (the default, i.e. no --shift flag), and once for --shift=1.
python src/inference.py --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-2 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift-2.csv

python src/inference.py --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift-1.csv

python src/inference.py --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof_shift1.csv

python src/inference.py --input_data_pattern=${valid_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_oof.csv

4.b Inference for Test

python src/inference.py --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-2 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift-2.csv

python src/inference.py --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=-1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift-1.csv

python src/inference.py --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 --shift=1 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune_shift1.csv

python src/inference.py --input_data_pattern=${test_data_pattern} \
--google_cloud=True \
--segment_labels \
--segment_label_ids_file=$BUCKET_NAME/segment_label_ids.csv \
--top_k 1000 --num_readers 10 --batch_size=512 --segment_max_pred=100000 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions_finetune.csv
