In the rapidly evolving field of computer vision, the ability to accurately predict and segment future frames in video sequences presents both significant challenges and opportunities for advancement in applications ranging from autonomous driving to video surveillance. This repo contains the code required to replicate the work of Team 14 in the DS-GA 1008 Final Competition, where our objective was to leverage deep learning models to generate the semantic segmentation mask of the last frame of a video sequence from its first 11 frames.
First, add the provided `dataset` folder to this directory.
Then, create and activate the `FutureGAN` conda environment:
$ conda env create -f FutureGAN.yml
$ source activate FutureGAN
To train the model on the "train" dataset:
$ python train.py --data_root='<path_to_train_folder>' --nframes_in=11 --nframes_pred=11
To train the model on the "unlabeled" dataset:
$ python train.py --data_root='<path_to_unlabeled_folder>' --nframes_in=11 --nframes_pred=11
When training on HPC, we ran into an issue where the frame images were not sorted in chronological order within each video (this was not an issue on Lightning Studio). If you hit this, run the following script to rename the images so that they sort chronologically:
$ python rename.py --dir='<path_to_training_videos>'
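The core idea behind the renaming is to zero-pad the frame index so that lexicographic order matches chronological order (otherwise, e.g., `image_2.png` sorts after `image_10.png`). Below is a minimal sketch of that approach; the `image_<i>.png` naming pattern is an assumption, and the sketch is not necessarily identical to `rename.py`:

```python
# Minimal sketch: zero-pad frame indices so lexicographic order matches
# chronological order. The "image_<i>.png" pattern is an assumption.
import argparse
import os
import re

parser = argparse.ArgumentParser()
parser.add_argument('--dir', required=True, help='path to the training videos')
args = parser.parse_args()

for video in os.listdir(args.dir):
    video_dir = os.path.join(args.dir, video)
    if not os.path.isdir(video_dir):
        continue
    for fname in os.listdir(video_dir):
        match = re.match(r'image_(\d+)\.png$', fname)
        if match:
            # e.g. image_2.png -> image_02.png, so it no longer sorts after image_10.png
            padded = f'image_{int(match.group(1)):02d}.png'
            os.rename(os.path.join(video_dir, fname),
                      os.path.join(video_dir, padded))
```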
To evaluate the model on the "val" dataset:
$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_val_folder>' \
--test_dir='./validation_result' --nframes_pred=11 --nframes_in=11 --resl=128 \
--metrics='mse' --metrics='psnr'
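For reference, the reported PSNR can be derived directly from the per-frame MSE. The standalone sketch below shows that relationship; it is illustrative only, not the code path `eval.py` uses:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    # PSNR in dB between two uint8 frames; higher is better.
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```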
The paths to the trained models are listed below (a short snippet for inspecting these checkpoints follows the list):
- 128x128, trained on the "labeled" dataset: `logs/final/ckpts/gen_E121_I40020_R128x128_final.pth.tar`
- 32x32, trained on the "unlabeled" dataset: `logs/final/ckpts/gen_E80_I119057_R32x32_stab.pth.tar`
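These are standard PyTorch checkpoint files, so a quick way to inspect one before passing it to `eval.py` is the following sketch (the checkpoint's internal keys depend on FutureGAN's saving code):

```python
import torch

# Load onto CPU so the checkpoint can be inspected without a GPU.
ckpt = torch.load('logs/final/ckpts/gen_E121_I40020_R128x128_final.pth.tar',
                  map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(ckpt.keys())  # see what state the checkpoint stores
```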
To duplicate frames in the "hidden" folders:
$ python dup.py
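The hidden videos contain only the 11 input frames, but `eval.py` expects `nframes_in + nframes_pred` frames per video; `dup.py` duplicates the existing frames to fill the remaining slots as placeholders (metrics computed against these copies are meaningless for the hidden set). Below is a minimal sketch of that idea; the `image_<i>.png` layout is an assumption, and the sketch is not necessarily identical to `dup.py`:

```python
# Sketch: copy the 11 existing input frames into slots 11-21 so eval.py finds
# a full set of frames per video. The "image_<i>.png" layout is an assumption.
import os
import shutil

HIDDEN_ROOT = '<path_to_hidden_folder>'  # adjust to your layout

for video in sorted(os.listdir(HIDDEN_ROOT)):
    video_dir = os.path.join(HIDDEN_ROOT, video)
    if not os.path.isdir(video_dir):
        continue
    for i in range(11):
        src = os.path.join(video_dir, f'image_{i}.png')
        dst = os.path.join(video_dir, f'image_{i + 11}.png')
        if os.path.exists(src) and not os.path.exists(dst):
            shutil.copyfile(src, dst)  # placeholder "ground truth" frame
```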
To evaluate the model on the "hidden" dataset:
$ python eval.py --model_path='<path_to_generator_ckpt>' --data_root='<path_to_hidden_folder>' \
--test_dir='./hidden_result' --nframes_pred=11 --nframes_in=11
To resize the 22nd predicted frames and save them as input for the U-Net:
$ python resize.py --source_dir='<path_to_prediction_folder>' --destination_dir='<path_to_save_22nd_frames>'
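Conceptually, this step extracts the final (22nd) predicted frame of each video and resizes it to the resolution the U-Net expects. Below is a minimal sketch using Pillow; the 240x160 target size, the directory layout, and the output naming are assumptions, not necessarily what `resize.py` does:

```python
# Sketch of the resizing step. TARGET_SIZE and the directory layout are assumptions.
import os
from PIL import Image

SOURCE_DIR = '<path_to_prediction_folder>'
DEST_DIR = '<path_to_save_22nd_frames>'
TARGET_SIZE = (240, 160)  # (width, height) the U-Net expects -- an assumption

os.makedirs(DEST_DIR, exist_ok=True)
for video in sorted(os.listdir(SOURCE_DIR)):
    video_dir = os.path.join(SOURCE_DIR, video)
    if not os.path.isdir(video_dir):
        continue
    frames = sorted(os.listdir(video_dir))
    # assume the last file in sorted order is the 22nd (final) predicted frame
    last = Image.open(os.path.join(video_dir, frames[-1]))
    last.resize(TARGET_SIZE, Image.BILINEAR).save(os.path.join(DEST_DIR, f'{video}.png'))
```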
The `train.ipynb` notebook contains all the code required to build and train the U-Net segmentation model. Trained model checkpoints will appear in the `models` folder. The checkpoint used for submission is saved as `submission_model.pt`, while all other checkpoints appear as `model_{epoch}.pt`.
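The per-epoch checkpointing in the notebook follows the standard PyTorch pattern; the sketch below illustrates it with a stand-in model (the real notebook trains a U-Net, and the 49-class output here is an assumption):

```python
# Illustrative checkpointing pattern only; the model is a stand-in, not the U-Net.
import os
import torch
import torch.nn as nn

model = nn.Conv2d(3, 49, kernel_size=1)  # stand-in; 49 mask classes is an assumption
os.makedirs('models', exist_ok=True)

num_epochs = 3
for epoch in range(num_epochs):
    # ... one epoch of training would run here ...
    torch.save(model.state_dict(), f'models/model_{epoch}.pt')

# the checkpoint chosen for submission is re-saved under the submission name
torch.save(model.state_dict(), 'models/submission_model.pt')
```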
Hidden Mask Prediction
The `predict_hidden.ipynb` notebook processes the output images generated by the FutureGAN model and applies the U-Net model to predict masks for these images. It serves as the final step in evaluating the model's performance on hidden or unlabeled data; a condensed sketch of the pipeline appears after the steps below.
- Set Paths: Ensure that the paths to the FutureGAN output and the U-Net model weights are correctly set in the notebook. These paths must be updated to reflect the locations where the generated images and the model weights are stored on your system.
- Run Notebook: Run the notebook cells sequentially to generate predictions for the hidden masks. The model will process each image and apply the trained U-Net model to predict the corresponding mask.
- Save Output: The output will be saved in a `.pt` file containing all the predicted masks for the hidden dataset. The path for saving this file can also be adjusted as needed within the notebook.
- Visualize Predictions: The notebook also visualizes some of the predicted masks at the end, allowing a quick qualitative assessment of the model's performance on unseen data.
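The condensed sketch below shows the shape of the pipeline the notebook implements. The `UNet` import, the paths, the 49-class count, and the output file name are all illustrative assumptions, not the notebook's exact code:

```python
# Condensed sketch of the hidden-mask pipeline; names and paths are assumptions.
import os
import torch
from PIL import Image
from torchvision import transforms

from unet import UNet  # hypothetical import; use the repo's actual model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = UNet(in_channels=3, num_classes=49).to(device)  # class count is an assumption
model.load_state_dict(torch.load('<path_to_unet_weights>', map_location=device))
model.eval()

to_tensor = transforms.ToTensor()
image_dir = '<path_to_futuregan_output>'
masks = []
with torch.no_grad():
    for fname in sorted(os.listdir(image_dir)):
        img = to_tensor(Image.open(os.path.join(image_dir, fname)))
        logits = model(img.unsqueeze(0).to(device))   # (1, num_classes, H, W)
        masks.append(logits.argmax(dim=1).squeeze(0).cpu())

torch.save(torch.stack(masks), 'hidden_masks.pt')  # one (H, W) mask per video
```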
Before running the notebook, ensure that the environment has been set up using the `.yml` file in the `FutureGAN` folder.