This is a PyTorch implementation of MIMO Is All You Need, a Transformer-based architecture for video prediction, as described in the following paper:
MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction, by Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han and Shuguang Cui.
pip install -r requirment.txt
We conduct experiments on four video datasets: MNIST (passwd:lnnj), Human3.6M, Weather, and KITTI (passwd:bfar).
For datasets distributed in video format, we extract frames from the original video clips.
Use the train.py script to train the model. To train the default model on the Moving MNIST dataset, download the MNIST dataset, set the data directory via --root, and then run:
python train.py
To train on your own dataset, just change the dataloader.
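A minimal sketch of what a replacement dataloader might look like, assuming the training loop consumes (input_frames, target_frames) pairs shaped (T, C, H, W); the class name, frame counts, and tensor layout below are illustrative assumptions, not the repository's actual API:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyVideoDataset(Dataset):
    """Toy dataset yielding (input, target) frame sequences."""

    def __init__(self, num_clips=8, in_len=10, out_len=30, size=64):
        # Random tensors stand in for frames loaded from disk.
        self.clips = torch.rand(num_clips, in_len + out_len, 1, size, size)
        self.in_len = in_len

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        clip = self.clips[idx]
        # First in_len frames are the input, the rest are the target.
        return clip[: self.in_len], clip[self.in_len :]

loader = DataLoader(MyVideoDataset(), batch_size=4, shuffle=True)
inputs, targets = next(iter(loader))
print(inputs.shape, targets.shape)
# torch.Size([4, 10, 1, 64, 64]) torch.Size([4, 30, 1, 64, 64])
```

Swapping in a class like this (reading real frames instead of random tensors) is typically all that is needed, as long as the returned shapes match what train.py expects.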
The checkpoints will be saved in --save_dir, and the generated frames will be saved in the --gen_frm_dir folder.
The pretrained model for MNIST is Here (passwd:chpo).
The comparison between MIMO-VP and two other methods.
30 frames are predicted given the last 10 frames.
To be released.
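The multi-in-multi-out setup above (10 observed frames in, 30 future frames out in a single forward pass, rather than an autoregressive frame-by-frame loop) can be sketched as follows; the toy model is a placeholder for illustration, not the paper's Transformer:

```python
import torch
import torch.nn as nn

class ToyMIMO(nn.Module):
    """Placeholder model: maps 10 input frames to 30 output frames at once."""

    def __init__(self, in_len=10, out_len=30):
        super().__init__()
        # Treat the time dimension as channels and project 10 -> 30
        # with a 1x1x1 convolution, so all outputs come out together.
        self.proj = nn.Conv3d(in_len, out_len, kernel_size=1)

    def forward(self, x):      # x: (B, 10, C, H, W)
        return self.proj(x)    # one pass -> (B, 30, C, H, W)

model = ToyMIMO()
context = torch.rand(2, 10, 1, 64, 64)  # 10 observed frames
future = model(context)                 # 30 predicted frames in one pass
print(future.shape)  # torch.Size([2, 30, 1, 64, 64])
```

Producing all future frames in one pass avoids the error accumulation of autoregressive rollouts, which is the core argument of the MIMO baseline.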