Online Model Distillation for Efficient Video Inference
This public repository is a work in progress. Currently supported features include the JITNet architecture, pretraining on the COCO dataset, and model timing.
Please install the following (our versions are shown in parentheses):
- Python 3, recommended through Anaconda (Python 3.6.5, Anaconda 4.5.11)
- CUDA and cuDNN (CUDA 9.2, cuDNN 7.3, NVIDIA-396 driver)
- TensorFlow (TensorFlow 1.10.1; we recommend building from source for higher performance)
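As a quick sanity check of the prerequisites above, the following minimal sketch reports the running Python version and whether a module can be found on the import path. The function name `check_environment` is hypothetical and not part of this repository. The sketch only checks that the module is importable; it does not verify the CUDA, cuDNN, or driver versions.

```python
import importlib.util
import sys


def check_environment(module_name="tensorflow"):
    """Report the Python version and whether `module_name` is importable.

    `check_environment` is an illustrative helper, not a script shipped
    with this repository. Pass a top-level module name only.
    """
    return {
        # Full (major, minor, micro) version of the running interpreter.
        "python_version": tuple(sys.version_info[:3]),
        # The README asks for Python 3.
        "python3": sys.version_info[0] >= 3,
        # find_spec returns None when a top-level module cannot be located.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```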
Clone this repository, then initialize the submodules with
git submodule update --init --recursive
Pretraining on the COCO Dataset
To pretrain JITNet on the COCO dataset, first download and set up the COCO-Stuff dataset from https://github.com/nightrome/cocostuff (TODO: detailed instructions).
TODO: detailed instructions on pretraining using the script.
Timing the JITNet model
We include a timing script to measure JITNet inference time under different architecture setups and hardware/software configurations. Run it with
python utils/time_models.py
The script times the default JITNet architecture setup; to time a different setup, edit the arguments at the end of the script.
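For readers who want to time their own code in a comparable way, here is a minimal sketch of the usual warm-up-then-average timing pattern. It is not the contents of utils/time_models.py; the helper `time_callable` and the `dummy_inference` stand-in are hypothetical. When timing a GPU model, the callable should block until the results are available, otherwise the measurement only covers the launch.

```python
import time


def time_callable(fn, warmup=5, iterations=50):
    """Return the mean wall-clock seconds per call of `fn`.

    `warmup` calls are made first and excluded from the measurement,
    so one-time costs (allocation, caching) do not skew the average.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs one inference step.
    def dummy_inference():
        sum(i * i for i in range(10000))

    print("mean seconds per call:", time_callable(dummy_inference))
```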
The Long Video Streams (LVS) Dataset
The Long Video Streams Dataset can be found at https://olimar.stanford.edu/hdd/lvsdataset/. Each folder contains one stream, consisting of multiple video chunks and the corresponding Mask R-CNN predictions. You can download an individual stream or the whole dataset using wget. For instance, the following command downloads one stream into lvsdataset/ (omit the <stream>/ component of the URL to download the whole dataset):
wget -e robots=off -r -nH --cut-dirs 1 --no-check-certificate -np 'https://olimar.stanford.edu/hdd/lvsdataset/<stream>/'
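If you prefer to script the download, the sketch below builds the same wget invocation from Python. The helper names `build_wget_command` and `download` are hypothetical, and the stream name in the usage comment is a placeholder; use an actual folder name from the dataset page. It assumes wget is installed and on the PATH.

```python
import subprocess

# Dataset root taken from the README.
BASE_URL = "https://olimar.stanford.edu/hdd/lvsdataset/"


def build_wget_command(stream=None):
    """Build the wget argument list for one stream, or the whole dataset.

    With stream=None the URL is the dataset root, which mirrors
    "the same command without the stream".
    """
    url = BASE_URL if stream is None else "{}{}/".format(BASE_URL, stream)
    return [
        "wget", "-e", "robots=off", "-r", "-nH",
        "--cut-dirs", "1", "--no-check-certificate", "-np", url,
    ]


def download(stream=None):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_command(stream), check=True)


# Usage (placeholder stream name, not a real folder):
#   download("example_stream")
```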
Ravi Teja Mullapudi (email@example.com)
Steven Chen (firstname.lastname@example.org)