-
Notifications
You must be signed in to change notification settings - Fork 5
iftekharnaim/GenerativeVideoToTextAlignment
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This directory contains the source codes and datasets for unsupervised alignment of natural language instructions with corresponding video segments. The mathematical models are explained in our AAAI-2014 paper[1]. Data: ===== The directory "dataset_aaai14" contains the datasets used for our AAAI-14 paper. There are 3 directories: protocols, video_blobs, and vision_3d_coord. Directory “protocols” - contains the raw text and parses for 3 wetlab protocols: CELL, LLGM, and YPAD. Directory “video_blobs” - contains the annotations of the set of blobs touched by hands in the videos. There are total 6 videos (2 per protocol). The files that contains “vision” in their name are generated by automated tracking. The files that do not have “vision” in their name (and upper case) are the manual tracking data via Anvil. Directory “vision_3d_coord” - X, Y, Z coordinates for the tracked objects. Code: ===== This implementation includes the source codes for 4 different alignment models as described in [1]. The instructions for running each of the methods are explained below: Run HMM1 model (monotonic) ========================== On Anvil data: > python hmm_multivideos.py False True On Vision data: > python hmm_multivideos.py False False Run HMM2 (non-monotonic) ======================== On Anvil data: > python hmm_multivideos.py True True On Vision data: > python hmm_multivideos.py True False Run LHMM1 (monotonic, unobserved) ================================ On Anvil data: > python latent_hmm_multivideos.py False True On Vision data: > python latent_hmm_multivideos.py False False Run LHMM2 (non-monotonic, unobserved) ================================ On Anvil data: > python latent_hmm_multivideos.py True True On Vision data: > python latent_hmm_multivideos.py True False NOTE: ====== We extended the generative alignment model using discriminative Latent CRF model in [2]. Please note that the code in this repository does NOT include the source codes for the discriminative models. We’ll soon release that source codes for the discriminative models separately. References: ========== [1] Unsupervised Alignment of Natural Language Instructions with Video Segments , Iftekhar Naim, Young Chol Song, Qiguang Liu, Henry Kautz, Jiebo Luo, Daniel Gildea, in Proceedings of AAAI 2014. [2] Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments , Iftekhar Naim, Young C. Song, Qiguang Liu, Liang Huang, Henry Kautz, Jiebo Luo and Daniel Gildea, in Proceedings of NAACL, 2015.
About
A hierarchical generative model for aligning video segments with corresponding text descriptions without manual supervision.
Resources
Stars
Watchers
Forks
Releases
No releases published