Skip to content

Ravindu-Yasas-Nagasinghe/KEPP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KEPP: Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos-CVPR 2024

paper

This repository gives the official implementation of KEPP:Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos (CVPR 2024)

In our project, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. Existing works have attained partial success by extensively leveraging various sources of information available in the datasets, such as heavy intermediate visual observations, procedural names, or natural language step-by-step instructions, for features or supervision signals. However, the task remains formidable due to the implicit causal constraints in the sequencing of steps and the variability inherent in multiple feasible plans. To tackle these intricacies that previous efforts have overlooked, we propose to enhance the agent's capabilities by infusing it with procedural knowledge. This knowledge, sourced from training procedure plans and structured as a directed weighted graph, equips the agent to better navigate the complexities of step sequencing and its potential variations. We coin our approach KEPP, a novel Knowledge-Enhanced Procedure Planning system, which harnesses a probabilistic procedural knowledge graph extracted from training data, effectively acting as a comprehensive textbook for the training domain. Experimental evaluations across three widely-used datasets under settings of varying complexity reveal that KEPP attains superior, state-of-the-art results while requiring only minimal supervision. The main architecture of our model is as follows.

kepp (2)_page-0001

Contents

  1. Setup
  2. Data Preparation
  3. Train Step model
  4. Generate paths from procedure knowlege graph
  5. Inference

Setup

In a conda env with cuda available, run:

pip install -r requirements.txt

Data Preparation

CrossTask

  1. Download datasets&features
cd {root}/dataset/crosstask
bash download.sh
  1. move your datasplit files and action one-hot coding file to {root}/dataset/crosstask/crosstask_release/
mv *.json crosstask_release
mv actions_one_hot.npy crosstask_release

COIN

  1. Download datasets&features
cd {root}/dataset/coin
bash download.sh

NIV

  1. Download datasets&features
cd {root}/dataset/NIV
bash download.sh

Train Step model

  1. First generate the training and testing dataset json files. You can modify the dataset, train steps, horizon(prediction length), json files savepath etc. in args.py. Set the --json_path_train, and --json_path_val in args.py as the dataset json file paths.
cd {root}/step
python loading_data.py 

Dimensions for different datasets are listed below:

Dataset observation_dim action_dim class_dim
CrossTask 1536(how) 9600(base) 105 18
COIN 1536 778 180
NIV 1536 48 5
  1. Train the step model
python main_distributed.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

The trained models will be saved in {root}/step/save_max.

  1. Generate first and last action predictions for train and test dataset.
  • Modify the checkpoint path(L329) as the evaluated model(in save_max) in inference.py.
  • Modify the --json_path_val , --steps_path , and --step_model_output arguments in args.py to generate step predicted dataset json file paths for train and test datasets seperately. Run following command for train and test datasets seperately by modifying as afore mentioned.
python inference.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate > output.txt

Generate paths from procedure knowlege graph

  1. Train the graph for the relavent dataset (Not compulsory)
cd {root}/PKG
python graph_creation.py
Select mode "train_out_n" 

Trained graphs for CrossTask, COIN, NIV datasets are available on cd {root}/PKG/graphs. Change (L13) graph_save_path of graph_creation.py to load procedure knowledge graphs trained on different datasets.

  1. Obtain PKG conditions for train and test datasets.
  • Modify line 540 of graph_creation.py as the output of step model (--step_model_output).
  • Modify line 568 of graph_creation.py to set the output path for the generated procedure knowlwdge graph conditioned train and test dataset json files.
  • run the following for both train and test dataset files generated from the step model by modifying graph_creation.py file as afore mentioned.
python graph_creation.py
Select mode "validate"

Train plan model

  1. Modify the json_path_train and json_path_val arguments of args.py in plan model as the outputs generated from procedure knowlwdge graph for train and test data respectively.

Modify the parameter --num_seq_PKG in args.py to match the generated amount of PKG conditions. (Modify --num_seq_LLM to the same number as well if LLM conditions are not used seperately.)

cd {root}/plan
python main_distributed.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

Inference

For Metrics ​Modify the max checkpoint path(L339) as the evaluated model in inference.py and run:

python inference.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate > output.txt

Results of given checkpoints:

dataset SR mAcc MIoU
Crosstask_T=4 21.02 56.08 64.15
COIN_T=4 15.63 39.53 53.27
NIV_T=4 22.71 41.59 91.49

Here we present the qualitative examples of our proposed method. Intermediate steps are padded in the step model because it only predicts the start and end actions.

Checkpoint links will be uploaded soon

Citation

@misc{nagasinghe2024use,
      title={Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos}, 
      author={Kumaranage Ravindu Yasas Nagasinghe and Honglu Zhou and Malitha Gunawardhana and Martin Renqiang Min and Daniel Harari and Muhammad Haris Khan},
      year={2024},
      eprint={2403.02782},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

In case of any query, create issue or contact ravindunagasinghe1998@gmail.com

Acknowledgement

This codebase is built on PDPP

About

[CVPR 2024] KEPP: Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published