# Demo of ActionCommonsense

**Purpose**: Run our Action Comet model on a custom input image. Specifically,\
**Input**: An image (currently limited to actions/objects typically found in a kitchen)\
**Output**: Text describing\
(i) the **actions** happening in the image and the objects that participate in them (e.g. boiling egg)\
(ii) the **pre-conditions** for the action happening in the image (e.g. water is a pre-condition for boiling)\
(iii) the most likely **before and after** (effects) scenarios (e.g. contents of egg will become solid)\
-----------\
You need to switch between two Python notebooks for this demo:\
(i) **Detectron_demo notebook**: Takes a user-specified image, detects objects, and extracts features from it\
(ii) **ActionComet_demo notebook**: Takes the image features and generates inferences

# Step 1: Detectron_demo notebook
Note: A GPU is required to run this notebook.

Simply run cells 1-9 to set up the proper environment and define useful functions.

Then upload any image of your choice (currently limited to actions/objects typically found in a kitchen) to the Google Drive directory action-comet/action_images/. Then set the 'filename' variable in cell 10 to the name of your uploaded image file.

Run cell 10 to obtain the following for your uploaded image: (i) the detected objects in the image, (ii) the extracted image features, and (iii) the .json and .pkl files containing image metadata, which are required by the inference module.
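
Before running cell 10, it can help to confirm that the image actually landed in the upload directory. The following is a minimal sketch, not part of the notebook: the helper names `uploaded_image_path` and `list_metadata_files` are hypothetical, the Drive root `/content/drive/MyDrive` is assumed to be the usual Colab mount point, and the directory where cell 10 writes its .json and .pkl files is not stated here, so it is left as a parameter.

```python
from pathlib import Path

# Assumed Colab mount point for Google Drive (an assumption, not stated in Step 1).
DRIVE_ROOT = Path("/content/drive/MyDrive")
# Upload directory named in Step 1.
IMAGE_DIR = Path("action-comet/action_images")


def uploaded_image_path(filename, drive_root=DRIVE_ROOT):
    """Return the full path of an uploaded image, or raise if it is missing."""
    path = Path(drive_root) / IMAGE_DIR / filename
    if not path.is_file():
        raise FileNotFoundError(f"Upload {filename!r} to {path.parent} first")
    return path


def list_metadata_files(output_dir):
    """List the .json and .pkl files found directly inside output_dir."""
    output_dir = Path(output_dir)
    return sorted(p.name for p in output_dir.iterdir()
                  if p.suffix in {".json", ".pkl"})
```

Calling `uploaded_image_path("my_image.jpg")` before cell 10 fails early with a readable message instead of an error deep inside the feature extractor; `my_image.jpg` is a placeholder for your own file name.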

# Step 2: ActionCommonsense_demo notebook
Note: A GPU is not required to run this notebook, but it is highly recommended.

Simply run cells 1-4 to set up the proper environment.

Run cell 5 to obtain the inferences.

In [None]:
# cell 1
# mount Google Drive and move to the project directory action-commonsense
from google.colab import drive
import os
drive.mount('/content/drive')
os.chdir('/content/drive/MyDrive/action-commonsense/')

In [None]:
# cell 2
# install Python 3.6 (this version is required to run the scripts below)

# after a while this script prompts for input under the following header:
# Press <enter> to keep the current choice[*], or type selection number:
# Enter the selection number that points to python3.6
!sudo apt-get update -y
!sudo apt-get install python3.6
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
!sudo update-alternatives --config python3
!python --version
!sudo apt-get install python3.6-distutils
!wget https://bootstrap.pypa.io/get-pip.py
# the wget above is only needed the first time you run the installation
!python get-pip.py
!sudo apt install python3-pip
!python -m pip install --upgrade pip

Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [419 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Hit:6 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [108 kB]
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:9 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [799 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,082 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [842 kB]
Get:13 http://security.ubu

In [None]:
# cell 3
# check if python3.6 is correctly installed
!python3.6 -V

/bin/bash: line 1: python3.6: command not found
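
The `command not found` message above means that no `python3.6` executable is on the PATH, so cells 4 and 5 would fail the same way. A minimal sketch of a guard that could be run first is shown here; `interpreter_available` is a hypothetical helper, not part of the demo scripts, and it relies only on the standard-library `shutil.which`.

```python
import shutil


def interpreter_available(name="python3.6"):
    """Return True if an executable called `name` can be found on the PATH."""
    return shutil.which(name) is not None


if not interpreter_available():
    print("python3.6 not found: re-run cell 2 and check its apt-get output")
```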


In [None]:
# cell 4
# install necessary packages/libraries
!python3.6 -m pip install -r requirements.txt

/bin/bash: line 1: python3.6: command not found


Inference using checkpoint

== Ex: Peel a banana

intent ~ high-level goal (for doing an action)\
before ~ precondition\
after ~ effect
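
These three relation names can be kept next to their meaning when reading the generated text. The sketch below is illustrative only: `RELATION_GLOSS` and `describe` are hypothetical names, and the sample strings are taken from the examples at the top of this demo rather than from model output.

```python
# Glosses for the three inference types, as listed above.
RELATION_GLOSS = {
    "intent": "high-level goal (for doing an action)",
    "before": "precondition",
    "after": "effect",
}


def describe(relation, text):
    """Format one inference as '<relation> (<gloss>): <text>'."""
    return f"{relation} ({RELATION_GLOSS[relation]}): {text}"


print(describe("before", "water is a pre-condition for boiling"))
print(describe("after", "contents of egg will become solid"))
```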


In [None]:
import os
import torch

# load the checkpoint's state dict
ckpt_dir = '/content/drive/MyDrive/action-commonsense/experiments/image-inference-80000-ckpt'
d = torch.load(os.path.join(ckpt_dir, 'pytorch_model.bin'))
w = d['transformer.detector.regularizing_predictor.weight']
print(w)
print(w.size())
b = d['transformer.detector.regularizing_predictor.bias']
cuda0 = torch.device('cuda:0')
# zero bias entry for the row that is appended below
x = torch.tensor([0.], device=cuda0)
# copy the last weight row; clone().detach() avoids the torch.tensor(tensor) UserWarning
w1 = w[-1].clone().detach().to(cuda0)
print(w1.size())
w1 = w1.view(1, 2048)
print(w1)
# append the copied row to the weight and the zero to the bias
upw = torch.cat((w, w1), 0)
upb = torch.cat((b, x), 0)
d['transformer.detector.regularizing_predictor.weight'] = upw
d['transformer.detector.regularizing_predictor.bias'] = upb

tensor([[ 0.0045, -0.0208, -0.0156,  ..., -0.0238, -0.0116,  0.0357],
        [ 0.0083,  0.0123, -0.0151,  ...,  0.0099, -0.0083, -0.0354],
        [ 0.0196,  0.0299, -0.0498,  ...,  0.0026,  0.0021,  0.0211],
        ...,
        [-0.0207, -0.0084, -0.0213,  ...,  0.0153, -0.0061, -0.0061],
        [-0.0043, -0.0021, -0.0004,  ...,  0.0230,  0.0675,  0.0035],
        [-0.0008, -0.0196, -0.0121,  ...,  0.0265, -0.0174, -0.0033]],
       device='cuda:0')
torch.Size([91, 2048])
torch.Size([2048])
tensor([[-0.0008, -0.0196, -0.0121,  ...,  0.0265, -0.0174, -0.0033]],
       device='cuda:0')




In [None]:
# cell 5
# script to run predictions from the checkpoint on the custom split
!rm /content/drive/MyDrive/action-commonsense/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/action-commonsense/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/action-commonsense/experiments/image-inference-80000-ckpt --split custom

/bin/bash: line 1: python3.6: command not found


In [None]:
!python3.6 ./scripts/evaluate_generation.py --gens_file /content/drive/MyDrive/action-commonsense/experiments/image-inference-80000-ckpt/val_sample_1_num_5_top_k_0_top_p_0.9.json --refs_file /content/drive/MyDrive/action-commonsense/visualcomet_data/val_annots.json

0it [00:00, ?it/s]6it [00:00, 33644.15it/s]
PTBTokenizer tokenized 464 tokens at 1041.86 tokens per second.
PTBTokenizer tokenized 170 tokens at 642.45 tokens per second.
{'testlen': 141, 'reflen': 137, 'guess': [141, 111, 81, 54], 'correct': [29, 7, 3, 1]}
ratio: 1.0291970802844586
Bleu_1 0.20567375886378955
Bleu_2 0.11388773957510763
Bleu_3 0.07831832563023393
Bleu_4 0.054613386280059045
METEOR 0.08404953397155623
CIDEr 0.14590907228953712
Saving to: /content/drive/MyDrive/action-commonsense/experiments/image-inference-80000-ckpt/val_sample_1_num_5_top_k_0_top_p_0.9.evaluate.json
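
The BLEU numbers above can be reproduced from the printed counts. As a sketch, assuming the standard BLEU definition (geometric mean of the n-gram precisions `correct[k] / guess[k]`, times a brevity penalty that equals 1 when `testlen >= reflen`), the code below recomputes them. The function name `bleu_from_counts` is hypothetical, and the evaluation script's own values differ in the last digits, presumably because of small smoothing constants.

```python
import math


def bleu_from_counts(stats, max_n=4):
    """Recompute BLEU-1..max_n from n-gram match counts.

    Assumes every entry of stats["correct"] is greater than zero.
    """
    # Brevity penalty: only penalises output shorter than the references.
    ratio = stats["testlen"] / stats["reflen"]
    penalty = 1.0 if ratio >= 1 else math.exp(1 - 1 / ratio)
    scores, log_sum = [], 0.0
    for n in range(max_n):
        # Log of the modified precision for (n+1)-grams.
        log_sum += math.log(stats["correct"][n] / stats["guess"][n])
        # Geometric mean of the first n+1 precisions.
        scores.append(penalty * math.exp(log_sum / (n + 1)))
    return scores


stats = {"testlen": 141, "reflen": 137,
         "guess": [141, 111, 81, 54], "correct": [29, 7, 3, 1]}
for n, score in enumerate(bleu_from_counts(stats), start=1):
    print(f"Bleu_{n} {score:.4f}")
```

Here `Bleu_1` is simply 29 / 141, and each higher-order score multiplies in one more precision before taking the root, which is why the scores drop quickly once only 1 of 54 four-grams matches.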


== Ex: Cut a cake

In [None]:
# cell 5
# script to run predictions from the checkpoint on the val split
!rm /content/drive/MyDrive/visual-comet/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/visual-comet/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt --split val

06/12/2023 22:38:40 - INFO - transformers.configuration_utils -   loading configuration file /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt/config.json
06/12/2023 22:38:40 - INFO - transformers.configuration_utils -   Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2VisionAttentiveLMHead"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_event": 39,
  "max_inference": 23,
  "max_length": 20,
  "max_place": 22,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_

== Ex: Wash carrots

In [None]:
# cell 5
# script to run predictions from the checkpoint on the val split
!rm /content/drive/MyDrive/visual-comet/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/visual-comet/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt --split val

06/12/2023 22:42:53 - INFO - transformers.configuration_utils -   loading configuration file /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt/config.json
06/12/2023 22:42:53 - INFO - transformers.configuration_utils -   Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2VisionAttentiveLMHead"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_event": 39,
  "max_inference": 23,
  "max_length": 20,
  "max_place": 22,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_

== Ex: Cut an apple

In [None]:
# cell 5
# script to run predictions from the checkpoint on the val split
!rm /content/drive/MyDrive/visual-comet/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/visual-comet/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt --split val

06/12/2023 22:46:43 - INFO - transformers.configuration_utils -   loading configuration file /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt/config.json
06/12/2023 22:46:43 - INFO - transformers.configuration_utils -   Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2VisionAttentiveLMHead"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_event": 39,
  "max_inference": 23,
  "max_length": 20,
  "max_place": 22,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_

== Ex: Boil broccoli

In [None]:
# cell 5
# script to run predictions from the checkpoint on the val split
!rm /content/drive/MyDrive/visual-comet/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/visual-comet/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt --split val

06/12/2023 22:50:42 - INFO - transformers.configuration_utils -   loading configuration file /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt/config.json
06/12/2023 22:50:42 - INFO - transformers.configuration_utils -   Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2VisionAttentiveLMHead"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_event": 39,
  "max_inference": 23,
  "max_length": 20,
  "max_place": 22,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_

== Ex: Pour drink

In [None]:
# cell 5
# script to run predictions from the checkpoint on the val split
!rm /content/drive/MyDrive/visual-comet/visualcomet_data/cached_lm_max_seq_len_128_mode_inference_include_text_true_generation
!python3.6 ./scripts/run_generation.py --data_dir /content/drive/MyDrive/visual-comet/visualcomet_data/ --model_name_or_path /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt --split val

06/12/2023 22:58:32 - INFO - transformers.configuration_utils -   loading configuration file /content/drive/MyDrive/visual-comet/experiments/image-inference-80000-ckpt/config.json
06/12/2023 22:58:32 - INFO - transformers.configuration_utils -   Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2VisionAttentiveLMHead"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_event": 39,
  "max_inference": 23,
  "max_length": 20,
  "max_place": 22,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_