# This notebook contains the deliverables for the MisalignSRL dataset
> Author: Yayuan Li (yayuanli@umich.edu)

## update (refer to the 20240717 meeting)
- Download the new index file from `https://prism.eecs.umich.edu/yayuanli/z_web/dataset/Ego4D_Mistake/misalignsrl_more_samples_same_split_combinedwords_wohand_objectstatesafe.parquet`.
- Set `parquet_path` to the downloaded file and run this section to see the demonstration.
- Downstream usage of this index file should be compatible with the previously delivered version.

In [1]:
# load original data
import pandas as pd

# load parquet to pandas DF
parquet_path = '/home/yayuanli/fun/mistake_detection/fine_grained_action_mistake_detection/dataset/ego4d_fho_main/misalignsrl_more_samples_same_split_combinedwords_wohand_objectstatesafe.parquet'
misalignsrl = pd.read_parquet(parquet_path)

### update 1: make sure misalignsrl samples are object-state-safe
For a misaligned video sample `A`, make sure the target narration `B` did not happen earlier in the parent long video (`C`) of `A`.

Otherwise, although the content of `A` itself is misaligned with `B`, the object state visible in the background may show that `B` has already been completed.
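
The check described above can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the actual filtering code: the helper name `is_object_state_safe` and the simplified `verb` / `noun` columns are assumptions (the real index stores matching words in `valid_txt_verb_single` / `valid_txt_noun_single`), and it assumes "happened before" means an earlier `narration_timestamp_sec` within the same `video_uid`.

```python
import pandas as pd


def is_object_state_safe(df, video_uid, candidate_time_sec, target_verb, target_noun):
    """Return True if the target (verb, noun) was NOT narrated earlier in the same long video.

    Hypothetical helper for illustration only.
    """
    # Narrations from the same parent video that occur before the candidate sample
    earlier = df[
        (df["video_uid"] == video_uid)
        & (df["narration_timestamp_sec"] < candidate_time_sec)
    ]
    # Does any earlier narration match the target verb and noun?
    matched = (earlier["verb"] == target_verb) & (earlier["noun"] == target_noun)
    return not matched.any()


# Made-up toy narrations; "verb" and "noun" are simplified stand-in columns
toy = pd.DataFrame(
    {
        "video_uid": ["vid1", "vid1", "vid1"],
        "narration_timestamp_sec": [10.0, 50.0, 90.0],
        "verb": ["add", "drop", "cut"],
        "noun": ["dough", "dough", "onion"],
    }
)

# Candidate at t=50 with target "add dough": "add dough" already happened at t=10
unsafe_case = is_object_state_safe(toy, "vid1", 50.0, "add", "dough")
# Candidate at t=50 with target "cut onion": it only happens later, at t=90
safe_case = is_object_state_safe(toy, "vid1", 50.0, "cut", "onion")
print(unsafe_case, safe_case)
```

Under these assumptions, the first candidate would be filtered out as object-state-unsafe and the second would be kept.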

*The following are examples of the object-state-unsafe samples that were filtered out. They are shown only for testing/demonstration; you do not need this txt file to reproduce the dataset.*

In [4]:
# read txt file (first 10 lines)
txt_path = "/home/yayuanli/fun/mistake_detection/fine_grained_action_mistake_detection/src/object_unsafe_samples.txt"
with open(txt_path, 'r') as f:
    for i in range(10):
        print(f.readline())
        

candicate_misalign_sample: ('42e4a840-68a1-4992-923d-7452500b4218_68880_69120', 'C drops the dough in the dough sheeter') 

 target_narration: {'V': 'add', 'ARG1': 'dough'} 

 target_matched_history_sample: ('42e4a840-68a1-4992-923d-7452500b4218_850_1090', 'C adds the scraped dough to the dough')

====

candicate_misalign_sample: ('42e4a840-68a1-4992-923d-7452500b4218_64442_64682', 'C pours out the dough in the bowl on the table') 

 target_narration: {'V': 'add', 'ARG1': 'dough'} 

 target_matched_history_sample: ('42e4a840-68a1-4992-923d-7452500b4218_850_1090', 'C adds the scraped dough to the dough')

====

candicate_misalign_sample: ('42e4a840-68a1-4992-923d-7452500b4218_63153_63393', 'C throws a dough in a bowl on a scale') 

 target_narration: {'V': 'add', 'ARG1': 'dough'} 



### update 2: For ARG1 (noun), if there is a combined word (e.g., "nail gun"), use it as the matching word. If there are no combined words, use the first word that SRL labels as ARG1

Examples: 

```bash
C places the wood plank on the shooting board. ['place']_['wood plank']
C smears the watercolor on the paint brush  ['smear']_['watercolor']
C checks the cloth ['check']_['cloth']
C places the plant inside the flower pot with his right hand ['place']_['plant']
C drops the wooden spoon on the pot with her right hand ['drop']_['wooden spoon']
C puts water on a plastic  bowl ['put']_['water']
C lifts the onion  ['lift']_['onion']
C picks up a card ['pick']_['card']
C pulls out the needle from the bag. ['pull']_['needle']
C sharpens the knife.  ['sharpen']_['knife']
...
```
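
The selection rule above could be sketched as below. This is only an illustration of the rule, not the code used to build the index: the function name `pick_matching_word`, the `combined_words` vocabulary, and the choice to skip determiners are all assumptions, and no lemmatization is attempted.

```python
# Assumption: determiners are ignored when looking for the "first word"
DETERMINERS = {"a", "an", "the"}


def pick_matching_word(arg1_text, combined_words):
    """Return a one-element list holding the matching word for an ARG1 span.

    Prefers the first two-word phrase found in `combined_words`;
    otherwise falls back to the first remaining word.
    """
    tokens = [t for t in arg1_text.lower().split() if t not in DETERMINERS]
    # Scan adjacent word pairs for a known combined word
    for first, second in zip(tokens, tokens[1:]):
        phrase = f"{first} {second}"
        if phrase in combined_words:
            return [phrase]
    # No combined word found: use the first word (empty list if nothing is left)
    return tokens[:1]


# Hypothetical combined-word vocabulary for the demo
combined_words = {"wood plank", "wooden spoon", "nail gun"}

print(pick_matching_word("the wood plank", combined_words))
print(pick_matching_word("the cloth", combined_words))
```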

### update 3 (a request): return the matching word (the word used to determine (mis)alignment)
The matching words are in the columns `valid_txt_verb_single` (for `V`) and `valid_txt_noun_single` (for `ARG1`).

Note that each value is a one-element list, i.e., there is only one matching word for each semantic role (as described in update 2, it is the first combined word or, if there is none, the first word).

Also note that each list is stored as a `str`, a side effect of the `DataFrame.to_csv` round trip. It can be recovered as a list with `eval()`.


In [5]:
avalue = misalignsrl["valid_txt_noun_single"].iloc[0]
print(f"original value and data type: {avalue} {type(avalue)}")
print(f"converted value and data type: {eval(avalue)} {type(eval(avalue))}")



original value and data type: ['paper bag'] <class 'str'>
converted value and data type: ['paper bag'] <class 'list'>
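
As a safer alternative to `eval()`, the standard library's `ast.literal_eval` parses only Python literals and will not execute arbitrary code. A minimal sketch, using the value printed above as a literal string (the helper name `parse_single_word` is hypothetical):

```python
import ast


def parse_single_word(value):
    """Parse a string such as "['paper bag']" and return its single matching word."""
    parsed = ast.literal_eval(value)  # str -> list, literals only
    return parsed[0]


raw = "['paper bag']"
as_list = ast.literal_eval(raw)
word = parse_single_word(raw)
print(as_list, type(as_list), word)
```

To convert a whole column, something like `misalignsrl["valid_txt_noun_single"].apply(ast.literal_eval)` should work, assuming every value is a well-formed list literal.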


### update 4: filter out the misalignsrl samples where "hand" is the matching word for ARG1
Ratio of narrations remaining after filtering out 'hand': 0.9634594285169108

In [6]:
# demonstration
sum(misalignsrl["valid_txt_noun_single"] == "['hand']")

0
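
For reference, a filter of this kind and the corresponding remaining ratio could be computed roughly as follows. The toy rows are made up for illustration, and the sketch assumes the matching noun is stored as a string such as `"['hand']"`, as in the check above; it is not the code that produced the ratio reported here.

```python
import pandas as pd

# Made-up toy rows; the real index has many more columns
toy = pd.DataFrame(
    {
        "narration_text": [
            "C raises the hand",
            "C lifts the onion",
            "C picks up a card",
            "C sharpens the knife",
        ],
        "valid_txt_noun_single": ["['hand']", "['onion']", "['card']", "['knife']"],
    }
)

# Keep rows whose ARG1 matching word is not "hand"
keep = toy["valid_txt_noun_single"] != "['hand']"
filtered = toy[keep].reset_index(drop=True)

# Fraction of rows that survive the filter
remaining_ratio = keep.mean()
print(len(filtered), remaining_ratio)
```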

In [7]:
# the bad example Shane showed me is not there any more
misalignsrl[(misalignsrl["video_uid"] == "c7d5d40f-840c-4be0-b79d-ab41394479a2") & (abs(misalignsrl["narration_timestamp_sec"]-804.11388) < 0.01)]

Unnamed: 0,video_uid_narration_timestamp_sec,narr_uid,narration_text,MisalignSRL_V,MisalignSRL_ARG1,MisalignSRL_V_ARG1,alignment_group_idx,video_uid,start_frame,end_frame,...,clip_end_sec,clip_start_frame,clip_end_frame,narration_timestamp_sec,clip_narration_timestamp_sec,noun_vec_coarse,noun_vec_fine,verb_vec_coarse,verb_vec_fine,narration_annotation_uid


## Request on 20240627 (Slack)
Initial request:
```bash
Hi Yayuan! I’m wondering if there’s a way for me to access your SRL results on the ego4d narrations? I’m thinking I might need them to filter out some types of actions that are kind of out of scope for my work.
```

Response:
- Download `https://prism.eecs.umich.edu/yayuanli/z_web/dataset/Ego4D_Mistake/v1/misalignsrl_more_samples_same_split.parquet`.
- Set `parquet_path` to the path of the downloaded file.
- Run this section.

In [15]:
import pandas as pd
import random

# load parquet to pandas DF
parquet_path = '/nfs/turbo/coe-chaijy/sstorks/simulation_informed_pcr4nlu/TRAVEl/ego4d_mismatch_srl_files/misalignsrl_more_samples_same_split_combinedwords_wohand_objectstatesafe.parquet'
misalignsrl = pd.read_parquet(parquet_path)
# select a random row (randrange excludes the upper bound, so the index is always valid)
row = misalignsrl.iloc[random.randrange(len(misalignsrl))]
# print narration, V, ARG1
print(f"narration_text: {row['narration_text']}")
print(f"V: {row['V']}, ARG1: {row['ARG1']}")
print(f"V (as group): {row['valid_txt_verb_single']}, ARG1 (as group): {row['valid_txt_noun_single']}")
# print columns
print(f"columns: {misalignsrl.columns}")

narration_text: C turns off the tap 
V: turns, ARG1: the tap
V (as group): ['turn'], ARG1 (as group): ['tap']
columns: Index(['video_uid_narration_timestamp_sec', 'narr_uid', 'narration_text',
       'MisalignSRL_V', 'MisalignSRL_ARG1', 'MisalignSRL_V_ARG1',
       'alignment_group_idx', 'video_uid', 'start_frame', 'end_frame', 'ARG0',
       'V', 'ARG1', 'valid_txt_noun', 'valid_txt_verb',
       'valid_txt_verb_single', 'valid_txt_noun_single', 'start_sec',
       'end_sec', 'clip_start_sec', 'clip_end_sec', 'clip_start_frame',
       'clip_end_frame', 'narration_timestamp_sec',
       'clip_narration_timestamp_sec', 'noun_vec_coarse', 'noun_vec_fine',
       'verb_vec_coarse', 'verb_vec_fine', 'narration_annotation_uid'],
      dtype='object')
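
For the original use case (filtering out action types that are out of scope), one possible approach is to filter on the matching verb. The sketch below runs on made-up toy rows; the `out_of_scope_verbs` set is purely hypothetical, and it assumes `valid_txt_verb_single` holds string-encoded one-element lists, as in the output above.

```python
import ast

import pandas as pd

# Made-up toy rows mirroring two columns of the index
toy = pd.DataFrame(
    {
        "narration_text": [
            "C turns off the tap",
            "C checks the cloth",
            "C lifts the onion",
        ],
        "valid_txt_verb_single": ["['turn']", "['check']", "['lift']"],
    }
)

# Hypothetical set of verbs considered out of scope
out_of_scope_verbs = {"check"}

# Decode "['turn']" -> "turn" for each row
verbs = toy["valid_txt_verb_single"].apply(lambda s: ast.literal_eval(s)[0])

# Keep only rows whose matching verb is in scope
in_scope = toy[~verbs.isin(out_of_scope_verbs)]
print(in_scope["narration_text"].tolist())
```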


## Previous requests (before 20240627)
- https://github.com/shanestorks/TRAVEl/pull/23
- https://github.com/shanestorks/TRAVEl/pull/3