# Extracting Ground Truth Face and Cut Labels

A script for extracting ground truth face and cut labels from `txt` files with hand coding data in [the Gaze Data for the Analysis of Attention in Feature Films dataset](http://graphics.stanford.edu/~kbreeden/gazedata.html).

Set `desired_feature` to the feature you want ground truth labels for (`face` or `cut`), then run the last cell. For each hand coding `txt` file, it saves a pickled ground truth label dictionary for that feature.
Each dictionary maps frame numbers to a value indicating whether that frame contains the desired feature (0 means the frame does not contain the feature; 1 means that it does).
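
As a rough illustration of the dictionary format described above, the sketch below builds a label dictionary for a hypothetical 10-frame clip in which frames 3 through 5 contain the feature. The clip length and frame range are made up for illustration and do not come from the dataset.

```python
# Hypothetical clip length and feature range (illustrative values only)
num_frames = 10
feature_start, feature_end = 3, 5

# Every frame starts at 0 (feature absent); frame numbers start at 1
labels = {frame: 0 for frame in range(1, num_frames + 1)}

# Mark every frame in the inclusive range as containing the feature
for frame in range(feature_start, feature_end + 1):
    labels[frame] = 1

print(labels)
```

Frame ranges are treated as inclusive on both ends, which matches how the extraction function in this notebook fills in its dictionary.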

In [1]:
import os
import pickle

In [2]:
def extract_labels(hand_coding_file, desired_feature):
    """
    Extracts hand-coded ground truth labels for a feature for each frame in a Gaze dataset video clip.
    
    Parameters:
    - hand_coding_file: the path of the hand coding file to parse for ground truth labels; ends with a ".txt" extension.
    - desired_feature: the feature we want to extract the ground-truth label for. (e.g. "face" or "cut")
    
    Returns a dictionary where keys are frame numbers, and values are whether or not the frame contains the desired feature (0 if not in the frame; 1 if it is).
    """
    with open(hand_coding_file) as f:
        lines = [line for line in f]

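    # Figure out the last frame number from the line formatted like "end {start_frame_num} {end_frame_num}"
    end_frame_num_str = None
    for line in lines:
        tokens = line.split()
        # Match on the first token so that labels merely containing "end" are not picked up
        if "//" not in line and len(tokens) == 3 and tokens[0] == "end":
            end_frame_num_str = tokens[1]
    if end_frame_num_str is None:
        raise ValueError("No 'end' line found in " + hand_coding_file)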

    # Populate initial dictionary with keys from start frame number (which is 1) to end frame number,
    # all with initial values of 0 (no instance of the feature in the frame).
    # This dictionary will contain keys of frame numbers, and values of 0 or 1 indicating whether or not the frame contains the desired feature.
    # 0 means the frame does not contain the feature; 1 means that it does.
    val = 0
    frame_to_feature_dict = {key:val for key in range(1, int(end_frame_num_str) + 1)}

    # Determine what hand coding labels correspond to desired feature
    # FACES
    # f: a single face is present onscreen
    # fa: a non-human face is present onscreen
    # ff: multiple faces (human and/or nonhuman) are onscreen
    # (note: here, we assume that all frames labeled ff have at least 1 human)

    # CUTS
    # c: plain cut
    # mmc: motion matched cut
    # xf: cross fade
    feature_to_label_map = {"face": ["f", "ff"], "cut": ["c", "mmc", "xf"]}
    desired_labels = feature_to_label_map[desired_feature]

    for line in lines[1:]: # skip the first line of the file
        cleaned_line = line.strip() # remove newline character and surrounding whitespace
        if cleaned_line and "//" not in cleaned_line: # comments all begin with //
            feature, start_frame_num_str, end_frame_num_str = cleaned_line.split()
            start_frame_num = int(start_frame_num_str)
            end_frame_num = int(end_frame_num_str)
            # If this line is for a label indicating the desired feature...
            if feature in desired_labels:
                # For every frame in the frame range this label applies to, mark that that frame contains the desired feature
                # by setting the frame number's value in the dictionary to 1
                for frame_num in range(start_frame_num, end_frame_num + 1):
                    frame_to_feature_dict[frame_num] = 1
    
    return frame_to_feature_dict        
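
The line format the parser expects can be sketched on an in-memory example. The sample text below is hypothetical: its layout (`label start_frame end_frame`, with `//` comments and a final `end` line) is inferred from the parsing code above, not copied from a real hand coding file, and `parse_intervals` is a helper introduced only for this illustration.

```python
# Hypothetical hand coding text; layout inferred from the parser, not taken from the dataset
sample_text = """// hypothetical header comment
f 1 4
c 5 5
// a comment line
ff 6 8
end 10 10
"""

def parse_intervals(text, desired_labels):
    """Return (start_frame, end_frame) tuples for lines whose label is in desired_labels."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or "//" in line:
            continue
        label, start, end = line.split()
        if label in desired_labels:
            intervals.append((int(start), int(end)))
    return intervals

face_intervals = parse_intervals(sample_text, ["f", "ff"])
cut_intervals = parse_intervals(sample_text, ["c", "mmc", "xf"])
print(face_intervals, cut_intervals)
```

Each tuple corresponds to one inclusive frame range that would be set to 1 in the label dictionary.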

**Run the block below to extract ground-truth labels from the `txt` files in the `hand_coding` folder.**

Update the `desired_feature` variable depending on which feature ("face" or "cut") you want to extract ground truth labels for.

In [3]:
# Options: "face", "cut"
desired_feature = "cut"
        
# Get list of all files and directories in hand_coding directory
file_list = os.listdir("hand_coding")

# For each clip hand coding txt file...
for filename in file_list:
    # We only want to process our hand coding files, which are txt files
    if filename.endswith(".txt"):
        # splitext splits on the last dot only, so filenames containing extra dots still work
        clip_with_hcode_suffix, extension = os.path.splitext(filename)
        
        # Using hand coding file, obtain a dictionary where keys are frame numbers and values indicate whether or not
        # the frame contains the desired feature.
        # (0 means the frame doesn't contain the desired feature; 1 means the frame does).
        frame_to_feature_dict = extract_labels("hand_coding/" + filename, desired_feature)
        
        # Make the directory for pickled ground truth label dictionaries if it does not already exist
        ground_truth_dict_dir_path = "ground_truth_" + desired_feature + "_label_dicts"
        os.makedirs(ground_truth_dict_dir_path, exist_ok=True)
        
        # Save (serialize) pickle of the ground truth label dictionary
        pickled_dict_filepath = ground_truth_dict_dir_path + "/" + clip_with_hcode_suffix + "_frame_to_" + desired_feature + "_dict" + ".pkl"
        with open(pickled_dict_filepath, "wb") as f:
            pickle.dump(frame_to_feature_dict, f)
        
        # Test loading (deserializing) pickled data to make sure pickled file saved correctly
        # with open(pickled_dict_filepath, "rb") as f:
        #     print(pickle.load(f))
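
Once the pickles are saved, a downstream step might load one and summarize it. The following is a minimal sketch of that round trip, assuming a small stand-in dictionary and a temporary directory. The filename and values are hypothetical and only mirror the naming pattern used above.

```python
import os
import pickle
import tempfile

# Hypothetical label dictionary standing in for one produced by extract_labels
example_dict = {1: 0, 2: 1, 3: 1, 4: 0}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical filename following the "<clip>_frame_to_<feature>_dict.pkl" pattern
    path = os.path.join(tmp_dir, "example_frame_to_cut_dict.pkl")

    # Serialize the dictionary, then read it back
    with open(path, "wb") as f:
        pickle.dump(example_dict, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Frames labeled 1, in frame order, and the share of frames that contain the feature
positive_frames = [frame for frame, label in sorted(loaded.items()) if label == 1]
fraction_positive = len(positive_frames) / len(loaded)
print(positive_frames, fraction_positive)
```

Only unpickle files you created yourself or otherwise trust, since `pickle.load` can execute arbitrary code from a malicious file.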