# Notebook 4: Test Exploratory Coding (First Iteration)
This notebook documents the first batch of integrated coding with the VLM. It was applied to five randomly selected videos (95 frames in total) from outside the date range of the main dataset.

**Prompt List:**
1. Prompt 3a

> You are an expert annotator of social media videos. You are provided a still image from a video. Your job is to analyze the backdrop of the image and classify it into the following five mutually exclusive setting categories. Ignore any text overlay or captions. For the provided image, select the one category that best describes its background or setting based on its definition: Graphics: Image does not contain live-action imagery (e.g., animations, CGI, black screens); Outdoor combat: Image set outdoors containing explicit signs of warfare such as visible destruction, military hardware or active combat; Outdoor non-combat: Image set outdoors WITHOUT explicit signs of warfare. This includes urban or natural landscapes without weapons or destruction; Indoor combat: Image set indoors containing explicit signs of warfare such as visible destruction, rubble, military hardware or active combat. This includes damaged interiors and confined combat settings (such as tunnels); Indoor non-combat: Image set indoors WITHOUT explicit signs of warfare. This includes studio environments and homes without weapons or destruction. Analyze the provided still and reply with one of these exact labels: “Graphics”, “Outdoor combat”, “Outdoor non‑combat”, “Indoor combat”, or “Indoor non‑combat”.

---

2. Prompt 3b (Includes motivation)

> You are an expert annotator of social media videos. You are provided a still image from a video. Your job is to analyze the backdrop of the image and classify it into the following six mutually exclusive setting categories. Ignore any text overlay or captions. For the provided image, select the one category that best describes its background or setting based on its definition: Graphics: Image is artificial or non–live‑action imagery (e.g., animations, CGI, black screens); Outdoor combat: Image set outdoors containing explicit signs of warfare such as visible destruction, military hardware or active combat; Outdoor non-combat: Image set outdoors WITHOUT explicit signs of warfare. This includes urban or natural landscapes without weapons or destruction; Indoor combat: Image set indoors containing explicit signs of warfare such as visible destruction, rubble, military hardware or active combat. This includes damaged interiors and confined combat settings (such as tunnels); Indoor non-combat: Image set indoors WITHOUT explicit signs of warfare. This includes studio environments and homes without weapons or destruction. Respond with a Python-style tuple in the format: ("classification", "motivation") - "classification" must be one of: "Graphics", "Outdoor combat", "Outdoor non-combat", "Indoor combat", or "Indoor non-combat"; "motivation" is a justification of your classification in 50 words or fewer, explaining what visual elements in the image led to your choice.

---

3. Prompt 3c (Motivation and option for "Uncertain")

> You are an expert annotator of social media videos. You are provided a still image from a video. Your job is to analyze the backdrop of the image and classify it into the following six mutually exclusive setting categories. Ignore any text overlay or captions. For the provided image, select the one category that best describes its background or setting based on its definition: Graphics: Image is artificial or non–live‑action imagery (e.g., animations, CGI, black screens); Outdoor combat: Image set outdoors containing explicit signs of warfare such as visible destruction, military hardware or active combat; Outdoor non-combat: Image set outdoors WITHOUT explicit signs of warfare. This includes urban or natural landscapes without weapons or destruction; Indoor combat: Image set indoors containing explicit signs of warfare such as visible destruction, rubble, military hardware or active combat. This includes damaged interiors and confined combat settings (such as tunnels); Indoor non-combat: Image set indoors WITHOUT explicit signs of warfare. This includes studio environments and homes without weapons or destruction. However, if you are uncertain, return 'Uncertain'.  Respond with a Python-style tuple in the format: ("classification", "motivation") - "classification" must be one of: "Graphics", "Outdoor combat", "Outdoor non-combat", "Indoor combat", "Indoor non-combat", or “Uncertain; "motivation" is a justification of your classification in 50 words or fewer, explaining what visual elements in the image led to your choice or uncertainty.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import face_recognition
import re
import ast
import krippendorff
import networkx as nx

In [None]:
# Establish file paths
frames_folder = ''
coding_main_vlm_path = ''
coding_m_path = ''
coding_c_path = ''

# Clean up original VLM Data
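This section loads the original VLM output and parses the tuple-formatted responses from prompts 3b and 3c into separate classification and motivation columns.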
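As a minimal sketch of the parsing step, the toy strings below (hypothetical examples, not actual VLM output) show how `ast.literal_eval` turns a Python-style tuple string into a tuple and how a malformed response can fall back to `None` before the pair is split into two columns. The helper name `parse_response` is illustrative only.

```python
import ast

import pandas as pd


def parse_response(value):
    """Parse a Python-style tuple string; return None if it is malformed."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return None
    # Accept only well-formed (classification, motivation) pairs
    if isinstance(parsed, tuple) and len(parsed) == 2:
        return parsed
    return None


# Hypothetical raw responses, for illustration only
raw = pd.Series([
    '("Outdoor combat", "Destroyed buildings and a tank are visible.")',
    '("Indoor non-combat", "A studio desk with no weapons or damage.")',
    'Outdoor combat',  # malformed: not a tuple
])

parsed = raw.apply(parse_response)
# Split valid tuples into two columns; malformed rows become missing (None)
split = pd.DataFrame(
    parsed.apply(lambda t: t if t is not None else (None, None)).tolist(),
    columns=['classification', 'motivation'],
)
print(split)
```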

In [None]:
df_coding_main = pd.read_csv(coding_main_vlm_path)
df_coding_main

In [None]:
# Function for safe eval of tuples
def safe_eval(value):
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        print(f"Error parsing '{value}'")
        return None  # Return None or any fallback value for invalid strings


In [None]:
# Convert column into tuple
df_coding_main['setting_vlm_3b_tuple'] = df_coding_main['setting_vlm_3b_tuple'].apply(safe_eval)

In [None]:
# Split tuples into results and motivation
for i, scene in df_coding_main.iterrows():
    tuple_value = scene['setting_vlm_3b_tuple']
    if tuple_value is not None:
        result = tuple_value[0]
        motivation = tuple_value[1]
        df_coding_main.loc[i, 'setting_vlm_3b'] = result
        df_coding_main.loc[i, 'setting_vlm_3b_motivation'] = motivation
        print(f"Classification result: {result} Motivation: {motivation}")
        print()


In [None]:
# Repeat for prompt 3c
df_coding_main['setting_vlm_3c_tuple'] = df_coding_main['setting_vlm_3c_tuple'].apply(safe_eval)

# Split tuples into results and motivation
for i, scene in df_coding_main.iterrows():
    tuple_value = scene['setting_vlm_3c_tuple']
    if tuple_value is not None:
        result = tuple_value[0]
        motivation = tuple_value[1]
        df_coding_main.loc[i, 'setting_vlm_3c'] = result
        df_coding_main.loc[i, 'setting_vlm_3c_motivation'] = motivation
        print(f"Classification result: {result} Motivation: {motivation}")
        print()

In [None]:
df_coding_main

In [None]:
# Normalize labels to match the human codes (e.g., 'OUTDOOR COMBAT' -> 'Outdoor combat'); prompt 3a's
# label list uses a non-breaking hyphen (U+2011), so map it to a plain hyphen first
for col in ['setting_vlm_3a', 'setting_vlm_3b', 'setting_vlm_3c']:
    df_coding_main[col] = (df_coding_main[col].str.replace('\u2011', '-', regex=False)
                           .str.strip().str.lower().str.capitalize())

df_coding_main

# Bring in manually coded sheets
`df_coding_m` holds the primary coder's labels, and `df_coding_c` holds the secondary coder's labels.

In [None]:
df_coding_m = pd.read_csv(coding_m_path)
df_coding_c = pd.read_csv(coding_c_path)

In [None]:
df_coding = pd.merge(df_coding_main, df_coding_m, how='left')
df_coding

In [None]:
df_coding = pd.merge(df_coding, df_coding_c, how='left')
df_coding

In [None]:
# Compare distributions quickly
print(df_coding['setting_m'].value_counts())
print(df_coding['setting_c'].value_counts())
print(df_coding['setting_vlm_3a'].value_counts())
print(df_coding['setting_vlm_3b'].value_counts())
print(df_coding['setting_vlm_3c'].value_counts())


In [None]:
df_coding.to_csv('', index=False)

In [None]:
df_coding = pd.read_csv('')

# Analysis of results
Here we review the coding results by calculating intercoder reliability (Krippendorff's alpha) and by manually reviewing the motivations provided by the VLM.
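To make the reliability metric concrete, the sketch below computes nominal Krippendorff's alpha from scratch via the coincidence matrix, treating NaN as missing (the same convention `krippendorff.alpha` uses for `np.nan`). It is an illustrative reimplementation under those assumptions, with a hypothetical function name `nominal_alpha`; the analysis itself relies on the `krippendorff` package through `calc_alpha`.

```python
from collections import Counter
from itertools import permutations

import numpy as np


def nominal_alpha(ratings):
    """Nominal Krippendorff's alpha for an array of shape (coders, units).

    NaN entries are treated as missing values.
    """
    ratings = np.asarray(ratings, dtype=float)
    coincidences = Counter()
    for unit in ratings.T:
        values = unit[~np.isnan(unit)]
        m_u = len(values)
        if m_u < 2:
            continue  # units with fewer than two codes are not pairable
        # Each ordered pair of codes from different coders adds 1 / (m_u - 1)
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m_u - 1)

    # Marginal totals n_c and grand total n
    n_c = Counter()
    for (c, _k), weight in coincidences.items():
        n_c[c] += weight
    n = sum(n_c.values())

    # alpha = 1 - (n - 1) * sum_{c != k} o_ck / sum_{c != k} n_c * n_k
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    if expected == 0:
        return np.nan  # fewer than two categories in use: alpha is undefined
    return 1.0 - observed / expected


# Two coders, four units; coder 2 disagrees on the last unit
print(nominal_alpha([[1, 1, 2, 2], [1, 1, 2, 1]]))
```

Passing `df.to_numpy().T` from a units-by-coders DataFrame mirrors how `calc_alpha` shapes its input.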

In [None]:
def preprocess_for_alpha(df):
    """
    Encode categorical columns as numeric codes for Krippendorff's alpha.
    Missing values (NaN) stay NaN so they are treated as missing data.
    Note: codes are assigned per column in order of appearance, so the same label can
    receive different codes in different columns; a shared mapping is safer when comparing coders.
    """
    df_processed = df.copy()

    # Use pandas' factorization to encode categories into integer values
    for column in df_processed.columns:
        codes, _ = pd.factorize(df_processed[column], use_na_sentinel=True)
        # Replace the -1 sentinel (used for NaN) with np.nan for proper handling
        df_processed[column] = pd.Series(codes, index=df_processed.index).replace(-1, np.nan)

    return df_processed


def calc_alpha(df, level_of_measurement='nominal'):
    """Compute Krippendorff's alpha for a units-by-coders DataFrame.

    Values must already be numeric; NaN is treated as missing.
    """
    # krippendorff.alpha expects shape (coders, units), hence the transpose
    ratings = df.to_numpy(dtype=float).T
    return krippendorff.alpha(ratings, level_of_measurement=level_of_measurement)


In [None]:
# Map category labels to shared numeric codes; 'Uncertain' is treated as missing (NaN)
setting_refactor = {'Graphics' : 1,
                    'Indoor combat': 2,
                    'Indoor non-combat': 3,
                    'Outdoor combat': 4,
                    'Outdoor non-combat': 5,
                    'Uncertain': np.nan}

In [None]:
# Check intercoder reliability
# Define the pairs of columns to be compared
df_intercoder_alpha = pd.DataFrame([
    {'column_1':'setting_m','column_2':'setting_c','type':'setting'},

    {'column_1':'setting_m','column_2':'setting_vlm_3a','type':'setting'},
    {'column_1':'setting_c','column_2':'setting_vlm_3a','type':'setting'},

    {'column_1':'setting_m','column_2':'setting_vlm_3b','type':'setting'},
    {'column_1':'setting_c','column_2':'setting_vlm_3b','type':'setting'},

    {'column_1':'setting_m','column_2':'setting_vlm_3c','type':'setting'},
    {'column_1':'setting_c','column_2':'setting_vlm_3c','type':'setting'},

    {'column_1':'setting_vlm_3a','column_2':'setting_vlm_3b','type':'setting'},
    {'column_1':'setting_vlm_3a','column_2':'setting_vlm_3c','type':'setting'},
    {'column_1':'setting_vlm_3b','column_2':'setting_vlm_3c','type':'setting'},

])


In [None]:
for i, row in df_intercoder_alpha.iterrows():
    column_1 = row['column_1']
    column_2 = row['column_2']
    coding_type = row['type']
    if coding_type == 'setting':
        refactor_dict = setting_refactor
    else:
        raise ValueError(f"Unknown coding type: {coding_type}")

    # Map labels to numeric codes; 'Uncertain' and any label missing from the dict become NaN,
    # which Krippendorff's alpha treats as missing data
    df_intercoder = df_coding[[column_1, column_2]].copy()
    df_intercoder[column_1] = df_intercoder[column_1].map(refactor_dict)
    df_intercoder[column_2] = df_intercoder[column_2].map(refactor_dict)

    df_intercoder_alpha.loc[i, 'intercoder_alpha'] = calc_alpha(df_intercoder)

In [None]:
df_intercoder_alpha

The VLM is generally stable across its own annotations (prompts 3a, 3b, and 3c). Agreement is particularly high between the prompts with and without the "Uncertain" option (3b and 3c). This is a good sign.

Agreement between the two human coders is acceptable but could be better. Agreement between the VLM and each human coder fell below the minimum acceptable threshold of α = 0.667. Interestingly, the VLM agreed more with the secondary coder than with me.
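Krippendorff's commonly cited guideline treats α ≥ 0.800 as reliable, 0.667 ≤ α < 0.800 as acceptable only for tentative conclusions, and α < 0.667 as unreliable. A minimal sketch for labelling each coder pair in a results table such as `df_intercoder_alpha` with these bands; the function name `reliability_band` and the alpha values in the toy table are hypothetical and are not results from this notebook.

```python
import numpy as np
import pandas as pd


def reliability_band(alpha):
    """Classify an alpha value using Krippendorff's 0.667 / 0.800 cut-offs."""
    if pd.isna(alpha):
        return 'undefined'
    if alpha >= 0.800:
        return 'reliable'
    if alpha >= 0.667:
        return 'tentative'
    return 'unreliable'


# Hypothetical alpha values, for illustration only
toy = pd.DataFrame({
    'column_1': ['setting_m', 'setting_m', 'setting_c'],
    'column_2': ['setting_c', 'setting_vlm_3b', 'setting_vlm_3b'],
    'intercoder_alpha': [0.72, 0.55, np.nan],
})
toy['band'] = toy['intercoder_alpha'].apply(reliability_band)
print(toy)
```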

In [None]:
# Keep only the human-human and human-VLM (prompt 3b) pairs to avoid clutter
df_intercoder_alpha_select = df_intercoder_alpha.loc[[0,3,4]]
df_intercoder_alpha_select

In [None]:
print(df_intercoder_alpha_select)

In [None]:
# Visualize as network diagram
def viz_agreement_network(df, output_path=None):
    """Draw coders as nodes, with pairwise alpha shown as edge width and edge label."""
    # Initialize the graph
    G = nx.Graph()

    # Add one node per coder column
    nodes = list(set(df['column_1'].tolist() + df['column_2'].tolist()))
    G.add_nodes_from(nodes)

    # Add edges weighted by intercoder alpha
    for _, row in df.iterrows():
        G.add_edge(row['column_1'], row['column_2'], weight=row['intercoder_alpha'])

    # Plot the graph
    plt.figure(figsize=(10, 10))
    pos = nx.spring_layout(G, seed=42)  # fixed seed so the layout is reproducible
    edges = G.edges(data=True)

    # Prepare edge widths based on intercoder_alpha
    widths = [d['weight'] * 30 for (u, v, d) in edges]  # multiplied for visibility

    nx.draw(G, pos, with_labels=True, node_size=2000, node_color='skyblue',
            width=widths, edge_color='blue', alpha=0.6,
            font_size=10, font_weight='bold')

    # Add edge labels to show the alpha scores
    edge_labels = {(u, v): f"{d['weight']:.2f}" for u, v, d in edges}
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='red')

    plt.title("Intercoder Agreement Network")
    plt.axis('off')
    if output_path:  # skip saving when no path (None or '') is given
        plt.savefig(output_path)
    plt.show()

In [None]:
viz_agreement_network(df_intercoder_alpha_select, '')

Unfortunately, because the alpha values are so close to one another, the edges look visually identical.
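One way to make near-identical alphas distinguishable is to min-max rescale them into a fixed width range before drawing, while still printing the raw value on each edge (for example with `nx.draw_networkx_edge_labels`). A minimal sketch of the rescaling step; `rescale_widths`, its default width range, and the alpha values in the example are hypothetical.

```python
import numpy as np


def rescale_widths(alphas, min_width=2.0, max_width=20.0):
    """Min-max rescale alpha values into [min_width, max_width] for edge widths."""
    alphas = np.asarray(alphas, dtype=float)
    lo, hi = np.nanmin(alphas), np.nanmax(alphas)
    if hi == lo:
        # All values equal: fall back to a single mid-range width
        return np.full_like(alphas, (min_width + max_width) / 2)
    return min_width + (alphas - lo) / (hi - lo) * (max_width - min_width)


# Hypothetical alpha values, for illustration only
widths = rescale_widths([0.61, 0.63, 0.64])
print(widths.round(2))
```

Because min-max scaling exaggerates small gaps, the edge labels are what keep the figure honest about how close the values really are.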

# Visualizing results for review
We now plot each frame next to its human codes, the VLM classifications, and the VLM motivations for manual review.
To reduce clutter, only prompts 3b and 3c were reviewed.
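Alongside the per-frame figures, a confusion table summarizes where the VLM and a human coder diverge. A minimal sketch using `pd.crosstab` on toy labels (hypothetical, not this notebook's data); with the real data the same call would take, for example, `df_coding['setting_m']` and `df_coding['setting_vlm_3b']`.

```python
import pandas as pd

# Hypothetical labels, for illustration only
human = pd.Series(['Outdoor combat', 'Outdoor non-combat', 'Graphics', 'Outdoor combat'],
                  name='setting_m')
vlm = pd.Series(['Outdoor combat', 'Outdoor combat', 'Graphics', 'Outdoor non-combat'],
                name='setting_vlm_3b')

# Rows: human code, columns: VLM code; off-diagonal cells are disagreements
table = pd.crosstab(human, vlm)
print(table)

# Simple percent agreement for reference (not chance-corrected, unlike alpha)
print((human == vlm).mean())
```

Note that `pd.crosstab` drops rows where either label is missing, so frames coded "Uncertain" and mapped to NaN would not appear.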

In [None]:
import textwrap


def wrap_text(text, width=30):
    """Wrap a motivation for display; missing values (NaN) become an empty string."""
    return textwrap.fill(str(text), width=width) if pd.notna(text) else ''


def match_label(is_match):
    return 'Match' if is_match else 'No match'


# Visualize each prediction (setting)
for index, scene in df_coding.iterrows():
    video_id = scene['url']
    scene_id = scene['id']

    cleaned_id = re.sub(r'\.mp4', '', scene_id)
    image_path = os.path.join(frames_folder, str(video_id), f'{cleaned_id}.jpeg')
    image = face_recognition.load_image_file(image_path)  # load image as a NumPy array

    plt.figure(figsize=(10, 5))
    plt.title(f"Coding result for {scene_id}", fontsize=12)
    plt.axis('off')
    plt.imshow(image)

    setting_m = scene['setting_m']
    setting_c = scene['setting_c']
    setting_vlm_3b = scene['setting_vlm_3b']
    setting_vlm_3c = scene['setting_vlm_3c']
    wrapped_motivation_3b = wrap_text(scene['setting_vlm_3b_motivation'])
    wrapped_motivation_3c = wrap_text(scene['setting_vlm_3c_motivation'])

    # Agreement between the two human coders
    human_match = match_label(setting_m == setting_c)
    # The VLM counts as a match only if it agrees with BOTH human coders
    vlm_match_3b = match_label(setting_c == setting_vlm_3b and setting_m == setting_vlm_3b)
    vlm_match_3c = match_label(setting_c == setting_vlm_3c and setting_m == setting_vlm_3c)

    annotation = (
        f'Pri Coder: {setting_m} \nSec Coder: {setting_c}\n'
        f'Human coder match: {human_match}\n\n'
        f'VLM (3B): {setting_vlm_3b}\nMotivation: {wrapped_motivation_3b}\n'
        f'VLM match: {vlm_match_3b}\n\n'
        f'VLM (3C): {setting_vlm_3c}\nMotivation: {wrapped_motivation_3c}\n'
        f'VLM match: {vlm_match_3c}'
    )
    # Annotate on the middle-right side of the image
    plt.annotate(
        annotation,
        xy=(1.05, 0.5),  # 1.05 means slightly to the right of the axes
        xycoords='axes fraction',  # position relative to axes (not data coordinates)
        fontsize=10,
        ha='left',  # align text to the left
        va='center',  # vertically centered
        bbox=dict(boxstyle='round,pad=0.3', edgecolor='black', facecolor='white'),  # highlight box
    )
    # Adjust spacing
    plt.grid(False)
    plt.tight_layout()
    plt.savefig(f'')
    plt.show()




# Reprocessing and splitting setting
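We suspected that combining two dimensions (indoor/outdoor location and presence of combat) into a single setting label was depressing the agreement scores, so we split each setting label into two variables: location and combat presence.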
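Because there are only six possible labels, the split can also be written as an explicit lookup table, which makes the mapping easy to audit. A minimal sketch; the `SPLIT_LOOKUP` and `split_with_lookup` names and the choice to return NaN for unrecognized labels are assumptions, and the notebook itself uses `split_setting`.

```python
import numpy as np
import pandas as pd

# Combined setting label -> (location, combat presence)
SPLIT_LOOKUP = {
    'Graphics': ('Graphics', 'Graphics'),
    'Indoor combat': ('Indoor', 'Combat'),
    'Indoor non-combat': ('Indoor', 'Non-combat'),
    'Outdoor combat': ('Outdoor', 'Combat'),
    'Outdoor non-combat': ('Outdoor', 'Non-combat'),
    'Uncertain': ('Uncertain', 'Uncertain'),
}


def split_with_lookup(labels):
    """Return a DataFrame with location and combat_presence columns."""
    # Unrecognized or missing labels become a pair of NaNs
    pairs = labels.apply(lambda s: SPLIT_LOOKUP.get(s, (np.nan, np.nan)))
    return pd.DataFrame(pairs.tolist(), index=labels.index,
                        columns=['location', 'combat_presence'])


labels = pd.Series(['Outdoor non-combat', 'Indoor combat', 'Graphics', None])
print(split_with_lookup(labels))
```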

In [None]:
df_coding = pd.read_csv('')

In [None]:
def split_setting(setting_str):
    """Split a combined setting label into (location, combat_presence)."""
    # Missing or non-string labels (e.g., NaN from unparsed VLM output) stay missing
    if not isinstance(setting_str, str):
        return np.nan, np.nan

    combat_presence = ''
    location = ''
    if setting_str == 'Graphics':
        location = 'Graphics'
        combat_presence = 'Graphics'
    elif setting_str == 'Uncertain':
        location = 'Uncertain'
        combat_presence = 'Uncertain'
    else:
        if 'Outdoor' in setting_str:
            location = 'Outdoor'
        elif 'Indoor' in setting_str:
            location = 'Indoor'
        # Check 'non-combat' first, since 'combat' is a substring of it
        if 'non-combat' in setting_str:
            combat_presence = 'Non-combat'
        elif 'combat' in setting_str:
            combat_presence = 'Combat'

    # Error checking
    if combat_presence == '' or location == '':
        print(f"Error: Setting {setting_str} is not recognized.")

    return location, combat_presence

In [None]:
# Split the combined setting label for every prompt and coder
for suffix in ['vlm_3c', 'vlm_3b', 'vlm_3a', 'm', 'c']:
    df_coding[f'location_{suffix}'], df_coding[f'combat_presence_{suffix}'] = zip(
        *df_coding[f'setting_{suffix}'].apply(split_setting))


In [None]:
df_coding



In [None]:
# Save the updated data (without the index, matching the earlier export)
df_coding.to_csv('', index=False)

In [None]:
# Numeric codes for the two split dimensions; 'Uncertain' is treated as missing (NaN)
location_refactor = {'Graphics': 1,
                     'Indoor': 2,
                     'Outdoor': 3,
                     'Uncertain': np.nan}

combat_presence_refactor = {'Graphics': 1,
                            'Combat': 2,
                            'Non-combat': 3,
                            'Uncertain': np.nan}

In [None]:
# Calculate new intercoder agreement

# Define the columns to be compared
df_intercoder_location_combat_presence = pd.DataFrame([

    {'column_1':'location_m','column_2':'location_c','type':'location'},
    {'column_1':'location_m','column_2':'location_vlm_3a','type':'location'},
    {'column_1':'location_c','column_2':'location_vlm_3a','type':'location'},
    {'column_1':'location_m','column_2':'location_vlm_3b','type':'location'},
    {'column_1':'location_c','column_2':'location_vlm_3b','type':'location'},
    {'column_1':'location_m','column_2':'location_vlm_3c','type':'location'},
    {'column_1':'location_c','column_2':'location_vlm_3c','type':'location'},
    {'column_1':'location_vlm_3a','column_2':'location_vlm_3b','type':'location'},
    {'column_1':'location_vlm_3a','column_2':'location_vlm_3c','type':'location'},
    {'column_1':'location_vlm_3b','column_2':'location_vlm_3c','type':'location'},

    {'column_1':'combat_presence_m','column_2':'combat_presence_c','type':'combat_presence'},
    {'column_1':'combat_presence_m','column_2':'combat_presence_vlm_3a','type':'combat_presence'},
    {'column_1':'combat_presence_c','column_2':'combat_presence_vlm_3a','type':'combat_presence'},
    {'column_1':'combat_presence_m','column_2':'combat_presence_vlm_3b','type':'combat_presence'},
    {'column_1':'combat_presence_c','column_2':'combat_presence_vlm_3b','type':'combat_presence'},
    {'column_1':'combat_presence_m','column_2':'combat_presence_vlm_3c','type':'combat_presence'},
    {'column_1':'combat_presence_c','column_2':'combat_presence_vlm_3c','type':'combat_presence'},
    {'column_1':'combat_presence_vlm_3a','column_2':'combat_presence_vlm_3b','type':'combat_presence'},
    {'column_1':'combat_presence_vlm_3a','column_2':'combat_presence_vlm_3c','type':'combat_presence'},
    {'column_1':'combat_presence_vlm_3b','column_2':'combat_presence_vlm_3c','type':'combat_presence'},


])


In [None]:
# Calculate reliability
refactor_dicts = {'location': location_refactor,
                  'combat_presence': combat_presence_refactor}

for i, row in df_intercoder_location_combat_presence.iterrows():
    column_1 = row['column_1']
    column_2 = row['column_2']
    refactor_dict = refactor_dicts[row['type']]  # raises KeyError on an unknown type

    # Map labels to numeric codes; 'Uncertain' and unmapped labels become NaN (missing)
    df_intercoder = df_coding[[column_1, column_2]].copy()
    df_intercoder[column_1] = df_intercoder[column_1].map(refactor_dict)
    df_intercoder[column_2] = df_intercoder[column_2].map(refactor_dict)

    df_intercoder_location_combat_presence.loc[i, 'intercoder_alpha'] = calc_alpha(df_intercoder)

In [None]:
df_intercoder_location_combat_presence

As suspected, combat presence was specifically the dimension deflating the agreement scores. In fact, the VLM's agreement with each human coder separately was higher than the agreement between the two human coders! This lets us focus on combat presence as the main problem point.
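To follow up on combat presence, a natural next step is to pull out the frames where the two human coders disagree on that dimension and review them directly. A minimal sketch on a toy frame (hypothetical ids and labels); with the notebook's data, the columns would be `combat_presence_m` and `combat_presence_c` in `df_coding`.

```python
import pandas as pd

# Hypothetical coded frames, for illustration only
df = pd.DataFrame({
    'id': ['v1_001', 'v1_002', 'v2_001', 'v2_002'],
    'combat_presence_m': ['Combat', 'Non-combat', 'Graphics', 'Combat'],
    'combat_presence_c': ['Combat', 'Combat', 'Graphics', 'Non-combat'],
})

# Keep only frames where the two coders assigned different labels
# (NaN != NaN, so rows missing in both columns would also appear; drop them first if needed)
disagreements = df[df['combat_presence_m'] != df['combat_presence_c']]
print(disagreements[['id', 'combat_presence_m', 'combat_presence_c']])

# Which label pairs account for the disagreements
print(disagreements.groupby(['combat_presence_m', 'combat_presence_c']).size())
```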

In [None]:
# Filter and visualize as networks again
df_intercoder_location = df_intercoder_location_combat_presence.loc[[0,3,4]]
df_intercoder_location

In [None]:
viz_agreement_network(df_intercoder_location, '')

In [None]:
df_intercoder_combat_presence = df_intercoder_location_combat_presence.loc[[10,13,14]]
df_intercoder_combat_presence

In [None]:
viz_agreement_network(df_intercoder_combat_presence, '')