image_id	video_id	thumbnail	show_face	show_body	show_product	show_how_to_use_product	clothes	skin_exposure	location_type	personal_spaces	family_and_friends	camera_focus	lighting_and_ambiance	race_ethnicity	skin_color	hair_color	hair_style	eye_color	physical_build_and_fitness_cues	gender	transcript				

Visual Variables Breakdown:
- show_face: Binary (0 or 1). Indicates whether the YouTuber’s face is visible in the video.
- show_body: Binary (0 or 1). Indicates visibility of body parts (e.g., hands, hair, nails, chest).
- show_product: Binary (0 or 1). Identifies whether the product is physically shown in the video.
- show_how_to_use_product: Binary (0 or 1). Indicates whether the creator demonstrates how to use the product.
- clothes: Ordinal (0 = formal, 1 = casual, 2 = intimate). Rates the level of physical exposure based on attire.
- skin_exposure: Ordinal (0 = fully covered, 1 = minimally exposed, 2 = partially covered, 3 = minimally covered, 4 = fully exposed). Rates the level of skin exposure.
- location_type: Categorical (0 = studio, 1 = park, 2 = home, 3 = bathroom, 4 = public/open space). Classifies the detected background location.
- personal_spaces: Binary (0 = public/open, 1 = private). Captures whether the video is filmed in a private space such as a home or bathroom.
- family_and_friends: Binary (0 = no, 1 = yes). Identifies whether family or friends appear in the video.
- camera_focus: Binary (0 = person, 1 = product). Identifies whether the content focuses on the person or the product.
- lighting_and_ambiance: Binary (0 = natural, 1 = artificial). Tracks the type of lighting or ambiance used in the video.
- race_ethnicity: Categorical (0 = white, 1 = black, 2 = hispanic, 3 = asian, 4 = other). Classifies the detected race or ethnicity.
- skin_color: Categorical (0 = light, 1 = medium, 2 = dark). Observed skin tone.
- hair_color: Categorical (0-5). Observed hair color.
- hair_style: Categorical (0-5). Observed hair style.
- eye_color: Categorical (0-5). Observed eye color.
- physical_build_and_fitness_cues: Categorical (0 = Not applicable, 1 = Slim, 2 = Athletic, 3 = Curvy, 4 = Muscular). Observed physical build and fitness cues.
- gender: Categorical (0 = Male, 1 = Female, 2 = Not applicable). Observed gender.
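
The coding scheme above can be written down as a lookup table so that coded rows can be checked before analysis. The following is only a minimal sketch: `CODEBOOK` and `find_invalid_codes` are hypothetical names that do not appear elsewhere in this notebook, and the ranges for hair_color, hair_style and eye_color are taken as 0-5 because no labels are listed for them.

```python
# Allowed integer codes per variable, copied from the breakdown above.
# hair_color, hair_style and eye_color are listed only as 0-5, so no labels are attached.
CODEBOOK = {
    "show_face": {0, 1},
    "show_body": {0, 1},
    "show_product": {0, 1},
    "show_how_to_use_product": {0, 1},
    "clothes": {0, 1, 2},
    "skin_exposure": {0, 1, 2, 3, 4},
    "location_type": {0, 1, 2, 3, 4},
    "personal_spaces": {0, 1},
    "family_and_friends": {0, 1},
    "camera_focus": {0, 1},
    "lighting_and_ambiance": {0, 1},
    "race_ethnicity": {0, 1, 2, 3, 4},
    "skin_color": {0, 1, 2},
    "hair_color": set(range(6)),
    "hair_style": set(range(6)),
    "eye_color": set(range(6)),
    "physical_build_and_fitness_cues": set(range(5)),
    "gender": {0, 1, 2},
}


def find_invalid_codes(row):
    """Return {variable: value} for every coded value outside the codebook (hypothetical helper)."""
    return {
        name: row[name]
        for name, allowed in CODEBOOK.items()
        if name in row and row[name] not in allowed
    }


# Made-up example row: clothes has no code 3, so it is reported
example_row = {"show_face": 1, "clothes": 3, "gender": 0}
print(find_invalid_codes(example_row))  # {'clothes': 3}
```

Variables missing from a row are skipped rather than reported, so the check also works on partially coded rows.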



In [1]:
from dotenv import load_dotenv
import os
# Load environment variables
load_dotenv()

# Get YouTube API key
API_KEY = os.getenv("H_YOUTUBE_API_KEY")
if not API_KEY:
    raise ValueError("API key not found. Please ensure the '.env' file is set up correctly.")

In [None]:
# %pip install google-genai
from google import genai
from google.genai import types
import PIL.Image
import time

# Create the Gemini client with the API key loaded from the environment above
client = genai.Client(api_key=API_KEY)
def analyze_screenshot(image_path, model="gemini-2.0-flash"):
    """
    Analyzes a screenshot of a YouTube product review video and extracts visual variables.

    Args:
        image_path: Path to the screenshot image file.
        model: Name of the Gemini model to query.

    Returns:
        A tuple of five strings: (detail_description, scene_description, person_description,
        product_description, variable_extraction). All five are None if an error occurred.
    """

    try:
        # Load the image
        img = PIL.Image.open(image_path)

        # Prompt construction: four descriptive prompts, then one variable-extraction prompt

        # --- Prompt Section 0: Overall Image Description ---
        prompt_part0 = "You are given an image of a YouTube product review video. Analyze the image and output a detailed description of the scene, the person (YouTuber), what they are doing and any products visible in the image."
        response0 = client.models.generate_content(
            model=model,
            contents=[prompt_part0, img])
        detail_description = response0.text
        print(f"Detailed Description:\n{detail_description}\n---")

        # --- Prompt Section 1: Overall Scene Description ---
        prompt_part1 = """
        Describe the overall scene in this image, as if you are setting the stage for a play.
        Focus on:
        - The general setting (indoors, outdoors, specific room type).
        - The overall lighting conditions (bright, dim, natural light, artificial light).
        - Any prominent objects or features in the background.
        - The general atmosphere or mood conveyed by the scene.  Is it professional, casual, intimate?
        - Don't focus on people or products yet, just the environment.
        """
        response1 = client.models.generate_content(
            model=model,
            contents=[prompt_part1, img])
        scene_description = response1.text
        print(f"Scene Description:\n{scene_description}\n---")


        # --- Prompt Section 2: Person/YouTuber Description ---
        prompt_part2 = f"""
        Now, focus on the person (or people) in the image, likely the YouTuber. Provide details on:
        - Their apparent gender, race/ethnicity, skin color, hair color, hair style, and eye color. Be respectful and avoid stereotypes.
          If these details are not clearly visible, state that.  Estimate if you can, but note the uncertainty.
        - Describe their clothing.  Is it formal, casual, or intimate attire?
        - Describe their physical build (e.g., slim, athletic, curvy, muscular), but only if clearly visible.
        - Are any parts of their body visible (hands, arms, face, etc.)?  Be specific.
        - Estimate the amount of skin exposed (fully covered, minimally exposed, partially covered, minimally covered, fully exposed)
        - Are there any other people present, like family or friends?
        
        Previous Scene Context (use if helpful, otherwise ignore):
        {scene_description}
        """
        response2 = client.models.generate_content(
            model=model,
            contents=[prompt_part2, img])
        person_description = response2.text
        print(f"Person Description:\n{person_description}\n---")


        # --- Prompt Section 3: Product Description ---
        prompt_part3 = f"""
        Describe any products visible in the image. Focus on:
        - What is the product (if identifiable)?
        - How is the person interacting with the product? Are they holding it, demonstrating its use, or is it just in the background?
        - Is the camera focused primarily on the person or the product?

        Previous Context (use if helpful):
        Scene: {scene_description}
        Person: {person_description}
        """
        response3 = client.models.generate_content(
            model=model,
            contents=[prompt_part3, img])
        product_description = response3.text
        print(f"Product Description:\n{product_description}\n---")



        # --- Prompt Section 4: Variable Extraction ---
        prompt_part4 = f"""
        Based on the following descriptions, extract the following visual variables.  Provide a value for each variable.
        If a variable cannot be determined, provide a clear reason (e.g., "not visible," "ambiguous").

        Descriptions:
        Scene: {scene_description}
        Person: {person_description}
        Product: {product_description}

        Variables (provide a single value for each):
        - show_face: (0 or 1) Is the YouTuber's face visible?
        - show_body: (0 or 1) Are other body parts (hands, arms, etc.) visible?
        - show_product: (0 or 1) Is a product visible?
        - show_how_to_use_product: (0 or 1) Is the person demonstrating how to use the product?
        - clothes: (0 = formal, 1 = casual, 2 = intimate)  Describe the clothing.
        - skin_exposure: (0 = fully covered, 1 = minimally exposed, 2 = partially covered, 3 = minimally covered, 4 = fully exposed)
        - location_type: (0 = studio, 1 = park, 2 = home, 3 = bathroom, 4 = public/open space)
        - personal_spaces: (0 = public/open, 1 = private)
        - family_and_friends: (0 = no, 1 = yes) Are family/friends present?
        - camera_focus: (0 = person, 1 = product)
        - lighting_and_ambiance: (0 = natural, 1 = artificial)
        - race_ethnicity: (0 = white, 1 = black, 2 = hispanic, 3 = asian, 4 = other)
        - skin_color: (0 = light, 1 = medium, 2 = dark)
        - hair_color: (Provide a descriptive label, e.g., "brown", "blonde", "black", "red", "gray", "other")
        - hair_style: (Provide a descriptive label, e.g., "long", "short", "curly", "straight", "braided", "other")
        - eye_color: (Provide a descriptive label, e.g., "brown", "blue", "green", "hazel", "other")
        - physical_build_and_fitness_cues: (0 = Not applicable, 1 = Slim, 2 = Athletic, 3 = Curvy, 4 = Muscular)
        - gender: (0 = Male, 1 = Female, 2 = Not applicable)
        """
        response4 = client.models.generate_content(
            model=model,
            contents=[prompt_part4, img])
        variable_extraction = response4.text
        print(f"Variable Extraction:\n{variable_extraction}\n---")

        # The variable extraction is returned as raw text; it is not parsed into separate values here.

        # Sleep for a bit to avoid rate limiting
        time.sleep(20)
        return detail_description, scene_description, person_description, product_description, variable_extraction

    except Exception as e:
        print(f"Error during analysis: {e}")
        # str(e) holds only the exception message, not the "Error during analysis:" prefix printed above
        if "429 RESOURCE_EXHAUSTED" in str(e):
            print("Rate limit exceeded. Waiting 60 seconds before trying again.")
            # Wait, then retry the same image
            time.sleep(60)
            return analyze_screenshot(image_path, model)
        return None, None, None, None, None
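
The function returns the variable extraction as free text, so the values still have to be pulled out before they can fill the coded columns. The sketch below shows one possible way, assuming the model answers with one `name: value` line per variable, optionally with list bullets or markdown emphasis; the real output format may differ. `VARIABLE_NAMES`, `parse_variable_extraction`, `to_code` and the sample text are all hypothetical.

```python
import re

# Variable names requested in the extraction prompt
VARIABLE_NAMES = [
    "show_face", "show_body", "show_product", "show_how_to_use_product",
    "clothes", "skin_exposure", "location_type", "personal_spaces",
    "family_and_friends", "camera_focus", "lighting_and_ambiance",
    "race_ethnicity", "skin_color", "hair_color", "hair_style", "eye_color",
    "physical_build_and_fitness_cues", "gender",
]


def parse_variable_extraction(text):
    """Map each known variable name to the raw value text found after its colon."""
    extracted = {}
    for line in text.splitlines():
        # Drop bullets and markdown emphasis so "* **show_face:** 1" and "- show_face: 1" both match
        cleaned = line.strip().lstrip("-* ").replace("*", "").replace("`", "")
        match = re.match(r"(\w+)\s*:\s*(.+)", cleaned)
        if match and match.group(1) in VARIABLE_NAMES:
            extracted[match.group(1)] = match.group(2).strip().strip('"')
    return extracted


def to_code(value):
    """Return the leading integer of a value such as '0 (not visible)', or None if there is none."""
    match = re.match(r"\d+", value)
    return int(match.group()) if match else None


# Made-up sample in the assumed format, not real model output
sample = """Here are the variables:
- show_face: 1
* **show_product:** 0 (not visible)
- hair_color: "brown"
- unknown_key: 5
"""
parsed = parse_variable_extraction(sample)
print(parsed)
print(to_code(parsed["show_product"]))
```

Lines whose key is not a known variable are ignored, and descriptive labels such as hair color are kept as text because the prompt asks for labels rather than codes for those variables.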
 

In [3]:
import glob

CACHE_DIR = "cache/csv_files"

csv_matched = glob.glob(os.path.join(CACHE_DIR, "cleaned_ytb_search_combined_*matched_keywords_with_metadata.csv"))
csv_matched

['cache/csv_files\\cleaned_ytb_search_combined_results_apple_airtag_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_cosrx_snail_mucin_serum_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_liquid_death_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_lululemon_align_pants_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_macbook_pro_m4_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_meta_quest_3_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_samsung_galaxy_watch_7_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_samsung_oled_tv_matched_keywords_with_metadata.csv',
 'cache/csv_files\\cleaned_ytb_search_combined_results_stanley_tumbler_matched_keywords_with_metadata.csv',
 'cache/csv_files

In [4]:
search_products = ['Apple AirTag',
 'COSRX Snail Mucin Serum',
 'Liquid Death',
 'Lululemon Align Pants',
 'Macbook Pro M4',
 'Meta quest 3',
 'Samsung Galaxy Watch 7',
 'Samsung OLED TV',
 'Stanley tumbler',
 'Tesla Model Y']
search_products.sort()

In [5]:
image_folders = ['../images/' + product.lower().replace(' ', '_') for product in search_products]
image_folders

['../images/apple_airtag',
 '../images/cosrx_snail_mucin_serum',
 '../images/liquid_death',
 '../images/lululemon_align_pants',
 '../images/macbook_pro_m4',
 '../images/meta_quest_3',
 '../images/samsung_galaxy_watch_7',
 '../images/samsung_oled_tv',
 '../images/stanley_tumbler',
 '../images/tesla_model_y']

# Test on Tech videos: Macbook Pro M4

In [7]:
tech_csv = csv_matched[4]
tech_image_folder = image_folders[4]
tech_csv, tech_image_folder

('cache/csv_files\\cleaned_ytb_search_combined_results_macbook_pro_m4_matched_keywords_with_metadata.csv',
 '../images/macbook_pro_m4')
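
Picking the file with `csv_matched[4]` depends on the order in which `glob` happens to return the paths, which is not guaranteed to be sorted. A sketch of a name-based lookup follows; `find_by_product` is a hypothetical helper, and it assumes every file name contains the same slug that is used above to build `image_folders`.

```python
import os


def find_by_product(paths, product):
    """Return the first path whose file name contains the product's slug, else None."""
    # Same slug rule as the image folders: lower case, spaces replaced by underscores
    slug = product.lower().replace(" ", "_")
    for path in sorted(paths):
        # Normalise Windows separators so basename behaves the same on any OS
        name = os.path.basename(path.replace("\\", "/"))
        if slug in name:
            return path
    return None


# Two of the paths listed in the output above
example_paths = [
    "cache/csv_files\\cleaned_ytb_search_combined_results_apple_airtag_matched_keywords_with_metadata.csv",
    "cache/csv_files\\cleaned_ytb_search_combined_results_macbook_pro_m4_matched_keywords_with_metadata.csv",
]
print(find_by_product(example_paths, "Macbook Pro M4"))
```

With the real list, `find_by_product(csv_matched, "Macbook Pro M4")` would take the place of the hard-coded index.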

In [10]:
import pandas as pd
tech_df = pd.read_csv(tech_csv)
tech_df.head()

Unnamed: 0,product_type,video_id,video_url,video_title,video_description,video_published,video_thumbnail,description,channelTitle,publishTime,...,publish_month,fuzzy_ratio_title,matched_title,fuzzy_ratio_description,matched_description,merged_screenshot,video_chapters,heatmap,automatic_captions,transcript_screenshot_not_null
0,Macbook Pro M4,G0cmfY7qdmY,https://www.youtube.com/watch?v=G0cmfY7qdmY,MacBook Pro Announcement - October 30,Watch the special Apple announcement to learn ...,2024-10-30 15:11:07+00:00,https://i.ytimg.com/vi/G0cmfY7qdmY/default.jpg,Watch the special Apple announcement to learn ...,Apple,2024-10-30T15:11:07Z,...,2024-10,86,1,86,1,[{'url': 'https://i.ytimg.com/sb/G0cmfY7qdmY/s...,,"[{'start_time': 0.0, 'end_time': 10.13, 'value...",[{'url': 'https://manifest.googlevideo.com/api...,1
1,Macbook Pro M4,HN-WH7C4K0Q,https://www.youtube.com/watch?v=HN-WH7C4K0Q,Here&#39;s the Thing about the M3 Macbook Air...,Usually it's the easiest laptop in the world r...,2024-03-20 21:20:58+00:00,https://i.ytimg.com/vi/HN-WH7C4K0Q/default.jpg,Usually it's the easiest laptop in the world r...,Marques Brownlee,2024-03-20T21:20:58Z,...,2024-03,64,1,29,0,[{'url': 'https://i.ytimg.com/sb/HN-WH7C4K0Q/s...,"[{'start_time': 0.0, 'title': 'Intro', 'end_ti...","[{'start_time': 0.0, 'end_time': 5.59, 'value'...",[{'url': 'https://manifest.googlevideo.com/api...,1
2,Macbook Pro M4,ZWgr7qP6yhY,https://www.youtube.com/watch?v=ZWgr7qP6yhY,Space Black M3 Max MacBook Pro Review: We Can ...,The new matte black MacBook Pro is more capabl...,2023-11-09 23:48:45+00:00,https://i.ytimg.com/vi/ZWgr7qP6yhY/default.jpg,The new matte black MacBook Pro is more capabl...,Marques Brownlee,2023-11-09T23:48:45Z,...,2023-11,86,1,86,1,[{'url': 'https://i.ytimg.com/sb/ZWgr7qP6yhY/s...,,"[{'start_time': 0.0, 'end_time': 5.36, 'value'...",[{'url': 'https://manifest.googlevideo.com/api...,1
3,Macbook Pro M4,1TCuf_Qcfv8,https://www.youtube.com/watch?v=1TCuf_Qcfv8,15&quot; MacBook Air M2 Review: The Obvious Th...,The bigger MacBook Air made too much sense. 15...,2023-06-12 13:00:43+00:00,https://i.ytimg.com/vi/1TCuf_Qcfv8/default.jpg,The bigger MacBook Air made too much sense.\n\...,Marques Brownlee,2023-06-12T13:00:43Z,...,2023-06,79,1,79,1,[{'url': 'https://i.ytimg.com/sb/1TCuf_Qcfv8/s...,"[{'start_time': 0.0, 'title': 'Intro', 'end_ti...","[{'start_time': 0.0, 'end_time': 4.91, 'value'...","[{'ext': 'json3', 'url': 'https://www.youtube....",1
4,Macbook Pro M4,9HQx5pgUoiY,https://www.youtube.com/watch?v=9HQx5pgUoiY,M4 Max MacBook Pro: I&#39;m Convinced!,M4 generation has created a gap from M1. This ...,2024-11-18 17:30:04+00:00,https://i.ytimg.com/vi/9HQx5pgUoiY/default.jpg,M4 generation has created a gap from M1. This ...,Marques Brownlee,2024-11-18T17:30:04Z,...,2024-11,86,1,86,1,[{'url': 'https://i.ytimg.com/sb/9HQx5pgUoiY/s...,,"[{'start_time': 0.0, 'end_time': 7.51, 'value'...",[{'url': 'https://manifest.googlevideo.com/api...,1


In [11]:
demo_ids = tech_df.video_id[:10]
demo_ids

0    G0cmfY7qdmY
1    HN-WH7C4K0Q
2    ZWgr7qP6yhY
3    1TCuf_Qcfv8
4    9HQx5pgUoiY
5    Al5jdKG5Yhc
6    UFV6wukB_Rg
7    b4x8boB2KdI
8    ijTNBZszvJ0
9    8yEUAeBIz8w
Name: video_id, dtype: object

In [12]:
for video_id in demo_ids:  # avoid shadowing the built-in id()
    print(video_id)
    # find all screenshots for this video
    screenshots = glob.glob(os.path.join(tech_image_folder, video_id + "*"))
    for screenshot in screenshots:
        print(screenshot)

G0cmfY7qdmY
../images/macbook_pro_m4\G0cmfY7qdmY_1.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_10.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_11.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_12.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_2.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_3.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_4.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_5.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_6.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_7.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_8.jpg
../images/macbook_pro_m4\G0cmfY7qdmY_9.jpg
HN-WH7C4K0Q
../images/macbook_pro_m4\HN-WH7C4K0Q_1.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_10.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_11.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_12.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_13.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_2.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_3.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_4.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_5.jpg
../images/macbook_pro_m4\HN-WH7C4K0Q_6.jpg
../images/macbook_pro_m
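
The listing above is in plain string order, so `_10`, `_11` and `_12` come before `_2`. If the screenshots should be processed in frame order, they can be sorted by the number at the end of the file name. This is a minimal sketch that assumes the `<video_id>_<number>.jpg` naming seen above; `frame_number` is a hypothetical helper.

```python
import re


def frame_number(path):
    """Return the integer after the last underscore, e.g. 10 for 'G0cmfY7qdmY_10.jpg'."""
    # Anchored at the end so underscores inside the video ID are ignored
    match = re.search(r"_(\d+)\.\w+$", path)
    return int(match.group(1)) if match else -1


# Three of the paths from the listing above, in the order glob returned them
example_screenshots = [
    "../images/macbook_pro_m4\\G0cmfY7qdmY_1.jpg",
    "../images/macbook_pro_m4\\G0cmfY7qdmY_10.jpg",
    "../images/macbook_pro_m4\\G0cmfY7qdmY_2.jpg",
]
ordered = sorted(example_screenshots, key=frame_number)
for path in ordered:
    print(path)
```

Paths that do not match the pattern get -1 and therefore sort first, which makes unexpected file names easy to spot.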

In [25]:
# test with 1 image
# Raw string so the Windows backslash is not read as an escape sequence
test_img = r'../images/macbook_pro_m4\G0cmfY7qdmY_1.jpg'
detail_desc, scene_desc, person_desc, product_desc, variable_desc = analyze_screenshot(test_img)
detail_desc, scene_desc, person_desc, product_desc, variable_desc




Detailed Description:
Here is a detailed description of the image:

**Scene:**

The image is a collage of shots, likely from a YouTube product review video. The dominant scene appears to be set in a modern, architecturally striking space with a large, rounded window offering a view of a green landscape and a contemporary building. The interior is minimalist, with sleek white walls and a bright, airy feel. Other scenes include one showing a car driving through a tunnel and a full shot of the YouTuber in the same location.

**Person (YouTuber):**

The YouTuber is a man with short, dark hair, wearing a dark T-shirt and what looks like a smartwatch on his left wrist. He appears in several of the shots, gesturing and speaking directly to the camera. He stands in front of the window, and in one shot, he is standing on what looks like a white podium.

**Products Visible:**

- Apple Macbook pro with wallpaper on the screen. The computer is silver.

**Action/Activity:**

The YouTuber is present

('Here is a detailed description of the image:\n\n**Scene:**\n\nThe image is a collage of shots, likely from a YouTube product review video. The dominant scene appears to be set in a modern, architecturally striking space with a large, rounded window offering a view of a green landscape and a contemporary building. The interior is minimalist, with sleek white walls and a bright, airy feel. Other scenes include one showing a car driving through a tunnel and a full shot of the YouTuber in the same location.\n\n**Person (YouTuber):**\n\nThe YouTuber is a man with short, dark hair, wearing a dark T-shirt and what looks like a smartwatch on his left wrist. He appears in several of the shots, gesturing and speaking directly to the camera. He stands in front of the window, and in one shot, he is standing on what looks like a white podium.\n\n**Products Visible:**\n\n- Apple Macbook pro with wallpaper on the screen. The computer is silver.\n\n**Action/Activity:**\n\nThe YouTuber is presenting 


In [29]:
columns = [
    'video_id', 'screenshot', 'detail_desc', 'scene_desc', 'person_desc', 
    'product_desc', 'variable_desc', 'show_face', 'show_body', 'show_product',
    'show_how_to_use_product', 'clothes', 'skin_exposure', 'location_type',
    'personal_spaces', 'family_and_friends', 'camera_focus', 'lighting_and_ambiance',
    'race_ethnicity', 'skin_color', 'hair_color', 'hair_style', 'eye_color',
    'physical_build_and_fitness_cues', 'gender', 'transcript'
]

# List to store one row per screenshot
rows = []

for video_id in demo_ids:  # avoid shadowing the built-in id()
    screenshots = glob.glob(os.path.join(tech_image_folder, video_id + "*"))
    for screenshot in screenshots:
        detail_desc, scene_desc, person_desc, product_desc, variable_desc = analyze_screenshot(screenshot)

        new_row = {
            'video_id': video_id,
            'screenshot': screenshot,
            'detail_desc': detail_desc,
            'scene_desc': scene_desc,
            'person_desc': person_desc,
            'product_desc': product_desc,
            'variable_desc': variable_desc
        }

        rows.append(new_row)


# Build the DataFrame once; columns not set in the rows (show_face ... transcript) stay empty
img_to_txt_df = pd.DataFrame(rows, columns=columns)
print("Columns in DataFrame:", img_to_txt_df.columns.tolist())
print("\nSample of data:")
print(img_to_txt_df.head())

Detailed Description:
Here is a detailed description of the image:

**Scene Description:**

The image appears to be a screen capture from a YouTube product review video, possibly a tech review or announcement by Apple. The screen is split into multiple segments, each presenting a different view or angle.

The background of the main segments features a modern, futuristic-looking interior with large curved windows that offer a view of an exterior landscape. There are trees and a building with a distinct architectural style visible outside.

In one segment, there's footage from what looks like a dashboard camera driving through a brightly lit tunnel. Another shows a figure standing in a large, open, white space, possibly the same set as the main segments, but from a distance to showcase the location's scale.

**YouTuber Description:**

In several segments, a man is visible. He appears to be the presenter or YouTuber. He's dressed casually in a dark t-shirt. In most segments, he stands and

In [30]:
# save to csv
img_to_txt_df.to_csv('cache/csv_files/macbook_pro_m4_image_to_text.csv', index=False)

print("CSV file saved successfully.")

# save to excel (pandas needs an Excel writer engine such as openpyxl installed)
img_to_txt_df.to_excel('cache/csv_files/macbook_pro_m4_image_to_text.xlsx', index=False)

print("Excel file saved successfully.")

CSV file saved successfully.
Excel file saved successfully.
