## Phase I Project Proposal

### YouTube Marketing Campaign Performance Analysis

Name: Melanie Yu
DS 3000
Fall 2025

### Introduction

What makes a marketing video go viral on YouTube? Businesses invest millions in video marketing campaigns, 
yet it is often unclear what actually drives engagement. For this project, I want to find out whether 
factors like video length, upload timing, or channel characteristics can predict 
a video's success in terms of views, likes, and comments. My analysis will address two key questions:
1. What video characteristics (duration, upload day, channel subscribers) 
   correlate most strongly with high engagement rates?
2. Can we classify marketing videos as "high-performing" vs "low-performing" 
   based on their metadata features?
   
These questions have practical business applications: marketing teams can optimize 
their content strategy by understanding which video characteristics drive engagement, 
and brands can better allocate their advertising budgets by predicting campaign success. 
The insights from this project could help me understand how companies improve the ROI of their video 
marketing investments.

### Data Collection

I am using the YouTube Data API to collect data on marketing campaign videos. 
The API provides access to video metadata including view counts, likes, comments, 
duration, publish dates, and channel information. The data is returned in JSON 
format and can be easily parsed into a structured dataset. The API is free to use, 
subject to a daily quota limit, and the code below shows how I access the relevant data.

I collected information on marketing-related videos by searching for terms like 
"marketing campaign," "brand advertisement," and "company promotional video." 
The data includes both numeric features (views, likes, comments, duration) and 
categorical features (channel name, publish day, category) that will be useful 
for answering my questions of interest.

The complete data collection code has already been run and the resulting dataset 
saved as a CSV file. Below I show a simplified version of the API calls and load the 
collected data to demonstrate that it meets the project requirements.

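import os
import requests
import pandas as pd
from datetime import datetime

# YouTube API setup: read the key from the environment instead of hard-coding it
# (YOUTUBE_API_KEY is an assumed variable name; set it before running this cell)
API_KEY = os.environ['YOUTUBE_API_KEY']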
BASE_URL = 'https://www.googleapis.com/youtube/v3/'

# Search for marketing videos
search_url = f'{BASE_URL}search'
params = {
    'part': 'snippet',
    'q': 'marketing campaign commercial',
    'type': 'video',
    'maxResults': 50,
    'key': API_KEY
}

response = requests.get(search_url, params=params, timeout=30)
response.raise_for_status()  # fail early on key or quota errors
search_data = response.json()

# Get video IDs
video_ids = [item['id']['videoId'] for item in search_data['items']]

# Get video details
videos_url = f'{BASE_URL}videos'
video_params = {
    'part': 'snippet,statistics,contentDetails',
    'id': ','.join(video_ids),
    'key': API_KEY
}

video_response = requests.get(videos_url, params=video_params, timeout=30)
video_response.raise_for_status()  # fail early on key or quota errors
video_data = video_response.json()

# Load and display the collected data
df = pd.read_csv('youtube_marketing_data.csv')
df.head()

| Unnamed: 0 | video_id | title | channel_name | publish_date | publish_day | view_count | like_count | comment_count | duration_seconds | category_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | El7IviLvm7s | Toshiba Corporate Video | Toshiba Australia and New Zealand | 2024-07-31 04:51:21 | Wednesday | 109780 | 204 | 0 | 67 | 28 |
| 1 | M4KX1oeJX94 | 15 Funny Commercials that will make you Laugh ... | Mining Asteroids | 2024-08-08 17:00:23 | Thursday | 112364 | 555 | 11 | 627 | 28 |
| 2 | ZP02qq8yAss | Katseye's GAP Ad Is Marketing GENIUS (Here’s Why) | Mar's Magazine | 2025-08-25 20:51:30 | Monday | 152518 | 8912 | 456 | 969 | 22 |
| 3 | IQovoot_ZUM | Coca cola Creates First Ever Drinkable Adver... | Dipdrop Branding Solution | 2015-06-30 14:18:30 | Tuesday | 989308 | 9204 | 115 | 113 | 22 |
| 4 | UFxCzPU61vU | Best Marketing Campaigns of the Last Decade: 2... | Digital Uncovered | 2020-04-25 10:50:46 | Saturday | 341569 | 3970 | 37 | 755 | 26 |
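
As a rough sketch of how one item from the `videos` response could be flattened into the columns of the saved CSV: the helper names (`duration_to_seconds`, `video_item_to_row`) are hypothetical, and `sample_item` is a hand-built stand-in that mirrors the first row of the dataset, not a captured API response. The field names (`snippet.publishedAt`, `statistics.likeCount`, `contentDetails.duration`) follow the YouTube Data API video resource. Counts arrive as strings, and `likeCount` or `commentCount` can be absent when a video hides them.

```python
import re
from datetime import datetime

# ISO 8601 durations as found in contentDetails.duration, e.g. "PT1M7S"
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(iso_duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(iso_duration)
    if match is None:
        raise ValueError(f"Unrecognized duration: {iso_duration!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def optional_int(stats, key):
    """Return the count as an int, or None when the video hides it."""
    value = stats.get(key)
    return int(value) if value is not None else None


def video_item_to_row(item):
    """Flatten one video resource into a dict matching the CSV columns."""
    snippet = item["snippet"]
    stats = item.get("statistics", {})
    # publishedAt is a UTC timestamp ending in "Z"
    published = datetime.fromisoformat(snippet["publishedAt"].replace("Z", "+00:00"))
    return {
        "video_id": item["id"],
        "title": snippet["title"],
        "channel_name": snippet["channelTitle"],
        "publish_date": published.strftime("%Y-%m-%d %H:%M:%S"),
        "publish_day": published.strftime("%A"),
        "view_count": optional_int(stats, "viewCount"),
        "like_count": optional_int(stats, "likeCount"),
        "comment_count": optional_int(stats, "commentCount"),
        "duration_seconds": duration_to_seconds(item["contentDetails"]["duration"]),
        "category_id": int(snippet["categoryId"]),
    }


# Hand-built example mirroring the first row of the dataset (not real API output)
sample_item = {
    "id": "El7IviLvm7s",
    "snippet": {
        "title": "Toshiba Corporate Video",
        "channelTitle": "Toshiba Australia and New Zealand",
        "publishedAt": "2024-07-31T04:51:21Z",
        "categoryId": "28",
    },
    "statistics": {"viewCount": "109780", "likeCount": "204", "commentCount": "0"},
    "contentDetails": {"duration": "PT1M7S"},
}

row = video_item_to_row(sample_item)
print(row["publish_day"], row["duration_seconds"])
```

Applying `video_item_to_row` to every entry in `video_data['items']` and passing the resulting list to `pd.DataFrame` would give a table with the same column names as the CSV.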


### Data Usage and Remaining Issues

The dataset collected above contains rich information about marketing videos on 
YouTube. It has multiple numeric features (view_count, like_count, comment_count, 
duration_seconds) and categorical features (channel_name, publish_day, category_id) 
that can be used to answer my questions of interest.

To address my first question, about which characteristics correlate with engagement, 
I plan to use regression analysis to predict engagement metrics (views, likes, 
comments) from video features. This will help identify which factors matter most 
for video success.

For my second question, about classifying videos as high- vs. low-performing, I will 
use classification techniques. I'll create a binary target variable based on 
engagement rates (e.g., labeling videos above the median engagement rate as "high-performing") 
and train a model to predict this category from video metadata.
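
The median split described above could be sketched as follows. The engagement rate here is assumed to be (likes + comments) / views, which is one possible definition rather than a settled choice, and `add_performance_label` is a hypothetical helper name. The five rows are the ones shown in the dataset preview, so the resulting labels are only illustrative of the mechanics.

```python
import pandas as pd

# The five rows shown in the dataset preview
sample = pd.DataFrame({
    "video_id": ["El7IviLvm7s", "M4KX1oeJX94", "ZP02qq8yAss", "IQovoot_ZUM", "UFxCzPU61vU"],
    "view_count": [109780, 112364, 152518, 989308, 341569],
    "like_count": [204, 555, 8912, 9204, 3970],
    "comment_count": [0, 11, 456, 115, 37],
})


def add_performance_label(df):
    """Add engagement_rate and a binary high_performing column (median split)."""
    out = df.copy()
    # Assumed definition: interactions per view
    out["engagement_rate"] = (out["like_count"] + out["comment_count"]) / out["view_count"]
    threshold = out["engagement_rate"].median()
    # Strictly above the median counts as high-performing (1), otherwise 0
    out["high_performing"] = (out["engagement_rate"] > threshold).astype(int)
    return out


labeled = add_performance_label(sample)
print(labeled[["video_id", "engagement_rate", "high_performing"]])
```

A median split keeps the two classes roughly balanced, which makes plain accuracy easier to interpret. The trade-off is that videos just above and just below the threshold receive different labels despite being nearly identical.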

Some data cleaning will also be needed, for example:
- Handling any missing values in like_count or comment_count (some videos disable these)
- Converting duration_seconds into more interpretable bins (short/medium/long videos)
- Potentially creating an "engagement_rate" feature that normalizes likes/comments by views
- Considering creating features like "time_since_upload" to account for video age
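
A minimal sketch of these preparation steps with pandas. The bin edges (60 and 600 seconds), the `as_of` reference date, and the names `prepare_features`, `likes_hidden`, and `comments_hidden` are all assumptions made for illustration. Missing counts are flagged rather than filled with zero, since a hidden like count is not the same as zero likes.

```python
import pandas as pd

# The five rows shown in the dataset preview
sample = pd.DataFrame({
    "publish_date": ["2024-07-31 04:51:21", "2024-08-08 17:00:23", "2025-08-25 20:51:30",
                     "2015-06-30 14:18:30", "2020-04-25 10:50:46"],
    "view_count": [109780, 112364, 152518, 989308, 341569],
    "like_count": [204, 555, 8912, 9204, 3970],
    "comment_count": [0, 11, 456, 115, 37],
    "duration_seconds": [67, 627, 969, 113, 755],
})


def prepare_features(df, as_of):
    """Return a copy of df with the derived columns described in the list above."""
    out = df.copy()
    # Flag videos whose likes or comments are unavailable instead of imputing zeros
    out["likes_hidden"] = out["like_count"].isna()
    out["comments_hidden"] = out["comment_count"].isna()
    # Assumed cut points: up to 1 minute, up to 10 minutes, longer
    out["duration_bin"] = pd.cut(
        out["duration_seconds"],
        bins=[0, 60, 600, float("inf")],
        labels=["short", "medium", "long"],
        include_lowest=True,
    )
    # Stays NaN whenever either count is missing
    out["engagement_rate"] = (out["like_count"] + out["comment_count"]) / out["view_count"]
    # Whole days between upload and the chosen reference date
    out["publish_date"] = pd.to_datetime(out["publish_date"])
    out["time_since_upload"] = (pd.Timestamp(as_of) - out["publish_date"]).dt.days
    return out


features = prepare_features(sample, as_of="2025-09-01")
print(features[["duration_bin", "engagement_rate", "time_since_upload"]])
```

Using the collection date as `as_of` would keep `time_since_upload` consistent with the view counts, which were a snapshot at that moment.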
  
Once the data is properly prepared, supervised machine learning methods like linear regression (for predicting 
engagement metrics) and logistic regression or decision trees (for classification) 
should be well-suited to answer these questions.
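
To make the classification plan concrete, here is a hedged sketch using scikit-learn. It runs on SYNTHETIC placeholder rows with random labels, generated only so the code executes, so the printed accuracies say nothing about the real dataset. The feature names follow the columns discussed above; `make_placeholder_data` and `fit_and_score` are hypothetical helpers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

NUMERIC = ["duration_seconds", "time_since_upload"]
CATEGORICAL = ["publish_day"]


def make_placeholder_data(n_rows=200, seed=0):
    """SYNTHETIC rows with random labels; a stand-in for the prepared dataset."""
    rng = np.random.default_rng(seed)
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    return pd.DataFrame({
        "duration_seconds": rng.integers(15, 1200, size=n_rows),
        "time_since_upload": rng.integers(1, 3650, size=n_rows),
        "publish_day": rng.choice(days, size=n_rows),
        "high_performing": rng.integers(0, 2, size=n_rows),
    })


def build_preprocessor():
    """Scale numeric columns and one-hot encode the categorical one."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def fit_and_score(df, seed=0):
    """Fit both classifiers on a train split and return held-out accuracy."""
    X = df[NUMERIC + CATEGORICAL]
    y = df["high_performing"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "logistic_regression": make_pipeline(
            build_preprocessor(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": make_pipeline(
            build_preprocessor(), DecisionTreeClassifier(max_depth=3, random_state=seed)
        ),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


scores = fit_and_score(make_placeholder_data())
print(scores)
```

Because the placeholder labels are random, accuracy near 0.5 is the expected outcome here. The regression question could follow the same pipeline pattern, with `LinearRegression` swapped in and a numeric engagement metric as the target.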
