In [228]:
%%html
<style>
.jp-Cell:nth-child(1) .jp-InputArea,
.jp-Cell:nth-child(3) .jp-InputArea,
.jp-Cell:nth-child(18) .jp-InputArea {
        display: none !important;
    }
</style>

# 🎵 Going Viral: Analyzing TikTok Success with Data Science

In this lesson, we'll explore what makes TikTok videos go viral using data science! We'll analyze a simulated dataset built to mimic common social media engagement patterns and learn how to plan content for better reach.

## 📚 Prerequisites

- Basic Python knowledge
- Installed libraries: pandas, matplotlib, seaborn, numpy

## 🎯 Learning Objectives

- Analyze TikTok engagement patterns using a simulated dataset
- Understand the relationship between posting time and video success
- Identify which content categories perform best
- Learn how video characteristics affect engagement
- Create data-driven content strategies

## 📊 Part 1: Exploratory Data Analysis
First, we'll examine our simulated TikTok dataset, which contains information about:

- Views, likes, comments, and shares
- Video duration
- Posting time
- Music usage
- Hashtag count
- Content category

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import numpy as np
from IPython.display import display, Javascript

# Enhanced dataset with categories
categories = ['Dance', 'Comedy', 'Tutorial', 'Lifestyle', 'Food', 'Gaming', 'Music', 'Fashion', 'Sports', 'News']

# Create base data
np.random.seed(42)  # For reproducibility
n_samples = 100

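# Generate video duration pattern: shorter videos get higher engagement
duration_sec = np.random.uniform(5, 60, n_samples)
base_engagement = 20 - (0.3 * duration_sec)
noise = np.random.normal(0, 2, n_samples)
# Clip so the noise can't push a rate below zero (negative engagement is impossible)
engagement_rate = np.clip(base_engagement + noise, 0.1, None)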

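# Generate time posted data: relative weight for each hour of the day
# (slice ends are exclusive, so [19:22] covers hours 19, 20 and 21)
hour_weights = np.ones(24) * 0.5
hour_weights[19:22] = 8.0  # 19:00-21:59 prime time
hour_weights[15:18] = 4.0  # 15:00-17:59 after school/work
hour_weights[12:14] = 3.0  # 12:00-13:59 lunch break
hour_weights[2:6] = 0.1    # 02:00-05:59 overnight
# Normalize weights so they sum to 1, as np.random.choice requires
hour_weights = hour_weights / hour_weights.sum()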

data = {
    'video_id': range(1, n_samples + 1),
    'duration_sec': duration_sec,
    'post_hour': np.random.choice(range(24), size=n_samples, p=hour_weights),
    'has_music': np.random.choice([True, False], size=n_samples, p=[0.8, 0.2]),
    'hashtag_count': np.random.randint(0, 8, n_samples),
    'category': np.random.choice(categories, size=n_samples),
    'engagement_rate': engagement_rate
}

df = pd.DataFrame(data)

# Views multiplier by posting hour (Series.between is inclusive on both ends)
conditions = [
    df['post_hour'].between(19, 21),
    df['post_hour'].between(15, 17),
    df['post_hour'].between(12, 13),
    df['post_hour'].between(2, 5),
]
time_multiplier = np.select(conditions, [2.5, 1.8, 1.5, 0.3], default=1.0)

# Simulate whole-number counts; comments and shares scale with engagement at lower rates than likes
views = np.random.exponential(scale=100000, size=n_samples) * time_multiplier
df['views'] = np.maximum(views.round(0), 1)  # at least 1 view, so the rate below never divides by zero
df['likes'] = (df['views'] * df['engagement_rate'] / 100 * np.random.uniform(0.8, 1.2, n_samples)).round(0)
df['comments'] = (df['views'] * df['engagement_rate'] / 500 * np.random.uniform(0.8, 1.2, n_samples)).round(0)
df['shares'] = (df['views'] * df['engagement_rate'] / 200 * np.random.uniform(0.8, 1.2, n_samples)).round(0)

# Recompute engagement rate from the counts so it matches the Key Metrics definition:
# (likes + comments + shares) / views * 100
df['engagement_rate'] = ((df['likes'] + df['comments'] + df['shares']) / df['views'] * 100).round(2)

print("Dataset Preview:")
print(df.head())

print("\nSummary Statistics:")
print(df[['views', 'engagement_rate', 'duration_sec']].describe())
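
Because the dataset is simulated, it's worth a quick check that the generated values look sensible before analyzing them. The helper below is a hypothetical addition (not part of the original lesson, and the name `check_simulated_data` is our own) that reports the row count, the lowest engagement rate, the share of videos posted in the 19:00-21:59 window the simulation favors, and the number of missing values.

```python
import pandas as pd

def check_simulated_data(df: pd.DataFrame) -> dict:
    """Hypothetical sanity checks for the simulated TikTok dataset."""
    prime_time = df["post_hour"].between(19, 21)  # inclusive: hours 19, 20, 21
    return {
        "rows": len(df),
        "min_engagement_rate": float(df["engagement_rate"].min()),
        "prime_time_share": float(prime_time.mean()),  # fraction of True values
        "missing_values": int(df.isna().sum().sum()),
    }
```

In the notebook, call `check_simulated_data(df)`. A negative `min_engagement_rate` or any missing values would mean the generation code needs another look.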

## 🔍 Key Metrics
We'll focus on these engagement metrics:

- Views
- Likes
- Comments
- Shares
- Engagement Rate (calculated as: (likes + comments + shares) / views * 100)
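
The engagement-rate formula translates directly into pandas column arithmetic. A minimal sketch: `compute_engagement_rate` is a hypothetical helper name, and rows with zero views are mapped to NaN instead of dividing by zero. For example, 1,000 views with 80 likes, 5 comments and 15 shares gives (80 + 5 + 15) / 1000 × 100 = 10.0%.

```python
import numpy as np
import pandas as pd

def compute_engagement_rate(df: pd.DataFrame) -> pd.Series:
    """(likes + comments + shares) / views * 100, rounded to 2 decimals."""
    interactions = df["likes"] + df["comments"] + df["shares"]
    views = df["views"].replace(0, np.nan)  # zero views -> NaN rate instead of a division error
    return (interactions / views * 100).round(2)
```

Usage in the notebook: `df["engagement_check"] = compute_engagement_rate(df)`.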

## 🎯 Activities
### Activity 1: Time Analysis
Let's discover the best times to post on TikTok!

In [230]:
# Create a box plot of views by posting hour
# Delete this line (and hashtag at the beginning!) and input your code here, then press Control+Enter to run the cell

Questions to Consider:

1. What patterns do you notice in viewing times?
2. When are engagement rates highest?
3. How might this affect your posting strategy?
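
If you get stuck on Activity 1, here is one possible approach (a sketch, not the only answer). It uses pandas' `DataFrame.boxplot` with `by=` to draw one box per posting hour, and also returns the median views per hour so you can answer the questions with numbers. The function name `plot_views_by_hour` is our own; with seaborn, `sns.boxplot(data=df, x="post_hour", y="views")` draws a similar chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_views_by_hour(df: pd.DataFrame) -> pd.Series:
    """Box plot of views per posting hour; returns median views per hour, highest first."""
    fig, ax = plt.subplots(figsize=(12, 5))
    df.boxplot(column="views", by="post_hour", ax=ax, grid=False)
    fig.suptitle("")  # drop the automatic "Boxplot grouped by post_hour" title
    ax.set_title("Views by posting hour")
    ax.set_xlabel("Hour posted (0-23)")
    ax.set_ylabel("Views")
    return df.groupby("post_hour")["views"].median().sort_values(ascending=False)
```

Try `plot_views_by_hour(df).head()`. Medians are used because view counts are heavily skewed, so a few huge videos would dominate a mean. Hours with no posts simply don't appear.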

### Activity 2: Category Analysis
Different types of content perform differently. Let's analyze which categories get the most engagement!

In [None]:
# Create a bar plot of engagement rates by category
# Delete this line (and hashtag at the beginning!) and input your code here, then press Control+Enter to run the cell

Discussion Points:

1. Which categories have the highest engagement rates?
2. Why might certain categories perform better?
3. How could this inform content strategy?
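
One possible solution for Activity 2 is sketched below (the function name is our own). It averages the engagement rate within each category and draws the result as a sorted bar chart. Note that in this simulation the category is assigned at random and does not feed into engagement, so any gaps between bars come from chance; that is a useful talking point for question 2.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_engagement_by_category(df: pd.DataFrame) -> pd.Series:
    """Bar chart of mean engagement rate per category; returns the means, highest first."""
    means = df.groupby("category")["engagement_rate"].mean().sort_values(ascending=False)
    fig, ax = plt.subplots(figsize=(10, 5))
    means.plot.bar(ax=ax, color="tab:pink")
    ax.set_xlabel("Category")
    ax.set_ylabel("Mean engagement rate (%)")
    ax.set_title("Engagement rate by content category")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return means
```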

### Activity 3: Video Duration Analysis
Does video length matter? Let's find out!

In [None]:
# Create a scatter plot of duration vs engagement
# Delete this line (and hashtag at the beginning!) and input your code here, then press Control+Enter to run the cell

Analysis Questions:

1. Is there an optimal video length?
2. How does duration affect different types of content?
3. What length would you recommend for your content?
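
For Activity 3, a scatter plot plus a least-squares trend line makes the direction of the relationship easy to see. The sketch below (function name our own) fits a straight line with `np.polyfit(..., deg=1)` and returns its slope and intercept; seaborn's `sns.regplot(data=df, x="duration_sec", y="engagement_rate")` gives a similar picture in one call.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_duration_vs_engagement(df: pd.DataFrame) -> tuple[float, float]:
    """Scatter duration against engagement with a least-squares line; return (slope, intercept)."""
    x = df["duration_sec"].to_numpy(dtype=float)
    y = df["engagement_rate"].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # straight-line fit: y ≈ slope * x + intercept

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(x, y, alpha=0.6, label="videos")
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red", label=f"trend: {slope:.2f} per second")
    ax.set_xlabel("Duration (seconds)")
    ax.set_ylabel("Engagement rate (%)")
    ax.legend()
    return float(slope), float(intercept)
```

A straight line can only show "longer is better" or "shorter is better". To look for an optimal length in the middle, try `deg=2` or group durations into bins with `pd.cut` and compare the averages.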

### Activity 4: Correlation Analysis
Let's see how different metrics relate to each other.

In [None]:
# Create a correlation heatmap
# Delete this line (and hashtag at the beginning!) and input your code here, then press Control+Enter to run the cell

Think About:

1. Which metrics are most strongly correlated?
2. What insights can we draw from these relationships?
3. How could this inform content creation?
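
For Activity 4, a correlation heatmap colors each pair of columns by its Pearson coefficient (from -1 to 1). The sketch below uses plain matplotlib; with seaborn, `sns.heatmap(corr, annot=True, cmap="coolwarm")` does the same in one call. The default column list is our own choice: `has_music` is left out because it is boolean (add `df["has_music"].astype(int)` as a column if you want it). Because likes, comments and shares are generated by multiplying views, expect them to track views closely.

```python
import matplotlib.pyplot as plt
import pandas as pd

DEFAULT_COLUMNS = ("views", "likes", "comments", "shares", "engagement_rate",
                   "duration_sec", "hashtag_count", "post_hour")

def plot_correlation_heatmap(df: pd.DataFrame, columns=DEFAULT_COLUMNS) -> pd.DataFrame:
    """Pearson correlation heatmap of the given numeric columns; returns the matrix."""
    corr = df[list(columns)].corr()
    fig, ax = plt.subplots(figsize=(8, 7))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    # Write each coefficient inside its cell
    for i in range(len(corr.index)):
        for j in range(len(corr.columns)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=8)
    fig.colorbar(im, ax=ax, label="Pearson r")
    ax.set_title("Correlation between metrics")
    fig.tight_layout()
    return corr
```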

## 🎮 Interactive Challenge: Design Your TikTok!
Now it's your turn to apply what you've learned! Design your own TikTok video concept and see how a simple scoring system rates it.

Your video design should consider:

- Duration
- Number of hashtags
- Music usage
- Posting time
- Content category

In [None]:
# To design your video, select this cell, open the "Run" menu at the top of your screen
# and choose "Run Selected Cell and All Below"

In [None]:
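def predict_success(duration, hashtags, has_music, hour, category):
    """
    Score a video concept out of 100 using hand-picked rules of thumb.

    The point values below are illustrative heuristics, not weights fitted
    to the dataset analyzed above.
    """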
    score = 0
    
    # Duration factor (videos between 15-30 seconds get bonus)
    if 15 <= duration <= 30:
        score += 25
    elif 30 < duration <= 45:
        score += 15
    else:
        score += 10
        
    # Hashtag factor (3-5 hashtags optimal)
    if 3 <= hashtags <= 5:
        score += 20
    else:
        score += 10
        
    # Music factor (adjusted for News category)
    if has_music and category != 'News':
        score += 20
    elif has_music and category == 'News':
        score += 10  # Less important for news content
    
    # Time factor (adjusted for News and Sports)
    if category == 'News':
        # News content can perform well during morning and evening hours
        if 7 <= hour <= 9 or 18 <= hour <= 22:
            score += 20
        else:
            score += 10
    elif category == 'Sports':
        # Sports content often performs best during/after game times
        if 19 <= hour <= 23:  # Evening games/highlights
            score += 20
        else:
            score += 15
    else:
        # Standard timing for other categories
        if 18 <= hour <= 22:
            score += 20
        elif 12 <= hour <= 17:
            score += 15
        else:
            score += 10
        
    # Category factor
    category_scores = {
        'Dance': 15,
        'Comedy': 15,
        'Tutorial': 12,
        'Lifestyle': 10,
        'Food': 12,
        'Gaming': 10,
        'Music': 15,
        'Fashion': 12,
        'Sports': 15,
        'News': 13
    }
    score += category_scores.get(category, 10)
    
    return score

def get_user_video_idea():
    print("\n🎥 Let's design your TikTok video!")
    
    duration = float(input("How long will your video be (in seconds)? "))
    hashtags = int(input("How many hashtags will you use? "))
    has_music = input("Will your video have music? (yes/no) ").lower().startswith('y')
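    hour = int(input("What hour will you post? (0-23) "))
    while not 0 <= hour <= 23:
        hour = int(input("Please enter an hour from 0 to 23: "))
    
    print("\nCategories:", ', '.join(categories))
    category = input("What category is your video? ").strip().capitalize()
    while category not in categories:
        category = input(f"Please choose one of: {', '.join(categories)} ").strip().capitalize()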
    
    return {
        'duration': duration,
        'hashtags': hashtags,
        'has_music': has_music,
        'hour': hour,
        'category': category
    }

# Run the interactive challenge
video_idea = get_user_video_idea()
score = predict_success(**video_idea)
print(f"\nYour video score is: {score}/100")
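
When `predict_success` is called directly, it doesn't check its arguments: an unrecognized category silently falls back to 10 points through `category_scores.get(category, 10)`, and an hour outside 0-23 lands in the lowest time bracket. The hypothetical helper below lists the problems with a video idea dictionary before you score it; an empty list means the idea looks valid.

```python
def validate_video_idea(idea: dict, categories: list[str]) -> list[str]:
    """Return a list of problems with a video idea (empty list means it looks valid)."""
    problems = []
    if idea["duration"] <= 0:
        problems.append("duration must be a positive number of seconds")
    if idea["hashtags"] < 0:
        problems.append("hashtags cannot be negative")
    if not isinstance(idea["has_music"], bool):
        problems.append("has_music must be True or False")
    if not (isinstance(idea["hour"], int) and 0 <= idea["hour"] <= 23):
        problems.append("hour must be a whole number from 0 to 23")
    if idea["category"] not in categories:
        problems.append(f"category must be one of: {', '.join(categories)}")
    return problems
```

In the notebook: `problems = validate_video_idea(video_idea, categories)`, then only call `predict_success(**video_idea)` when `problems` is empty.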

## 📝 Group Activities
### Team Challenges

1. Form teams and compete to create the highest-scoring video concept
2. Analyze real trending TikToks and compare them to our predictions
3. Create a presentation about why certain categories perform better
4. Design a marketing strategy based on the data analysis

### Discussion Questions

1. Why do you think certain categories perform better than others?
2. How might these patterns change during different seasons or events?
3. What other factors might influence a video's success?
4. How could we improve our prediction model?

## 📈 Extended Learning

- Research how trends have changed over time
- Compare findings with other social media platforms
- Consider how cultural events affect engagement patterns
- Explore machine learning approaches to predict video success
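
As a first step beyond the hand-picked scores in `predict_success`, you could let the data choose the weights. The sketch below fits an ordinary least-squares linear regression with `np.linalg.lstsq`, one-hot encoding the category with `pd.get_dummies`. The function name and feature list are our own choices. `post_hour` is left out because in this simulation posting time changes views, not engagement rate; duration is the only feature the simulation ties to engagement, so it's the coefficient to look at first.

```python
import numpy as np
import pandas as pd

FEATURES = ["duration_sec", "hashtag_count", "has_music", "category"]

def fit_engagement_model(df: pd.DataFrame, target: str = "engagement_rate") -> pd.Series:
    """Ordinary least-squares fit of the target on simple video features; returns coefficients."""
    # One-hot encode category; drop_first avoids perfect collinearity with the intercept
    X = pd.get_dummies(df[FEATURES], columns=["category"], drop_first=True, dtype=float)
    X = X.astype(float)  # has_music True/False -> 1.0/0.0
    X.insert(0, "intercept", 1.0)
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
    return pd.Series(coef, index=X.columns)
```

Run `fit_engagement_model(df).round(3)`. Each coefficient is the estimated change in engagement rate per unit of that feature; category coefficients are relative to the category dropped by `drop_first=True` (the alphabetically first one).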

## 🎓 Assessment Ideas

- Create a data-driven content strategy
- Design and analyze a hypothetical viral campaign
- Present findings about optimal posting strategies
- Develop a prediction model for video success

Remember: Social media trends change constantly! These patterns might shift over time, so always stay current with your analysis.