# Netflix Viewing History - Machine Learning Analysis

In this notebook, we'll apply four machine learning techniques to a Netflix viewing history export:
1. Content-based recommendation system
2. Viewing pattern clustering
3. Next viewing time prediction
4. Binge-watching analysis

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Import our ML module
import sys
sys.path.append('../src')
from ml_models import NetflixMLAnalyzer

# Configure plotting
%matplotlib inline
plt.style.use('default')
sns.set_theme()

# Load and prepare data
df = pd.read_csv('../data/NetflixViewingHistory.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Initialize our ML analyzer
ml_analyzer = NetflixMLAnalyzer()
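
The later sections plot hour of day, which only carries information if the `Date` column includes a time component. The sketch below is a quick sanity check for that. `has_time_of_day` is a hypothetical helper, not part of `ml_models`, and the sample values are invented for illustration.

```python
import pandas as pd

def has_time_of_day(dates: pd.Series) -> bool:
    """Return True if any timestamp has a non-midnight time component."""
    dates = dates.dropna()
    # normalize() resets each timestamp to midnight; any difference means a time is present
    return bool((dates != dates.dt.normalize()).any())

# Illustrative values only (not taken from a real export)
date_only = pd.to_datetime(pd.Series(["2023-01-15", "2023-01-16"]))
with_time = pd.to_datetime(pd.Series(["2023-01-15 21:30:00", "2023-01-16 08:05:00"]))

print(has_time_of_day(date_only))
print(has_time_of_day(with_time))
```

If `has_time_of_day(df['Date'])` is false for your file, every `dt.hour` value will be 0 and the hour-based plots below will collapse onto a single column.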

## 1. Content-Based Recommendation System

We'll start with a simple recommendation system that suggests similar content based on titles you've watched.

In [None]:
# Get your most recently watched show
latest_show = df.sort_values('Date', ascending=False).iloc[0]['Title']
print(f"Finding shows similar to: {latest_show}\n")

# Get recommendations
similar_shows = ml_analyzer.find_similar_content(df, latest_show)
print("Recommended shows based on your recent watching:")
print(similar_shows)
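
The internals of `find_similar_content` are not shown in this notebook. One common way to build a title-based recommender is to compare bag-of-words vectors with cosine similarity, and the sketch below does that using only the standard library. `tokenize`, `cosine_similarity`, and `most_similar` are illustrative names, and the titles are invented sample data.

```python
import math
import re
from collections import Counter

def tokenize(title: str) -> Counter:
    """Lower-case a title and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query: str, titles: list[str], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank the other distinct titles by similarity to the query, best first."""
    q = tokenize(query)
    candidates = {t for t in titles if t != query}
    scored = [(t, cosine_similarity(q, tokenize(t))) for t in candidates]
    # Sort by descending score, breaking ties alphabetically for stable output
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

sample_titles = [
    "Space Cadets: Season 1: Launch Day",
    "Space Cadets: Season 1: Orbit",
    "Baking Duel: Season 2: Bread Week",
    "Space Documentary",
]
for title, score in most_similar("Space Cadets: Season 1: Launch Day", sample_titles):
    print(f"{score:.2f}  {title}")
```

Episodes of the same show share most of their tokens, so a recommender along these lines would probably compare show names rather than raw episode titles.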

## 2. Viewing Pattern Clustering

Next, we'll use K-means clustering to identify patterns in your viewing behavior.

In [None]:
# Perform clustering
clusters = ml_analyzer.cluster_viewing_patterns(df)

# Add clusters to dataframe for visualization
df['Cluster'] = clusters

# Visualize clusters
plt.figure(figsize=(12, 6))
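# Plot one series per cluster label actually present (avoids hard-coding the cluster count)
for cluster in sorted(df['Cluster'].unique()):
    cluster_data = df[df['Cluster'] == cluster]
    plt.scatter(cluster_data['Date'].dt.hour,
                cluster_data['Date'].dt.dayofweek,
                label=f'Cluster {cluster}',
                alpha=0.6)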

plt.title('Viewing Pattern Clusters')
plt.xlabel('Hour of Day')
plt.ylabel('Day of Week')
plt.legend()
plt.grid(True)
plt.show()
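
For readers unfamiliar with K-means, the following is a minimal NumPy sketch of Lloyd's algorithm on (hour, day-of-week) pairs. It is not the implementation behind `cluster_viewing_patterns`, whose features and cluster count are not shown here. The `kmeans` function, the seed, and the sample points are assumptions made for illustration.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Minimal Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Invented sample: weekday evening sessions vs. weekend morning sessions
points = np.array([
    [20, 0], [21, 1], [22, 2], [21, 3],   # (hour, day of week)
    [9, 5], [10, 6], [11, 5], [10, 5],
], dtype=float)
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids.round(2))
```

Hour (0 to 23) and day of week (0 to 6) live on different scales and both wrap around, so scaling the features or encoding them cyclically may change which clusters emerge.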

## 3. Next Viewing Time Prediction

Now we'll predict when you're most likely to watch Netflix based on your history.

In [None]:
# Get viewing time predictions
predictions = ml_analyzer.predict_next_viewing_time(df)

# Print predictions
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
print(f"Most likely viewing day: {days[predictions['most_likely_day']]}")
print(f"Most likely viewing hour: {predictions['most_likely_hour']}:00")

# Visualize hourly probabilities
plt.figure(figsize=(12, 6))
predictions['hour_probabilities'].plot(kind='bar')
plt.title('Probability of Watching by Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Probability')
plt.tight_layout()
plt.show()
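
The way `predictions` is used above suggests that `hour_probabilities` is a Series indexed by hour. One simple way to obtain such values is the relative frequency of past viewing events per hour, sketched below. This is an assumption about the approach, not the code in `ml_models`; `hourly_probabilities` and the timestamps are illustrative.

```python
import pandas as pd

def hourly_probabilities(timestamps: pd.Series) -> pd.Series:
    """Share of viewing events that fall in each hour of the day (0-23)."""
    counts = timestamps.dt.hour.value_counts()
    # Include hours with no events so the result always has 24 entries
    counts = counts.reindex(range(24), fill_value=0)
    return counts / counts.sum()

# Invented timestamps for illustration
ts = pd.to_datetime(pd.Series([
    "2023-01-02 21:10", "2023-01-09 21:45", "2023-01-04 22:05", "2023-01-07 10:30",
]))
probs = hourly_probabilities(ts)
print(probs.idxmax())               # most frequent hour
print(ts.dt.dayofweek.mode()[0])    # most frequent weekday, Monday = 0
```

A frequency table like this describes past behavior only; it carries no notion of recency or trend.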

## 4. Binge-Watching Analysis

Finally, let's analyze your binge-watching patterns.

In [None]:
# Analyze binge-watching patterns
binge_analysis = ml_analyzer.analyze_binge_patterns(df)

print(f"Binge-watching ratio: {binge_analysis['binge_ratio']:.2%}")
print("\nTop shows you've binged:")
print(binge_analysis['top_binged_shows'])
print(f"\nAverage binge session length: {binge_analysis['average_session_length']:.1f} episodes")
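
What counts as a binge is a definition choice that this notebook does not spell out. The sketch below assumes one possible rule: a show counts as binged on a given day if at least three of its episodes were watched that day. It also assumes titles follow a `Show: Season N: Episode` pattern. `binge_summary`, the threshold, and the sample rows are hypothetical; the result keys mirror the ones printed above, but the computation behind them is a guess.

```python
import pandas as pd

def binge_summary(df: pd.DataFrame, min_episodes: int = 3) -> dict:
    """Summarize binge sessions, where a session is one show on one calendar day."""
    work = df.copy()
    # Take the text before the first colon as the show name (assumed title format)
    work["Show"] = work["Title"].str.split(":").str[0].str.strip()
    work["Day"] = work["Date"].dt.normalize()
    sessions = work.groupby(["Show", "Day"]).size()
    binges = sessions[sessions >= min_episodes]
    return {
        "binge_ratio": binges.sum() / len(work),  # share of views inside a binge
        "top_binged_shows": binges.groupby(level="Show").sum().sort_values(ascending=False),
        "average_session_length": binges.mean(),
    }

# Invented rows for illustration
sample = pd.DataFrame({
    "Title": [
        "Space Cadets: Season 1: Launch Day",
        "Space Cadets: Season 1: Orbit",
        "Space Cadets: Season 1: Re-entry",
        "Space Cadets: Season 1: Splashdown",
        "Baking Duel: Season 2: Bread Week",
        "Space Cadets: Season 2: Return",
    ],
    "Date": pd.to_datetime([
        "2023-01-07", "2023-01-07", "2023-01-07", "2023-01-07", "2023-01-08", "2023-01-09",
    ]),
})
summary = binge_summary(sample)
print(f"{summary['binge_ratio']:.2%}")
print(summary["top_binged_shows"])
```

Grouping by calendar day splits a session that runs past midnight into two; a gap-based rule (for example, episodes less than a few hours apart) would avoid that but needs time-of-day data.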