# Netflix Viewing Activity Analysis

This notebook analyzes Netflix viewing activity data to understand viewing patterns and trends over time. We'll examine:

1. Daily viewing durations
2. Basic statistics (total sessions, total duration, average duration)
3. Monthly viewing averages
4. Rolling averages to identify trends

Let's start by importing the required libraries and loading our data.

In [None]:
import csv
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

## Data Loading and Processing

We'll load the viewing activity data from the CSV file. For each viewing session, we'll:
- Extract the date from the start time
- Convert duration from HH:MM:SS format to minutes
- Filter for a specific profile ('C')

In [None]:
dates = []
durations = []

with open('/Users/emreozkul/Desktop/dsa-project/attention-span-analysis/data/ViewingActivity.csv', 'r') as file:
    reader = csv.DictReader(file)

    for row in reader:
        if row["Profile Name"] == "C":
            date = datetime.strptime(row["Start Time"].split(" ")[0], '%Y-%m-%d')
            
            # Convert duration string (e.g., "1:23:45") to minutes
            duration_parts = row["Duration"].split(':')
            duration_minutes = (int(duration_parts[0]) * 60 + 
                             int(duration_parts[1]) +
                             int(duration_parts[2]) / 60)
            
            dates.append(date)
            durations.append(duration_minutes)
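
The same three steps (profile filter, date extraction, duration conversion) can also be sketched with pandas. This is a minimal sketch, not the notebook's method: the helper name `load_sessions` and the in-memory sample rows are made up for illustration, and it assumes the `Duration` column always holds `H:MM:SS` strings that `pd.to_timedelta` can parse.

```python
import io

import pandas as pd

# Made-up sample rows using the three columns the notebook reads.
sample_csv = io.StringIO(
    "Profile Name,Start Time,Duration\n"
    "C,2023-01-15 20:14:33,1:23:45\n"
    "C,2023-01-16 21:00:00,0:30:00\n"
    "D,2023-01-16 22:00:00,0:45:00\n"
)


def load_sessions(source, profile="C"):
    """Return one row per session for `profile`, with duration in minutes."""
    raw = pd.read_csv(source)
    raw = raw[raw["Profile Name"] == profile]
    return pd.DataFrame({
        # Drop the time of day so every session maps to its calendar date.
        "date": pd.to_datetime(raw["Start Time"]).dt.normalize(),
        # Parse "H:MM:SS" into a timedelta, then express it in minutes.
        "duration": pd.to_timedelta(raw["Duration"]).dt.total_seconds() / 60,
    })


sessions_df = load_sessions(sample_csv)
print(sessions_df)
```

Passing a file path instead of the `StringIO` object would read a real export the same way.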

## Basic Statistics

Let's calculate some basic statistics about the viewing activity:

In [None]:
sessions = len(dates)
sum_durations = sum(durations)
# Guard against an empty result (e.g. no rows matched the profile filter)
avg_duration = sum_durations / sessions if sessions else 0.0

print(f"Total Sessions: {sessions}")
print(f"Total Duration (minutes): {sum_durations:.2f}")
print(f"Average Duration per Session (minutes): {avg_duration:.2f}")
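
A mean can be pulled upward by a few very long sessions, so it may help to report the median next to it. The sketch below uses only the standard library; the function name `summarize` and the sample durations are hypothetical and stand in for the notebook's `durations` list.

```python
from statistics import mean, median


def summarize(durations):
    """Return session count, total, mean, and median duration in minutes."""
    if not durations:
        # No sessions: avoid dividing by zero and report empty averages.
        return {"sessions": 0, "total": 0.0, "mean": None, "median": None}
    return {
        "sessions": len(durations),
        "total": sum(durations),
        "mean": mean(durations),
        "median": median(durations),
    }


# Made-up durations: one long session pulls the mean above the median.
stats = summarize([10.0, 20.0, 90.0])
print(stats)
```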

## Daily Viewing Duration Visualization

This plot shows the duration of every viewing session against the date it started. Each point is one session, so a day with several sessions contributes several points.

In [None]:
def plot_data():
    plt.figure(figsize=(12, 6))
    plt.plot(dates, durations, marker='o')

    plt.gcf().autofmt_xdate()
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

    plt.xlabel('Date')
    plt.ylabel('Duration (minutes)')
    plt.title('Viewing Duration Over Time')
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

plot_data()
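
If a single value per day is wanted instead of one point per session, the sessions can be summed by date. This is a sketch under assumptions: `daily_totals` and the sample lists are illustrative names, and days without any session are filled with 0 minutes, which is a design choice rather than something the data itself states.

```python
from datetime import datetime

import pandas as pd

# Made-up sessions: two on the same day, none on the 16th.
sample_dates = [datetime(2023, 1, 15), datetime(2023, 1, 15), datetime(2023, 1, 17)]
sample_durations = [40.0, 25.0, 60.0]


def daily_totals(dates, durations):
    """Sum session minutes per calendar day, including days with no sessions."""
    df = pd.DataFrame({"date": dates, "duration": durations})
    totals = df.groupby("date")["duration"].sum()
    # Build a continuous daily index so gaps appear as 0 rather than vanish.
    full_range = pd.date_range(totals.index.min(), totals.index.max(), freq="D")
    return totals.reindex(full_range, fill_value=0.0)


totals = daily_totals(sample_dates, sample_durations)
print(totals)
```

The resulting series can be passed to `plt.plot(totals.index, totals.values)` in place of the per-session lists.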

## Monthly Average Analysis

To better understand long-term trends, let's analyze the monthly averages of viewing duration.

In [None]:
# Calculate monthly averages
df = pd.DataFrame({'date': dates, 'duration': durations})
df['month_year'] = df['date'].dt.to_period('M')
monthly_avg = df.groupby('month_year')['duration'].mean().reset_index()
monthly_avg['month_year'] = monthly_avg['month_year'].dt.to_timestamp()

# Create the plot
plt.figure(figsize=(12, 6))
plt.plot(monthly_avg['month_year'], monthly_avg['duration'], marker='o', linewidth=2)

plt.gcf().autofmt_xdate()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xlabel('Month')
plt.ylabel('Average Duration (minutes)')
plt.title('Monthly Average Viewing Duration')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# Print monthly statistics
print("\nMonthly Statistics:")
for _, row in monthly_avg.iterrows():
    print(f"{row['month_year'].strftime('%Y-%m')}: {row['duration']:.2f} minutes")
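
A monthly mean alone does not show whether a month had a few long sessions or many short ones. One way to see this is to compute the session count and total alongside the mean. The helper `monthly_summary` and the sample frame below are hypothetical; the frame mirrors the `date` and `duration` columns used above.

```python
import pandas as pd

# Made-up sessions: two in January, one in February.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03"]),
    "duration": [30.0, 60.0, 45.0],
})


def monthly_summary(df):
    """Return per-month session count, total minutes, and mean minutes."""
    month = df["date"].dt.to_period("M")
    return df.groupby(month)["duration"].agg(["count", "sum", "mean"])


summary = monthly_summary(sample)
print(summary)
```

In this sample both months have the same mean even though January has twice the viewing time, which the `count` and `sum` columns make visible.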

## Rolling Average Analysis

To smooth out daily variations and see clearer trends, let's calculate a 7-day rolling average of viewing duration.

In [None]:
def plot_average_duration_over_time():
    df = pd.DataFrame({'date': dates, 'duration': durations})
    df = df.sort_values('date')
    
    # Calculate a 7-day rolling average: a time-based window over the sorted
    # dates, not a fixed count of 7 sessions
    df['rolling_avg'] = df.rolling('7D', on='date')['duration'].mean()

    plt.figure(figsize=(12, 6))
    plt.scatter(df['date'], df['duration'], alpha=0.4, label='Individual Sessions')
    plt.plot(df['date'], df['rolling_avg'], color='red', linewidth=2, label='7-day Rolling Average')

    plt.gcf().autofmt_xdate()
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    plt.xlabel('Date')
    plt.ylabel('Duration (minutes)')
    plt.title('Average Viewing Duration Over Time')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

plot_average_duration_over_time()
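
A rolling window in pandas can be defined either by a number of rows or by a span of time, and the two give different results when sessions are unevenly spaced. The sketch below contrasts them on made-up data; the window sizes and the variable names `by_rows` and `by_days` are chosen only for illustration.

```python
import pandas as pd

# Made-up sessions: two on consecutive days, then one after a long gap.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-20"]),
    "duration": [10.0, 20.0, 60.0],
})

# Row-based window: averages the last 2 sessions, however far apart they are.
by_rows = df["duration"].rolling(window=2, min_periods=1).mean()

# Time-based window: averages sessions from the trailing 7 days only.
# The `on` column must be sorted in ascending order.
by_days = df.rolling("7D", on="date")["duration"].mean()

print(by_rows.tolist())
print(by_days.tolist())
```

For the last session, the row-based window still includes the session from 18 days earlier, while the 7-day window contains only that session.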

## Analysis Summary

This analysis provides several insights into Netflix viewing patterns:

1. **Daily Patterns**: The per-session plot shows how long each viewing session lasted on each date, which makes unusually long sessions (possible binge-watching) easy to spot.

2. **Monthly Trends**: The monthly averages help identify seasonal patterns or long-term changes in viewing habits.

3. **Rolling Average**: The 7-day rolling average smooths out daily variations, making it easier to spot trends and patterns in viewing behavior.

4. **Overall Statistics**: We can see the total number of viewing sessions and average duration, giving us a broad picture of Netflix usage.

This data can be useful for:
- Understanding personal viewing habits
- Identifying potential patterns in binge-watching behavior
- Tracking changes in viewing habits over time