### Media Company Case Study

#### Problem Statement:
A digital media company (similar to Voot, Hotstar, Netflix, etc.) had launched a show. Initially, the show got a good response, but then witnessed a decline in viewership. The company wants to figure out what went wrong.

`Approach`:

We want to determine which variables drive show viewership. The goal is to identify the key driver variables and quantify their impact, rather than to forecast future viewership.

What could be potential reasons for the decline in viewership?

The potential reasons could be:

- Decline in the number of people coming to the platform
- Fewer people watching the video
- A decrease in marketing spend
- Competing shows, e.g. cricket/IPL
- Special holidays
- Twist in the story

`Data Description`:
- Views_show : Number of times the show was viewed
- Visitors : Number of visitors who browsed the platform but did not necessarily watch a video
- Views_platform : Number of times a video was viewed on the platform
- Ad_impression : Proxy for marketing budget. Represents number of impressions generated by ads
- Cricket_match_india : Whether a cricket match was being played. 1 indicates a match on the given day, 0 indicates no match
- Character_A : Presence of Character A. 1 indicates Character A was in the episode, 0 indicates they were not

In [1]:
# 1. Read in the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("mediacompany.csv")
df

Unnamed: 0,Date,Views_show,Visitors,Views_platform,Ad_impression,Cricket_match_india,Character_A,Unnamed: 7
0,3/1/2017,183738,1260228,1706478,1060860448,0,0,
1,3/2/2017,193763,1270561,1690727,1031846645,0,0,
2,3/3/2017,210479,1248183,1726157,1010867575,0,0,
3,3/4/2017,240061,1492913,1855353,1079194579,1,0,
4,3/5/2017,446314,1594712,2041418,1357736987,0,0,
...,...,...,...,...,...,...,...,...
75,5/15/2017,313945,1808684,2226788,1398052759,1,0,
76,5/16/2017,185689,1814227,2199844,1311961223,1,0,
77,5/17/2017,142260,1755803,2225752,1248266254,1,0,
78,5/18/2017,135871,1749654,2302789,1284859759,1,0,


In [2]:
# 2. Clean the data
# Drop the empty, all-NaN column; errors="ignore" keeps the cell safe to re-run
df = df.drop(columns="Unnamed: 7", errors="ignore")
df["Date"] = pd.to_datetime(df["Date"])
# dt.weekday numbers Monday as 0 and Sunday as 6, so the weekend is 5 (Saturday) and 6 (Sunday)
df["Weekend"] = df["Date"].dt.weekday.isin([5, 6])
df

Unnamed: 0,Date,Views_show,Visitors,Views_platform,Ad_impression,Cricket_match_india,Character_A,Weekend
0,2017-03-01,183738,1260228,1706478,1060860448,0,0,False
1,2017-03-02,193763,1270561,1690727,1031846645,0,0,False
2,2017-03-03,210479,1248183,1726157,1010867575,0,0,False
3,2017-03-04,240061,1492913,1855353,1079194579,1,0,True
4,2017-03-05,446314,1594712,2041418,1357736987,0,0,True
...,...,...,...,...,...,...,...,...
75,2017-05-15,313945,1808684,2226788,1398052759,1,0,False
76,2017-05-16,185689,1814227,2199844,1311961223,1,0,False
77,2017-05-17,142260,1755803,2225752,1248266254,1,0,False
78,2017-05-18,135871,1749654,2302789,1284859759,1,0,False
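
The `Weekend` flag depends on how pandas numbers weekdays: `dt.weekday` counts Monday as 0 and Sunday as 6, so Saturday and Sunday are 5 and 6. The sketch below checks that convention on four dates taken from the table above; the names `dates`, `is_weekend`, and `check` are illustrative and not part of the notebook.

```python
import pandas as pd

# Four dates from the dataset: a Wednesday, a Saturday, a Sunday, and a Monday
dates = pd.to_datetime(pd.Series(["3/1/2017", "3/4/2017", "3/5/2017", "5/15/2017"]))

# dt.weekday: Monday=0 ... Sunday=6, so the weekend is {5, 6}
is_weekend = dates.dt.weekday.isin([5, 6])

check = pd.DataFrame({
    "Date": dates,
    "Day": dates.dt.day_name(),
    "Weekend": is_weekend,
})
print(check)
```

Printing the day name next to the flag makes an off-by-one in the weekday numbers easy to spot.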


In [3]:
# 3. Perform EDA to observe patterns and correlations
# y must be a single column name, so plot Views_show over time and mark weekends with hue
sns.lineplot(data=df, x="Date", y="Views_show", color="gray")
sns.scatterplot(data=df, x="Date", y="Views_show", hue="Weekend")

In [None]:
sns.barplot(data=df, x="Date", y="Views_show")

In [None]:
sns.pairplot(df[['Views_show', 'Visitors', 'Views_platform', 'Ad_impression']])

In [None]:
df[['Views_show', 'Visitors', 'Views_platform', 'Ad_impression']].corr()

In [None]:
sns.boxplot(data=df, x="Cricket_match_india", y="Views_show")

In [None]:
sns.boxplot(data=df, x="Character_A", y="Views_show")

In [None]:
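# 4. Build a statsmodels OLS model; backward elimination then removes predictors that are not statistically significant
# Exclude the date, the target, and the empty column; cast to float so boolean columns are accepted by OLS
X = df.drop(columns=["Date", "Views_show", "Unnamed: 7"], errors="ignore").astype(float)
y = df["Views_show"]
xtrain, xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, test_size=0.25, random_state=42)
# has_constant="add" forces an intercept column even if a predictor happens to be constant within a split
xtrain = sm.add_constant(xtrain, has_constant="add")
xtest = sm.add_constant(xtest, has_constant="add")

model = sm.OLS(ytrain, xtrain)
result = model.fit()

print(result.summary())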

In [None]:
# 5. Evaluate the model using regression metrics