# The Simpsons - A Not So D'Oh Statistical Analysis

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

In [None]:
episodes = pd.read_csv('../input/simpsons_episodes.csv')
episodes.head()

This looks like a great data set for a simple statistical analysis, and it holds some very interesting information. First things first, let's get rid of a couple of columns that we won't need in our analysis: 'image_url' and 'video_url'. After that, we'll drop the rows with missing values.

In [None]:
print('Data shape before: {}'.format(episodes.shape))
episodes.drop(['image_url', 'video_url'], axis=1, inplace=True)
print('Data shape after: {}'.format(episodes.shape))

In [None]:
episodes.season.unique()

Awesome! It seems like we have 28 seasons of The Simpsons in our data set. Let's see if we have missing values in the data set.

In [None]:
episodes.isnull().sum()

In [None]:
episodes.dropna(inplace=True)
episodes.isnull().sum()
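
Called without arguments, `dropna` removes every row that has a missing value in any column, so it can be worth checking how many rows that costs before dropping. Below is a minimal sketch of that check. The helper name `missing_report` is hypothetical, and the small frame stands in for `episodes` with invented values.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return per-column missing counts and the number of rows dropna() would remove."""
    per_column = df.isnull().sum()
    rows_lost = len(df) - len(df.dropna())
    return per_column, rows_lost


# Toy stand-in for `episodes`; the values are invented for illustration only.
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'imdb_rating': [8.2, np.nan, 7.5, 7.9],
    'views': [100.0, 120.0, np.nan, 90.0],
})

per_column, rows_lost = missing_report(toy)
print(per_column)
print('Rows removed by dropna: {}'.format(rows_lost))
```

On the real data, calling `missing_report(episodes)` before the `dropna` cell would give the same kind of summary.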

Let's take a look at the average viewership per season through the years.

In [None]:
fig, (axis1, axis2) = plt.subplots(2, 1, figsize=(15, 10))
sns.barplot(x='season', y='views', data=episodes, ci=None, ax=axis1)
sns.barplot(x='season', y='us_viewers_in_millions', data=episodes, ci=None, ax=axis2)
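
By default, `sns.barplot` draws the mean of `y` for each `x` category, so each bar is a per-season average. The same numbers can be read directly with a `groupby`. This is a minimal sketch on a made-up frame; the values are invented and are not taken from the data set.

```python
import pandas as pd

# Toy stand-in for `episodes`; the values are invented for illustration only.
toy = pd.DataFrame({
    'season': [1, 1, 2, 2],
    'us_viewers_in_millions': [30.0, 26.0, 24.0, 20.0],
})

# Mean US viewership per season, i.e. the bar heights a default barplot would draw.
season_means = toy.groupby('season')['us_viewers_in_millions'].mean()
print(season_means)
```

Replacing `toy` with `episodes` should give the values behind the second bar chart.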

In [None]:
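# Temporary column: US viewers rescaled from millions to thousands
episodes['us_viewers'] = episodes['us_viewers_in_millions'] * 1000

fig, (axis1, axis2) = plt.subplots(1, 2, figsize=(15, 5))

# Scatter plots with a fitted regression line, ordered by episode id
sns.regplot(x='id', y='views', data=episodes, ci=None, ax=axis1)
sns.regplot(x='id', y='us_viewers', data=episodes, ci=None, ax=axis2)
# Remove the temporary column again
episodes.drop('us_viewers', axis=1, inplace=True)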

In [None]:
fig, (axis1, axis2) = plt.subplots(1, 2, figsize=(15, 5))
sns.regplot(x='id', y='imdb_rating', data=episodes, ci=None, ax=axis1)
sns.regplot(x='id', y='imdb_votes', data=episodes, ci=None, ax=axis2)

It seems that The Simpsons' ratings have been falling through the years, especially in the US. That's not a surprise: overall television ratings have been falling for the last few decades, and compared to other shows, The Simpsons is not doing that badly. The interesting thing here is that imdb_rating and imdb_votes have also been falling. What could that mean?
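
One way to put numbers on the trends the regression plots suggest is to compute the slope of the fitted line and the Pearson correlation against the episode id. The sketch below does this under a few assumptions: the helper name `linear_trend` is hypothetical, the frame is a toy stand-in with invented values, and a straight-line fit is only a rough summary of the trend.

```python
import numpy as np
import pandas as pd


def linear_trend(df, x, y):
    """Return the least-squares slope and Pearson correlation of y against x."""
    slope, _intercept = np.polyfit(df[x], df[y], 1)
    corr = df[x].corr(df[y])
    return slope, corr


# Toy stand-in for `episodes`; the values are invented for illustration only.
toy = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'imdb_rating': [8.4, 8.1, 7.9, 7.4, 7.2],
    'imdb_votes': [1500, 1300, 1000, 900, 600],
})

for column in ['imdb_rating', 'imdb_votes']:
    slope, corr = linear_trend(toy, 'id', column)
    print('{}: slope per episode = {:.3f}, Pearson r = {:.3f}'.format(column, slope, corr))
```

A negative slope and a correlation close to -1 would point to a steady decline. On the real data, `linear_trend(episodes, 'id', 'imdb_rating')` could be used the same way once the missing values have been dropped.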