# Lab: Seaborn

Seaborn is another plotting library. Some consider it the [`ggplot`](https://ggplot2.tidyverse.org/) of Python, with excellent default settings that make working with data easier. There is rather good [documentation online](https://seaborn.pydata.org/), and Seaborn comes with Anaconda Python.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

We can load and create the same plots as before.

In [None]:
office_df = pd.read_csv('data/raw/office_ratings.csv', encoding='UTF-8')
office_df.head()

In [None]:
sns.relplot(x='total_votes', y='imdb_rating', data=office_df)

In [None]:
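# Parse air_date as datetimes; errors='coerce' turns unparseable values into NaT
# rather than silently leaving the column as strings
office_df['air_date'] = pd.to_datetime(office_df['air_date'], errors='coerce')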

g = sns.relplot(x="air_date", y="imdb_rating", kind="scatter", data=office_df)

## Functions

We can define our own functions. A function helps us with code we are going to run multiple times. For instance, the function below scales the values of a column to lie between 0 and 1.

Here is a modified function from [Stack Overflow](https://stackoverflow.com/questions/26414913/normalize-columns-of-pandas-data-frame).

In [None]:
office_df.head()

In [None]:
def normalize(df, feature_name):
    result = df.copy()
    
    max_value = df[feature_name].max()
    min_value = df[feature_name].min()
    
    result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    
    return result

Passing the dataframe and name of the column will return a dataframe with that column scaled between 0 and 1.

In [None]:
normalize(office_df, 'imdb_rating')
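
As a quick check of this behaviour, here is a minimal sketch on a small made-up dataframe. The `toy_df` name and its values are invented for illustration and are not taken from the Office data. The `normalize` definition is repeated so that the cell runs on its own.

```python
import pandas as pd

def normalize(df, feature_name):
    # Same min-max scaling as the function defined above
    result = df.copy()
    max_value = df[feature_name].max()
    min_value = df[feature_name].min()
    result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result

# Hypothetical toy data, not taken from office_ratings.csv
toy_df = pd.DataFrame({'imdb_rating': [7.0, 8.0, 9.0],
                       'total_votes': [100, 300, 200]})

scaled_df = normalize(toy_df, 'imdb_rating')

print(scaled_df['imdb_rating'].tolist())  # scaled column: [0.0, 0.5, 1.0]
print(toy_df['imdb_rating'].tolist())     # original is untouched: [7.0, 8.0, 9.0]
```

Because the function works on a copy, the dataframe passed in is left unchanged, and only the named column is rescaled in the result.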

The call above returns a new dataframe and leaves `office_df` unchanged. To keep the result, we replace the original dataframe. We can normalize both our votes and our ratings.

In [None]:
office_df = normalize(office_df, 'imdb_rating')

In [None]:
office_df = normalize(office_df, 'total_votes')

In [None]:
office_df

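Seaborn prefers a long format table, where each row holds a single value and a separate column records which variable that value belongs to. The pandas `melt` function reshapes a wide table into this form. Details of melt can be found [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html).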

In [None]:
office_df_long = pd.melt(
    office_df,
    id_vars=['season', 'episode', 'title', 'air_date'],
    value_vars=['imdb_rating', 'total_votes'],
)
office_df_long
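
To see what melting does on a table small enough to read in full, here is a minimal sketch with a made-up two-episode dataframe. The `wide_df` and `long_df` names and all the values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per episode, one column per measurement
wide_df = pd.DataFrame({
    'episode': [1, 2],
    'imdb_rating': [0.2, 0.9],
    'total_votes': [0.5, 1.0],
})

# id_vars are kept as identifier columns; value_vars are stacked into rows
long_df = pd.melt(wide_df, id_vars=['episode'],
                  value_vars=['imdb_rating', 'total_votes'])

print(long_df)
```

The two rows become four: each episode now appears once per measurement, with the measurement name in the `variable` column and its number in the `value` column. These are the default column names that `melt` produces.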

We can then plot the long table in seaborn like so.

In [None]:
sns.relplot(x='air_date', y='value', size='variable', data=office_df_long)

In [None]:
?sns.relplot