![ine-divider](https://user-images.githubusercontent.com/7065401/92672068-398e8080-f2ee-11ea-82d6-ad53f7feb5c0.png)
<hr>

# Seaborn statistical plots (Solutions)

![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Titanic Survivors

These exercises can be performed with either Pandas `.plot()` or Seaborn methods.

In [None]:
titanic = pd.read_csv('data/titanic.csv')

<div class='alert alert-warning'>Warning: some values are missing in this DataFrame</div>

In [None]:
titanic.info()

## EX1: age distributions

Plot the distribution of the ages of all passengers and visually determine the mode.

In [None]:
# Roughly one bin per year of age, so the tallest bar marks the mode
sns.displot(titanic.age.dropna(), bins=int(titanic.age.max()));
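
The mode read off the histogram can be cross-checked numerically. The sketch below uses a small made-up sample (`ages` is a hypothetical stand-in for `titanic.age`), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical ages with one missing value, standing in for titanic.age
ages = pd.Series([22.0, 24.0, 24.0, 30.0, None, 24.0, 30.0, 41.0])

# value_counts() skips NaN by default and sorts by frequency, most common first
counts = ages.value_counts()
mode_age = counts.index[0]
mode_count = int(counts.iloc[0])

# Series.mode() returns every value tied for most frequent
modes = ages.mode().tolist()
```

On the real data, the same two calls on `titanic.age` should agree with the tallest bar of the histogram, provided the bins are about one year wide.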

## EX2: Survived

Group the ages by survival and show a box plot.

### Seaborn

In [None]:
# Passing column names with data= keeps the axis labels
sns.boxplot(x='alive', y='age', data=titanic);

In [None]:
sns.catplot(y='age', x='alive', kind='box', data=titanic);

### Pandas

In [None]:
titanic[['age','alive']].boxplot(by='alive', sym='k.', figsize=(8, 6));
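
The box plots summarise each group by its median and quartiles, and the same figures can be tabulated with a `groupby`. A minimal sketch on made-up data (`sample` is hypothetical, not the Titanic file):

```python
import pandas as pd

# Hypothetical rows standing in for titanic[['age', 'alive']]
sample = pd.DataFrame({
    'alive': ['no', 'no', 'no', 'yes', 'yes', 'yes'],
    'age': [20.0, 30.0, 40.0, 10.0, 25.0, None],
})

grouped = sample.groupby('alive')['age']

# Median age per group; missing ages are ignored
medians = grouped.median()

# Number of non-missing ages per group
counts = grouped.count()
```

Calling `grouped.describe()` instead would list the count, mean, quartiles and extremes for each group in one table.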

## EX3: Plot CDF

Plot the cumulative distribution functions of the ages of the passengers by survival.

*Is age a reasonable determining factor for survival?*

In [None]:
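# Relative frequency of each age within each survival group (missing ages are dropped)
cdf = titanic.groupby('alive')['age'].value_counts(normalize=True)
# Sorting by age and accumulating the frequencies gives the empirical CDF
ax = cdf['no'].sort_index().cumsum().plot(drawstyle='steps', label='dead', legend=True)
cdf['yes'].sort_index().cumsum().plot(ax=ax, drawstyle='steps', label='alive', legend=True);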
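
The cumulative sum of normalized counts is the empirical CDF. As a sketch of the same idea without pandas, the helper below (`ecdf` is a hypothetical name, and the input values are made up) computes it directly with NumPy.

```python
import numpy as np

def ecdf(values):
    """Return the sorted values and the fraction of observations <= each one."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])           # drop missing values, then sort
    y = np.arange(1, len(x) + 1) / len(x)  # cumulative fraction at each point
    return x, y

# Made-up ages with one missing value
x, y = ecdf([30, 2, np.nan, 18, 45])
```

Here `x` and `y` can be passed to `plt.step(x, y, where='post')`. Recent Seaborn versions (0.11 and later) also provide `sns.ecdfplot(data=titanic, x='age', hue='alive')`, which should give a comparable figure in one call.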

# Tips

In [None]:
tips = pd.read_csv('data/tips.csv')

In [None]:
tips.info()

## EX1: tip fraction

1. Compute the tip fraction.
2. Plot the tip fraction against the total bill with a regression.
   1. Is the tip fraction constant across total bill sizes?
   2. Does smoking, sex, or time matter?

In [None]:
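# Tip as a fraction of the bill before the tip (this treats total_bill as including the tip)
tips['tip_frac'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
# Show the five largest tip fractions
tips.sort_values('tip_frac').tail()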
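
The formula above divides by `total_bill - tip`, which only makes sense if `total_bill` already includes the tip. Whether it does is an assumption about the data. The sketch below uses made-up numbers (`bills` is hypothetical) to show how the two conventions differ.

```python
import pandas as pd

# Hypothetical bills, not taken from data/tips.csv
bills = pd.DataFrame({'total_bill': [20.0, 10.0], 'tip': [4.0, 1.0]})

# Convention used in this notebook: total_bill is assumed to include the tip
bills['tip_frac'] = bills['tip'] / (bills['total_bill'] - bills['tip'])

# Alternative convention: total_bill is the amount before the tip
bills['tip_frac_alt'] = bills['tip'] / bills['total_bill']
```

For the first row this gives 4 / 16 = 0.25 under the notebook's convention and 4 / 20 = 0.20 under the alternative, so the choice shifts every value and is worth stating alongside any plot.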

In [None]:
# The two largest tip fractions (about 250% and 70%) look like outliers, so drop them
tips = tips.sort_values('tip_frac')[:-2]
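
Slicing after a sort removes the two largest values but also leaves the rows in sorted order. If the original row order matters, one possible variant drops the rows by index instead. This is a sketch on made-up values; `df` is hypothetical.

```python
import pandas as pd

# Hypothetical tip fractions, including two implausibly large ones
df = pd.DataFrame({'tip_frac': [0.15, 2.45, 0.18, 0.71, 0.12]})

# Index labels of the two largest values
outliers = df['tip_frac'].nlargest(2).index

# Drop those rows; the remaining rows keep their original order
trimmed = df.drop(outliers)
```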

In [None]:
sns.lmplot(x='total_bill', y='tip_frac', data=tips);

In [None]:
sns.lmplot(x='total_bill', y='tip_frac', hue='smoker', data=tips);

In [None]:
sns.lmplot(x='total_bill', y='tip_frac', hue='time', data=tips);

In [None]:
sns.lmplot(x='total_bill', y='tip_frac', hue='sex', data=tips);
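
`lmplot` draws the fitted line but does not report its coefficients. To put a number on whether the tip fraction changes with bill size, a straight-line fit can be computed separately. The sketch below uses `np.polyfit` on made-up points (`bill` and `frac` are hypothetical, not the tips data).

```python
import numpy as np

# Hypothetical points lying exactly on a falling line
bill = np.array([10.0, 20.0, 30.0, 40.0])
frac = np.array([0.20, 0.18, 0.16, 0.14])

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(bill, frac, 1)
```

A slope close to zero would suggest the tip fraction does not depend on the bill size; a negative slope, as in this toy example, would suggest that larger bills receive proportionally smaller tips. Running the same fit on each subset (for example, smokers and non-smokers) gives numbers to compare against the `hue` plots.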