# 8. Multivariate EDA - Categorical vs Continuous
All the plots in the categorical section of the [seaborn tutorial](http://seaborn.pydata.org/tutorial/categorical.html) will be helpful here. We will plot categorical and continuous variables together.

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

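# Load the diamonds data (path is relative to the notebook's working directory)
diamonds = pd.read_csv('../data/diamonds.csv')

# Reorder columns: categorical variables first, then continuous ones
new_order = ['cut', 'color', 'clarity', 'carat', 'price', 'x', 'y', 'z', 'depth', 'table']
diamonds = diamonds[new_order]

# Convert the three quality grades to ordered categoricals, listed from worst to best,
# so that sorting and plotting follow quality order instead of alphabetical order
order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
diamonds['cut'] = pd.Categorical(diamonds['cut'], ordered=True, categories=order)

order = ['J', 'I', 'H', 'G', 'F', 'E', 'D']
diamonds['color'] = pd.Categorical(diamonds['color'], ordered=True, categories=order)

order = ['I1', 'SI2', 'SI1', 'VS2', 'VS1', 'VVS2', 'VVS1', 'IF']
diamonds['clarity'] = pd.Categorical(diamonds['clarity'], ordered=True, categories=order)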
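
A quick way to see what `ordered=True` buys us: a minimal sketch on a few hypothetical cut labels (not read from `diamonds.csv`) showing that sorting follows the declared order, that each label maps to an integer code, that comparisons against a label work, and that a label missing from `categories` becomes `NaN`.

```python
import pandas as pd

# Worst-to-best order, as in the setup cell above
cut_order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']

# Hypothetical sample of cut labels (not read from diamonds.csv)
cuts = pd.Series(pd.Categorical(['Ideal', 'Fair', 'Premium', 'Good'],
                                categories=cut_order, ordered=True))

# Sorting follows cut_order, not the alphabet (alphabetically, 'Ideal' would precede 'Premium')
sorted_cuts = cuts.sort_values().tolist()

# Each label's integer code is its position in cut_order
codes = cuts.cat.codes.tolist()

# Ordered categoricals support comparisons with a category label
better_than_good = cuts[cuts > 'Good'].tolist()

# A label that is not listed in categories silently becomes NaN
unknown = pd.Categorical(['Ideal', 'Excellent'], categories=cut_order, ordered=True)
```

The last line is worth remembering: a typo in `order` (or an unexpected label in the CSV) turns valid rows into missing values without raising an error.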

### Comparing each categorical variable with price
The figures below plot the mean price at every level of each of the three categorical variables.

Interestingly, there appears to be a negative relationship between color and price: the better the color, the lower the average price of the diamond. This is the exact opposite of what we might expect. A similar, though less consistent, negative relationship appears with clarity.
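One way to put a single number on a negative relationship between an ordered category and price is a Spearman rank correlation between the category codes and price. This is our own addition rather than part of the analysis above, and the sketch runs on a handful of hypothetical rows; on the real data you would use `diamonds['color'].cat.codes` and `diamonds['price']`. Spearman is computed here as the Pearson correlation of the ranks, which avoids needing SciPy.

```python
import pandas as pd

color_order = ['J', 'I', 'H', 'G', 'F', 'E', 'D']

# Hypothetical toy rows standing in for diamonds.csv
toy = pd.DataFrame({
    'color': pd.Categorical(['J', 'J', 'H', 'G', 'E', 'D', 'D'],
                            categories=color_order, ordered=True),
    'price': [6100, 5400, 4800, 4000, 3100, 2900, 2600],
})

# Position of each diamond's color in color_order: 0 = J (worst) ... 6 = D (best)
color_rank = toy['color'].cat.codes

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks)
rho = color_rank.rank().corr(toy['price'].rank())
```

A value near -1 means price falls almost steadily as color improves; a value near 0 means no monotonic trend.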

In [None]:
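# Bar height = mean price per color; errorbar=None (seaborn >= 0.12) replaces the deprecated ci=None
sns.barplot(x='color', y='price', data=diamonds, errorbar=None)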

In [None]:
sns.barplot(x='cut', y='price', data=diamonds, errorbar=None)

In [None]:
sns.barplot(x='clarity', y='price', data=diamonds, errorbar=None)
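
With the error bars turned off (`errorbar=None`, or `ci=None` on older seaborn), each bar is simply the mean price for that level, so the same numbers can be read off a `groupby`. A minimal sketch on hypothetical toy rows; `observed=False` keeps levels that have no rows, which show up as `NaN`.

```python
import pandas as pd

color_order = ['J', 'I', 'H', 'G', 'F', 'E', 'D']

# Hypothetical toy rows standing in for diamonds.csv
toy = pd.DataFrame({
    'color': pd.Categorical(['D', 'D', 'J', 'J', 'G'],
                            categories=color_order, ordered=True),
    'price': [1000, 2000, 5000, 7000, 3000],
})

# Mean price per color level, in category order; empty levels come back as NaN
mean_price = toy.groupby('color', observed=False)['price'].mean()
```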

### Multiplicative effect
Let's add another variable by splitting each level of clarity into its own plot. Again, there seems to be a negative relationship between color and price, even when we hold clarity constant. There is one obvious exception: the bottom-right plot, where clarity is best (IF).

In [None]:
# One panel per clarity level, wrapped four to a row
sns.catplot(x='color', y='price', data=diamonds, kind='bar', col='clarity', col_wrap=4, errorbar=None)
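
Holding clarity constant amounts to grouping by clarity and color together. A minimal sketch on hypothetical toy rows that builds the same numbers as the panels (one row per clarity panel, one column per color bar) and then checks, panel by panel, whether the best color (D) is more expensive on average than the worst (J). Comparing only D and J is a simplification of our own, not a test from the analysis above.

```python
import pandas as pd

# Hypothetical toy rows standing in for diamonds.csv
toy = pd.DataFrame({
    'clarity': ['SI2', 'SI2', 'SI2', 'SI2', 'IF', 'IF', 'IF'],
    'color':   ['D',   'D',   'J',   'J',   'D',  'J',  'J'],
    'price':   [1500,  2500,  3000,  5000,  9000, 2000, 4000],
})

# Mean price for every (clarity, color) pair, reshaped to clarity rows x color columns.
# observed=True only matters when the keys are categorical; it skips empty combinations.
means = toy.groupby(['clarity', 'color'], observed=True)['price'].mean().unstack('color')

# Within each clarity level, does the best color (D) cost more on average than the worst (J)?
d_beats_j = means['D'] > means['J']
```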

### Cut also appears to have a negative relationship with price
Surprisingly, the diamonds with the best color (D) and the best cut (Ideal) are among the lowest priced on average. This is counterintuitive.

In [None]:
sns.catplot(x='color', y='price', data=diamonds, kind='bar', col='cut', errorbar=None, col_wrap=3)
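
Because every bar is a mean, a bar built on only a few diamonds can look dramatic by chance. Before reading too much into a surprising cut/color combination, it can help to put the mean next to the number of diamonds behind it. A minimal sketch on hypothetical toy rows; the cutoff of 2 used to flag thin cells is arbitrary and chosen only for the toy data.

```python
import pandas as pd

# Hypothetical toy rows standing in for diamonds.csv
toy = pd.DataFrame({
    'cut':   ['Ideal', 'Ideal', 'Ideal', 'Fair', 'Fair'],
    'color': ['D', 'D', 'J', 'D', 'J'],
    'price': [2500, 2700, 4000, 4200, 4800],
})

# Mean price (the bar height) next to the number of diamonds behind each bar
summary = toy.groupby(['cut', 'color'], observed=True)['price'].agg(['mean', 'count'])

# Flag cells built on very few diamonds (threshold is arbitrary)
thin = summary[summary['count'] < 2]
```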

### Heat map to identify high and low prices

In [None]:
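# Mean price for every color/clarity combination; observed=False keeps every category level
color_clarity_price_mean = diamonds.pivot_table(index='color', columns='clarity', values='price',
                                                aggfunc='mean', observed=False)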

In [None]:
# Annotate each cell with its mean price, rounded to whole dollars
sns.heatmap(color_clarity_price_mean, annot=True, fmt='.0f')
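
Reading the extremes off a heat map by eye is error-prone, so a small helper can locate the highest and lowest cells of a pivot table directly. `extreme_cells` is a hypothetical helper, shown here on toy rows; on the real data you would call `extreme_cells(color_clarity_price_mean)`. It uses NumPy's `nanargmax`/`nanargmin` so that empty combinations (`NaN` cells) are skipped.

```python
import numpy as np
import pandas as pd


def extreme_cells(table):
    """Return the (row, column) labels of the largest and smallest cells, ignoring NaN."""
    values = table.to_numpy(dtype=float)
    hi_r, hi_c = np.unravel_index(np.nanargmax(values), values.shape)
    lo_r, lo_c = np.unravel_index(np.nanargmin(values), values.shape)
    return (table.index[hi_r], table.columns[hi_c]), (table.index[lo_r], table.columns[lo_c])


# Hypothetical toy rows standing in for diamonds.csv
toy = pd.DataFrame({
    'color':   ['D', 'D', 'J', 'J', 'D', 'J', 'G'],
    'clarity': ['IF', 'IF', 'IF', 'SI2', 'SI2', 'SI2', 'IF'],
    'price':   [8000, 9000, 3000, 6000, 2000, 5000, 4000],
})

# Same layout as color_clarity_price_mean: rows = color, columns = clarity, cells = mean price.
# G has no SI2 rows, so that cell is NaN.
table = toy.pivot_table(index='color', columns='clarity', values='price', aggfunc='mean')

highest, lowest = extreme_cells(table)
```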

# Exercise
Replicate this analysis on your own dataset.