<a href="https://colab.research.google.com/github/kaushil24/DS-Cookbook101/blob/master/Seaborn_Recipes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Beautiful decoration recipes
* Change figure size
* Change font size
* Distribution plot
* Scatter plot
* Frequency distribution 
* Box Plot
* Heat Map
* Pair Plot
* Pie Chart


# Seaborn

**A good [article](https://www.analyticsvidhya.com/blog/2019/09/comprehensive-data-visualization-guide-seaborn-python/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29) on various seaborn plots** 

## Change figure size:

```
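import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(width, height))  # size in inches; create the figure before plotting
sns.boxplot(x="col_name", y="col_name", data=data)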
```
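
As a concrete illustration, here is a minimal sketch with made-up data (the `day` and `total_bill` columns and their values are assumptions, not part of the recipe above). Axes-level functions such as `sns.boxplot` draw on the current figure, so the figure has to be created with the desired size before the plotting call.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data standing in for `data`
data = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.5, 14.2, 20.1, 18.4, 25.0, 30.3],
})

fig = plt.figure(figsize=(12, 4))  # 12 inches wide, 4 inches tall
ax = sns.boxplot(x="day", y="total_bill", data=data)  # drawn on the figure created above
```

Figure-level functions such as `sns.pairplot` create their own figure and take `height` and `aspect` arguments instead, so `plt.figure(figsize=...)` has no effect on them.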

## Change font Properties 

### Change axes' and title's font size 

```
b = sns.boxplot(x=tips["total_bill"])
b.axes.set_title("Title",fontsize=50)
b.set_xlabel("X Label",fontsize=30)
b.set_ylabel("Y Label",fontsize=20)
b.tick_params(labelsize=5)
```

### Change all font style or size

```
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif',  font_scale=1, color_codes=True, rc=None)
```
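
To see what `font_scale` actually changes, the sketch below compares the font-related settings seaborn generates for two scales using `sns.plotting_context`, then applies the larger scale globally. The scale value 1.5 is an arbitrary choice for illustration.

```python
import seaborn as sns

# Font-related settings seaborn would apply for each scale
base = sns.plotting_context("notebook", font_scale=1)
scaled = sns.plotting_context("notebook", font_scale=1.5)

for key in ("font.size", "axes.labelsize", "axes.titlesize"):
    print(key, base[key], "->", scaled[key])

# Apply the larger fonts to every plot drawn after this call
sns.set(context="notebook", font_scale=1.5)
```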

### Change label rotation on axes:

```
plt.xticks(rotation=45, ha='right')
```

![Image here](https://raw.githubusercontent.com/kaushil24/DS-Cookbook101/master/Media/count_plot_rotated_X_labels.png)

**Approach 2:**
```
plt.figure(figsize=(10,5))
chart = sns.countplot(
    data=data[data['Year'] == 1980],
    x='Sport',
    palette='Set1'
)
chart.set_xticklabels(chart.get_xticklabels(), rotation=45)
```

## Distribution Plot:

```
sns.distplot(df['col_name'])
```

### To compare your distribution with normal distribution on the same plot
```
from scipy.stats import norm
sns.distplot(df['col'], fit=norm)
```
Something like this:
![alt text](https://i.imgur.com/XRj7Z4T.png)

### Modify x-axis label range:

```
plt.subplots(figsize = (width, height))
sns.distplot(df[col], color = 'g')
plt.xticks([i for i in range(start_x_axis, end_x_axis , bin_width )])
plt.show()
```


### Multiple Distribution plots on same graph:

Use this to compare several distributions on the same scale. <br>
For example, to compare the age distribution of people who were granted a loan with that of people who were not, you can plot both on the same graph.

```
sns.distplot( df['col_1'] ) # Optional parameter to change the color: color = 'b'
sns.distplot( df['col_2'] )
plt.legend(['col_1', 'col_2'])
```

### Probability plot 
**Note: This is not a Seaborn function, but since this section has all the plotting functions, I'll add it here:**

```
from scipy import stats
import matplotlib.pyplot as plt
stats.probplot(df['col'], plot = plt)
```
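
As a rough illustration, `scipy.stats.probplot` compares the sorted sample values against the quantiles of a theoretical distribution (normal by default) and also returns a least-squares line through the points. The sample below is synthetic and stands in for `df['col']`:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(size=200)  # synthetic stand-in for df['col']

fig, ax = plt.subplots()
# osm: theoretical quantiles, osr: ordered sample values,
# (slope, intercept, r): least-squares fit of osr against osm
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm", plot=ax)
print(f"correlation of the fit: r = {r:.3f}")
```

An `r` close to 1 means the points lie near a straight line, which suggests the sample is approximately normal.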

## Scatter Plot:

Note: You can also directly pass a vector to 'x' and 'y'. <br>
Docs: https://seaborn.pydata.org/generated/seaborn.scatterplot.html

```
sns.scatterplot(x = 'col_name', y = 'col_name', data = dataFrame, hue = 'col_name')
```

## Frequency distribution histogram:
(aka countplot)

```
sns.countplot(x = 'col_name', hue = 'hues', data=df, order = df['col_name'].value_counts().index)
```
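
The `order` argument controls the left-to-right order of the bars; passing the index of `value_counts()` sorts them from most to least frequent. A small sketch with invented data (the `Sport` values are assumptions):

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: Judo appears 3 times, Rowing twice, Boxing once
df_sports = pd.DataFrame(
    {"Sport": ["Judo", "Rowing", "Judo", "Boxing", "Judo", "Rowing"]}
)

# value_counts() sorts by frequency, so its index is the desired bar order
order = df_sports["Sport"].value_counts().index
ax = sns.countplot(x="Sport", data=df_sports, order=order)
```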

## Box Plot

For categorical features, you can pass the feature name as a string and seaborn will automatically draw a separate box plot for each category occurring in that feature. 

```
sns.boxplot(x = 'col_name', y = 'col_name', data = dataFrame)
```


## Violin Plot

```
plt.figure(figsize = (10, 8))
sns.violinplot(y = df[col], inner = 'quartile')  # passing the data as y draws a vertical violin
plt.show()
```


## Heat Map

https://seaborn.pydata.org/generated/seaborn.heatmap.html
```
import matplotlib.pyplot as plt
colormap = plt.cm.RdBu
# cor is a correlation matrix (e.g. df.corr()); linewidths takes a float, 0.5 is an example value
sns.heatmap(cor, annot=True, linewidths=0.5, cmap=colormap, linecolor='black')
```
**Note: If you don't want to display the numeric value of each cell, set annot=False.**

### Correlation matrix:
An easy way to visualize correlation is a heat map. Use the following snippet to display one.
```
corr_mtx = df.corr()
sns.heatmap(corr_mtx, annot = True)
```

### Display a heat map of only those columns with correlation greater than a threshold (theta)
```
cor = df.corr()
cols = cor[cor > theta].notnull().sum() > 1
sns.heatmap(cor.loc[cols, cols])
```
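
To make the selection logic easier to follow, here is a sketch on synthetic data where `a` and `b` are strongly correlated and `c` is independent noise. The column names and `theta = 0.8` are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
a = rng.normal(size=200)
df_demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),  # strongly correlated with a
    "c": rng.normal(size=200),                 # independent noise
})

theta = 0.8  # example threshold
cor = df_demo.corr()

# Every column correlates with itself at 1.0, so a count above 1 means
# at least one *other* column exceeds theta
cols = cor[cor > theta].notnull().sum() > 1
print(cols[cols].index.tolist())

ax = sns.heatmap(cor.loc[cols, cols], annot=True)
```

Note that `cor > theta` only catches positive correlations; comparing `cor.abs()` against `theta` would also keep strongly negative ones.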

### Highlight cells with absolute correlation greater than a threshold

```
sns.heatmap(cor[:][(cor>theta) | (cor< -theta)], annot=True, linewidths=1, linecolor='white')
```

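**EXPLANATION**<br>
* ```cor[:]``` -> A slice that selects every row of the correlation matrix, so it is equivalent to ```cor``` itself
* ```cor[:][(cor>theta) | (cor< -theta)]``` -> Keeps only the cells whose value is greater than theta or less than -theta. Every other cell becomes NaN and is left blank in the heat map. The "|" is the element-wise logical OR operator. 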

![something like this ](https://i.imgur.com/XWiQsem.png)
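
Another way to get the same picture, sketched here under the assumption that `cor` is a square correlation DataFrame, is the `mask` argument of `sns.heatmap`, which hides every cell where the mask is `True`. The data and `theta` are made up for illustration:

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
a = rng.normal(size=200)
df_demo = pd.DataFrame({
    "a": a,
    "b": -a + rng.normal(scale=0.1, size=200),  # strong negative correlation with a
    "c": rng.normal(size=200),                  # independent noise
})
cor = df_demo.corr()
theta = 0.5  # example threshold

# True marks the cells to hide: everything with |correlation| <= theta
mask = cor.abs() <= theta
ax = sns.heatmap(cor, mask=mask, annot=True, linewidths=1, linecolor="white")
```

Keeping the full matrix and masking it separately leaves `cor` untouched for later use.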

## Pair Plot
Used to plot the pairwise relationships between the columns of a dataset. <br> 
**ONE OF THE BEST CHARTS TO SEE RELATIONSHIPS**

```
sns.pairplot(df[col_list])
```

## Pie Chart
Using matplotlib for this one. <br>
The ```autopct="%1.1f%%"``` is used to display % in the graph.

```
col_name = 'col_name'
unique_values_count = [ (df[col_name]==a).sum() for a in df[col_name].value_counts().index]
labels = df[col_name].value_counts().index
plt.pie(unique_values_count, labels=labels, autopct="%1.1f%%")
plt.legend()
plt.show()
```
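
Since `value_counts()` already returns the count of each unique value, the counts and labels can be taken from it directly. A minimal sketch with a hypothetical `grade` column:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: A appears 3 times, B twice, C once
df_demo = pd.DataFrame({"grade": ["A", "B", "A", "C", "A", "B"]})

# Index holds the labels, values hold the counts (most frequent first)
counts = df_demo["grade"].value_counts()

fig, ax = plt.subplots()
# autopct formats each wedge's share of the total as a percentage
ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%")
```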

## Multiple Plots

Generic code for a 1-dimensional grid of plots:
```
c = 2
f, axes = plt.subplots(nrows = 1, ncols = c, figsize = (c * width_of_each_plot, height_of_each_plot))

for i in range(c):
  sns.countplot(x = df[col], ax = axes[i])  # pass the data by keyword; replace col per subplot

plt.show()
```

Generic code for a 2-dimensional grid of plots:
```
r, c = 2, 2
f, axes = plt.subplots(nrows = r, ncols = c, figsize = (c * width_of_each_plot, r * height_of_each_plot))

for i in range(r):
  for j in range(c):
    sns.countplot(x = df[col], ax = axes[i, j])  # pass the data by keyword; replace col per subplot

plt.show()
```

### Code for plotting numeric columns on grid plots:
Choose `r, c` so that `r*c` is at least `len(numeric_cols)` and as close to it as possible. E.g., if `len(numeric_cols)` is 11, possible values are `r = 3, c = 4`, `r = 4, c = 3`, or `r = 2, c = 6`. <br>
The `try...except` block ends the loop quietly once `idx` runs past the end of `numeric_cols`, instead of letting the `IndexError` propagate.  
```
numeric_cols = df.select_dtypes(exclude = 'object').columns.to_list()
print(len(numeric_cols))
r, c = 3, 4
f, axes = plt.subplots(nrows = r, ncols = c, figsize = (c*8, r*8))
try:
    for i in range(r):
        for j in range(c):
            idx = i*c + j
            p = sns.distplot(df[numeric_cols[idx]], ax = axes[i, j])
            p.set_xlabel(numeric_cols[idx],fontsize=25)
except IndexError:
    # Fewer columns than grid cells: the remaining axes stay empty
    pass
```
 

### Similarly for plotting categorical columns on grid plot:
```
cat_cols = df.select_dtypes(include = 'object').columns.to_list()
print(len(cat_cols))
r, c = 3, 4
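f, axes = plt.subplots(nrows = r, ncols = c, figsize = (c*8, r*8))
try:
    for i in range(r):
        for j in range(c):
            idx = i*c + j
            # countplot handles categorical data; distplot expects numeric values
            sns.countplot(x = df[cat_cols[idx]], ax = axes[i, j]).set_xlabel(cat_cols[idx],fontsize=25)
except IndexError:
    # Fewer columns than grid cells: the remaining axes stay empty
    pass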
```