![What would you like to show?](https://sun9-70.userapi.com/MG2SPUAk0SYYCFyOUkGBd-UNST5yeLoNXrfNjA/xepSbWe4jKg.jpg) 

**Trends** - A trend is defined as a pattern of change.
* sns.lineplot - **Line charts** are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.

**Relationship** - There are many different chart types that you can use to understand relationships between variables in your data.
* sns.barplot - **Bar charts** are useful for comparing quantities corresponding to different groups.
* sns.heatmap - **Heatmaps** can be used to find color-coded patterns in tables of numbers.
* sns.scatterplot - **Scatter plots** show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
* sns.regplot - Including a **regression line** in the scatter plot makes it easier to see any linear relationship between two variables.
* sns.lmplot - This command is useful for drawing **multiple regression lines**, if the scatter plot contains multiple, color-coded groups.
* sns.swarmplot - **Categorical scatter plots** show the relationship between a continuous variable and a categorical variable.

**Distribution** - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
* sns.distplot - **Histograms** show the distribution of a single numerical variable (since seaborn 0.11, sns.distplot is deprecated in favor of sns.histplot and sns.displot).
* sns.kdeplot - **KDE plots** (or **2D KDE plots**) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
* sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

**Categorical scatterplots** (each function is also available through sns.catplot() with the kind shown):
* sns.stripplot() (with kind="strip"; the default)
* sns.swarmplot() (with kind="swarm")

**Categorical distribution plots:**
* sns.boxplot() (with kind="box")
* sns.violinplot() (with kind="violin")
* sns.boxenplot() (with kind="boxen")

**Categorical estimate plots:**
* sns.pointplot() (with kind="point")
* sns.barplot() (with kind="bar")
* sns.countplot() (with kind="count")

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy             as np
import pandas            as pd
import seaborn           as sns
import matplotlib.pyplot as plt

%matplotlib inline

plt.rcParams['figure.dpi'] = 100

This is a cheat sheet for Data Visualization with Seaborn.
In this notebook I use information from:
* kaggle notebook [Complete Data Visualization Tutorial Seaborn !!!](https://www.kaggle.com/ravichaubey1506/complete-data-visualization-tutorial-seaborn#Statistical-Relationships);
* kaggle course [Data Visualization](https://www.kaggle.com/learn/data-visualization).

*Please feel free to leave comments and advice; upvotes are really appreciated!*

# Scatter Plots

In [None]:
# Path of the file to read
candy_filepath = '../input/data-for-datavis/candy.csv'

# Read the file into a variable candy_data
candy_data = pd.read_csv(candy_filepath, index_col='id')

# Scatter plot showing the relationship between 'sugarpercent' and 'winpercent'
sns.scatterplot(x=candy_data['sugarpercent'], y=candy_data['winpercent'])

In [None]:
# Scatter plot showing the relationship between 'pricepercent', 'winpercent', and 'chocolate'
sns.scatterplot(x=candy_data['pricepercent'], y=candy_data['winpercent'], hue=candy_data['chocolate'])

In [None]:
# Color-coded scatter plot with regression lines
sns.lmplot(x='pricepercent', y='winpercent', hue='chocolate', data=candy_data)

In [None]:
# Scatter plot showing the relationship between 'chocolate' and 'winpercent'
sns.swarmplot(x=candy_data['chocolate'], y=candy_data['winpercent'])

In [None]:
tips = pd.read_csv('../input/seaborn-tips-dataset/tips.csv')

# Scatter plot with two variables (two-dimensional)
sns.relplot(x='total_bill', y='tip', color='b', data=tips)

In [None]:
# Scatter plot with three variables (the third dimension is the color of the points)
sns.relplot(x='total_bill', y='tip', hue='smoker', palette='viridis', data=tips)

In [None]:
# Scatter plot with different marker styles
sns.relplot(x='total_bill', y='tip', hue='smoker', style='smoker',
            data=tips, palette = 'viridis')

In [None]:
# Scatter plot with four variables
sns.relplot(x='total_bill', y='tip', hue='smoker', style='time', data=tips, palette='viridis')

In [None]:
# Scatter plot with numeric hue semantic
sns.relplot(x='total_bill', y='tip', hue='size', data=tips)

In [None]:
# Scatter plot with size semantic as third variable 
sns.relplot(x='total_bill', y='tip', size='size', data=tips)

In [None]:
# Scatter plot with customized marker sizes
sns.relplot(x='total_bill', y='tip', size='size', sizes=(15, 200), data=tips)
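
The cells above depend on Kaggle input files. As a self-contained sketch, the same hue and size semantics can be tried on a small hand-made table. The values in `toy_tips` are invented for illustration and are not taken from the tips dataset; only the column names are borrowed from it.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented toy data that reuses the column names of the tips dataset
toy_tips = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.3, 30.1, 25.7, 18.2],
    "tip":        [1.5, 3.0, 2.0, 5.0, 4.1, 2.5],
    "smoker":     ["No", "Yes", "No", "Yes", "No", "Yes"],
    "size":       [2, 3, 2, 4, 4, 2],
})

fig, ax = plt.subplots()

# hue colors the points by 'smoker'; size scales the markers by party size,
# and sizes=(15, 200) sets the smallest and largest marker area
sns.scatterplot(data=toy_tips, x="total_bill", y="tip",
                hue="smoker", size="size", sizes=(15, 200), ax=ax)
```

This uses the axes-level `sns.scatterplot`, which draws onto an existing matplotlib axes, whereas `sns.relplot` creates its own figure.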

# Line Plots

In [None]:
# Path of the file to read
museum_filepath = '../input/data-for-datavis/museum_visitors.csv'

# Read the file into a variable museum_data
museum_data = pd.read_csv(museum_filepath, index_col='Date', parse_dates=True)

# Line chart showing the number of visitors to each museum over time
plt.figure(figsize=(10, 5))
sns.lineplot(data=museum_data)

In [None]:
# Line plot showing the number of visitors to Avila Adobe over time
plt.figure(figsize=(10, 5))
sns.lineplot(data=museum_data['Avila Adobe'])

In [None]:
df = pd.DataFrame(dict(time=np.arange(500), value=np.random.randn(500).cumsum()))

In [None]:
# Line plot using sns.relplot() with kind='line'
g = sns.relplot(x='time', y='value', kind='line', data=df)
g.fig.autofmt_xdate()

In [None]:
# Line plot without sorting x values: sort=False
df = pd.DataFrame(np.random.randn(500, 2).cumsum(axis=0), columns=['x', 'y'])
sns.relplot(x='x', y='y', sort=False, kind='line', data=df);

In [None]:
fmri = pd.read_csv('../input/seaborn-fmri-dataset/fmri.csv')

# Line plot with the default aggregation of multiple measurements at each x value:
# it plots the mean and the 95% confidence interval around the mean

sns.relplot(x='timepoint', y='signal', kind='line', data=fmri, color='blue')
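
To make the aggregation step concrete, here is a minimal pure-Python sketch of a mean with a percentile bootstrap interval, computed separately for each x value. It is a simplified approximation and not seaborn's own code; the function name `mean_with_bootstrap_ci` and the measurements in `signal_by_timepoint` are hypothetical.

```python
import random
import statistics

def mean_with_bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Return (mean, low, high): the sample mean and a percentile bootstrap interval."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of every resample
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    tail = (100 - level) / 2 / 100
    low = boot_means[int(tail * n_boot)]
    high = boot_means[int((1 - tail) * n_boot) - 1]
    return statistics.fmean(values), low, high

# Hypothetical repeated measurements keyed by x value (not the fmri data)
signal_by_timepoint = {
    0: [0.10, 0.12, 0.08, 0.11],
    1: [0.30, 0.25, 0.35, 0.28],
    2: [0.20, 0.22, 0.18, 0.24],
}

for timepoint, signals in signal_by_timepoint.items():
    mean, low, high = mean_with_bootstrap_ci(signals)
    print(f"t={timepoint}: mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The line in the plot corresponds to the per-timepoint mean, and the shaded band to the interval between `low` and `high`.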

In [None]:
# Line plot without visualization of Confidence Interval: ci=None
sns.relplot(x='timepoint', y='signal', ci=None, kind='line', color='blue', data=fmri)

In [None]:
# Lineplot with standard deviation instead of confidence interval
sns.relplot(x='timepoint', y='signal', kind='line', ci="sd", data=fmri)

In [None]:
# Line plot without aggregation: estimator=None
sns.relplot(x='timepoint', y='signal', estimator=None, kind='line', data=fmri)

In [None]:
# Line plot with aggregation for three variables (hue semantic)
sns.relplot(x='timepoint', y='signal', hue='event', kind='line', data=fmri)

In [None]:
# Line plot with aggregation with four variables (x, y, hue, style) without markers
sns.relplot(x="timepoint", y="signal", hue="region", style="event",
            kind="line", data=fmri)

In [None]:
# line plot with aggregation with four variables (x, y, hue, style) with markers

sns.relplot(x="timepoint", y="signal", hue="region", style="event",
            dashes=False, markers=True, kind="line", data=fmri);

In [None]:
# line plot with both hue and style used for one variable

sns.relplot(x="timepoint", y="signal", hue="event", style="event",
            kind="line", data=fmri);

In [None]:
# line plot with numeric hue variable

dots = pd.read_csv('../input/seaborn-dots-dataset/dots.csv').query("align == 'dots'")
sns.relplot(x="time", y="firing_rate",
            hue="coherence", style="choice",
            kind="line", data=dots);

In [None]:
# line plot with customized specific color values for each line

palette = sns.cubehelix_palette(light=0.6, n_colors=6)
sns.relplot(x="time", y="firing_rate",
            hue="coherence", style="choice",
            palette=palette,
            kind="line", data=dots);

In [None]:
# line plot with date values on the x axis

df = pd.DataFrame(dict(time=pd.date_range("2017-1-1", periods=500),
                       value=np.random.randn(500).cumsum()))
g = sns.relplot(x="time", y="value", kind="line", data=df)
g.fig.autofmt_xdate()

In [None]:
# line plots in one figure, with subsets of the data divided across columns and rows

sns.relplot(x="timepoint", y="signal", hue="subject",
            col="region", row="event",palette = 'viridis', height=3,
            kind="line", estimator=None, data=fmri);

In [None]:
# line plots faceted on the columns and wrapped into rows

sns.relplot(x="timepoint", y="signal", hue="event", style="event",
            col="subject", col_wrap=5,palette = 'viridis',
            height=3, aspect=.75, linewidth=2.5,
            kind="line", data=fmri.query("region == 'frontal'"));
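
The wrapping controlled by `col_wrap` comes down to simple arithmetic: facets fill a row until `col_wrap` columns are used, then continue on the next row. The sketch below illustrates only that layout arithmetic, not seaborn internals; the helper names and the facet count of 14 are hypothetical.

```python
import math

def facet_grid_shape(n_facets, col_wrap):
    """Return (rows, cols) of a grid that wraps n_facets panels after col_wrap columns."""
    cols = min(n_facets, col_wrap)
    rows = math.ceil(n_facets / col_wrap)
    return rows, cols

def facet_position(index, col_wrap):
    """Return the (row, col) position of the facet with the given zero-based index."""
    return divmod(index, col_wrap)

# Hypothetical example: 14 facet values wrapped after 5 columns
print(facet_grid_shape(14, 5))  # (3, 5)
print(facet_position(7, 5))     # (1, 2)
```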

In [None]:
sns.set(style="ticks", color_codes=True)

# Categorical Scatterplots

In [None]:
# catplot with default parameters

sns.catplot(x="day", y="total_bill", data=tips, jitter = True)

In [None]:
# catplot with jitter=False

sns.catplot(x="day", y="total_bill", data=tips, jitter = False);
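
Jitter can be pictured as a small random offset added to each point's position on the categorical axis, so that points with the same category no longer sit on one line. The sketch below is only an illustration of that idea: the function name, the offset width of 0.1, and the sample days are assumptions, and seaborn chooses its own jitter amount.

```python
import random

def jittered_positions(categories, order, width=0.1, seed=0):
    """Map each categorical value to its axis index plus uniform noise in [-width, width]."""
    rng = random.Random(seed)
    index_of = {cat: i for i, cat in enumerate(order)}
    return [index_of[cat] + rng.uniform(-width, width) for cat in categories]

days = ["Thur", "Fri", "Thur", "Sat", "Sun", "Sat"]   # invented observations
order = ["Thur", "Fri", "Sat", "Sun"]

with_jitter = jittered_positions(days, order)
without_jitter = jittered_positions(days, order, width=0)

print(with_jitter)      # positions scattered around 0, 1, 0, 2, 3, 2
print(without_jitter)   # positions exactly on the category indices
```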

In [None]:
# catplot without points overlapping: kind='swarm'

sns.catplot(x="day", y="total_bill", kind="swarm", data=tips)

In [None]:
# catplot with third variable - hue semantic

sns.catplot(x="day", y="total_bill", hue="sex", kind="swarm", data=tips);

In [None]:
sns.catplot(x="size", y="total_bill", kind="swarm",
            data=tips.query("size != 3"));

In [None]:
# catplot with customized values order

sns.catplot(x="smoker", y="tip", order=["No", "Yes"], data=tips);

In [None]:
# catplot with horizontal axis of categorical data

sns.catplot(x="total_bill", y="day", hue="time", kind="swarm", data=tips);

# Boxplots

In [None]:
# boxplot

sns.catplot(x="day", y="total_bill", kind="box", data=tips);
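
As a rough guide to reading a box plot, the sketch below computes the quantities a box usually encodes: the quartiles, whiskers that reach the most extreme points within 1.5 times the interquartile range, and the points beyond them. The function `box_stats` and its input are hypothetical, and the quartile method is an assumption chosen to interpolate linearly between data points.

```python
import statistics

def box_stats(values, whis=1.5):
    """Return quartiles, whisker ends and outliers for one box of a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme observations that are still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": inside[0], "whisker_high": inside[-1],
            "outliers": outliers}

summary = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this input the box spans 3.25 to 7.75, the whiskers run from 1 to 9, and 30 is drawn as an individual outlier.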

In [None]:
# boxplot with third variable by hue semantic

sns.catplot(x="day", y="total_bill", hue="smoker", kind="box", data=tips);

In [None]:
# boxplot without 'dodging': dodge=False

tips["weekend"] = tips["day"].isin(["Sat", "Sun"])
sns.catplot(x="day", y="total_bill", hue="weekend",
            kind="box", dodge=False, data=tips);

In [None]:
# boxplot with 'boxen' style

diamonds = pd.read_csv('../input/diamonds/diamonds.csv')
sns.catplot(x="color", y="price", kind="boxen",
            data=diamonds.sort_values("color"));

# Violinplots

In [None]:
sns.catplot(x="total_bill", y="day", hue="sex",
            kind="violin", data=tips);

In [None]:
# violinplot with split violins

sns.catplot(x="day", y="total_bill", hue="sex",
            kind="violin", split=True, data=tips);

In [None]:
sns.catplot(x="day", y="total_bill", hue="sex",
            kind="violin", inner="stick", split=True,
            palette="pastel", data=tips);

In [None]:
# violinplot combined with swarmplot

g = sns.catplot(x="day", y="total_bill", kind="violin", inner=None, data=tips)
sns.swarmplot(x="day", y="total_bill", color="k", size=3, data=tips, ax=g.ax);

# Bar plots

In [None]:
# Path of the file to read
ign_filepath = '../input/data-for-datavis/ign_scores.csv'

# Read the file into a variable ign_data
ign_data = pd.read_csv(ign_filepath, index_col="Platform")

# Bar chart showing average score for racing games by platform
plt.figure(figsize=(15, 4)) # Set the width and height of the figure
sns.barplot(x=ign_data.index, y=ign_data['Racing'])

In [None]:
titanic = pd.read_csv('../input/python-seaborn-datas/titanic.csv')
sns.catplot(x="Sex", y="Survived", hue="Pclass", kind="bar", data=titanic);

In [None]:
sns.catplot(x="Survived", kind="count", palette="ch:.25", data=titanic);

In [None]:
sns.catplot(y="Survived", hue="Pclass", kind="count",
            palette="pastel", edgecolor=".6",
            data=titanic);

# Count Plots

In [None]:
plt.figure(figsize=[10,5])
sns.countplot(x = 'chocolate', hue = 'hard', data = candy_data)
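
A count plot shows how many rows fall into each category, and with `hue` into each combination of two categories. The same tally can be sketched with the standard library; the `rows` below are invented (chocolate, hard) flags and are not read from candy.csv.

```python
from collections import Counter

# Hypothetical rows of (chocolate, hard) flags, standing in for two columns of a table
rows = [(1, 0), (1, 0), (0, 1), (0, 0), (1, 1), (0, 1), (0, 0)]

# Each distinct (chocolate, hard) pair corresponds to one bar in the plot
counts = Counter(rows)

for chocolate in sorted({c for c, _ in rows}):
    for hard in sorted({h for _, h in rows}):
        print(f"chocolate={chocolate}, hard={hard}: {counts[(chocolate, hard)]} rows")
```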
plt.xticks(rotation = 20);

# Heatmap

In [None]:
# Heatmap showing average game score by platform and genre
plt.figure(figsize=(14, 7))
sns.heatmap(data=ign_data, annot=True)

# Point plots

In [None]:
sns.catplot(x="Sex", y="Survived", hue="Pclass", kind="point", data=titanic);

In [None]:
sns.catplot(x="Pclass", y="Survived", hue="Sex",
            palette={"male": "g", "female": "m"},
            markers=["^", "o"], linestyles=["-", "--"],
            kind="point", data=titanic);

In [None]:
iris = pd.read_csv('../input/seaborn-iris-dataset/iris.csv')
sns.catplot(data=iris, orient="h", kind="box");

In [None]:
sns.violinplot(x=iris.species, y=iris.sepal_length);

In [None]:
g = sns.catplot(x="Fare", y="Survived", row="Pclass",
                kind="box", orient="h", height=1.5, aspect=4,
                data=titanic.query("Fare > 0"))
g.set(xscale="log");

In [None]:
from scipy import stats
sns.set(color_codes=True)

# Plotting Distributions

In [None]:
# Paths of the files to read
cancer_b_filepath = '../input/data-for-datavis/cancer_b.csv'
cancer_m_filepath = '../input/data-for-datavis/cancer_m.csv'

# Read the (benign) file into a variable cancer_b_data
cancer_b_data = pd.read_csv(cancer_b_filepath, index_col='Id')

# Read the (malignant) file into a variable cancer_m_data
cancer_m_data = pd.read_csv(cancer_m_filepath, index_col='Id')

In [None]:
# Histograms for benign and malignant tumors
sns.distplot(a=cancer_m_data['Area (mean)'], kde=False)
sns.distplot(a=cancer_b_data['Area (mean)'], kde=False)

In [None]:
# KDE plots for benign and malignant tumors
# fill= replaces the deprecated shade= argument (seaborn >= 0.11)
sns.kdeplot(data=cancer_b_data['Radius (worst)'], fill=True)
sns.kdeplot(data=cancer_m_data['Radius (worst)'], fill=True)

In [None]:
x = np.random.normal(size=100)
sns.distplot(x);

In [None]:
# distplot as a histogram: kde=False

sns.distplot(x, kde=False, rug=True);

In [None]:
sns.distplot(x, bins=20, kde=False, rug=True);
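
The `bins` argument decides how many intervals the value range is cut into before counting. A minimal sketch of equal-width binning, assuming the values are not all identical, shows how the same data gives different bar heights for different bin counts; the function name and the `sample` values are hypothetical.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return counts, edges

sample = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

print(histogram_counts(sample, bins=2)[0])  # [4, 5]
print(histogram_counts(sample, bins=4)[0])  # [2, 2, 2, 3]
```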

In [None]:
sns.distplot(x, hist=False, rug=True);

In [None]:
sns.kdeplot(x, fill=True);

In [None]:
# The bandwidth of the KDE controls how tightly the estimate is fit to the data,
# much like the bin size in a histogram.
# The default behavior tries to guess a good value using a common reference rule,
# but it may be helpful to try larger or smaller values.
# (seaborn >= 0.11 takes the value as bw_method; the older bw argument is deprecated)

sns.kdeplot(x)
sns.kdeplot(x, bw_method=.2, label="bw: 0.2")
sns.kdeplot(x, bw_method=2, label="bw: 2")
plt.legend();
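
The effect of the bandwidth is easier to see in a hand-written kernel density estimate. In the sketch below, every observation contributes one Gaussian bump and `bandwidth` is the standard deviation of that bump in data units. This is an assumption made for illustration and may not match how seaborn scales its own bandwidth value; the function name and the sample are hypothetical.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a density function: the average of Gaussian bumps centered on each observation."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density

sample = [-1.2, -0.4, 0.0, 0.3, 1.1]   # invented observations
narrow = gaussian_kde(sample, bandwidth=0.2)
wide = gaussian_kde(sample, bandwidth=2.0)

grid = [i / 10 for i in range(-30, 31)]
print("highest point, narrow bandwidth:", max(narrow(x) for x in grid))
print("highest point, wide bandwidth:  ", max(wide(x) for x in grid))
```

A narrow bandwidth yields a tall, bumpy curve that follows individual points, while a wide one yields a low, smooth curve.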

In [None]:
sns.kdeplot(x, fill=True, cut=0)
sns.rugplot(x);

In [None]:
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])

# Jointplots

In [None]:
# jointplot with a scatter plot - visualizing a bivariate distribution

sns.jointplot(x="x", y="y", data=df);

In [None]:
# jointplot 'hexbin' plot - a bivariate analogue of a histogram

x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("white"):
    sns.jointplot(x=x, y=y, kind="hex", color="k");

In [None]:
# jointplot as KDE

sns.jointplot(x="x", y="y", data=df, kind="kde");

In [None]:
f, ax = plt.subplots(figsize=(6, 6))
# pass x and y by keyword; positional use is deprecated since seaborn 0.11
sns.kdeplot(x=df.x, y=df.y, ax=ax)
sns.rugplot(x=df.x, color="g", ax=ax)
sns.rugplot(y=df.y, ax=ax);

In [None]:
f, ax = plt.subplots(figsize=(6, 6))
cmap = sns.cubehelix_palette(as_cmap=True, dark=0, light=1, reverse=True)
sns.kdeplot(x=df.x, y=df.y, cmap=cmap, levels=60, fill=True);

In [None]:
# customizing JointGrid object

g = sns.jointplot(x="x", y="y", data=df, kind="kde", color="m")
g.plot_joint(plt.scatter, c="w", s=30, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("$X$", "$Y$");

# Pairplot

In [None]:
sns.pairplot(iris);

In [None]:
sns.pairplot(iris, hue="species");

In [None]:
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, n_levels=6);

# Linear Relationships

In [None]:
sns.regplot(x="total_bill", y="tip", data=tips);
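
The straight line in a regression plot is an ordinary least-squares fit by default. A minimal sketch of that fit, without the confidence band, is shown below; `fit_line` is a hypothetical helper and the bills and tips are invented so that the exact answer is known.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

bills = [10.0, 20.0, 30.0, 40.0]
tips_paid = [2.0, 3.5, 5.0, 6.5]   # constructed as tip = 0.5 + 0.15 * bill

slope, intercept = fit_line(bills, tips_paid)
print(f"tip = {intercept:.2f} + {slope:.2f} * total_bill")
```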

In [None]:
sns.lmplot(x="size", y="tip", data=tips);

In [None]:
sns.lmplot(x="size", y="tip", data=tips, x_jitter=.05);

In [None]:
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);

In [None]:
# lmplot with binary y variable

tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sns.lmplot(x="total_bill", y="big_tip", data=tips,
           y_jitter=.03);

In [None]:
# lmplot with logistic regression (for binary y data)

sns.lmplot(x="total_bill", y="big_tip", data=tips,
           logistic=True, y_jitter=.03);

In [None]:
sns.lmplot(x="total_bill", y="tip", data=tips,
           lowess=True);

In [None]:
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);

In [None]:
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
           markers=["o", "x"], palette="Set1");

In [None]:
sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips);

# FacetGrid

In [None]:
g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(plt.scatter, "total_bill", "tip", alpha=.7)
g.add_legend();

In [None]:
ordered_days = tips.day.value_counts().index

g = sns.FacetGrid(tips, row="day", row_order=ordered_days,
                  height=1.7, aspect=4,)

g.map(sns.distplot, "total_bill", hist=False, rug=True);

# Changing styles with seaborn

Seaborn has five different themes:
* 'darkgrid',
* 'whitegrid',
* 'dark',
* 'white',
* 'ticks'.
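
One way to see what separates these themes is `sns.axes_style()`, which returns the matplotlib parameters a theme would set without applying them globally. The sketch below inspects only the `axes.grid` entry; the variable names are illustrative.

```python
import seaborn as sns

themes = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style(name) returns a dict of rc parameters for that theme
grid_enabled = {name: sns.axes_style(name)["axes.grid"] for name in themes}

for name, has_grid in grid_enabled.items():
    print(f"{name:>9}: grid lines {'on' if has_grid else 'off'}")
```

The same function can be used as a context manager, as in the `with sns.axes_style("white"):` cell of the joint plot section, to change the theme for a single figure only.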

In [None]:
# Change the style of the figure to the "dark" theme
sns.set_style('dark')
plt.figure(figsize=(10, 4))
sns.lineplot(data=museum_data['Avila Adobe'])