# Plotting

**Table of Contents**

- [Load Data and Libraries](#load_data_lib)
- [Pie Chart](#piechart)
- [Bar Chart](#barchart)
- [Line Chart](#linechart)
- [Area Chart](#areachart)
- [Histogram Chart](#histchart)
- [Scatter Chart](#scatterchart)
- [Box Chart](#boxchart)
- [Summarization Chart](#summarizationchart)
- [Time Series Chart](#timeserieschart)

In [2]:
import warnings
warnings.simplefilter('ignore')

# Make figure inline
%matplotlib inline

## Load Data and Libraries <a id='load_data_lib'></a>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
import pandas as pd
data = pd.read_csv('data/pokemon.csv')

## Pie Chart <a id='piechart'></a>

### pandas plot pie

In [None]:
data['Type 2'].value_counts().plot.pie()
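Each slice of the pie is one category's share of the counted total. The minimal sketch below uses a made-up toy Series standing in for `Type 2`, not the real Pokémon data. It shows the fractions `value_counts(normalize=True)` returns, which are the slice sizes. Missing values (Pokémon with no second type) get no slice, because `value_counts()` drops NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 'Type 2' column (NaN = no second type)
type2 = pd.Series(['Flying', 'Flying', 'Poison', np.nan, 'Ground', np.nan])

# value_counts() drops NaN by default, so missing values get no slice
counts = type2.value_counts()
# normalize=True returns each category's share of the non-missing total
shares = type2.value_counts(normalize=True)
print(shares)
```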

## Bar Chart <a id='barchart'></a>

### pandas plot bar

**With Sorting (by count, descending):**

In [None]:
data['Type 1'].value_counts().head(20).plot.bar()

**Without Sorting by Count (alphabetical order of types):**

In [None]:
data['Type 1'].value_counts().sort_index().head(20).plot.bar()
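The two cells above differ only in ordering. `value_counts()` orders categories by count, largest first, while `.sort_index()` reorders them alphabetically by label. A small sketch on hypothetical toy data (not the real `Type 1` column):

```python
import pandas as pd

# Hypothetical toy column standing in for 'Type 1'
type1 = pd.Series(['Water', 'Fire', 'Water', 'Bug', 'Water', 'Fire'])

# Default: categories ordered by count, largest first
by_count = type1.value_counts()
# sort_index(): categories ordered alphabetically by label
by_label = type1.value_counts().sort_index()

print(list(by_count.index))   # ['Water', 'Fire', 'Bug']
print(list(by_label.index))   # ['Bug', 'Fire', 'Water']
```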

### seaborn countplot

**Without Sorting:**

In [None]:
sns.countplot(x='Type 1', data=data)
fig = plt.gcf()
fig.set_size_inches(12, 8)

**With Sorting:**

In [None]:
sns.countplot(x='Type 1', data=data, order=data['Type 1'].value_counts().index)
fig = plt.gcf()
fig.set_size_inches(12, 8)

### seaborn barplot

In [None]:
# By default, sort will be performed in descending order
type1_data = data['Type 1'].value_counts().to_frame().reset_index()
# Disable sort
# type1_data = data['Type 1'].value_counts(sort=False).to_frame().reset_index()

In [None]:
type1_data.columns = ['Type', 'Count']

In [None]:
type1_data.head(20)
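The column names that `value_counts().to_frame().reset_index()` produces depend on the pandas version. Since pandas 2.0 the counts column is named `count`, while older versions used the original column name plus an `index` column. Renaming both columns explicitly, as the notebook does, gives the same table either way. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy column standing in for 'Type 1'
type1 = pd.Series(['Water', 'Fire', 'Water', 'Bug', 'Water', 'Fire'], name='Type 1')

# value_counts() -> Series indexed by type; reset_index() turns that index into a column.
# The default column names differ between pandas versions, so rename them explicitly.
type1_data = type1.value_counts().to_frame().reset_index()
type1_data.columns = ['Type', 'Count']
print(type1_data)
```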

In [None]:
# dodge=False keeps full-width bars when hue repeats the x variable
sns.barplot(x='Type', y='Count', hue='Type', dodge=False, data=type1_data.head(20))
fig = plt.gcf()
f_csize = fig.get_size_inches()
fig.set_size_inches(f_csize[0] * 3, f_csize[1] * 2)

## Line Chart <a id='linechart'></a>

### pandas plot line

In [None]:
data['Type 1'].value_counts(sort=False).plot.line()

### seaborn kdeplot

In [None]:
sns.kdeplot(data['Type 1'].value_counts(sort=False))

In [None]:
# vertical=True is deprecated; pass the data as y to draw the density along the y-axis
sns.kdeplot(y=data['Type 1'].value_counts(sort=False))

### seaborn displot

In [None]:
# distplot is deprecated; displot with kind='kde' draws the same density curve
sns.displot(x=data['Type 1'].value_counts(), kind='kde')
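The curves above are kernel density estimates of the per-type *counts*. seaborn documents its default as a Gaussian kernel whose bandwidth is chosen by Scott's rule (passed to `scipy.stats.gaussian_kde`). The sketch below reimplements that idea in NumPy so the mechanics are visible. The count values are hypothetical, and this is an illustration, not seaborn's code.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate on `grid`.

    Bandwidth follows Scott's rule for one dimension: h = std(samples, ddof=1) * n**(-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump centred on each sample, averaged over all samples
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Hypothetical per-type counts (not the real Pokémon numbers)
counts = np.array([69, 32, 27, 44, 17, 112, 98, 70, 57, 24])
grid = np.linspace(counts.min() - 100, counts.max() + 100, 500)
density = gaussian_kde_1d(counts, grid)

# A density integrates to ~1 over a wide enough grid (simple Riemann sum)
mass = density.sum() * (grid[1] - grid[0])
print(round(mass, 4))
```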

## Area Chart <a id='areachart'></a>

### pandas plot area

In [None]:
data['Type 1'].value_counts(sort=False).plot.area()

### seaborn kdeplot

In [None]:
# shade=True is deprecated; fill=True fills the area under the curve
sns.kdeplot(data['Type 1'].value_counts(sort=False), fill=True)

## Histogram Chart <a id='histchart'></a>

### pandas plot hist

In [None]:
data['Type 1'].value_counts().plot.hist()

### seaborn histplot

In [None]:
# histplot replaces the deprecated distplot(kde=False); edgecolor/linewidth outline each bar
sns.histplot(x=data['Type 1'].value_counts(), edgecolor='k', linewidth=1)
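A histogram splits the range of values into bins and counts the values in each bin. `Series.plot.hist()` uses 10 equal-width bins by default. The sketch below computes the same kind of table with `np.histogram` on hypothetical count values.

```python
import numpy as np

# Hypothetical per-type counts (not the real Pokémon numbers)
counts = np.array([69, 32, 27, 44, 17, 112, 98, 70, 57, 24])

# 10 equal-width bins spanning [min, max]; the last bin also includes its right edge
freq, edges = np.histogram(counts, bins=10)
for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    print(f"[{lo:6.1f}, {hi:6.1f}): {f}")
```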

## Scatter Chart <a id='scatterchart'></a>

### pandas plot scatter

In [None]:
data.plot.scatter(x='HP', y='Attack')

### seaborn lmplot

**Without Regression:**

In [None]:
sns.lmplot(x='HP', y='Attack', data=data, fit_reg=False)

**With Regression:** (fits a least-squares line of y on x, with a confidence band, to show how Attack tends to change with HP)

In [None]:
sns.lmplot(x='HP', y='Attack', data=data, fit_reg=True)
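With `fit_reg=True`, `lmplot` draws an ordinary least-squares line (degree 1 by default) plus a confidence band, 95% by default. The line itself can be reproduced with `np.polyfit`. The HP/Attack pairs below are made up for illustration.

```python
import numpy as np

# Hypothetical HP/Attack pairs (made up for illustration)
hp = np.array([45, 60, 80, 39, 58, 78, 44, 59, 79, 100], dtype=float)
attack = np.array([49, 62, 82, 52, 64, 84, 48, 63, 83, 100], dtype=float)

# Ordinary least-squares fit of a degree-1 polynomial: attack ~ slope * hp + intercept
slope, intercept = np.polyfit(hp, attack, deg=1)
# Pearson correlation, a unitless measure of how tightly points follow the line
r = np.corrcoef(hp, attack)[0, 1]
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

For a straight-line fit, the slope equals the sample covariance of x and y divided by the sample variance of x.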

## Box Chart <a id='boxchart'></a>

### pandas plot box

**Multiple Columns/Fields within the same Plot:**

In [None]:
data.boxplot(column=['Attack', 'Defense'])
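Both pandas and seaborn box plots are drawn with matplotlib. By matplotlib's documented defaults, the box spans the first to third quartile with a line at the median. The whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier ("flier"). A sketch of those statistics, using hypothetical Attack values and linear-interpolation quartiles:

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Summary a matplotlib-style box plot draws (quartiles by linear interpolation)."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_limit) & (x <= hi_limit)]
    return {
        'q1': q1, 'median': med, 'q3': q3,
        # whiskers stop at the most extreme points still inside the whis*IQR fences
        'whisker_low': inside.min(), 'whisker_high': inside.max(),
        # points outside the fences are plotted individually as outliers
        'fliers': x[(x < lo_limit) | (x > hi_limit)].tolist(),
    }

attack = [49, 62, 82, 52, 64, 84, 48, 63, 83, 190]  # hypothetical values
stats = box_stats(attack)
print(stats)
```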

**Group without Subplot:**

In [None]:
data.boxplot(column=['Attack'], by='Type 2')
fig = plt.gcf()
fig.set_size_inches(24, 8)

**Group with Subplot:**

In [None]:
data.groupby('Type 2').boxplot(column=['Attack'])
fig = plt.gcf()
fig.set_size_inches(32, 32)

### seaborn boxplot

**Multiple Columns/Fields within the same Plot:**

In [None]:
sns.boxplot(x='variable', y='value', data=pd.melt(data[['Attack', 'Defense']]))
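`sns.boxplot` expects long-format data: one column naming the group and one holding the values. `pd.melt` produces exactly that from the wide `Attack`/`Defense` columns, as this sketch on a hypothetical two-row frame shows.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the notebook
wide = pd.DataFrame({'Attack': [49, 62], 'Defense': [49, 63]})

# melt() stacks the columns into (variable, value) pairs: one row per original cell
long = pd.melt(wide)
print(long)
```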

**Group without Subplot:**

In [None]:
sns.boxplot(x='Type 2', y='Attack', data=data)
fig = plt.gcf()
fig.set_size_inches(24, 8)

**Group with Subplot:**

In [None]:
sns.catplot(kind='box', col='Type 2', col_wrap=8, x='Attack', data=data)
fig = plt.gcf()
fig.set_size_inches(32, 8)

## Summarization Chart <a id='summarizationchart'></a>

### pandas plotting parallel_coordinates

In [None]:
from pandas.plotting import parallel_coordinates

target = (
    data.loc[:, ['Type 2', 'HP', 'Attack', 'Defense', 'Speed']]
    #.applymap(lambda x: int(x) if str.isdecimal(x) else np.nan)
    .dropna()
)

parallel_coordinates(target, 'Type 2')
fig = plt.gcf()
fig.set_size_inches(24, 16)

### seaborn heatmap

In [None]:
corr = (
    data.loc[:, ['HP', 'Attack', 'Defense', 'Speed']]
    #.applymap(lambda x: int(x) if str.isdecimal(x) else np.nan)
    .dropna()
).corr()

sns.heatmap(corr, annot=True)
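Each heatmap cell shows a Pearson correlation coefficient, the default method of `DataFrame.corr()`. Pearson's r is the covariance of two columns divided by the product of their standard deviations. The matrix is therefore symmetric with 1s on the diagonal. A sketch on hypothetical stats:

```python
import numpy as np
import pandas as pd

stats = pd.DataFrame({  # hypothetical values
    'HP': [45, 60, 80, 39, 58],
    'Attack': [49, 62, 82, 52, 64],
    'Speed': [45, 60, 80, 65, 80],
})

def pearson(a, b):
    """Pearson r: covariance divided by the product of standard deviations."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

corr = stats.corr()  # method='pearson' by default
print(corr.round(3))
```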

## Time Series Chart <a id='timeserieschart'></a>

Time-series variables take values that are tied to a specific point in time. Because time is ordered and can be measured at arbitrarily fine resolution, time-series values can be treated as a special case of interval variables.

### Load Data

In [None]:
stocks = pd.read_csv("data/prices.csv", parse_dates=['date'])
stocks.drop(['Unnamed: 0'], axis=1, inplace=True)

In [None]:
stocks.head()

### Resample

**Without resampling: a total mess**

In [None]:
stocks['date'].value_counts().sort_values().plot.line()

**Resampling by year**

In [None]:
stocks['date'].value_counts().resample('Y').sum().plot.line()

**Once the date column is set as the index, any other column can be resampled**

In [None]:
stocks.set_index('date', inplace=True)

In [None]:
stocks['volume'].resample('Y').mean().plot.bar()
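`resample('Y')` buckets rows by calendar year and then aggregates each bucket. Frequency aliases have changed across pandas versions (pandas 2.2 prefers `'YE'` over `'Y'`). To stay version-independent, this sketch groups by `index.year` instead, using hypothetical daily volumes. For years that contain data it gives the same means as `resample('Y').mean()`. The difference is that `resample` labels each bucket with its year-end date and also emits empty (NaN) years in gaps.

```python
import pandas as pd

# Hypothetical daily volumes spanning two calendar years
idx = pd.to_datetime(['2015-12-30', '2015-12-31', '2016-01-04', '2016-01-05', '2016-01-06'])
volume = pd.Series([100, 300, 200, 400, 600], index=idx)

# Bucket by calendar year, then average within each bucket
yearly_mean = volume.groupby(volume.index.year).mean()
print(yearly_mean)
```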

### Lag Plot

A lag plot compares each observation in the dataset with the observation one step before it (the default lag is 1): each point has y(t) on the x-axis and y(t + 1) on the y-axis. For daily data, December 21st is compared with December 20th, which is in turn compared with December 19th, and so on.

In [None]:
from pandas.plotting import lag_plot
lag_plot(stocks['volume'].tail(550))
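The points in a lag plot are simply the series paired with a shifted copy of itself. This sketch builds the (y(t), y(t + 1)) pairs for a hypothetical short series. A point cloud along the diagonal would indicate that consecutive values are similar.

```python
import pandas as pd

volume = pd.Series([10, 12, 11, 15, 14])  # hypothetical values

lag = 1
# Each point is (y(t), y(t + lag)); drop the last/first `lag` values so the two sides align
pairs = pd.DataFrame({'y(t)': volume.iloc[:-lag].to_numpy(),
                      'y(t+1)': volume.iloc[lag:].to_numpy()})
print(pairs)
```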

### Autocorrelation Plot

The autocorrelation plot is a summary plot of a single series that lets you check *every* periodicity at the same time. It does this by computing one summary statistic, the correlation between the series and a lagged copy of itself, for every possible lag in the dataset. This is known as autocorrelation.

In an autocorrelation plot the lag is on the x-axis and the autocorrelation score is on the y-axis. The farther a score is from 0, the more strongly observations that many steps apart are correlated with one another. The horizontal lines pandas draws mark the 95% (solid) and 99% (dashed) confidence bands; scores inside them are consistent with no autocorrelation at that lag.

In [None]:
from pandas.plotting import autocorrelation_plot

autocorrelation_plot(stocks['volume'])
fig = plt.gcf()
fig.set_size_inches(24, 16)
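The curve above can be computed directly. This sketch is intended to follow the normalisation used by `pandas.plotting.autocorrelation_plot`. The autocovariance at each lag is divided by n and then by the lag-0 variance, and the bands sit at ±z/√n. It runs on a hypothetical series with period 4, so the score is high at lag 4 and strongly negative at lag 2.

```python
import numpy as np

def autocorrelation(series):
    """Autocorrelation for lags 1..n, normalised by n and the lag-0 variance."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    mean = x.mean()
    c0 = ((x - mean) ** 2).sum() / n
    lags = np.arange(1, n + 1)
    r = np.array([((x[: n - h] - mean) * (x[h:] - mean)).sum() / n / c0 for h in lags])
    return lags, r

# Hypothetical series with period 4
series = np.tile([1.0, 3.0, 1.0, -1.0], 10)
lags, r = autocorrelation(series)

# Confidence-band heights: 95% (z ~ 1.96) and 99% (z ~ 2.576)
band95 = 1.959963984540054 / np.sqrt(len(series))
band99 = 2.5758293035489004 / np.sqrt(len(series))
print(r[:8].round(3))
```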