# Matplotlib
Matplotlib documentation: http://matplotlib.org/contents.html

Matplotlib can be used for creating plots and charts.

### Caution
There is a problem setting up pyplot, a sub-module of Matplotlib, on **Windows**. (I have confirmed that there is no problem on Linux (especially Ubuntu) or OS X.)

To fix this, you need to edit {your anaconda directory}\Lib\site-packages\matplotlib\font_manager.py. Inside font_manager.py, change part of the body of the `win32InstalledFonts()` function as follows:


```{.python}
key, direc, any = winreg.EnumValue( local, j)
if not is_string_like(direc):
    continue
if not os.path.dirname(direc):
    direc = os.path.join(directory, direc)
direc = direc.split('\0', 1)[0]  # changed line; was: direc = os.path.abspath(direc).lower()
```


In my case, I solved the problem by changing `direc = os.path.abspath(direc).lower()` to `direc = direc.split('\0', 1)[0]`.

Reference: http://stackoverflow.com/a/34007642

## 0. Basic

The library is generally used as follows:

- Call a plotting function with some data (e.g., plot())
- Call functions to set up the properties of the plot (e.g., labels and colors)
- Make the plot visible (e.g., show())
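The three steps above can also be written with the object-oriented interface, where `plt.subplots()` returns a `Figure` and an `Axes` and the properties are set with `ax.set_*` methods. A minimal sketch; the helper name `make_line_plot` is hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np


def make_line_plot(values):
    """Hypothetical helper: steps 1 and 2 (plot the data, then set its properties)."""
    fig, ax = plt.subplots()
    # 1. call a plotting function with some data (x defaults to 0, 1, 2, ...)
    ax.plot(values, color='tab:blue', label='values')
    # 2. set up the properties of the plot
    ax.set_xlabel('x axis')
    ax.set_ylabel('y axis')
    ax.set_title('basic line plot')
    ax.legend()
    return fig, ax


fig, ax = make_line_plot(np.array([1, 2, 3]))
# 3. make the plot visible: call plt.show() in a script; with %matplotlib inline
#    the figure is displayed automatically at the end of the notebook cell
```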

In [1]:
# display matplotlib output directly in the notebook
%matplotlib inline

# import packages
# here we mainly use pyplot, a submodule of matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# increase default figure and font sizes for easier viewing
plt.rcParams['figure.figsize'] = (8, 6)
plt.rcParams['font.size'] = 14

In [None]:
# basic line plot
myarray = np.array([1,2,3])
plt.plot(myarray)
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.title('basic line plot')
plt.show()

In [None]:
# basic scatter plot
x = np.array([1, 2, 3])
y = np.array([2, 5, 3])
plt.scatter(x,y)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.title('basic scatter plot')
plt.show()

## 1. Histogram
Purpose: Showing the distribution of a numerical variable

In [None]:
# Example data: Drinks data
drink_cols = ['country', 'beer', 'spirit', 'wine', 'liters', 'continent']
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
drinks = pd.read_csv(url, header=0, names=drink_cols, na_filter=False)

In [None]:
# sort the beer column and mentally split it into 3 groups
drinks.beer.sort_values().values

In [None]:
# compare with histogram
drinks.beer.plot(kind='hist', bins=3)
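What `plot(kind='hist', bins=3)` draws is a count of values in equal-width bins spanning the minimum to the maximum. `np.histogram` computes the same counts, which makes the "mentally split it into 3 groups" step explicit. A sketch using made-up serving values rather than the real `drinks` data:

```python
import numpy as np

# hypothetical beer-serving values (made up, not the real drinks data)
servings = np.array([0, 5, 20, 45, 80, 100, 150, 200, 245, 300])

# bins=3 splits [min, max] into 3 equal-width intervals and counts the values in each
counts, edges = np.histogram(servings, bins=3)
print(counts)  # [5 2 3]
print(edges)   # bin edges 0, 100, 200, 300 (the last bin includes its right edge)
```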

In [None]:
# try more bins
# add title and labels
drinks.beer.plot(kind='hist', bins=20)
plt.xlabel('Beer Servings')
plt.ylabel('Frequency')
plt.title('Histogram of Beer Servings')

In [None]:
# compare with density plot (smooth version of a histogram)
drinks.beer.plot(kind='density', xlim=(0, 500))
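A density plot is a kernel density estimate: each observation contributes a small Gaussian bump, and the bumps are averaged into a smooth curve whose total area is 1. pandas delegates this to `scipy.stats.gaussian_kde`, which chooses the bandwidth automatically (Scott's rule by default). The sketch below instead uses a fixed, hand-picked `bandwidth` and made-up data, so its curve will not match the pandas plot exactly; `gaussian_kde_sketch` is a hypothetical name:

```python
import numpy as np


def gaussian_kde_sketch(data, grid, bandwidth):
    """Average of Gaussian bumps (one per observation) evaluated on `grid`."""
    data = np.asarray(data, dtype=float)
    # standardized distance from every grid point to every observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    # normal density with standard deviation `bandwidth`
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


servings = np.array([0, 5, 20, 45, 80, 100, 150, 200, 245, 300])  # hypothetical values
grid = np.linspace(-200, 500, 1401)  # evenly spaced, step 0.5
density = gaussian_kde_sketch(servings, grid, bandwidth=40.0)  # bandwidth is an assumption

# the curve is a probability density, so its area is (numerically) close to 1
area = density.sum() * (grid[1] - grid[0])
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.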

In [None]:
# histogram of beer servings grouped by continent
drinks.hist(column='beer', by='continent')

In [None]:
# share the x axes
drinks.hist(column='beer', by='continent', sharex=True)

In [None]:
# share the x and y axes
drinks.hist(column='beer', by='continent', sharex=True, sharey=True)

In [None]:
# change the layout
drinks.hist(column='beer', by='continent', sharex=True, layout=(2, 3))

## 2. Scatter plot
Purpose: Showing the relationship between two numerical variables

In [None]:
# select the beer and wine columns and sort by beer
drinks[['beer', 'wine']].sort_values('beer').values

In [None]:
# compare with scatter plot
drinks.plot(kind='scatter', x='beer', y='wine')

In [None]:
# add transparency
drinks.plot(kind='scatter', x='beer', y='wine', alpha=0.3)

In [None]:
# vary point color by spirit servings
drinks.plot(kind='scatter', x='beer', y='wine', c='spirit', colormap='Blues')
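The same kind of chart can be drawn with matplotlib directly: `ax.scatter` takes the third variable through `c=` and a colormap through `cmap=`, `alpha` sets transparency, and `fig.colorbar` adds a legend for the colors. A sketch with made-up numbers standing in for the beer, wine, and spirit columns:

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical servings for five countries (not the real drinks data)
beer = np.array([10, 50, 120, 200, 300])
wine = np.array([5, 40, 60, 150, 100])
spirit = np.array([0, 30, 90, 150, 250])

fig, ax = plt.subplots()
points = ax.scatter(beer, wine, c=spirit, cmap='Blues', alpha=0.3)  # color encodes spirit
fig.colorbar(points, ax=ax, label='spirit')  # maps colors back to spirit servings
ax.set_xlabel('beer')
ax.set_ylabel('wine')
```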

In [None]:
# scatter matrix of three numerical columns
columns = ['beer', 'spirit', 'wine']
pd.plotting.scatter_matrix(drinks[columns])

In [None]:
# increase figure size
pd.plotting.scatter_matrix(drinks[columns], figsize=(10, 8))
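`pd.plotting.scatter_matrix` returns a 2-D NumPy array of `Axes` with one row and one column per variable: off-diagonal cells are pairwise scatter plots, and the diagonal shows each variable's histogram by default (`diagonal='kde'` switches to density curves). A sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical data using the same column names as the drinks example
df = pd.DataFrame({
    'beer': rng.integers(0, 350, size=50),
    'spirit': rng.integers(0, 400, size=50),
    'wine': rng.integers(0, 350, size=50),
})

axes = pd.plotting.scatter_matrix(df, figsize=(10, 8), alpha=0.5)
# axes[i, j] plots column j (x axis) against column i (y axis);
# axes[i, i] holds the histogram of column i
```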

## 3. Bar plot
Purpose: Showing a numerical comparison across different categories

In [None]:
# count the number of countries in each continent
drinks.continent.value_counts()

In [None]:
# compare with bar plot
drinks.continent.value_counts().plot(kind='bar')

In [None]:
# calculate the mean alcohol amounts for each continent (numeric columns only)
drinks.groupby('continent').mean(numeric_only=True)

In [None]:
# side-by-side bar plots
drinks.groupby('continent').mean(numeric_only=True).plot(kind='bar')

In [None]:
# drop the liters column
drinks.groupby('continent').mean(numeric_only=True).drop('liters', axis=1).plot(kind='bar')

In [None]:
# stacked bar plots
drinks.groupby('continent').mean(numeric_only=True).drop('liters', axis=1).plot(kind='bar', stacked=True)
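The bar plots above draw one group of bars per row of the grouped means and one bar color per column; `stacked=True` puts the columns on top of each other instead of side by side. In this sketch on made-up data, the numeric columns are selected before `.mean()`, which sidesteps the `TypeError` that recent pandas versions raise when they try to average a text column such as `country`:

```python
import pandas as pd

# hypothetical per-country servings (not the real drinks data)
df = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'continent': ['AF', 'AF', 'EU', 'EU'],
    'beer': [10, 30, 200, 300],
    'spirit': [5, 15, 100, 50],
    'wine': [1, 3, 150, 250],
})

# one row per continent, one column per drink type
means = df.groupby('continent')[['beer', 'spirit', 'wine']].mean()

# each continent gets one bar; the three drink types are stacked within it
ax = means.plot(kind='bar', stacked=True)
ax.set_ylabel('mean servings')
```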

## 4. Box plot
Purpose: Showing quartiles and outliers for one or more numerical variables

**Five-number summary:**

- min = minimum value
- 25% = first quartile (Q1) = median of the lower half of the data
- 50% = second quartile (Q2) = median of the data
- 75% = third quartile (Q3) = median of the upper half of the data
- max = maximum value

(More useful than mean and standard deviation for describing skewed distributions)

**Interquartile Range (IQR)** = Q3 - Q1

**Outliers:**

- below Q1 - 1.5 * IQR
- above Q3 + 1.5 * IQR
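The five-number summary, the IQR, and the outlier fences can be computed directly. This sketch uses `np.percentile`, whose default linear interpolation matches pandas `describe()` but can differ slightly from the "median of the lower/upper half" definition above. The helper name `box_plot_stats` and the data are made up for illustration:

```python
import numpy as np


def box_plot_stats(values):
    """Hypothetical helper: five-number summary, IQR, fences, and outliers."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])  # linear interpolation
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        'min': values.min(), 'q1': q1, 'median': median, 'q3': q3,
        'max': values.max(), 'iqr': iqr,
        'lower_fence': lower_fence, 'upper_fence': upper_fence,
        'outliers': outliers,
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
# q1 = 3.25, median = 5.5, q3 = 7.75, iqr = 4.5, upper_fence = 14.5, so 100 is an outlier
```

In matplotlib's box plots (default `whis=1.5`), the whiskers end at the most extreme data points that lie inside the fences, not at the fences themselves.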

In [None]:
# sort the spirit column
drinks.spirit.sort_values().values

In [None]:
# show "five-number summary" for spirit
drinks.spirit.describe()

In [None]:
# compare with box plot
drinks.spirit.plot(kind='box')

In [None]:
# include multiple variables
drinks.drop('liters', axis=1).plot(kind='box')

In [None]:
# reminder: box plot of beer servings
drinks.beer.plot(kind='box')

In [None]:
# box plot of beer servings grouped by continent
drinks.boxplot(column='beer', by='continent')

In [None]:
# box plot of all numeric columns grouped by continent
drinks.boxplot(by='continent')

## 5. Line plot
Purpose: Showing the trend of a numerical variable over time

In [None]:
# read in the ufo data
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/ufo.csv'
ufo = pd.read_csv(url)
ufo['Time'] = pd.to_datetime(ufo.Time)
ufo['Year'] = ufo.Time.dt.year

In [None]:
# count the number of ufo reports each year (and sort by year)
ufo.Year.value_counts().sort_index()

In [None]:
# compare with line plot
ufo.Year.value_counts().sort_index().plot()
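`value_counts()` only lists years that actually occur, so a line drawn through the sorted counts silently skips years with no reports. Reindexing over the full year range with `fill_value=0` makes those gaps visible. A sketch with made-up timestamps in place of the ufo data:

```python
import pandas as pd

# hypothetical report timestamps (not the real ufo data)
times = pd.to_datetime(pd.Series([
    '1995-06-01 21:00', '1997-03-15 22:30', '1995-11-02 01:10',
    '1999-07-04 23:59', '1997-08-20 20:00', '1997-12-31 18:45',
]))
counts = times.dt.year.value_counts().sort_index()  # only 1995, 1997, 1999 appear

# include every year in the range, with 0 for years that had no reports
all_years = range(counts.index.min(), counts.index.max() + 1)
counts_full = counts.reindex(all_years, fill_value=0)

ax = counts_full.plot()
ax.set_xlabel('Year')
ax.set_ylabel('Number of reports')
```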

In [None]:
# don't use a line plot when there is no logical ordering
drinks.continent.value_counts().plot()

## 6. Saving a plot and Changing a style

In [None]:
# saving a plot to a file
drinks.beer.plot(kind='hist', bins=20, title='Histogram of Beer Servings')
plt.xlabel('Beer Servings')
plt.ylabel('Frequency')
plt.savefig('beer_histogram_original.png')
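With `%matplotlib inline`, a figure is closed once its cell finishes, so `plt.savefig` has to run in the same cell as the plotting code (as it does here); in a later cell it would save a new, empty figure. Saving from an explicit `Figure` object avoids relying on the "current" figure at all. A sketch; the temporary directory, file name, and `dpi`/`bbox_inches` values are illustrative choices, and the data is made up:

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

servings = np.array([0, 5, 20, 45, 80, 100, 150, 200, 245, 300])  # hypothetical values

fig, ax = plt.subplots()
ax.hist(servings, bins=20)
ax.set_title('Histogram of Beer Servings')
ax.set_xlabel('Beer Servings')
ax.set_ylabel('Frequency')

# the file format is inferred from the extension; bbox_inches='tight' trims extra margins
path = os.path.join(tempfile.mkdtemp(), 'beer_histogram_sketch.png')
fig.savefig(path, dpi=150, bbox_inches='tight')
plt.close(fig)  # release the figure once it has been written

with open(path, 'rb') as f:
    header = f.read(8)  # PNG files start with a fixed 8-byte signature
```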

In [None]:
# list available plot styles
plt.style.available

In [None]:
# change to a different style
plt.style.use('ggplot')
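`plt.style.use` changes the style for every later plot in the session. To apply a style to one figure only, `plt.style.context` can be used as a context manager; the previous settings are restored when the `with` block exits. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

before = plt.rcParams['axes.facecolor']

with plt.style.context('ggplot'):
    # inside the block, rcParams follow the ggplot style (e.g. a grey axes background)
    inside = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots()
    ax.plot(np.arange(10), np.arange(10) ** 2)

after = plt.rcParams['axes.facecolor']  # back to the previous setting
```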

In [None]:
# save the same plot with the ggplot style applied
drinks.beer.plot(kind='hist', bins=20, title='Histogram of Beer Servings')
plt.xlabel('Beer Servings')
plt.ylabel('Frequency')
plt.savefig('beer_histogram_ggplot.png')

In [None]:
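# Other example: a seaborn-inspired pastel style
# (named 'seaborn-v0_8-pastel' since Matplotlib 3.6; older versions call it 'seaborn-pastel')
style = 'seaborn-v0_8-pastel' if 'seaborn-v0_8-pastel' in plt.style.available else 'seaborn-pastel'
plt.style.use(style)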
drinks.beer.plot(kind='hist', bins=20, title='Histogram of Beer Servings')
plt.xlabel('Beer Servings')
plt.ylabel('Frequency')
plt.savefig('beer_histogram_seaborn.png')

## 7. Style

**Marker**:
- http://matplotlib.org/1.5.1/examples/lines_bars_and_markers/marker_reference.html
- http://matplotlib.org/api/markers_api.html

**Line**:
- http://matplotlib.org/1.5.1/examples/lines_bars_and_markers/line_styles_reference.html
- http://matplotlib.org/api/lines_api.html

**Color**:
- http://matplotlib.org/examples/color/named_colors.html
- http://matplotlib.org/api/colors_api.html
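The color, marker, and line-style codes in the references above can be combined into a single format string such as `'ro--'`, or passed as separate keyword arguments. The two calls below should produce lines with the same color, marker, and line style; the data is arbitrary:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

x = np.linspace(0, 1, 5)
fig, ax = plt.subplots()

# format string: 'r' = red, 'o' = circle marker, '--' = dashed line
line_fmt, = ax.plot(x, x, 'ro--')
# the same style spelled out with keyword arguments
line_kw, = ax.plot(x, x + 0.5, color='red', marker='o', linestyle='--')
```

The format string is convenient for quick plots; keyword arguments are easier to read and accept any named or hex color.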


In [None]:
from IPython.display import Image

In [None]:
Image(filename="./img/marker_reference_00.png")

In [None]:
Image(filename="./img/marker_reference_01.png")

In [None]:
Image(filename="./img/line_styles_reference.png")

In [None]:
Image(filename="./img/named_colors.png")

In [None]:
Image(filename="./img/named_colors_brief.png")

In [None]:
# from http://matplotlib.org/users/pyplot_tutorial.html
# Feel free to modify the code and see how the figure changes.

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)
plt.subplot(211)
plt.plot(t1, f(t1), 'ro', t2, f(t2), 'k') # 'ro': red circles, 'k': black (solid line)

plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), 'r--') # 'r--': red dashed line
plt.show()
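`plt.subplot(211)` means "a grid with 2 rows and 1 column, panel 1". The same figure can be built with `plt.subplots(2, 1)`, which returns both `Axes` at once, so each panel is addressed explicitly instead of through the "current axes". A sketch reproducing the example above:

```python
import matplotlib.pyplot as plt
import numpy as np


def f(t):
    # exponentially damped cosine, as in the example above
    return np.exp(-t) * np.cos(2 * np.pi * t)


t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

fig, (ax_top, ax_bottom) = plt.subplots(2, 1)
ax_top.plot(t1, f(t1), 'ro', t2, f(t2), 'k')       # red circles plus a black line
ax_bottom.plot(t2, np.cos(2 * np.pi * t2), 'r--')  # red dashed line
```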