# Data Visualization
Data visualization is central to data analytics, especially during data exploration.
Dedicated visualization tools are often more user-friendly, but embedding visualization in code gives us flexibility and lets us keep data preparation, data exploration, and data visualization in one place.

Several Python packages produce good visualizations.  Of these, matplotlib is the bread-and-butter tool, and it works well with pandas.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Note that the '%matplotlib inline' directive tells Jupyter to display each plot directly below the cell that produces it.

## matplotlib

In [None]:
s1 = pd.Series(range(10))
s1

In [None]:
plt.plot(s1)

### Figure and Subplots
Plots in matplotlib reside within a *Figure* object.  You can create a new figure with the *plt.figure* function.
Through the figure, we can add subplots, configure their appearance, and manage the display.

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
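
As a minimal sketch of configuring appearance through the figure, the cell below sets the figure size, a figure-level title, and the spacing between subplots. The size, title text, and spacing values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; these values are arbitrary.
fig = plt.figure(figsize=(8, 6))

# add_subplot(rows, cols, index): a 2x2 grid, with cells numbered from 1.
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# A title for the whole figure rather than for a single subplot.
fig.suptitle('Three of four subplots')

# Widen the gaps between subplots (fractions of the average axes size).
fig.subplots_adjust(wspace=0.3, hspace=0.3)
```

The fourth cell of the grid stays empty because we never call *add_subplot* for index 4.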

Note that in Jupyter, plots are reset after each cell is evaluated.  Thus, to build a complex plot, we have to put all of its plot commands in a single cell.

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.hist(np.random.randn(10000), bins=20, color='k', alpha=0.3)
ax2 = fig.add_subplot(2, 2, 2)
ax2.scatter(np.arange(30), np.arange(30)+3*np.random.randn(30))
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.random.randn(50).cumsum(), 'k--')

matplotlib provides a helper function, *plt.subplots*, that creates a figure and a grid of subplots in one call.

In [None]:
fig, axes = plt.subplots(2, 3)
axes[0, 0].hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
axes[1, 1].scatter(np.arange(30), np.arange(30)+3*np.random.randn(30))
axes[1, 2].plot(np.random.randn(50).cumsum(), 'k--')
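
*plt.subplots* also accepts *sharex* and *sharey*, which make the subplots use the same axis limits. The sketch below is one way to use them; the 2x2 grid, the sample size, and the zero spacing are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Share both axes so every histogram is drawn on the same scale.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# axes is a 2-D NumPy array of Axes objects, indexed by [row, col].
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)

# Remove the gaps between subplots; shared axes make this readable.
fig.subplots_adjust(wspace=0, hspace=0)
```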

### Colors, Markers, and Line Styles
We can control the style of a plot either with keyword parameters or with a format-string shortcut.

In [None]:
plt.plot(s1, 'g--')

In [None]:
plt.plot(s1, linestyle='--', color='g')

In [None]:
data = np.random.randn(30).cumsum()
plt.plot(data, 'bo--')

In [None]:
plt.plot(data, color='b', linestyle='dashed', marker='o')
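
To check that the shortcut and the keyword form describe the same style, we can inspect the line objects that *plot* returns. This is a small sketch; the variable names *short* and *explicit* are ours.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

data = np.random.randn(30).cumsum()
fig, ax = plt.subplots()

# plot returns a list of Line2D objects, one per line drawn.
(short,) = ax.plot(data, 'bo--')
(explicit,) = ax.plot(data, color='b', linestyle='dashed', marker='o')

# Print the resolved style of each line; to_rgba normalizes the color
# so that different spellings of the same color compare equal.
for line in (short, explicit):
    print(to_rgba(line.get_color()), line.get_linestyle(), line.get_marker())
```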

We can use the *drawstyle* parameter to change how consecutive points are connected, for example with steps instead of straight segments.

In [None]:
plt.plot(data, 'b--', label='default')
plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')
plt.legend(loc='best')
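
Besides 'steps-post', matplotlib accepts 'steps-pre' and 'steps-mid', which differ in where the step changes relative to each point. A sketch that draws the three variants side by side, with an arbitrary figure size:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(30).cumsum()
fig, axes = plt.subplots(1, 3, sharey=True, figsize=(9, 3))

# One subplot per step style, each overlaid on the default dashed line.
for ax, style in zip(axes, ['steps-pre', 'steps-mid', 'steps-post']):
    ax.plot(data, 'b--', label='default')
    ax.plot(data, 'k-', drawstyle=style, label=style)
    ax.legend(loc='best')
```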

### Ticks, Labels, and Legends
We can manage plot decorations such as ticks, labels, and legends either with pyplot functions or with methods on the Axes object returned by *fig.add_subplot*.

In [None]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'], rotation=30, fontsize='small')
props = {
    'title': 'My random plot',
    'xlabel': 'Stages'
}
ax.set(**props)
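
The same decorations can also be set through the pyplot interface, which acts on the current Axes. The sketch below mirrors the cell above with *plt.xticks*, *plt.title*, and *plt.xlabel*.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # ax becomes the current Axes
ax.plot(np.random.randn(1000).cumsum())

# plt.xticks sets tick positions and labels in one call;
# extra keywords are applied to the label text.
plt.xticks([0, 250, 500, 750, 1000],
           ['one', 'two', 'three', 'four', 'five'],
           rotation=30, fontsize='small')
plt.title('My random plot')
plt.xlabel('Stages')
```

The Axes methods are usually the safer choice when a figure has several subplots, because they do not depend on which Axes is current.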

We can add a legend by assigning a label to each plot and then choosing where the legend is placed.

In [None]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='three')

ax.legend(loc='best')
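
To leave a line out of the legend, we can give it a label that starts with an underscore, such as '_nolegend_'. A short sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
# Labels that begin with an underscore are skipped by ax.legend().
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='_nolegend_')

leg = ax.legend(loc='best')
print([text.get_text() for text in leg.get_texts()])
```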

## Plotting with Pandas
pandas provides several helper functions that integrate its data structures with matplotlib.

### Line Plots
Series and DataFrame have a *plot* attribute for making some basic plot types.  By default, *plot()* draws a line plot with the index on the x-axis.

In [None]:
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot()
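
*Series.plot* accepts an existing Axes through *ax* and passes styling options on to matplotlib. The sketch below draws onto an Axes we create ourselves; the style string and title are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))

fig, ax = plt.subplots()
# Draw on our own Axes; the Series index supplies the x values.
s.plot(ax=ax, style='k--', title='Cumulative sum', grid=True)
```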

In [None]:
np.random.randn(10).cumsum()

In [None]:
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0), columns=['A', 'B', 'C', 'D'], index=np.arange(0, 100, 10))
df = df.abs()  # element-wise absolute value

In [None]:
df.plot()
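
By default, *DataFrame.plot* draws every column on one Axes. With *subplots=True*, each column gets its own subplot instead. A sketch, with an arbitrary figure size:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))

# One subplot per column, stacked vertically and sharing the x-axis.
axes = df.plot(subplots=True, sharex=True, figsize=(6, 8))
```

Here *axes* is an array holding one Axes per column.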

In [None]:
fig, axes = plt.subplots(2, 1)
df.A.plot.bar(ax=axes[0], color='r', alpha=0.7)
df.B.plot.barh(ax=axes[1], color='b', alpha=0.7)

In [None]:
df.plot.bar()

In [None]:
df.plot.barh(stacked=True, alpha=0.5)

In [None]:
df2 = pd.DataFrame(np.random.randn(1000))
df2.hist(bins=50)
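
A histogram is also available through the *plot* attribute. The sketch below uses *plot.hist* on a single column, which returns one Axes that we can decorate further; the title text is our own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df2 = pd.DataFrame(np.random.randn(1000))

# Column 0 is the only column; plot.hist returns a single Axes.
ax = df2[0].plot.hist(bins=50, alpha=0.5)
ax.set_title('Histogram of 1000 standard normal samples')
```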