# Plotting with pandas Series

Both pandas Series and DataFrames have a `plot` method capable of creating a variety of plots from the data they contain. pandas has no plotting capabilities of its own: it calls matplotlib's plotting functions internally and supplies the arguments for you. It exposes only a small subset of the plot types that matplotlib offers and does not give you full control over the plots it creates. However, it does return the underlying matplotlib axes object, which you can assign to a variable and then use to customize the plot however you wish.

## Series plots

In this chapter, we cover plotting with the simpler pandas Series. All plotting runs through the `plot` method with the `kind` parameter controlling the type of plot. Set the `kind` parameter equal to one of the following strings:

* `line` - line plot (default)
* `bar` - vertical bar plot
* `barh` - horizontal bar plot
* `box` - box plot
* `hist` - histogram
* `kde` - kernel density estimation plot
* `pie` - pie plot
* `area` - area plot

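For the line, bar, and area plots, the Series **index** supplies the x-values and the Series **values** supply the y-values (`barh` swaps the two axes). The distribution plots use only the values, and the pie plot uses the index for its wedge labels. We begin by reading in the stocks dataset and selecting Amazon's closing price as a Series.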

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('mdap.mplstyle')
stocks = pd.read_csv('../data/stocks/stocks10.csv', index_col='date', 
                     parse_dates=['date'])
amzn = stocks['AMZN']
amzn.head(3)
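
As a minimal sketch of how a line plot maps the index to x and the values to y, the cell below plots a tiny Series and reads the data back from the matplotlib line that pandas drew. The Series `s` and its numbers are hypothetical and stand in for the stock data.

```python
import pandas as pd

# Hypothetical stand-in for the stock prices: four values on an integer index
s = pd.Series([10.0, 12.0, 9.0, 15.0], index=[2, 4, 6, 8], name='price')

ax = s.plot(kind='line')

# The single line on the axes holds the index as x-data and the values as y-data
line = ax.lines[0]
x_data = list(line.get_xdata())
y_data = list(line.get_ydata())
```

Reading the data back from `ax.lines` is only a way to inspect the result. It is not needed for normal plotting.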

## Line plots

We'll now create a line plot of these prices. The `kind` parameter defaults to `'line'`, but since that's not an intuitive default, it is set explicitly below. pandas uses matplotlib to create the figure and axes for you and returns the axes, which we assign to a variable.

In [None]:
ax = amzn.plot(kind='line')

Let's verify that we have a matplotlib axes.

In [None]:
type(ax)

All axes have a `figure` attribute that you can access to retrieve the figure.

In [None]:
fig = ax.figure

The figure properties will be equal to those in matplotlib's runtime configuration settings (`rcParams`), which here come from the style sheet loaded above. Let's verify the size and DPI.

In [None]:
fig.get_size_inches()

In [None]:
fig.get_dpi()
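
One way to confirm where the size comes from is to compare the figure against `plt.rcParams` directly. The sketch below uses a small hypothetical Series and whatever defaults are active in your session, so the exact numbers will differ from those produced by the style sheet used in this chapter.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, only needed to trigger a plot
s = pd.Series([1, 3, 2])
ax = s.plot(kind='line')
fig = ax.figure

# Size configured in the runtime settings vs. size of the figure pandas created
default_size = tuple(plt.rcParams['figure.figsize'])
actual_size = tuple(fig.get_size_inches())
```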

All axes and figure methods can now be called to update the plot.

In [None]:
ax.set_ylabel('closing price')
ax.set_facecolor('lightgray')
ax.set_yscale('log')
fig.set_facecolor('tan')
fig.set_size_inches(4, 1.5)
fig

### Recreating plot with matplotlib

Let's replicate this plot using matplotlib directly. pandas uses the index name for the x-axis label and rotates the date tick labels, so with matplotlib we must set the x-axis label and rotate the tick labels manually.

In [None]:
fig, ax = plt.subplots(figsize=(4, 1.5))
ax.plot(amzn)
ax.set_xlabel('date')
for label in ax.get_xticklabels():
    label.set_rotation(30)
    label.set_ha('right')

### Plotting parameters

The `plot` method has a substantial number of parameters for customizing the appearance of the plot. Setting these may save you from having to use matplotlib directly. Below, we set the figure size, use a log scale for the y-axis, and add grid lines, a legend, and a title. The rotation and size of the tick labels are controlled by `rot` and `fontsize`. Any parameter that is not part of the `plot` documentation is passed to the underlying matplotlib plotting method, which is `ax.plot` in this instance. Here, `c`, `ls`, and `lw` change the properties of the line itself.

In [None]:
amzn.plot(kind='line', figsize=(5, 2), logy=True, grid=True, legend=True,
          title='Amazon Closing Price', rot=15, fontsize=6, 
          c='crimson', ls='--', lw=1);
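
The split between pandas parameters and forwarded matplotlib parameters can be checked by inspecting the returned axes. This is a small sketch on hypothetical data: `logy` is handled by pandas, while `c`, `ls`, and `lw` end up as properties of the matplotlib line.

```python
import pandas as pd
from matplotlib.colors import to_hex

# Hypothetical values spanning several orders of magnitude
s = pd.Series([1.0, 10.0, 100.0], name='demo')
ax = s.plot(kind='line', logy=True, grid=True, c='crimson', ls='--', lw=1)

line = ax.lines[0]
y_scale = ax.get_yscale()              # set by pandas via logy=True
line_style = line.get_linestyle()      # forwarded to ax.plot as ls
line_width = line.get_linewidth()      # forwarded to ax.plot as lw
line_color = to_hex(line.get_color())  # forwarded to ax.plot as c
```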

## Bar plots

Bar plots are created by setting the `kind` parameter to the string `'bar'` or `'barh'`. Each value in the Series is plotted as a bar and labeled with its corresponding index value. Let's calculate, for each year, the number of times Amazon's stock rose more than 5% from the previous day's close. We begin by finding the percentage change and testing whether it meets our criterion.

In [None]:
amzn_up_down = amzn.pct_change(1) > .05
amzn_up_down.head(3)

We then resample by year and sum the `True` values to get the count for each year.

In [None]:
num_big_up_days = amzn_up_down.resample('Y', kind='period').sum()
num_big_up_days.head(3)
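
The same count can also be sketched with `groupby` on the year of the index, which avoids relying on `resample` arguments whose accepted values have varied across pandas versions. The dates and prices below are made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices that straddle a year boundary
dates = pd.to_datetime(['2019-12-27', '2019-12-30', '2019-12-31',
                        '2020-01-02', '2020-01-03'])
prices = pd.Series([100.0, 110.0, 111.0, 120.0, 121.0], index=dates)

# True where the close rose more than 5% over the previous close
big_up = prices.pct_change(1) > .05

# Summing booleans counts the True values within each year
counts = big_up.groupby(big_up.index.year).sum()
```

The first value of `pct_change` is missing, and comparing a missing value with `.05` evaluates to `False`, so the first day never counts as a big move.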

This Series can now be made into a bar plot. In this case, the extra parameters (`lw`, `ec`, and `hatch`) are forwarded to the matplotlib axes `bar` method.

In [None]:
num_big_up_days.plot(kind='bar', figsize=(5, 2),
                     title='Number of Days with Greater than 5% Increase', 
                     lw=2, ec='tan', hatch='-');

## Distribution plots

Box plots, histograms, and KDEs are the distribution plots available for pandas Series. Let's use the salary column from the City of Houston employee dataset for these examples.

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
sal = emp['salary']
sal.head(3)

### Box and whisker plots

A horizontal box plot is created by setting the `vert` parameter to `False`. It is forwarded to the axes `boxplot` method along with `widths` and `patch_artist`. The ticks are also formatted to be in thousands of dollars.

In [None]:
from matplotlib import ticker
ax = sal.plot(kind='box', figsize=(4, 1), vert=False, widths=.4, patch_artist=True)
conv_dollar = lambda x, pos: f'${x // 1000:.0f}k'
ax.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
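
To see what such a tick formatter produces, it can be called directly with a few tick values. The sketch below restates the same logic as a named function, `to_thousands`, which is a hypothetical name chosen for this illustration.

```python
from matplotlib import ticker

def to_thousands(x, pos):
    """Format a dollar tick value as whole thousands, for example 45000 as '$45k'."""
    return f'${x // 1000:.0f}k'

formatter = ticker.FuncFormatter(to_thousands)

# A FuncFormatter is callable with (value, position), just as matplotlib calls it
tick_values = [0.0, 45_000.0, 45_999.0, 150_000.0]
labels = [formatter(v, i) for i, v in enumerate(tick_values)]
```

Because of the floor division, 45,999 is labeled `$45k` rather than being rounded up.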

### Histograms

A histogram of salaries is created below, with the tail end eliminated using the `range` parameter. Box plots are much better tools for analyzing extreme values, while histograms are much better at analyzing the "middle" of the data (such as the values within the whiskers of the box plot). This is why the range of values is limited in the histogram below. In general, you would not want to limit the range with box plots.

In [None]:
ax = sal.plot(figsize=(4, 1.5), kind='hist', bins=20, ec='black', range=(0, 150_000))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
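
The effect of `range` can be sketched with numpy's `histogram` function, which matplotlib's `hist` relies on for the binning. The salaries here are hypothetical; the one value above 150,000 is simply left out of the counts.

```python
import numpy as np

# Hypothetical salaries with one extreme value in the tail
salaries = np.array([20_000, 50_000, 60_000, 90_000, 140_000, 275_000])

# 20 equal-width bins between 0 and 150,000; values outside the range are ignored
counts, edges = np.histogram(salaries, bins=20, range=(0, 150_000))

n_counted = counts.sum()
bin_width = edges[1] - edges[0]
```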

### KDEs

A kernel density estimate (KDE) plot can be thought of as a smoothed histogram, loosely one with an infinite number of bins. It estimates the probability density function of a distribution. The KDE plot is the only type of plot that does not have an equivalent matplotlib axes method. Multiple calls to the `plot` method in a single cell will place each plot on the same axes. Below, we plot a histogram and a KDE of the same salary data. You can see how closely the KDE curve matches the histogram. For the KDE and histogram to share the same units, we set `density` to `True` for the histogram.

In [None]:
ax = sal.plot(kind='hist', bins=20, ec='black', range=(0, 150_000), density=True)
sal.plot(kind='kde', xlim=(0, 150_000), lw=3)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
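
What `density=True` changes can be sketched with numpy alone: the bar heights are rescaled so that the total area of the bars is 1, which is the scale a probability density lives on. The values below are randomly generated stand-ins for the salary data.

```python
import numpy as np

# Hypothetical salary-like values drawn from a normal distribution
rng = np.random.default_rng(0)
values = rng.normal(loc=60_000, scale=15_000, size=1_000)

raw_counts, edges = np.histogram(values, bins=20)
density, _ = np.histogram(values, bins=20, density=True)

# Area of the density histogram: sum of height times width over all bins
widths = np.diff(edges)
area = (density * widths).sum()
```

The raw counts sum to the number of observations, while the density heights multiplied by the bin widths sum to 1.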

## Pie charts

Pie charts are circles with a wedge for each value in the Series, sized in proportion to its share of the whole. Let's count the number of employees whose salaries fall into particular ranges. The `cut` function creates the buckets, and the `value_counts` method does the counting.

In [None]:
sals = pd.cut(sal, bins=[0, 30_000, 70_000, 100_000, sal.max() + 1],
              labels=['< $30k', '$30k - $70k', '$70k - $100k', '> $100k'])
sal_ct = sals.value_counts(sort=False)
sal_ct
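
The bucketing step can be traced on a handful of values. The salaries below are hypothetical and chosen so that one of them sits exactly on a bin edge.

```python
import pandas as pd

# Hypothetical salaries; 30,000 lies exactly on the first bin edge
sal_demo = pd.Series([25_000, 30_000, 45_000, 85_000, 120_000])

buckets = pd.cut(sal_demo, bins=[0, 30_000, 70_000, 100_000, sal_demo.max() + 1],
                 labels=['< $30k', '$30k - $70k', '$70k - $100k', '> $100k'])

# sort=False keeps the buckets in bin order instead of ordering by frequency
bucket_ct = buckets.value_counts(sort=False)
```

By default, `cut` makes each bin closed on the right, so a salary of exactly 30,000 lands in the first bucket.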

For pie charts, the index values are used as the labels for each wedge. The `autopct` parameter is forwarded to `ax.pie` and can be set to a function that is passed the percentage of each wedge. Below, that function returns the formatted percentage together with the raw count. By default, pandas uses the name of the Series as the y-axis label. A title looks more appropriate, so the label is removed by setting it to an empty string.

In [None]:
ax = sal_ct.plot(kind='pie', figsize=(3, 3), title='COH Salary Distribution',
                 autopct=lambda x: f'{x:.1f}% - {x / 100 * sal_ct.sum():,.0f}')
ax.set_ylabel('');
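
The label function can be tried on its own, without drawing a pie. In this sketch, `make_autopct` is a hypothetical helper that captures the total count so the percentage can be converted back to a raw count.

```python
def make_autopct(total):
    """Return a function that formats a wedge percentage together with its raw count."""
    def autopct(pct):
        count = pct / 100 * total  # convert the percentage back to a count
        return f'{pct:.1f}% - {count:,.0f}'
    return autopct

# A wedge covering 62.5% of a hypothetical 2,000 employees
label = make_autopct(2_000)(62.5)
```

The returned function takes a single percentage argument, which is the form `autopct` expects when given a callable.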

## Area plots

Area plots are like line plots, but fill the area between the x-axis and the line with a color. Their equivalent in matplotlib is `stackplot`. Area plots are much more useful with DataFrames, as you'll see in the next chapter. Below, Amazon's first 20 trading days in the dataset are plotted as an area plot.

In [None]:
amzn.head(20).plot(kind='area', alpha=.3);

## Adding a plot to a previously made axes

For all of the above plots, we let pandas create the figure and single axes. It's possible for us to create the figure and axes (possibly more than one) first and then tell pandas to use that particular axes with the `ax` parameter. Below, we create our figure and axes first and then place a histogram of the salaries on that axes.

In [None]:
fig, ax = plt.subplots(figsize=(5, 1.5), facecolor='.8')
sal.plot(kind='hist', bins=20, ec='black', range=(0, 150_000), ax=ax);
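
A quick way to confirm that pandas draws on the axes it is given, rather than creating a new one, is to compare the returned object with the one passed in. The salaries in this sketch are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical salaries
sal_demo = pd.Series([30_000, 45_000, 52_000, 61_000, 98_000])

fig, ax = plt.subplots(figsize=(5, 1.5))
returned = sal_demo.plot(kind='hist', bins=5, ax=ax)

same_axes = returned is ax  # pandas hands back the very axes it was given
n_axes = len(fig.axes)      # no additional axes were added to the figure
```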

This becomes especially useful when placing plots into a figure with multiple axes. Here, four axes are created and unpacked into separate variables with the help of the numpy array `flatten` method. pandas then places one plot on each axes, and the ticks are formatted and located appropriately.

In [None]:
fig, ax_array = plt.subplots(nrows=2, ncols=2, figsize=(5, 3), tight_layout=True)
ax1, ax2, ax3, ax4 = ax_array.flatten()
sal.plot(kind='hist', bins=20, ec='black', range=(0, 150_000), ax=ax1)
sal.plot(kind='box', vert=False, widths=.4, ax=ax2, yticks=[])
sal.plot(kind='kde', xlim=(0, 150_000), ax=ax3)
sal_ct.plot(kind='pie', ax=ax4, radius=1.4)
ax1.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
ax2.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
ax3.xaxis.set_major_formatter(ticker.FuncFormatter(conv_dollar))
ax3.xaxis.set_major_locator(ticker.MultipleLocator(50_000))
ax4.set_ylabel('')
fig.suptitle('COH Salary Distribution Summary', y=1.04);
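
The unpacking step relies on `plt.subplots` returning a two-dimensional numpy array of axes when both `nrows` and `ncols` are greater than one. As a small sketch, the shape and the order produced by `flatten` can be checked like this:

```python
import matplotlib.pyplot as plt

fig, ax_array = plt.subplots(nrows=2, ncols=2)
shape = ax_array.shape

# flatten walks the array row by row: top-left, top-right, bottom-left, bottom-right
ax1, ax2, ax3, ax4 = ax_array.flatten()
```

Each unpacked variable refers to the same axes object held in the array, so `ax3` is the bottom-left axes, `ax_array[1, 0]`.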