# Plotting

Plots and visualizations are an important part of most scientific publications, and Python has several libraries for producing high-quality plots. The most common is `matplotlib`, which we will demonstrate here. We will again be using the San Francisco parking data. In this exercise, we'll cover making line, bar, and scatter plots; changing labels, colors, and legends; and exporting plots to image or vector files for inclusion in publications.

## Setup and loading libraries

We will once again load `pandas`, and will also load `matplotlib`. The most common interface to `matplotlib` is the `matplotlib.pyplot` module, which lets you build up a plot sequentially using a series of commands. By convention and to save typing, we import `matplotlib.pyplot` as `plt`.

The first line is not necessary, but will make plots appear sharper if you have a retina or 4K display.

In [None]:
%config InlineBackend.figure_format = "retina"

import pandas as pd
import matplotlib.pyplot as plt

## Load data

We are once again going to load the parking data.

In [None]:
data = pd.read_csv("../data/sfpark.csv")

## A first plot

Let's just make a scatter plot of entries vs. exits. Since most people don't leave their cars overnight, these should be relatively closely correlated, and the scatter plot should be close to a straight 45-degree line.

We will use the `plt.scatter` function to create a scatterplot. This accepts two arrays as arguments, the x axis and the y axis, in that order. We're using the entries and exits columns from the data frame.

In [None]:
plt.scatter(data.entries, data.exits)

## Adding axis labels and making the points smaller

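The points are too big to really see the trend, and the axes are currently unlabeled. The `s` argument to `plt.scatter` sets the marker size, and `plt.xlabel` and `plt.ylabel` add the axis labels.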

In [None]:
plt.scatter(data.entries, data.exits, s=0.1)
plt.xlabel("Entries")
plt.ylabel("Exits")

## Other types of plots

Let's make a line plot of total entries by day. To do this, you will first need to create a dataset with a single total number of entries for each day. You should also convert the dates from strings (the `object` data type) to the datetime data type, because otherwise the x axis will be out of order (it will be ordered alphabetically, and 10/1/2012 comes before 2/1/2012 alphabetically). Create a dataset with a date column and an entries column that is the sum of all entries on that date.

In [None]:
data["date"] = pd.to_datetime(data.date, format="%m/%d/%Y")
by_date = data.groupby("date").entries.sum().reset_index()
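
To see why the conversion matters, here is a minimal sketch using three made-up dates (not taken from the parking data) in the same month/day/year text format. It compares the order you get when sorting the raw strings with the order you get after parsing them.

```python
import pandas as pd

# Hypothetical sample dates, stored as text like the raw date column
raw = pd.Series(["10/1/2012", "2/1/2012", "12/15/2011"])

# Sorted as text, the strings are compared character by character
sorted_as_text = sorted(raw)

# Sorted as datetimes, the values come out in chronological order
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
sorted_as_dates = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(sorted_as_text)
print(sorted_as_dates)
```

The text ordering puts October 2012 first, while the datetime ordering correctly starts with December 2011.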

## Now, we can create the line plot

In [None]:
plt.plot(by_date.date, by_date.entries)

## Grouping by month

That's pretty chaotic. Instead of grouping by date, we can resample this data to the monthly level. To do this, we first need to index the `by_date` data frame by date, then use the `resample` function. The `resample` function is basically a time-based group by. We are resampling to the first of each month (`"MS"`, for month start), and summing monthly entries.

In [None]:
by_date = by_date.set_index("date")

In [None]:
by_month = by_date.entries.resample("MS").sum()
by_month
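
As a small illustration of resampling as a time-based group by, the sketch below uses four made-up daily totals (hypothetical values, not the parking data) and checks that resampling to month start gives the same sums as an explicit group by on the month of each date.

```python
import pandas as pd

# Hypothetical daily totals, indexed by date
daily = pd.Series(
    [10, 20, 30, 40],
    index=pd.to_datetime(["2012-01-05", "2012-01-20", "2012-02-03", "2012-02-28"]),
)

# Resample to month start ("MS") and sum the values within each month
monthly = daily.resample("MS").sum()

# The same totals from an explicit group by on the month of each date
grouped = daily.groupby(daily.index.to_period("M")).sum()

print(monthly)
print(grouped)
```

The two results hold the same totals; the difference is that `resample` labels each group with a timestamp (the first of the month), which is convenient for plotting.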

## Using series directly

In the past, after a groupby, we've used `.reset_index()` to convert the series back to a data frame with columns for the grouping variables. However, we don't have to do this; we can use the series directly. To plot it, we use the index as the x axis and the raw values as the y axis. We do this by plotting `<series>.index` on the x axis, and just the series by itself on the y axis.

In [None]:
plt.plot(by_month.index, by_month)
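
The same pattern works for any series with a meaningful index. Here is a minimal sketch with a made-up monthly series (the name `totals` and its values are hypothetical), showing that the plotted y values are simply the values of the series.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly totals with a datetime index
totals = pd.Series(
    [30, 70, 50],
    index=pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"]),
)

# The index supplies the x values, the series itself supplies the y values
(line,) = plt.plot(totals.index, totals)

# The line object stores the y values that were plotted
y_values = list(line.get_ydata())
print(y_values)
```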

### Exercise

Add axis labels.

In [None]:
plt.plot(by_month.index, by_month)
plt.xlabel("Month")
plt.ylabel("Total entries")

### Resizing the plot

The month labels overlap each other, so let's make the plot larger. We do this by calling the `plt.figure` function with the `figsize` argument, which gives the width and height of the figure in inches. This function must be called before any other plotting functions, as it initializes a new plot. Does this plot give any indication of which months are outliers, and why?

In [None]:
plt.figure(figsize=(12,8))
plt.plot(by_month.index, by_month)
plt.xlabel("Month")
plt.ylabel("Total entries")
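
If you want to confirm the size that was applied, `plt.figure` returns the figure object, which can report its own dimensions. This is a small sketch with made-up data, not part of the parking analysis.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig = plt.figure(figsize=(12, 8))
plt.plot([1, 2, 3], [2, 4, 3])

# Ask the figure for its size to confirm the setting took effect
width, height = fig.get_size_inches()
print(width, height)
```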

## Bar plots

Another common plot type is the bar plot, used for unordered data. Let's make a bar plot of the entries by garage, to show how popular the different garages are.

### Exercise

First, calculate the total number of entries by garage.

In [None]:
by_garage = data.groupby("facility").entries.sum()

### Make the bar plot

I am using the `plt.barh` function here for a <u>h</u>orizontal bar plot, so the labels are legible. `plt.bar` would produce a vertical bar plot.

In [None]:
plt.barh(by_garage.index, by_garage)
plt.xlabel("Total entries")

### Changing colors

Most `plt.` functions take an argument `color=` to control the color. There are a number of different ways to specify colors, including color names and RGB or hex specifications; [details can be found here](https://matplotlib.org/stable/tutorials/colors/colors.html).

In [None]:
plt.barh(by_garage.index, by_garage, color="green")
plt.xlabel("Total entries")
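
As a quick illustration of the different ways to specify a color, the sketch below uses the `matplotlib.colors` helpers to show that the name `"green"` and the hex code `"#008000"` refer to the same color. The bar labels and values are made up.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

# Convert a color name to its hex code
green_hex = mcolors.to_hex("green")
print(green_hex)

# A name and its hex code resolve to the same RGBA values
same_color = mcolors.to_rgba("green") == mcolors.to_rgba(green_hex)
print(same_color)

# Either form can be passed as color= (hypothetical garages and counts)
bars = plt.barh(["Garage A", "Garage B"], [3, 5], color=green_hex)
```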

## Multiple plots on the same axes

It might be that not all the garages have the same seasonal pattern shown above. Downtown shopping is probably driving the spikes in demand during December. Let's try plotting downtown and the rest of the city separately.

In [None]:
downtown_data = data[data.district == "Downtown"]
downtown_by_date = downtown_data.groupby("date").entries.sum()
downtown_by_month = downtown_by_date.resample("MS").sum()

outside_downtown_data = data[data.district != "Downtown"]
outside_downtown_by_date = outside_downtown_data.groupby("date").entries.sum()
outside_downtown_by_month = outside_downtown_by_date.resample("MS").sum()

In [None]:
plt.plot(downtown_by_month.index, downtown_by_month)
plt.plot(outside_downtown_by_month.index, outside_downtown_by_month)

### Legends

This plot would be more useful if we had a legend so we knew which line was which. We can include a legend using `plt.legend`, and by specifying labels in the `plt.plot` lines. `plt.legend` will do its best to place the legend to avoid your data, but you can also specify a location with the `loc` argument.

In [None]:
plt.plot(downtown_by_month.index, downtown_by_month, label="Downtown")
plt.plot(outside_downtown_by_month.index, outside_downtown_by_month, label="Elsewhere")
plt.legend()
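
To place the legend yourself, pass `loc` a location string such as `"upper left"`. The following is a minimal sketch with made-up values standing in for the two monthly series.

```python
import matplotlib.pyplot as plt

# Hypothetical values standing in for the two monthly series
months = [1, 2, 3, 4]
plt.plot(months, [5, 7, 6, 9], label="Downtown")
plt.plot(months, [3, 4, 4, 5], label="Elsewhere")

# Pin the legend to the upper left corner instead of letting matplotlib choose
legend = plt.legend(loc="upper left")

# The legend entries come from the label= arguments, in plotting order
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```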

## Subplots

Frequently, it is useful to have multiple plots together. `plt.subplot` can be used to achieve this. `plt.subplot` takes three arguments: the number of rows, the number of columns, and which position you want to plot into next. The positions are numbered left to right and top to bottom; in a 2x2 grid, for example, the positions are:

```
1 2
3 4
```

It is also possible to have plots of different sizes, by plotting a single plot into multiple locations. This is done by passing a tuple with the first and last positions to plot into, instead of a single position (see the third plot below).

Let's put three of our plots together: the scatter chart in the top left, the line plot of all garages in the top right, and the bar plot across the bottom.

The last line calls the `plt.tight_layout` function, which adjusts plot spacing to account for the size of the labels.

In [None]:
# start by making the figure larger, to fit all of the subplots
plt.figure(figsize=(12, 8))

# First, top left
plt.subplot(2, 2, 1)
plt.scatter(data.entries, data.exits, s=0.1)
plt.xlabel("Entries")
plt.ylabel("Exits")
plt.title("Entries vs. exits")

# top right
plt.subplot(2, 2, 2)
plt.plot(by_month.index, by_month)
plt.ylabel("Entries")
plt.xlabel("Month")

# across the bottom, fill in spaces 3-4
plt.subplot(2, 2, (3, 4))
plt.barh(by_garage.index, by_garage)
plt.xlabel("Total entries")

plt.tight_layout()
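
To see how the position numbers map onto the grid, here is a stripped-down sketch of the same layout with empty axes. It compares the positions of the axes that `plt.subplot` returns; the variable names are just for illustration.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))

# Positions 1 and 2 are the top row of a 2x2 grid
top_left = plt.subplot(2, 2, 1)
top_right = plt.subplot(2, 2, 2)

# A tuple spans from the first position to the last, here the whole bottom row
bottom = plt.subplot(2, 2, (3, 4))

# get_position() gives each plot's box as fractions of the figure
spans_both_columns = bottom.get_position().width > top_left.get_position().width
is_below_top_row = bottom.get_position().y0 < top_left.get_position().y0
print(spans_both_columns, is_below_top_row)
```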

## Saving figures

The `plt.savefig` function can be used to save figures to files. It will determine the file type based on the file extension. I recommend saving `.png` files for raster (pixel-based) graphics, and `.svg` files for vector graphics.

For raster graphics, you can also specify a `dpi=` argument to `plt.savefig`, which controls the resolution of the output in dots per inch. For publication, values of at least 300 are recommended to avoid graininess.

In [None]:
# start by making the figure larger, to fit all of the subplots
plt.figure(figsize=(12, 8))

# First, top left
plt.subplot(2, 2, 1)
plt.scatter(data.entries, data.exits, s=0.1)
plt.xlabel("Entries")
plt.ylabel("Exits")
plt.title("Entries vs. exits")

# top right
plt.subplot(2, 2, 2)
plt.plot(by_month.index, by_month)
plt.ylabel("Entries")
plt.xlabel("Month")

# across the bottom, fill in spaces 3-4
plt.subplot(2, 2, (3, 4))
plt.barh(by_garage.index, by_garage)
plt.xlabel("Total entries")

plt.tight_layout()

plt.savefig("all_plots.png", dpi=300)
plt.savefig("all_plots.svg")
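
The pixel dimensions of a raster file are the figure size in inches multiplied by the dpi. The sketch below checks this with a small made-up plot. It saves into an in-memory buffer instead of a file (so the format has to be given explicitly), then reads the width and height from the PNG header; a 4 by 3 inch figure at 300 dpi should come out as 1200 by 900 pixels.

```python
import io
import struct

import matplotlib.pyplot as plt

# A small figure: 4 inches wide, 3 inches tall
fig = plt.figure(figsize=(4, 3))
plt.plot([0, 1, 2], [0, 1, 4])

# Save to an in-memory buffer; with no file extension, format= is required
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)

# A PNG file stores its width and height as big-endian integers in bytes 16-23
width_px, height_px = struct.unpack(">II", buffer.getvalue()[16:24])
print(width_px, height_px)
```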