# More advanced plotting with pandas/Matplotlib

At this point you should know the basics of making plots with pandas and the Matplotlib module. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this part, we will show how to visualize data using pandas/Matplotlib and create plots such as the one below.

---

## General information

**Sources**

This lesson is inspired by the Geo-python module at the University of Helsinki which in turn acknowledges the Programming in Python lessons from the Software Carpentry organization. This version was adapted for Colab and a UK context by Ruth Hamilton.

**About this document**

This is a Google Colab Notebook. This particular notebook is designed to introduce you to a few of the basic concepts of programming in Python. Like other common notebook formats (e.g. Jupyter), the contents of this document are divided into cells, which can contain:

- Markdown-formatted text,
- Python code, or
- raw text

You can execute a snippet of code in a cell by pressing Shift-Enter or by pressing the Run Cell button that appears when your cursor is on the cell.

---




![Subplot example in Matplotlib](https://geo-python-site.readthedocs.io/en/latest/_images/subplots.png)

## The data

In this part of the lesson we'll continue working with our weather observation data for Sheffield from the CEDA archives.

## Getting started

Let's start again by importing the libraries we'll need.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Loading the data

Now we'll load the data just as we did in the first part of the lesson: 


In [None]:
# Define absolute path to the file
fp = r'/content/drive/Shareddrives/TRP479_Spatial_Data_Science_Data/L7/sy_summary_data.csv'

data = pd.read_csv(fp, usecols=['MAX','MIN','MID','YEAR_MONTH'])



# Create a new column, DATE, by converting the YEAR_MONTH values to datetime objects
data['DATE'] = pd.to_datetime(data['YEAR_MONTH'], format='%Y_%m')

#set the index of our data to use the new DATE column
data.set_index(data['DATE'],inplace=True)

In [None]:
print(f"Number of rows: {len(data)}")

As you can see, we are dealing with a relatively large data set.

Let's have a closer look at the first rows of the data:

In [None]:
data.head()
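
To make the date-parsing step above more concrete, here is a minimal sketch on a tiny hand-made table. The three `YEAR_MONTH` values are hypothetical and only mimic the `YYYY_MM` layout of the lesson data; the name `sample` is also just for illustration.

```python
import pandas as pd

# Hypothetical YEAR_MONTH values in the same "YYYY_MM" layout as the lesson data
sample = pd.DataFrame({"YEAR_MONTH": ["1883_01", "1883_02", "1883_03"]})

# %Y matches the four-digit year and %m the two-digit month;
# because no day is given, each date falls on the first of the month
sample["DATE"] = pd.to_datetime(sample["YEAR_MONTH"], format="%Y_%m")

# Use the parsed dates as the index (DATE also stays available as a column)
sample = sample.set_index(sample["DATE"])

print(sample.index)
```

Having a datetime index is what lets pandas draw sensible date labels on the *x*-axis later in the lesson.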

## Preparing the data

First, we have to deal with no data values. Let's check how many no data values we have:

In [None]:
print(f"Number of no data values per column:\n{data.isna().sum()}")

So, we have 2 missing values in the MIN and MID columns. Let's get rid of those.  

We can remove rows from our DataFrame where `'MIN'` is missing values using the `dropna()` method, as shown below:

In [None]:
data.dropna(subset=["MIN"], inplace=True)

In [None]:
# Let's check what has happened
print(f"Number of rows after removing no data values: {len(data)}")

That's better.
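
If you are unsure what the `subset` parameter changes, the following sketch may help. It uses a small made-up DataFrame (`toy`, with invented values) rather than the Sheffield data, and it avoids `inplace=True` so that the original table is left untouched.

```python
import numpy as np
import pandas as pd

# Made-up values: one gap in MAX (row 1) and one gap in MIN (row 2)
toy = pd.DataFrame(
    {
        "MAX": [10.0, np.nan, 12.0, 13.0],
        "MIN": [2.0, 3.0, np.nan, 5.0],
    }
)

# Only rows where MIN is missing are dropped (row 2)
only_min = toy.dropna(subset=["MIN"])

# Rows with a missing value in any column are dropped (rows 1 and 2)
any_column = toy.dropna()

print(len(toy), len(only_min), len(any_column))  # 4 3 2
```

Without `inplace=True`, `dropna()` returns a new DataFrame, which is handy when you only want to count rows without modifying your data.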

### Check your understanding

What would happen if we removed all rows with any no data values from our data (also considering no data values in the `MAX` and `MIN` columns)?

In [None]:
# Calculate the number of rows after removing all no data values
# Note: Do not apply .dropna() with the "inplace" parameter!


In [None]:
# Calculate the number of rows without removing all rows containing NA values


## Using subplots

Let's continue working with the weather data and learn how to use *subplots*. Subplots are figures where you have multiple plots in different panels of the same figure, as was shown at the start of the lesson.

### Extracting seasonal temperatures

Let's now select data for different seasons:

- Winter (December - February)
- Spring (March - May)
- Summer (June - August)
- Autumn (September - November)

It's going to be easiest to do this based on the *month* value. We can extract this from the `'YEAR_MONTH'` column using the `.str.slice()` method we learned in [last week's workshop](https://drive.google.com/file/d/1r0LO9ihsSEhB8s1YsvebYV0aby-SpV68/view?usp=sharing).

In [None]:
# Use the .str.slice() method to create a new column with just the *month* information



In [None]:
#@title Click here to show code
data["MONTH"] = data["YEAR_MONTH"].str.slice(start=5)

In [None]:

#check it has worked; and check the data types
print(data.head())
print(data.dtypes)

You should see that our new `'MONTH'` column has an 'object' data type; this means it is stored as a *string* (you can also tell this by the fact that there are leading 'zeros' - i.e. `'01'` rather than `'1'`).

It can be easier to work with integers rather than 'strings' so let's convert the `'MONTH'` column to an integer type. We can do that with the `.astype(int)` method.

In [None]:
data["MONTH"]=data["MONTH"].astype(int)
print(data.dtypes)
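
As a quick illustration of the two steps above (slicing the month out of the string, then converting it to an integer), here is a sketch on two hypothetical `YYYY_MM` strings. The names `ym`, `month_str` and `month_int` are only used for this example.

```python
import pandas as pd

# Two hypothetical values in the "YYYY_MM" layout
ym = pd.Series(["1883_01", "1883_11"])

# Characters from position 5 onwards are the month, still stored as text
month_str = ym.str.slice(start=5)

# Converting to integers drops the leading zero
month_int = month_str.astype(int)

print(month_str.tolist())  # ['01', '11']
print(month_int.tolist())  # [1, 11]
```

With integer months, comparisons such as `month_int >= 3` behave numerically, which is what the seasonal selections rely on.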

In [None]:
# Type in the example for winter

spring = data.loc[(data.MONTH >= 3) & (data.MONTH < 6)]
spring_temps = spring['MID']

summer = data.loc[(data.MONTH >= 6) & (data.MONTH < 9)]
summer_temps = summer['MID']

autumn = data.loc[(data.MONTH >= 9) & (data.MONTH < 12)]
autumn_temps = autumn['MID']

winter = data.loc[(data.MONTH >= 12) | (data.MONTH < 3)]
winter_temps = winter['MID']
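
Note that winter is selected with `|` (or) rather than `&` (and), because its months sit at both ends of the calendar year. The sketch below checks the two kinds of condition on a made-up table containing each month number once; `toy`, `winter_mask` and `spring_mask` are illustrative names only.

```python
import pandas as pd

# One row per calendar month
toy = pd.DataFrame({"MONTH": list(range(1, 13))})

# Winter wraps around the year end, so the two conditions are joined with | (or)
winter_mask = (toy.MONTH >= 12) | (toy.MONTH < 3)

# The other seasons are a single continuous range, joined with & (and)
spring_mask = (toy.MONTH >= 3) & (toy.MONTH < 6)

print(toy.loc[winter_mask, "MONTH"].tolist())  # [1, 2, 12]
print(toy.loc[spring_mask, "MONTH"].tolist())  # [3, 4, 5]
```

Using `&` for winter would select nothing, since no month is both at least 12 and less than 3.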

Now we can plot our data to see how the different seasons look separately.

In [None]:
# Example plot for winter
ax1 = winter_temps.plot()
winter_temps.head()

In [None]:
ax2 = spring_temps.plot()

In [None]:
ax3 = summer_temps.plot()

In [None]:
ax4 = autumn_temps.plot()

OK, so from these plots we can already see that the temperatures in the four seasons are quite different, which is rather obvious of course. It is important to also notice that the scale of the *y*-axis changes in these four plots. If we would like to compare different seasons to each other we need to make sure that the temperature scale is the same in the plots for each season.

### Finding data bounds

Let's set our *y*-axis limits so that the upper limit is the maximum temperature + 5 degrees in our data (full year), and the lowest is the minimum temperature - 5 degrees.

In [None]:
# Find lower limit for y-axis
min_temp = min(
    winter_temps.min(), spring_temps.min(), summer_temps.min(), autumn_temps.min()
)
min_temp = min_temp - 5.0

# Find upper limit for y-axis
max_temp = max(
    winter_temps.max(), spring_temps.max(), summer_temps.max(), autumn_temps.max()
)
max_temp = max_temp + 5.0

# Print y-axis min, max
print(f"Min: {min_temp}, Max: {max_temp}")

We can now use this temperature range to standardize the y-axis range on our plot.

### Creating our first set of subplots

Let's now continue and see how we can plot all these different plots into the same figure. We can create a 2x2 panel for our visualization using Matplotlib’s `subplots()` function where we specify how many rows and columns we want to have in our figure. We can also specify the size of our figure with the `figsize` parameter as we have seen earlier with pandas. As a reminder, `figsize` takes the `width` and `height` values (in inches) as inputs.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes

We can see that the result is a two-dimensional NumPy array of axes objects with two rows: the first row contains the axes for columns 1 and 2 on **row 1**, and the second row contains the axes for columns 1 and 2 on **row 2**.

We can parse these axes into their own variables so it is easier to work with them.

In [None]:
ax11 = axes[0][0]
ax12 = axes[0][1]
ax21 = axes[1][0]
ax22 = axes[1][1]
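
For reference, here is a short sketch of a few equivalent ways to reach the individual panels. It creates its own throwaway figure (`fig_demo`, `axes_demo`, both hypothetical names) so that it does not interfere with the figure used in the lesson.

```python
import matplotlib.pyplot as plt

# A throwaway 2x2 figure just for inspecting the returned object
fig_demo, axes_demo = plt.subplots(nrows=2, ncols=2)

# The axes come back as a NumPy array with one entry per panel
print(type(axes_demo).__name__, axes_demo.shape)  # ndarray (2, 2)

# Option 1: index row then column, as in the lesson
top_right = axes_demo[0][1]

# Option 2: NumPy-style indexing reaches the same object
same_top_right = axes_demo[0, 1]

# Option 3: unpack all four panels in a single statement
(ax_a, ax_b), (ax_c, ax_d) = axes_demo

# Close the throwaway figure so that it is not displayed
plt.close(fig_demo)
```

Which option you use is mostly a matter of taste; the lesson sticks with the first one.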

Now we have four axis variables for the different panels in our figure. Next we can use them to plot the seasonal data. Let's begin by plotting the seasons, and give different colors for the lines and specify the *y*-axis range to be the same for all subplots. We can do this using what we know and some parameters below:

- The `c` parameter changes the color of the line. Matplotlib has a [large list of named colors](https://matplotlib.org/stable/gallery/color/named_colors.html) you can consult to customize your color scheme.
- The `lw` parameter controls the width of the lines
- The `ylim` parameter controls the y-axis range

In [None]:
# Set the plot line width
line_width = 1.5

# Plot data
winter_temps.plot(ax=ax11, c="blue", lw=line_width, ylim=[min_temp, max_temp])
spring_temps.plot(ax=ax12, c="orange", lw=line_width, ylim=[min_temp, max_temp])
summer_temps.plot(ax=ax21, c="green", lw=line_width, ylim=[min_temp, max_temp])
autumn_temps.plot(ax=ax22, c="brown", lw=line_width, ylim=[min_temp, max_temp])

# Display the figure
fig

Great, now we have all the plots in the same figure! However, we can see that there are some problems with our *x*-axis labels and a few missing items we can add. Let's do that below.

In [None]:
# Create the new figure and subplots
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12,8))

# Rename the axes for ease of use
ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]

Now, we'll add our seasonal temperatures to the plot commands for each time period.

In [None]:
# Set plot line width
line_width = 1.5

# Plot data
winter_temps.plot(ax=ax11, c='blue', lw=line_width, 
                  ylim=[min_temp, max_temp], grid=True)
spring_temps.plot(ax=ax12, c='orange', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)
summer_temps.plot(ax=ax21, c='green', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)
autumn_temps.plot(ax=ax22, c='brown', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)

# Set figure title
fig.suptitle('Historical seasonal average monthly temperature observations - Sheffield')

# Rotate the x-axis labels so they don't overlap
plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)

# Axis labels
ax11.set_xlabel('')
ax12.set_xlabel('')
ax21.set_xlabel('Date')
ax22.set_xlabel('Date')
ax11.set_ylabel('Temperature [°C]')
ax21.set_ylabel('Temperature [°C]')
#x_axis = ax11.axes.get_xaxis()
#x_label = x_axis.get_label()
##print isinstance(x_label, matplotlib.artist.Artist)
#x_label.set_visible(False)



# Season label text
ax11.text(pd.to_datetime('18800215'), 22.5, 'Winter')
ax12.text(pd.to_datetime('18800515'), 22.5, 'Spring')
ax21.text(pd.to_datetime('18800815'), 22.5, 'Summer')
ax22.text(pd.to_datetime('18801115'), 22.5, 'Autumn')

# Display plot
fig

Not bad.
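
The plotting cell above repeats the same commands for each panel. One possible way to shorten this kind of code is to loop over the axes. The sketch below shows the idea using synthetic random data instead of the Sheffield temperatures; the data, the names `demo`, `colors`, `fig_loop` and `axs_loop`, and the fixed *y*-axis limits are all assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data: 24 monthly values per "season" (not real observations)
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
season_names = ["Winter", "Spring", "Summer", "Autumn"]
demo = {
    name: pd.Series(rng.normal(10, 3, len(dates)), index=dates)
    for name in season_names
}
colors = {"Winter": "blue", "Spring": "orange", "Summer": "green", "Autumn": "brown"}

fig_loop, axs_loop = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# flatten() turns the 2x2 array of axes into a simple sequence of four panels
for ax, (name, series) in zip(axs_loop.flatten(), demo.items()):
    series.plot(ax=ax, color=colors[name], lw=1.5, ylim=[0, 20], grid=True)
    ax.set_title(name)
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=20)

fig_loop.suptitle("Synthetic data plotted with a loop")
```

The explicit version used in the lesson is longer, but it makes it easier to see and change what happens in each individual panel.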

### Check your understanding

Visualize winter and summer temperatures in a 1x2 panel figure.
Save the figure as a .png file.

In [None]:
# Two subplots side-by-side

# Set plot line width

# Plot data


# Set figure title


# Rotate the x-axis labels so they don't overlap


# Axis labels


# Season label text


In [None]:
#@title Click to show code
# Two subplots side-by-side
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))

# Set plot line width
line_width = 1.5

# Plot data
winter_temps.plot(
    ax=axs[0], c="blue", lw=line_width, ylim=[min_temp, max_temp], grid=True
)
summer_temps.plot(
    ax=axs[1], c="green", lw=line_width, ylim=[min_temp, max_temp], grid=True
)

# Set figure title
fig.suptitle(
    "Historical winter and summer temperature observations - Sheffield"
)

# Rotate the x-axis labels so they don't overlap
plt.setp(axs[0].xaxis.get_majorticklabels(), rotation=20)
plt.setp(axs[1].xaxis.get_majorticklabels(), rotation=20)

# Axis labels
axs[0].set_xlabel("Date")
axs[1].set_xlabel("Date")
axs[0].set_ylabel("Temperature [°C]")
axs[1].set_ylabel("Temperature [°C]")

# Season label text
axs[0].text(pd.to_datetime("18800215"), 22.5, "Winter")
axs[1].text(pd.to_datetime("18800815"), 22.5, "Summer")

plt.savefig("Sheffield_WinterSummer_1882_2020v2.png")

## Extra: pandas/Matplotlib plot style sheets

One cool thing about plotting using pandas/Matplotlib is that it is possible to change the overall style of your plot to one of several available plot style options very easily. Let's consider an example below using the four-panel plot we produced in this lesson.

We will start by defining the plot style, using the `'dark_background'` plot style here.

In [None]:
plt.style.use('dark_background')

There is no output from this command, but now when we create a plot it will use the `dark_background` styling. Let's see what that looks like.

In [None]:
# Create a new figure and subplots. Style settings are mainly applied when a
# figure is created, so we do not reuse the figure we made before plt.style.use()
fig_dark, axs_dark = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# Rename the axes for ease of use
axd11 = axs_dark[0][0]
axd12 = axs_dark[0][1]
axd21 = axs_dark[1][0]
axd22 = axs_dark[1][1]

# Set plot line width
line_width = 0.5

# Plot data
winter_temps.plot(ax=axd11, c='blue', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)
spring_temps.plot(ax=axd12, c='orange', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)
summer_temps.plot(ax=axd21, c='green', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)
autumn_temps.plot(ax=axd22, c='brown', lw=line_width,
                  ylim=[min_temp, max_temp], grid=True)

# Set figure title
fig_dark.suptitle('Historical seasonal average monthly temperature observations - Sheffield')

# Rotate the x-axis labels so they don't overlap
plt.setp(axd11.xaxis.get_majorticklabels(), rotation=20)
plt.setp(axd12.xaxis.get_majorticklabels(), rotation=20)
plt.setp(axd21.xaxis.get_majorticklabels(), rotation=20)
plt.setp(axd22.xaxis.get_majorticklabels(), rotation=20)

# Axis labels
axd11.set_xlabel('')
axd12.set_xlabel('')
axd21.set_xlabel('Date')
axd22.set_xlabel('Date')
axd11.set_ylabel('Temperature [°C]')
axd21.set_ylabel('Temperature [°C]')

# Season label text
axd11.text(pd.to_datetime('18800215'), 22.5, 'Winter')
axd12.text(pd.to_datetime('18800515'), 22.5, 'Spring')
axd21.text(pd.to_datetime('18800815'), 22.5, 'Summer')
axd22.text(pd.to_datetime('18801115'), 22.5, 'Autumn')

As you can see, the plot format has now changed to use the `dark_background` style. You can find other plot style options in the [complete list of available Matplotlib style sheets](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html). 
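
If you would like to see which styles are installed, or try a style for a single figure without changing the global setting, something like the following sketch should work. It uses `plt.style.available` and `plt.style.context()`; the `'ggplot'` style and the names `fig_ctx` and `ax_ctx` are just example choices.

```python
import matplotlib.pyplot as plt

# List the first few style names that this Matplotlib installation knows about
print(plt.style.available[:5])

# Remember a setting so we can check that it is restored afterwards
facecolor_before = plt.rcParams["axes.facecolor"]

# The style only applies to figures created inside the "with" block
with plt.style.context("ggplot"):
    fig_ctx, ax_ctx = plt.subplots(figsize=(6, 3))
    ax_ctx.plot([1, 2, 3], [2, 4, 3])
    ax_ctx.set_title("Temporary 'ggplot' style")

# Outside the block, the previous settings are active again
print(plt.rcParams["axes.facecolor"] == facecolor_before)  # True
```

This can be a convenient way to compare styles side by side, because each figure keeps the look it was created with.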

Finally, let's have a look at a section of our data again. Here we are going to look at the `winter_temps` data but only for the period between 1980 and 1982. We are going to add markers for the points, as well.

In [None]:
#switch back to 'default' Matplotlib style
plt.style.use('default')

In [None]:
# Look at winter_temps between 1980 and 1982 - remember, this should only
# include the winter months, i.e. December, January and February
ax1 = winter_temps.plot(
    c="blue",
    marker='o',
    lw=line_width,
    xlim=[pd.to_datetime('19800101'), pd.to_datetime('19820101')],
    ylim=[min_temp, max_temp],
    grid=True,
)


Because we are only plotting 'winter' month temperatures, drawing a line between them might be misleading. To omit the 'line', we can use the `linestyle='none'` parameter.

In [None]:
ax1 = winter_temps.plot(
    c="blue",
    marker='o',
    linestyle='none',
    xlim=[pd.to_datetime('19800101'), pd.to_datetime('19820101')],
    ylim=[min_temp, max_temp],
    grid=True,
)

Let's apply this style to the whole dataset. First we need to 'clear' the plots. Then we will re-run the code, but with the `linestyle` parameter included.

In [None]:
#clear the previous plots in ax11,ax12,ax21 and ax22
ax11.clear()
ax12.clear()
ax21.clear()
ax22.clear()

In [None]:
# Set plot line width
line_width =1.5

#ax11.clear()

# Plot data
winter_temps.plot(ax=ax11, c='blue',marker ='.',linestyle='none',
                  ylim=[min_temp, max_temp], grid=True)
spring_temps.plot(ax=ax12, c='orange',marker ='.',linestyle='none',
                  ylim=[min_temp, max_temp], grid=True)
summer_temps.plot(ax=ax21, c='green', marker ='.',linestyle='none',
                  ylim=[min_temp, max_temp], grid=True)
autumn_temps.plot(ax=ax22, c='brown', marker ='.',linestyle='none',
                  ylim=[min_temp, max_temp], grid=True)

# Set figure title
fig.suptitle('Historical seasonal average monthly temperature observations - Sheffield')

# Rotate the x-axis labels so they don't overlap
plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)

# Axis labels
ax11.set_xlabel('')
ax12.set_xlabel('')
ax21.set_xlabel('Date')
ax22.set_xlabel('Date')
ax11.set_ylabel('Temperature [°C]')
ax21.set_ylabel('Temperature [°C]')
#x_axis = ax11.axes.get_xaxis()
#x_label = x_axis.get_label()
##print isinstance(x_label, matplotlib.artist.Artist)
#x_label.set_visible(False)



# Season label text
ax11.text(pd.to_datetime('18800215'), 22.5, 'Winter')
ax12.text(pd.to_datetime('18800515'), 22.5, 'Spring')
ax21.text(pd.to_datetime('18800815'), 22.5, 'Summer')
ax22.text(pd.to_datetime('18801115'), 22.5, 'Autumn')

# Display plot
fig

### Histograms
We have seen how we can plot time series data using the pandas `plot()` method, but we can also use pandas to create other types of plot. For example, the `hist()` method will create histograms.

>**Remember** A histogram is a graph showing frequency distributions. It is a graph showing the number of observations within each given interval.

In [None]:
#create a histogram of average temperature for winter months
winter_temps.hist()


We can use the parameters of the `hist()` method to change the look of our plot. For example, if we want to change the number of *bins*, we can set the `bins` parameter. For more information about the options available, see the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.hist.html).

In [None]:
# Create a histogram of average temperature for winter months using 20 bins
winter_temps.hist(bins=20)

We can set up a panel plot of histograms of the seasonal data in the same way that we did for the time series plots. Note, however, that some of the parameter options are different. Here, if we want to set the limits of the x-axis (to make comparisons easier), we use the `range` parameter (rather than `xlim`).
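
To see what `bins` and `range` do to the underlying counts, it can help to look at the numbers directly. The sketch below uses NumPy's `histogram()` function on a few made-up values; pandas and Matplotlib histograms divide the data into bins in the same general way, but the values and names here (`values`, `counts`, `edges`) are only for illustration.

```python
import numpy as np

# Made-up temperatures
values = np.array([1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.5])

# Five equal-width bins covering the fixed range 0 to 10
counts, edges = np.histogram(values, bins=5, range=(0, 10))

print(edges.tolist())   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts.tolist())  # [1, 3, 0, 1, 2]
```

Because the range is fixed rather than taken from the data, histograms of different seasons built this way share the same bin edges and can be compared directly.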

In [None]:
# Create the new figure and subplots
figh, axsh = plt.subplots(nrows=2, ncols=2, figsize=(12,8))

# Rename the axes for ease of use
axh11 = axsh[0][0]
axh12 = axsh[0][1]
axh21 = axsh[1][0]
axh22 = axsh[1][1]

In [None]:
# Set plot line width
line_width =1.5

#ax11.clear()

# Plot data
pd.DataFrame(winter_temps).hist(ax=axh11, range=[min_temp, max_temp])
pd.DataFrame(spring_temps).hist(ax=axh12, range=[min_temp, max_temp])
pd.DataFrame(summer_temps).hist(ax=axh21, range=[min_temp, max_temp])
pd.DataFrame(autumn_temps).hist(ax=axh22, range=[min_temp, max_temp])

# Set figure title
figh.suptitle('Historical seasonal average monthly temperature observations - Sheffield')


# Axis labels
axh11.set_xlabel('Average monthly temperature')
axh12.set_xlabel('')
axh21.set_xlabel('Average monthly temperature')
axh22.set_xlabel('Average monthly temperature')
axh11.set_ylabel('Frequency')
axh21.set_ylabel('Frequency')



# Season label text
axh11.set_title('Winter')
axh12.set_title('Spring')
axh21.set_title('Summer')
axh22.set_title('Autumn')

# Display plot
figh

The example we've just seen creates plots for 4 different series. We can also create subplots using categories *within* our data. We can convert our `winter_temps`, `spring_temps`, `summer_temps` and `autumn_temps` *series* into data frames, adding a `'season'` column to record the season each temperature falls in.

In [None]:

df_winter=pd.DataFrame({'MID':winter_temps, 'season':"winter"})
df_spring=pd.DataFrame({'MID':spring_temps, 'season':"spring"})
df_summer=pd.DataFrame({'MID':summer_temps, 'season':"summer"})
df_autumn=pd.DataFrame({'MID':autumn_temps, 'season':"autumn"})

We then use the `pd.concat()` function to combine the dataframes into a single one (by *rows*, i.e. stacked one after another) with two columns, `'MID'` and `'season'`.

In [None]:
df_all = pd.concat([df_winter, df_summer, df_spring, df_autumn], axis=0)


In [None]:
# Check the dataframe using .head()
df_all.head()
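
Here is a minimal sketch of what row-wise concatenation does, using two tiny made-up data frames (`df_a` and `df_b`, with invented values) instead of the seasonal data.

```python
import pandas as pd

# Two small made-up tables with the same columns
df_a = pd.DataFrame({"MID": [3.1, 4.0], "season": "winter"})
df_b = pd.DataFrame({"MID": [15.2, 16.8], "season": "summer"})

# axis=0 stacks the rows of df_b underneath the rows of df_a
combined = pd.concat([df_a, df_b], axis=0)

print(len(combined))            # 4
print(combined.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True would renumber the rows from 0 instead
renumbered = pd.concat([df_a, df_b], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

In the lesson data the index holds the dates, so keeping the original index (the default) is what we want there.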

Finally, we can create a histogram of the temperatures in the `df_all` dataset grouped by `'season'`. We do this by setting the `by` parameter.

>**Note**, however, that it orders the groups alphabetically rather than in order of appearance.

In [None]:
#plot a histogram based on the 'season' column
df_all.hist(by=df_all['season'],range=[min_temp, max_temp])
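
The alphabetical ordering mentioned in the note is most likely a consequence of how pandas groups data, since grouping sorts the group keys by default. The sketch below illustrates that behaviour with `groupby()` on a small made-up table; it is an assumption that `hist(by=...)` orders its panels for the same reason, and the table `toy` contains invented values.

```python
import pandas as pd

# Made-up table with the seasons in the same order as in df_all
toy = pd.DataFrame(
    {
        "MID": [3.0, 16.0, 9.0, 10.0],
        "season": ["winter", "summer", "spring", "autumn"],
    }
)

# Order in which the seasons first appear in the data
appearance_order = toy["season"].unique().tolist()

# Order in which groupby() iterates over the groups (sorted by key)
group_order = [name for name, _ in toy.groupby("season")]

print(appearance_order)  # ['winter', 'summer', 'spring', 'autumn']
print(group_order)       # ['autumn', 'spring', 'summer', 'winter']
```

Keeping this in mind helps when reading the panel titles, because the layout no longer follows the calendar order of the seasons.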