# Understanding Matplotlib's Explicit Interface
__________________________________________________

## About:

This tutorial provides an overview of Matplotlib's explicit plotting interface. It is not intended as a data visualization tutorial.

This tutorial was developed by Margaret Gratian and is adapted from Matplotlib's official documentation: https://matplotlib.org/stable/tutorials/introductory/lifecycle.html#sphx-glr-tutorials-introductory-lifecycle-py. It illustrates how Matplotlib's explicit plotting interface can be used with public award data from NIH RePORTER. The goal will be to plot unique counts of NCI application IDs and base projects each fiscal year.


## Inputs:
- Input Filepath 1: "../data/public_nih_reporter_data.csv"
    - Public NCI R01 awards in FY 2022 - 2024 from NIH RePORTER. Data is as of 3/13/2025. 

## Import Packages

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

## Read in Data

In [None]:
raw_reporter_df = pd.read_csv("../data/public_nih_reporter_data.csv", skiprows=1, index_col=0)

# See the shape
print(raw_reporter_df.shape)

# Preview the data
raw_reporter_df.head()

## Dataset Development

### Prep Data for Analysis
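
As a minimal sketch of the checks in this section, the example below runs them on a small made-up table. The `toy_raw` and `toy_clean` names and all values are assumptions for illustration, not NIH RePORTER data. It shows that `drop_duplicates()` returns a new DataFrame and that comparing `nunique()` with the row count tells us whether an identifier column is unique.

```python
import pandas as pd

# Hypothetical toy records (not the NIH RePORTER data); the second and third rows are exact duplicates
toy_raw = pd.DataFrame({
    "appl_id": [101, 102, 102, 103],
    "fiscal_year": [2022, 2022, 2022, 2023],
})

# drop_duplicates() returns a new DataFrame, so toy_raw is left unchanged
toy_clean = toy_raw.drop_duplicates()

# If appl_id is a unique identifier, its unique count equals the number of rows
row_count = len(toy_clean)
unique_ids = toy_clean["appl_id"].nunique()
ids_are_unique = toy_clean["appl_id"].is_unique

print(row_count, unique_ids, ids_are_unique)
```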

In [None]:
# Check for duplicates and drop any
# Note we make a copy and do not modify our original input data
reporter_df = raw_reporter_df.copy().drop_duplicates()

reporter_df.shape

In [None]:
# Check that the data is unique on appl_id, a unique identifier in the NIH data
# The count below should match the number of rows in the shape above
reporter_df["appl_id"].nunique()

### Group Data for Plotting
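
The grouping step in this section can be sketched on a small made-up table. The `toy_df` values below are assumptions for illustration only. The sketch uses pandas named aggregation, which sets the output column names inside `agg()` instead of renaming the columns afterwards. It is an alternative spelling of the same idea, not a required change.

```python
import pandas as pd

# Hypothetical toy records (not the NIH RePORTER data)
toy_df = pd.DataFrame({
    "fiscal_year": [2022, 2022, 2022, 2023, 2023],
    "appl_id": [101, 102, 103, 201, 202],
    "project_serial_num": ["CA1", "CA1", "CA2", "CA1", "CA3"],
})

# Named aggregation: output_column=(input_column, aggregation function)
toy_grouped = toy_df.groupby("fiscal_year", as_index=False).agg(
    appl_id_count=("appl_id", "nunique"),
    project_serial_num_count=("project_serial_num", "nunique"),
)

print(toy_grouped)
```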

In [None]:
# Group by fiscal year and count unique appl ids and unique project serial numbers (also known as base project numbers in the NIH data)
grouped_df = reporter_df.groupby(["fiscal_year"], as_index=False).agg({"appl_id":  "nunique", "project_serial_num":  "nunique"})

# Reset column names
grouped_df.columns = ["fiscal_year", "appl_id_count", "project_serial_num_count"]

In [None]:
# Preview the data
grouped_df.head()

## Analyze and Extract Insights from Data

The following steps walk through the elements of a plot and how they are created with the explicit interface. The end goal is to plot the unique counts of application IDs (appl_id_count) and base project numbers (project_serial_num_count) over the three fiscal years in the data.
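
Before working with the real data, here is a minimal sketch of the objects involved. The `years` and `counts` values are made up for illustration. The point is that `plt.subplots()` hands back a `Figure` and an `Axes`, and that plotting and labeling are done by calling methods on the `Axes` object.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Hypothetical values, used only so there is something to draw
years = [2022, 2023, 2024]
counts = [10, 12, 9]

# The figure is the canvas; the axes is the region of the canvas that holds the plot
fig, ax = plt.subplots()

# Everything below is a method call on the axes object
ax.plot(years, counts)
ax.set(xlabel="Fiscal Year", ylabel="Count", xticks=years)

# Inspect the objects we are holding
print(isinstance(fig, Figure), isinstance(ax, Axes))
print(len(fig.axes), len(ax.lines))
```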

## Explicit Plotting
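
This tutorial uses a single axes throughout, but `plt.subplots()` can also lay out a grid. The sketch below, with made-up panel titles, shows what comes back when a grid of one row and two columns is requested through the `nrows` and `ncols` parameters.

```python
import matplotlib.pyplot as plt

# Default call: a grid of one row and one column, so we get a single axes object
fig_single, ax_single = plt.subplots()

# One row and two columns: the second return value is now an array of axes
fig_grid, axs = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Each axes in the array is customized independently
axs[0].set_title("Left panel")
axs[1].set_title("Right panel")

print(type(ax_single).__name__, axs.shape)
```

Each element of `axs` supports the same methods (`plot`, `set`, `legend`, and so on) used on `ax` in the rest of this tutorial.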

In [None]:
# Create the figure and axes
# Figure == your canvas
# Axes == part of the canvas where we'll place a visualization

# plt.subplots() returns two objects, a figure and an axes, which we assign to the variable names fig and ax
# Note also that plt.subplots() creates a grid. By not passing in any additional parameters,
# we're creating a grid of one row and one column
# See more: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax = plt.subplots()

In [None]:
# Now, let's create a figure and axes and actually put something on the axes
# Note the default x-axis is not ideal - we'll fix this in the next step

# Create the figure and axes
fig, ax = plt.subplots()

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df)

In [None]:
# Add customization to the plot via the axes object

# Create the figure and axes
fig, ax = plt.subplots()

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df)

# Set properties of the axes 
ax.set(ylim=[0, 5100], xlabel='Fiscal Year', ylabel='Application Count',
       title='NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', xticks=range(2022, 2025))

In [None]:
# Adjust the figure size

# Create the figure and axes
fig, ax = plt.subplots(figsize=(15, 5))

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df)

# Set properties of the axes 
# Note we add a semicolon to suppress the text output that appeared above the plot in the previous cell
ax.set(ylim=[0, 5100], xlabel='Fiscal Year', ylabel='Application Count',
       title='NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', xticks=range(2022, 2025));

In [None]:
# What if we want to rotate our tick labels? 

# Create the figure and axes
fig, ax = plt.subplots(figsize=(15, 5))

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df)

# Set properties of the axes 
# Note we add a semicolon to suppress the text output that would otherwise appear above the plot
ax.set(ylim=[0, 5100], xlabel='Fiscal Year', ylabel='Application Count',
       title='NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', xticks=range(2022, 2025));

## ADJUSTING TICK LABELS ##
# We need to get the tick labels that exist on the axes and then adjust them
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45);
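
As an alternative sketch, the same rotation can be requested through the axes itself with `ax.tick_params` and its `labelrotation` parameter, which avoids the `plt.setp` call. The values plotted here are made up for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))

# Hypothetical values, used only so there are ticks to rotate
ax.plot([2022, 2023, 2024], [10, 12, 9])
ax.set(xticks=range(2022, 2025))

# Rotate only the x tick labels, using a method on the axes
ax.tick_params(axis="x", labelrotation=45)

# Draw the figure so the tick labels are fully populated before inspecting them
fig.canvas.draw()
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```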

In [None]:
# Earlier, we used ax.set() to add a lot of things at once to the plot
# We could also individually add them to the axes and gain even more control over adjusting them

# Create the figure and axes
fig, ax = plt.subplots(figsize=(15, 5))

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df)

# Set the title
# Move it a little higher with y = 1.05 and set the fontsize with the size parameter
ax.set_title('NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', y=1.05, size=20)

# Set the ylim (min and max on y axis)
ax.set_ylim([0, 5100])

# Set the x and y tick fontsize. We could also set these separately by specifying either 'x' or 'y' for axis
ax.tick_params(axis='both', labelsize=12)

# Adjust the x ticks to label each fiscal year 
ax.set(xticks=range(2022, 2025))

# Adjust the x tick labels rotation
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45);

# Set the x and y labels
ax.set_xlabel("Fiscal Year", size=15)
ax.set_ylabel("Application Count", size=15);

## Adding a Second Line and Legend
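
When `ax.plot` is called with column names and `data=`, the legend entry for each line defaults to the y column name. The sketch below, which uses a made-up `toy_df` and label text chosen for illustration, shows how the `label` parameter could give the legend more readable entries.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts (not the NIH RePORTER data)
toy_df = pd.DataFrame({
    "fiscal_year": [2022, 2023, 2024],
    "appl_id_count": [10, 12, 9],
    "project_serial_num_count": [8, 9, 7],
})

fig, ax = plt.subplots(figsize=(6, 3))

# label= overrides the default legend text, which would be the y column name
ax.plot("fiscal_year", "appl_id_count", data=toy_df, color="red", label="Applications")
ax.plot("fiscal_year", "project_serial_num_count", data=toy_df, color="blue", label="Base projects")

# The legend picks up one entry per labeled line
legend = ax.legend(title="Unique counts", loc="lower right")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```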

In [None]:
# Create the figure and axes
fig, ax = plt.subplots(figsize=(15, 5))

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df, color = 'red')

# Add another line plot on the axes
ax.plot("fiscal_year", "project_serial_num_count", data = grouped_df, color = 'blue')

# Set the title
# Move it a little higher with y = 1.05 and set the fontsize with the size parameter
ax.set_title('NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', y=1.05, size=20)

# Set the ylim (min and max on y axis)
ax.set_ylim([0, 5100])

# Set the x and y tick fontsize. We could also set these separately by specifying either 'x' or 'y' for axis
ax.tick_params(axis='both', labelsize=12)

# Adjust the x ticks to label each fiscal year 
ax.set(xticks=range(2022, 2025))

# Adjust the x tick labels rotation
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45);

# Set the x and y labels
ax.set_xlabel("Fiscal Year", size=15)
ax.set_ylabel("Application Count", size=15);

# Add a legend
ax.legend()

## Save the figure
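
In the explicit interface, saving is a method on the figure object. The sketch below writes a small made-up plot to a temporary directory. The file name, the `dpi` value, and the use of `bbox_inches="tight"` (which trims extra whitespace and helps keep rotated tick labels from being cut off) are illustrative choices, not requirements.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))

# Hypothetical values, used only so there is something to save
ax.plot([2022, 2023, 2024], [10, 12, 9])
ax.set(xlabel="Fiscal Year", ylabel="Count", xticks=range(2022, 2025))

# Write to a temporary directory so the sketch does not clutter the working directory
out_path = Path(tempfile.mkdtemp()) / "example_figure.png"

# dpi controls the resolution; bbox_inches="tight" crops the output to the drawn content
fig.savefig(out_path, dpi=150, bbox_inches="tight")

print(out_path.exists())
```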

In [None]:
# Create the figure and axes
fig, ax = plt.subplots(figsize=(15, 5))

# Add a line plot on the axes
ax.plot("fiscal_year", "appl_id_count", data = grouped_df, color = 'red')

# Add another line plot on the axes
ax.plot("fiscal_year", "project_serial_num_count", data = grouped_df, color = 'blue')

# Set the title
# Move it a little higher with y = 1.05 and set the fontsize with the size parameter
ax.set_title('NCI R01 Application Counts Per Fiscal Year, FY 2022-2024', y=1.05, size=20)

# Set the ylim (min and max on y axis)
ax.set_ylim([0, 5100])

# Set the x and y tick fontsize. We could also set these separately by specifying either 'x' or 'y' for axis
ax.tick_params(axis='both', labelsize=12)

# Adjust the x ticks to label each fiscal year 
ax.set(xticks=range(2022, 2025))

# Adjust the x tick labels rotation
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45);

# Set the x and y labels
ax.set_xlabel("Fiscal Year", size=15)
ax.set_ylabel("Application Count", size=15);

# Add a legend
ax.legend()

## SAVE THE FIGURE ##
# Save through the figure object to stay within the explicit interface
fig.savefig("applications_and_projects_per_fy.png")