# Using the RePORTER API

*This notebook and the request_reporter module were developed by Margaret Gratian with support from the NIH RePORTER team.*
________________________________________________

**Notebook Goals:**

Demonstrate how to use the request_reporter module to request NIH award data from RePORTER.

**Requirements:**

This notebook relies on a simple custom package called reporter and its request_reporter module. The module has functions for requesting data from the RePORTER API and getting back results structured as a pandas DataFrame. The module does not cover all RePORTER API functionality; treat it as a starting point for further development.

The reporter package must be located in the same directory as this notebook in order for this notebook to run correctly. If you move the reporter directory location, you will need to adjust the structure of the import below to tell Python where to find the package, as this is not a published package available in the Python Package Index.
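If the reporter directory does live somewhere else, one common way to make it importable is to add its parent directory to Python's module search path before running the import. The sketch below assumes a hypothetical location (`../shared_code`); substitute the directory that actually contains the reporter folder.

```python
import sys
from pathlib import Path

# Hypothetical location: the directory that CONTAINS the reporter/ folder.
# Replace this with the real parent directory on your machine.
package_parent = Path("..") / "shared_code"

# Resolve to an absolute path and put it at the front of the module search
# path so that `from reporter import request_reporter` can find the package.
resolved_parent = str(package_parent.resolve())
if resolved_parent not in sys.path:
    sys.path.insert(0, resolved_parent)
```

This change only lasts for the current Python session, so the cell needs to run before the import cell each time the notebook is restarted.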

**Resources:**
- RePORTER homepage: https://reporter.nih.gov/
- API information: https://api.reporter.nih.gov/

## Import Packages

In [None]:
import pandas as pd

# Import the custom package for working with the RePORTER API
from reporter import request_reporter as rr

## Demo Requesting Data from the API

The request_reporter module provides several functions for requesting data from the RePORTER API. Some examples are included below.

### Request NCI Administered R01 Grants in FY24

We can use the request_nci_awards_by_year_and_activity_codes function for this.

In [None]:
# See what this function does:
help(rr.request_nci_awards_by_year_and_activity_codes)

In [None]:
# Use the function to request R01 awards for FY 2024
# Note: by default, this function returns NCI-administered awards.
r01_24_results_df = rr.request_nci_awards_by_year_and_activity_codes([2024], ["R01"])
print(r01_24_results_df.shape)

r01_24_results_df.head()

In [None]:
# To also filter by specific PIs, use the following function and pass a list of PPIDs as an additional parameter
help(rr.request_nci_awards_by_year_activity_codes_and_ppids)

### Request R01 Equivalents for FY 2000-2020 for A Different IC

The previous example demonstrated use of a function that is specific to NCI. Here, we use the more generic "request_by_user_payload" function to specify a different IC. We use NHLBI, but note that RePORTER contains data from NIH and other Federal agencies. 

As of 3/13/2025, if you want to request projects from other NIH ICs, you can use the following IC acronyms:
- CLC
- FIC
- NCATS
- NCCIH
- NCI
- NEI
- NHGRI
- NHLBI
- NIA
- NIAAA
- NIAID
- NIAMS
- NIBIB
- NICHD
- NIDA
- NIDCD
- NIDCR
- NIDDK
- NIEHS
- NIGMS
- NIMH
- NIMHD
- NINDS
- NINR
- NLM
- OD

Another important concept demonstrated in this example is handling large requests. The RePORTER API returns at most 14,999 records for a single request, so a query that matches more records than that must be split into smaller requests. This example shows one way to split a large request to meet this constraint.
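To make the arithmetic behind this constraint concrete, here is a minimal sketch. It assumes results are fetched in pages of 500 records (the "limit" value used in this notebook) and that each page starts at a record offset; how request_reporter actually pages through results is not shown here, so treat the helper names below as hypothetical.

```python
# Assumed values: 500 records per page (the "limit" used in this notebook)
# and the 14,999-record ceiling described above.
PAGE_LIMIT = 500
MAX_RECORDS_PER_QUERY = 14_999


def page_offsets(total_records, limit=PAGE_LIMIT):
    """Return the starting offset of each page needed to cover total_records."""
    return list(range(0, total_records, limit))


def needs_splitting(total_records, cap=MAX_RECORDS_PER_QUERY):
    """Return True when a single query cannot return every matching record."""
    return total_records > cap


print(page_offsets(1200))      # [0, 500, 1000]
print(needs_splitting(1200))   # False
print(needs_splitting(40000))  # True
```

When `needs_splitting` is true, the criteria have to be narrowed (for example, by fiscal year) until each individual query falls under the ceiling.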

In [None]:
# Define criteria

# FYs 2000-2020
years = list(range(2000,2021))

# Administering agencies
# Note this must be passed as a list even though we are specifying just one agency
agencies = ["NHLBI"]

# NIH R01 equivalents
r01_eq_codes = ["R01", "DP1", "DP2", "DP5", "R23", "R29", "R37", "R56", "RF1", "RL1", "U01"]

In [None]:
# Create the payload 
data = {
        "criteria":
        {
            "fiscal_years": years, 
            "agencies": agencies, 
            "activity_codes": r01_eq_codes
        },
        "limit":500,
        "sort_field":"project_start_date",
        "sort_order":"desc"
}
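Because the same payload structure is needed again for each subset of years, it can be convenient to wrap it in a small function. The helper below is a hypothetical convenience, not part of the request_reporter module; it only reproduces the dictionary layout shown above.

```python
def build_payload(fiscal_years, agencies, activity_codes, limit=500):
    """Return a search payload shaped like the one above (hypothetical helper)."""
    return {
        "criteria": {
            "fiscal_years": list(fiscal_years),
            "agencies": list(agencies),
            "activity_codes": list(activity_codes),
        },
        "limit": limit,
        "sort_field": "project_start_date",
        "sort_order": "desc",
    }


# Example: the same criteria as above, restricted to three fiscal years
example_payload = build_payload([2000, 2001, 2002], ["NHLBI"], ["R01", "R37"])
```

Keeping the payload in one place means a change to the sort order or limit only has to be made once.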

In [None]:
# Now, we're going to use the get_total_records function to see how many results we can expect

# See what the function does first
help(rr.get_total_records)

In [None]:
# Use the get_total_records function to see the expected number of results
print(rr.get_total_records(data))

In [None]:
# One strategy for approaching this is to chunk up the query by fiscal year

# Split the years list into chunks of 3
# We'll use a list comprehension to make lists of length 3
year_subsets = [years[i:i + 3] for i in range(0, len(years), 3)]

year_subsets
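A fixed chunk size of 3 works for this query, but a busier IC or a broader set of activity codes could still exceed the ceiling within three years. One alternative, sketched below, is to keep halving the list of years until every chunk is small enough. The function takes any callable that returns a record count for a list of years; the `fake_count` stand-in is invented purely for illustration and does not reflect real RePORTER counts.

```python
MAX_RECORDS_PER_QUERY = 14_999  # ceiling described earlier in this notebook


def split_years_under_cap(years, count_fn, cap=MAX_RECORDS_PER_QUERY):
    """Recursively halve a list of years until each chunk is within the cap.

    count_fn is any callable that takes a list of years and returns the
    number of matching records. A single year is never split further.
    """
    if len(years) <= 1 or count_fn(years) <= cap:
        return [years]
    middle = len(years) // 2
    return (split_years_under_cap(years[:middle], count_fn, cap)
            + split_years_under_cap(years[middle:], count_fn, cap))


# Stand-in counter for illustration only: pretends every year has 4,000 records.
def fake_count(year_list):
    return 4000 * len(year_list)


demo_chunks = split_years_under_cap(list(range(2000, 2021)), fake_count)
print(demo_chunks)
```

In the notebook, `count_fn` could build a payload for the given years and pass it to rr.get_total_records, assuming that function returns the count as an integer. Note that each call to `count_fn` would then be a real API request.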

In [None]:
# We're going to use the request_by_user_payload function below
help(rr.request_by_user_payload)

In [None]:
# Now, request data 3 years at a time
# This code saves the results of each request as its own csv
# and also collects them in a list to combine into one DataFrame (saved as a csv later)

# Create an empty list to hold results
results = []
    
# Iterate through the entire list of lists
for subset in year_subsets: 
    print(subset)
    
    # Create the payload 
    data = {
            "criteria":
            {
                "fiscal_years": subset, # subset of fiscal years 
                "agencies":agencies, # payload criteria for funding agency, defined above 
                "activity_codes": r01_eq_codes # payload criteria for activity codes, defined above
            },
            "limit":500,
            "sort_field":"project_start_date",
            "sort_order":"desc"
    }
    
    # Request the data and store the results in subset_df
    subset_df = rr.request_by_user_payload(data)
    
    # Optionally save each subset as its own dataset, named by its first and last fiscal year
    filename = f"results_{subset[0]}_{subset[-1]}.csv"
    subset_df.to_csv(filename)
    
    # Append the DataFrame to results
    results.append(subset_df)

In [None]:
# Concat list of results into DataFrame
results_df = pd.concat(results)

# See the shape - the number of rows should match what get_total_records told us to expect
print(results_df.shape)

# Preview the resulting DataFrame
results_df.head()

In [None]:
# Save this output
results_df.to_csv("results.csv")