<a href="https://colab.research.google.com/github/ryanpseely/reu_nsf/blob/main/reu_nsf_code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NSF Awards & Political Interference

An analysis of how politics may interfere with the types and amounts of awards the NSF gives to researchers, comparing awards from January to March 2024 with awards from January to March 2025.

Note on GitHub syncing: Using Colab with GitHub does NOT work like RMarkdown files with GitHub Desktop. To "push" changes to GitHub, I must follow these steps:
1. File > Save a copy in GitHub
2. Choose the reu_nsf repository
3. Write a commit message
4. Click "OK" when ready to commit the changes.

In [None]:
import pandas as pd

In [None]:
# Data set stored in GitHub: Import into a Pandas data frame
# Changed the URL to point to the raw CSV data instead of the GitHub page
csv_file_24 = "https://raw.githubusercontent.com/ryanpseely/reu_nsf/main/Awards_JanMar24.csv"

award_data_24 = pd.read_csv(csv_file_24, encoding='latin1')

csv_file_25 = "https://raw.githubusercontent.com/ryanpseely/reu_nsf/main/Awards_JanMar25.csv"

award_data_25 = pd.read_csv(csv_file_25, encoding='latin1')

In [None]:
award_data_24.columns # Show the list of variables in the file

Index(['AwardNumber', 'Title', 'NSFOrganization', 'Program(s)', 'StartDate',
       'LastAmendmentDate', 'PrincipalInvestigator', 'State', 'Organization',
       'AwardInstrument', 'ProgramManager', 'EndDate', 'AwardedAmountToDate',
       'Co-PIName(s)', 'PIEmailAddress', 'OrganizationStreet',
       'OrganizationCity', 'OrganizationState', 'OrganizationZip',
       'OrganizationPhone', 'NSFDirectorate', 'ProgramElementCode(s)',
       'ProgramReferenceCode(s)', 'ARRAAmount', 'Abstract'],
      dtype='object')

In [None]:
award_data_25.columns

Index(['AwardNumber', 'Title', 'NSFOrganization', 'Program(s)', 'StartDate',
       'LastAmendmentDate', 'PrincipalInvestigator', 'State', 'Organization',
       'AwardInstrument', 'ProgramManager', 'EndDate', 'AwardedAmountToDate',
       'Co-PIName(s)', 'PIEmailAddress', 'OrganizationStreet',
       'OrganizationCity', 'OrganizationState', 'OrganizationZip',
       'OrganizationPhone', 'NSFDirectorate', 'ProgramElementCode(s)',
       'ProgramReferenceCode(s)', 'ARRAAmount', 'Abstract'],
      dtype='object')
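
The two column listings above look identical by eye. A minimal sketch of checking that programmatically before combining the files, using two small stand-in frames (the names `df_a` and `df_b` and their contents are hypothetical, not the real award data):

```python
import pandas as pd

# Hypothetical stand-ins for award_data_24 and award_data_25
df_a = pd.DataFrame({"AwardNumber": [1], "Title": ["x"], "State": ["NY"]})
df_b = pd.DataFrame({"AwardNumber": [2], "Title": ["y"], "State": ["CO"]})

# True only when both frames have the same column names in the same order
same_schema = df_a.columns.equals(df_b.columns)

# Columns present in one frame but not the other (empty when schemas match)
only_in_a = df_a.columns.difference(df_b.columns)
only_in_b = df_b.columns.difference(df_a.columns)

print(same_schema, list(only_in_a), list(only_in_b))
```

Running the same three expressions on `award_data_24` and `award_data_25` would flag any renamed or missing column before the concatenation step.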

In [None]:
print(len(award_data_24))
print(len(award_data_25))

1225
788


In [None]:
award_data_24.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1225 entries, 0 to 1224
Data columns (total 25 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   AwardNumber              1225 non-null   int64 
 1   Title                    1225 non-null   object
 2   NSFOrganization          1225 non-null   object
 3   Program(s)               1225 non-null   object
 4   StartDate                1225 non-null   object
 5   LastAmendmentDate        1225 non-null   object
 6   PrincipalInvestigator    1225 non-null   object
 7   State                    1225 non-null   object
 8   Organization             1225 non-null   object
 9   AwardInstrument          1225 non-null   object
 10  ProgramManager           1225 non-null   object
 11  EndDate                  1225 non-null   object
 12  AwardedAmountToDate      1225 non-null   object
 13  Co-PIName(s)             368 non-null    object
 14  PIEmailAddress           1225 non-null  

In [None]:
award_data_24.head()

Unnamed: 0,AwardNumber,Title,NSFOrganization,Program(s),StartDate,LastAmendmentDate,PrincipalInvestigator,State,Organization,AwardInstrument,...,OrganizationStreet,OrganizationCity,OrganizationState,OrganizationZip,OrganizationPhone,NSFDirectorate,ProgramElementCode(s),ProgramReferenceCode(s),ARRAAmount,Abstract
0,2348998,Collaborative Research: REU Site: Earth and Pl...,EAR,"SPECIAL PROGRAMS IN ASTRONOMY, EDUCATION AND H...",03/01/2025,02/28/2024,Denton Ebel,NY,American Museum Natural History,Standard Grant,...,200 CENTRAL PARK W,NEW YORK,NY,100245102,2127695975,GEO,"121900, 157500","1206, 1207, 7736, 9178, 9250, SMET",$0.00,This award provides renewed funding for underg...
1,2348999,Collaborative Research: REU Site: Earth and Pl...,EAR,EDUCATION AND HUMAN RESOURCES,03/01/2025,02/28/2024,Timothy Paglione,NY,CUNY Graduate School University Center,Standard Grant,...,365 5TH AVE STE 8113,NEW YORK,NY,100164309,2128177526,GEO,157500,9250,$0.00,This award provides renewed funding for underg...
2,2400789,SHINE: The Evolution of Coronal Dimmings and T...,AGS,SOLAR-TERRESTRIAL,01/01/2025,02/05/2024,Larisza Krista,CO,University of Colorado at Boulder,Continuing Grant,...,3100 MARINE ST,Boulder,CO,803090001,3034926221,GEO,152300,,$0.00,Coronal mass ejections (CMEs) are colossal pla...
3,2338139,CAREER: Balancing the global alkalinity cycle ...,EAR,"Hydrologic Sciences, Geobiology & Low-Temp Geo...",01/01/2025,02/29/2024,Mark Torres,TX,William Marsh Rice University,Continuing Grant,...,6100 MAIN ST,Houston,TX,770051827,7133484820,GEO,"157900, 729500",1045,$0.00,The water flowing in a river today is largely ...
4,2348995,REU Site: Community-Soil-Air-Water: A Geoscien...,EAR,EDUCATION AND HUMAN RESOURCES,11/01/2024,12/16/2024,Flavia Dias de Souza Moraes,GA,"Georgia State University Research Foundation, ...",Standard Grant,...,58 EDGEWOOD AVE NE,ATLANTA,GA,303032921,4044133570,GEO,157500,9250,$0.00,Georgia State University will host the Communi...


In [None]:
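# Remove commas and dollar signs so the amounts can later be converted to numbers
# (raw string avoids an invalid-escape warning; "$" is literal inside [...])
award_data_24['AwardedAmountToDate'] = award_data_24['AwardedAmountToDate'].str.replace(r'[$,]', '', regex=True)
award_data_25['AwardedAmountToDate'] = award_data_25['AwardedAmountToDate'].str.replace(r'[$,]', '', regex=True)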


In [None]:
award_data_25['AwardedAmountToDate'].head()

Unnamed: 0,AwardedAmountToDate
0,190000.0
1,85950.0
2,190000.0
3,110000.0
4,110000.0


In [None]:
award_data_24['AwardedAmountToDate'].head()

Unnamed: 0,AwardedAmountToDate
0,425132.0
1,39206.0
2,223476.0
3,211792.0
4,393759.0


In [None]:
# Add year columns for the original award year
award_data_24['Year'] = '24'
award_data_25['Year'] = '25'

In [None]:
# Glue datasets together
glued_awards = pd.concat([award_data_24, award_data_25], ignore_index=True)

In [None]:
glued_awards.head()

Unnamed: 0,AwardNumber,Title,NSFOrganization,Program(s),StartDate,LastAmendmentDate,PrincipalInvestigator,State,Organization,AwardInstrument,...,OrganizationCity,OrganizationState,OrganizationZip,OrganizationPhone,NSFDirectorate,ProgramElementCode(s),ProgramReferenceCode(s),ARRAAmount,Abstract,Year
0,2348998,Collaborative Research: REU Site: Earth and Pl...,EAR,"SPECIAL PROGRAMS IN ASTRONOMY, EDUCATION AND H...",03/01/2025,02/28/2024,Denton Ebel,NY,American Museum Natural History,Standard Grant,...,NEW YORK,NY,100245102,2127696000.0,GEO,"121900, 157500","1206, 1207, 7736, 9178, 9250, SMET",$0.00,This award provides renewed funding for underg...,24
1,2348999,Collaborative Research: REU Site: Earth and Pl...,EAR,EDUCATION AND HUMAN RESOURCES,03/01/2025,02/28/2024,Timothy Paglione,NY,CUNY Graduate School University Center,Standard Grant,...,NEW YORK,NY,100164309,2128178000.0,GEO,157500,9250,$0.00,This award provides renewed funding for underg...,24
2,2400789,SHINE: The Evolution of Coronal Dimmings and T...,AGS,SOLAR-TERRESTRIAL,01/01/2025,02/05/2024,Larisza Krista,CO,University of Colorado at Boulder,Continuing Grant,...,Boulder,CO,803090001,3034926000.0,GEO,152300,,$0.00,Coronal mass ejections (CMEs) are colossal pla...,24
3,2338139,CAREER: Balancing the global alkalinity cycle ...,EAR,"Hydrologic Sciences, Geobiology & Low-Temp Geo...",01/01/2025,02/29/2024,Mark Torres,TX,William Marsh Rice University,Continuing Grant,...,Houston,TX,770051827,7133485000.0,GEO,"157900, 729500",1045,$0.00,The water flowing in a river today is largely ...,24
4,2348995,REU Site: Community-Soil-Air-Water: A Geoscien...,EAR,EDUCATION AND HUMAN RESOURCES,11/01/2024,12/16/2024,Flavia Dias de Souza Moraes,GA,"Georgia State University Research Foundation, ...",Standard Grant,...,ATLANTA,GA,303032921,4044134000.0,GEO,157500,9250,$0.00,Georgia State University will host the Communi...,24


In [None]:
glued_awards.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2013 entries, 0 to 2012
Data columns (total 26 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   AwardNumber              2013 non-null   int64  
 1   Title                    2013 non-null   object 
 2   NSFOrganization          2013 non-null   object 
 3   Program(s)               2013 non-null   object 
 4   StartDate                2013 non-null   object 
 5   LastAmendmentDate        2013 non-null   object 
 6   PrincipalInvestigator    2013 non-null   object 
 7   State                    2007 non-null   object 
 8   Organization             2013 non-null   object 
 9   AwardInstrument          2013 non-null   object 
 10  ProgramManager           2013 non-null   object 
 11  EndDate                  2013 non-null   object 
 12  AwardedAmountToDate      2013 non-null   object 
 13  Co-PIName(s)             563 non-null    object 
 14  PIEmailAddress          

In [None]:
# How many awarded each year?
print(f"{len(award_data_24)} awards in 2024")
print(f"{len(award_data_25)} awards in 2025")

1225 awards in 2024
788 awards in 2025


In [None]:
# Convert the cleaned amount strings to numbers; unparseable values become NaN
glued_awards['AwardedAmountToDate'] = pd.to_numeric(glued_awards['AwardedAmountToDate'], errors='coerce')
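
To see what the cleanup and conversion steps do together, here is a small sketch on made-up currency strings (the sample values are hypothetical, and the real column may contain other formats). It strips `$` and `,`, then converts to numbers, with `errors="coerce"` turning anything unparseable into `NaN`:

```python
import pandas as pd

# Hypothetical raw values; "n/a" stands in for an entry that cannot be parsed
raw = pd.Series(["$425,132.00", "$39,206.00", "$0.00", "n/a"])

# Strip dollar signs and thousands separators
stripped = raw.str.replace(r"[$,]", "", regex=True)

# Convert to floats; the unparseable entry becomes NaN instead of raising
amounts = pd.to_numeric(stripped, errors="coerce")

print(amounts.tolist())
print("Values lost to coercion:", int(amounts.isna().sum()))
```

Counting the `NaN` values after coercion is a quick way to confirm that no real amounts were silently dropped.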


In [None]:
glued_awards.groupby('Year')['AwardedAmountToDate'].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
24,1225.0,545383.89551,1423114.0,0.0,205847.0,360000.0,549445.0,19999300.0
25,788.0,366197.935279,603237.4,0.0,99999.0,248478.0,450068.25,9300395.0
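
The summary table is easier to compare as relative changes. A minimal sketch that computes the percent change from 2024 to 2025 for the award count, mean, and median, with the figures copied by hand from the `describe()` output above (the helper name `pct_change` is hypothetical):

```python
# Summary statistics copied from the describe() output above
count_24, count_25 = 1225, 788
mean_24, mean_25 = 545383.89551, 366197.935279
median_24, median_25 = 360000.0, 248478.0


def pct_change(old, new):
    """Percent change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


print(f"Award count:   {pct_change(count_24, count_25):.1f}%")
print(f"Mean amount:   {pct_change(mean_24, mean_25):.1f}%")
print(f"Median amount: {pct_change(median_24, median_25):.1f}%")
```

All three figures come out negative, which matches the direction visible in the table; whether the drop in amounts is statistically distinguishable from noise is a separate question.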


In [None]:
# Mean abstract length in characters, both years combined
glued_awards['Abstract'].str.len().mean()

np.float64(2772.440854870775)

In [None]:
# Mean abstract length in characters, 2025 awards only
award_data_25['Abstract'].str.len().mean()


np.float64(2607.362134688691)

In [None]:
# Mean abstract length in characters, 2024 awards only
award_data_24['Abstract'].str.len().mean()


np.float64(2878.4955102040817)

In [None]:
# New idea: run a two-sample t-test on the difference in AwardedAmountToDate between the two years
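
One possible shape for that test is Welch's two-sample t-test, which does not assume equal variances (the standard deviations in the `describe()` output differ considerably between the two years). The sketch below is an assumption about how the test might be set up, not the notebook's final method. It uses only the standard library, the function name `welch_t` is hypothetical, and the inputs are just the first five amounts from the two `head()` outputs above, used as toy data:

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Sample variances (n - 1 denominator), each divided by its sample size
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Toy inputs: first five amounts from the head() outputs above, not the full data
amounts_24 = [425132.0, 39206.0, 223476.0, 211792.0, 393759.0]
amounts_25 = [190000.0, 85950.0, 190000.0, 110000.0, 110000.0]

t_stat, df = welch_t(amounts_24, amounts_25)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

On the full data, the two samples would be the `AwardedAmountToDate` values for each `Year` in `glued_awards` with `NaN` values dropped. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.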