### Notebook to analyze GRFP Awardee results
The files are stored in this repository, so you don't need to download them. If you want to download them yourself, first select the year and type of results you want, then follow this URL to get the CSV file: https://www.research.gov/grfp/AwardeeList.do?method=sort&exportType=1


Note that NSF used commas as the separator AND inside the fields, which makes importing the data very annoying. As such, I've only named the first three columns here (Last name, First name, Current Institution); everything after them lands in generic `Extra` columns.
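
To make the problem concrete, here is a minimal sketch that feeds two rows through `io.StringIO` instead of the real file. The rows are typed in by hand (and abridged) from the outputs in this notebook, and the sketch assumes the file has no quoting at all, which is what the results suggest.

```python
import io
import pandas as pd

# Two rows typed in by hand from the outputs in this notebook (not the real file).
# The second row has a comma inside the institution name, with no quoting.
sample = (
    "Abdo,Emily Eugenia,Princeton University,Engineering - Chemical Engineering\n"
    "Abasi,Lannah,CALIFORNIA STATE UNIVERSITY,NORTHRIDGE,Life Sciences - Structural Biology\n"
)

cols = ['Last', 'First', 'Current', 'Extra', 'Extra2']
demo = pd.read_csv(io.StringIO(sample), names=cols)

# Row 0 lines up: 'Extra' holds the field of study.
# Row 1 is shifted: 'Extra' holds the rest of the institution name instead.
print(demo[['Current', 'Extra', 'Extra2']])
```

The same shift happens in the real files whenever an institution or field name contains a comma, which is why the `Extra` columns cannot be trusted to mean the same thing from row to row.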

In [2]:
import pandas as pd

In [5]:
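# Only the first three names are real columns; 'Extra' through 'Extra4' catch
# the overflow caused by unquoted commas inside fields.
GRFP = pd.read_csv('2021GRFPAwardeeList.csv', sep=',',
                   names=['Last', 'First', 'Current', 'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )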
GRFP

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
0,Abdo,Emily Eugenia,Princeton University,Engineering - Chemical Engineering,,,
1,Abed,Ahmad Matar,University of Puerto Rico at Humacao,Materials Research - Electronic Materials,University of Michigan - Ann Arbor,,
2,Abel,Charlotte magalie,California Polytechnic State University,Social Sciences - Sociology,University of California-Los Angeles,,
3,Abellera,Marriah,University of California-Santa Barbara,Engineering - Environmental Engineering,,,
4,Abubakare,Oluwatobi,University of Rochester,Psychology - Social Psychology,Harvard University,,
...,...,...,...,...,...,...,...
2069,Zubatiy,Tamara,UNIVERSITY OF CALIFORNIA SAN DIEGO,Comp/IS/Eng - Other (specify) - Human Centered...,Georgia Institute of Technology,,
2070,Zuckerman,Joseph Harry,Harvard University,Comp/IS/Eng - Computer Architecture,Columbia University,,
2071,Zuetell,Emily J,University of Colorado at Boulder,Engineering - Mechanical Engineering,University of Colorado at Boulder,,
2072,Zureick,Nadine,Georgia Institute of Technology,Engineering - Biomedical Engineering,Georgia Institute of Technology,,


In [37]:
GRFP_HM = pd.read_csv('2021GRFPHonorableMention.csv', sep=',',
                   names=['Last', 'First', 'Current', 'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )
GRFP_HM

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
0,Abasi,Lannah,CALIFORNIA STATE UNIVERSITY,NORTHRIDGE,Life Sciences - Structural Biology,UNIVERSITY OF CALIFORNIA SAN DIEGO,
1,Abbate,Jewel Alessandra,University of California-Los Angeles,Geosciences - Geophysics,University of California-Los Angeles,,
2,Abber,Sophie Rose,University of San Diego,Psychology - Neuropsychology,DREXEL UNIVERSITY,,
3,Abbott,Caroline Patterson,College of William and Mary,Life Sciences - Evolutionary Biology,University of Chicago,,
4,Abele,Ethan John,Oklahoma State University,Engineering - Electrical and Electronic Engine...,,,
...,...,...,...,...,...,...,...
1778,Zmich,Anna,Iowa State University,Engineering - Bioengineering,University of Wisconsin-Madison,,
1779,Zoltek,Madeline Abby,Yale University,Life Sciences - Biochemistry,UNIVERSITY OF CALIFORNIA,BERKELEY,
1780,Zuercher,Madeleine Elise,UNIVERSITY OF CALIFORNIA,BERKELEY,Life Sciences - Ecology,University of California-Los Angeles,
1781,Zuniga,LeAnn Xiomara,California State L A University Auxiliary Serv...,Geosciences - Hydrology,University of Massachusetts Amherst,,


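### Which current institution had the most awardees?
As I said above, NSF used commas as both the separator and inside the fields. This misaligns every column after the current institution, and it also splits institution names that contain a comma (for example, "UNIVERSITY OF CALIFORNIA, BERKELEY" is counted as "UNIVERSITY OF CALIFORNIA"). Creative solutions welcome...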
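
One possible workaround, sketched here as a heuristic and not a full fix: the field of study always seems to start with a broad category followed by " - ", so everything between the first name and that cell can be glued back together as the current institution. The helper name `rejoin_current` and the prefix list are assumptions; the prefixes are just the ones visible in the outputs of this notebook, so the list is probably incomplete. The sketch also assumes names never contain commas.

```python
# Broad field prefixes seen in the outputs of this notebook; the real list may be longer.
FIELD_PREFIXES = (
    "Chemistry", "Comp/IS/Eng", "Engineering", "Geosciences", "Life Sciences",
    "Materials Research", "Mathematical Sciences", "Physics and Astronomy",
    "Psychology", "Social Sciences", "STEM Education and Learning Research",
)
FIELD_STARTS = tuple(prefix + " - " for prefix in FIELD_PREFIXES)


def rejoin_current(cells):
    """Return (last, first, current, rest) for one raw row of comma-split cells.

    Hypothetical helper: everything between the first name and the first cell
    that looks like a field of study is treated as the current institution.
    """
    # Drop missing values (NaN/None) and empty strings left by trailing commas.
    cells = [c.strip() for c in cells if isinstance(c, str) and c.strip()]
    last, first, tail = cells[0], cells[1], cells[2:]
    for i, cell in enumerate(tail):
        if cell.startswith(FIELD_STARTS):
            return last, first, ", ".join(tail[:i]), ", ".join(tail[i:])
    # No recognizable field of study: leave the tail unsplit.
    return last, first, ", ".join(tail), ""


# A row copied from the honorable mention output, where the name was split.
row = ["Zuercher", "Madeleine Elise", "UNIVERSITY OF CALIFORNIA", "BERKELEY",
       "Life Sciences - Ecology", "University of California-Los Angeles", None]
print(rejoin_current(row))
```

Something like `GRFP.apply(lambda r: rejoin_current(r.tolist()), axis=1, result_type='expand')` should apply it to a whole DataFrame. The field of study and future institution are deliberately left joined in `rest`, because field names can contain commas too and there is no obvious marker for where the future institution begins.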

In [32]:
GRFP_byCurrentInst = GRFP.groupby('Current').size().sort_values(ascending=False)
GRFP_byCurrentInst

Current
UNIVERSITY OF CALIFORNIA                 64
Massachusetts Institute of Technology    61
Stanford University                      42
Georgia Institute of Technology          33
University of Chicago                    33
                                         ..
Oakland University                        1
OKLAHOMA STATE UNIVERSITY                 1
Nova Southeastern University              1
Northern Kentucky University              1
Youngstown State University               1
Length: 477, dtype: int64
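
The same institution can also show up under different capitalization (for example, "OKLAHOMA STATE UNIVERSITY" here and "Oklahoma State University" in the honorable mention list), which would split its count. A minimal sketch of grouping on a normalized key follows; the data is a toy example, not the real file.

```python
import pandas as pd

# Toy data for illustration only; the names appear in this notebook, the counts do not.
toy = pd.DataFrame({'Current': [
    'Oklahoma State University',
    'OKLAHOMA STATE UNIVERSITY',
    ' Oklahoma State University ',
    'Stanford University',
]})

# Normalize case and surrounding whitespace before counting.
key = toy['Current'].str.strip().str.casefold()
by_inst = toy.groupby(key).size().sort_values(ascending=False)
print(by_inst)
```

This only merges spellings that differ in case or padding. Variants such as "University of Michigan Ann Arbor" versus "University of Michigan - Ann Arbor" would still need an explicit alias table.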

### Did anyone at USF get an award?

In [42]:
GRFP[GRFP['Current'].str.contains("University of South Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
1984,Withers,Zachary Hoyt,University of South Florida,Physics and Astronomy - Solid State Physics,University of South Florida,,


### Which current institution had the most honorable mentions?

In [43]:
GRFP_HMbyCurrentInst = GRFP_HM.groupby('Current').size().sort_values(ascending=False)
GRFP_HMbyCurrentInst

Current
UNIVERSITY OF CALIFORNIA                         67
Massachusetts Institute of Technology            31
University of California-Los Angeles             29
University of Texas at Austin                    26
Yale University                                  24
                                                 ..
North Dakota State University Fargo               1
North Carolina Central University                 1
New Mexico Institute of Mining and Technology     1
New Jersey Institute of Technology                1
Youngstown State University                       1
Length: 489, dtype: int64

### Let's look at just Florida institutions

In [44]:
GRFP_byCurrentFLInst = GRFP[GRFP['Current'].str.contains("Florida")].groupby('Current').size().sort_values(ascending=False)
GRFP_byCurrentFLInst

Current
University of Florida                                  28
Florida International University                        9
New College of Florida                                  5
University of Central Florida                           4
Florida Atlantic University                             2
Florida Gulf Coast University                           2
Florida State University                                2
Florida Agricultural and Mechanical University          1
Florida Southern College                                1
The University of Central Florida Board of Trustees     1
University of North Florida                             1
University of South Florida                             1
dtype: int64
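
Note that "University of Central Florida" and "The University of Central Florida Board of Trustees" are counted separately above. If you want them merged, one option is a small hand-written alias table applied before counting. The sketch below uses toy rows, and the `ALIASES` mapping is an assumption about which names should be treated as the same institution.

```python
import pandas as pd

# Hypothetical alias table: maps a variant spelling to the name to count it under.
ALIASES = {
    'The University of Central Florida Board of Trustees': 'University of Central Florida',
}

# Toy rows for illustration; not the real data.
toy = pd.Series([
    'University of Central Florida',
    'The University of Central Florida Board of Trustees',
    'University of Florida',
], name='Current')

# Series.replace with a dict swaps exact matches only, so other names are untouched.
merged = toy.replace(ALIASES).value_counts()
print(merged)
```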

In [45]:
GRFP_HMbyCurrentFLInst = GRFP_HM[GRFP_HM['Current'].str.contains("Florida")].groupby('Current').size().sort_values(ascending=False)
GRFP_HMbyCurrentFLInst

Current
University of Florida               19
Florida International University     5
University of Central Florida        3
University of South Florida          3
Florida State University             2
New College of Florida               2
Florida Atlantic University          1
Florida Gulf Coast University        1
dtype: int64

### Why did UF get so many awards? Maybe it's because they have a specific type of program.
Remember, this year NSF encouraged "computationally intensive research": https://www.nature.com/articles/d41586-020-02272-x

In [46]:
GRFP[GRFP['Current'].str.contains("University of Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
29,Allen,Anthony,University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida,,
31,Alomar,Nathalie Marie,University of Florida,Life Sciences - Ecology,University of Florida,,
68,Astrab,Leilani,University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA,,
106,Beaudry,David,University of Florida,Engineering - Materials Engineering,Johns Hopkins University,,
205,Buckner,Samuel Clark,University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida,,
251,Carroll,Katherine Caprice,University of Florida,Life Sciences - Ecology,University of Florida,,
424,Diaz,Maximillian,University of Florida,Engineering - Biomedical Engineering,University of Florida,,
488,Elie,Anne-Ketura,University of Florida,Psychology - Social Psychology,,,
538,Ficarrotta,Joseph Michael,University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA,,
641,Gonzalez,Natalia Pilar,University of Florida,Engineering - Mechanical Engineering,University of Florida,,
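
Eyeballing the table is one way to look for a pattern. Another is to tally the broad field, i.e. the text before the first " - " in `Extra`. The sketch below uses four rows copied from the table above. It assumes `Extra` really holds the field of study, which is only true for rows whose institution name has no comma.

```python
import pandas as pd

# A few rows copied from the table above (abridged to two columns).
uf = pd.DataFrame({
    'Current': ['University of Florida'] * 4,
    'Extra': [
        'Engineering - Biomedical Engineering',
        'Life Sciences - Ecology',
        'Engineering - Materials Engineering',
        'Psychology - Social Psychology',
    ],
})

# Broad field = text before the first " - " separator.
broad = uf['Extra'].str.split(' - ', n=1).str[0]
broad_counts = broad.value_counts()
print(broad_counts)
```

Running the same two lines on the full `GRFP[GRFP['Current'].str.contains("University of Florida")]` selection should give the breakdown for all UF awardees.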


#### Same story in honorable mentions...

In [49]:
GRFP_HM[GRFP_HM['Current'].str.contains("University of Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
78,Banks,Claudia Lynn,University of Florida,Geosciences - Sedimentary Geology,University of Texas at Austin,,
158,Brennan,Liam,University of Florida,Physics and Astronomy - Particle Physics,University of Florida,,
355,Davis,Matthew William,University of Florida,Life Sciences - Systems and Molecular Biology,University of Florida,,
512,Ghaffari,Kimia,University of Florida,Engineering - Mechanical Engineering,University of Florida,,
571,Gunson,Jessica Lauren,University of Florida,Social Sciences - Biological Anthropology,New York University,,
719,Johnston,Jeremy Allen,University of Florida,Engineering - Electrical and Electronic Engine...,Columbia University,,
850,LeClare,Shelby,University of Florida,Life Sciences - Ecology,,,
986,McClellan,Brian Connor,University of Florida,Physics and Astronomy - Astronomy and Astrophy...,UNIVERSITY OF VIRGINIA,,
1044,Mizell,Gabriella Marie,University of Florida,Life Sciences - Ecology,Board of Regents,NSHE,obo University of Nevada
1056,Moring,Hannah Esther,University of Florida,Engineering - Electrical and Electronic Engine...,Regents of the University of Michigan - Ann Arbor,,


### Who from USF got an honorable mention?

In [60]:
GRFP_HM[GRFP_HM['Current'].str.contains("University of South Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
417,Edwards,Jack Thomas,University of South Florida,Life Sciences - Bioinformatics and Computation...,University of South Florida,,
1079,Muni-Morgan,Amanda,University of South Florida,Geosciences - Other (specify) - Soil and water...,University of Florida,,
1199,Peak,Stephanie Lynne,University of South Florida,Life Sciences - Cell Biology,University of South Florida,,


### Did anyone with the word "Ocean" in their current or future department's name get the award?

In [64]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Ocean').any(), axis=1)]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
159,Bonan,David B,University of Washington,Geosciences - Physical Oceanography,California Institute of Technology,,
402,de Leon Sanchez,Erin Esther,University of California-Davis,Geosciences - Biological Oceanography,University of California-Santa Barbara,,
563,Formby-Fernandez,Adriana Denise,Embry-Riddle Aeronautical University,Geosciences - Physical Oceanography,Embry-Riddle Aeronautical University,,
686,Guerra,Alexis Danielle,University of California-Irvine,Geosciences - Biological Oceanography,University of California-Irvine,,
690,Gunnells,Shelby Ann,North Dakota State University Fargo,Geosciences - Chemical Oceanography,North Dakota State University Fargo,,
1028,Layton,Janelle Monet,Hampton University,Geosciences - Biological Oceanography,Oregon State University,,
1085,Litle,John,Pomona College,Geosciences - Biological Oceanography,UNIVERSITY OF WASHINGTON,,
1210,McDonald,Adriane Michelle,Spelman College,Geosciences - Biological Oceanography,University of California-Santa Barbara,,
1414,Perez,Elena Kathleen,Rensselaer Polytechnic Institute,Geosciences - Physical Oceanography,,,
1573,Rogers,Mason,Stanford University,Geosciences - Physical Oceanography,Massachusetts Institute of Technology,,
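
The search above is case-sensitive, so it finds "Oceanography" but would miss a lowercase "ocean". A minimal sketch of a reusable, case-insensitive version follows. `rows_containing` is a hypothetical helper name, and the toy rows are abridged from outputs in this notebook.

```python
import pandas as pd


def rows_containing(df, term):
    """Hypothetical helper: rows where any column contains `term`, ignoring case."""
    # Replace missing values with '' so every cell can be searched as text.
    as_text = df.fillna('').astype(str)
    # Search column by column, treating `term` as plain text (not a regex).
    hits = as_text.apply(lambda col: col.str.contains(term, case=False, regex=False))
    return df[hits.any(axis=1)]


# Toy rows abridged from outputs in this notebook.
toy = pd.DataFrame({
    'Last': ['Bonan', 'Abdo'],
    'Extra': ['Geosciences - Physical Oceanography', 'Engineering - Chemical Engineering'],
    'Extra2': ['California Institute of Technology', None],
})
print(rows_containing(toy, 'ocean'))
```

Because it searches every column, a match in a person's name or institution counts as well, just as in the original `apply` version.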


### How about "Earth"?

In [65]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Earth').any(), axis=1)]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
1402,Pavur,Gertrude K,Georgia Institute of Technology,Engineering - Other (specify) - Earth Observat...,Natural Disasters,UNIVERSITY OF VIRGINIA,


### Engineering?

In [15]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Engineering').any(), axis=1)].count()

Last       658
First      658
Current    658
Extra      658
Extra2     629
Extra3      44
Extra4      17
dtype: int64

In [23]:
GRFP.count()

Last       2074
First      2074
Current    2074
Extra      2074
Extra2     1848
Extra3      233
Extra4      111
dtype: int64

## Now look at 2020 as a comparison

In [9]:
GRFP2020 = pd.read_csv('2020GRFPAwardeeList.csv', sep=',',
                   names=['Last', 'First', 'Current', 'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )
GRFP_HM2020 = pd.read_csv('2020GRFPHonorableMention.csv', sep=',',
                   names=['Last', 'First', 'Current', 'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )

In [10]:
GRFP_byCurrentInst2020 = GRFP2020.groupby('Current').size().sort_values(ascending=False)
GRFP_byCurrentInst2020

Current
Massachusetts Institute of Technology    68
UNIVERSITY OF CALIFORNIA                 65
University of Michigan Ann Arbor         34
Stanford University                      33
Columbia University                      32
                                         ..
University College London                 1
LeTourneau University                     1
Lawrence Technological University         1
Knox College                              1
Youngstown State University               1
Length: 477, dtype: int64

In [11]:
GRFP2020[GRFP2020['Current'].str.contains("University of South Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
159,Blackwell,Keller Lloyd,University of South Florida,Comp/IS/Eng - Algorithms and Theoretical Found...,University of South Florida,,
1157,McClinton,Willie B,University of South Florida,Comp/IS/Eng - Machine Learning,University of South Florida,,


In [12]:
GRFP2020[GRFP2020['Current'].str.contains("Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
28,Albo,Jonathan Eric,Florida State University,Engineering - Bioengineering,Florida State University,,
30,Alderson,Hannah Elizabeth,Florida State University,Engineering - Biomedical Engineering,Florida State University,,
64,Aponte,Destinee,New College of Florida,Life Sciences - Neurosciences,,,
159,Blackwell,Keller Lloyd,University of South Florida,Comp/IS/Eng - Algorithms and Theoretical Found...,University of South Florida,,
166,Boebinger,Scott Everett,Florida State University,Engineering - Biomedical Engineering,Georgia Institute of Technology,,
181,BOTELLO,JORDY FELIPE,University of Florida,Life Sciences - Cell Biology,University of Florida,,
207,Brooks,Ashley Jammel,Florida Agricultural and Mechanical University,Physics and Astronomy - Particle Physics,Indiana University,,
229,Burke,Kristen Lagasse,University of Florida,Social Sciences - Sociology,University of Texas at Austin,,
357,Crichlow,Queenisha,Florida Memorial University,Psychology - Developmental Psychology,University of Alabama at Birmingham,,
377,Damas,Stephanie Ashley,Florida State University,STEM Education and Learning Research - Enginee...,Florida State University,,


In [13]:
GRFP2020[GRFP2020['Current'].str.contains("University of Florida")]

Unnamed: 0,Last,First,Current,Extra,Extra2,Extra3,Extra4
181,BOTELLO,JORDY FELIPE,University of Florida,Life Sciences - Cell Biology,University of Florida,,
229,Burke,Kristen Lagasse,University of Florida,Social Sciences - Sociology,University of Texas at Austin,,
387,Davis,Zo� Indiana,University of Florida,Engineering - Bioengineering,University of Florida,,
478,El Basha,Mohammad Daniel,University of Florida,Engineering - Biomedical Engineering,MD Anderson Cancer Center,,
504,Fares,Wisam,University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA,,
734,Hester,Holley Grace,University of Florida,Chemistry - Macromolecular,Supramolecular,and Nanochemistry,Cornell University
889,Kempfert,Katherine Candice,University of Florida,Mathematical Sciences - Statistics,UNIVERSITY OF CALIFORNIA,BERKELEY,
1071,Loring,Kaden Jay,University of Florida,Physics and Astronomy - Astronomy and Astrophy...,University of Florida,,
1160,McCourt,Kelli Marie,University of Florida,Engineering - Environmental Engineering,University of Florida,,
1221,Molinaro,Dean Devine,University of Florida,Engineering - Mechanical Engineering,Georgia Institute of Technology,,


In [24]:
GRFP2020[GRFP2020.apply(lambda row: row.astype(str).str.contains('Engineering').any(), axis=1)].count()

Last       655
First      655
Current    655
Extra      655
Extra2     627
Extra3      43
Extra4      17
dtype: int64

In [22]:
GRFP2020.count()

Last       2076
First      2076
Current    2076
Extra      2076
Extra2     1872
Extra3      263
Extra4      122
dtype: int64
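
To compare the two years directly, the raw counts can be turned into shares. This sketch just re-uses the numbers printed by the `.count()` cells above (the `Last` column); nothing is re-read from the files.

```python
# Counts copied from the `.count()` outputs above ('Last' column).
counts = {
    2021: {'engineering': 658, 'total': 2074},
    2020: {'engineering': 655, 'total': 2076},
}

# Share of awardee rows that mention 'Engineering' anywhere, in percent.
shares = {year: 100 * c['engineering'] / c['total'] for year, c in counts.items()}
for year, pct in sorted(shares.items()):
    print(f"{year}: {pct:.1f}% of awardee rows mention 'Engineering'")
```

With these numbers the share is essentially unchanged between the two years (about 32% in both). Keep in mind that the match is on the word "Engineering" anywhere in the row, so it also picks up fields such as engineering education and does not include "Comp/IS/Eng".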