### Jupyter notebook to distill GRFP Awardee results

I made this notebook because I was curious about how GRFP awards are distributed by institution and discipline, and because I'm currently learning Python and git. Suggestions and changes are welcome, so please feel free to provide feedback.

The output is all below, but if you wish to rerun it yourself (perhaps with a different year), then make your changes and press the "Run" button above to step through each cell. Press the "fast forward" button to rerun the whole notebook from start to finish.

The fields in the data file should be "Name", "Baccalaureate Institution", "Field of Study", and "Current Institution". 

Note that NSF used commas both as the separator and inside the fields, so a single field can end up split across several columns on import. As such, I've only labeled the first three columns here: "Last" and "First" (the two halves of the name) and "BaccalaureateInstitution". The other columns are labeled "Extra", "Extra2", and so on, and will be used when we search for key words like "engineering" or "ocean".

In [1]:
import pandas as pd

In [2]:
year = 2021  # Change the year here to 2020 or 2019 and rerun the code
GRFP = pd.read_csv(str(year) + 'GRFPAwardeeList.csv', sep=',',
                   names=['Last', 'First', 'BaccalaureateInstitution', 
                          'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )
GRFP.head() # preview the head of the GRFP dataframe just created

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
0,Abdo,Emily Eugenia,Princeton University,Engineering - Chemical Engineering,,,
1,Abed,Ahmad Matar,University of Puerto Rico at Humacao,Materials Research - Electronic Materials,University of Michigan - Ann Arbor,,
2,Abel,Charlotte magalie,California Polytechnic State University,Social Sciences - Sociology,University of California-Los Angeles,,
3,Abellera,Marriah,University of California-Santa Barbara,Engineering - Environmental Engineering,,,
4,Abubakare,Oluwatobi,University of Rochester,Psychology - Social Psychology,Harvard University,,


### Which baccalaureate institutions had the most awardees?
As I said above, NSF used commas both as the separator and inside the fields. Any institution name that contains a comma gets split, which shifts every column after the baccalaureate institution. Creative solutions welcome...
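One possible workaround, sketched here under some loud assumptions: the field of study always seems to start with a discipline prefix such as "Geosciences - ", so that cell can serve as a landmark for rejoining the pieces on either side of it. The helper name `realign`, the output keys "FieldOfStudy" and "CurrentInstitution", and the `FIELD_PREFIXES` tuple are all made up for illustration. The tuple only holds prefixes that show up in this notebook's output, so it is incomplete. The sketch also assumes the field of study itself never contains a comma, which I have not checked.

```python
# Prefixes taken from fields visible in this notebook's output.
# The full NSF list is longer, so this tuple is an incomplete assumption.
FIELD_PREFIXES = (
    "Engineering - ",
    "Geosciences - ",
    "Life Sciences - ",
    "Materials Research - ",
    "Physics and Astronomy - ",
    "Psychology - ",
    "Social Sciences - ",
)


def realign(parts):
    """Rebuild one record from the pieces of a comma-split line (hypothetical helper)."""
    last, first = parts[0].strip(), parts[1].strip()
    rest = [p.strip() for p in parts[2:]]
    for i, piece in enumerate(rest):
        if piece.startswith(FIELD_PREFIXES):
            return {
                "Last": last,
                "First": first,
                # Everything before the field of study is the baccalaureate institution.
                "BaccalaureateInstitution": ", ".join(rest[:i]),
                "FieldOfStudy": piece,
                # Everything after it (ignoring empty cells) is the current institution.
                "CurrentInstitution": ", ".join(p for p in rest[i + 1:] if p),
            }
    return None  # no known prefix found; inspect these rows by hand


# Made-up line for illustration, not a real awardee.
line = "Doe,Jane,Example University, Main Campus,Geosciences - Chemical Oceanography,Sample State University"
record = realign(line.split(","))
print(record["BaccalaureateInstitution"])
```

Rows where `realign` returns `None` would still need a manual look, and the prefix list would have to be filled in from the real data before trusting any counts.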

This count covers every discipline, including engineering, which seems to swamp the others.

In [3]:
GRFP_byBaccInst = GRFP.groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_byBaccInst.head(20)

BaccalaureateInstitution
UNIVERSITY OF CALIFORNIA                             64
Massachusetts Institute of Technology                61
Stanford University                                  42
Georgia Institute of Technology                      33
University of Chicago                                33
Yale University                                      31
Cornell University                                   29
University of Texas at Austin                        29
Regents of the University of Michigan - Ann Arbor    29
University of Florida                                28
Harvard University                                   27
UNIVERSITY OF WASHINGTON                             27
Princeton University                                 26
Brown University                                     25
Columbia University                                  25
University of Minnesota-Twin Cities                  22
Northwestern University                              21
University of Illinois 

### Did anyone who got a Bachelor's at USF get an award?

In [4]:
GRFP[GRFP['BaccalaureateInstitution'].str.contains("University of South Florida", case=False)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
1984,Withers,Zachary Hoyt,University of South Florida,Physics and Astronomy - Solid State Physics,University of South Florida,,


### Did anyone who is currently at USF get an award?
(This also includes anyone who got a bachelor's at USF, because the search looks at every column.)

In [5]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('University of South Florida', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
1984,Withers,Zachary Hoyt,University of South Florida,Physics and Astronomy - Solid State Physics,University of South Florida,,
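The same "search every column" pattern comes up again in later cells, so it could be wrapped in a small helper. This is only a sketch: `rows_containing` is a hypothetical name, and the demo frame is made up. It applies `str.contains` column by column instead of row by row, and blanks out missing cells first so they don't become the literal text "nan".

```python
import pandas as pd


def rows_containing(df, keyword):
    """Return rows where any column contains keyword, ignoring case (hypothetical helper)."""
    # fillna("") keeps missing cells from turning into the literal string "nan".
    text = df.fillna("").astype(str)
    mask = text.apply(
        lambda col: col.str.contains(keyword, case=False, regex=False)
    ).any(axis=1)
    return df[mask]


# Tiny made-up frame for illustration only.
demo = pd.DataFrame(
    {
        "BaccalaureateInstitution": ["Example College", "Sample University"],
        "Extra": ["Geosciences - Marine Biology", "Engineering - Materials Engineering"],
        "Extra2": ["Sample University", None],
    }
)
print(rows_containing(demo, "sample university"))
```

With a helper like this, the USF search would read `rows_containing(GRFP, "University of South Florida")`. Passing `regex=False` treats the keyword as plain text, so characters like "(" or "." in a name are matched literally.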


### Let's look at just Florida institutions

In [6]:
GRFP_byBaccFLInst = GRFP[GRFP['BaccalaureateInstitution'].str.contains("Florida")].groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_byBaccFLInst

BaccalaureateInstitution
University of Florida                                  28
Florida International University                        9
New College of Florida                                  5
University of Central Florida                           4
Florida Atlantic University                             2
Florida Gulf Coast University                           2
Florida State University                                2
Florida Agricultural and Mechanical University          1
Florida Southern College                                1
The University of Central Florida Board of Trustees     1
University of North Florida                             1
University of South Florida                             1
dtype: int64
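Notice that the same school can show up under more than one spelling, for example "University of Central Florida" and "The University of Central Florida Board of Trustees", which splits its count. A rough sketch of one way to merge such variants before grouping is below. The `normalize` helper and its three rules are assumptions based only on spellings visible in this notebook's output, so they will miss other variants.

```python
import pandas as pd


def normalize(name):
    """Lowercase a name and strip wrappers seen in the output (hypothetical rules)."""
    cleaned = " ".join(name.lower().split())  # lowercase, collapse repeated spaces
    for prefix in ("regents of the ", "the "):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
    suffix = " board of trustees"
    if cleaned.endswith(suffix):
        cleaned = cleaned[: -len(suffix)]
    return cleaned


# Spellings copied from this notebook's output; four strings, two schools.
names = pd.Series(
    [
        "University of Central Florida",
        "The University of Central Florida Board of Trustees",
        "UNIVERSITY OF WASHINGTON",
        "University of Washington",
    ]
)
print(names.map(normalize).value_counts())
```

On the real data this could be applied as `GRFP['BaccalaureateInstitution'].map(normalize)` before the `groupby`, though the merged names would need a quick check by eye.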

### Which programs at UF are producing lots of grads who receive awards?

In [7]:
GRFP[GRFP['BaccalaureateInstitution'].str.contains("University of Florida", case=False)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
29,Allen,Anthony,University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida,,
31,Alomar,Nathalie Marie,University of Florida,Life Sciences - Ecology,University of Florida,,
68,Astrab,Leilani,University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA,,
106,Beaudry,David,University of Florida,Engineering - Materials Engineering,Johns Hopkins University,,
205,Buckner,Samuel Clark,University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida,,
251,Carroll,Katherine Caprice,University of Florida,Life Sciences - Ecology,University of Florida,,
424,Diaz,Maximillian,University of Florida,Engineering - Biomedical Engineering,University of Florida,,
488,Elie,Anne-Ketura,University of Florida,Psychology - Social Psychology,,,
538,Ficarrotta,Joseph Michael,University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA,,
641,Gonzalez,Natalia Pilar,University of Florida,Engineering - Mechanical Engineering,University of Florida,,


## Remember, the 2021 competition (applications due in Oct 2020) had priority areas: "Artificial Intelligence, Quantum Information Science, and Computationally Intensive Research"
https://www.nsf.gov/pubs/2020/nsf20587/nsf20587.htm
https://www.nature.com/articles/d41586-020-02272-x

We might expect that this would lead to more awardees in certain types of departments. How many awardees have "engineering" in their discipline, baccalaureate institution, or current institution name?

You can rerun this notebook to get results for 2019 and 2020, but here are the three years of results:

For 2021, it's 658 out of 2074 awardees.

For 2020, it's 655 out of 2076 awardees.

For 2019, it's 605 out of 2052 awardees.
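Since the total number of awardees changes a little each year, shares are easier to compare than raw counts. The quick calculation below uses only the numbers listed above. The variable names are just for illustration.

```python
# Counts copied from the three runs listed above: (engineering matches, total awardees).
counts = {2019: (605, 2052), 2020: (655, 2076), 2021: (658, 2074)}

# Convert each pair to a percentage of that year's awardees.
shares = {year: 100 * hits / total for year, (hits, total) in counts.items()}
for year, share in shares.items():
    print(f"{year}: {share:.1f}% of awardees match 'engineering'")
```

That works out to roughly 29.5% in 2019, 31.6% in 2020, and 31.7% in 2021.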

In [8]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Engineering', case=False).any(), axis=1)].count()

Last                        658
First                       658
BaccalaureateInstitution    658
Extra                       658
Extra2                      629
Extra3                       44
Extra4                       17
dtype: int64

In [9]:
GRFP.count()  # total awardees

Last                        2074
First                       2074
BaccalaureateInstitution    2074
Extra                       2074
Extra2                      1848
Extra3                       233
Extra4                       111
dtype: int64

## Now let's look only at geosciences
There were 99 geosciences awards in total for 2021.

In [10]:
GRFP_Geo = GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Geosciences -', case=False).any(), axis=1)]
GRFP_Geo.count()

Last                        99
First                       99
BaccalaureateInstitution    99
Extra                       99
Extra2                      74
Extra3                       5
Extra4                       2
dtype: int64

In [11]:
GRFP_Geo_byBaccInst = GRFP_Geo.groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_Geo_byBaccInst.head(20)

BaccalaureateInstitution
UNIVERSITY OF CALIFORNIA              3
William Marsh Rice University         2
Florida International University      2
Princeton University                  2
UNIVERSITY OF WASHINGTON              2
Northwestern University               2
University of California-Berkeley     2
University of California-Davis        2
Hampton University                    2
Harvard University                    2
Stanford University                   2
California Institute of Technology    2
Carleton College                      2
University of Texas at Austin         2
UNIVERSITY OF VIRGINIA                1
University of California - Merced     1
University of Arizona                 1
University of  Puget Sound            1
University of Texas at Dallas         1
UNIVERSITY OF MIAMI                   1
dtype: int64

### How many awards in Oceanography?
In 2021, there were 15, and only 2 in chemical oceanography.

In 2020, there were 17, and 6 in chemical oceanography.

In 2019, there were 13, and 4 in chemical oceanography.

In [12]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Oceanography', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
159,Bonan,David B,University of Washington,Geosciences - Physical Oceanography,California Institute of Technology,,
402,de Leon Sanchez,Erin Esther,University of California-Davis,Geosciences - Biological Oceanography,University of California-Santa Barbara,,
563,Formby-Fernandez,Adriana Denise,Embry-Riddle Aeronautical University,Geosciences - Physical Oceanography,Embry-Riddle Aeronautical University,,
686,Guerra,Alexis Danielle,University of California-Irvine,Geosciences - Biological Oceanography,University of California-Irvine,,
690,Gunnells,Shelby Ann,North Dakota State University Fargo,Geosciences - Chemical Oceanography,North Dakota State University Fargo,,
1028,Layton,Janelle Monet,Hampton University,Geosciences - Biological Oceanography,Oregon State University,,
1085,Litle,John,Pomona College,Geosciences - Biological Oceanography,UNIVERSITY OF WASHINGTON,,
1210,McDonald,Adriane Michelle,Spelman College,Geosciences - Biological Oceanography,University of California-Santa Barbara,,
1414,Perez,Elena Kathleen,Rensselaer Polytechnic Institute,Geosciences - Physical Oceanography,,,
1573,Rogers,Mason,Stanford University,Geosciences - Physical Oceanography,Massachusetts Institute of Technology,,


In [13]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Oceanography', case=False).any(), axis=1)].count()

Last                        15
First                       15
BaccalaureateInstitution    15
Extra                       15
Extra2                      12
Extra3                       1
Extra4                       1
dtype: int64
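To get the per-subfield numbers (such as the chemical oceanography count) without counting rows by eye, one option is to pull out whichever cell holds the field of study and tally it. This is a sketch with made-up rows. `field_of_study` is a hypothetical helper, and it scans every column because the comma problem can shift the field out of "Extra".

```python
import pandas as pd


def field_of_study(row, prefix="Geosciences - "):
    """Return the first cell in the row that starts with prefix, else None (hypothetical helper)."""
    for value in row.dropna():
        text = str(value).strip()
        if text.startswith(prefix):
            return text
    return None


# Made-up rows; the second one mimics a record shifted right by an extra comma.
demo = pd.DataFrame(
    {
        "Extra": [
            "Geosciences - Chemical Oceanography",
            "Campus",
            "Geosciences - Physical Oceanography",
        ],
        "Extra2": [None, "Geosciences - Chemical Oceanography", "Sample University"],
    }
)
subfields = demo.apply(field_of_study, axis=1)
print(subfields.value_counts())
```

On the real data, `GRFP_Geo.apply(field_of_study, axis=1).value_counts()` should list every geosciences subfield with its count in one table.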

### How many awards in Marine Biology?
In 2021, there were 12.

In 2020, there were 10.

In 2019, there were 12.

In [14]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Marine Biology', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
218,Bushnell,Elizabeth Josephine,University of San Diego,Geosciences - Marine Biology,,,
230,Caldwell,Aliya Everest,Rutgers University New Brunswick,Geosciences - Marine Biology,University of New Hampshire,,
341,Collins,Stormie Blayze,Florida International University,Geosciences - Marine Biology,,,
350,Cook McNab,Aimee Arielle,Texas A&M University at Galveston,Geosciences - Marine Biology,,,
531,Fenwick,Ileana Faye,Hampton University,Geosciences - Marine Biology,UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL,,
844,Jarman,Cheyenne Nicole,University of California-Santa Cruz,Geosciences - Marine Biology,Oregon State University,,
862,Johnson,Carter,UNIVERSITY OF WASHINGTON,Geosciences - Marine Biology,,,
1218,McLean,Josette Elena Trisha,St. George's University,Geosciences - Marine Biology,Hampton University,,
1273,Moore,Malia Leilani,University of California-Berkeley,Geosciences - Marine Biology,UNIVERSITY OF CALIFORNIA SAN DIEGO,,
1332,Nodal,Andrea Alejandra,Florida International University,Geosciences - Marine Biology,,,


In [15]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Marine Biology', case=False).any(), axis=1)].count()

Last                        12
First                       12
BaccalaureateInstitution    12
Extra                       12
Extra2                       7
Extra3                       0
Extra4                       0
dtype: int64