### Jupyter notebook to distill GRFP Awardee results

I made this notebook because I was curious about the distribution of GRFP awards by institution and discipline, and because I'm currently learning Python and git. If you have suggestions or changes, please feel free to share them.

The output is all below, but if you wish to rerun it yourself (perhaps with a different year), then make your changes and press the "Run" button above to step through each cell. Press the "fast forward" button to rerun the whole notebook from start to finish.

The fields in the data file should be "Name", "Baccalaureate Institution", "Field of Study", and "Current Institution". 

Note that NSF used commas as the separator AND inside the fields, which makes importing the data very annoying. As such, I've only labeled the first three columns here. The other columns are labeled "Extra*" and will be used when we search for key words like "engineering" or "ocean".

In [1]:
import pandas as pd

In [2]:
year = 2019  # Change the year here (2019, 2020, or 2021) and rerun the code
GRFP = pd.read_csv(str(year) + 'GRFPAwardeeList.csv', sep=',',
                   names=['Last', 'First', 'BaccalaureateInstitution', 
                          'Extra', 'Extra2', 'Extra3', 'Extra4']
                  )
GRFP.head() # preview the head of the GRFP dataframe just created

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
0,Abbott,Michael Eiji,University of California-Santa Barbara,Engineering - Mechanical Engineering,University of California-Berkeley,,
1,Abraham,Joel Opila,Yale University,Life Sciences - Ecology,,,
2,Abramson,Rose Antoinette,Massachusetts Institute of Technology,Engineering - Electrical and Electronic Engine...,UNIVERSITY OF CALIFORNIA,BERKELEY,
3,Abratenko,Polina,University of Michigan Ann Arbor,Physics and Astronomy - Particle Physics,University of Michigan Ann Arbor,,
4,Adams,Hallie Rose,Western Washington University,Geosciences - other (specify) - Ecohydrology,,,


### Which baccalaureate institutions had the most awardees?
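As noted above, NSF used commas both as the separator and inside the fields. This creates issues for everything downstream of the baccalaureate institution, and for the institution column itself: anything after a comma in a campus name gets split off (for example "UNIVERSITY OF CALIFORNIA, BERKELEY"), so the "UNIVERSITY OF CALIFORNIA" row below likely lumps together every campus written that way. Creative solutions welcome...

This count covers all disciplines, including engineering, which seems to swamp the others.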
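
On the comma problem: one possible workaround is to skip the column splitting in `pd.read_csv` and parse each raw line by hand. The sketch below assumes that every field of study contains the marker " - " (as in "Life Sciences - Ecology") and that institution names never do. Both assumptions are guesses based on the rows shown in this notebook, not a documented property of the NSF file. `split_awardee_line` is a hypothetical helper, and a name containing a comma would still break it.

```python
def split_awardee_line(line):
    """Split one raw line into (last, first, bacc_inst, field, current_inst).

    Assumption: tokens containing " - " belong to the field of study, and
    everything before/after them belongs to an institution name.
    """
    tokens = [t.strip() for t in line.rstrip("\n").split(",")]
    last, first, rest = tokens[0], tokens[1], tokens[2:]

    # Positions of tokens that look like part of the field of study.
    marked = [i for i, t in enumerate(rest) if " - " in t]
    if not marked:
        # No recognizable field of study: keep everything as the institution.
        return last, first, ", ".join(t for t in rest if t), "", ""

    start, end = marked[0], marked[-1]
    bacc_inst = ", ".join(t for t in rest[:start] if t)
    field = ", ".join(rest[start:end + 1])
    current_inst = ", ".join(t for t in rest[end + 1:] if t)
    return last, first, bacc_inst, field, current_inst


# A line reconstructed from a row shown later in this notebook.
raw = ("Wu,Shiyao Philipp,UNIVERSITY OF CALIFORNIA,BERKELEY,"
       "Comp/IS/Eng - Robotics and Computer Vision,"
       "UNIVERSITY OF CALIFORNIA,BERKELEY")
parsed = split_awardee_line(raw)
print(parsed)
```

Applied to every line of the file, the resulting tuples could be passed to `pd.DataFrame` with five column names, which would keep each campus attached to its university.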

In [3]:
GRFP_byBaccInst = GRFP.groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_byBaccInst.head(20)

BaccalaureateInstitution
UNIVERSITY OF CALIFORNIA                      91
Massachusetts Institute of Technology         78
Stanford University                           42
University of Michigan Ann Arbor              38
Princeton University                          33
Cornell University                            31
University of Texas at Austin                 29
University of Chicago                         26
Yale University                               24
Columbia University                           23
Harvard University                            23
California Institute of Technology            22
Northwestern University                       22
University of Wisconsin-Madison               22
University of California-Davis                21
UNIVERSITY OF CALIFORNIA SAN DIEGO            20
UNIVERSITY OF WASHINGTON                      20
North Carolina State University               20
University of Illinois at Urbana-Champaign    20
University of Florida                         20
dtype: int64

### Did anyone who got a Bachelor's at USF get an award?

In [4]:
GRFP[GRFP['BaccalaureateInstitution'].str.contains("University of South Florida", case=False)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
364,Costantino,Alexandria Nicole,University of South Florida,Physics and Astronomy - Particle Physics,University of California-Riverside,,
803,Icenhour,Daniel Gregory,University of South Florida,Chemistry - Chemistry of Life Processes,University of South Florida,,
881,Kearney,Kalyn Marie,University of South Florida,Engineering - Biomedical Engineering,University of Florida,,
1286,Noble,Mark Alan,University of South Florida,Life Sciences - Genetics,Yale University,,
1535,Schlafly,Millicent K,University of South Florida,Engineering - Mechanical Engineering,Northwestern University,,


### Did anyone who is currently at USF get an award?
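(This search covers every column, so it also returns awardees whose bachelor's degree is from USF. In the 2019 output below, only one of the five rows, Icenhour, lists USF as the current institution.)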

In [5]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('University of South Florida', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
364,Costantino,Alexandria Nicole,University of South Florida,Physics and Astronomy - Particle Physics,University of California-Riverside,,
803,Icenhour,Daniel Gregory,University of South Florida,Chemistry - Chemistry of Life Processes,University of South Florida,,
881,Kearney,Kalyn Marie,University of South Florida,Engineering - Biomedical Engineering,University of Florida,,
1286,Noble,Mark Alan,University of South Florida,Life Sciences - Genetics,Yale University,,
1535,Schlafly,Millicent K,University of South Florida,Engineering - Mechanical Engineering,Northwestern University,,
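
To separate "currently at USF" from "bachelor's from USF", one option is to search only the columns after `BaccalaureateInstitution`. This is a minimal sketch on two rows copied from the output above. It assumes the baccalaureate institution itself contains no comma (true for "University of South Florida", but not for every school), since otherwise part of that name spills into `Extra`.

```python
import pandas as pd

# Two rows copied from the 2019 output above; None marks an empty cell.
sample = pd.DataFrame(
    {
        "Last": ["Costantino", "Icenhour"],
        "First": ["Alexandria Nicole", "Daniel Gregory"],
        "BaccalaureateInstitution": ["University of South Florida"] * 2,
        "Extra": ["Physics and Astronomy - Particle Physics",
                  "Chemistry - Chemistry of Life Processes"],
        "Extra2": ["University of California-Riverside",
                   "University of South Florida"],
        "Extra3": [None, None],
        "Extra4": [None, None],
    }
)

# Search only the columns that come after the baccalaureate institution.
later_cols = ["Extra", "Extra2", "Extra3", "Extra4"]
at_usf_now = (
    sample[later_cols]
    .astype(str)
    .apply(lambda col: col.str.contains("University of South Florida",
                                        case=False, regex=False))
    .any(axis=1)
)
current_usf = sample[at_usf_now]
print(current_usf["Last"].tolist())
```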


### Let's look at just Florida institutions

In [6]:
GRFP_byBaccFLInst = GRFP[GRFP['BaccalaureateInstitution'].str.contains("Florida")].groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_byBaccFLInst

BaccalaureateInstitution
University of Florida               20
University of Central Florida        7
Florida International University     5
University of South Florida          5
Florida State University             4
Florida Atlantic University          1
Florida Gulf Coast University        1
Florida Southern College             1
dtype: int64

### Which programs at UF are producing lots of grads who receive awards?

In [7]:
GRFP[GRFP['BaccalaureateInstitution'].str.contains("University of Florida", case=False)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
63,Atassi,Amalie,University of Florida,Engineering - Materials Engineering,University of Florida,,
169,Boucaud,Nayah Mesci,University of Florida,STEM Education and Learning Research - Technol...,University of Florida,,
267,Ceron,Steven,University of Florida,Engineering - Mechanical Engineering,CORNELL UNIVERSITY,,
340,Cohen,Max Harrison,University of Florida,Engineering - Mechanical Engineering,University of Florida,,
375,Crowell,Anne Denise,University of Florida,Engineering - Chemical Engineering,University of Florida,,
441,Dill,Michele N,University of Florida,Engineering - Materials Engineering,University of Florida,,
492,Elliott,Kiona Rajene,University of Florida,Life Sciences - Microbial Biology,Washington University,,
509,Ewing,Jacob,University of Florida,Materials Research - Materials,other (specify) - Magnetic Materials,University of Florida,
515,Famiglietti,Jack,University of Florida,Engineering - Mechanical Engineering,University of Florida,,
682,Guevara,Maria Valentina,University of Florida,Engineering - Biomedical Engineering,University of Florida,,


## Remember, for 2021 (applications due in Oct 2020), NSF had priority areas: "Artificial Intelligence, Quantum Information Science, and Computationally Intensive Research"
https://www.nsf.gov/pubs/2020/nsf20587/nsf20587.htm

https://www.nature.com/articles/d41586-020-02272-x

We might expect that this would lead to more awardees in certain types of departments. How many awardees have "engineering" in their discipline, baccalaureate institution, or current institution name?

You can rerun this notebook again to get results for 2019 and 2020, but here are the three years of results:

For 2021, it's 658 out of 2074 awardees.

For 2020, it's 655 out of 2076 awardees.

For 2019, it's 605 out of 2052 awardees.

In [8]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Engineering', case=False).any(), axis=1)].count()

Last                        605
First                       605
BaccalaureateInstitution    605
Extra                       605
Extra2                      577
Extra3                       43
Extra4                       17
dtype: int64
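
The same row-wise search is repeated for each keyword in this notebook, and `.count()` reports one number per column. A small helper that returns a boolean mask gives a single count via `.sum()`. This is only a sketch: `rows_containing` is a hypothetical name, and the three sample rows are copied from outputs in this notebook.

```python
import pandas as pd

def rows_containing(df, keyword, columns=None):
    """Return a boolean Series: True where any chosen column contains keyword."""
    cols = list(df.columns) if columns is None else list(columns)
    text = df[cols].astype(str)  # empty cells become plain strings
    return text.apply(
        lambda col: col.str.contains(keyword, case=False, regex=False)
    ).any(axis=1)

# Three rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {
        "Last": ["Abbott", "Abraham", "Blanco"],
        "BaccalaureateInstitution": [
            "University of California-Santa Barbara",
            "Yale University",
            "Rensselaer Polytechnic Institute",
        ],
        "Extra": [
            "Engineering - Mechanical Engineering",
            "Life Sciences - Ecology",
            "Engineering - Computer Engineering",
        ],
        "Extra2": [
            "University of California-Berkeley",
            None,
            "Carnegie-Mellon Institute of Technology",
        ],
    }
)

n_engineering = int(rows_containing(sample, "engineering").sum())
n_computer = int(rows_containing(sample, "computer").sum())
print(n_engineering, n_computer)
```

Passing `columns=` would also let the search skip the `Last` and `First` columns, so a keyword cannot match someone's name.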

In [9]:
GRFP.count()  # total awardees (non-empty values per column)

Last                        2052
First                       2052
BaccalaureateInstitution    2052
Extra                       2052
Extra2                      1875
Extra3                       275
Extra4                       134
dtype: int64

##### What about for "Computer"?
You can rerun this notebook again to get results for 2019 and 2020, but here are the three years of results:

For 2021, it's 66 out of 2074 awardees.

For 2020, it's 67 out of 2076 awardees.

For 2019, it's 65 out of 2052 awardees.

In [10]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Computer', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
89,Barowsky,Madeleine Anna,Wellesley College,Comp/IS/Eng - Computer Security and Privacy,,,
113,Benmalek,Ryan Y,UNIVERSITY OF WASHINGTON,Comp/IS/Eng - other (specify) - Computer Visio...,CORNELL UNIVERSITY,,
145,Blanco,Mark,Rensselaer Polytechnic Institute,Engineering - Computer Engineering,Carnegie-Mellon Institute of Technology,,
168,Boubin,Jayson Gerard,MIAMI UNIVERSITY,Comp/IS/Eng - Computer Systems and Embedded Sy...,OHIO STATE UNIVERSITY,,
226,Cabrera,Ángel Alexander,Georgia Institute of Technology,Comp/IS/Eng - Human Computer Interaction,Georgia Institute of Technology,,
...,...,...,...,...,...,...,...
1971,Wu,Shiyao Philipp,UNIVERSITY OF CALIFORNIA,BERKELEY,Comp/IS/Eng - Robotics and Computer Vision,UNIVERSITY OF CALIFORNIA,BERKELEY
1976,Xie,Annie,UNIVERSITY OF CALIFORNIA,BERKELEY,Comp/IS/Eng - Robotics and Computer Vision,UNIVERSITY OF CALIFORNIA,BERKELEY
1992,Yang,William,University of Michigan Ann Arbor,Comp/IS/Eng - Robotics and Computer Vision,University of Michigan Ann Arbor,,
2029,Zhang,Yongqi,University of Massachusetts Boston,Comp/IS/Eng - Human Computer Interaction,University of Massachusetts Boston,,


In [11]:
GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Computer', case=False).any(), axis=1)].count()

Last                        65
First                       65
BaccalaureateInstitution    65
Extra                       65
Extra2                      60
Extra3                       6
Extra4                       3
dtype: int64

### I see no significant difference in the number of awardees in engineering and computer science in 2021 relative to 2020 and 2019.
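
One rough way to check this is to compare the share of matching awardees across years using the counts reported above. The sketch below computes those shares and a pooled two-proportion z statistic. Using that test here is my assumption, not something the notebook ran, and it treats each year's awardee list as an independent sample.

```python
import math

# (matching awardees, total awardees) as reported in this notebook.
engineering = {2019: (605, 2052), 2020: (655, 2076), 2021: (658, 2074)}
computer = {2019: (65, 2052), 2020: (67, 2076), 2021: (66, 2074)}

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic for share A minus share B."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

for label, counts in [("Engineering", engineering), ("Computer", computer)]:
    for year, (hits, total) in sorted(counts.items()):
        print(f"{label} {year}: {hits / total:.1%}")
    z = two_proportion_z(*counts[2021], *counts[2019])
    print(f"{label} 2021 vs 2019: z = {z:.2f}")
```

With these counts both z values stay below 1.96 in absolute value, the usual two-sided 5% threshold, which is consistent with the heading above.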

### Now let's look only at geosciences:
There were 99 awards total in Geosciences in 2021. (The output below is from the 2019 run, which shows 95.)

In [18]:
GRFP_Geo = GRFP[GRFP.apply(lambda row: row.astype(str).str.contains('Geosciences -', case=False).any(), axis=1)]
GRFP_Geo.count()

Last                        95
First                       95
BaccalaureateInstitution    95
Extra                       95
Extra2                      78
Extra3                       5
Extra4                       0
dtype: int64

In [19]:
GRFP_Geo_byBaccInst = GRFP_Geo.groupby('BaccalaureateInstitution').size().sort_values(ascending=False)
GRFP_Geo_byBaccInst.head(20)

BaccalaureateInstitution
Massachusetts Institute of Technology      3
Bowdoin College                            3
University of California-Santa Cruz        3
University of Texas at Austin              3
Duke University                            2
Wesleyan University                        2
California Polytechnic State University    2
Scripps College                            2
University of California-Irvine            2
UNIVERSITY OF CALIFORNIA                   2
Columbia University                        2
Stanford University                        2
University of Delaware                     2
University of Colorado at Boulder          1
University of Chicago                      1
University of Connecticut                  1
University of California-Riverside         1
Boston College                             1
University of California-Davis             1
University of California-Berkeley          1
dtype: int64

### How many awards in Oceanography?
In 2021, there were 15, and only 2 in chemical oceanography.

In 2020, there were 17, and 6 in chemical oceanography.

In 2019, there were 13, and 4 in chemical oceanography.

In [20]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Oceanography', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
21,Aleman,Melody,Millersville University,Geosciences - Biological Oceanography,University of Southern California,,
36,Amaral,Vinicius José,Princeton University,Geosciences - Chemical Oceanography,University of California-Santa Cruz,,
158,Boles,Elisabeth Lara,Massachusetts Institute of Technology,Geosciences - Physical Oceanography,,,
333,Clay,Jacinta M,Brown University,Geosciences - Physical Oceanography,Brown University,,
584,Gangrade,Shailja,University of Delaware,Geosciences - Biological Oceanography,University of Delaware,,
1013,Lepori-Bui,Michelle,University of Delaware,Geosciences - Biological Oceanography,,,
1033,Light,Tricia Mitchelle,Scripps College,Geosciences - Chemical Oceanography,University of California-San Diego Scripps Ins...,,
1035,Lim,Stephanie Michelle,Scripps College,Geosciences - Biological Oceanography,,,
1292,Nowakowski,Catherine Gibson,University of Connecticut,Geosciences - Biological Oceanography,University of Rhode Island,,
1382,Pitt,Jordan Avery,SUNY College of Environmental Science and Fore...,Geosciences - Biological Oceanography,Massachusetts Institute of Technology,,


In [15]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Oceanography', case=False).any(), axis=1)].count()

Last                        13
First                       13
BaccalaureateInstitution    13
Extra                       13
Extra2                       9
Extra3                       0
Extra4                       0
dtype: int64
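
The per-subfield numbers (for example, chemical oceanography) can be tallied rather than counted by eye. This is a minimal sketch on five field-of-study values copied from the 2019 output above. It is a subset for illustration, not the full set of 13 awards, and it assumes the field of study sits in `Extra`, which fails when the baccalaureate institution contains a comma.

```python
import pandas as pd

# Field-of-study values for five rows copied from the 2019 output above.
fields = pd.Series(
    [
        "Geosciences - Biological Oceanography",
        "Geosciences - Chemical Oceanography",
        "Geosciences - Physical Oceanography",
        "Geosciences - Physical Oceanography",
        "Geosciences - Chemical Oceanography",
    ],
    name="Extra",
)

# Keep the part after the first " - " and count each subfield.
subfield = fields.str.split(" - ", n=1).str[1]
counts = subfield.value_counts()
print(counts)
```

In the notebook itself, `fields` would be the `Extra` column of the oceanography rows selected from `GRFP_Geo`.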

### How many in Marine Biology?
In 2021, there were 12.

In 2020, there were 10.

In 2019, there were 12.

In [16]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Marine Biology', case=False).any(), axis=1)]

Unnamed: 0,Last,First,BaccalaureateInstitution,Extra,Extra2,Extra3,Extra4
100,Becker,Danielle M,Eckerd College,Geosciences - Marine Biology,CALIFORNIA STATE UNIVERSITY,NORTHRIDGE,
114,Benson,Brooke E,University of North Carolina at Chapel Hill,Geosciences - Marine Biology,,,
274,Chamorro,Jannine Danielle,University of California-Davis,Geosciences - Marine Biology,University of California-Santa Barbara,,
328,Clare,Xochitl,University of California-Santa Cruz,Geosciences - Marine Biology,University of California-Santa Barbara,,
352,Cones,Seth Frederick,Ohio University,Geosciences - Marine Biology,,,
445,Dinh,Jason Phanliem,Duke University,Geosciences - Marine Biology,Duke University,,
647,Goodman,Maurice Codespoti,California Polytechnic State University,Geosciences - Marine Biology,Stanford University,,
952,LaBua,Savannah Marie,STONY BROOK UNIVERSITY,Geosciences - Marine Biology,,,
1226,Munoz,Josefa Marie,University of Guam,Geosciences - Marine Biology,,,
1413,Quinlan,Zachary Abraham-Divino,University of Hawaii,Geosciences - Marine Biology,San Diego State University Foundation,,


In [17]:
GRFP_Geo[GRFP_Geo.apply(lambda row: row.astype(str).str.contains('Marine Biology', case=False).any(), axis=1)].count()

Last                        12
First                       12
BaccalaureateInstitution    12
Extra                       12
Extra2                       7
Extra3                       1
Extra4                       0
dtype: int64