### Jupyter notebook to distill GRFP Awardee results

I made this notebook because I was curious about how GRFP awards are distributed by institution and discipline, and because I'm currently learning Python and git. If you have suggestions or changes, please feel free to provide feedback.

The output is all below, but if you wish to rerun it yourself (perhaps with a different year), then make your changes and press the "Run" button above to step through each cell. Press the "fast forward" button to rerun the whole notebook from start to finish.

In [1]:
import pandas as pd

In [2]:
year = 2021  # change this to any year from 2011 to 2021, then rerun the notebook
data_dir = 'data/'
GRFP = pd.read_csv(data_dir + str(year) + 'AwardeeList.do', sep='\t')
GRFP.head() # preview the head of the GRFP dataframe just created

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
0,"Abdo, Emily Eugenia",Princeton University,Engineering - Chemical Engineering,
1,"Abed, Ahmad Matar",University of Puerto Rico at Humacao,Materials Research - Electronic Materials,University of Michigan - Ann Arbor
2,"Abel, Charlotte magalie",California Polytechnic State University,Social Sciences - Sociology,University of California-Los Angeles
3,"Abellera, Marriah",University of California-Santa Barbara,Engineering - Environmental Engineering,
4,"Abubakare, Oluwatobi",University of Rochester,Psychology - Social Psychology,Harvard University
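
The `Unnamed: 0` column in the preview suggests that the first column of the file is a saved row index with an empty header. Here is a minimal sketch, assuming that is the case, of reading that column as the index instead. The in-memory `sample` text is a stand-in for the real file and reuses two rows from the preview above.

```python
import io

import pandas as pd

# Stand-in for the tab-separated awardee file. The first header cell is empty,
# which is what makes pandas label that column "Unnamed: 0".
sample = (
    "\tName\tBaccalaureate Institution\tField of Study\tCurrent Institution\n"
    "0\tAbdo, Emily Eugenia\tPrinceton University\t"
    "Engineering - Chemical Engineering\t\n"
    "1\tAbed, Ahmad Matar\tUniversity of Puerto Rico at Humacao\t"
    "Materials Research - Electronic Materials\tUniversity of Michigan - Ann Arbor\n"
)

# index_col=0 tells read_csv to use the first column as the row index.
preview = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)
print(list(preview.columns))
```

With the index read this way, the remaining columns are just the four named ones, and an empty Current Institution is still read as a missing value.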


### How many awardees this year?

In [3]:
Awardees = GRFP['Name'].count() #total awardees
Awardees

2074

### Which Baccalaureate Institutions had the most awardees?
Note that this count includes engineering awardees, who seem to swamp the other disciplines.

In [4]:
GRFP_byBaccInst = GRFP.groupby('Baccalaureate Institution').size().sort_values(ascending=False)
GRFP_byBaccInst.head(20)

Baccalaureate Institution
UNIVERSITY OF CALIFORNIA, BERKELEY                   64
Massachusetts Institute of Technology                61
Stanford University                                  42
Georgia Institute of Technology                      33
University of Chicago                                33
Yale University                                      31
Cornell University                                   29
University of Texas at Austin                        29
Regents of the University of Michigan - Ann Arbor    29
University of Florida                                28
UNIVERSITY OF WASHINGTON                             27
Harvard University                                   27
Princeton University                                 26
Brown University                                     25
Columbia University                                  25
University of Minnesota-Twin Cities                  22
Northwestern University                              21
William Marsh Rice Uni

### Which Current Institutions had the most awardees?

In [5]:
# Count awardees per current (graduate) institution, largest first
GRFP_byCurrInst = GRFP.groupby('Current Institution').size().sort_values(ascending=False)
GRFP_byCurrInst.head(20)

Current Institution
Stanford University                                  81
Massachusetts Institute of Technology                76
UNIVERSITY OF CALIFORNIA, BERKELEY                   70
Georgia Institute of Technology                      44
Harvard University                                   43
Columbia University                                  39
University of Colorado at Boulder                    38
Princeton University                                 34
Regents of the University of Michigan - Ann Arbor    33
University of Texas at Austin                        33
University of California-Los Angeles                 31
Northwestern University                              30
Cornell University                                   29
California Institute of Technology                   29
UNIVERSITY OF WASHINGTON                             29
Carnegie-Mellon University                           27
University of Illinois at Urbana-Champaign           26
University of California-Irv

### Did anyone who got a Bachelor's at USF get an award?

In [6]:
GRFP[GRFP['Baccalaureate Institution'].str.contains("University of South Florida", case=False, na=False)]

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
1984,"Withers, Zachary Hoyt",University of South Florida,Physics and Astronomy - Solid State Physics,University of South Florida


### Did anyone who is currently at USF get an award?

In [7]:
GRFP[GRFP['Current Institution'].str.contains("University of South Florida", case=False, na=False)]

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
1984,"Withers, Zachary Hoyt",University of South Florida,Physics and Astronomy - Solid State Physics,University of South Florida


### Let's look at just Florida institutions

In [8]:
# na=False skips rows with a missing institution, matching the other cells
GRFP_byBaccFLInst = GRFP[GRFP['Baccalaureate Institution'].str.contains("Florida", na=False)].groupby('Baccalaureate Institution').size().sort_values(ascending=False)
GRFP_byBaccFLInst

Baccalaureate Institution
University of Florida                                  28
Florida International University                        9
New College of Florida                                  5
University of Central Florida                           4
Florida Atlantic University                             2
Florida Gulf Coast University                           2
Florida State University                                2
Florida Agricultural and Mechanical University          1
Florida Southern College                                1
The University of Central Florida Board of Trustees     1
University of North Florida                             1
University of South Florida                             1
dtype: int64

### Which programs at UF are producing lots of grads who receive awards?

In [9]:
GRFP[GRFP['Baccalaureate Institution'].str.contains("University of Florida", case=False, na=False)]

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
29,"Allen, Anthony",University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida
31,"Alomar, Nathalie Marie",University of Florida,Life Sciences - Ecology,University of Florida
68,"Astrab, Leilani",University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA
106,"Beaudry, David",University of Florida,Engineering - Materials Engineering,Johns Hopkins University
205,"Buckner, Samuel Clark",University of Florida,Engineering - Aeronautical and Aerospace Engin...,University of Florida
251,"Carroll, Katherine Caprice",University of Florida,Life Sciences - Ecology,University of Florida
424,"Diaz, Maximillian",University of Florida,Engineering - Biomedical Engineering,University of Florida
488,"Elie, Anne-Ketura",University of Florida,Psychology - Social Psychology,
538,"Ficarrotta, Joseph Michael",University of Florida,Engineering - Biomedical Engineering,UNIVERSITY OF VIRGINIA
641,"Gonzalez, Natalia Pilar",University of Florida,Engineering - Mechanical Engineering,University of Florida
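
The filter above lists individual awardees. To answer the question in the heading more directly, one option is to count the filtered rows per Field of Study. This is only a sketch on a hypothetical toy dataframe (`toy`, with placeholder names), not the real GRFP data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the GRFP dataframe (names are placeholders).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Baccalaureate Institution": [
        "University of Florida",
        "University of Florida",
        "University of Florida",
        "University of South Florida",
        None,
    ],
    "Field of Study": [
        "Engineering - Biomedical Engineering",
        "Engineering - Biomedical Engineering",
        "Life Sciences - Ecology",
        "Physics and Astronomy - Solid State Physics",
        "Life Sciences - Ecology",
    ],
})

# Same substring filter as the cell above; na=False skips missing institutions.
is_uf = toy["Baccalaureate Institution"].str.contains(
    "University of Florida", case=False, na=False
)

# Count the UF rows per Field of Study, largest first.
uf_by_field = toy[is_uf].groupby("Field of Study").size().sort_values(ascending=False)
print(uf_by_field)
```

Note that "University of South Florida" does not contain the text "University of Florida", so the substring filter does not pick it up.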


## Remember that the 2021 competition (applications due in October 2020) had these priority areas: "Artificial Intelligence, Quantum Information Science, and Computationally Intensive Research"
https://www.nsf.gov/pubs/2020/nsf20587/nsf20587.htm

https://www.nature.com/articles/d41586-020-02272-x

We might expect that this would lead to more awardees in certain types of departments. How many awardees have "Engineering" in their Field of Study?

You can rerun this notebook to get results for other years, but here are the last three years of results:

For 2021, it's 32% of awardees.

For 2020, it's 32% of awardees.

For 2019, it's 29% of awardees.

In [10]:
GRFP_engin = GRFP[GRFP['Field of Study'].str.contains("Engineering", case=False, na=False)]
GRFP_engin['Name'].count() / Awardees * 100

31.58148505303761
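
Since the same percentage calculation is repeated for several fields and years, it could be wrapped in a small helper. The sketch below is one way to do that. The function name `share_of_awardees` and the toy `frames` dictionary are my own placeholders, and it assumes every row is one awardee (so it divides by the number of rows rather than by the count of names).

```python
import pandas as pd


def share_of_awardees(df, pattern, column="Field of Study"):
    """Return the percentage of rows whose `column` contains `pattern`."""
    matches = df[column].str.contains(pattern, case=False, na=False)
    # The mean of a boolean Series is the fraction of True values.
    return matches.mean() * 100


# Hypothetical toy frames keyed by year; the real notebook would load each year's file.
frames = {
    2020: pd.DataFrame({"Field of Study": [
        "Engineering - Civil Engineering",
        "Life Sciences - Ecology",
        "Chemistry - Chemical Catalysis",
        "Geosciences - Paleoclimate",
    ]}),
    2021: pd.DataFrame({"Field of Study": [
        "Engineering - Mechanical Engineering",
        "Engineering - Bioengineering",
        "Psychology - Social Psychology",
        None,
    ]}),
}

for yr, df in frames.items():
    print(yr, round(share_of_awardees(df, "Engineering"), 1))
```

The same helper would work for the other searches in this notebook by changing `pattern`, for example `"Comp/"` or `"oceanography"`.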

### What is the most common engineering program?

In [11]:
GRFP_engin_byField = GRFP_engin.groupby('Field of Study').size().sort_values(ascending=False)
GRFP_engin_byField

Field of Study
Engineering - Biomedical Engineering                                         128
Engineering - Mechanical Engineering                                         116
Engineering - Chemical Engineering                                            99
Engineering - Bioengineering                                                  60
Engineering - Electrical and Electronic Engineering                           49
Engineering - Aeronautical and Aerospace Engineering                          45
Engineering - Environmental Engineering                                       38
Engineering - Materials Engineering                                           30
Engineering - Civil Engineering                                               23
Comp/IS/Eng - Computational Science and Engineering                            8
Engineering - Artificial Intelligence                                          7
Engineering - Computer Engineering                                             6
Engineering -

##### What about Computer Science? This Field of Study is called "Comp/IS/Eng"
You can rerun this notebook to get results for 2019 and 2020, but here are the three years of results:

For 2021, it's 7.4% of awardees.

For 2020, it's 7.7% of awardees.

For 2019, it's 7.6% of awardees.

In [12]:
GRFP_comput = GRFP[GRFP['Field of Study'].str.contains("Comp/", case=False, na=False)]
GRFP_comput['Name'].count() / Awardees * 100

7.425265188042431

In [13]:
GRFP_comput_byField = GRFP_comput.groupby('Field of Study').size().sort_values(ascending=False)
GRFP_comput_byField

Field of Study
Comp/IS/Eng - Artificial Intelligence                                               22
Comp/IS/Eng - Robotics and Computer Vision                                          21
Comp/IS/Eng - Human Computer Interaction                                            17
Comp/IS/Eng - Machine Learning                                                      17
Comp/IS/Eng - Computer Security and Privacy                                         11
Comp/IS/Eng - Algorithms and Theoretical Foundations                                11
Comp/IS/Eng - Bioinformatics and other Informatics                                  11
Comp/IS/Eng - Computational Science and Engineering                                  8
Comp/IS/Eng - Natural Language Processing                                            8
Comp/IS/Eng - Formal Methods, Verification, and Programming Languages                4
Comp/IS/Eng - Software Engineering                                                   3
Comp/IS/Eng - Computer Syste

### What about "computationally intensive"?
It appears that there was a separate category for Computationally Intensive Research. There were 41 such awards in 2021, and 4 of them were in geosciences.

In [14]:
GRFP_CompIntensive = GRFP[GRFP['Field of Study'].str.contains("Computationally Intensive", case=False, na=False)]
GRFP_CompIntensive

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
95,"Bartholomay, Kathryn",North Dakota State University Fargo,Life Sciences - Computationally Intensive Rese...,University of Colorado at Denver
114,"Bell, Tolson Hallauer",Georgia Institute of Technology,Mathematical Sciences - Computationally Intens...,Georgia Institute of Technology
265,"Cech, Lauren",University of California-Davis,Life Sciences - Computationally Intensive Rese...,"UNIVERSITY OF CALIFORNIA, SAN FRANCISCO"
322,"Chung, Maya Victoria","HARVARD COLLEGE, PRESIDENT & FELLOWS OF",Geosciences - Computationally Intensive Research,Princeton University
362,"Corsetti, Sabrina Maria",Regents of the University of Michigan - Ann Arbor,Physics and Astronomy - Computationally Intens...,Regents of the University of Michigan - Ann Arbor
396,"Davis, Kelcey Susan",SAN DIEGO STATE UNIVERSITY,Physics and Astronomy - Computationally Intens...,SAN DIEGO STATE UNIVERSITY
423,"Di Domenico, Nicolle Taylor",Kent State University,Geosciences - Computationally Intensive Research,
458,"Dudek, Max Franklin",University of Pittsburgh,Life Sciences - Computationally Intensive Rese...,University of Pittsburgh
468,"Dutta, Mayanka Sophia",University of Chicago,Engineering - Computationally Intensive Research,
532,"Fereidooni, Saman",Yale University,Psychology - Computationally Intensive Research,Yale University


In [15]:
GRFP_GeoCompIntensive = GRFP[GRFP['Field of Study'].str.contains("Geosciences - Computationally Intensive", case=False, na=False)]
GRFP_GeoCompIntensive

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
322,"Chung, Maya Victoria","HARVARD COLLEGE, PRESIDENT & FELLOWS OF",Geosciences - Computationally Intensive Research,Princeton University
423,"Di Domenico, Nicolle Taylor",Kent State University,Geosciences - Computationally Intensive Research,
1167,"Marrs, Ian James",Indiana University-Purdue University Indianapolis,Geosciences - Computationally Intensive Research,Indiana University-Purdue University Indianapolis
1723,"Spero, Hannah Rose",Boise State University,Geosciences - Computationally Intensive Research,Boise State University


In [16]:
GRFP_CompIntensive['Name'].count() / Awardees * 100

1.9768563162970105

In [17]:
GRFP_CompIntensive_byField = GRFP_CompIntensive.groupby('Field of Study').size().sort_values(ascending=False)
GRFP_CompIntensive_byField

Field of Study
Life Sciences - Computationally Intensive Research                           12
Chemistry - Computationally Intensive Research                                5
Engineering - Computationally Intensive Research                              5
Geosciences - Computationally Intensive Research                              4
Materials Research - Computationally Intensive Research                       4
Physics and Astronomy - Computationally Intensive Research                    3
Psychology - Computationally Intensive Research                               3
Mathematical Sciences - Computationally Intensive Research                    2
Comp/IS/Eng - Computationally Intensive Research                              1
STEM Education and Learning Research - Computationally Intensive Research     1
Social Sciences - Computationally Intensive Research                          1
dtype: int64

### Which current institutions received these "Computationally Intensive" awards?

In [18]:
GRFP_CompIntensive_byCurrInst = GRFP_CompIntensive.groupby('Current Institution').size().sort_values(ascending=False)
GRFP_CompIntensive_byCurrInst

Current Institution
Georgia Institute of Technology                        3
University of Pittsburgh                               3
Northwestern University                                3
Yale University                                        2
Princeton University                                   2
Regents of the University of Michigan - Ann Arbor      2
UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL            1
Virginia Polytechnic Institute and State University    1
University of Vermont & State Agricultural College     1
University of Utah                                     1
University of Notre Dame                               1
University of Iowa                                     1
University of Illinois at Chicago                      1
University of Colorado at Denver                       1
University of California-Los Angeles                   1
University of California - Merced                      1
Boise State University                                 1
UNIVERSITY 

### What about Artificial Intelligence?
2.7% of awardees in 2021.

In [33]:
GRFP_AI = GRFP[GRFP['Field of Study'].str.contains("- Artificial Intelligence", case=False, na=False)]
GRFP_AI

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
7,"Adair, Amy Elise",Louisiana State University,STEM Education and Learning Research - Artific...,Rutgers University New Brunswick
16,"Ahmed, Ahmed M",Stanford University,Comp/IS/Eng - Artificial Intelligence,Stanford University
138,"Billings, Wendy",Brigham Young University,Chemistry - Artificial Intelligence,Brigham Young University
164,"Boppana, Avinash",Princeton University,Life Sciences - Artificial Intelligence,Princeton University
274,"Chandra, Kartik",Stanford University,Comp/IS/Eng - Artificial Intelligence,Stanford University
281,"Chaudhari, Shreyas",The Ohio State University/CETE,Engineering - Artificial Intelligence,Carnegie-Mellon University
288,"Chen, Annie Shao",Stanford University,Comp/IS/Eng - Artificial Intelligence,Stanford University
301,"Chiel, Joshua",Case Western Reserve University,Physics and Astronomy - Artificial Intelligence,UNIVERSITY OF MARYLAND
317,"Christianson, Nicolas Henry",Harvard University,Mathematical Sciences - Artificial Intelligence,California Institute of Technology
323,"Chung, Heejung Wang",Stanford University,Materials Research - Artificial Intelligence,Stanford University


In [34]:
GRFP_AI['Name'].count() / Awardees * 100

2.74831243972999

### What about Quantum Information Science?
1.4% of awardees in 2021.

In [35]:
GRFP_QIS = GRFP[GRFP['Field of Study'].str.contains("- Quantum Information Science", case=False, na=False)]
GRFP_QIS

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
50,"Anikeeva, Nina",Stanford University,Physics and Astronomy - Quantum Information Sc...,Stanford University
54,"Apelian, Arsineh",University of California-Irvine,Materials Research - Quantum Information Science,
90,"Barney, Richard Dean",Brigham Young University,Physics and Astronomy - Quantum Information Sc...,University of Maryland
94,"Bart, Manon",Louisiana State University,Physics and Astronomy - Quantum Information Sc...,Tulane University
280,"Chattopadhyay, Sambuddha",Harvard University,Physics and Astronomy - Quantum Information Sc...,Harvard University
314,"Chowdhury, Shoumik Dutta",Yale University,Physics and Astronomy - Quantum Information Sc...,Yale University
325,"Cieszynski, Mari R",University of Wisconsin-Madison,Physics and Astronomy - Quantum Information Sc...,University of Illinois at Urbana-Champaign
329,"Clayton, Connor Bennett",Carnegie-Mellon University,Comp/IS/Eng - Quantum Information Science,Carnegie-Mellon University
390,"Dallas, Jax Dylan",University of Mississippi,Chemistry - Quantum Information Science,University of Mississippi
427,"Dickerson, Claire Elizabeth",University of Rochester,Chemistry - Quantum Information Science,University of California-Los Angeles


In [36]:
GRFP_QIS['Name'].count() / Awardees * 100

1.398264223722276

### I see no significant difference in the number of awardees in engineering and computer science in 2021 relative to 2020 and 2019, but there were new "Fields of Study" for "Computationally Intensive Research", "Artificial Intelligence", and "Quantum Information Science", which together totaled 6.1% of the awards.
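
The 6.1% figure is the sum of the three shares computed above (roughly 1.98% + 2.75% + 1.40%). A combined share could also be computed in one pass by joining the three search strings with `|`, which `str.contains` treats as "or" because it interprets the pattern as a regular expression by default. This is a sketch on hypothetical toy data, not the real dataframe.

```python
import pandas as pd

# Hypothetical toy data; the real input would be the GRFP dataframe.
toy = pd.DataFrame({"Field of Study": [
    "Life Sciences - Computationally Intensive Research",
    "Comp/IS/Eng - Artificial Intelligence",
    "Chemistry - Quantum Information Science",
    "Geosciences - Paleoclimate",
    "Engineering - Civil Engineering",
]})

# "|" means "or" in a regular expression, so one call matches any of the three areas.
priority = (
    "Computationally Intensive"
    "|- Artificial Intelligence"
    "|- Quantum Information Science"
)
in_priority = toy["Field of Study"].str.contains(priority, case=False, na=False)

# Percentage of rows that fall in any priority area.
priority_share = in_priority.sum() / len(toy) * 100
print(priority_share)
```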

### Now let's look only at geosciences:

In [19]:
GRFP_Geo = GRFP[GRFP['Field of Study'].str.contains("Geosciences -", case=False, na=False)]
GeoAwardees = GRFP_Geo['Name'].count()
GRFP_Geo

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
14,"Aguilar, Jerod",University of Oregon Eugene,Geosciences - Geochemistry,University of Oregon Eugene
18,"Aikin, Nicole Marie",University of California-Santa Barbara,Geosciences - Petrology,
40,"Anadu, Joshua Somtochukwu",Oklahoma State University,Geosciences - Geobiology,Oklahoma State University
108,"Beethe, Sarah Marie","GEORGE WASHINGTON UNIVERSITY, THE",Geosciences - Other (specify) - Volcanology,"GEORGE WASHINGTON UNIVERSITY, THE"
159,"Bonan, David B",University of Washington,Geosciences - Physical Oceanography,California Institute of Technology
...,...,...,...,...
1880,"Vejar, Manuel Rafael",California State Polytechnic University-Pomona,Geosciences - Other (specify) - Actinide geoch...,University of Notre Dame
1881,"Velazquez, Diana",Northwestern University,Geosciences - Other (specify) - Biogeochemistry,Northwestern University
1919,"Wang, Terrance","UNIVERSITY OF CALIFORNIA, BERKELEY",Geosciences - Marine Biology,
1932,"Warren, Treasure A",University of California-Davis,Geosciences - Chemical Oceanography,


In [20]:
GRFP_Geo_byBaccInst = GRFP_Geo.groupby('Baccalaureate Institution').size().sort_values(ascending=False)
GRFP_Geo_byBaccInst.head(20)

Baccalaureate Institution
UNIVERSITY OF CALIFORNIA, BERKELEY    3
William Marsh Rice University         2
Florida International University      2
Princeton University                  2
UNIVERSITY OF WASHINGTON              2
Northwestern University               2
University of California-Berkeley     2
University of California-Davis        2
Hampton University                    2
Harvard University                    2
Stanford University                   2
California Institute of Technology    2
Carleton College                      2
University of Texas at Austin         2
UNIVERSITY OF VIRGINIA                1
University of California - Merced     1
University of Arizona                 1
University of  Puget Sound            1
University of Texas at Dallas         1
UNIVERSITY OF MIAMI                   1
dtype: int64
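
Some institutions show up under more than one spelling in the output above (for example "UNIVERSITY OF CALIFORNIA, BERKELEY" and "University of California-Berkeley"), which splits their counts. Below is a rough sketch of one way to merge spellings that differ only in case and punctuation before grouping. The helper `normalize_institution` is my own placeholder, and it would not merge names that differ in wording (such as "The University of Central Florida Board of Trustees"), which would need a manual mapping.

```python
import re

import pandas as pd


def normalize_institution(name):
    """Lowercase a name and collapse punctuation and whitespace into single spaces."""
    if pd.isna(name):
        return name
    # Keep letters, digits, and "&"; replace every other run of characters with a space.
    return re.sub(r"[^a-z0-9&]+", " ", name.lower()).strip()


# Spellings copied from the output above.
names = pd.Series([
    "UNIVERSITY OF CALIFORNIA, BERKELEY",
    "University of California-Berkeley",
    "UNIVERSITY OF WASHINGTON",
    "University of  Puget Sound",
])

normalized = names.map(normalize_institution)
print(normalized.value_counts())
```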

In [21]:
GRFP_Geo_byCurrInst = GRFP_Geo.groupby('Current Institution').size().sort_values(ascending=False)
GRFP_Geo_byCurrInst.head(20)

Current Institution
Massachusetts Institute of Technology       4
University of California-Santa Barbara      3
Columbia University                         3
University of Colorado at Boulder           3
Oregon State University                     3
University of California-Santa Cruz         3
Montana State University                    2
University of California-Los Angeles        2
University of California-Irvine             2
UNIVERSITY OF WASHINGTON                    2
Boise State University                      2
UNIVERSITY OF CALIFORNIA, BERKELEY          2
Princeton University                        2
Arizona State University                    2
Northwestern University                     2
California Institute of Technology          2
University of Notre Dame                    2
University of South Carolina at Columbia    1
Vanderbilt University                       1
University of Wyoming                       1
dtype: int64

### How many awards in Oceanography?
In 2021, there were 15 (0.7% of awardees), of which only 2 were in chemical oceanography.

In 2020, there were 17 (0.8% of awardees), of which 6 were in chemical oceanography.

In 2019, there were 13 (0.6% of awardees), of which 4 were in chemical oceanography.

In [22]:
GRFP_Oce = GRFP[GRFP['Field of Study'].str.contains("oceanography", case=False, na=False)]
GRFP_Oce['Name'].count() / Awardees * 100

0.7232401157184185

In [23]:
GRFP[GRFP['Field of Study'].str.contains("Chemical Oceanography", case=False, na=False)]

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
690,"Gunnells, Shelby Ann",North Dakota State University Fargo,Geosciences - Chemical Oceanography,North Dakota State University Fargo
1932,"Warren, Treasure A",University of California-Davis,Geosciences - Chemical Oceanography,


### How many in Marine Biology?
In 2021, there were 12 (0.6% of awardees).

In 2020, there were 10 (0.5% of awardees).

In 2019, there were 12 (0.6% of awardees).

In [24]:
GRFP_MarBio = GRFP[GRFP['Field of Study'].str.contains("Marine Biology", case=False, na=False)]
GRFP_MarBio['Name'].count() / Awardees * 100

0.5785920925747349

In [25]:
GRFP_MarBio

Unnamed: 0,Name,Baccalaureate Institution,Field of Study,Current Institution
218,"Bushnell, Elizabeth Josephine",University of San Diego,Geosciences - Marine Biology,
230,"Caldwell, Aliya Everest",Rutgers University New Brunswick,Geosciences - Marine Biology,University of New Hampshire
341,"Collins, Stormie Blayze",Florida International University,Geosciences - Marine Biology,
350,"Cook McNab, Aimee Arielle",Texas A&M University at Galveston,Geosciences - Marine Biology,
531,"Fenwick, Ileana Faye",Hampton University,Geosciences - Marine Biology,UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL
844,"Jarman, Cheyenne Nicole",University of California-Santa Cruz,Geosciences - Marine Biology,Oregon State University
862,"Johnson, Carter",UNIVERSITY OF WASHINGTON,Geosciences - Marine Biology,
1218,"McLean, Josette Elena Trisha",St. George's University,Geosciences - Marine Biology,Hampton University
1273,"Moore, Malia Leilani",University of California-Berkeley,Geosciences - Marine Biology,UNIVERSITY OF CALIFORNIA SAN DIEGO
1332,"Nodal, Andrea Alejandra",Florida International University,Geosciences - Marine Biology,


### Maybe people in chemical oceanography applied under other disciplines? What if we search for just "chem"? This includes a ton of different disciplines.
315 (15.2%) in 2021

316 (15.2%) in 2020

312 (15.2%) in 2019


In [26]:
GRFP_chem = GRFP[GRFP['Field of Study'].str.contains("Chem", case=False, na=False)]
GRFP_chem_byField = GRFP_chem.groupby('Field of Study').size().sort_values(ascending=False)
GRFP_chem_byField

Field of Study
Engineering - Chemical Engineering                                                  99
Chemistry - Chemical Synthesis                                                      29
Chemistry - Chemistry of Life Processes                                             28
Life Sciences - Biochemistry                                                        26
Chemistry - Chemical Catalysis                                                      22
Chemistry - Macromolecular, Supramolecular, and Nanochemistry                       17
Materials Research - Chemistry of Materials                                         15
Chemistry - Chemical Structure, Dynamics, and Mechanism                             12
Chemistry - Chemical Theory, Models and Computational Methods                       11
Chemistry - Chemical Measurement and Imaging                                         8
Geosciences - Biogeochemistry                                                        7
Geosciences - Geochemistry  

In [27]:
GRFP_chem['Name'].count() / Awardees * 100

15.18804243008679
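
Because the "chem" search mixes Chemistry, Chemical Engineering, Biochemistry, and more, it might help to split Field of Study into its broad area and its subfield and then group on either part. Here is a sketch of that split using field names copied from outputs in this notebook. The column names `Area` and `Subfield` are my own placeholders.

```python
import pandas as pd

# Field names copied from outputs in this notebook.
fields = pd.Series([
    "Engineering - Chemical Engineering",
    "Chemistry - Chemical Catalysis",
    "Geosciences - Chemical Oceanography",
    "Geosciences - Other (specify) - Volcanology",
], name="Field of Study")

# n=1 splits only at the first " - ", so "Other (specify) - Volcanology" stays together.
parts = fields.str.split(" - ", n=1, expand=True)
parts.columns = ["Area", "Subfield"]  # hypothetical column names
print(parts)

# Counting by broad area separates Chemistry from Engineering and Geosciences.
print(parts.groupby("Area").size())
```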

### What about Paleoclimate (excluding paleontology and paleobiology)?
8 in 2021

4 in 2020

2 in 2019

In [28]:
GRFP_paleo = GRFP[GRFP['Field of Study'].str.contains("Paleoclimate", case=False, na=False)]
GRFP_paleo_byField = GRFP_paleo.groupby('Field of Study').size().sort_values(ascending=False)
GRFP_paleo_byField

Field of Study
Geosciences - Paleoclimate    8
dtype: int64