<h3>Atmospheric Science Faculty by Alma Mater</h3>

<ul>
  <li><b>Background</b>: <a href="https://www.nature.com/articles/s41586-022-05222-x">https://www.nature.com/articles/s41586-022-05222-x</a></li>
  <li><b>Data</b>: <a href="https://github.com/LarremoreLab/us-faculty-hiring-networks">https://github.com/LarremoreLab/us-faculty-hiring-networks</a></li>
</ul>

Information regarding how the data were collected, the period over which they are valid, etc. is provided in the <i>Nature</i> manuscript linked above. Information on how to read the data is provided in the GitHub repository linked above.
<hr>
We wish to read in the edge-lists.csv file. When subset to the Field (TaxonomyLevel) of Atmospheric Sciences and Meteorology (TaxonomyValue), these data give the number of faculty at each surveyed institution who earned their degree from a given alma mater; e.g., a Total of 2 for a DegreeInstitutionName of MIT and an InstitutionName of UWM would indicate that two faculty at UWM received their PhD from MIT.
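As a minimal sketch of how a single row is read, the toy table below mimics the edge-list columns used later in this notebook. The rows are hypothetical and only echo the MIT/UWM example; the Physics and Natural Sciences counts are made up, and the 'Domain' level label is an assumption about how the file names that level.

```python
import pandas as pd

# Hypothetical toy rows that mimic the edge-list columns used in this notebook.
# Only the first row echoes the MIT/UWM example; the other counts are made up,
# and the 'Domain' level label is an assumption.
toy = pd.DataFrame(
    {
        "TaxonomyLevel": ["Field", "Field", "Domain"],
        "TaxonomyValue": [
            "Atmospheric Sciences and Meteorology",
            "Physics",
            "Natural Sciences",
        ],
        "InstitutionName": ["UWM", "UWM", "UWM"],
        "DegreeInstitutionName": ["MIT", "MIT", "MIT"],
        "Total": [2, 5, 9],
    }
)

# keep only the Field-level rows for Atmospheric Sciences and Meteorology
field = toy[
    (toy["TaxonomyLevel"] == "Field")
    & (toy["TaxonomyValue"] == "Atmospheric Sciences and Meteorology")
]

# read the one remaining row: employer, alma mater, and faculty count
row = field.iloc[0]
message = (
    f"{row['Total']} faculty at {row['InstitutionName']} "
    f"received their PhD from {row['DegreeInstitutionName']}"
)
print(message)
```

InstitutionName is the employer and DegreeInstitutionName is the alma mater, so the same pair of schools can appear in both directions with different counts.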

<b>Note</b>: I did a spot check on a DegreeInstitutionName of 'University of Arizona,' which returned Purdue as having seven Arizona alumni and Arizona as having six of its own alumni on the faculty. These counts, particularly for Arizona, only make sense under the broadest possible definition of faculty: tenure-track and tenured faculty, research faculty, and instructional faculty. I also did a spot check on 'Florida State,' which returned Florida State as having eight of its own alumni on the faculty (which its website does not seem to support). The <i>Nature</i> paper indicates that only tenured and tenure-track faculty are included in its data, but I couldn't immediately reconcile the counts under that definition.
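A spot check like the one described in the note could be sketched as follows. The helper name alumni_by_employer is hypothetical, and the toy table stands in for the real subset: the Arizona counts are the ones quoted in the note, the MIT row is made up, and the institution names are placeholders that may not match the spellings in the file.

```python
import pandas as pd


def alumni_by_employer(edges, alma_mater):
    """Return Total faculty counts per employing institution for one alma mater."""
    sub = edges[edges["DegreeInstitutionName"] == alma_mater]
    return sub.groupby("InstitutionName")["Total"].sum().sort_values(ascending=False)


# Toy stand-in for the Atmospheric Sciences subset. The two Arizona counts are
# the ones quoted in the note; the MIT row is made up, and the institution
# names are placeholders that may differ from the spellings in the real file.
toy = pd.DataFrame(
    {
        "InstitutionName": ["Purdue", "University of Arizona", "Purdue"],
        "DegreeInstitutionName": [
            "University of Arizona",
            "University of Arizona",
            "MIT",
        ],
        "Total": [7, 6, 1],
    }
)

check = alumni_by_employer(toy, "University of Arizona")
print(check)
```

Passing the real subset in place of toy should give the per-employer counts for any alma mater, which can then be compared against department websites.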

In [12]:
import pandas as pd

# read the CSV file into a pandas DataFrame straight from GitHub
df = pd.read_csv('https://raw.githubusercontent.com/LarremoreLab/us-faculty-hiring-networks/main/data/edge-lists.csv')
print(df['TaxonomyValue'].unique())

# subset by the Field of Atmospheric Sciences and Meteorology
df = df[(df['TaxonomyLevel'] == 'Field') & (df['TaxonomyValue'] == 'Atmospheric Sciences and Meteorology')]

# get list of institutions represented in the data
insts = df['InstitutionName'].unique()
print(insts)

# get list of all alma maters with at least one alum as a faculty member
almas = df['DegreeInstitutionName'].unique()

# create blank list to store alma maters and faculty counts (total, men, women)
data = []

# loop over each alma mater, create subset of only that alma mater,
# and find sums of faculty from that alma mater across institutions
for alma in almas:
    dfsub = df[df['DegreeInstitutionName'] == alma]
    data.append([alma, dfsub['Total'].sum(), dfsub['Men'].sum(), dfsub['Women'].sum()])

# convert list to pandas DataFrame and sort
datadf = pd.DataFrame(data, columns=['university','count','men','women'])
datadf = datadf.sort_values('count', ascending=False)
print(datadf)

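# some stats: totals of the numeric columns only (summing the university
# column would concatenate every name into one string), then the top 50
print(datadf[['count', 'men', 'women']].sum())
print(datadf.head(50))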

['Mathematics' 'Mathematics and Computing' 'Academia'
 'Biological Sciences' 'Natural Sciences' 'Statistics' 'Applied Sciences'
 'Psychology' 'Social Sciences' 'Civil Engineering' 'Engineering'
 'Environmental Engineering' 'Linguistics' 'Humanities'
 'Exercise Science, Kinesiology, Rehab, Health' 'Medicine and Health'
 'Chemistry' 'Biochemistry' 'Physics' 'Agricultural Engineering'
 'Chemical Engineering' 'Aerospace Engineering' 'Industrial Engineering'
 'Systems Engineering' 'Mechanical Engineering' 'Operations Research'
 'Management' 'Electrical Engineering' 'Computer Engineering'
 'Materials Engineering' 'Biomedical Engineering' 'Architecture'
 'Computer Science' 'Information Technology'
 'Management Information Systems' 'Marketing' 'Accounting' 'Finance'
 'Economics' 'Physiology' 'Information Science'
 'Atmospheric Sciences and Meteorology' 'Geology' 'Microbiology'
 'English Language and Literature' 'Music' 'Veterinary Medical Sciences'
 'Immunology' 'Cell Biology' 'Neuroscience' '

