Given my file of the top 100 (currently 51) universities and the dates of action of their COVID guidelines, I want to find the COVID situation at each school over the entire pandemic, from the earliest cases until now. Note that if I want to extend my analysis to COVID guidelines beyond the Spring 2022 move online, I will need to re-download the NYT data. I don't anticipate doing this anytime soon, so I can use the local copy: it's March, and the COVID data I care about ends in January, when the last decisions I tracked were made.

In [1]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt

Get the zip codes for the top 100 universities from my data.

In [14]:
university_covid_dates = pd.read_excel("university_covid_dates.xlsx")

In [90]:
university_zips = university_covid_dates[["Unofficial Ranking", "zip"]]

Using [NYT us-county data](https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv):

In [76]:
covid_data_county = pd.read_csv("us-counties-covid-nyt.csv")
covid_data_county.head()

Unnamed: 0,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061.0,1,0.0
1,2020-01-22,Snohomish,Washington,53061.0,1,0.0
2,2020-01-23,Snohomish,Washington,53061.0,1,0.0
3,2020-01-24,Cook,Illinois,17031.0,1,0.0
4,2020-01-24,Snohomish,Washington,53061.0,1,0.0


Using [zip-county-fips data](https://www.kaggle.com/danofer/zipcodes-county-fips-crosswalk/version/1):

In [77]:
zcf_data = pd.read_csv("zip-county-fips/ZIP-COUNTY-FIPS_2017-06.csv")
zcf_data.head()

Unnamed: 0,ZIP,COUNTYNAME,STATE,STCOUNTYFP,CLASSFP
0,36003,Autauga County,AL,1001,H1
1,36006,Autauga County,AL,1001,H1
2,36067,Autauga County,AL,1001,H1
3,36066,Autauga County,AL,1001,H1
4,36703,Autauga County,AL,1001,H1


### Find counties from zip codes
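Note that some zip codes span multiple counties. Such a zip can appear in the crosswalk once per county, in which case the left merge produces one row per matching county for that university. We will just use whatever data is in the file, as we are only looking for a rough approximation of nearby COVID cases.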

In [93]:
university_zcf = university_zips.merge(zcf_data, left_on="zip", right_on="ZIP", how="left")
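To see how a multi-county zip affects the merge, here is a minimal sketch on toy data. The zip codes and FIPS values are invented, and keeping only the first listed county is just one possible simplification, not something this notebook does.

```python
import pandas as pd

# Toy stand-ins for university_zips and zcf_data; all values are invented.
university_zips = pd.DataFrame(
    {"Unofficial Ranking": [1, 2], "zip": [11111, 22222]}
)
zcf_data = pd.DataFrame(
    {
        "ZIP": [11111, 22222, 22222],  # 22222 spans two counties
        "STCOUNTYFP": [1001, 2001, 2003],
    }
)

university_zcf = university_zips.merge(
    zcf_data, left_on="zip", right_on="ZIP", how="left"
)

# Count crosswalk matches per university; more than one means a multi-county zip.
matches_per_school = university_zcf.groupby("Unofficial Ranking")["STCOUNTYFP"].count()
multi_county = matches_per_school[matches_per_school > 1]
print(multi_county)

# Hypothetical simplification: keep only the first listed county per university.
one_county = university_zcf.drop_duplicates(subset="Unofficial Ranking", keep="first")
print(one_county)
```

If duplicates are left in place, the later merge with the county COVID data returns one time series per matched county for that university.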

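Check whether any zip values are missing. (This counts blank `zip` entries from the university file; because a left merge keeps every left-hand row, it does not show whether a zip was found in the crosswalk.)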

In [103]:
university_zcf["zip"].isna().sum()

0

No, so we can move on.
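To also confirm that every zip found a match in the crosswalk, one option is to merge with `indicator=True` and look for `left_only` rows, or equivalently to count missing values in a right-hand column such as `STCOUNTYFP`. The sketch below uses invented toy data, and the frame name `checked` is only illustrative.

```python
import pandas as pd

# Toy stand-ins; zip 99999 is deliberately absent from the crosswalk.
university_zips = pd.DataFrame(
    {"Unofficial Ranking": [1, 2], "zip": [11111, 99999]}
)
zcf_data = pd.DataFrame({"ZIP": [11111], "STCOUNTYFP": [1001]})

checked = university_zips.merge(
    zcf_data, left_on="zip", right_on="ZIP", how="left", indicator=True
)

# The left key is always present after a left merge, so this count stays at zero
# even though one zip has no county.
missing_left = checked["zip"].isna().sum()

# Rows that exist only on the left side had no match in the crosswalk.
unmatched = checked[checked["_merge"] == "left_only"]
print(missing_left)
print(unmatched[["Unofficial Ranking", "zip"]])
```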

### Find covid data from county

In [99]:
university_covid = university_zcf.merge(covid_data_county, left_on="STCOUNTYFP", right_on="fips", how="left")
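It may be worth checking which universities received no COVID rows from this merge. One known case: the NYT county file reports the five New York City boroughs as a single "New York City" entry with an empty `fips`, so a university whose zip maps to New York County (FIPS 36061) would not match. The sketch below uses invented toy data to show the check; the name `no_covid_rows` is only illustrative.

```python
import pandas as pd

# Toy stand-ins; ranking 2 sits in New York County (FIPS 36061).
university_zcf = pd.DataFrame(
    {"Unofficial Ranking": [1, 2], "STCOUNTYFP": [53061, 36061]}
)
covid_data_county = pd.DataFrame(
    {
        "date": ["2020-03-01", "2020-03-01"],
        "county": ["Snohomish", "New York City"],
        "fips": [53061.0, float("nan")],  # NYC rows carry no FIPS code
        "cases": [2, 1],
        "deaths": [0.0, 0.0],
    }
)

university_covid = university_zcf.merge(
    covid_data_county, left_on="STCOUNTYFP", right_on="fips", how="left"
)

# Universities whose county never matched have a missing date after the left merge.
no_covid_rows = university_covid.loc[
    university_covid["date"].isna(), "Unofficial Ranking"
].unique()
print(no_covid_rows)
```

Any rankings listed here would need separate handling, for example by matching on the `county` and `state` columns instead of `fips`.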

Simplify: keep only the necessary columns and rename ``Unofficial Ranking`` to ``unofficial_rank``, which identifies each university.

In [109]:
university_covid_simple = university_covid[["Unofficial Ranking", "date", "cases", "deaths"]]

In [113]:
university_covid_simple = university_covid_simple.rename(columns={"Unofficial Ranking": "unofficial_rank"})

Save to csv.

In [114]:
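# Write a .csv file without the row index, so no "Unnamed: 0" column appears on reload.
university_covid_simple.to_csv("university_county_covid.csv", index=False)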
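By default, `to_csv` writes the row index as an unlabeled first column, which `read_csv` then names `Unnamed: 0`, as seen in the two previews earlier in this notebook. The sketch below uses a toy frame and in-memory buffers (so no files are touched) to compare a round trip with and without the index.

```python
import io

import pandas as pd

# Toy frame with the same kind of columns as university_covid_simple.
df = pd.DataFrame(
    {
        "unofficial_rank": [1, 1],
        "date": ["2020-03-01", "2020-03-02"],
        "cases": [2, 3],
    }
)

# Round trip with the default settings (index written as the first column).
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# Round trip with index=False (only the data columns are written).
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```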