# Analysis of Coronavirus Case Data for US by County
by Lauren Kwee
* Covid data by county from New York Times https://github.com/nytimes/covid-19-data downloaded on 08/13/2020
* County data from Wikipedia https://en.wikipedia.org/wiki/List_of_United_States_counties_by_per_capita_income downloaded on 08/13/2020

In [36]:
import pandas as pd

In [37]:
from matplotlib import pyplot as plt

In [38]:
covid_data = pd.read_csv("CovidDataByCounty.csv")

In [39]:
county_data = pd.read_csv("USCountiesbyPerCapitaIncome.csv")

Since covid_data is massive, I'm going to make a smaller subset containing only Hawai'i's counties to test my code on.

In [40]:
covid_data

Unnamed: 0,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061.0,1,0
1,2020-01-22,Snohomish,Washington,53061.0,1,0
2,2020-01-23,Snohomish,Washington,53061.0,1,0
3,2020-01-24,Cook,Illinois,17031.0,1,0
4,2020-01-24,Snohomish,Washington,53061.0,1,0
...,...,...,...,...,...,...
327991,2020-07-12,Sweetwater,Wyoming,56037.0,138,0
327992,2020-07-12,Teton,Wyoming,56039.0,157,1
327993,2020-07-12,Uinta,Wyoming,56041.0,202,0
327994,2020-07-12,Washakie,Wyoming,56043.0,42,5
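The `Unnamed: 0` column in the output above suggests that the CSV was saved with its row index as an unlabeled first column; that is an inference from the output, not something the notebook states. A minimal sketch of reading such a file so the first column becomes the index instead of a data column, using a few rows copied from the output as a stand-in for `CovidDataByCounty.csv`:

```python
import io
import pandas as pd

# A few rows copied from the output above, standing in for the real file.
# The empty first header field is an assumption about how the CSV was saved.
sample_csv = io.StringIO(
    ",date,county,state,fips,cases,deaths\n"
    "0,2020-01-21,Snohomish,Washington,53061.0,1,0\n"
    "3,2020-01-24,Cook,Illinois,17031.0,1,0\n"
    "325359,2020-07-12,Hawaii,Hawaii,15001.0,101,0\n"
)

# index_col=0 uses the first column as the row index,
# so no "Unnamed: 0" data column is created.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```

With the real file, the same `index_col=0` argument could be passed to the existing `pd.read_csv("CovidDataByCounty.csv")` call.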


In [41]:
hi = covid_data[covid_data.state == 'Hawaii']

In [42]:
hi

Unnamed: 0,date,county,state,fips,cases,deaths
611,2020-03-06,Honolulu,Hawaii,15003.0,1,0
695,2020-03-07,Honolulu,Hawaii,15003.0,1,0
792,2020-03-08,Honolulu,Hawaii,15003.0,2,0
912,2020-03-09,Honolulu,Hawaii,15003.0,2,0
1053,2020-03-10,Honolulu,Hawaii,15003.0,2,0
...,...,...,...,...,...,...
322184,2020-07-11,Maui,Hawaii,15009.0,133,6
325359,2020-07-12,Hawaii,Hawaii,15001.0,101,0
325360,2020-07-12,Honolulu,Hawaii,15003.0,923,13
325361,2020-07-12,Kauai,Hawaii,15007.0,43,0


Now that I have a subset of Hawaii data, I want to organize it. Eventually I want to plot the total number of cases and deaths against per capita income for each county. Since the NYT counts are cumulative, I can just use the most recent rows in this file, which are dated 07/12/2020.

In [43]:
hi_latest = hi[hi.date == '2020-07-12']

In [44]:
hi_latest

Unnamed: 0,date,county,state,fips,cases,deaths
325359,2020-07-12,Hawaii,Hawaii,15001.0,101,0
325360,2020-07-12,Honolulu,Hawaii,15003.0,923,13
325361,2020-07-12,Kauai,Hawaii,15007.0,43,0
325362,2020-07-12,Maui,Hawaii,15009.0,134,6
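The date above is hard-coded, so it would need editing whenever the file is refreshed. As a sketch of one alternative, the latest date can be read from the data itself. `hi_sample` below is a hypothetical miniature of `hi` built from rows shown in the output above:

```python
import pandas as pd

# Hypothetical miniature of the `hi` frame, using rows from the output above.
hi_sample = pd.DataFrame({
    "date": ["2020-07-11", "2020-07-12", "2020-07-12", "2020-07-12", "2020-07-12"],
    "county": ["Maui", "Hawaii", "Honolulu", "Kauai", "Maui"],
    "cases": [133, 101, 923, 43, 134],
    "deaths": [6, 0, 13, 0, 6],
})

# ISO-formatted date strings (YYYY-MM-DD) sort chronologically,
# so max() on the string column gives the most recent day.
latest_day = hi_sample["date"].max()
hi_latest_sample = hi_sample[hi_sample["date"] == latest_day]
print(hi_latest_sample)
```

One caveat: a county with no row on the latest day would drop out of this result. Sorting by date and taking `groupby("county").tail(1)` would keep each county's own most recent row instead.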


Now I can filter the county data in the same way to narrow it down to only Hawaii's counties.

In [45]:
county_data

Unnamed: 0,Rank,County or county-equivalent,"State, federal district or territory",Per capitaincome,Medianhouseholdincome,Medianfamilyincome,Population,Number ofhouseholds
0,1,New York County,New York,"$62,498","$69,659","$84,627",1605272,736192
1,2,Arlington,Virginia,"$62,018","$103,208","$139,244",214861,94454
2,3,Falls Church City,Virginia,"$59,088","$120,000","$152,857",12731,5020
3,4,Marin,California,"$56,791","$90,839","$117,357",254643,102912
4,5,Alexandria City,Virginia,"$54,608","$85,706","$107,511",143684,65369
...,...,...,...,...,...,...,...,...
3292,—,Western District,American Samoa,"$6,429","$24,705","$24,916",31329,5418
3293,,,American Samoa,"$6,311","$23,892","$24,706",55519,9688
3294,—,Eastern District,American Samoa,"$6,191","$23,350","$24,911",23030,3982
3295,—,Maricao,Puerto Rico,"$5,943","$13,462","$15,864",6276,1914


In [46]:
county_data.columns = [c.replace(' ', '_') for c in county_data.columns]

In [47]:
county_data

Unnamed: 0,Rank,County_or_county-equivalent,"State,_federal_district_or_territory",Per_capitaincome,Medianhouseholdincome,Medianfamilyincome,Population,Number_ofhouseholds
0,1,New York County,New York,"$62,498","$69,659","$84,627",1605272,736192
1,2,Arlington,Virginia,"$62,018","$103,208","$139,244",214861,94454
2,3,Falls Church City,Virginia,"$59,088","$120,000","$152,857",12731,5020
3,4,Marin,California,"$56,791","$90,839","$117,357",254643,102912
4,5,Alexandria City,Virginia,"$54,608","$85,706","$107,511",143684,65369
...,...,...,...,...,...,...,...,...
3292,—,Western District,American Samoa,"$6,429","$24,705","$24,916",31329,5418
3293,,,American Samoa,"$6,311","$23,892","$24,706",55519,9688
3294,—,Eastern District,American Samoa,"$6,191","$23,350","$24,911",23030,3982
3295,—,Maricao,Puerto Rico,"$5,943","$13,462","$15,864",6276,1914


Because the column names contained spaces, I couldn't access them as attributes (dot notation), so I replaced the spaces with underscores.

In [48]:
county_data.columns = [c.replace(',','') for c in county_data.columns]

I also had to remove the comma from the third column name for the same reason.
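For comparison, here is a small sketch of two other ways to work with awkward column names. Bracket indexing accepts any column name as a string, and `rename` can map the long headers to short ones in a single step. The frame `demo` and the short names `county`, `state`, and `per_capita_income` are hypothetical, chosen only for this illustration:

```python
import pandas as pd

# Small stand-in for county_data with the original headers.
demo = pd.DataFrame({
    "County or county-equivalent": ["Honolulu", "Marin"],
    "State, federal district or territory": ["Hawaii", "California"],
    "Per capitaincome": ["$30,361", "$56,791"],
})

# Bracket indexing works with any column name, including spaces and commas.
hi_rows = demo[demo["State, federal district or territory"] == "Hawaii"]

# Alternatively, rename to short names once; after that, dot access works.
renamed = demo.rename(columns={
    "County or county-equivalent": "county",
    "State, federal district or territory": "state",
    "Per capitaincome": "per_capita_income",
})
print(renamed[renamed.state == "Hawaii"])
```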

In [49]:
hi_county = county_data[county_data.State_federal_district_or_territory == 'Hawaii']

In [50]:
hi_county

Unnamed: 0,Rank,County_or_county-equivalent,State_federal_district_or_territory,Per_capitaincome,Medianhouseholdincome,Medianfamilyincome,Population,Number_ofhouseholds
21,22.0,Kalawao,Hawaii,"$45,515","$59,375","$88,750",71,46
322,311.0,Honolulu,Hawaii,"$30,361","$72,764","$85,440",964678,309803
379,363.0,Maui,Hawaii,"$29,517","$63,512","$75,407",156633,52623
389,,,Hawaii,"$29,305","$67,402","$79,963",1376298,449771
744,716.0,Kauai,Hawaii,"$26,658","$62,052","$73,205",67872,22390
1168,1127.0,Hawaii County,Hawaii,"$24,635","$51,250","$59,862",187044,64909
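The output above shows three things that would get in the way of combining this table with `hi_latest`: one row has no county name (it appears to be the statewide summary), the income values are strings such as `"$45,515"`, and the Big Island is listed as "Hawaii County" here but as "Hawaii" in the NYT data. The following is a rough sketch of one way to handle these, using rows copied from the two outputs. The names `counties`, `income`, and `merged` are made up for the example:

```python
import pandas as pd

# Rows copied from the hi_county and hi_latest outputs above.
hi_county_sample = pd.DataFrame({
    "County_or_county-equivalent": [
        "Kalawao", "Honolulu", "Maui", None, "Kauai", "Hawaii County"
    ],
    "Per_capitaincome": [
        "$45,515", "$30,361", "$29,517", "$29,305", "$26,658", "$24,635"
    ],
})
hi_latest_sample = pd.DataFrame({
    "county": ["Hawaii", "Honolulu", "Kauai", "Maui"],
    "cases": [101, 923, 43, 134],
    "deaths": [0, 13, 0, 6],
})

# Drop the row that has no county name.
counties = hi_county_sample.dropna(subset=["County_or_county-equivalent"]).copy()

# Strip "$" and "," so the income can be treated as a number.
counties["income"] = (
    counties["Per_capitaincome"].str.replace(r"[$,]", "", regex=True).astype(int)
)

# Align the county names between the two sources for this Hawaii subset.
counties["county"] = counties["County_or_county-equivalent"].str.replace(
    " County", "", regex=False
)

# The inner join keeps only counties present in both tables.
merged = hi_latest_sample.merge(
    counties[["county", "income"]], on="county", how="inner"
)
print(merged)
```

Kalawao drops out of `merged` because it has no row in `hi_latest`. Stripping " County" is only safe for this small subset; on the full table, names would need more careful matching. From here, something like `plt.scatter(merged["income"], merged["cases"])` would give the intended plot.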
