A Personal Project by Tanas Gangadhar. Version 1 (1/10/21)

The following is a simple investigation of COVID-19 in the United States. With more than 20 million Americans having tested positive for the disease, a geospatial look at the nation's outbreak offers insight into the nature of the pandemic and how it has progressed. Although geographic COVID data is already available online, a personal analysis allows for more investigative independence.

The two maps charted are:
1) covid_map: Shows the number of infected individuals per county.
2) infected_map: Shows the percentage of each county's population that has tested positive for the disease.

The data is sourced from the Johns Hopkins repository, which is updated daily with new infection data. The current version will be updated over time with enhanced metrics and visualization tools. 

The mapping module used is the Python Folium package. 
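
Folium's choropleth layer matches each data row to a county shape through an id, which in the cells below is the five-digit county FIPS code. The following is a minimal sketch of that normalization step in plain Python, using made-up sample values rather than rows from the Johns Hopkins files; the helper name `to_fips` is hypothetical and does not appear in the notebook.

```python
def to_fips(value):
    """Return a five-character, zero-padded county FIPS string.

    Accepts ints, floats such as 1001.0, or numeric strings.
    """
    return str(int(float(value))).zfill(5)


# Made-up sample values, not rows from the Johns Hopkins files.
sample_codes = [1001.0, 6037, "48201"]
fips_ids = [to_fips(code) for code in sample_codes]
print(fips_ids)
```

The notebook itself does the same thing column-wise with pandas (`astype('int64')`, then `astype(str).str.zfill(5)`), which is the better choice for a full table.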

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import json
import folium 
import branca.colormap as cmp


import geopandas as gpd
import matplotlib.pyplot as plt




# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


#USA_COVID_Confirmed = pd.read_csv('../input/covid19-data-from-john-hopkins-university/CONVENIENT_us_confirmed_cases.csv')
#USA_COVID_Confirmed.head()



In [None]:
#Load USA Cases Raw Data and index FIPS to match GeoJSON id. 

USA_COVID_cases = pd.read_csv('../input/covid19-data-from-john-hopkins-university/RAW_us_confirmed_cases.csv')
USA_COVID_cases.drop(['UID', 'iso2', 'iso3', 'code3', 'Country_Region', 'Lat', 'Long_'], axis = 1, inplace = True)
USA_COVID_cases.rename(columns = {'Province_State' : 'State', 'Admin2' : 'County', 'Combined_Key' : 'Location'}, inplace = True)



#Investigate missing values, then drop rows with any missing field
missing_values_count = USA_COVID_cases.isnull().sum()
print(missing_values_count[0:10])
print(USA_COVID_cases.shape)
USA_COVID_cases.dropna(axis = 'index', inplace = True)

#Cast FIPS to an integer, then zero-pad it to a 5-character string so it matches the GeoJSON ids
USA_COVID_cases['FIPS'] = USA_COVID_cases['FIPS'].astype('int64').astype(str).str.zfill(5)
USA_COVID_cases['id'] = USA_COVID_cases['FIPS']
#USA_COVID_cases = USA_COVID_cases.set_index('CountyCode')

#USA_COVID_cases = USA_COVID_cases.set_index('FIPS')

USA_COVID_cases.head()




In [None]:
#Get GeoJSON
geojson = '../input/geojsoncountiesfipsjson/geojson-counties-fips.json'
counties = gpd.read_file(geojson)
print(type(counties))
counties = counties.set_index('id')

#counties = counties.merge( USA_COVID_cases, left_index = True , how = 'left', right_on = ['1/11/21'] )
counties.head()


In [None]:
#Initialize Map

covid_map = folium.Map(location= [37, -102], zoom_start = 5)

#Total confirmed cases per county as of 1/9/21
folium.Choropleth(geo_data = counties,
                  data = USA_COVID_cases,
                  columns = ['FIPS', '1/9/21'],
                  fill_color = 'YlOrRd',
                  key_on = 'feature.id',
                  bins = [0, 10, 100, 1000, 5000, 10000, 20000, 40000, 100000, 1000000],
                  legend_name = 'Total COVID cases by US County',
                  nan_fill_color = 'black').add_to(covid_map)

g = folium.GeoJson(counties,
                   name = 'US County').add_to(covid_map)

folium.GeoJsonTooltip(fields = ['NAME'], aliases = ['County'] ).add_to(g)


covid_map

After mapping the total number of people infected per county, I calculated the share of each county's population that has tested positive.
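
As a rough illustration of that calculation, the sketch below computes the percentage for a few hypothetical counties with made-up numbers. The helper name `percent_infected` is an assumption for illustration only; it returns `None` where the population is zero or missing, since no percentage is defined there.

```python
def percent_infected(cases, population):
    """Return cases as a percentage of population, or None when undefined."""
    if not population:  # covers a population of 0 and a missing (None) value
        return None
    return 100 * cases / population


# Hypothetical counties with made-up numbers: (name, cases, population).
sample_rows = [("A", 500, 10_000), ("B", 1_200, 8_000), ("C", 30, 0)]
percentages = {name: percent_infected(cases, pop) for name, cases, pop in sample_rows}
print(percentages)
```

The cells below apply the same arithmetic to whole pandas columns and drop the zero-population rows rather than returning a placeholder.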

In [None]:
County_Populations = pd.read_csv("../input/covid19-data-from-john-hopkins-university/CONVENIENT_us_metadata.csv")
County_Populations.drop(['Lat', 'Long'], axis = 1, inplace = True)
County_Populations.rename(columns = {'Province_State' : 'State', 'Admin2' : 'County'}, inplace = True)
#County_Populations.dropna(axis = 'index', inplace = True )
County_Populations.drop(County_Populations[County_Populations['County'] == 'Unassigned'].index, inplace = True)

County_Populations



In [None]:
USA_Percent_Infected = pd.read_csv('../input/covid19-data-from-john-hopkins-university/RAW_us_confirmed_cases.csv')
USA_Percent_Infected.drop(['UID', 'iso2', 'iso3', 'code3', 'Country_Region', 'Lat', 'Long_'], axis = 1, inplace = True)
USA_Percent_Infected.rename(columns = {'Province_State' : 'State', 'Admin2' : 'County', 'Combined_Key' : 'Location'}, inplace = True)

USA_Percent_Infected.drop(USA_Percent_Infected[USA_Percent_Infected['County'] == 'Unassigned'].index, inplace = True)



pd.set_option('display.max_rows', 4000)
USA_Percent_Infected = pd.merge(USA_Percent_Infected, County_Populations, how = 'left')
USA_Percent_Infected.dropna(axis = 'index', inplace = True )

USA_Percent_Infected['FIPS'] = USA_Percent_Infected['FIPS'].astype('int64')

USA_Percent_Infected['FIPS'] = USA_Percent_Infected['FIPS'].astype(str).str.zfill(5)

#Drop counties with a recorded population of zero before dividing, to avoid infinite or undefined percentages
USA_Percent_Infected.drop(USA_Percent_Infected[USA_Percent_Infected['Population'] == 0 ].index, inplace = True)

#Percentage of each county's population that has tested positive as of 1/10/21
USA_Percent_Infected['COVID-19 Positive'] = USA_Percent_Infected['1/10/21'] / USA_Percent_Infected['Population'] * 100

USA_Percent_Infected

In [None]:
infected_map = folium.Map(location= [37, -102], zoom_start = 5)



folium.Choropleth(geo_data = counties,
                  data = USA_Percent_Infected,
                  columns = ['FIPS', 'COVID-19 Positive'],
                  fill_color = 'YlOrRd',
                  key_on = 'feature.id',
                  legend_name = 'Percentage of County Population Infected with COVID',
                  bins = [0, 4, 8, 12, 16, 20, 24, 28],
                  nan_fill_color = 'black').add_to(infected_map)



infected_map