[Link to Covid Maps - USA Part 1](https://www.kaggle.com/blakkmagic/covid-maps-usa-part-1)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.


import geopandas as gpd
from shapely.geometry import LineString
from geopandas.tools import geocode
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, FastMarkerCluster
from folium import plugins
import math
import webbrowser
from IPython.display import HTML
import matplotlib.pyplot as plt
from pandasql import sqldf
import plotly.express as px

#turn off SettingWithCopyWarning
pd.options.mode.chained_assignment = None

# **Step 4 - Geocoding with an external file continued...**

So far the maps we have created have been based on the total number of cases or deaths. It could also be interesting to map cases or deaths over time to visualise the spread of Covid-19 in the US. So this time, instead of taking only the max date, we will use a range of dates. Ideally I would use all dates, but the notebook crashes when committed, so we will restrict the dataset to the first 80 days. Day 80 is roughly the 10th of April.
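As a quick sanity check on the cut-off, here is a minimal sketch that counts 80 days forward from the first date. It assumes the dataset starts on 2020-01-21 (treated as day 0); the variable names are just for illustration. It also shows why a plain string comparison works for the date filter: ISO-formatted dates sort in chronological order.

```python
from datetime import date, timedelta

# Assumption: the first date in the dataset is 2020-01-21 (day 0)
first_date = date(2020, 1, 21)
day_80 = first_date + timedelta(days=80)
print(day_80.isoformat())

# ISO date strings (YYYY-MM-DD) compare in chronological order,
# so '<' on the raw strings is enough to restrict the date range
keeps_10th = '2020-04-10' < '2020-04-11'
keeps_11th = '2020-04-11' < '2020-04-11'
print(keeps_10th, keeps_11th)
```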

In [None]:
#Have a look at initial dataset
US_covid_data = pd.read_csv("../input/us-counties-covid-19-dataset/us-counties.csv")
#Restrict dates to be before 11th April
US_covid_data = US_covid_data.loc[US_covid_data['date']<'2020-04-11']
#Create a concatenated column - not actually important since we'll map with fips column
US_covid_data['concat'] = US_covid_data['county'] + ', ' + US_covid_data['state']

US_covid_data = US_covid_data.sort_values(['date'], ascending = True)
US_covid_data

At this point we need to account for New York City by inserting fips = 36061. We will do it in a slightly different way to last time, using the fillna() function to handle the multiple rows of NY City data. If you cut the data down to only New York City, you will see that the only null values are in the 'fips' column. Therefore a simple call to fillna() that replaces the nulls with 36061 is all that is required. From the code below we can see that the 'fips' column originally contains only nulls for these rows, and after manipulation it contains only the value 36061.
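To make the idea concrete, here is a small sketch on made-up data (the `toy` dataframe and `nyc_mask` are illustrative names, not part of the notebook). It fills the nulls only for the New York City rows and only in the 'fips' column, which is a slightly narrower variant of filling the whole row.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the structure of the Covid dataset
toy = pd.DataFrame({
    'county': ['New York City', 'New York City', 'Albany'],
    'state': ['New York', 'New York', 'New York'],
    'fips': [np.nan, np.nan, 36001.0],
    'cases': [1, 5, 2],
})

# Boolean mask selecting only the New York City rows
nyc_mask = toy['county'] == 'New York City'

# Replace the missing fips values for those rows only
toy.loc[nyc_mask, 'fips'] = toy.loc[nyc_mask, 'fips'].fillna(36061.0)
print(toy['fips'].tolist())
```

Restricting the assignment to the 'fips' column means a null in any other column would be left alone rather than silently becoming 36061.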

In [None]:
#Select the New York City rows once and reuse the mask
nyc_rows = US_covid_data['county'] == 'New York City'
#dropna=False so that the null values actually show up in the counts
print("Values in fips column prior to manipulation: \n" +str(US_covid_data.loc[nyc_rows, 'fips'].value_counts(dropna=False))+"\n")
US_covid_data.loc[nyc_rows, 'fips'] = US_covid_data.loc[nyc_rows, 'fips'].fillna(36061.0)
print("Values in fips column after manipulation: \n" +str(US_covid_data.loc[nyc_rows, 'fips'].value_counts(dropna=False)))

The following steps create the 'time' aspect of this heat map. We will look at time in terms of days (day 0, 1, 2, ...) rather than actual dates.

We will use SQL-style queries (via pandasql) to return the distinct dates from the US_covid_data dataframe in order, so that they represent day 0, 1, 2, etc. The final step is to join the 'day' column back to the US_covid_data dataframe, which will soon be joined with the GeoDataFrame and then mapped.
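For comparison, the same day numbering can be sketched in plain pandas without SQL. This is only an alternative illustration on made-up data; `toy`, `unique_dates` and `day_lookup` are hypothetical names.

```python
import pandas as pd

# Made-up data with repeated, unsorted dates
toy = pd.DataFrame({
    'date': ['2020-01-22', '2020-01-21', '2020-01-22', '2020-01-23'],
    'cases': [1, 1, 2, 3],
})

# Distinct dates in chronological order (ISO strings sort correctly)
unique_dates = sorted(toy['date'].unique())

# Map each date to its position: first date -> day 0, next -> day 1, ...
day_lookup = {d: i for i, d in enumerate(unique_dates)}
toy['day'] = toy['date'].map(day_lookup)
print(toy['day'].tolist())
```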

In [None]:
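#Return the distinct dates in chronological order
q1='select DISTINCT date FROM US_covid_data ORDER BY date'
df_new=sqldf(q1)
#Number the dates 0, 1, 2, ... to create the 'day' column
df_new['day'] = np.arange(len(df_new))
df_new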

In [None]:
concat_result = US_covid_data.merge(df_new,on='date', how = 'left')
concat_result.head()

Now it's time to prepare the GeoDataFrame. We could reuse what we built earlier for the static heat maps, but I will prepare it again here so it is easy to follow along.

In [None]:
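#Read in the county shapefile
us_counties_shapefile = gpd.read_file("../input/us-counties-geocoded/tl_2017_us_county.shp")
#Keep only the county id and the latitude/longitude of each county's internal point
us_counties_dataframe = pd.DataFrame(us_counties_shapefile[['GEOID', 'INTPTLAT', 'INTPTLON']])
#Convert 'GEOID' to float so it matches the dtype of the 'fips' column
us_counties_dataframe['GEOID'] = us_counties_dataframe['GEOID'].astype('float64')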

concat_result2 = concat_result.merge(us_counties_dataframe,left_on = 'fips', right_on = 'GEOID', how = 'left')

concat_result2.head()

Now it is time to account for Kansas City. This means inserting the latitude and longitude of Kansas City (39.0997 and -94.5786) into the 'INTPTLAT' and 'INTPTLON' columns.

Note that we will also fill the 'fips' and 'GEOID' columns with an arbitrary number so that these rows aren't dropped when we use the dropna() function later on.
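The effect of the placeholder values can be seen in a small sketch on made-up data. The second row ('Some County', with id 99999 and round coordinates) is invented purely for illustration, as are the names `toy`, `kc_mask` and `kc_values`.

```python
import numpy as np
import pandas as pd

# Made-up result of a left join: Kansas City found no match in the shapefile
toy = pd.DataFrame({
    'county': ['Kansas City', 'Some County'],
    'fips': [np.nan, 99999.0],
    'GEOID': [np.nan, 99999.0],
    'INTPTLAT': [None, '+39.0000000'],
    'INTPTLON': [None, '-094.0000000'],
})

kc_mask = toy['county'] == 'Kansas City'

# Placeholder ids plus the real coordinates, one value per column
kc_values = {
    'fips': 1,
    'GEOID': 1,
    'INTPTLAT': '+39.0997000',
    'INTPTLON': '-94.5786000',
}
for column, value in kc_values.items():
    toy.loc[kc_mask, column] = toy.loc[kc_mask, column].fillna(value)

# With every null filled, dropna() no longer removes the Kansas City row
print(len(toy.dropna()))
```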

In [None]:
#Fill in 'fips' column with arbitrary number = 1
concat_result2.loc[concat_result2['county'] == 'Kansas City','fips'] = concat_result2.loc[concat_result2['county'] == 'Kansas City','fips'].fillna(1)

#Fill in 'GEOID' column with arbitrary number = 1
concat_result2.loc[concat_result2['county'] == 'Kansas City','GEOID'] = concat_result2.loc[concat_result2['county'] == 'Kansas City','GEOID'].fillna(1)

#Fill in 'INTPTLAT' column with +39.09970
concat_result2.loc[concat_result2['county'] == 'Kansas City','INTPTLAT'] = concat_result2.loc[concat_result2['county'] == 'Kansas City','INTPTLAT'].fillna('+39.0997000')

#Fill in 'INTPTLON' column with -94.57860
concat_result2.loc[concat_result2['county'] == 'Kansas City','INTPTLON'] = concat_result2.loc[concat_result2['county'] == 'Kansas City','INTPTLON'].fillna('-94.5786000')


After dealing with NY City, Kansas City and the time aspect, it is now time to clean the data up a bit more before mapping. This includes dropping rows where the county is 'Unknown', which is represented by null values in the 'GEOID' and 'fips' columns.

We will also duplicate rows based on the 'cases' column, so that each case becomes one point on the heat map. Given there is a time aspect involved here, the resulting dataset would be over 10 million rows if I did not restrict the dataset to end at the 10th of April. Even after restricting, the map takes a long time to produce and is a bit clunky.
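Here is a minimal sketch of the row duplication on made-up data (`toy` and `expanded` are illustrative names). Each row is repeated as many times as its 'cases' value, so the expanded frame has one row per case.

```python
import pandas as pd

# Made-up counties with a case count each
toy = pd.DataFrame({
    'county': ['A', 'B', 'C'],
    'cases': [2, 0, 3],
})

# Repeat each index label 'cases' times, then select those labels
expanded = toy.loc[toy.index.repeat(toy['cases'])]
print(len(expanded))
print(expanded['county'].tolist())
```

Note that a row with zero cases disappears entirely, and the total row count equals the sum of the 'cases' column, which is why the full dataset grows so quickly.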

In [None]:
#Strip the leading '+' from the latitude strings so they can be converted to floats later
concat_result2['INTPTLAT'] = concat_result2['INTPTLAT'].astype('str').str.lstrip('+')

#Drop rows with an unknown county (null 'fips' and 'GEOID')
concat_result2.dropna(how = 'any', inplace = True)

concat_result_cases = concat_result2.loc[concat_result2.index.repeat(concat_result2['cases'])]
print("Shape of dataset to be mapped: " +str(concat_result_cases.shape))


concat_result_cases['INTPTLAT'] = concat_result_cases['INTPTLAT'].astype('float64')
concat_result_cases['INTPTLON'] = concat_result_cases['INTPTLON'].astype('float64')

# **Heat Map with Time - Cases**
**For reasons I don't understand, the play, forward, backward and loop buttons are missing their icons, but they do work (bottom left of the map). The frame rate also appears to be 1 fps even though I set min_speed to 4, and there is no slider to change it.**
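HeatMapWithTime expects a list with one entry per time step, where each entry is itself a list of [lat, lon] points. The sketch below builds that nested structure from made-up data so the shape is easy to inspect before handing it to folium; `toy`, `n_days` and `frames` are hypothetical names.

```python
import pandas as pd

# Made-up points: two on day 0, one on day 1, two on day 2
toy = pd.DataFrame({
    'day': [0, 0, 1, 2, 2],
    'INTPTLAT': [47.0, 47.0, 41.0, 47.0, 34.0],
    'INTPTLON': [-122.0, -122.0, -87.0, -122.0, -118.0],
})

# Days are numbered from 0, so the number of frames is max day + 1
n_days = toy['day'].max() + 1

# One list of [lat, lon] pairs per day
frames = [
    toy.loc[toy['day'] == day, ['INTPTLAT', 'INTPTLON']].values.tolist()
    for day in range(n_days)
]
print(len(frames))
print(frames[1])
```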

In [None]:
max_date = concat_result_cases['day'].max()

#One list of [lat, lon] points per day; the +1 makes sure the final day is included
heat_data = [concat_result_cases.loc[concat_result_cases['day'] == i, ['INTPTLAT', 'INTPTLON']].values.tolist()
             for i in range(0, max_date + 1)]

map4 = folium.Map(location=[40, -95], zoom_start=4)

hm = plugins.HeatMapWithTime(heat_data,auto_play=True,max_opacity=0.8, min_speed = 4, overlay=False,radius = 16.5, display_index=True)
hm.add_to(map4)

map4.save('plot_data4.html')   
HTML('<iframe src=plot_data4.html width=800 height=600></iframe>')
        

Working with heat maps certainly carries visual appeal. We will now move on. In [part 3](https://www.kaggle.com/blakkmagic/covid-maps-usa-part-3) of this analysis we will use Choropleth maps to visualise the Covid-19 dataset.