# Mapping Covid-19 Data

This exercise introduces some basic plotting of geospatial time series data: global and US statistics on Covid-19 deaths and confirmed cases.

The 'Geopandas' analysis module will be used to download and visualize the data in map and chart form.

## Install the needed additional modules

The Python environment in Colaboratory has many modules for scientific computing already installed, but it is straightforward to add additional modules. For this exercise, we will need the 'Geopandas' module to work with geospatial data.

In [None]:
!pip install geopandas

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import folium

## Get the data

We will work with a dataset maintained by Johns Hopkins University on GitHub. The .csv files online can be read directly into 'pandas' dataframes.


> Time Series Datasets - https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv



> Source of the data on GitHub - https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data






In [None]:
# needs to point to the 'raw' csv file on github, not the nicely rendered html version
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
usdeathsurl = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv'
worlddeathsurl = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
# Get a nice .csv with ISO3 codes to match later...
luurl = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv"

# read in the data
usdeaths = pd.read_csv(usdeathsurl)
confirmed = pd.read_csv(url)
gdeaths = pd.read_csv(worlddeathsurl)
lutab = pd.read_csv(luurl)
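
The comment about 'raw' files can be made concrete with a small sketch. The helper name `to_raw_url` is hypothetical (it is not part of pandas or of any GitHub tooling), and it assumes the usual `github.com/<owner>/<repo>/blob/<branch>/<path>` page layout:

```python
def to_raw_url(blob_url):
    """Turn a GitHub 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper for illustration only.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    # owner/repo/blob/branch/path  ->  owner/repo/branch/path
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


page = ("https://github.com/CSSEGISandData/COVID-19/blob/master/"
        "csse_covid_19_data/csse_covid_19_time_series/"
        "time_series_covid19_deaths_US.csv")
raw = to_raw_url(page)
print(raw)
```

The printed address should match `usdeathsurl` above, which is the form `pd.read_csv` needs.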

## Have a look at the data

Check out the confirmed cases globally.

In [None]:
confirmed.head(3)

The US deaths by county, etc.

In [None]:
usdeaths.head(3)

Global deaths by country, but there are some matching issues later

In [None]:
gdeaths.head(3)

In [None]:
# look at first 6 fields in the data
list(usdeaths)[:6]

## Make the data geospatial

This can be done:
- directly into points using the Lat/Long coordinates
- by joining the tables to existing polygon data
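
The second option is not carried out in this exercise, but a rough sketch of the idea is shown here with toy tables. The column names `iso3` and `Country_Region` (lookup table) and `iso_a3` (country polygons) are assumptions about the real data, and the counts are invented:

```python
import pandas as pd

# Toy stand-ins for the real tables; all values are invented.
counts = pd.DataFrame({
    "Country/Region": ["US", "Canada", "Mexico"],
    "10/1/20": [100, 20, 30],
})
lookup = pd.DataFrame({
    "Country_Region": ["US", "Canada", "Mexico"],
    "iso3": ["USA", "CAN", "MEX"],
})
polygons = pd.DataFrame({
    "iso_a3": ["USA", "CAN", "MEX"],
    "name": ["United States of America", "Canada", "Mexico"],
})

# Step 1: attach an ISO3 code to every row of the counts table
coded = counts.merge(lookup, left_on="Country/Region",
                     right_on="Country_Region", how="left")

# Step 2: attach the counts to the polygon table through that code
joined = polygons.merge(coded[["iso3", "10/1/20"]],
                        left_on="iso_a3", right_on="iso3", how="left")
print(joined[["name", "10/1/20"]])
```

Matching on a code avoids the mismatch between names such as "US" and "United States of America". With a real GeoDataFrame on the left of the second merge, the geometry column should be carried along with the counts.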

To convert directly into points, create a 'geometry' field from the Lat and Long fields

In [None]:
# make into point geopandas dataframes

usdeaths = gpd.GeoDataFrame(usdeaths, crs=('epsg:4326'), geometry=gpd.points_from_xy(usdeaths.Long_, usdeaths.Lat))
confirmed = gpd.GeoDataFrame(confirmed, crs=('epsg:4326'), geometry=gpd.points_from_xy(confirmed.Long, confirmed.Lat))
gdeaths = gpd.GeoDataFrame(gdeaths, crs=('epsg:4326'), geometry=gpd.points_from_xy(gdeaths.Long, gdeaths.Lat))

A 'geometry' column is at the end of the datasets now

In [None]:
list(gdeaths)[-3:]

In [None]:
list(confirmed)[-3:]

The most recent number of cases or deaths is now in the second-to-last column.

In [None]:
mostrecent = list(usdeaths)[-2] #last item in list is geometry
mostrecent

## Plot the point data
- use the most recent deaths to create graduated sizes

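A basic plot of just the points is easily done, using default colours. The size of the points (markersize) is passed on to matplotlib's scatter plot, where it is an area in points squared (1 point = 1/72 inch), so doubling the value doubles the marker's area rather than its width.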

https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size

https://www.delftstack.com/howto/matplotlib/how-to-set-marker-size-of-scatter-plot-in-matplotlib/

Check out the highest value in the most recent data

In [None]:
gdeaths[mostrecent].max()

May as well look at all the descriptive statistics.

In [None]:
gdeaths.describe()

Try dividing the number of deaths by 500 to size the points. Fancier scaling could be done, but this works for now.

In [None]:
gdeaths.plot(markersize=gdeaths[mostrecent]/500, figsize=(15, 10))
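
One possible form of "fancier scaling" is to rescale the counts so that the largest marker always has the same size, whatever date is plotted. This is only a sketch: `scaled_sizes` is a hypothetical helper and the default of 400 for the largest marker is an arbitrary choice.

```python
import pandas as pd


def scaled_sizes(values, max_size=400.0):
    """Rescale counts so the largest one gets a marker size of max_size.

    Hypothetical helper; negative or missing counts are treated as zero.
    """
    values = pd.Series(values, dtype="float64").fillna(0).clip(lower=0)
    peak = values.max()
    if peak == 0:
        return values  # nothing to scale when every count is zero
    return values * max_size / peak


sizes = scaled_sizes([0, 50, 200, 1000])
print(sizes.tolist())
```

In the notebook this could be used as `gdeaths.plot(markersize=scaled_sizes(gdeaths[mostrecent]))`, which removes the need to pick a new divisor for each dataset.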

## Add a background map

In [None]:
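# get some base data - first, check out the data installed with geopandas
# note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0, so this
# call only works on older versions; newer ones need the Natural Earth file downloaded separately
# Returns path of a particular map
datapath = gpd.datasets.get_path('naturalearth_lowres')
# Opens the map as a GeoDataFrame
world = gpd.read_file(datapath)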

In [None]:
ax = world.plot(figsize=(15, 10))
gdeaths.plot(ax=ax, color='k', markersize=gdeaths[mostrecent]/500)

Not the greatest colours or map projection, but we can worry about that later.

Try plotting the global confirmed cases data

In [None]:
ax = world.plot(figsize=(15, 10))
confirmed.plot(ax=ax, color='k', markersize=confirmed[mostrecent]/500)

Probably should adjust the marker size, but that can happen later.

Have a look at the US data on deaths.

In [None]:
# the data table
usdeaths.head(3)

Plot the data

In [None]:
ax = world.plot(figsize=(15, 10))
usdeaths.plot(ax=ax, color='k', markersize=usdeaths[mostrecent]/500)

## Adjust the map area

We can filter the points by latitude and longitude to focus more on North America.

https://kanoki.org/2020/01/21/pandas-dataframe-filter-with-multiple-conditions/

In [None]:
# get only the US points above the equator and west of Greenwich, England
nw_world_us = usdeaths.loc[(usdeaths["Lat"]>0) & (usdeaths["Long_"]<0)]

nw_world_us.plot()
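
The same filtering pattern extends to a full bounding box. The sketch below uses a toy table with approximate coordinates, and the box of 24 to 50 degrees north and 125 to 66 degrees west is only a rough assumption for the contiguous United States:

```python
import pandas as pd

# Toy points with approximate coordinates, for illustration only
pts = pd.DataFrame({
    "name": ["Seattle", "Honolulu", "Guam", "Miami"],
    "Lat": [47.6, 21.3, 13.4, 25.8],
    "Long_": [-122.3, -157.9, 144.8, -80.2],
})

# Each comparison needs its own parentheses, because & binds more
# tightly than > and < in Python.
in_box = (
    (pts["Lat"] > 24) & (pts["Lat"] < 50)
    & (pts["Long_"] > -125) & (pts["Long_"] < -66)
)
lower48 = pts.loc[in_box]
print(lower48["name"].tolist())
```

Only the points inside the box are kept, so distant territories no longer stretch the map.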

Now plot the data with some background information.

In [None]:
# Subsets the world GeoDataFrame
usa = world[world.name == "United States of America"]

# plot, using the subset points first
ax = nw_world_us.plot(figsize=(15, 10))
usa.plot(ax=ax)
#world.plot(ax=ax, color='grey')
nw_world_us.plot(ax=ax, color='k', markersize=nw_world_us[mostrecent]/50)

## A little fancier styling

A US Map

https://www.earthdatascience.org/courses/scientists-guide-to-plotting-data-in-python/plot-spatial-data/customize-vector-plots/python-change-spatial-extent-of-map-matplotlib-geopandas/

Since the data is in lat/long, it would also be straightforward to set the extent directly with xmin, xmax, etc.

In [None]:
# Get spatial extent  - to zoom in on the map rather than clipping
aoi_bounds = nw_world_us.geometry.total_bounds
# print(f"Figure boundary{aoi_bounds}")

# Create x and y min and max objects to use in the plot boundaries
xmin, ymin, xmax, ymax = aoi_bounds

# Plot the data with a modified spatial extent
fig, ax = plt.subplots(figsize = (10,6))

xlim = ([xmin,  xmax])
ylim = ([ymin,  ymax])

ax.set_xlim(xlim)
ax.set_ylim(ylim)

nw_world_us.plot(color='red', alpha = .5, ax = ax, markersize=nw_world_us[mostrecent]/10)
world.plot(color='grey', ax=ax, alpha=.5)

ax.set(title='US Deaths from Covid \n Zoomed into the continental United States')
ax.set_axis_off()
plt.show()
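
Markers that sit right on the edge of the extent get cut off by the plot frame. A small margin around the bounds can help; the sketch below uses a hypothetical helper, `padded_limits`, and a made-up bounding box in the same `(xmin, ymin, xmax, ymax)` order that `total_bounds` returns:

```python
def padded_limits(bounds, pad_frac=0.05):
    """Widen (xmin, ymin, xmax, ymax) by a fraction of its width and height.

    Hypothetical helper; returns (xlim, ylim) tuples for set_xlim/set_ylim.
    """
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) * pad_frac
    dy = (ymax - ymin) * pad_frac
    return (xmin - dx, xmax + dx), (ymin - dy, ymax + dy)


# Made-up bounds, not the real extent of the US data
xlim, ylim = padded_limits((-120.0, 30.0, -80.0, 50.0), pad_frac=0.1)
print(xlim, ylim)
```

In the cell above, the result could be passed on as `ax.set_xlim(xlim)` and `ax.set_ylim(ylim)` after calling `padded_limits(aoi_bounds)`.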

## Styling a global map

Now, let's return to the global data, and put some text in the figure.

https://matplotlib.org/3.2.2/gallery/pyplots/text_commands.html#sphx-glr-gallery-pyplots-text-commands-py


### Calculate some statistics for the figure

Since the data is there, it is nice to have supporting information for a figure.

- maximum deaths 
- total deaths

- maximum confirmed
- total confirmed

It would be nice to identify the name of the max death and confirmed country also...

Use pandas methods to calculate these.

In [None]:
gdmax = gdeaths[mostrecent].max()
maxdcountry = gdeaths[gdeaths[mostrecent]==gdmax].iloc[0]['Country/Region']
gdsum = gdeaths[mostrecent].sum()

gcmax = confirmed[mostrecent].max()
maxccountry = confirmed[confirmed[mostrecent]==gcmax].iloc[0]['Country/Region']
gcsum= confirmed[mostrecent].sum()
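
The maximum above is taken over rows, and the global tables list some countries by province or state, so the largest single row is not necessarily the country with the largest total. A sketch of the difference, using an invented toy table with made-up country names:

```python
import pandas as pd

# Invented data: one country split into two provinces, one reported whole
toy = pd.DataFrame({
    "Province/State": ["A1", "A2", None],
    "Country/Region": ["Aland", "Aland", "Borduria"],
    "10/1/20": [60, 70, 100],
})
date = "10/1/20"

# Row-wise maximum: the single largest row
row_max_country = toy.loc[toy[date].idxmax(), "Country/Region"]

# Country-wise maximum: add up the provinces first, then compare
by_country = toy.groupby("Country/Region")[date].sum()
top_country = by_country.idxmax()
top_value = by_country.max()

print(row_max_country, top_country, top_value)
```

Here the largest row belongs to one country while the largest total belongs to the other, which is worth keeping in mind when reading the statistics on the figure.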

In [None]:
# check out the country with the recent maximum
maxdcountry

In [None]:
# total global deaths
gdsum

In [None]:
# total global cases
gcsum

In [None]:
# highest country value confirmed
gcmax

In [None]:
# most confirmed cases in
maxccountry

In [None]:
# Plot the data with a modified spatial extent
fig, ax = plt.subplots(figsize = (15,7))

fig.suptitle(f'World Deaths from Covid {mostrecent}', fontsize=14, fontweight='bold', color='red', alpha=.5)




world.plot(color='grey', ax=ax, alpha=.5)

confirmed.plot(ax=ax, color='k', alpha=.5, markersize=confirmed[mostrecent]/1000)

gdeaths.plot(color='red', alpha = .5, ax = ax, markersize=gdeaths[mostrecent]/1000)

ax.set_title(f'Confirmed Covid Cases {mostrecent}', size=14, fontweight='bold', color='k', alpha=.5)


ax.text(0.90, 0.11, f'{gcmax:,} confirmed country highest in {maxccountry} - {gcsum:,} total cases',
        verticalalignment='bottom', horizontalalignment='right',
        transform=ax.transAxes,
        color='grey', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})

ax.text(0.90, 0.01, f'{gdmax:,} deaths country highest in {maxdcountry} - {gdsum:,} total dead \n ',
        verticalalignment='bottom', horizontalalignment='right',
        transform=ax.transAxes,
        color='red', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})

ax.set_axis_off()
plt.savefig('CovidPlot.pdf')
plt.show()

## Make a function

That looks okay - now to adapt the code into a function so that any date could be plotted automatically.

In [None]:
def plotdate(targ_date):

  # calculate values for the text and title
  gdmax = gdeaths[targ_date].max()
  maxdcountry = gdeaths[gdeaths[targ_date]==gdmax].iloc[0]['Country/Region']
  gdsum = gdeaths[targ_date].sum()

  gcmax = confirmed[targ_date].max()
  maxccountry = confirmed[confirmed[targ_date]==gcmax].iloc[0]['Country/Region']
  gcsum= confirmed[targ_date].sum()

  # Plot the data 
  fig, ax = plt.subplots(figsize = (15,7))

  # title and subtitle
  fig.suptitle(f'World Deaths from Covid {targ_date}', fontsize=14, fontweight='bold', color='red', alpha=.5)

  ax.set_title(f'Confirmed Covid Cases {targ_date}', size=14, fontweight='bold', color='k', alpha=.5)

  world.plot(color='grey', ax=ax, alpha=.5)

  confirmed.plot(ax=ax, color='k', alpha=.5, markersize=confirmed[targ_date]/1000)

  gdeaths.plot(color='red', alpha = .5, ax = ax, markersize=gdeaths[targ_date]/1000)

  # Text for map statistics
  ax.text(0.90, 0.11, f'{gdmax:,} deaths country highest in {maxdcountry} - {gdsum:,} total dead \n ',
          verticalalignment='bottom', horizontalalignment='right',
          transform=ax.transAxes,
          color='red', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})
  
  ax.text(0.90, 0.08, f'{gcmax:,} confirmed country highest in {maxccountry} - {gcsum:,} total cases',
          verticalalignment='bottom', horizontalalignment='right',
          transform=ax.transAxes,
          color='grey', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})

  ax.set_axis_off()
  plt.show()

Check what is different in the function compared to the code above!

Now, see if the function works...

In [None]:
plotdate('1/28/20')

In [None]:
plotdate('2/28/20')
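
The function only works when the date string matches a column label exactly, and the labels use the month/day/year form without leading zeros, as in '1/28/20'. A sketch of a check that could be run first is shown below; the helper names `date_columns` and `check_date` are hypothetical and the column list is a shortened stand-in for the real one:

```python
import re

# Labels such as 1/22/20 (month/day/two-digit year)
DATE_LABEL = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")


def date_columns(columns):
    """Return the column labels that look like dates, in their original order."""
    return [c for c in columns if isinstance(c, str) and DATE_LABEL.fullmatch(c)]


def check_date(targ_date, columns):
    """Return targ_date if it is one of the date columns, else raise KeyError."""
    valid = date_columns(columns)
    if targ_date not in valid:
        raise KeyError(f"{targ_date!r} is not one of the {len(valid)} date columns")
    return targ_date


# Shortened stand-in for list(confirmed)
cols = ["Province/State", "Country/Region", "Lat", "Long",
        "1/22/20", "1/23/20", "geometry"]
valid = date_columns(cols)
print(valid)
```

Calling `check_date('01/22/20', cols)` raises a `KeyError`, because the padded form is not a label in the table even though it names the same day.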

## Plot a time series from the data

Since the data is a time series, we might as well have a look at this aspect as well.

The table is convenient, but not formatted for easy plotting over time, so we will have to do some manipulations to easily plot a time series for a country.

https://towardsdatascience.com/how-to-plot-time-series-86b5358197d6

In [None]:
us = confirmed[confirmed['Country/Region'] == 'US']
#confirmed.iloc[]
us

Now that we have a single country, we can pull the pertinent data in several ways.

In [None]:
# array of values from the columns by index number (selecting rows, then columns)
usvalues = us.iloc[0, 4:-1].values

In [None]:
usvalues

In [None]:
plt.plot(usvalues)

In [None]:
# plot the first row of confirmed data
plt.plot(confirmed.iloc[0, 4:-1].values)

### Top 5 for a date

There are other ways to get this data, and maintain a connection to the 'date' information as well.

First, we can get the top 5 (or however many) values for a particular date fairly easily.

In [None]:
top5 = confirmed.sort_values(by=['10/1/20'], ascending=False).head(5)
top5



Now, access the most recent date and plot the data over time. The code below creates a new dataframe from the information, drops the unneeded columns, and plots the information.

In [None]:
df = confirmed.sort_values(by=[mostrecent], ascending=False).head(5)
df = pd.DataFrame(df.drop(columns=['Lat', 'Long', 'Province/State', 'geometry']))
df = df.pivot_table(columns=['Country/Region'])
# Convert the index to dates and put it in chronological order
df.index = pd.to_datetime(df.index)
df = df.sort_index()
df.plot(figsize=(15,3))
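
Another way to reshape the table is `melt`, which turns the wide layout (one column per date) into a long one (one row per country and date). The sketch below uses invented numbers and made-up country names, and it assumes the counts are running totals, so that the difference between consecutive days gives the new cases per day:

```python
import pandas as pd

# Invented wide table in the same layout as the real one
wide = pd.DataFrame({
    "Country/Region": ["Aland", "Borduria"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
    "1/24/20": [6, 2],
})

# One row per country and date
tidy = wide.melt(id_vars="Country/Region", var_name="date",
                 value_name="cumulative")
tidy["date"] = pd.to_datetime(tidy["date"], format="%m/%d/%y")
tidy = tidy.sort_values(["Country/Region", "date"]).reset_index(drop=True)

# Day-to-day change within each country (first day has no previous value)
tidy["new"] = tidy.groupby("Country/Region")["cumulative"].diff()
print(tidy)
```

The long form keeps the date attached to every value, which makes filtering by date range or plotting daily changes more direct than working with column positions.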

### Adapt the code above into a function

In [None]:
def plotdate_chart(targ_date):
  import matplotlib.gridspec as gridspec

  # calculate values for the text and title
  gdmax = gdeaths[targ_date].max()
  maxdcountry = gdeaths[gdeaths[targ_date]==gdmax].iloc[0]['Country/Region']
  gdsum = gdeaths[targ_date].sum()

  gcmax = confirmed[targ_date].max()
  maxccountry = confirmed[confirmed[targ_date]==gcmax].iloc[0]['Country/Region']
  gcsum= confirmed[targ_date].sum()

  # a tall map panel above a shorter time series chart
  fig = plt.figure(figsize=(12,10))

  gs = gridspec.GridSpec(nrows=2, ncols=1, height_ratios=[3, 1])

  ax1 = fig.add_subplot(gs[0, 0])
  ax2 = fig.add_subplot(gs[1, 0])

  # title and subtitle
  fig.suptitle(f'World Deaths from Covid {targ_date}', fontsize=14, fontweight='bold', color='red', alpha=.5)

  ax1.set_title(f'Confirmed Covid Cases {targ_date}', size=14, fontweight='bold', color='k', alpha=.5)
  

  world.plot(color='grey', ax=ax1, alpha=.5)

  confirmed.plot(ax=ax1, color='k', alpha=.5, markersize=confirmed[targ_date]/1000)

  gdeaths.plot(color='red', alpha = .5, ax = ax1, markersize=gdeaths[targ_date]/1000)

  # Text for map statistics
  ax1.text(0.90, 0.11, f'{gdmax:,} deaths country highest in {maxdcountry} - {gdsum:,} total dead \n ',
          verticalalignment='bottom', horizontalalignment='right',
          transform=ax1.transAxes,
          color='red', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})
  
  ax1.text(0.90, 0.08, f'{gcmax:,} confirmed country highest in {maxccountry} - {gcsum:,} total cases',
          verticalalignment='bottom', horizontalalignment='right',
          transform=ax1.transAxes,
          color='grey', fontsize=15, bbox={'color':'white', 'alpha':0.9, 'pad':2})

  #ax1.set_axis_off()

  df = confirmed.sort_values(by=[targ_date], ascending=False).head(5)
  df = pd.DataFrame(df.drop(columns=['Lat', 'Long', 'Province/State', 'geometry']))
  df = df.rename(columns={'Country/Region':'date'})
  df = df.pivot_table(columns=['date']) 
  # Convert the index to dates and put it in chronological order
  df.index = pd.to_datetime(df.index)
  df = df.sort_index()

  # filter the data by date
  # df = df.loc[:pd.to_datetime(targ_date)]


  df.plot(ax=ax2)
  plt.show()

In [None]:
plotdate_chart(mostrecent)

https://python4astronomers.github.io/plotting/advanced.html

In [None]:
plotdate_chart('6/6/20')

In [None]:
plotdate_chart('4/4/20')

## Look at it with Folium

Another way to look at the data is with an online map. We can use the 'folium' module to do this. The code may look a bit intimidating, but check it out.

You can also look at this article, or search online for more references: 

- Data Visualization with Python Folium Maps
- https://towardsdatascience.com/data-visualization-with-python-folium-maps-a74231de9ef7

In [None]:
# Make an empty map
m = folium.Map(width=1000,height=500,location=[20,0], zoom_start=2)

date = '10/29/20'
loc = 'Covid-19 Confirmed Cases'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b> - {}</h3>
             '''.format(loc,date)  
m.get_root().html.add_child(folium.Element(title_html))

# get the data, skipping any rows without coordinates (folium cannot place them)
data = confirmed.dropna(subset=['Lat', 'Long'])
# add the markers to the map one by one
for i in range(0,len(data)):
   folium.Circle(
      location=[data.iloc[i]['Lat'], data.iloc[i]['Long']],
      popup=data.iloc[i]['Country/Region'],
      radius=data.iloc[i][date]/10,
      color='crimson',
      fill=True,
      fill_color='crimson'
   ).add_to(m)
 
# Save it as html
m.save('GlobalCovid.html')

# view the map
m
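
As a starting point for richer pop-ups, the marker settings can be prepared separately from the map itself. The sketch below builds one dictionary per location from an invented toy table; `circle_specs` is a hypothetical helper, and the pop-up wording and the divisor of 10 are just choices carried over from the cell above:

```python
import pandas as pd

# Invented data, including one row without a usable location
toy = pd.DataFrame({
    "Country/Region": ["Aland", "Borduria", "Nowhere"],
    "Lat": [10.0, -5.0, float("nan")],
    "Long": [20.0, 30.0, 0.0],
    "10/29/20": [1500, 200, 7],
})
date = "10/29/20"


def circle_specs(data, date, scale=10):
    """Build keyword arguments (location, popup, radius) for each marker."""
    usable = data.dropna(subset=["Lat", "Long"])  # skip rows with no coordinates
    specs = []
    for _, row in usable.iterrows():
        specs.append({
            "location": [row["Lat"], row["Long"]],
            "popup": f"{row['Country/Region']}: {int(row[date]):,} on {date}",
            "radius": row[date] / scale,
        })
    return specs


specs = circle_specs(toy, date)
for spec in specs:
    print(spec)
```

Each dictionary could then be passed to folium, for example `folium.Circle(color='crimson', fill=True, **spec).add_to(m)`, so that clicking a circle shows the count as well as the country name.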

### The End

The last command should have also created an html file in your workspace called "GlobalCovid.html" - download this and have a look at it.

This part of the exercise is over. Your task now is to modify or extend some of the code above to plot the data for the usdeaths instead of for global deaths or global confirmed cases.

The folium map could be adapted by adding 'pop-ups' or changing the display.