# Class 11: part 2 

NYU Tandon C4SUE @avigailvantu , April 2021

## Polygons, merging, more mapping features, and creating choropleth maps 

In this notebook we return to the NYT COVID-19 data that we worked with in the past few classes. 

In [None]:
import pandas as pd
import geopandas as gpd
import numpy as np
%matplotlib inline 

import matplotlib.pyplot as plt
from shapely.geometry import Point
from geopandas import GeoDataFrame
from shapely.geometry import MultiPolygon


## State boundaries from the Census Bureau 

The boundary files can be downloaded here: 

https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

In [None]:
#load geodataframe
states = gpd.read_file('cb_2018_us_state_500k')

In [None]:
states.head() 

In [None]:
states.plot(color='red', figsize=(30, 12))
plt.axis('off')
plt.show()

In [None]:
states.plot(column='NAME', legend=False, figsize=(40, 10))
plt.axis('off')
plt.show()

## COVID-19 data from the NYT 

My copy of the data is up to date as of April 25th, 2021, the date used in the filter below. 

https://github.com/nytimes/covid-19-data

In [None]:
#Load data

covidUS = pd.read_csv('us-states.csv')

In [None]:
covidUS.shape

In [None]:
#let's see the data  

covidUS.head()

In [None]:
covidUS.tail()

# Create a DataFrame for April 25th, the last day in the data

In [None]:
#filter as of the last day in the data

april2521 = covidUS[covidUS['date']== '2021-04-25']

In [None]:
#check out the new filtered data
april2521.head()
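
Hard-coding the date works, but the filter silently returns an empty DataFrame if the file does not contain that day. As a minimal sketch (the rows below are made up, and `toy`, `last_day`, and `latest` are names introduced only for this illustration), the last day can also be picked programmatically:

```python
import pandas as pd

# Made-up rows in the same shape as us-states.csv (date, state, cases)
toy = pd.DataFrame({
    'date': ['2021-04-24', '2021-04-24', '2021-04-25', '2021-04-25'],
    'state': ['New York', 'New Jersey', 'New York', 'New Jersey'],
    'cases': [10, 5, 12, 6],
})

# ISO dates (YYYY-MM-DD) sort chronologically even as strings,
# so max() gives the most recent day in the data
last_day = toy['date'].max()

# Keep only the rows for that day
latest = toy[toy['date'] == last_day]
print(last_day)
print(latest)
```

The same two lines should work on `covidUS`, assuming its `date` column uses the YYYY-MM-DD format shown in `covidUS.head()`.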

# Time to merge data: 

We will merge the data so that the COVID-19 counts are attached to the US state boundaries. 

The merge function works the same way for DataFrames and GeoDataFrames. It joins two datasets based on a shared attribute (column). Because our data is organized by state, we will merge on the state column. With geographical data, we join the dataset without geographical attributes (the COVID-19 data, in our case) onto the layer that has the geometry column (the state shapefile), so that the result is still a GeoDataFrame. 
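
To make the join concrete, here is a small sketch with made-up tables (the geometry column is left out for brevity, and the names `boundaries`, `cases`, `inner`, and `checked` are introduced only for this illustration). It shows that `merge` defaults to an inner join, so rows whose key appears in only one table are dropped, and how `how='left'` with `indicator=True` can reveal which boundary rows found no match:

```python
import pandas as pd

# Toy stand-in for the boundary layer (no geometry column here)
boundaries = pd.DataFrame({
    'NAME': ['New York', 'New Jersey', 'Guam'],
    'STUSPS': ['NY', 'NJ', 'GU'],
})

# Toy stand-in for one day of case counts (made-up numbers)
cases = pd.DataFrame({
    'state': ['New York', 'New Jersey', 'Connecticut'],
    'cases': [100, 50, 25],
})

# Give both tables a key column with the same name
boundaries['state'] = boundaries['NAME']

# Default merge is an inner join: only keys present in both tables survive
inner = boundaries.merge(cases, on='state')

# A left join keeps every boundary row; indicator=True adds a '_merge' column
checked = boundaries.merge(cases, on='state', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'state'].tolist()

print(inner[['state', 'cases']])
print('Boundary rows without case data:', unmatched)
```

Running a check like this before plotting is a quick way to catch spelling differences between the two key columns.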


    

In [None]:
# copy NAME into a new 'state' column so both datasets share a key column with the same name
states['state'] = states['NAME']

In [None]:
# time to merge: 

states = states.merge(april2521, on='state')

In [None]:
#Check out merged data:

states.head()

## Visualize the merged data: 



In [None]:
states.plot(column='cases',figsize=(30, 10),cmap='PuRd')

plt.title('cases')
plt.show()


## Set a map x and y range

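By default, the plot covers the full extent of the data, which is often not the ideal range for the map. Setting the x (longitude) and y (latitude) limits lets us focus on the area we care about. For example: 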


In [None]:
states.plot(column='cases',figsize=(60, 10),cmap='PuRd')

plt.title('cases by state')
plt.ylim((23,50))
plt.xlim((-130,-66))
plt.show()

## And final touches:

- remove the axis
- adjust the title, its size, and its color
- add a legend


In [None]:
states.plot(column='cases',legend=True, figsize=(60, 10),cmap='PuRd')
plt.axis('off')
plt.title('COVID-19 cases by state as of April 25th', fontsize=23, color ='purple')
plt.ylim((23,50))
plt.xlim((-130,-66))
plt.show()

# Let's zoom into the tri-state area 

To do so, let's first plot the map with the axes visible again so we can see the range of coordinates we want to zoom into.

In [None]:
states.plot(column='cases',legend=True, figsize=(60, 10),cmap='PuRd')
#plt.axis('off')
plt.title('COVID-19 cases by state as of April 25th', fontsize=23, color ='purple')
#this is the y range
plt.ylim((23,50))
#this is the x range 
plt.xlim((-130,-66))
plt.show()

In [None]:
states.plot(column='cases',legend=True, figsize=(60, 10),cmap='PuRd')
#plt.axis('off')
plt.title('Tri-state area COVID-19', fontsize=23, color ='purple')
plt.ylim((35,50))
plt.xlim((-80,-66))
plt.show()
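
Instead of reading the limits off the axes by eye, the bounding box of a selection can be computed with the GeoDataFrame `total_bounds` attribute, which returns `(minx, miny, maxx, maxy)`. The sketch below uses hypothetical rectangles rather than real state polygons, and `toy_states`, `subset`, and `pad` are names introduced only for this illustration:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical rectangles standing in for state polygons (not real boundaries)
toy_states = gpd.GeoDataFrame(
    {'state': ['A', 'B', 'C']},
    geometry=[
        box(-80, 38, -75, 42),
        box(-75, 40, -72, 45),
        box(-120, 30, -115, 35),
    ],
    crs='EPSG:4326',
)

# Select the states of interest, then get their combined bounding box
subset = toy_states[toy_states['state'].isin(['A', 'B'])]
minx, miny, maxx, maxy = subset.total_bounds

# Add a small margin so the shapes do not touch the edge of the plot
pad = 1
xlim = (minx - pad, maxx + pad)
ylim = (miny - pad, maxy + pad)
print(xlim, ylim)
```

The resulting tuples can be passed straight to `plt.xlim(xlim)` and `plt.ylim(ylim)`. With the merged `states` data, the selection would be made on the `state` column in the same way.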

# Now Let's Plot Multiple Plots Side-By-Side 

In [None]:
# create one figure with two axes placed side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(30, 10))

#1. tri-state area, drawn on the left axis
states.plot(column='cases', legend=True, cmap='summer', ax=ax1)
ax1.set_title('Tri-state area COVID-19', fontsize=23, color='green')
ax1.set_ylim((35, 50))
ax1.set_xlim((-80, -66))

#2. west coast, drawn on the right axis
states.plot(column='cases', cmap='summer', ax=ax2)
ax2.set_title('West Coast COVID-19', fontsize=23, color='green')
ax2.set_ylim((28, 50))
ax2.set_xlim((-125, -110))

plt.show()



# Task 1: 

Can you create 2 plots of the cases in the US:
1. For April 1st, 2020
2. For April 1st, 2021 

* What are some of the trends you found? Which trends have remained the same and which have changed in this past year? 

# Task 2: 

Can you visualize one state's cases only? 

# Task 3: 

Using the counties data below, can you merge it with New York State's county boundaries and visualize the number of cases on April 25th, 2021? 

If so, what were the steps that led you there? If not, what are some of the issues you faced and the techniques you looked into? 

Suggested places to look for a counties shapefile:
1. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-state-new-york-current-county-subdivision-state-based 
2. https://cugir.library.cornell.edu/catalog/cugir-007865 


In [None]:
counties = pd.read_csv('us-counties.csv')
counties[counties['state']=='New York']

In [None]:
counties[counties['date']=='2021-03-01']

In [None]:
#your code here... 