# POINT PATTERN ANALYSIS OF THE SPREAD OF COVID-19 IN THE UNITED STATES OF AMERICA (USA), 2020-2022

## INTRODUCTION
Coronavirus disease 2019 (referred to as C-19 from here on) is a viral disease, spread mainly through close contact between people, that ravaged the world from the end of 2019 through late 2022. The USA was one of the hardest-hit countries, with close to 1.2 million deaths.
The aim of this analysis is to look for spatial relationships in the reported C-19 figures and in how the pandemic spread across the country.

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import os

## EXPLORATORY DATA ANALYSIS

### _Data_
The USA is a very large country with over 3,000 counties, so the data here is reduced to the mainland (contiguous) US. This supports the spatial correlation analysis and avoids errors caused by a lack of spatial continuity between disjoint regions. It also keeps the maps compact, since US territory spans from Alaska to the Virgin Islands.
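
To make the reduction concrete, the sketch below applies a latitude/longitude bounding box to a handful of hand-entered points. The coordinates are rough, illustrative values rather than rows from the dataset, the box limits (24 to 50 degrees north, 125 to 66 degrees west) are the ones used in the filtering cell of this notebook, and `in_contiguous_box` is a hypothetical helper name.

```python
# Rough, illustrative coordinates (not rows from the dataset)
places = [
    {"name": "Cook, Illinois", "lat": 41.8, "lon": -87.7},
    {"name": "Miami-Dade, Florida", "lat": 25.6, "lon": -80.5},
    {"name": "Anchorage, Alaska", "lat": 61.2, "lon": -149.9},
    {"name": "Honolulu, Hawaii", "lat": 21.3, "lon": -157.9},
    {"name": "St. Thomas, Virgin Islands", "lat": 18.3, "lon": -64.9},
]


def in_contiguous_box(lat, lon):
    """True when a point falls inside the box 24..50 N, 125..66 W (inclusive)."""
    return 24 <= lat <= 50 and -125 <= lon <= -66


# Keep only the points inside the box; Alaska, Hawaii and the islands drop out
kept = [p["name"] for p in places if in_contiguous_box(p["lat"], p["lon"])]
print(kept)
```

A box like this is only a coarse filter, which is why the notebook also filters by state name afterwards.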

In [2]:
us_covid_link = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv"
us_covid_df = pd.read_csv(us_covid_link)
us_states_gdf = gpd.read_file("https://raw.githubusercontent.com/roywaswa/pattern_analysis_us/main/data/US_State_Boundaries/US_State_Boundaries.shp")
us_counties_gdf = gpd.read_file("C:/Users/roywa/GIS_RS/Data/shapefiles/USA_Counties_626072402819112956/USA_Counties-wgs84.shp")

# Remove data whose LAT and LONG are not within the US
us_covid_df = us_covid_df[us_covid_df.Lat.between(24, 50) & us_covid_df.Long_.between(-125, -66)]
# Keep only the counties whose state appears in the filtered COVID data
us_counties_gdf = us_counties_gdf[us_counties_gdf.STATE_NAME.isin(us_covid_df.Province_State)]
us_covid_df = us_covid_df.drop(columns=["UID", "iso2", "iso3", "code3", "FIPS", "Admin2", "Country_Region", "Combined_Key"])
all_dates = list(us_covid_df.columns[4:])

In [3]:
# Select first date of each month
first_dates = [date for date in all_dates if date.split("/")[1] == "1"]
us_covid_df = us_covid_df[["Province_State", "Lat", "Long_","Population", *first_dates]]

# Filter by mainland US states
state_names = list(us_states_gdf['NAME'])
us_covid_df = us_covid_df[us_covid_df['Province_State'].isin(state_names)]

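# Build a point GeoDataFrame from the Lat/Long_ columns of the COVID data
us_covid_gdf = gpd.GeoDataFrame(us_covid_df, geometry=gpd.points_from_xy(us_covid_df.Long_, us_covid_df.Lat))
# Assumes the state boundaries use the same geographic CRS as the Lat/Long_ values
us_covid_gdf = us_covid_gdf.set_crs(us_states_gdf.crs)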
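
The first-of-month selection above compares the day part of each column label as a string, which works because the source CSV uses unpadded `m/d/yy` labels. As a hedged alternative, the labels could be parsed as dates, which would also cope with zero-padded days. The label list below is a hypothetical sample and `is_first_of_month` is an illustrative helper, not part of the original notebook.

```python
from datetime import datetime


def is_first_of_month(label):
    """True when an m/d/yy column label falls on day 1 of its month."""
    return datetime.strptime(label, "%m/%d/%y").day == 1


# Hypothetical sample of column labels in the CSV's m/d/yy style
labels = ["1/22/20", "2/1/20", "2/11/20", "11/1/20", "1/1/21", "1/10/21"]
first_of_month = [label for label in labels if is_first_of_month(label)]
print(first_of_month)
```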

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
us_counties_gdf.plot(ax=ax, color='white', edgecolor='black')
us_covid_gdf.plot(ax=ax, color='red', markersize=10)
plt.show()

In [None]:
# Spatially join counties to the COVID points that fall inside them
us_covid_counties_gdf = gpd.sjoin(us_counties_gdf, us_covid_gdf, how='inner', predicate='intersects')
fig, ax = plt.subplots(1, 1, figsize=(12, 12))
# Draw on the axes created above rather than on a new implicit figure
us_covid_counties_gdf.plot(ax=ax)
plt.show()


### Exploratory Date

September 1, 2020 is used as a sample date for an exploratory look at the spatial metrics. The analysis will later be extended iteratively to every first-of-month date in `first_dates` across the 2020-2022 study period.
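
One way the later month-by-month expansion could be organised is a loop that computes one global statistic per date column. The sketch below uses a hand-rolled global Moran's I with binary weights on made-up values for four hypothetical regions in a row. It is an illustration of the idea only: `morans_i`, `neighbours` and `toy_counts` are assumed names, and because library implementations may standardise the weights differently, the numbers are not expected to match library output.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights given as an adjacency dict."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Cross-products of deviations over every neighbouring pair (i, j)
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    s0 = sum(len(js) for js in neighbours.values())  # sum of all weights
    return (n / s0) * cross / sum(d * d for d in z)


# Four hypothetical regions in a row: 0 - 1 - 2 - 3
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Made-up values per first-of-month column, for illustration only
toy_counts = {
    "9/1/20": [10, 10, 0, 0],   # similar values sit next to each other
    "10/1/20": [10, 0, 10, 0],  # values alternate between neighbours
}

# One statistic per date column
results = {date: morans_i(vals, neighbours) for date, vals in toy_counts.items()}
for date, value in results.items():
    print(date, round(value, 3))
```

Clustered values give a positive statistic and alternating values a negative one, which is the contrast the per-month comparison is meant to surface.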

In [None]:
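all_columns = list(us_covid_counties_gdf.columns)
# Keep every column up to and including "Population" (county attributes, geometry, location)
rel_columns = all_columns[:all_columns.index("Population")+1]
# Add the single date column used for the exploratory analysis
rel_columns.append("9/1/20")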

In [None]:
us_cc_sep = us_covid_counties_gdf[rel_columns]
# Plot choropleth map of the deaths on 9/1/20
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
us_counties_gdf.plot(ax=ax, color='white', edgecolor='black')
us_cc_sep.plot(column="9/1/20", ax=ax, legend=True, cmap="OrRd")
plt.show()


In [None]:
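# Plot one small choropleth per first-of-month date instead of overdrawing a single axis
# (each panel uses its own colour scale)
ncols = 6
nrows = -(-len(first_dates) // ncols)  # ceiling division
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
for ax, date in zip(axes.ravel(), first_dates):
    us_covid_counties_gdf.plot(column=date, ax=ax, cmap="OrRd")
    ax.set_title(date)
# Switch off the axes frames, including any unused panels
for ax in axes.ravel():
    ax.set_axis_off()

plt.show()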

In [None]:
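from libpysal import weights
from esda.moran import Moran, Moran_Local


In [None]:
# Queen contiguity: counties sharing an edge or a vertex are neighbours
weighting = weights.Queen.from_dataframe(us_cc_sep)
# Global Moran's I for the 9/1/20 values
morans = Moran(us_cc_sep["9/1/20"], weighting)
print(morans.I)

In [None]:
# Pseudo p-value from the permutation test
morans.p_sim

In [None]:
# Local Moran's I (LISA) on the same variable and weights
lisa = Moran_Local(us_cc_sep["9/1/20"], weighting)
# Explore lisa.Is (local Moran's I values) and lisa.q (quadrants)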
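
To show what the local values and quadrants describe, here is a minimal hand-rolled sketch on made-up data for six hypothetical regions in a row. It uses one common textbook form of the local statistic (deviation times the neighbour-averaged deviation, divided by the variance); scaling conventions differ between implementations, so only the signs and quadrant labels are meant to be comparable. `local_moran` is an assumed name, and the 1 to 4 coding follows the order esda documents for `q` (1 HH, 2 LH, 3 LL, 4 HL).

```python
def local_moran(values, neighbours):
    """Return (local I, quadrant label) per region, using a row-standardised lag."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the deviations
    out = []
    for i in range(n):
        # Spatial lag: average deviation among the neighbours of region i
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        # First letter: the region's own value; second letter: its neighbours
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        out.append((z[i] * lag / m2, label))
    return out


# Six hypothetical regions in a row: 0 - 1 - 2 - 3 - 4 - 5
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [10, 9, 1, 8, 2, 1]  # made-up counts, for illustration only

result = local_moran(values, neighbours)
labels = [label for _, label in result]
# Quadrant codes in the order esda documents them: 1 HH, 2 LH, 3 LL, 4 HL
codes = [{"HH": 1, "LH": 2, "LL": 3, "HL": 4}[label] for label in labels]
print(labels)
print(codes)
```

Regions labelled HH or LL resemble their neighbours and get a positive local value, while LH and HL regions are spatial outliers with a negative one.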
