# Identifying the Risk of COVID-19 and North Atlantic Hurricanes 

**Authors**: Jocelyn Lutes, Quinton Lopez, Uriel Eckmann

## Problem Statement
---
The East Coast endures more storms each year than any other area of the country. The warm temperatures of the Atlantic increase the power of hurricanes. Combined with prevailing winds that carry storms from east to west after they form in the eastern Atlantic, this threatens residents' homes, businesses, and way of life every hurricane season. From June 1 to November 30, citizens must remain aware of hazardous weather conditions. This year, an unexpected global pandemic could cause even more worry. In the United States, the number of confirmed cases has passed 5 million and deaths have exceeded 160,000. There is no vaccine yet, and close proximity among large groups of people significantly increases the chances of becoming infected. Identifying the areas at risk for both COVID-19 and natural disasters can help provide proper guidance for evacuating families and prevent the further spread of COVID-19.

In this project, we will use clustering algorithms to attempt to identify clusters based on historical risk for tropical storms and current risk of COVID-19.

## Executive Summary

## Table of Contents
---
- [Imports](#Imports)
- [Read-In Data](#Read-In-Data)
- [Data Cleaning](#Data-Cleaning)
- [Feature Engineering](#Feature-Engineering)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [Model Preparation](#Model-Preparation)
- [Modeling](#Modeling)
- [Model Selection](#Model-Selection)
- [Model Evaluation and Interpretation](#Model-Evaluation-and-Interpretation)
- [Conclusion & Recommendations](#Conclusion-&-Recommendations)
- [References](#References)

## Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score
import plotly.express as px
import plotly.figure_factory as ff

RANDOM_STATE = 42


## Read-In Data

For this notebook, we will rely on the hurricane data from IBTrACS, the COVID-19 data from the New York Times, and the aggregate data that we engineered from these two sources.

### Merge Aggregate Tropical Storms Data with COVID-19 Data

#### Read - In Data

In [None]:
aggregate_storm_data  = pd.read_csv('../data/aggregate_storm_data.csv')
covid_hurricane_states = pd.read_csv('../data/covid_hurricane_states.csv')

In [None]:
aggregate_storm_data.shape

#### Merge Data

In [None]:
hurricanes_and_covid = pd.merge(left = aggregate_storm_data, right = covid_hurricane_states[['state', 'county', 'cases', 'deaths', 'previous_cases', 'previous_deaths','2019_population', 'change_in_case_ratio' ]], how = 'left', on = ['state', 'county'])

#### Select Columns

In [None]:
location_info = ['state', 'county']
hurricane_features = ['cat_1_count', 'cat_2_count', 'cat_3_count', 'cat_4_count', 'cat_5_count', 'hurricane_count', 'tropical_storm_count', 'extratropical_system_count', 'tropical_depression_count', 'low_count', 'subtropical_depression_count', 'dissipating_storm_count']
covid_features = ['cases', 'deaths', 'previous_cases', 'previous_deaths','2019_population','change_in_case_ratio']

features_to_keep = location_info + hurricane_features + covid_features

hurricanes_and_covid = hurricanes_and_covid[features_to_keep]

#### Rename Columns

In [None]:
hurricanes_and_covid.rename(columns = {'date_x':'current_date', 
                                         'cases':'current_cases',
                                         'deaths': 'current_deaths',
                                         'cases_per_100000': 'current_cases_per_100000',
                                         'date_y':'one_week_ago_date'}, inplace = True)

In [None]:
hurricanes_and_covid.shape

### Tropical Storms/ Hurricane Data

In [None]:
geo_df_usa = pd.read_csv('../data/geo_usa.csv')
geo_df_usa.drop(columns = 'Unnamed: 0', inplace = True)
geo_df_usa.head(2)

### COVID-19 Data for States with a History of Tropical Storms

In [None]:
covid_hurricane_states = pd.read_csv('../data/covid_hurricane_states.csv')
covid_hurricane_states.head(3)

## Read-In Geopandas Data

### USA - States

In [None]:
usa_states = gpd.read_file('../maps/states_21basic/states.shp')
usa_states.head(1)

### USA - Counties

In [None]:
usa_counties = gpd.read_file('../maps/cb_2018_us_county_20m/cb_2018_us_county_20m.shp')
usa_counties.head(1)

## Data Cleaning

### Aggregate Data

In [None]:
hurricanes_and_covid.isna().sum()

The null values correspond to counties with missing COVID-19 data, so these rows will be dropped.

In [None]:
hurricanes_and_covid.dropna(inplace = True)
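
Before dropping rows, it can help to confirm that the nulls really come from counties with no match in the COVID-19 table. The sketch below uses the `indicator` option of `pd.merge` on two small, made-up frames (`storms` and `covid` are hypothetical stand-ins, not the project data) to list the unmatched counties.

```python
import pandas as pd

# Hypothetical stand-ins for the storm and COVID-19 tables
storms = pd.DataFrame({
    'state': ['Florida', 'Florida', 'Texas'],
    'county': ['Miami-Dade', 'Monroe', 'Harris'],
    'hurricane_count': [10, 8, 5],
})
covid = pd.DataFrame({
    'state': ['Florida', 'Texas'],
    'county': ['Miami-Dade', 'Harris'],
    'cases': [1000, 800],
})

# indicator=True adds a '_merge' column that records where each row came from
merged = pd.merge(storms, covid, how='left', on=['state', 'county'], indicator=True)

# Rows marked 'left_only' had no COVID-19 match and will hold NaN in 'cases'
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['state', 'county']])
```

If the unmatched list contains counties that should have data, a naming mismatch between the two sources is a likely cause.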

### COVID-19 for States with History of Tropical Storms

In [None]:
covid_hurricane_states.isna().sum()

Many of these null values correspond to "Unknown" counties, so these rows will be dropped.

In [None]:
covid_hurricane_states.dropna(inplace = True)

### Historical Data for North Atlantic Tropical Storms/ Hurricanes

In [None]:
geo_df_usa.isna().sum()

Because we will only be using this data for plotting, we will be able to ignore missing values.

## Feature Engineering

### Change in Deaths Ratio

In [None]:
hurricanes_and_covid['change_in_deaths_ratio'] = ((hurricanes_and_covid['current_deaths'] - hurricanes_and_covid['previous_deaths']) / hurricanes_and_covid['2019_population']) * 100_000

In [None]:
hurricanes_and_covid.head(2)

### Tropical Storm Composite Score

In addition to the aggregate storm features and COVID-19 features that we have already engineered, we also wanted to engineer a composite feature to capture each county's overall historical risk of tropical storms. Because Category 3, Category 4, and Category 5 hurricanes are the most destructive, we wanted these storms to carry a higher weight in our composite score.

In [None]:
hurricanes_and_covid['storm_composite'] = hurricanes_and_covid['cat_1_count'] + hurricanes_and_covid['cat_2_count'] + (hurricanes_and_covid['cat_3_count'] ** 2) + (hurricanes_and_covid['cat_4_count'] ** 2) + (hurricanes_and_covid['cat_5_count'] ** 2) + hurricanes_and_covid['tropical_storm_count'] + hurricanes_and_covid['tropical_depression_count']
hurricanes_and_covid['storm_composite'].describe()

We took the sum of the storm counts, squaring the counts of Category 3, Category 4, and Category 5 storms to give them extra weight.
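
As a worked example of the weighting, the sketch below applies the same formula to three made-up counties (the `toy` frame and its values are illustrative only).

```python
import pandas as pd

# Toy counts for three hypothetical counties (not real data)
toy = pd.DataFrame({
    'cat_1_count': [1, 0, 2],
    'cat_2_count': [0, 0, 1],
    'cat_3_count': [0, 0, 2],
    'cat_4_count': [0, 0, 1],
    'cat_5_count': [0, 0, 0],
    'tropical_storm_count': [2, 1, 5],
    'tropical_depression_count': [1, 0, 3],
})

linear_cols = ['cat_1_count', 'cat_2_count', 'tropical_storm_count', 'tropical_depression_count']
squared_cols = ['cat_3_count', 'cat_4_count', 'cat_5_count']

# Weaker systems are summed as-is; major-hurricane counts are squared before summing
toy['storm_composite'] = toy[linear_cols].sum(axis=1) + (toy[squared_cols] ** 2).sum(axis=1)
print(toy['storm_composite'].tolist())  # [4, 1, 16]
```

Note that squaring only adds weight once a county has at least two storms of a given major category: a single Category 4 storm contributes 1, the same as a single Category 1 storm.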



## Exploratory Data Analysis

### Visualize Tropical Storms That Have Made Landfall in the USA

In [None]:
plt.figure(figsize= (15,10))
sns.scatterplot(x = 'longitude', y = 'latitude', hue = 'usa_status', data = geo_df_usa, palette = 'Blues_r', legend = 'full')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.title('Tropical Storms in the North Atlantic That Made Landfall in the United States (1851-2020)', fontdict = {'fontsize':18}, pad = 8)
plt.xlabel('Longitude', fontdict = {'fontsize':15}, labelpad = 8 )
plt.ylabel('Latitude', fontdict = {'fontsize':15}, labelpad = 8);

From the above plot of coordinates of storms, it is possible to see the outline of the United States. As indicated by the darker colors in these states, many stronger storms hit Florida, Louisiana, and Texas.

### Which states in the USA have experienced the most hurricanes?

In [None]:
plt.figure(figsize = (8, 5))
hurricanes_and_covid.groupby('state')['hurricane_count'].agg('sum').sort_values(ascending=False).head().plot(kind='bar', title='Top 5 Hurricane States')
plt.title('Top 5 Hurricane States', fontdict = {'fontsize':15})
plt.xlabel('States', fontsize=13)
plt.ylabel('Total Count', fontsize=13);

This graph shows the five states that experienced the most hurricanes between 1851 and 2020.

### Which counties in the USA have experienced the most hurricanes?

In [None]:
plt.figure(figsize = (8, 5))
hurricanes_and_covid.groupby(['state', 'county'])['hurricane_count'].agg('sum').sort_values(ascending=False).head().plot(kind='bar')
plt.title('Top 5 Hurricane Counties', fontdict = {'fontsize':15})
plt.xlabel('County', fontsize=13)
plt.ylabel('Number of Hurricanes', fontsize=13);

This graph shows the five counties that experienced the most hurricanes between 1851 and 2020.

### Which states historically impacted by tropical storms currently have the highest number of COVID-19 cases?

In [None]:
plt.figure(figsize = (8, 5))
hurricanes_and_covid.groupby('state')['current_cases'].agg('sum').sort_values(ascending=False).head().plot(kind='bar')
plt.title('Top 5 COVID-19 States', fontdict = {'fontsize':15}, pad = 8)
plt.xlabel('States', fontsize=13, labelpad = 8)
plt.ylabel('Total Count', fontsize=13, labelpad = 8);

This graph shows the five states that currently have the most COVID-19 cases.

### Which counties in the US currently have the most COVID-19 cases?

In [None]:
plt.figure(figsize = (8, 5))
hurricanes_and_covid.groupby(['state', 'county'])['current_cases'].agg('sum').sort_values(ascending=False).head().plot(kind='bar')
plt.title('Top 5 COVID-19 Counties', fontdict = {'fontsize':15}, pad = 8)
plt.xlabel('County', fontsize=13)
plt.ylabel('Total Count', fontsize=13);

This graph shows the counties with the highest raw count of COVID-19 cases.

### How does the change in cases per 100,000 people compare by county?

#### Set - Up Plotting DF

In [None]:
usa_counties['total_fip'] = usa_counties['STATEFP'].astype(str) + usa_counties['COUNTYFP'].astype(str)
usa_fips = pd.merge(left = usa_counties, right = usa_states, how = 'left', left_on = 'STATEFP', right_on = 'STATE_FIPS')
usa_fips = usa_fips[['STATE_NAME', 'STATEFP', 'geometry_y', 'NAME', 'COUNTYFP', 'geometry_x',]].copy()
usa_fips.dropna(inplace = True) # Values correspond to Puerto Rico

In [None]:
plotting_df = pd.merge(left = hurricanes_and_covid, right = usa_fips, left_on = ['state', 'county'], right_on = ['STATE_NAME', 'NAME'])

# Copy the selection so the column added below is written to an independent frame
plotting_df_abbrev = plotting_df[['state', 'county', 'geometry_x', 'change_in_case_ratio', 'STATEFP','COUNTYFP', 'storm_composite']].copy()

plotting_df_abbrev['total_fips'] = plotting_df_abbrev['STATEFP'].astype(str) + plotting_df_abbrev['COUNTYFP'].astype(str)
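
The choropleth expects five-digit county FIPS codes (two digits for the state, three for the county). The concatenation above works when both columns are already zero-padded strings. As a hedged sketch, if the codes were ever read in as integers (for example from a CSV), the leading zeros would need to be restored first; the `toy` frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical codes that lost their leading zeros by being stored as integers
toy = pd.DataFrame({'STATEFP': [1, 12, 48], 'COUNTYFP': [1, 86, 201]})

# Pad the state code to 2 digits and the county code to 3 digits before joining
toy['total_fips'] = (toy['STATEFP'].astype(str).str.zfill(2)
                     + toy['COUNTYFP'].astype(str).str.zfill(3))
print(toy['total_fips'].tolist())  # ['01001', '12086', '48201']
```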

#### Plot One-Week Change in Cases per 100,000 People

In [None]:
colorscale = ["#f7fbff","#ebf3fb","#deebf7","#d2e3f3","#c6dbef","#b3d2e9","#9ecae1",
                  "#85bcdb","#6baed6","#57a0ce","#4292c6","#3082be","#2171b5","#1361a9",
                  "#08519c","#0b4083","#08306b"]

endpts = list(np.linspace(0, 500, len(colorscale) - 2))
fips = plotting_df_abbrev['total_fips'].tolist()
values = plotting_df_abbrev['change_in_case_ratio'].tolist()

fig = ff.create_choropleth(
    fips=fips, values=values,
    binning_endpoints=endpts,
    colorscale=colorscale,
    show_state_data=True,
    show_hover=True, centroid_marker={'opacity': 0},
    asp=2.9, title='Weekly Change in COVID-19 Cases per 100,000 People (08/04/20 - 08/11/20)',
    legend_title='Cases per 100,000 People',
    round_legend_values=True
)

fig.layout.template = None
fig.show()

### How does our composite score map to counties?

In [None]:
# Code adapted from Plotly 

colorscale = ["#f7fbff","#ebf3fb","#deebf7","#d2e3f3","#c6dbef","#b3d2e9","#9ecae1",
              "#85bcdb","#6baed6","#57a0ce","#4292c6","#3082be","#2171b5","#1361a9",
              "#08519c","#0b4083","#08306b"]

endpts = list(np.linspace(0, 50, len(colorscale) - 5))
fips = plotting_df_abbrev['total_fips'].tolist()
values = plotting_df_abbrev['storm_composite'].tolist()

fig = ff.create_choropleth(
    fips=fips, values=values,
    binning_endpoints=endpts,
    colorscale=colorscale,
    show_state_data=True,
    show_hover=True, centroid_marker={'opacity': 0},
    asp=2.9, title='Tropical Storm Composite Score by County',
    legend_title='Composite Score',
    round_legend_values=True)

fig.layout.template = None
fig.show()

Based on our previous explorations, we expect counties in Florida, Texas, and Louisiana to be hardest hit. We do see the darkest shades of blue (highest composite scores) in these counties.

### Examine Descriptive Statistics of Data

In [None]:
hurricanes_and_covid.describe()

We do not see a wide range of values for many of the columns in our data frame. Approximately 75% of counties have had 0 hurricanes. The 75th percentile for tropical storms is 2 tropical storms. Additionally, 75% of counties have a change in cases per 100,000 people that is 194.78 or lower. The lack of spread in the data could make clustering difficult, but it is possible that clusters exist at different combinations of values.

## Model Preparation

When we initially set out to model, we hoped to use the individual storm features along with the COVID-19 features. However, there were ~12 storm features to choose from and only two features that captured standardized change in COVID-19. As a result, preliminary models tended to cluster along the hurricane data with little variation in COVID-19. To balance the two sources of risk, we found that using only the composite score and the change in cases per 100,000 people produced the most interpretable clusters. Therefore, we will use these two features in our model.

#### Select Features

In [None]:
X = hurricanes_and_covid[['storm_composite', 'change_in_case_ratio']].dropna()

#### Scale Features

In [None]:
ss = StandardScaler()
X_scaled = ss.fit_transform(X)
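
Scaling matters here because KMeans and DBSCAN both rely on Euclidean distance, and the two features sit on very different ranges. The sketch below, on synthetic data (`toy_X` is generated, not the project data), checks that `StandardScaler` leaves each column with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two synthetic, skewed features on very different scales
rng = np.random.default_rng(42)
toy_X = np.column_stack([rng.exponential(5, 200), rng.exponential(150, 200)])

toy_scaled = StandardScaler().fit_transform(toy_X)

# After scaling, each column should have mean ~0 and (population) std ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```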

## Modeling


### Functions

In [None]:
def km_grid_search(max_clusters, data):
    
    # Try k = 2 up to (but not including) max_clusters
    k_list = range(2,max_clusters)
    
    for k in k_list:
        km = KMeans(n_clusters = k, random_state = RANDOM_STATE)
        km.fit(data)
        print(f'For k = {k}, the silhouette score is:')
        # Score the data passed to the function rather than a global variable
        print(silhouette_score(data, km.labels_))
        print(f'For k = {k}, the inertia score is:')
        print(km.inertia_)
        print()

In [None]:
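def dbs_grid_search(eps_list, min_samples_list, X):
    for epsilon in eps_list:
        for n in min_samples_list:
            dbs = DBSCAN(eps=epsilon, min_samples=n)
            dbs.fit(X)
            # silhouette_score requires at least two distinct labels (noise, -1, counts as one)
            if len(set(dbs.labels_)) < 2:
                print(f'For eps = {epsilon} and min_samples = {n}, only one label was found.')
                print()
                continue
            print(f'For eps = {epsilon} and min_samples = {n}, the silhouette score is:')
            print(silhouette_score(X, dbs.labels_))
            print()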

In [None]:
# cluster_col defaults to the KMeans label column created in the Modeling section
def describe_cluster(cluster_num, feature_names, df = hurricanes_and_covid, cluster_col = 'km_cluster'):
    for col in feature_names:
        print(f'The mean for {col} is {df[df[cluster_col] == cluster_num][col].mean()}.')

### Initial Plot of Data

In [None]:
plt.figure(figsize = (8,5))
sns.scatterplot(x = 'storm_composite', y = 'change_in_case_ratio', data = hurricanes_and_covid)
plt.title('Change in Cases per 100,000 by Storm Composite Score', fontdict = {'fontsize':14}, pad = 8)
plt.xlabel('Storm Composite Score')
plt.ylabel('Change in Cases per 100,000 People')

### K-Means Clustering

#### Grid Search

In [None]:
km_grid_search(15, X_scaled)

Based on the balance of a moderate silhouette score and an acceptable inertia, we decided to fit a model with 5 clusters.
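
Comparing the two metrics across values of k is easier when they sit side by side. The sketch below is one possible variant of the grid search that collects the scores in a DataFrame instead of printing them. It runs on synthetic blobs (`toy_X`), and the names `rows` and `scores` are illustrative, not part of the project code.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for the scaled feature matrix
toy_X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

rows = []
for k in range(2, 8):
    km_toy = KMeans(n_clusters=k, n_init=10, random_state=42).fit(toy_X)
    rows.append({
        'k': k,
        'silhouette': silhouette_score(toy_X, km_toy.labels_),
        'inertia': km_toy.inertia_,
    })

# One row per k, so both metrics can be scanned or plotted together
scores = pd.DataFrame(rows).set_index('k')
print(scores.round(3))
```

A table like this makes it easier to look for the point where inertia stops dropping sharply while the silhouette score is still reasonable.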

#### Model

In [None]:
km = KMeans(n_clusters = 5, random_state = RANDOM_STATE)
km.fit(X_scaled)

hurricanes_and_covid['km_cluster'] = km.labels_

#### Confirm Matching Silhouette Score to Grid Search

In [None]:
silhouette_score(X_scaled, km.labels_)

The silhouette score is 0.526. Silhouette scores range from -1 to 1, where a score near 1 indicates that points are well matched to their own cluster and well separated from other clusters, so this is a moderate score.
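
The overall score is the mean of a per-point silhouette value, so it can hide clusters that score much better or worse than average. The sketch below, on synthetic blobs (`toy_X` is generated for illustration), uses scikit-learn's `silhouette_samples` to break the score down by cluster.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

toy_X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(toy_X)

# One silhouette value per point, then the average within each cluster
per_point = silhouette_samples(toy_X, labels)
per_cluster = pd.Series(per_point).groupby(labels).mean()

# The overall score is the mean of the per-point values
overall = silhouette_score(toy_X, labels)
print(per_cluster.round(3))
print(round(overall, 3))
```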

#### Check Cluster Value Counts

In [None]:
hurricanes_and_covid['km_cluster'].value_counts()

From the value counts, we can see that our data was clustered into uneven clusters. However, in comparison to other models that were built, this distribution of points across clusters is okay.

#### Plot Clusters

In [None]:
plt.figure(figsize = (8,5))
sns.scatterplot(x = 'storm_composite', y = 'change_in_case_ratio', data = hurricanes_and_covid, hue = 'km_cluster', legend = 'full', palette = 'coolwarm')
plt.title('Clusters Resulting from KMeans Algorithm', fontdict = {'fontsize':14}, pad = 8)
plt.xlabel('Storm Composite Score')
plt.ylabel('Change in Cases per 100,000 People');

As shown in the plot above, the KMeans Algorithm identified 5 clusters:
* **Cluster 0:** Moderate Covid, Low Storm Composite
* **Cluster 1:** Moderate Covid, Low-Moderate Storm Composite
* **Cluster 2:** High Covid, Low-Moderate Storm Composite
* **Cluster 3:** Low Covid, Low Storm Composite
* **Cluster 4:** Low-Moderate Covid, High Storm Composite

### DBSCAN

Because our data does not show clear separation, we are also interested in building a DBSCAN model to see how it will separate the data.

#### Grid Search

In [None]:
eps_list = [0.5, 0.75, 1, 1.5, 2]
min_samples = list(range(2,12))

dbs_grid_search(eps_list, min_samples, X_scaled)

Based on the results of the grid search, we will build a model with `eps` = 2 and `min_samples` = 5. Even though it does not have the highest silhouette score, given the plot of our original data, it might be able to identify some of the points that are spread out from the largest conglomerate of points.

#### Model

In [None]:
dbs = DBSCAN(eps = 2, min_samples = 5)
dbs.fit(X_scaled)

hurricanes_and_covid['dbs_clusters'] = dbs.labels_

#### Check Silhouette Score

In [None]:
silhouette_score(X_scaled, dbs.labels_)

#### Check Cluster Value Counts

In [None]:
hurricanes_and_covid['dbs_clusters'].value_counts()

From these value counts, we can see that the model was not really able to find any patterns in the data. 

#### Plot Clusters

In [None]:
sns.scatterplot(x = 'storm_composite', y = 'change_in_case_ratio', data = hurricanes_and_covid, hue = 'dbs_clusters', legend = 'full', palette = 'coolwarm')
plt.title('Clusters Resulting from DBSCAN Algorithm', fontdict = {'fontsize':14}, pad = 8)
plt.xlabel('Storm Composite Score')
plt.ylabel('Change in Cases per 100,000 People');

From the above plot, we can see that the majority of our data points were clustered as low-medium Covid and low-medium storm composite. The highest levels of Storm Composite are being filtered out as noise.
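
DBSCAN assigns the label -1 to points it treats as noise, so the value counts mix real clusters with the noise group. The sketch below shows one way to separate the two on synthetic data: a dense group plus three far-away points (`toy_X` is made up for illustration).

```python
import numpy as np
from sklearn.cluster import DBSCAN

# One dense group of points plus three isolated points far from it
rng = np.random.default_rng(42)
dense = rng.normal(0, 0.3, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
toy_X = np.vstack([dense, outliers])

labels = DBSCAN(eps=2, min_samples=5).fit_predict(toy_X)

# Noise points carry the label -1 and are not a cluster in their own right
n_noise = int((labels == -1).sum())
n_clusters = len(set(labels) - {-1})
print(n_clusters, n_noise)  # 1 3
```

In this toy case the isolated points are flagged as noise, which mirrors how the high storm composite counties are handled in the plot above.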

## Model Selection

|Model|Number of Clusters|Silhouette Score|
|---|---|---|
|KMeans|5|0.526|
|DBSCAN|2|0.861|

Overall, we were surprised by the clusters that emerged from our models. When we first began the project, we were optimistic that a cluster that was high risk for COVID-19 and high risk for tropical storms would emerge. However, after exploring our data, we realized that severe hurricanes are not as common as we thought, and that the variation in COVID-19 cases was also less than expected.

However, given our options for models, due to its ability to pick up on slight differences in current COVID-19 cases and historical risk of tropical storms, we will choose to further evaluate the KMeans clusters.

## Model Evaluation and Interpretation

### Evaluation

For KMeans clustering, the model can be evaluated using inertia (a measure of error) and the silhouette score (a measure of cohesion within clusters and separation between them).

#### Silhouette Score
As previously stated, the silhouette score for this model is 0.526.

In [None]:
centroids = pd.DataFrame(ss.inverse_transform(km.cluster_centers_), columns = ['storm_composite', 'change_in_covid'])
plt.figure(figsize = (8,5))
ax = sns.scatterplot(x = 'storm_composite', y = 'change_in_case_ratio', data = hurricanes_and_covid, hue = 'km_cluster', legend = 'full', palette = 'coolwarm')
centroids.plot(kind = 'scatter', x = 'storm_composite', y = 'change_in_covid', marker = '*', s = 200, ax = ax, c = ['midnightblue', 'royalblue', 'grey', 'brown', 'darkred'])
plt.title('Clusters Resulting from KMeans Algorithm', fontdict = {'fontsize':14}, pad = 8)
plt.xlabel('Storm Composite Score')
plt.ylabel('Change in Cases per 100,000 People');

When looking at the plot of clusters, it is easy to see why this model does not result in a high silhouette score:
* **Cohesion:** For three clusters (0, 1, and 3), points are tightly grouped and cohesion appears high, but for two clusters (2 and 4), points are spread out and cohesion is low.
* **Separation:** In general, the distance between neighboring clusters is low. 
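
These two observations can also be checked numerically. The sketch below is a rough approximation, not the silhouette computation itself: it measures cohesion as the mean distance from each point to its own centroid, and separation as the distance from each centroid to its nearest neighboring centroid. It runs on synthetic blobs (`toy_X`), and the names `cohesion` and `separation` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two tight blobs and one spread-out blob
toy_X, _ = make_blobs(n_samples=300, centers=3, cluster_std=[0.4, 0.4, 2.0], random_state=42)
km_toy = KMeans(n_clusters=3, n_init=10, random_state=42).fit(toy_X)
centers = km_toy.cluster_centers_

# Cohesion: mean distance from points to their own centroid (smaller = tighter)
dist_to_own = np.linalg.norm(toy_X - centers[km_toy.labels_], axis=1)
cohesion = {k: dist_to_own[km_toy.labels_ == k].mean() for k in range(3)}

# Separation: distance from each centroid to the nearest other centroid (larger = better)
pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
np.fill_diagonal(pairwise, np.inf)
separation = pairwise.min(axis=1)

print({k: round(float(v), 2) for k, v in cohesion.items()})
print(separation.round(2))
```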

#### Inertia

In [None]:
inertia = km.inertia_
inertia

The inertia (the within-cluster sum of squared distances to the centroids) for this model is 600.5. In comparison to other KMeans models that were built, this score is somewhere in the middle. Because clustering has no target value, it is hard to judge what this number means for our model on its own.
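
One way to give inertia a reference point is to compare it with the total sum of squares of the data, which is the inertia a single cluster would have. The sketch below, on synthetic blobs (`toy_X`), recomputes inertia by hand and expresses it as a share of the total. The names `manual_inertia`, `total_ss`, and `explained` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

toy_X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
km_toy = KMeans(n_clusters=3, n_init=10, random_state=42).fit(toy_X)

# Inertia is the sum of squared distances from each point to its assigned centroid
manual_inertia = ((toy_X - km_toy.cluster_centers_[km_toy.labels_]) ** 2).sum()

# Total sum of squares around the overall mean (the inertia of a one-cluster model)
total_ss = ((toy_X - toy_X.mean(axis=0)) ** 2).sum()

# Share of the total variation accounted for by the clustering
explained = 1 - km_toy.inertia_ / total_ss
print(round(manual_inertia, 2), round(km_toy.inertia_, 2), round(explained, 3))
```

For features standardized with `StandardScaler`, the total sum of squares equals the number of rows times the number of features, so the same ratio could be computed for the model above.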

### Interpretation

To understand whether our clusters are meaningful, we can look at the mean values for each of the features in our model, shown in the table below:

Cluster|Storm Composite|Change in COVID-19 cases per 100,000  
-|-|-
0|3.07|317.9
1|11.7|189.6
2|10.5|1716.9
3|1.5|78.5
4|43.7|201.9
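
A table of this kind can be produced with a single `groupby`. The sketch below uses a small made-up frame (`toy`) with the same column names as the modeling data.

```python
import pandas as pd

# Made-up rows with the same column names as the modeling data
toy = pd.DataFrame({
    'km_cluster': [0, 0, 1, 1, 1],
    'storm_composite': [2, 4, 10, 12, 14],
    'change_in_case_ratio': [300.0, 340.0, 150.0, 200.0, 250.0],
})

# Mean of each model feature within each cluster
cluster_means = toy.groupby('km_cluster')[['storm_composite', 'change_in_case_ratio']].mean()
print(cluster_means)
```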

***NOTE:*** All reported risks are relative to other clusters. In this sample, even clusters with the lowest COVID-19 means can be considered red zones.

* **Cluster 0** has a low risk of tropical storms and a moderate risk of COVID-19. It is important to note that many counties in this cluster would meet the White House's criteria for a red zone (100 new cases/100,000 people).
* **Cluster 1** has a moderate risk of tropical storms and a low relative risk of COVID-19. Many counties in this cluster likely also meet the criteria for a COVID-19 red zone.
* **Cluster 2** has a moderate risk of tropical storms and a high risk of COVID-19.
* **Cluster 3** has a low risk of tropical storms and a low risk for COVID-19. It is the only cluster where the average change in COVID-19 per 100,000 people would not be classified as a red zone.
* **Cluster 4** has a high risk of tropical storms and a moderate relative risk of COVID-19. The average change in COVID-19 per 100,000 people meets criteria for a red zone.
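
Cluster means can hide how many individual counties cross the red-zone line. The sketch below flags each county against the threshold of 100 new cases per 100,000 people cited above and reports the share of red-zone counties per cluster. The `toy` frame is made up, and treating the threshold as a strict "greater than" is an assumption.

```python
import pandas as pd

# Red-zone criterion cited above; the strict inequality is an assumption
RED_ZONE_THRESHOLD = 100

# Made-up counties with a cluster label and a weekly change in cases per 100,000
toy = pd.DataFrame({
    'km_cluster': [0, 0, 3, 3, 4],
    'change_in_case_ratio': [317.9, 95.0, 78.5, 120.0, 201.9],
})

toy['red_zone'] = toy['change_in_case_ratio'] > RED_ZONE_THRESHOLD

# The mean of a boolean column is the share of counties flagged in each cluster
share_red = toy.groupby('km_cluster')['red_zone'].mean()
print(share_red)
```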

The table below shows the mean counts for individual tropical storm types by cluster:

Cluster|Tropical Depression|Tropical Storm|Cat 1|Cat 2|Cat 3|Cat 4|Cat 5|Change in COVID-19 cases per 100,000
-|-|-|-|-|-|-|-|-
0|1.6|1.3|0.1|0.02|0.01|0|0|317.9
1|3.9|5.8|1.3|0.4|0.1|0.04|0|189.6
2|3.2|5.7|1.1|0.5|0.1|0|0|1716.9
3|0.9|0.6|0.05|0.01|0|0|0|78.5
4|11.1|15.8|6.7|2.3|2.0|0.6|0|201.9

Overall, these values seem to be well-captured by the composite score.

A summary of risk by cluster is presented in the table below:

Cluster| Storm Risk|COVID-19 Risk|Red-Zone
-|-|-|-
0|Low|Moderate|Yes
1|Moderate|Low|Yes
2|Moderate|High|Yes
3|Low|Low|No
4|High|Moderate|Yes

***NOTE:*** Risk is relative to other clusters.

### Geographical Location of Clusters

#### Prepare Data (Add a set of coordinates to counties)

In [None]:
# Note: A single point for each county was generated using the Google Maps API
# Code for this process is not listed in the repo as it will not run without an API key

county_coordinates = pd.read_csv('../data/county_coordinates.csv')
hurricanes_and_covid = pd.merge(left = hurricanes_and_covid, right = county_coordinates, on = ['state', 'county'])

In [None]:
hurricanes_and_covid['latitude'] = hurricanes_and_covid['coordinates'].map(lambda x : x.split()[0].replace('(','').replace(',',''))
hurricanes_and_covid['latitude'] = hurricanes_and_covid['latitude'].map(lambda x: float(x))

hurricanes_and_covid['longitude'] = hurricanes_and_covid['coordinates'].map(lambda x : x.split()[1].replace(')','').replace(',',''))
hurricanes_and_covid['longitude'] = hurricanes_and_covid['longitude'].map(lambda x: float(x))
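
If the `coordinates` strings are written as Python-style tuples such as `'(25.7617, -80.1918)'` (an assumption based on the parsing above), `ast.literal_eval` offers a shorter alternative to the chained `split` and `replace` calls. The sketch below uses a made-up `toy` frame.

```python
import ast
import pandas as pd

# Made-up coordinate strings in an assumed '(lat, lon)' format
toy = pd.DataFrame({'coordinates': ['(25.7617, -80.1918)', '(29.7604, -95.3698)']})

# literal_eval safely turns each string into a (lat, lon) tuple of floats
parsed = toy['coordinates'].map(ast.literal_eval)
toy['latitude'] = parsed.map(lambda pair: float(pair[0]))
toy['longitude'] = parsed.map(lambda pair: float(pair[1]))
print(toy[['latitude', 'longitude']])
```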

In [None]:
geo_hc = hurricanes_and_covid[['latitude', 'longitude', 'km_cluster']]

In [None]:
geo_hc = gpd.GeoDataFrame(geo_hc, geometry = gpd.points_from_xy(geo_hc['longitude'], geo_hc['latitude']))

#### Plot Clusters on Map

In [None]:
# https://towardsdatascience.com/finding-and-visualizing-clusters-of-geospatial-data-698943c18fed

fig, ax = plt.subplots(figsize = (16,5))
ax.set_aspect('auto')
# Draw the state outlines first, then overlay the county points on the same axes
usa_states.plot(ax = ax, color = 'lightgrey', edgecolor = 'darkgrey');
geo_hc.plot(ax = ax, column = 'km_cluster', cmap = 'Blues_r', ec = 'black', linewidth = 0.25, s = 12, legend = True, categorical = True);
ax.set_title('Plot of Clusters by Latitude and Longitude', fontdict = {'fontsize':15})
ax.set_xlabel('Longitude', fontdict = {'fontsize':14})
ax.set_ylabel('Latitude', fontdict = {'fontsize':14});

In [None]:
plt.figure(figsize = (8, 5))
sns.scatterplot(x = 'longitude', y = 'latitude',hue = 'km_cluster', data = hurricanes_and_covid, palette = 'Blues_r', legend = 'full', edgecolor = 'black', linewidth = 0.35)
plt.title('Plot of Clusters by Latitude and Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5));

From this map, we can see that Cluster 4 (highest risk of tropical storms) is located on the coast of Florida and scattered in Texas and Louisiana. Additionally, Cluster 3, which has a low risk of tropical storms, is located farther north. This is what we would expect based on the risk of North Atlantic hurricanes.

## Conclusion & Recommendations

## References
---
[County Populations Data](https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html)  
[COVID-19 Data](https://github.com/nytimes/covid-19-data)  
[IBTrACS Data](https://www.ncdc.noaa.gov/ibtracs/index.php?name=introduction)  
[U.S. Counties Shape File](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html)  
[What is a COVID-19 red zone? by Fast Company](https://www.fastcompany.com/90529280/what-is-a-covid-19-red-zone-do-you-live-in-one-heres-how-to-find-out)  

## Export Final Data Frame to CSV for Use in Command-Line App

In [None]:
#hurricanes_and_covid.to_csv('../data/final_data_with_clusters.csv', index = False)