# Does latitude impact the spread of COVID-19?
* Here I use public datasets hosted on Kaggle to show that there are geographic variations in both the number of confirmed SARS-CoV-2 infections and COVID-19 mortality rates. Specifically, I identify a range of absolute latitudes (roughly 30° to 45°, north or south) in which COVID-19 appears to have spread most rapidly.

*Step 1: Import Python packages and load the data*

In [1]:
# Import Python Packages
import pandas as pd
import numpy as np
import plotly.express as px
import warnings 
warnings.filterwarnings('ignore')

# Load Data
coordinates = pd.read_csv('/kaggle/input/latitude-and-longitude-for-every-country-and-state/world_country_and_usa_states_latitude_and_longitude_values.csv')
country_coordinates = coordinates[['country_code','latitude','longitude','country']]
state_coordinates = coordinates[['usa_state_code','usa_state_latitude','usa_state_longitude','usa_state']]
df = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv')
# Harmonize country names with the coordinates file (assign back instead of chained inplace=True)
df['Country/Region'] = df['Country/Region'].replace({'Mainland China': 'China', 'US': 'United States'})
df['Country'] = df['Country/Region']
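# Keep only the most recent observation date (parsed as dates, not compared as strings)
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df = df[df.ObservationDate == df.ObservationDate.max()]
latest = df.ObservationDate.max()
todays_date = f'{latest.month}/{latest.day}/{latest.year}'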

# Mortality rate for every country in the dataset
df_deaths = pd.DataFrame(df.groupby('Country')['Deaths'].sum())
df_confirmed = pd.DataFrame(df.groupby('Country')['Confirmed'].sum())
df_confirmed['Deaths'] = df_deaths['Deaths']
df_global = df_confirmed
df_global['Mortality Rate'] = np.round((df_global.Deaths.values/df_global.Confirmed.values)*100,2)
df_global = df_global.reset_index()
df_global = df_global.merge(country_coordinates, left_on='Country', right_on='country')
df_global = df_global[['Country','Confirmed','Deaths','Mortality Rate','latitude','longitude','country_code']]
df_global.columns = ['Country','Confirmed','Deaths','Mortality Rate','Latitude','Longitude','Country_Code']
df_global.to_csv('/kaggle/working/global_covid19_mortality_rates.csv')

# Mortality rate for every state in the USA
# df already holds only the latest date; .copy() avoids SettingWithCopyWarning
df_usa = df[df['Country/Region']=='United States'].copy()
df_usa['State'] = df_usa['Province/State']
df_usa['Mortality Rate'] = np.round((df_usa.Deaths.values/df_usa.Confirmed.values)*100,2)
df_usa = df_usa.merge(state_coordinates, left_on='State', right_on='usa_state')
df_usa['Latitude'] = df_usa['usa_state_latitude']
df_usa['Longitude'] = df_usa['usa_state_longitude']
df_usa = df_usa[['State','Confirmed','Deaths','Recovered','Mortality Rate','Latitude','Longitude','usa_state_code']]
df_usa.columns = ['State','Confirmed','Deaths','Recovered','Mortality Rate','Latitude','Longitude','USA_State_Code']
df_usa.to_csv('/kaggle/working/usa_covid19_mortality_rates.csv')
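
The mortality rate in this notebook is deaths divided by confirmed cases, times 100, after summing each country's rows. A minimal sketch of that calculation as a reusable function, assuming a DataFrame with `Confirmed` and `Deaths` columns; the function name `mortality_rate` and the toy rows are hypothetical. Unlike the cell above, which suppresses warnings, it returns NaN for groups with zero confirmed cases instead of dividing by zero.

```python
import numpy as np
import pandas as pd

def mortality_rate(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Sum Confirmed and Deaths per group and add the mortality rate in percent.

    Hypothetical helper: groups with zero confirmed cases get NaN, not inf.
    """
    totals = frame.groupby(by)[['Confirmed', 'Deaths']].sum()
    # Mask zero denominators so the division yields NaN rather than inf
    confirmed = totals['Confirmed'].where(totals['Confirmed'] > 0)
    totals['Mortality Rate'] = (totals['Deaths'] / confirmed * 100).round(2)
    return totals.reset_index()

# Toy numbers for illustration only (not from the dataset)
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'C'],
    'Confirmed': [3, 2, 8, 0],
    'Deaths': [1, 0, 1, 0],
})
print(mortality_rate(toy, 'Country'))
```

Rows for the same country are summed before dividing, which matches how the cell above builds `df_global`.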

*Step 2: Map Spread of COVID-19 for Every Country*

In [2]:
fig = px.choropleth(df_global, 
                    locations="Country", 
                    color="Confirmed", 
                    locationmode = 'country names', 
                    hover_name="Country",
                    range_color=[0,5000],
                    title='Global COVID-19 Infections as of '+todays_date)
fig.show()

fig = px.choropleth(df_global, 
                    locations="Country", 
                    color="Deaths", 
                    locationmode = 'country names', 
                    hover_name="Country",
                    range_color=[0,50],
                    title='Global COVID-19 Deaths as of '+todays_date)
fig.show()

fig = px.choropleth(df_global, 
                    locations="Country", 
                    color="Mortality Rate", 
                    locationmode = 'country names', 
                    hover_name="Country",
                    range_color=[0,5],
                    title='Global COVID-19 Mortality Rates as of '+todays_date)
fig.show()

*Step 3: Plot Spread of COVID-19 for Every Country*

In [3]:
fig = px.bar(df_global.sort_values('Confirmed',ascending=False)[0:10], 
             x="Country", 
             y="Confirmed",
             title='Global COVID-19 Infections as of '+todays_date)
fig.show()

fig = px.bar(df_global.sort_values('Deaths',ascending=False)[0:10], 
             x="Country", 
             y="Deaths",
             title='Global COVID-19 Deaths as of '+todays_date)
fig.show()

fig = px.bar(df_global.sort_values('Deaths',ascending=False)[0:10], 
             x="Country", 
             y="Mortality Rate",
             title='Global COVID-19 Mortality Rates as of '+todays_date+' for Countries with Top 10 Most Deaths')
fig.show()
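
The bar charts pick the top ten rows by sorting in descending order and slicing `[0:10]`. pandas' `DataFrame.nlargest` expresses the same selection in one call; a small sketch on made-up numbers (if a column has ties, the two approaches may order the tied rows differently):

```python
import pandas as pd

# Made-up values for illustration only
frame = pd.DataFrame({'Country': list('ABCDEFGHIJKL'),
                      'Deaths': [5, 50, 1, 7, 30, 2, 9, 40, 3, 8, 20, 6]})

# Approach used in this notebook: sort descending, then take the first 10 rows
via_sort = frame.sort_values('Deaths', ascending=False)[0:10]

# Equivalent selection with nlargest (keeps the original index labels)
via_nlargest = frame.nlargest(10, 'Deaths')

print(via_nlargest['Country'].tolist())
```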

*Step 4: Map Spread of COVID-19 for Every USA State*

In [4]:
fig = px.choropleth(df_usa, 
                    locations="USA_State_Code", 
                    color="Confirmed", 
                    locationmode = 'USA-states', 
                    hover_name="State",
                    range_color=[0,50],scope="usa",
                    title='USA COVID-19 Infections as of '+todays_date)
fig.show()

fig = px.choropleth(df_usa, 
                    locations="USA_State_Code", 
                    color="Deaths", 
                    locationmode = 'USA-states', 
                    hover_name="State",
                    range_color=[0,20],scope="usa",
                    title='USA COVID-19 Deaths as of '+todays_date)
fig.show()

fig = px.choropleth(df_usa, 
                    locations="USA_State_Code", 
                    color="Mortality Rate", 
                    locationmode = 'USA-states', 
                    hover_name="State",
                    range_color=[0,5],scope="usa",
                    title='USA COVID-19 Mortality Rates as of '+todays_date)
fig.show()
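
The state maps can only show rows that matched the coordinates file during the `merge` in Step 1, and that default inner merge drops unmatched rows silently (for example, any `Province/State` value that is not a full state name). A sketch of how to list those rows with `merge(..., how='left', indicator=True)`; the toy frames below are invented and only mimic the column names used in this notebook.

```python
import pandas as pd

# Toy inputs for illustration only; names and counts are invented
cases = pd.DataFrame({'State': ['Washington', 'Oregon', 'Unassigned Location'],
                      'Confirmed': [10, 20, 5]})
coords = pd.DataFrame({'usa_state': ['Washington', 'Oregon'],
                       'usa_state_code': ['WA', 'OR']})

# how='left' keeps every case row; indicator=True adds a '_merge' column
checked = cases.merge(coords, left_on='State', right_on='usa_state',
                      how='left', indicator=True)

# Rows marked 'left_only' had no coordinate match and would be dropped
# by an inner merge, so they never reach the maps
unmatched = checked.loc[checked['_merge'] == 'left_only', 'State']
print(unmatched.tolist())
```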

*Step 5: Plot Spread of COVID-19 for Every USA State*

In [5]:
fig = px.bar(df_usa.sort_values('Confirmed',ascending=False)[0:10], 
             x="State", 
             y="Confirmed",
             title='USA COVID-19 Infections as of '+todays_date)
fig.show()

fig = px.bar(df_usa.sort_values('Deaths',ascending=False)[0:10], 
             x="State", 
             y="Deaths",
             title='USA COVID-19 Deaths as of '+todays_date)
fig.show()

fig = px.bar(df_usa.sort_values('Deaths',ascending=False)[0:10], 
             x="State", 
             y="Mortality Rate",
             title='USA COVID-19 Mortality Rates as of '+todays_date+' for USA States with Top 10 Most Deaths')
fig.show()

*Step 6: Plot COVID-19 vs Latitude for Every Country*

In [6]:
# Work on a copy so that df_global keeps its signed latitudes
df_abs = df_global.copy()
df_abs['Latitude'] = df_abs['Latitude'].abs()
# Exclude China from the scatter plots
df_global2 = df_abs[df_abs['Country'] != 'China']

fig = px.scatter(df_global2,
                 x="Latitude",
                 y="Confirmed",
                 title='Global COVID-19 Infections vs Absolute Value of Latitude Coordinate as of '+todays_date)
fig.show()

fig = px.scatter(df_global2,
                 x="Latitude",
                 y="Deaths",
                 title='Global COVID-19 Deaths vs Absolute Value of Latitude Coordinate as of '+todays_date)
fig.show()

fig = px.scatter(df_global2,
                 x="Latitude",
                 y="Mortality Rate",
                 title='Global COVID-19 Mortality Rates vs Absolute Value of Latitude Coordinate as of '+todays_date)
fig.show()

# Ten countries with the highest mortality rates (latitudes shown as absolute values)
df_abs.sort_values('Mortality Rate', ascending=False).head(10)

| | Country | Confirmed | Deaths | Mortality Rate | Latitude | Longitude | Country_Code |
|---|---|---|---|---|---|---|---|
| 68 | Morocco | 5 | 1 | 20.0 | 31.791702 | -7.09262 | MA |
| 18 | Bulgaria | 7 | 1 | 14.29 | 42.733883 | 25.48583 | BG |
| 76 | Panama | 8 | 1 | 12.5 | 8.537981 | -80.782127 | PA |
| 47 | Iraq | 71 | 7 | 9.86 | 33.223191 | 43.679291 | IQ |
| 1 | Albania | 12 | 1 | 8.33 | 41.153332 | 20.168331 | AL |
| 50 | Italy | 12462 | 827 | 6.64 | 41.87194 | 12.56738 | IT |
| 4 | Argentina | 19 | 1 | 5.26 | 38.416097 | -63.616672 | AR |
| 56 | Lebanon | 61 | 3 | 4.92 | 33.854721 | 35.862285 | LB |
| 46 | Iran | 9000 | 354 | 3.93 | 32.427908 | 53.688046 | IR |
| 24 | China | 80921 | 3161 | 3.91 | 35.86166 | 104.195397 | CN |


**Conclusion: There appears to be a range of absolute latitudes, roughly 30° to 45° north or south, in which countries have the highest numbers of confirmed infections and the highest mortality rates.**
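
One way to put numbers on this visual impression is to bin countries by absolute latitude and compare totals per band. A minimal sketch, assuming a frame shaped like `df_global2` (columns `Latitude`, `Confirmed`, `Deaths`); the 15° band edges, the function name `summarize_by_latitude_band`, and the toy rows are all illustrative choices, not part of the original analysis. It reports a pooled rate (total deaths divided by total confirmed cases in the band) rather than an average of country rates, so that a country with 1 death in 5 cases, such as Morocco in the table above, does not dominate its band.

```python
import numpy as np
import pandas as pd

def summarize_by_latitude_band(frame: pd.DataFrame,
                               edges=(0, 15, 30, 45, 60, 90)) -> pd.DataFrame:
    """Pool Confirmed and Deaths within bands of absolute latitude (hypothetical helper)."""
    bands = pd.cut(frame['Latitude'].abs(), bins=list(edges), include_lowest=True)
    # observed=False keeps empty bands in the output (with zero totals)
    totals = frame.groupby(bands, observed=False)[['Confirmed', 'Deaths']].sum()
    totals['Countries'] = frame.groupby(bands, observed=False).size()
    # Pooled rate; bands with no confirmed cases get NaN instead of inf
    confirmed = totals['Confirmed'].where(totals['Confirmed'] > 0)
    totals['Pooled Mortality Rate'] = (totals['Deaths'] / confirmed * 100).round(2)
    return totals

# Invented rows for illustration only
toy = pd.DataFrame({'Latitude': [10.0, -35.0, 40.0, 50.0],
                    'Confirmed': [100, 200, 300, 400],
                    'Deaths': [1, 6, 12, 4]})
print(summarize_by_latitude_band(toy))
```

With real data the band totals would still be dominated by a few large outbreaks, so the table is a summary of the scatter plots rather than a statistical test.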

*Step 7: Plot COVID-19 vs Latitude for Every USA State*

In [7]:
fig = px.scatter(df_usa.sort_values('Deaths',ascending=False), 
             x="Latitude", 
             y="Mortality Rate",
             title='USA States COVID-19 Mortality Rates vs Absolute Value of Latitude Coordinate as of '+todays_date)
fig.show()
df_usa.sort_values('Mortality Rate', ascending= False).head(10)

| | State | Confirmed | Deaths | Recovered | Mortality Rate | Latitude | Longitude | USA_State_Code |
|---|---|---|---|---|---|---|---|---|
| 21 | South Dakota | 8 | 1 | 0 | 12.5 | 43.969515 | -99.901813 | SD |
| 0 | Washington | 366 | 29 | 1 | 7.92 | 47.751074 | -120.740139 | WA |
| 5 | Florida | 28 | 2 | 0 | 7.14 | 27.664827 | -81.515754 | FL |
| 8 | New Jersey | 23 | 1 | 0 | 4.35 | 40.058324 | -74.405661 | NJ |
| 2 | California | 177 | 3 | 2 | 1.69 | 36.778261 | -119.417932 | CA |
| 33 | Utah | 3 | 0 | 0 | 0.0 | 39.32098 | -111.093731 | UT |
| 26 | Minnesota | 5 | 0 | 0 | 0.0 | 46.729553 | -94.6859 | MN |
| 27 | Nebraska | 5 | 0 | 0 | 0.0 | 41.492537 | -99.901813 | NE |
| 28 | New Hampshire | 5 | 0 | 0 | 0.0 | 43.193852 | -71.572395 | NH |
| 29 | Rhode Island | 5 | 0 | 0 | 0.0 | 41.580095 | -71.477429 | RI |


**Conclusion: The state-level data within the USA appear broadly consistent with a range of latitudes between 30° and 45° having the highest numbers of confirmed infections and the highest mortality rates.**

# Conclusion

**Conclusion: There appears to be a range of absolute latitudes, roughly 30° to 45° north or south, with the highest numbers of confirmed infections and the highest mortality rates. This is most obvious in the global data but is also supported by the state-level data for the USA.**