### Introduction

- This analysis is based on the [COVID-19 Community Mobility Reports](https://www.google.com/covid19/mobility/index.html?hl=en) created by Google. The reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
- In order to run the analysis, make download the [original dataset](https://www.google.com/covid19/mobility/index.html?hl=en) from Google and run all the code chunks in the `data_cleaning.ipynb` to create the cleaned dataframe that only focuses on the trend in the U.S
- We also combined it with the [COVID-19 dataset](https://github.com/nytimes/covid-19-data) from New York Times, trying to find how people from different states are responding to the number of confirmed cases and deaths.

### Loading data

In [2]:
import pandas as pd
df = pd.read_csv('US_Mobility_Report.csv')

In [3]:
df.sample(10)

Unnamed: 0,state,county,date,retail,grocery,parks,transit,workplaces,residential
309206,SC,Lexington County,2020-04-10,-36.0,-2.0,22.0,-21.0,-51.0,20.0
354239,TX,Sabine County,2020-04-02,,18.0,,,-29.0,
354996,TX,Shelby County,2020-04-21,,,,,-32.0,
299214,PA,Luzerne County,2020-03-14,-3.0,17.0,47.0,4.0,0.0,2.0
104960,IN,Rush County,2020-03-13,-6.0,30.0,,,0.0,
57459,GA,Coweta County,2020-06-24,-3.0,-2.0,,31.0,-32.0,11.0
135326,KY,Floyd County,2020-02-16,13.0,-5.0,,,,
314634,SD,Oglala Lakota County,2020-06-30,,,,,-42.0,
348652,TX,Maverick County,2020-06-23,-23.0,-7.0,-15.0,29.0,-38.0,11.0
91300,IL,Rock Island County,2020-06-16,-19.0,12.0,16.0,-23.0,-30.0,8.0


In [49]:
# Convert the date column to datetime type
df.date = pd.to_datetime(df.date, format='%Y-%m-%d')

### What does the number stand for

- Changes for each day are compared to a baseline value for that day of the week:
 - The baseline is the median value, for the corresponding day of the week, during the 5-week period **Jan 3–Feb 6, 2020.**
 - The datasets show trends over several months with the most recent data representing approximately **2-3** days ago—this is how long it takes to produce the datasets.

### Place categories
- **Grocery & pharmacy**: Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.
- **Parks**: Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens.
- **Transit stations**: Mobility trends for places like public transport hubs such as subway, bus, and train stations.
- **Retail & recreation**: Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.
- **Residential**: Mobility trends for places of residence.
- **Workplaces**: Mobility trends for places of work.

### Now let's move on to the COVID data

In [62]:
covid_df = pd.read_csv('covid_cases.csv')
covid_df.sample(10)

Unnamed: 0,date,state,cases,deaths
6037,2020-06-20,NE,17707,249
3240,2020-04-30,,14,2
2427,2020-04-15,VT,759,29
5073,2020-06-02,WY,912,17
2117,2020-04-10,ID,1425,25
4832,2020-05-29,NY,373108,29535
4276,2020-05-19,MT,471,16
7259,2020-07-12,PA,99794,6950
4447,2020-05-22,NY,362991,28802
1922,2020-04-06,OK,1326,51


In [63]:
# Convert the date column to datetime type 
covid_df.date = pd.to_datetime(covid_df.date, format='%Y-%m-%d')

### As a starter, let's only focus on the state level data

In [64]:
# Aggregate the data for each state 
final_df = df.groupby(['state', 'date']).mean().reset_index()

In [65]:
final_df.sample(10)

Unnamed: 0,state,date,retail,grocery,parks,transit,workplaces,residential
6994,VA,2020-03-28,-34.230769,-7.818966,18.52381,-25.837209,-24.787879,12.125
3449,MI,2020-06-25,17.654545,17.862745,175.230769,-3.105263,-25.3375,5.489362
1074,CT,2020-06-20,-5.875,9.125,225.5,-1.666667,-9.375,2.125
1779,IA,2020-03-27,-40.2375,-6.238806,28.0,-18.157895,-28.62766,16.904762
6051,RI,2020-04-02,-45.2,-18.8,-2.75,-53.0,-51.0,22.0
3204,ME,2020-03-30,-46.1875,-23.125,-30.8,-48.0,-40.375,17.333333
3494,MN,2020-03-04,9.228571,4.288136,4.363636,-2.454545,2.35,-0.863636
2095,IL,2020-03-27,-41.066667,-8.333333,-2.470588,-29.818182,-38.010417,19.486486
5227,NY,2020-02-28,0.47541,-1.344262,-6.242424,2.057143,0.47541,1.137931
378,AR,2020-04-17,-20.956522,-5.428571,12.666667,-21.526316,-31.173913,14.421053


In [66]:
final_df = final_df.merge(covid_df, how="inner", on=['state', 'date'])

In [67]:
final_df.head(10)

Unnamed: 0,state,date,retail,grocery,parks,transit,workplaces,residential,cases,deaths
0,AK,2020-03-12,16.285714,18.8,10.0,2.0,-4.5,1.6,1,0
1,AK,2020-03-13,6.285714,16.666667,6.0,0.666667,-10.0,3.25,1,0
2,AK,2020-03-14,7.428571,17.6,23.0,-2.333333,-3.5,1.0,1,0
3,AK,2020-03-15,8.714286,13.0,62.0,-3.333333,-0.2,0.666667,1,0
4,AK,2020-03-16,3.714286,16.4,30.0,-7.333333,-9.8,3.25,3,0
5,AK,2020-03-17,-5.857143,7.4,-1.0,-14.666667,-12.545455,6.5,6,0
6,AK,2020-03-18,-13.285714,5.6,5.0,-19.0,-17.363636,9.5,9,0
7,AK,2020-03-19,-26.5,2.8,-1.0,-20.333333,-20.0,10.75,12,0
8,AK,2020-03-20,-26.428571,-5.714286,24.0,-26.666667,-20.545455,12.25,14,0
9,AK,2020-03-21,-31.142857,-6.666667,-10.0,-36.666667,-18.0,9.0,21,0


In [68]:
# dump the data for faster access from the app
final_df.to_pickle('app.data')

### Let's quickly "confirm" the COVID cases trend in each state

In [90]:
import plotly.express as px
state_df = final_df.loc[final_df.state=="NY"].copy()
fig = px.line(state_df, x='date', y='cases')
fig.show()

### Also "confirm" the mobility trend in each category

In [91]:
fig = px.line(state_df, x='date', y='retail')
fig.show()

In [92]:
# Look at six different categories at the same time
fig = px.scatter(state_df, x="date", y=["retail", "grocery", "parks", "transit", "workplaces", "residential"])
fig.show()

### Let's take a look at what drives the number of COVID cases

In [93]:
state_df.corr()

Unnamed: 0,retail,grocery,parks,transit,workplaces,residential,cases,deaths
retail,1.0,0.79651,0.504589,0.962824,0.726982,-0.836614,0.060394,0.124095
grocery,0.79651,1.0,0.508476,0.760076,0.514916,-0.62931,0.17806,0.235186
parks,0.504589,0.508476,1.0,0.50043,0.331781,-0.574282,0.620662,0.639043
transit,0.962824,0.760076,0.50043,1.0,0.801526,-0.875765,0.011073,0.07835
workplaces,0.726982,0.514916,0.331781,0.801526,1.0,-0.926589,-0.211,-0.153014
residential,-0.836614,-0.62931,-0.574282,-0.875765,-0.926589,1.0,-0.0308,-0.088878
cases,0.060394,0.17806,0.620662,0.011073,-0.211,-0.0308,1.0,0.995669
deaths,0.124095,0.235186,0.639043,0.07835,-0.153014,-0.088878,0.995669,1.0


### Another way to look at this data is to check how people from each state are reacting to the COVID number

In [94]:
# Calculate the daily new cases using the diff method
state_df['new_cases'] = final_df['cases'].diff()
state_df['new_deaths'] = final_df['deaths'].diff()

In [96]:
state_df.corr()

Unnamed: 0,retail,grocery,parks,transit,workplaces,residential,cases,deaths,new_cases,new_deaths
retail,1.0,0.79651,0.504589,0.962824,0.726982,-0.836614,0.060394,0.124095,-0.677868,-0.730407
grocery,0.79651,1.0,0.508476,0.760076,0.514916,-0.62931,0.17806,0.235186,-0.532021,-0.598767
parks,0.504589,0.508476,1.0,0.50043,0.331781,-0.574282,0.620662,0.639043,-0.313326,-0.427352
transit,0.962824,0.760076,0.50043,1.0,0.801526,-0.875765,0.011073,0.07835,-0.664696,-0.723322
workplaces,0.726982,0.514916,0.331781,0.801526,1.0,-0.926589,-0.211,-0.153014,-0.519465,-0.550666
residential,-0.836614,-0.62931,-0.574282,-0.875765,-0.926589,1.0,-0.0308,-0.088878,0.589405,0.636626
cases,0.060394,0.17806,0.620662,0.011073,-0.211,-0.0308,1.0,0.995669,-0.125967,-0.122834
deaths,0.124095,0.235186,0.639043,0.07835,-0.153014,-0.088878,0.995669,1.0,-0.187406,-0.191991
new_cases,-0.677868,-0.532021,-0.313326,-0.664696,-0.519465,0.589405,-0.125967,-0.187406,1.0,0.742625
new_deaths,-0.730407,-0.598767,-0.427352,-0.723322,-0.550666,0.636626,-0.122834,-0.191991,0.742625,1.0
