## Exploring [Google Mobility](https://www.google.com/covid19/mobility/) data

#### Data Overview
The Community Mobility Reports show movement trends by region, across different categories of places. For each category in a region, reports show the changes in 2 different ways:

* Headline number: Compares mobility for the report date to the baseline day. It is the calculated percent change for the latest day (or report date), unless there are gaps.
* Trend graph: The percent changes in the 6 weeks before the report date. Shown as a graph.

If we didn't have enough data to confidently and anonymously estimate the change from the baseline, you’ll see gaps and the headline number is the most-recent calculated change.
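
The headline rule above (fall back to the most recent calculated change when the latest days are gaps) can be sketched in pandas. The values and the helper name `headline_number` are made up for illustration; this is an assumption about how one might reproduce the rule, not Google's implementation.

```python
import numpy as np
import pandas as pd

def headline_number(changes: pd.Series) -> float:
    """Return the most recent non-missing percent change (hypothetical helper)."""
    valid = changes.dropna()
    if valid.empty:
        return float("nan")
    return float(valid.iloc[-1])

# Illustrative percent changes from baseline; the last two days are gaps.
changes = pd.Series(
    [-12.0, -15.0, -20.0, np.nan, np.nan],
    index=pd.date_range("2020-05-25", periods=5, freq="D"),
)

print(headline_number(changes))
```

Here the last two days carry no value, so the helper returns the change from the third day.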

Avoid comparing places across regions. Regions can have local differences in the data which might mislead. Location accuracy and the understanding of categorized places vary from region to region, so we don’t recommend using this data to compare changes between countries, or between regions with different characteristics (e.g. rural versus urban areas).

#### Baseline
The data shows how visitors to (or time spent in) categorized places change compared to our baseline days. A baseline day represents a normal value for that day of the week. The baseline day is the median value from the 5‑week period Jan 3 – Feb 6, 2020. The datasets show trends over several months with the most recent data representing approximately 2-3 days ago—this is how long it takes to produce the datasets.

For each region-category, the baseline isn’t a single value; it’s 7 individual values, one per day of the week. The same number of visitors on 2 different days of the week results in different percentage changes. So, we recommend the following:

* Don’t infer that larger changes mean more visitors or smaller changes mean fewer visitors.
* Avoid comparing day-to-day changes, especially weekends with weekdays.
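
To make the per-weekday baseline concrete, here is a minimal sketch with synthetic visitor counts. The real reports are built from counts that are not published, so the numbers, the `visits` series, and the `percent_change` helper are all assumptions for illustration.

```python
import pandas as pd

# Synthetic daily visitor counts for the baseline window (made-up numbers):
# 100 visitors on weekdays, 40 on weekends.
baseline_days = pd.date_range("2020-01-03", "2020-02-06", freq="D")
visits = pd.Series(
    [40 if d.dayofweek >= 5 else 100 for d in baseline_days],
    index=baseline_days,
)

# One baseline value per day of the week (0 = Monday ... 6 = Sunday):
# the median of the five matching days in the window.
baseline = visits.groupby(visits.index.dayofweek).median()

def percent_change(count: float, day: pd.Timestamp) -> float:
    """Percent change of `count` relative to the baseline for that weekday."""
    base = baseline.loc[day.dayofweek]
    return 100.0 * (count - base) / base

# The same 60 visitors, once on a Monday and once on a Sunday.
monday = percent_change(60, pd.Timestamp("2020-04-06"))
sunday = percent_change(60, pd.Timestamp("2020-04-05"))
print(monday, sunday)
```

With these made-up counts, 60 visitors is a drop against the Monday baseline but a rise against the Sunday baseline, which is why a larger change does not imply more visitors.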

How did we pick perfectly normal baseline days? We probably haven’t—a short period of the year can't represent normal for every region on our planet. We picked a recent period, before widespread disruption as communities responded to COVID-19. Even so, for some regions, the baseline falls during a time when COVID-19 was established. To interpret the data for your region, follow the [local checklist](https://support.google.com/covid19-mobility/checklist/9834261).

To help you track week-to-week changes, the baseline days never change. These baseline days also don't account for seasonality. For example, visitors to parks typically increase as the weather improves.
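
Because the baseline is fixed, one way to track week-to-week changes is to compare each day with the same weekday one week earlier. The sketch below uses made-up percent changes and assumes a daily series with no missing dates; it is one possible approach, not something prescribed by the reports.

```python
import pandas as pd

# Made-up percent changes for two consecutive weeks (Monday to Sunday).
dates = pd.date_range("2020-04-06", periods=14, freq="D")
change = pd.Series(
    [-30, -31, -29, -30, -28, -45, -50,
     -25, -26, -24, -27, -22, -40, -46],
    index=dates,
    dtype="float64",
)

# Difference against the same weekday one week earlier, in percentage points.
# The first seven days have no earlier week to compare with and stay NaN.
week_over_week = change.diff(7)
print(week_over_week.dropna())
```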

#### Place categories
To make the reports useful, we use categories to group some of the places with similar characteristics for purposes of social distancing guidance. For example, we combine grocery and pharmacy as these tend to be considered essential trips. Each high-level category contains many types of places—some might not be obvious.
* Grocery & pharmacy: Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.
* Parks: Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens.
* Transit stations: Mobility trends for places like public transport hubs such as subway, bus, and train stations.
* Retail & recreation: Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.
* Residential: Mobility trends for places of residence.
* Workplaces: Mobility trends for places of work.

#### Understand the data
Gaps and spikes: You might see data gaps for some categories in your region. These gaps are intentional and happen because the data doesn’t meet the quality and privacy threshold—when there isn’t enough data to ensure anonymity.
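
Gaps show up as missing values in the CSV, so it can help to measure how much of each category is missing before interpreting it. This is a small sketch on a made-up frame; the column names match the ones used in this notebook, and the values are invented.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with two of the category columns.
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=4, freq="D"),
    "parks": [5.0, np.nan, np.nan, -10.0],
    "workplaces": [2.0, 1.0, -3.0, -20.0],
})

categories = ["parks", "workplaces"]

# Share of days with no published value, per category.
gap_share = df[categories].isna().mean()
print(gap_share)
```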

Expectations: Vacations and public holidays can help you understand what your community looks like when people don’t go to places of work.

Small residential changes: The Residential category shows a change in duration—the other categories measure a change in total visitors. Because people already spend much of the day at places of residence (even on workdays), the capacity for change isn’t so large. **You shouldn’t compare the change in Residential with other categories because they have different units of measurement.**
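
One way to respect the difference in units is to keep the visit-based categories and the duration-based category in separate column groups, so they are never summarised or plotted on the same axis by accident. The constant names below are hypothetical and the row of values is made up.

```python
import pandas as pd

# Column names as they appear once the common suffix is stripped.
VISIT_CATEGORIES = [
    "retail_and_recreation",
    "grocery_and_pharmacy",
    "parks",
    "transit_stations",
    "workplaces",
]
DURATION_CATEGORIES = ["residential"]  # measures time spent, not visitors

# One made-up row of percent changes from baseline.
df = pd.DataFrame([{
    "retail_and_recreation": -40.0,
    "grocery_and_pharmacy": -10.0,
    "parks": -35.0,
    "transit_stations": -45.0,
    "workplaces": -30.0,
    "residential": 12.0,
}])

visits = df[VISIT_CATEGORIES]       # change in total visitors
duration = df[DURATION_CATEGORIES]  # change in duration
print(visits.columns.tolist(), duration.columns.tolist())
```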

Weekends: Remember that these mobility reports show relative changes, and not absolute visitors or duration. For example, if few people normally visit places of work on a Sunday, you wouldn’t expect to see large changes to Sunday visitors as your community responds to COVID-19.
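
Since weekend and weekday values are relative to different baselines, it can be useful to label them and summarise them separately rather than averaging across the whole week. A minimal sketch with made-up values follows; the `day_type` column is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2020-04-06", periods=7, freq="D"),  # Monday to Sunday
    "workplaces": [-35.0, -36.0, -34.0, -35.0, -33.0, -15.0, -10.0],
})

# dayofweek is 0 for Monday and 6 for Sunday, so 5 and 6 are the weekend.
df["day_type"] = np.where(df["date"].dt.dayofweek >= 5, "weekend", "weekday")

# Mean percent change for each group, kept separate on purpose.
summary = df.groupby("day_type")["workplaces"].mean()
print(summary)
```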

#### Publishing
If you publish results based on this data set, please cite as:

Google LLC "Google COVID-19 Community Mobility Reports".
https://www.google.com/covid19/mobility/ Accessed: <Date>.

In [1]:
from datetime import date
import os
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
import seaborn as sns
from ipywidgets import interact

%matplotlib inline
sns.set_style('whitegrid')

pd.set_option('display.max_rows', 150)

In [2]:
date.today().strftime("%d/%m/%y")

'03/06/20'

In the mobility DataFrame, the columns `retail_and_recreation`, `grocery_and_pharmacy`, `parks`, `transit_stations`, `workplaces`, and `residential` hold the percent change from the baseline.

In [3]:
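# Read the region columns as strings; several are mostly empty.
dtypes = {'country_region_code': 'object', 'country_region': 'object', 'sub_region_1': 'object', 'sub_region_2': 'object'}

# The dates are ISO formatted (YYYY-MM-DD), so parse_dates needs no custom parser.
# (pd.datetime was removed in pandas 2.0 and date_parser is deprecated.)
mobility = pd.read_csv("2020-05-29 - Global_Mobility_Report.csv", dtype=dtypes, parse_dates=['date'])
# Strip the common suffix so the category columns get short names.
mobility = mobility.rename(columns=lambda c: c.replace('_percent_change_from_baseline', ''))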

In [4]:
mobility.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458565 entries, 0 to 458564
Data columns (total 11 columns):
country_region_code      457777 non-null object
country_region           458565 non-null object
sub_region_1             445233 non-null object
sub_region_2             265611 non-null object
date                     458565 non-null datetime64[ns]
retail_and_recreation    358425 non-null float64
grocery_and_pharmacy     345821 non-null float64
parks                    212781 non-null float64
transit_stations         244044 non-null float64
workplaces               445735 non-null float64
residential              251216 non-null float64
dtypes: datetime64[ns](1), float64(6), object(4)
memory usage: 38.5+ MB


In [5]:
mobility.head(3)

| | country_region_code | country_region | sub_region_1 | sub_region_2 | date | retail_and_recreation | grocery_and_pharmacy | parks | transit_stations | workplaces | residential |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AE | United Arab Emirates | NaN | NaN | 2020-02-15 | 0.0 | 4.0 | 5.0 | 0.0 | 2.0 | 1.0 |
| 1 | AE | United Arab Emirates | NaN | NaN | 2020-02-16 | 1.0 | 4.0 | 4.0 | 1.0 | 2.0 | 1.0 |
| 2 | AE | United Arab Emirates | NaN | NaN | 2020-02-17 | -1.0 | 1.0 | 5.0 | 1.0 | 2.0 | 1.0 |


In [9]:
# National rows for Brazil: sub_region_1 is missing on the country-level aggregate.
mob_br = mobility[(mobility['country_region'] == 'Brazil') & mobility['sub_region_1'].isna()].copy()
mob_br.drop(['country_region_code', 'sub_region_1', 'sub_region_2'], axis=1, inplace=True)
mob_br.reset_index(inplace=True, drop=True)
mob_br.head()

| | country_region | date | retail_and_recreation | grocery_and_pharmacy | parks | transit_stations | workplaces | residential |
|---|---|---|---|---|---|---|---|---|
| 0 | Brazil | 2020-02-15 | 5.0 | 4.0 | -5.0 | 8.0 | 6.0 | 0.0 |
| 1 | Brazil | 2020-02-16 | 2.0 | 3.0 | -13.0 | 3.0 | 0.0 | 1.0 |
| 2 | Brazil | 2020-02-17 | -2.0 | 0.0 | -12.0 | 9.0 | 19.0 | -1.0 |
| 3 | Brazil | 2020-02-18 | -3.0 | -1.0 | -11.0 | 9.0 | 15.0 | -1.0 |
| 4 | Brazil | 2020-02-19 | -1.0 | -2.0 | -5.0 | 8.0 | 14.0 | -1.0 |
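
A possible next step, not part of the original notebook, is to smooth the weekly pattern with a 7-day trailing mean so that each window contains exactly one of each weekday. The sketch below uses a made-up frame named `demo` with the same `date` and `workplaces` columns as `mob_br`, and assumes one row per day with no missing dates.

```python
import pandas as pd

# Made-up national series standing in for mob_br (same column names).
demo = pd.DataFrame({
    "date": pd.date_range("2020-03-02", periods=14, freq="D"),
    "workplaces": [10.0, 9.0, 8.0, 9.0, 7.0, 2.0, 0.0,
                   -20.0, -22.0, -21.0, -23.0, -20.0, -8.0, -5.0],
})

# 7-day trailing mean; the first six days lack a full window and stay NaN.
smoothed = (
    demo.set_index("date")["workplaces"]
        .rolling(window=7, min_periods=7)
        .mean()
)
print(smoothed.dropna())
```

The smoothed series is easier to read as a trend, at the cost of lagging sudden changes by a few days.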


In [14]:
# State-level rows for Brazil: sub_region_1 holds the state name.
mob_states = mobility[(mobility['country_region'] == 'Brazil') & mobility['sub_region_1'].notna()].copy()
mob_states.drop(['country_region_code', 'sub_region_2'], axis=1, inplace=True)
mob_states.reset_index(inplace=True, drop=True)
mob_states.head()

| | country_region | sub_region_1 | date | retail_and_recreation | grocery_and_pharmacy | parks | transit_stations | workplaces | residential |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Brazil | Federal District | 2020-02-15 | 10.0 | 7.0 | -8.0 | 8.0 | 8.0 | 0.0 |
| 1 | Brazil | Federal District | 2020-02-16 | 11.0 | 6.0 | -10.0 | 4.0 | 0.0 | 2.0 |
| 2 | Brazil | Federal District | 2020-02-17 | 2.0 | 4.0 | -1.0 | 11.0 | 22.0 | -2.0 |
| 3 | Brazil | Federal District | 2020-02-18 | 1.0 | 4.0 | 2.0 | 14.0 | 22.0 | -2.0 |
| 4 | Brazil | Federal District | 2020-02-19 | -2.0 | 0.0 | -3.0 | 9.0 | 20.0 | -2.0 |


In [15]:
mob_states['sub_region_1'].unique()

array(['Federal District', 'State of Acre', 'State of Alagoas',
       'State of Amapá', 'State of Amazonas', 'State of Bahia',
       'State of Ceará', 'State of Espírito Santo', 'State of Goiás',
       'State of Maranhão', 'State of Mato Grosso',
       'State of Mato Grosso do Sul', 'State of Minas Gerais',
       'State of Pará', 'State of Paraíba', 'State of Paraná',
       'State of Pernambuco', 'State of Piauí', 'State of Rio de Janeiro',
       'State of Rio Grande do Norte', 'State of Rio Grande do Sul',
       'State of Rondônia', 'State of Roraima', 'State of Santa Catarina',
       'State of São Paulo', 'State of Sergipe', 'State of Tocantins'],
      dtype=object)

In [17]:
# Drop the literal 'State of ' prefix from the state names.
mob_states['sub_region_1'] = mob_states['sub_region_1'].str.replace('State of ', '', regex=False)
mob_states['sub_region_1'].unique()

array(['Federal District', 'Acre', 'Alagoas', 'Amapá', 'Amazonas',
       'Bahia', 'Ceará', 'Espírito Santo', 'Goiás', 'Maranhão',
       'Mato Grosso', 'Mato Grosso do Sul', 'Minas Gerais', 'Pará',
       'Paraíba', 'Paraná', 'Pernambuco', 'Piauí', 'Rio de Janeiro',
       'Rio Grande do Norte', 'Rio Grande do Sul', 'Rondônia', 'Roraima',
       'Santa Catarina', 'São Paulo', 'Sergipe', 'Tocantins'],
      dtype=object)
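
With clean state names, one category can be reshaped into a wide table with one column per state, which is convenient for plotting each state's trend. This is a sketch on a made-up frame named `demo` that mimics the long layout of `mob_states`; it is not part of the original notebook. Each column is best read as its own trend, given the earlier caution about comparing regions.

```python
import pandas as pd

# Made-up rows in the same long layout as mob_states.
demo = pd.DataFrame({
    "sub_region_1": ["Acre", "Acre", "Bahia", "Bahia"],
    "date": pd.to_datetime(
        ["2020-02-15", "2020-02-16", "2020-02-15", "2020-02-16"]
    ),
    "workplaces": [3.0, -1.0, 5.0, 0.0],
})

# One row per date, one column per state, values from a single category.
wide = demo.pivot(index="date", columns="sub_region_1", values="workplaces")
print(wide)
```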