# Mobility Data Wrangling

## Data Sources

* ISTAT. [Confini delle unità amministrative a fini statistici al 1° Gennaio 2020](https://www.istat.it/it/archivio/222527) (Boundaries of administrative units for statistical purposes as of 1 January 2020). (2020).
* OpenPolis. [Limits of Italian Provinces](https://github.com/openpolis/geojson-italy/blob/master/geojson/limits_IT_provinces.geojson). *GitHub* (2019).
* Pepe, E., Bajardi, P., Gauvin, L. et al. [COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown](https://doi.org/10.1038/s41597-020-00575-2). *Scientific Data* 7, 230 (2020).

## Modules

```python
# Data Wrangling
import numpy as np
import pandas as pd

# Data Visualization
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline

# Geospatial Data Wrangling
import geopandas as gpd
from shapely.geometry import Point, LineString

# Mobility
import skmob

# Basic Utilities
import warnings
warnings.filterwarnings('ignore')
```


## Flows

The [file](https://data.humdata.org/dataset/40a9ea9e-0edb-49f7-a440-6aee3015961b/resource/5319b9e6-17e5-43ce-81be-c4a801c9a454/download/od_matrix_daily_flows_norm_full_2020_01_18_2020_06_26.csv) contains the daily fraction of users moving between pairs of Italian provinces. Each row corresponds to an entry (i, j) of the origin-destination matrix, and the columns are:

* `p1`: COD PROV (ISTAT province code) of the origin;
* `p2`: COD PROV of the destination;
* one column per day, named in the format `yyyy-mm-dd`, holding the flow for that day.
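Because each day is stored as a separate column, it can be handier to reshape the table into a long format with one row per (origin, destination, day). A minimal sketch with pandas `melt`, on a hypothetical three-row table with made-up values standing in for `Flows.csv`:

```python
import pandas as pd

# Hypothetical stand-in for the wide OD table: p1/p2 are province codes,
# every other column is one day. Values are made up for illustration only.
wide = pd.DataFrame({
    "p1": [1, 1, 2],
    "p2": [1, 2, 1],
    "2020-01-18": [0.90, 0.10, 0.05],
    "2020-01-19": [0.92, 0.08, 0.04],
})

# Reshape to one row per (origin, destination, date).
long = (
    wide.rename(columns={"p1": "origin", "p2": "destination"})
        .melt(id_vars=["origin", "destination"], var_name="date", value_name="flow")
)
# Column headers are yyyy-mm-dd strings; turn them into real dates.
long["date"] = pd.to_datetime(long["date"], format="%Y-%m-%d")

print(long)
```

On the real table the same call turns every daily column into rows, so selecting a single day becomes a filter on `date` instead of a column lookup.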

```python
flows = pd.read_csv("/Users/Pit/GitHub/DigitalEpidemiologyProject/Data/CSV/2020/Pepe2020/Flows.csv").rename(columns={'p1': 'origin', 'p2': 'destination'})
# Keep one day (2020-01-18) as a flow/origin/destination table and save it for skmob
flow = flows[['2020-01-18', 'origin', 'destination']].rename(columns={'2020-01-18': 'flow'})
flow.to_csv("/Users/Pit/GitHub/DigitalEpidemiologyProject/Data/CSV/2020/Pepe2020/flow.csv", index=False)
```

```python
# Load the province polygons to use as the tessellation
tessellation = gpd.read_file("/Users/Pit/GitHub/DigitalEpidemiologyProject/Data/CSV/2020/Pepe2020/Provinces.geojson")
tessellation = tessellation[['prov_name', 'prov_istat_code_num', 'geometry']].rename(columns={'prov_name': 'name', 'prov_istat_code_num': 'code'})

# Alternative: ISTAT shapefile of provinces as of 1 January 2020
# tessellation = gpd.read_file("/Users/Pit/GitHub/DigitalEpidemiologyProject/Data/Shapefiles/ProvCM01012020/ProvCM01012020_WGS84.shp")
# tessellation = tessellation[['COD_PROV', 'DEN_PROV', 'geometry']].rename(columns={'DEN_PROV': 'name', 'COD_PROV': 'code'})
# tessellation.loc[0, 'name'] = 'Torino'

tessellation
```

|     | name     | code | geometry |
|-----|----------|------|----------|
| 0   | Torino   | 1    | POLYGON ((7.89397 45.58222, 7.89654 45.57985, ... |
| 1   | Vercelli | 2    | POLYGON ((7.92900 45.74244, 7.92584 45.74196, ... |
| 2   | Novara   | 3    | POLYGON ((8.42079 45.82981, 8.42028 45.83010, ... |
| 3   | Cuneo    | 4    | MULTIPOLYGON (((6.94540 44.42794, 6.94734 44.4... |
| 4   | Asti     | 5    | POLYGON ((7.96685 45.11667, 7.96729 45.11673, ... |
| ... | ...      | ...  | ... |
| 102 | Sassari  | 90   | MULTIPOLYGON (((9.46502 40.65584, 9.46475 40.6... |
| 103 | Nuoro    | 91   | MULTIPOLYGON (((9.28037 39.91741, 9.27741 39.9... |
| 104 | Cagliari | 92   | MULTIPOLYGON (((9.00622 39.32697, 9.01541 39.3... |
| 105 | Oristano | 95   | MULTIPOLYGON (((8.78200 40.18982, 8.78829 40.1... |
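skmob matches each origin and destination to a polygon through the tile identifier (here `code`), so it can be worth checking that every code in the flow table has a polygon and that the codes are unique. A sketch with small hypothetical tables (`tiles` and `od` are illustrative names, not objects from this notebook):

```python
import pandas as pd

# Hypothetical stand-ins: a tessellation table with province codes and an
# OD table whose codes should all map to a polygon. Code 9 has no polygon.
tiles = pd.DataFrame({"name": ["Torino", "Vercelli", "Novara"], "code": [1, 2, 3]})
od = pd.DataFrame({
    "origin": [1, 1, 2, 9],
    "destination": [1, 3, 9, 2],
    "flow": [0.9, 0.1, 0.2, 0.3],
})

# The tile identifier is used as a join key, so it should be unique.
assert tiles["code"].is_unique

# Codes used as origin or destination that have no polygon in the tessellation.
used = set(od["origin"]) | set(od["destination"])
missing = sorted(int(c) for c in used - set(tiles["code"]))

print(missing)  # [9]
```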


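```python
# Build a FlowDataFrame from the one-day CSV, matching flows to polygons through the 'code' column
flow_data = skmob.FlowDataFrame.from_file("/Users/Pit/GitHub/DigitalEpidemiologyProject/Data/CSV/2020/Pepe2020/flow.csv", tessellation=tessellation, tile_id='code', sep=",")
flow_data
```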

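|      | flow   | origin | destination |
|------|--------|--------|-------------|
| 0    | 0.9276 | 1      | 1           |
| 1    | 0.0060 | 1      | 2           |
| 2    | 0.0020 | 1      | 3           |
| 3    | 0.0275 | 1      | 4           |
| 4    | 0.0095 | 1      | 5           |
| ...  | ...    | ...    | ...         |
| 5133 | 0.0000 | 111    | 90          |
| 5134 | 0.0000 | 111    | 91          |
| 5135 | 0.0826 | 111    | 92          |
| 5136 | 0.0165 | 111    | 95          |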
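The first rows show that most users from Torino stay in the province (0.9276 for the pair (1, 1)). A sketch that separates intra-province from inter-province flows and sums the outgoing fractions per origin, on toy values; whether the real totals are close to 1 depends on how the dataset is normalized, which this check is meant to verify rather than assume:

```python
import pandas as pd

# Toy table shaped like the FlowDataFrame above (flow, origin, destination).
# Values are illustrative, not taken from the dataset.
od = pd.DataFrame({
    "flow": [0.90, 0.06, 0.04, 0.10, 0.85, 0.05],
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": [1, 2, 3, 1, 2, 3],
})

is_self = od["origin"] == od["destination"]
# Fraction that stays in the province of origin.
stay = od[is_self].set_index("origin")["flow"]
# Fraction that leaves it, summed over all other destinations.
leave = od[~is_self].groupby("origin")["flow"].sum()
# Total per origin: close to 1 only if the matrix is row-normalized.
row_sums = od.groupby("origin")["flow"].sum()

print(pd.DataFrame({"stay": stay, "leave": leave, "total": row_sums}))
```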


```python
# The tessellation is an attribute of the FlowDataFrame; its 'code' column now appears as 'tile_ID'
flow_data.tessellation.head()
```

|   | name     | tile_ID | geometry |
|---|----------|---------|----------|
| 0 | Torino   | 1       | POLYGON ((7.89397 45.58222, 7.89654 45.57985, ... |
| 1 | Vercelli | 2       | POLYGON ((7.92900 45.74244, 7.92584 45.74196, ... |
| 2 | Novara   | 3       | POLYGON ((8.42079 45.82981, 8.42028 45.83010, ... |
| 3 | Cuneo    | 4       | MULTIPOLYGON (((6.94540 44.42794, 6.94734 44.4... |
| 4 | Asti     | 5       | POLYGON ((7.96685 45.11667, 7.96729 45.11673, ... |
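Province codes are hard to read on their own. Since the tessellation keeps the names next to `tile_ID`, a sketch of attaching names to each flow with `Series.map`, using stand-in tables built from the first rows shown above (`tiles`, `od` and `labeled` are illustrative names):

```python
import pandas as pd

# Hypothetical stand-ins for flow_data.tessellation and flow_data,
# filled with the first rows printed above.
tiles = pd.DataFrame({"name": ["Torino", "Vercelli", "Novara"], "tile_ID": [1, 2, 3]})
od = pd.DataFrame({
    "flow": [0.9276, 0.0060, 0.0020],
    "origin": [1, 1, 1],
    "destination": [1, 2, 3],
})

# Lookup table: tile_ID -> province name.
names = tiles.set_index("tile_ID")["name"]

# Add readable origin/destination names; codes without a match become NaN.
labeled = od.assign(
    origin_name=od["origin"].map(names),
    destination_name=od["destination"].map(names),
)
print(labeled)
```

If `tile_ID` and `origin`/`destination` have different dtypes (for example strings versus integers), the lookup finds no match and returns `NaN`, so convert one side with `astype` first.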


```python
# Draw the origin-destination flows on a map
flow_data.plot_flows()
```
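Judging from the first rows of the flow table, self-flows (origin equal to destination) carry most of the weight, so inter-province movements are easier to inspect separately among the 5,137 origin-destination pairs. A sketch with a hypothetical helper `top_interprovince` that keeps only the largest flows between different provinces; recent skmob versions also appear to accept a `min_flow` threshold in `plot_flows`, which is worth confirming against the installed version's docstring:

```python
import pandas as pd


def top_interprovince(od: pd.DataFrame, k: int = 10, min_flow: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: the k largest flows between different provinces above min_flow."""
    inter = od[(od["origin"] != od["destination"]) & (od["flow"] > min_flow)]
    return inter.nlargest(k, "flow")


# Stand-in built from the first rows of the flow table above.
od = pd.DataFrame({
    "flow": [0.9276, 0.0060, 0.0020, 0.0275, 0.0095],
    "origin": [1, 1, 1, 1, 1],
    "destination": [1, 2, 3, 4, 5],
})

print(top_interprovince(od, k=2))
```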