<center><img src="logo_skmob.png" width=450 align="left" /></center>

# Introduction

- Repo: [http://bit.ly/skmob_repo](http://bit.ly/skmob_repo)
- Docs: [http://bit.ly/skmob_doc](http://bit.ly/skmob_doc)
- Paper: [http://bit.ly/skmob_paper](http://bit.ly/skmob_paper)



## What is scikit-mobility?

scikit-mobility is a Python library for analyzing <font color="blue">*mobility data*</font>. It is suited to working with:

- **trajectories** composed of latitude/longitude points (e.g., GPS data)
- **fluxes** of movements between places (e.g., an origin-destination matrix)


In [None]:
# import the library
import skmob

scikit-mobility provides two user-friendly data structures that extend the *pandas* `DataFrame`:

- `TrajDataFrame` - for spatio-temporal <font color="blue">**trajectories**</font>
- `FlowDataFrame` - for <font color="blue">**fluxes**</font> mapped into a tessellation


### What can you do with scikit-mobility?

- **Preprocessing** of mobility data
- **Measuring** individual and collective behaviours
- **Assessing** privacy risk
- **Predicting** migration flows
- <font color="grey">**Generating** synthetic trajectories</font>
    

## `TrajDataFrame`


Each row describes one point of a trajectory and contains the following columns:

- `lat` - latitude of the point
- `lng` - longitude of the point
- `datetime` - date and time of the point

For multi-user data sets, there are two *optional* columns:

- `uid` - identifier of the user the trajectory belongs to
- `tid` - identifier for the trajectory

A `TrajDataFrame` can be created from:

- a python list or *numpy* array
- a python dictionary
- a *pandas* `DataFrame`
- a text file

### From a `list`

In [None]:
# From a list
data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
             [1, 39.984198, 116.319322, '2008-10-23 13:53:06'],
             [1, 39.984224, 116.319402, '2008-10-23 13:53:11'],
             [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]
data_list

Because a list has no column names, we pass the positional indexes of the mandatory columns through the arguments `latitude`, `longitude` and `datetime`.

In [None]:
tdf = skmob.TrajDataFrame(data_list, 
                          latitude=1, longitude=2, 
                          datetime=3)
print(type(tdf))
tdf
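
To make the role of the positional arguments concrete, the following library-free sketch mimics what selecting columns by index amounts to. It is not scikit-mobility's implementation: the helper `rows_to_points`, its `dt` argument and the assumed timestamp format are hypothetical and chosen only for illustration.

```python
from datetime import datetime

def rows_to_points(rows, latitude, longitude, dt, fmt="%Y-%m-%d %H:%M:%S"):
    """Pick the mandatory fields out of each row by position (hypothetical helper)."""
    points = []
    for row in rows:
        points.append({
            "lat": float(row[latitude]),                     # coordinates become floats
            "lng": float(row[longitude]),
            "datetime": datetime.strptime(row[dt], fmt),     # strings become datetimes
        })
    return points

rows = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
        [1, 39.984198, 116.319322, '2008-10-23 13:53:06']]
points = rows_to_points(rows, latitude=1, longitude=2, dt=3)
print(points[0])
```

The same idea applies when columns are selected by name instead of by position, as in the next section.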

### From a `DataFrame`

In [None]:
# import the pandas library
import pandas as pd 
# build a dataframe from the 2D list
data_df = pd.DataFrame(data_list, 
                       columns=['user', 'latitude', 'lng', 'hour']) 

In [None]:
print(type(data_df)) # type of the structure
data_df.head() # head of the DataFrame

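Note that:
- the column names in `data_df` do not all match the names a `TrajDataFrame` expects (`lat`, `lng`, `datetime`, `uid`)
- for every column whose name differs, you must pass the actual name through the corresponding argument (`latitude`, `longitude`, `datetime`, `user_id`); `lng` already matches, so `longitude` can be omitted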

In [None]:
# Create a TrajDataFrame from a DataFrame
tdf = skmob.TrajDataFrame(data_df, 
                          latitude='latitude', 
                          datetime='hour', 
                          user_id='user')

print(type(tdf))
tdf.head()

### From a text file

Class `TrajDataFrame` has a method `from_file` to construct the object from an input text file.

Let's try with a subsample of the <font color="blue">**GeoLife**</font> trajectories. The whole dataset can be found [here](https://www.microsoft.com/en-us/download/details.aspx?id=52367).

In [None]:
# create a TrajDataFrame from a dataset of trajectories 
tdf = skmob.TrajDataFrame.from_file(
    './data/geolife_sample.txt.gz', sep=',')
print(type(tdf))

In [None]:
# explore the TrajDataFrame
tdf.head()

### Attributes of a `TrajDataFrame`


- `crs`: the coordinate reference system. Default: `epsg:4326` (lat/long)
- `parameters`: a dictionary for storing any additional properties you need

In [None]:
tdf.crs

In [None]:
tdf.parameters

In [None]:
# add your own parameter
tdf.parameters['something'] = 5
tdf.parameters

The columns of a `TrajDataFrame` have specific types, which may differ from those of the original `DataFrame`:

In [None]:
# In the DataFrame
print(type(data_df))
data_df.dtypes

In [None]:
print(type(tdf)) # In the TrajDataFrame
tdf.dtypes

In [None]:
tdf.lat.head()

### Write and read 

scikit-mobility provides dedicated functions to write a `TrajDataFrame` to a file and to read it back.

#### Writing a `TrajDataFrame` to a file

- includes the `parameters` and `crs` attributes
- preserves the `dtype` of columns with timestamps (although time zone information is lost)

In [None]:
skmob.write(tdf, './tdf.json')

In [None]:
tdf.parameters

#### Reading a `TrajDataFrame` from a JSON file

In [None]:
# read the file written before
tdf2 = skmob.read('./tdf.json') 
tdf2[:4]

The `dtype`s and the `parameters` and `crs` attributes are preserved.

In [None]:
print(tdf2.dtypes)
tdf2.parameters
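
The round trip can be illustrated without the library. The sketch below stores points together with a `parameters` dictionary and a `crs` string in one JSON document and restores the timestamps on reading. The file layout and the helpers `dump_points` and `load_points` are invented for this illustration and are not the format that `skmob.write` produces.

```python
import json
from datetime import datetime

def dump_points(points, parameters, crs):
    """Serialise points plus metadata; datetimes are stored as ISO 8601 strings."""
    payload = {
        "crs": crs,
        "parameters": parameters,
        "points": [{**p, "datetime": p["datetime"].isoformat()} for p in points],
    }
    return json.dumps(payload)

def load_points(text):
    """Inverse of dump_points: parse the JSON and rebuild the datetime objects."""
    payload = json.loads(text)
    points = [{**p, "datetime": datetime.fromisoformat(p["datetime"])}
              for p in payload["points"]]
    return points, payload["parameters"], payload["crs"]

original = [{"lat": 39.984094, "lng": 116.319236,
             "datetime": datetime(2008, 10, 23, 13, 53, 5)}]
text = dump_points(original, parameters={"something": 5}, crs="epsg:4326")
restored, parameters, crs = load_points(text)
print(restored == original, parameters, crs)
```

Keeping the metadata next to the data in the same file is what lets a reader recover the attributes without any side channel.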

### Plotting trajectories and flows

*scikit-mobility* relies on the *folium* library to plot:
- trajectories
- flows
- tessellations

In [None]:
tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner', start_end_markers=False)

## `FlowDataFrame`

Each row describes a flow and contains the columns:

- `origin`: ID of the origin tile
- `destination`: ID of the destination tile
- `flow`: number of people travelling from `origin` to `destination`
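
The relation between this row-wise layout and an origin-destination matrix can be sketched in plain Python. The tile identifiers and flow values below are made-up toy data, and `to_od_matrix` is a hypothetical helper, not part of scikit-mobility.

```python
from collections import defaultdict

def to_od_matrix(flow_rows):
    """Turn (origin, destination, flow) rows into a nested dict od[origin][destination]."""
    od = defaultdict(dict)
    for origin, destination, flow in flow_rows:
        # Sum the flows in case the same pair appears in more than one row.
        od[origin][destination] = od[origin].get(destination, 0) + flow
    return dict(od)

flow_rows = [("A", "B", 10), ("A", "C", 4), ("B", "A", 3), ("A", "B", 2)]
od = to_od_matrix(flow_rows)
print(od)
```

The row-wise form stores only the pairs that actually occur, which is convenient when most origin-destination pairs have no flow.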

<!-- NOTE: `FlowDataFrame` is a dataframe way of having Origin-Destination Matrix. -->

### Tessellation
Each `FlowDataFrame` is associated with a <font color="blue">**tessellation**</font>, i.e., a `GeoDataFrame` that contains two columns:
- `tile_ID`, identifier of a location
- `geometry`, geometric shape of the location
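
Since `origin` and `destination` hold tile identifiers, every identifier used in the flows should also appear in the tessellation. The following library-free sketch shows one way such a consistency check could look. The helper `unknown_tiles` and the toy identifiers are hypothetical, and the check is an illustration rather than a description of what scikit-mobility does internally.

```python
def unknown_tiles(flow_rows, tile_ids):
    """Return the sorted tile identifiers used in the flows but absent from the tessellation."""
    known = set(tile_ids)
    return sorted({tile
                   for origin, destination, _ in flow_rows
                   for tile in (origin, destination)
                   if tile not in known})

tile_ids = ["A", "B", "C"]                       # identifiers of a toy tessellation
flow_rows = [("A", "B", 10), ("B", "D", 1), ("E", "A", 7)]
print(unknown_tiles(flow_rows, tile_ids))
```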

### Creating a `FlowDataFrame`

The `FlowDataFrame` can be created from:

- a python list or a numpy array
- a *pandas* `DataFrame`
- a python dictionary
- a text file


### From a file

The method `from_file` creates a `FlowDataFrame` from a text file with the following columns:
    
- `origin`, `destination`, `flow`, `datetime` (optional)


In [None]:
import geopandas as gpd # Let's import geopandas

In [None]:
tessellation = gpd.GeoDataFrame.from_file(
    "data/NY_counties_2011.geojson") # load a tessellation

# create a FlowDataFrame from a file and a tessellation
fdf = skmob.FlowDataFrame.from_file(
    "data/NY_commuting_flows_2011.csv",
    tessellation=tessellation, tile_id='tile_id', sep=",")

In [None]:
fdf.head()

In [None]:
fdf.dtypes

In [None]:
# The tessellation is an attribute of the FlowDataFrame
fdf.tessellation.head() 

### Plot the tessellation

In [None]:
fdf.plot_tessellation(popup_features=['tile_ID', 'population']) 

### Plot the flows

In [None]:
fdf.plot_flows(flow_color='green')

### Plot tessellation and flows

In [None]:
tess_style = {'color':'gray', 'fillColor':'gray', 'opacity':0.2}
map_f = fdf.plot_tessellation(style_func_args=tess_style)
fdf[fdf['origin'] == '36061'].plot_flows(map_f=map_f, flow_exp=0., flow_popup=True)

## Construction of a `tessellation`

A tessellation can be created from:

- the name of the area of interest, e.g. `"Florence, Italy"`
- a `GeoDataFrame` with Points or Polygons

In [None]:
from skmob.tessellation import tilers
from skmob.utils import plot
# Create tessellation from a base shape
tessellation = tilers.tiler.get("squared", meters=500, 
                                base_shape="Florence, Italy")
print(len(tessellation))
print(tessellation.head())

In [None]:
plot.plot_gdf(tessellation, zoom=12, popup_features=['tile_ID'], style_func_args=tess_style)
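
As a rough mental model of a squared tessellation, the sketch below covers a bounding box with square tiles of a given side. It assumes planar coordinates already expressed in metres and does not clip the grid to the shape of the area, so it is a simplification rather than the tiler's actual algorithm. The helper `square_grid` and the tile numbering are hypothetical.

```python
import math

def square_grid(min_x, min_y, max_x, max_y, side):
    """Cover a bounding box with side x side squares (planar coordinates in metres)."""
    n_cols = math.ceil((max_x - min_x) / side)
    n_rows = math.ceil((max_y - min_y) / side)
    tiles = []
    for col in range(n_cols):
        for row in range(n_rows):
            x0 = min_x + col * side
            y0 = min_y + row * side
            tiles.append({
                "tile_ID": str(len(tiles)),               # arbitrary running identifier
                "bounds": (x0, y0, x0 + side, y0 + side),  # (min_x, min_y, max_x, max_y)
            })
    return tiles

grid = square_grid(0, 0, 1000, 1500, side=500)
print(len(grid))
print(grid[0])
```

Halving `side` roughly quadruples the number of tiles, which is worth keeping in mind when choosing the `meters` argument for a large area.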

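## Tessellation from `TrajDataFrame`

Convert the `TrajDataFrame` to a `GeoDataFrame` with the method `to_geodataframe()`, then use it as the base shape of the tessellation.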

In [None]:
# tdf contains trajectories from GeoLife
gdf = tdf.to_geodataframe() 
gdf.head()

In [None]:
tessellation = tilers.tiler.get("squared", base_shape=gdf, meters=100000)
# NOTE: a GeoDataFrame containing polygons is also accepted as base_shape

In [None]:
print(len(tessellation))
tessellation.head()

In [None]:
map_f = plot.plot_gdf(tessellation, zoom=4, popup_features=['tile_ID'], style_func_args=tess_style)
tdf.plot_trajectory(map_f=map_f) 

In [None]:
a_tdf = tdf[tdf['uid']==1]
a_gdf = a_tdf.to_geodataframe() 
a_tessellation = tilers.tiler.get("squared", base_shape=a_gdf, meters=1000)
a_tessellation.shape

In [None]:
map_f = plot.plot_gdf(a_tessellation, zoom=11, popup_features=['tile_ID'], style_func_args=tess_style)
a_tdf.plot_trajectory(map_f=map_f)

## Mapping points to their corresponding tiles

In [None]:
mapped_a_tdf = a_tdf.mapping(a_tessellation)
mapped_a_tdf.head()
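
Conceptually, the mapping assigns to each point the identifier of the tile that contains it. For a regular square grid this reduces to integer division, as the library-free sketch below shows. It assumes planar coordinates in metres and a column-major tile numbering. Both are illustrative choices, and `tile_index` is a hypothetical helper rather than scikit-mobility code.

```python
def tile_index(x, y, min_x, min_y, side, n_cols, n_rows):
    """Return the index of the square tile containing (x, y), or None if outside the grid."""
    col = int((x - min_x) // side)
    row = int((y - min_y) // side)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return col * n_rows + row  # column-major numbering, an arbitrary choice
    return None

# Toy grid: 2 columns x 3 rows of 500 m squares with origin at (0, 0).
points = [(250, 250), (750, 1250), (1200, 100)]
mapped = [(p, tile_index(*p, min_x=0, min_y=0, side=500, n_cols=2, n_rows=3))
          for p in points]

# Dropping the points that fall outside every tile yields a spatial selection.
inside = [(p, t) for p, t in mapped if t is not None]
print(inside)
```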

## Select points within a tessellation

In [None]:
haidian_tess = tilers.tiler.get("squared", base_shape='Haidian, China', meters=1000)
map_f = plot.plot_gdf(haidian_tess, zoom=11, popup_features=['tile_ID'], style_func_args=tess_style)
tdf.plot_trajectory(map_f=map_f)

In [None]:
mapped_tdf = tdf.mapping(haidian_tess, remove_na=True)
map_f = plot.plot_gdf(haidian_tess, zoom=11, popup_features=['tile_ID'], style_func_args=tess_style)
mapped_tdf.plot_trajectory(map_f=map_f)

## From `TrajDataFrame` to `FlowDataFrame`

In [None]:
# remove_na removes points not contained in the tessellation
fdf = tdf.to_flowdataframe(tessellation=haidian_tess, self_loops=True, remove_na=True)
fdf.head()

In [None]:
fdf.plot_flows(flow_exp=0., zoom=11)
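
One way to picture the conversion is to count, for each user, the transitions between the tiles of consecutive points. The sketch below does this in plain Python on made-up tile sequences. It assumes that a flow is a move between two consecutive points of the same user, which may not match the library's exact rule, and `count_flows` is a hypothetical helper.

```python
from collections import Counter

def count_flows(tiles_by_user, self_loops=True):
    """Count transitions between the tiles of consecutive points, user by user."""
    flows = Counter()
    for tiles in tiles_by_user.values():
        for origin, destination in zip(tiles, tiles[1:]):
            if origin == destination and not self_loops:
                continue  # skip movements that start and end in the same tile
            flows[(origin, destination)] += 1
    return flows

# Toy input: the time-ordered tile of each point, grouped by user identifier.
visits = {1: ["a", "a", "b", "c"], 2: ["b", "c", "c"]}
with_loops = count_flows(visits, self_loops=True)
without_loops = count_flows(visits, self_loops=False)
print(with_loops)
print(without_loops)
```

Points that fall outside the tessellation would have to be dropped before counting, which is the role `remove_na=True` plays in the call above.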

## The curious case of Tokyo

[https://en.wikipedia.org/wiki/Tokyo](https://en.wikipedia.org/wiki/Tokyo)

In [None]:
tss_tko = tilers.tiler.get("squared", base_shape='Tokyo, Japan', meters=10000)
plot.plot_gdf(tss_tko, zoom=5, popup_features=['tile_ID'], style_func_args=tess_style)