# Visualizing Foursquare data in Flowmap using the Unitrip format

This notebook shows how to use the `unitrip` conversion module to transform Foursquare data into the Unitrip format. The resulting trips can then be aggregated into flows and converted so that they can be visualized in flowmap.blue.

## Preamble

```python
import seaborn as sns
import pandas as pd
import geopandas as gpd

%matplotlib inline
# Plot styling used throughout the notebook
sns.set(context='notebook', font='Lucida Sans Unicode', style='white', palette='plasma')
```

Import the Foursquare data: POIs and check-ins.

```python
pois = pd.read_csv('../data/other_format/foursquare/raw_POIs.txt', sep='\t', names=['venue_id', 'lat', 'lon', 'category', 'country'])
pois.head()
```

| | venue_id | lat | lon | category | country |
|---:|---|---:|---:|---|---|
| 0 | 3fd66200f964a52000e61ee3 | 40.729209 | -73.998753 | Post Office | US |
| 1 | 3fd66200f964a52000e71ee3 | 40.733596 | -74.003139 | Jazz Club | US |
| 2 | 3fd66200f964a52000e81ee3 | 40.758102 | -73.975734 | Gym | US |
| 3 | 3fd66200f964a52000ea1ee3 | 40.732456 | -74.003755 | Indian Restaurant | US |
| 4 | 3fd66200f964a52000ec1ee3 | 42.345907 | -71.087001 | Indian Restaurant | US |


```python
# pandas is used instead of dask: dd.read_csv fails with the installed pyarrow version
check_ins = pd.read_csv('../data/other_format/foursquare/raw_Checkins_anonymized.txt', sep='\t', names=['user_id', 'venue_id', 'datetime', 'utc_offset'])
check_ins.head()
```

| | user_id | venue_id | datetime | utc_offset |
|---:|---:|---|---|---:|
| 0 | 546830 | 4f5e3a72e4b053fd6a4313f6 | Tue Apr 03 18:00:06 +0000 2012 | 240 |
| 1 | 822121 | 4b4b87b5f964a5204a9f26e3 | Tue Apr 03 18:00:07 +0000 2012 | 180 |
| 2 | 2277773 | 4a85b1b3f964a520eefe1fe3 | Tue Apr 03 18:00:08 +0000 2012 | -240 |
| 3 | 208842 | 4b4606f2f964a520751426e3 | Tue Apr 03 18:00:08 +0000 2012 | -300 |
| 4 | 1139878 | 4d9254ef62ad5481fa6e6a4b | Tue Apr 03 18:00:08 +0000 2012 | -180 |
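The check-in timestamps are stored as strings such as `Tue Apr 03 18:00:06 +0000 2012`. How the conversion module parses them is not shown in this notebook; as a minimal sketch, pandas can parse them with an explicit format, and `utc_offset` can shift them to local time, assuming it holds the offset from UTC in minutes (the sample values 240, 180, -240, -300 and -180 fit that reading, but the notebook does not state the unit).

```python
import pandas as pd

# Two rows copied from the check-ins preview above
sample = pd.DataFrame({
    "user_id": [546830, 2277773],
    "datetime": ["Tue Apr 03 18:00:06 +0000 2012", "Tue Apr 03 18:00:08 +0000 2012"],
    "utc_offset": [240, -240],
})

# "%z" consumes the "+0000" part; utc=True returns timezone-aware UTC timestamps
sample["datetime_utc"] = pd.to_datetime(
    sample["datetime"], format="%a %b %d %H:%M:%S %z %Y", utc=True
)

# Assumption: utc_offset is the local offset from UTC in minutes (positive = east of UTC)
sample["datetime_local"] = sample["datetime_utc"].dt.tz_localize(None) + pd.to_timedelta(
    sample["utc_offset"], unit="m"
)

print(sample[["user_id", "datetime_utc", "datetime_local"]])
```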


Next, we list the municipalities whose trips will be studied.

```python
municipalities = '''Cerrillos
La Reina
Pudahuel
Cerro Navia
Las Condes
Quilicura
Conchalí
Lo Barnechea
Quinta Normal
El Bosque
Lo Espejo
Recoleta
Estación Central
Lo Prado
Renca
Huechuraba
Macul
San Miguel (Chile)
Independencia (Chile)
Maipú
San Joaquín (Chile)
La Cisterna
Ñuñoa
San Ramón (Chile)
La Florida
Pedro Aguirre Cerda
Santiago de Chile
La Pintana
Peñalolén
Vitacura
La Granja (Chile)
Providencia
Peñaflor (Chile)
San Bernardo (Chile)
Padre Hurtado
Puente Alto
'''.strip().split('\n')
```
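The log of the conversion step reports the POIs in the area bounding box first and then the POIs inside the municipalities, which suggests a two-stage spatial filter: a cheap bounding-box test followed by an exact point-in-polygon test. The following is a minimal sketch of that idea, not the module's implementation: it uses a plain ray-casting test instead of a spatial library, and the triangular `polygon` and the three POIs are hypothetical stand-ins for the municipality outlines retrieved from OpenStreetMap.

```python
import pandas as pd


def point_in_polygon(lon, lat, polygon):
    """Ray casting: the point is inside if a ray cast eastwards crosses the boundary an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical municipality outline as (lon, lat) vertices
polygon = [(-70.70, -33.50), (-70.55, -33.40), (-70.55, -33.50)]

# Hypothetical POIs
pois = pd.DataFrame({
    "venue_id": ["a", "b", "c"],
    "lat": [-33.48, -33.42, -33.30],
    "lon": [-70.60, -70.68, -70.60],
})

# Stage 1: bounding-box filter on the polygon's extent
lons, lats = zip(*polygon)
in_bbox = pois["lon"].between(min(lons), max(lons)) & pois["lat"].between(min(lats), max(lats))
candidates = pois[in_bbox]

# Stage 2: exact point-in-polygon test on the remaining POIs only
mask = [point_in_polygon(lon, lat, polygon) for lon, lat in zip(candidates["lon"], candidates["lat"])]
inside = candidates.loc[mask]

print(len(candidates), len(inside))
```

The bounding box discards most far-away points with two vectorized comparisons, so the slower per-point polygon test only runs on the candidates.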

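The `foursquare_to_unitrip` function from the `unitrip` package converts the Foursquare data to the Unitrip format and stores the result as a Parquet file.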

```python
import sys
import os.path

# Make the repository root (two levels above the working directory) importable
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(os.getcwd()))))

from unitrip.format_conversion.foursquare_converter import foursquare_to_unitrip

foursquare_to_unitrip(pois, check_ins, municipalities, '../data/unified_format/santiago_foursquare_unitrip2.parquet', 12)
```

```text
Retrieving municipalities data from Open Street Maps ...
There are 95670 POIs in the area bounding box
There are 92094 POIs in the municipalities
Gathering checkins that are inside the municipalities...
There are 1306426 Check-ins in the municipalities
Crossing check-ins data with POIs location data...
Calculating the h3 cell index per check-in...
Sorting the table per user and datetime...
Building origin-destination pairs with each user's movements ...
There are 1265338 trips
Building trips with the Unitrip format ...
Dataframe ready to store
File stored at ../data/unified_format/santiago_foursquare_unitrip2.parquet
```
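The log lines describe how trips are formed: check-ins are joined with POI locations, sorted per user and datetime, and each check-in is paired with the same user's next one. Below is a minimal pandas sketch of those steps, not the module's code; it assumes a hypothetical precomputed `h3_cell` column per venue (computing real H3 indices would require the `h3` package).

```python
import pandas as pd

# Toy inputs; "h3_cell" holds hypothetical cell labels per venue
pois = pd.DataFrame({"venue_id": ["v1", "v2", "v3"], "h3_cell": ["cellA", "cellB", "cellC"]})
check_ins = pd.DataFrame({
    "user_id": [1, 2, 1, 1, 2],
    "venue_id": ["v1", "v3", "v2", "v3", "v1"],
    "datetime": pd.to_datetime([
        "2012-04-03 08:00", "2012-04-03 09:00", "2012-04-03 12:00",
        "2012-04-03 18:00", "2012-04-03 20:00",
    ]),
})

# Cross check-ins with POI locations (inner join drops check-ins at unknown venues)
merged = check_ins.merge(pois, on="venue_id", how="inner")

# Sort per user and time so consecutive rows are consecutive check-ins of one user
merged = merged.sort_values(["user_id", "datetime"]).reset_index(drop=True)

# The next check-in of the same user becomes the destination of the current one
grouped = merged.groupby("user_id")
merged["d_h3_cell"] = grouped["h3_cell"].shift(-1)
merged["d_datetime"] = grouped["datetime"].shift(-1)

# A user's last check-in has no successor, so it starts no trip
trips = (
    merged.dropna(subset=["d_h3_cell"])
    .rename(columns={"h3_cell": "o_h3_cell", "datetime": "o_datetime"})
    [["user_id", "o_h3_cell", "d_h3_cell", "o_datetime", "d_datetime"]]
    .reset_index(drop=True)
)
print(trips)
```

Under this scheme the number of trips equals the number of joined check-ins minus the number of users, since every user's final check-in is only ever a destination.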


## Let's aggregate trips and create flows in the Unitrip format (uniflows)

Load the Unitrip Parquet file into a DataFrame, keeping only the user and the origin/destination H3 cell columns.

```python
santiago_unitrip = pd.read_parquet('../data/unified_format/santiago_foursquare_unitrip2.parquet', columns=['user_id', 'o_h3_cell', 'd_h3_cell'])
santiago_unitrip.head()
```

| trip_id | user_id | o_h3_cell | d_h3_cell |
|---:|---:|---|---|
| 0 | 230.0 | 8cb2c556db34dff | 8cb2c50939a2dff |
| 1 | 230.0 | 8cb2c50939a2dff | 8cb2c5543754bff |
| 2 | 230.0 | 8cb2c5543754bff | 8cb2c55452c45ff |
| 3 | 230.0 | 8cb2c55452c45ff | 8cb2c556aa861ff |
| 4 | 230.0 | 8cb2c556aa861ff | 8cb2c5198793dff |
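The `o_h3_cell` and `d_h3_cell` values are H3 cell indices written as hexadecimal strings. An H3 index stores its mode in bits 59 to 62 and its resolution in a 4-bit field starting at bit 52, so both can be read with plain integer operations. The sketch below shows that the cells in the preview are resolution-12 cells, which is consistent with (but does not prove) the last argument `12` of `foursquare_to_unitrip` being the H3 resolution. In practice, h3-py (v4) provides `h3.get_resolution` for this.

```python
def h3_resolution(cell: str) -> int:
    """Read the 4-bit resolution field (bits 52-55) of an H3 index given as a hex string."""
    return (int(cell, 16) >> 52) & 0xF


def h3_mode(cell: str) -> int:
    """Read the 4-bit mode field (bits 59-62); mode 1 means the index is a cell."""
    return (int(cell, 16) >> 59) & 0xF


# Cells copied from the preview above
for cell in ["8cb2c556db34dff", "8cb2c50939a2dff", "8cb2c5198793dff"]:
    print(cell, h3_mode(cell), h3_resolution(cell))
```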


Trips are aggregated into flows between H3 cells at a coarser resolution (`flow_res=7`), and flows are filtered by a minimum number of trips (`minimun_trips=8`).
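Moving trips to a coarser resolution means replacing each cell with its parent, as the first log line of the next cell (`Set the h3 cell columns to a parent resolution`) indicates. h3-py (v4) does this with `h3.cell_to_parent(cell, res)`. To show what happens at the bit level, here is a minimal sketch, not the `unitrip` implementation: the resolution field is overwritten and every digit finer than the parent resolution is set to 7, the H3 marker for an unused digit.

```python
MAX_RES = 15      # H3 resolutions run from 0 to 15
RES_OFFSET = 52   # the resolution field starts at bit 52
DIGIT_BITS = 3    # each resolution digit takes 3 bits


def h3_to_parent(cell: str, parent_res: int) -> str:
    """Coarsen an H3 cell (hex string) to parent_res by editing its bit fields."""
    h = int(cell, 16)
    res = (h >> RES_OFFSET) & 0xF
    if parent_res > res:
        raise ValueError("parent resolution must not exceed the cell's resolution")
    # Overwrite the 4-bit resolution field
    h = (h & ~(0xF << RES_OFFSET)) | (parent_res << RES_OFFSET)
    # Mark every digit finer than parent_res as unused (0b111)
    for r in range(parent_res + 1, MAX_RES + 1):
        h |= 0b111 << ((MAX_RES - r) * DIGIT_BITS)
    return format(h, "x")


print(h3_to_parent("8cb2c556db34dff", 7))
```

Since a parent keeps all the digits above its own resolution, every resolution-12 cell that falls within the same resolution-7 cell maps to the same string, which is what makes the later group-by on origin and destination produce flows.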

```python
from unitrip.unified_format.trip_to_flow import unitrip_to_uniflow

unitrip_to_uniflow(santiago_unitrip, '../data/unified_format/santiago_foursquare_uniflow.parquet', flow_res=7, minimun_trips=8)
```

```text
Set the h3 cell columns to a parent resolution ...
Trips from one cell to the same cell are deleted, and the same OD trips made by a user are grouped together. ...
There are 550107 unique trips
Aggregating trips of the same OD and with a minimun number of trips ...
There are 8609 flows
File stored at ../data/unified_format/santiago_foursquare_uniflow.parquet
```
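The log describes three aggregation steps: same-cell trips are dropped, repeated OD trips by the same user are grouped, and OD pairs are kept only if they have enough trips. Below is a minimal pandas sketch on toy data. It assumes that "grouped" means each user counts once per OD pair and that the threshold is inclusive (`>=`); neither detail is confirmed by the notebook. The labels `A`, `B`, `C` are hypothetical stand-ins for resolution-7 H3 cells, and `min_trips` is a hypothetical threshold (the notebook uses 8).

```python
import pandas as pd

# Toy trips whose cells are already coarsened to the flow resolution
trips = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 3, 3],
    "o_h3_cell": ["A", "A", "B", "A", "A", "C"],
    "d_h3_cell": ["B", "B", "B", "B", "B", "A"],
})
min_trips = 2

# 1. Drop trips that start and end in the same cell
trips = trips[trips["o_h3_cell"] != trips["d_h3_cell"]]

# 2. Count a repeated OD pair only once per user (assumed meaning of "grouped")
unique_trips = trips.drop_duplicates(subset=["user_id", "o_h3_cell", "d_h3_cell"])

# 3. Count trips per OD pair and keep flows that reach the threshold
flows = (
    unique_trips.groupby(["o_h3_cell", "d_h3_cell"])
    .size()
    .reset_index(name="count")
)
flows = flows[flows["count"] >= min_trips].reset_index(drop=True)
print(flows)
```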
