## Instance: GPSAnalytics()
This notebook walks through the steps needed to compute descriptive statistics on the staypoint dataframe. The leg dataframe is also required, for example to compute distances.

The objective for the library user is to obtain two dataframes:
- the df at the end of Part 1, obtained through the following pipeline:
    - `check_inputs(leg, staypoint)` (to be done): a small function that checks whether the input data have the required columns and, if not, asks the user to adapt the input data
    - `split_overnight()`
    - `spatial_clustering()`
    - `get_metrics()`
    
- the df at the end of Part 2, obtained with:
    - `get_daily_metrics()`

In [None]:
#from gps_analytics import *
from xyt import GPSAnalytics
import geopandas as gpd  # used below to build the leg GeoDataFrame
import pandas as pd
from functions_preprocessing import *

%reload_ext autoreload
%autoreload 2

Load staypoints

In [None]:
%%time
# READ FILES
act = pd.read_pickle('sample_data/staypoint_sample_panel.pkl').reset_index()
act.rename(columns={'IDNO':'user_id', 'id':'activity_id'}, inplace=True)
del act['type']

In [None]:
act.head(2)

In [None]:
%%time
# Extract longitude and latitude into separate columns
act['lon'] = act['geometry'].apply(lambda point: point.x)
act['lat'] = act['geometry'].apply(lambda point: point.y)
# Parse the activity df to datetime and geopandas formats
act = parse_time_geo_data(act, geo_columns=['lon','lat'], datetime_format='%Y-%m-%d %H:%M:%S', CRS2='EPSG:2056')
del act['geometry']

In [None]:
act.head(3)

Load legs

In [None]:
%%time
leg = pd.read_pickle('sample_data/leg_sample_panel.pkl').reset_index()
leg.rename(columns={'id':'leg_id', 'IDNO':'user_id'}, inplace=True)
leg['started_at'] = pd.to_datetime(leg['started_at'])
leg['finished_at'] = pd.to_datetime(leg['finished_at'])

# Add the leg destination activity_id
leg = find_next_activity_id(leg, act)

# Add a 'length' column in meters
leg = gpd.GeoDataFrame(leg, geometry='geometry', crs='EPSG:4326')
leg['length'] = leg.to_crs(crs='EPSG:2056').length

# Add a 'duration' column in minutes (total seconds divided by 60)
leg['duration'] = (leg['finished_at'] - leg['started_at']).dt.total_seconds() / 60

leg.head(2)


## Part 1

**Data format**

To run Part 1, you need a staypoint df and a leg df with at least the following columns:
```python
staypoint.columns = ['activity_id', 'started_at', 'finished_at',
       'purpose', 'user_id', 'lon', 'lat']
```
```python
leg.columns = ['leg_id', 'started_at', 'finished_at',
       'detected_mode', 'mode', 'user_id', 'geometry', 'next_activity_id',
       'length', 'duration']
```
Pay attention to the format of the columns, in particular those holding datetimes or geometries.
Also, having activities with `purpose == 'home'` will help complete the calculations.
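
The column requirements above could be verified before running the pipeline. The following is a minimal sketch of what the planned `check_inputs(leg, staypoint)` helper might look like. It is not part of the `xyt` API shown in this notebook, and the constant names `REQUIRED_STAYPOINT_COLUMNS` and `REQUIRED_LEG_COLUMNS` are assumptions made for illustration. It relies only on the `.columns` attribute, so it should accept a pandas or geopandas dataframe.

```python
# Hypothetical helper: a sketch, not the library's implementation.
REQUIRED_STAYPOINT_COLUMNS = {
    "activity_id", "started_at", "finished_at",
    "purpose", "user_id", "lon", "lat",
}
REQUIRED_LEG_COLUMNS = {
    "leg_id", "started_at", "finished_at", "detected_mode", "mode",
    "user_id", "geometry", "next_activity_id", "length", "duration",
}


def check_inputs(leg, staypoint):
    """Raise a ValueError listing every required column that is missing."""
    problems = []
    for name, df, required in (
        ("staypoint", staypoint, REQUIRED_STAYPOINT_COLUMNS),
        ("leg", leg, REQUIRED_LEG_COLUMNS),
    ):
        # Set difference gives the required columns absent from the input
        missing = sorted(required - set(df.columns))
        if missing:
            problems.append(f"{name} is missing columns: {missing}")
    if problems:
        raise ValueError("; ".join(problems))
```

Calling such a check before `split_overnight()` would fail early with a readable message instead of a `KeyError` deep inside the pipeline. It does not validate datetime or geometry formats.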

**XYT instance implementation**

The output of Part 1 is an extended staypoint df with extra columns:
```python
extended_staypoint = GPSAnalytics().get_metrics(staypoint, leg)
extended_staypoint.columns = ['leg_id', 'started_at', 'finished_at',
       'detected_mode', 'mode', 'user_id', 'geometry', 'next_activity_id',
       'length', 'duration', 'cluster', 'cluster_size', 'cluster_info', 'location_id',
       'peak', 'first_dep', 'last_arr', 'home_loop', 'daily_trip_dist',
       'num_trip', 'max_dist', 'min_dist', 'max_dist_from_home',
       'dist_from_home', 'home_location_id', 'weekday']

```

In [None]:
metrics = GPSAnalytics()

- `GPSAnalytics().split_overnight()`

In [None]:
%%time
# Split each overnight activity into the last activity of one day and the first activity of the next
staypoint1 = metrics.split_overnight(act)
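
To illustrate the idea behind this step, here is a minimal sketch that splits one activity spanning midnight into a last activity of the day and a first activity of the next day. This is an assumption about what `split_overnight()` does, based only on the comment in the cell above. The function name `split_at_midnight` is hypothetical, and the library's actual handling (for example of activities spanning several nights) may differ.

```python
from datetime import datetime, time, timedelta


def split_at_midnight(started_at, finished_at):
    """Split one (start, end) interval at every midnight it crosses."""
    pieces = []
    current = started_at
    while current.date() < finished_at.date():
        # Midnight that opens the day after `current`
        next_midnight = datetime.combine(
            current.date() + timedelta(days=1), time.min
        )
        pieces.append((current, next_midnight))
        current = next_midnight
    # Remaining part on the final day (skipped if it would be empty)
    if current < finished_at:
        pieces.append((current, finished_at))
    return pieces


# A stay from 22:00 until 07:30 the next morning (made-up timestamps)
parts = split_at_midnight(datetime(2023, 5, 1, 22, 0), datetime(2023, 5, 2, 7, 30))
for start, end in parts:
    print(start, "->", end)
```

In this sketch the first piece ends exactly at midnight and the second starts there, so the two pieces together cover the original interval without overlap.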

- `GPSAnalytics().spatial_clustering()`

In [None]:
%%time
staypoint2 = metrics.spatial_clustering(staypoint1)

- `GPSAnalytics().get_metrics()`

In [None]:
%%time
extended_staypoint = metrics.get_metrics(staypoint2, leg)

In [None]:
extended_staypoint.head()

## Part 2

- `GPSAnalytics().get_daily_metrics()`

Aggregate per day

In [None]:
%%time
metrics.get_daily_metrics(extended_staypoint)
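
As a rough illustration of what aggregating per day means, the sketch below groups a few trip records by user and calendar day, then computes a trip count and a total distance. It is not the `get_daily_metrics()` implementation. The record layout and the function name `aggregate_per_day` are assumptions, and only the output names `num_trip` and `daily_trip_dist` are borrowed from the column list in Part 1.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_per_day(trips):
    """Sum trip count and length per (user_id, calendar day)."""
    daily = defaultdict(lambda: {"num_trip": 0, "daily_trip_dist": 0.0})
    for trip in trips:
        # Group key: the user and the calendar day the trip started on
        key = (trip["user_id"], trip["started_at"].date())
        daily[key]["num_trip"] += 1
        daily[key]["daily_trip_dist"] += trip["length"]
    return dict(daily)


# Made-up records for illustration; lengths in meters
trips = [
    {"user_id": "u1", "started_at": datetime(2023, 5, 1, 8, 0), "length": 1200.0},
    {"user_id": "u1", "started_at": datetime(2023, 5, 1, 17, 30), "length": 1300.0},
    {"user_id": "u1", "started_at": datetime(2023, 5, 2, 9, 0), "length": 500.0},
]
daily = aggregate_per_day(trips)
```

With a dataframe, the same grouping would typically be written as a `groupby` on the user and the date of `started_at`.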