## Instance: GPStoActionspace()
### Input
Takes as input the output of `GPSAnalytics().get_metrics()` -> act (dataframe)

### Introduction
This notebook applies centrography to the MOBIS data at state 2, focusing on activity-level aggregation.

The objective is to generate key metrics characterizing the activity space for a more in-depth exploration of spatial familiarity.

Spatial familiarity metrics combine location history, daily activity-space variability, and spatial innovation. Computing them requires transforming the data with point-pattern centrography. Given a dataset of labeled locations (purpose and visit counts) over a specific time frame, marked point pattern analysis (PPA) makes it possible to study individual action spaces (Baddeley, Rubak, and Turner 2015).

The centrography implementation, built on the Python Spatial Analysis Library (PySAL), extracts the following characteristics of the activity space:

- **Points**: Marked visited places with counts of visits, purpose labels (home, work, leisure, duties), unique location IDs, and intensity (average number of event points per unit of the convex hull area).
- **Centers**: The mean center and weighted mean centers (weighted by the count of visits).
- **Distances**: Standard distance, offering a one-dimensional measure of how dispersed visited locations are around their mean center, and the sum of distances from home.
- **Shapes**: Standard deviational ellipse, providing a two-dimensional measure of the dispersion of visited locations, and the minimum convex hull of frequently visited places.

This approach predominantly relies on the Python library for spatial analysis, PySAL.
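
To make the center and distance measures listed above concrete, here is a minimal, dependency-free sketch. The function names, coordinates, and visit counts are illustrative assumptions; this is not the PySAL implementation nor the code used by `GPStoActionspace`.

```python
import math

def mean_center(points):
    """Unweighted mean center of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def weighted_mean_center(points, weights):
    """Mean center with each point weighted, e.g. by its visit count."""
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (cx, cy)

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Hypothetical visited places in projected coordinates (meters) with visit counts
places = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
visits = [7, 1, 1, 1]

print(mean_center(places))                   # (2.0, 1.5)
print(weighted_mean_center(places, visits))  # (0.8, 0.6)
print(standard_distance(places))             # 2.5
```

In this toy case the weighted mean center is pulled toward the heavily visited place at the origin, while the standard distance summarizes the spread of all four places in a single number.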

### Public methods
The public methods should be the following:
- `GPStoActionspace().compute_action_space(act, aggregation_method = 'user_id'/'user_id_day', plot_ellipses = False)` -> AS (dataframe). See Parts 0, 1 and 2 below.
- `GPStoActionspace().covariance_matrix(AS)`. See Part 3 below.
- `GPStoActionspace().plot_action_space(act, AS, user_subset = ['CH15029', 'CH16871'], how = 'vignette'/'folium', save = False)`. See Part 4 below.
- `GPStoActionspace().inno_rate(mtf_, AS_day, user_id_, phase=None, treatment=None)`. See Part 5 below.

### Methodology

$$
\text{(Eq. 3) } \quad
Regularity = \dfrac{n_f + 1}{n} \text{; with $n_f$ the number of frequently visited locations and $n$ the total number of locations} 
\\ \text{(Eq. 4) }  \quad
    Frequency=
    \begin{cases}
      \text{'most visited'}, & \text{for}\ i = \arg\max_j({f_j}) \\
      \text{'frequent visits'}, & \text{for}\ f_i > 0.5 \cdot \max_j({f_j}) \\
      \text{'occasional visits'}, & \text{for}\ f_i \leq 0.5 \cdot \max_j({f_j}) \\
      \text{'visited once'}, & \text{if}\ f_i = 1
    \end{cases} 
    \text{with $f_i$ the count of visits at location $i$}
\\ \text{(Eq. 5) }  \quad
    Proximity =
    \dfrac{SD_{freq}}{SD_{all}} \text{ is }
        \begin{cases}
          > 1 \text{ for dispersed habitual activity space and close innovative activity space} \\
          \approx 1 \text{ for homogeneous activity spaces} \\
          < 1 \text{ for dispersed innovative activity space and close habitual activity space }
        \end{cases} 
\\
\qquad  \text{Given the coordinates $(x_i,y_i)$ of the $n$ locations:}
\\
\qquad  \text{(5.1) } \text{with}  \quad
SD_{freq} = \displaystyle \sqrt{\frac{\sum^n_{i=1}(x_i-x_{home})^2}{n} + \frac{\sum^n_{i=1}(y_i-y_{home})^2}{n}}
\text{  } \forall \text{nodes } n_i(x_i,y_i)\in C_f = \{\text{'frequently visited places', 'most visited place'}\}
\\ \qquad \text{(5.2) } \text{and}  \quad
SD_{all} = \displaystyle \sqrt{\frac{\sum^n_{i=1}(x_i-x_{home})^2}{n} + \frac{\sum^n_{i=1}(y_i-y_{home})^2}{n}}
\text{  } \forall \text{nodes } n_i(x_i,y_i) \in C =\{\text{'all visited places'}\}
\\ \text{(Eq. 6) }  \quad
    \textit{Home shift} = 
    \sqrt {\left( {x_{home} - x_{wmc} } \right)^2 + \left( {y_{home} - y_{wmc} } \right)^2 }
$$
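
The sketch below works through Eq. 3 to Eq. 6 on a toy example. It rests on several assumptions that the equations leave open: where the cases of Eq. 4 overlap, 'most visited' takes precedence over 'visited once', which takes precedence over the 0.5 threshold; $n_f$ counts the 'frequent visits' locations and the $+1$ stands for the most visited place; and the coordinates, counts, and function names are hypothetical. It is not the code used by `compute_action_space`.

```python
import math

def frequency_labels(counts):
    """Label each location by its visit count (Eq. 4), with assumed precedence."""
    f_max = max(counts)
    top = counts.index(f_max)  # first location reaching the maximum count
    labels = []
    for i, f in enumerate(counts):
        if i == top:
            labels.append('most visited')
        elif f == 1:
            labels.append('visited once')
        elif f > 0.5 * f_max:
            labels.append('frequent visits')
        else:
            labels.append('occasional visits')
    return labels

def regularity(labels):
    """Eq. 3, assuming n_f counts 'frequent visits' and +1 is the most visited place."""
    n_f = labels.count('frequent visits')
    return (n_f + 1) / len(labels)

def sd_from_home(points, home):
    """Eq. 5.1 / 5.2: root mean squared distance of the points from home."""
    hx, hy = home
    return math.sqrt(sum((x - hx) ** 2 + (y - hy) ** 2 for x, y in points) / len(points))

def proximity(points, labels, home):
    """Eq. 5: SD of frequent and most visited places over SD of all places."""
    frequent = [p for p, lab in zip(points, labels)
                if lab in ('frequent visits', 'most visited')]
    return sd_from_home(frequent, home) / sd_from_home(points, home)

def home_shift(home, wmc):
    """Eq. 6: Euclidean distance between home and the weighted mean center."""
    return math.hypot(home[0] - wmc[0], home[1] - wmc[1])

# Hypothetical user: four locations (meters), their visit counts, home at the origin
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 10.0)]
counts = [10, 6, 2, 1]
home = (0.0, 0.0)

labels = frequency_labels(counts)
print(labels)
print(regularity(labels))                         # 0.5
print(round(proximity(points, labels, home), 3))  # 0.471
print(home_shift(home, (3.0, 4.0)))               # 5.0
```

Here the proximity is below 1: the habitual places lie close to home, while the rarely visited ones are more dispersed.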

In [None]:
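import geopandas as gpd
import numpy as np
import pandas as pd
import seaborn as sns

from xyt import GPSAnalytics, GPStoActionspace, GPStoGraph

# Project helpers; expected to provide parse_time_geo_data and find_next_activity_id
from functions_preprocessing import *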

%reload_ext autoreload
%autoreload 2

action_space = GPStoActionspace()

### Part 0
Preprocess the input so that the rest of the functions work.
It may be better to add this step directly to the `GPSAnalytics().get_metrics` method.

In [None]:
act = pd.read_pickle('sample_data/staypoint_sample_panel.pkl').reset_index()
act.rename(columns={'IDNO':'user_id', 'id':'activity_id'}, inplace=True)
# Extract longitude and latitude into separate columns
act['lon'] = act['geometry'].apply(lambda point: point.x)
act['lat'] = act['geometry'].apply(lambda point: point.y)
#Parse the activity df to datetime and geopandas
act = parse_time_geo_data(act, geo_columns=['lon','lat'], datetime_format='%Y-%m-%d %H:%M:%S', CRS2='EPSG:2056')
del act['geometry']

leg = pd.read_pickle('sample_data/leg_sample_panel.pkl').reset_index()
leg.rename(columns={'id':'leg_id', 'IDNO':'user_id'}, inplace=True)
leg['started_at'] = pd.to_datetime(leg['started_at'])
leg['finished_at'] = pd.to_datetime(leg['finished_at'])

# Add the leg destination activity_id
leg = find_next_activity_id(leg, act)

# Add a 'length' column in meters (legs are assumed to be stored in WGS 84, EPSG:4326)
leg = gpd.GeoDataFrame(leg, geometry='geometry', crs='EPSG:4326')
leg['length'] = leg.to_crs(crs='EPSG:2056').length

# Add a 'duration' column in minutes
leg['duration'] = (leg['finished_at'] - leg['started_at']).dt.total_seconds() / 60

metrics = GPSAnalytics()

staypoint1 = metrics.split_overnight(act)
staypoint2 = metrics.spatial_clustering(staypoint1)
act_orig = act.copy()
act = metrics.get_metrics(staypoint2, leg)

act['cluster_size'] = act['cluster_size'].astype(int)  # make sure cluster sizes are integers

In [None]:
# Get each user's MAIN home location ID first: in the data above, one user_id
# may have a different home location on different days.

# Most recurrent 'home_location_id' for each user and day
daily_home = act.groupby(['user_id', act['started_at'].dt.date])['home_location_id'].agg(
    lambda x: x.value_counts().idxmax()
)

# Most recurrent daily home per user, i.e. the main home
main_home = daily_home.groupby(level='user_id').agg(lambda x: x.value_counts().idxmax())

# Map the main home back onto every activity of that user
act['main_home_location_id'] = act['user_id'].map(main_home)
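
As an illustration of the intent stated in the cell above (one main home per user, even when the detected home differs from day to day), here is a small dependency-free sketch. The records and the `most_common` helper are hypothetical; the two-step vote (per day, then per user) is one possible reading of "most recurrent".

```python
from collections import Counter

def most_common(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical (user_id, date, home_location_id) records, one per activity
records = [
    ('u1', '2023-06-05', 'A'), ('u1', '2023-06-05', 'A'), ('u1', '2023-06-05', 'B'),
    ('u1', '2023-06-06', 'A'),
    ('u1', '2023-06-07', 'B'),
    ('u2', '2023-06-05', 'C'),
]

# Step 1: most recurrent home per user and day
by_day = {}
for user, day, home in records:
    by_day.setdefault((user, day), []).append(home)
daily_home = {key: most_common(homes) for key, homes in by_day.items()}

# Step 2: most recurrent daily home per user, i.e. the main home
by_user = {}
for (user, _), home in daily_home.items():
    by_user.setdefault(user, []).append(home)
main_home = {user: most_common(homes) for user, homes in by_user.items()}

print(main_home)  # {'u1': 'A', 'u2': 'C'}
```

User `u1` has home `B` on one day but `A` on the other two, so `A` is kept as the main home.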

In [None]:
act.head()

### Part 1
Compute the action space metrics per `user_id` (one user, all days) or per `user_id_day` (one user, one day)
- `GPStoActionspace().compute_action_space(act, aggregation_method = 'user_id'/'user_id_day', plot_ellipses = False)`

In [None]:
# Use 'user_id' (all days per user) or 'user_id_day' (one day per user)
aggregation_method = 'user_id'
#aggregation_method = 'user_id_day'
act_spc = action_space.compute_action_space(act, aggregation_method=aggregation_method)
mymap = action_space.plot_ellipses(act_spc, aggregation_method=aggregation_method)


In [None]:
mymap

In [None]:
act_spc.head()

### Part 2
Plot the ellipses with one color per user and the user_id shown on hover.

### Part 3
Covariance matrix of the action space metrics computed above
- `GPStoActionspace().covariance_matrix(AS)`

In [None]:
action_space.covariance_matrix(action_space=act_spc)

### Part 4
More plots, including the points from `act` and the ellipses from `AS`
- `GPStoActionspace().plot_action_space(act, AS, user_subset = ['CH15029', 'CH16871'], how = 'vignette'/'folium', save = False)`

In [None]:
act_spc.head()

In [None]:
#action_space.plot_action_space(act, act_spc, user="CH16871_20230605", how="vignette", save=False)
action_space.plot_action_space(act, act_spc, user="CH16871", how="vignette", save=False)


In [None]:
#action_space.plot_action_space(act, act_spc, user="CH16871_20230605", how="folium", save=False)
action_space.plot_action_space(act, act_spc, user="CH16871", how="folium", save=False)

### Part 5
**Method**
- `GPStoActionspace().inno_rate(mtf_, AS_day, user_id_, phase=None, treatment=None)`

**Objective**

The objective here is to plot the innovation rate.

**The data required by this method are:**
- `GPStoGraph().get_graphs()` -> mtf_
- `GPStoActionspace().compute_action_space(act, aggregation_method = 'user_id_day')` -> AS_day

**phase and treatment**

GPS data often come with a treatment column, e.g. {control, treat_1, treat_2}, or a phase column, e.g. {before, after}. That is not the case here, so no hue is needed on the plots. Please keep the option of plotting a hue per treatment or phase nonetheless.

In [None]:
act_spc.columns

In [None]:
leg = pd.read_pickle('sample_data/leg_sample_panel.pkl').reset_index()
leg.rename(columns={'id':'leg_id', 'IDNO':'user_id'}, inplace=True)
leg['started_at'] = pd.to_datetime(leg['started_at'])
leg['finished_at'] = pd.to_datetime(leg['finished_at'])

# Add the leg destination activity_id
leg = find_next_activity_id(leg, act)

# Add a 'length' column in meters (legs are assumed to be stored in WGS 84, EPSG:4326)
leg = gpd.GeoDataFrame(leg, geometry='geometry', crs='EPSG:4326')
leg['length'] = leg.to_crs(crs='EPSG:2056').length

# Add a 'duration' column in minutes
leg['duration'] = (leg['finished_at'] - leg['started_at']).dt.total_seconds() / 60


metrics = GPSAnalytics()

staypoint1 = metrics.split_overnight(act_orig)
staypoint2 = metrics.spatial_clustering(staypoint1)
extended_staypoint = metrics.get_metrics(staypoint2, leg)
day_staypoint = metrics.get_daily_metrics(extended_staypoint)

graphs = GPStoGraph()
multiday_graph = graphs.get_graphs(extended_staypoint)

action_space.get_inno_rate_per_phase(act_spc, multiday_graph)


The script below was written for data with phases and treatments. The sample data has neither, but I still want to be able to plot the innovation rate. Sorry, it is messy. If you cannot adapt it, I will do it myself later on.

In [None]:
to_plot = 800
treatment_ = 'Pricing'  # one of: Pricing, Nudging, Control

# NOTE: user_id_clstr1..3 (user IDs per modal cluster), mtf_treatment and the bare
# get_inno_rate_per_phase function are not defined in this notebook; they belong
# to the original phase/treatment analysis.
clusters = {
    'exclusive': user_id_clstr1[:to_plot],  # exclusive car users
    'moderate': user_id_clstr2[:to_plot],   # moderate car users
    'mixed': user_id_clstr3[:to_plot],      # mixed car users
}

# Mean innovation rate per cluster and phase, gathered in a single df
mean_innov_rate = pd.DataFrame(
    index=range(500),
    columns=[f'{name}_phase{phase_}' for phase_ in [1, 2] for name in clusters],
    dtype=float,
)

for phase_ in [1, 2]:
    for name, user_ids in clusters.items():
        # One column per user; rebuilt for each phase so that values from the
        # previous phase do not leak into the mean
        df_innov_rate = pd.DataFrame(index=range(500), columns=user_ids, dtype=float)
        for user_id_ in user_ids:
            try:
                y = get_inno_rate_per_phase(mtf_treatment, user_id_, phase=phase_, treatment=treatment_)
                df_innov_rate.loc[range(len(y)), user_id_] = y
            except Exception:
                continue  # skip users without usable data for this phase/treatment
        mean_innov_rate[f'{name}_phase{phase_}'] = df_innov_rate.mean(axis=1)

mean_innov_rate.head(3)
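
The averaging step in the cell above relies on `DataFrame.mean(axis=1)` skipping missing values, so users with shorter innovation-rate curves simply stop contributing once their curve ends. The dependency-free sketch below mirrors that behavior on made-up curves; `mean_curve` and the sample values are illustrative assumptions, not part of the package.

```python
def mean_curve(series_by_user):
    """Average several per-user curves of unequal length, index by index.

    At each index only the users whose curve reaches that index contribute,
    mirroring a row-wise mean that skips missing values.
    """
    longest = max(len(s) for s in series_by_user.values())
    means = []
    for t in range(longest):
        values = [s[t] for s in series_by_user.values() if t < len(s)]
        means.append(sum(values) / len(values))
    return means

# Hypothetical innovation-rate curves for three users
curves = {
    'u1': [1.0, 0.5, 0.25],
    'u2': [1.0, 0.75],
    'u3': [1.0],
}
print(mean_curve(curves))  # [1.0, 0.625, 0.25]
```

Note that the tail of the mean curve is based on fewer users, which is worth keeping in mind when reading the right-hand side of the plot.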

In [None]:
# Column names must match those of mean_innov_rate ('<cluster>_phase<n>')
plot_innov_rate_cluster1 = sns.lineplot(data=mean_innov_rate[['exclusive_phase1', 'moderate_phase1', 'mixed_phase1']].astype(float)[:25]) #, legend=False
#plot_innov_rate_cluster1.get_figure().savefig("innov_rate_modal_clus__phase1%s.png"%treatment_, dpi=300)