# Accessibility analysis step-by-step

## 0. Dataset choice

This notebook is set up so that you can run the analysis on a dataset of your choice. We have already run it on three regions:
1. Paris proper,
2. Petite Couronne (Paris, Hauts-de-Seine, Seine-Saint-Denis, Val-de-Marne),
3. Petite Couronne without Paris.
    

The data you will need:

1. a geodataframe (geopackage) containing the spatial and statistical data of the area,
2. a .pbf file with OSM data of the area,
3. a zipped GTFS feed with transit data of the area.

To customise the analysis, assign the names of your files to the variables below:

In [None]:
gdf_name = "pcparis.gpkg" 
pbf_name = "pcparis.pbf"
gtfs_name = "IDFM.gtfs.zip"
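
Before running the rest of the notebook, it can help to confirm that all three input files are present. The sketch below is a suggestion rather than part of the original workflow: the helper name `missing_inputs` is hypothetical, and the `../data` directory is assumed from the `%cd ../data` cell used later in the notebook.

```python
from pathlib import Path


def missing_inputs(data_dir, *names):
    """Return the file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in names if not (base / name).is_file()]


# Same values as in the cell above; "../data" is an assumed location.
gdf_name = "pcparis.gpkg"
pbf_name = "pcparis.pbf"
gtfs_name = "IDFM.gtfs.zip"

missing = missing_inputs("../data", gdf_name, pbf_name, gtfs_name)
if missing:
    print("Missing input files:", ", ".join(missing))
```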

To rerun the analysis on the files that we used, keep gtfs_name as above and set the gdf/pbf names accordingly:
1. paris.gpkg / paris.pbf for Paris
2. pcparis.gpkg / pcparis.pbf for Petite Couronne
3. pc.gpkg / pc.pbf for Petite Couronne without Paris
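
The mapping from region to file names can also be written down once and reused. This is a minimal sketch, assuming the geopackages use the `.gpkg` extension as in the default cell; `REGION_STEMS` and `input_names` are hypothetical names introduced only for illustration.

```python
# File-name stems for the three regions analysed so far.
REGION_STEMS = {
    "Paris": "paris",
    "Petite Couronne": "pcparis",
    "Petite Couronne without Paris": "pc",
}


def input_names(region, gtfs_name="IDFM.gtfs.zip"):
    """Return (gdf_name, pbf_name, gtfs_name) for one of the known regions."""
    stem = REGION_STEMS[region]
    # Assumption: the geodataframe is stored as <stem>.gpkg next to <stem>.pbf.
    return f"{stem}.gpkg", f"{stem}.pbf", gtfs_name


gdf_name, pbf_name, gtfs_name = input_names("Petite Couronne")
```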

## 1. Preparation

In [None]:
import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)))
os.environ['USE_PYGEOS'] = '0'
import pysal.lib
import helpers as hs
from importlib import reload
import folium
import pandas as pd
import geopandas as gpd
import r5py
import shapely
import time
import datetime

1. Read the geopackage containing the geographical data and the amenity statistics (the GeneralExtraction notebook shows how to create it). Mind the working directory: the cell below first changes into ../data.

In [None]:
%cd ../data
gdf = gpd.read_file(gdf_name, layer="cool")

2. Inspect the geodataframe. r5py requires a column named 'id', so we rename the corresponding id column:

In [None]:
gdf.rename(columns={'IdINSPIRE':'id'}, inplace = True)
gdf.iloc[14,:]
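
Since the renamed column later identifies origins and destinations, it may be worth checking that it exists and is unique. The following is only a sketch: `rename_id_column` is a hypothetical helper, and a plain pandas DataFrame stands in for the geodataframe (the same `rename` call is available on a GeoDataFrame).

```python
import pandas as pd


def rename_id_column(df, source="IdINSPIRE", target="id"):
    """Rename the identifier column and check that its values are unique."""
    if source not in df.columns:
        raise KeyError(f"column {source!r} not found")
    out = df.rename(columns={source: target})
    if out[target].duplicated().any():
        raise ValueError(f"column {target!r} contains duplicate values")
    return out


# Toy stand-in for the geodataframe; the identifiers are made up.
toy = pd.DataFrame({"IdINSPIRE": ["sq_a", "sq_b"], "restaurant": [3, 0]})
toy = rename_id_column(toy)
```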

3. Example: number of restaurants in each square

In [None]:
hs.folium_color_map(gdf,"restaurant")

4. Additional variable of interest, the total number of housing units:

In [None]:
gdf['Log'] = gdf['Log_inc'] + gdf['Log_av45'] + gdf['Log_45_70'] + gdf['Log_70_90'] + gdf['Log_ap90']

## Travel times computation

5. Initialise r5py with the OSM extract and the GTFS file:

In [None]:
transport_network = r5py.TransportNetwork(pbf_name, [gtfs_name])

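Replace each square with its centroid and configure the travel time computation (transit plus walking, departing on 8 October 2024 at 11:59):

In [None]:
# Use the centroid of each square as its origin/destination point
gdf_centered = gdf.copy()
gdf_centered['geometry'] = gdf_centered['geometry'].centroid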

travel_time_matrix_computer = r5py.TravelTimeMatrixComputer(
    transport_network,
    origins=gdf_centered,
    destinations=gdf_centered,
    departure=datetime.datetime(2024,10,8,11,59),
    transport_modes=[
        r5py.TransportMode.TRANSIT,
        r5py.TransportMode.WALK,
    ],
)

6. Compute the travel time matrix. Warning: this can take up to 1 hour for Petite Couronne. You can skip this step and load the precomputed results in the next step instead.

In [None]:
travel_times = travel_time_matrix_computer.compute_travel_times()
travel_times.to_csv('travel_times_' + gdf_name[:-5] + '.csv')  

7. Run this step only if you skipped the previous one: it loads the previously saved travel times.

In [None]:
travel_times = pd.read_csv('travel_times_' + gdf_name[:-5] + '.csv')
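
Steps 6 and 7 can be merged into a single "load if present, otherwise compute" pattern. The sketch below is one possible way to do this and is not part of the original notebook; `cache_path` and `load_or_compute` are hypothetical helpers.

```python
import os


def cache_path(gdf_name, prefix="travel_times_"):
    """Build the CSV name from the geopackage name (pcparis.gpkg -> travel_times_pcparis.csv)."""
    stem, _extension = os.path.splitext(gdf_name)
    return f"{prefix}{stem}.csv"


def load_or_compute(path, compute, load, save):
    """Return load(path) if the file exists; otherwise compute, save and return the result."""
    if os.path.exists(path):
        return load(path)
    result = compute()
    save(result, path)
    return result
```

In this notebook, `compute` could be `travel_time_matrix_computer.compute_travel_times`, `load` could be `pd.read_csv`, and `save` could be `lambda df, path: df.to_csv(path)`.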

8. Change the format into a square matrix.

In [None]:
travel_times_square = hs.vert_to_square(travel_times)
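
The implementation of `hs.vert_to_square` is not shown here. As an illustration of what such a reshaping might look like, the sketch below pivots a long table with the `from_id`, `to_id` and `travel_time` columns produced by r5py into a square matrix; the toy identifiers and values are made up.

```python
import pandas as pd

# Long ("vertical") format: one row per origin-destination pair.
long_format = pd.DataFrame(
    {
        "from_id": ["a", "a", "b", "b"],
        "to_id": ["a", "b", "a", "b"],
        "travel_time": [0, 10, 12, 0],
    }
)

# Square format: origins as rows, destinations as columns.
square = long_format.pivot(index="from_id", columns="to_id", values="travel_time")
```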

9. Calculate the weights between squares.

In [None]:
weights_by_id = hs.transfer_time_to_weight_faster(travel_times_square)
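
The decay function used by `hs.transfer_time_to_weight_faster` is not shown in this notebook. Purely as an illustration of how travel times can be turned into weights, the sketch below uses a Gaussian decay with a cut-off; both the functional form and the 30-minute threshold are assumptions, not the helper's actual behaviour.

```python
import math


def gaussian_weight(minutes, threshold=30.0):
    """Weight of 1 at zero travel time, decaying smoothly, and 0 beyond the threshold."""
    # Unreachable pairs (missing travel time) get no weight.
    if minutes is None or math.isnan(minutes) or minutes > threshold:
        return 0.0
    sigma = threshold / 2  # assumed spread of the decay
    return math.exp(-0.5 * (minutes / sigma) ** 2)


example_weights = [gaussian_weight(t) for t in (0, 15, 30, 45, float("nan"))]
```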

10. Inspect the resulting dataframe:

In [None]:
weights_by_id.head()

11. Example: weights relative to the square containing Châtelet-Les Halles.

In [None]:
df_reset = weights_by_id.reset_index()

value = 'CRS3035RES200mN2890000E3760400'
losc = df_reset.where(df_reset == value).stack().index.tolist()
print(losc)

example = gdf[["geometry", "id"]].assign(weight_0=df_reset.iloc[losc[0][0], 1:].values)
example

In [None]:
hs.folium_color_map(example, "weight_0")

## 2SFCA

1. Calculate the 2SFCA score. Important: do not skip the first line! Both gdf and weights_by_id must be indexed by the same id column.

In [None]:
gdf = gdf.set_index('id')
interestVar = ['Log','Log_soc','restaurant',
       'culture and art', 'education', 'food_shops', 'fashion_beauty',
       'supply_shops']
accessibility_measures = hs.calculate_2SFCA_accessibility(gdf,interestVar,weights_by_id)

In [None]:
for var in interestVar:
    gdf[var+"_access"] = accessibility_measures[var]
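
The helper `hs.calculate_2SFCA_accessibility` is not shown in this notebook. As a rough illustration of the general two-step floating catchment area idea, the sketch below computes a supply-to-demand ratio for each supply location and then sums the ratios reachable from each square. Which column plays the role of demand, and how zero demand is handled, are assumptions of this sketch.

```python
def two_step_fca(supply, demand, weights):
    """supply[j] and demand[k] are per-square totals; weights[i][j] links square i to square j."""
    n = len(supply)
    # Step 1: for each supply location j, divide supply by the weighted demand that can reach it.
    ratios = []
    for j in range(n):
        reachable_demand = sum(weights[k][j] * demand[k] for k in range(n))
        ratios.append(supply[j] / reachable_demand if reachable_demand > 0 else 0.0)
    # Step 2: for each square i, sum the weighted ratios of the supply locations it can reach.
    return [sum(weights[i][j] * ratios[j] for j in range(n)) for i in range(n)]


# Toy example with two squares that can both reach each other (made-up numbers).
scores = two_step_fca(supply=[2, 0], demand=[10, 30], weights=[[1, 1], [1, 1]])
```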

2. Inspect the results.

In [None]:
gdf.head()

In [None]:
hs.folium_color_map(gdf, "Log_soc_access", cmap = 'Reds')

## Aggregation of accessibility

To obtain an aggregated 2SFCA score for each square, we attach a weight to each amenity. For an amenity $p$, we define its weight as
$$ w_p = \dfrac{N_p}{N} $$
where $N$ is the number of occurrences of all amenities and $N_p$ is the number of occurrences of amenity $p$. The idea is that the less frequent an amenity is, the more important it is, so each amenity enters the aggregate with the factor $1 - w_p$. A drawback of this approach is that it prioritises some amenities that are generally considered less important, such as museums.

The aggregated 2SFCA score is then:
$$ CS_{i} = \sum_{p=1}^{P} (1 - w_{p}) \times X_{i,p} $$
where $P$ is the number of amenities and $X_{i,p}$ denotes the min-max normalised accessibility of amenity $p$ in square $i$. Normalisation matters because some variables, such as housing, have higher raw accessibility scores simply because there are more housing units than schools.
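
As a worked example of the two formulas, the sketch below computes the weights and the aggregated score for a single square. All counts and normalised accessibilities are invented for illustration and are not taken from the datasets.

```python
# Hypothetical amenity counts over the whole study area (not real data).
counts = {"restaurant": 60, "education": 30, "culture and art": 10}
total = sum(counts.values())  # N

# w_p = N_p / N: the share of each amenity among all amenities.
weights = {p: n / total for p, n in counts.items()}

# Hypothetical min-max normalised accessibilities X_{i,p} for one square i.
x_i = {"restaurant": 0.5, "education": 1.0, "culture and art": 0.0}

# CS_i = sum over p of (1 - w_p) * X_{i,p}
cs_i = sum((1 - weights[p]) * x_i[p] for p in counts)
print(round(cs_i, 3))  # prints 0.9
```

Here the rare amenity ("culture and art") receives the factor 0.9, while the common one ("restaurant") receives only 0.4.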

We implement this approach below:

1. Calculate the weights:

In [None]:
interestVarAggr = ['restaurant',
       'culture and art', 'education', 'food_shops', 'fashion_beauty',
       'supply_shops']
for var in interestVarAggr:
    gdf["weight_" + var] = gdf[var].sum() / gdf[interestVarAggr].sum(axis=1).sum()

In [None]:
weight_table = 1 - gdf[[str("weight_" + var) for var in interestVarAggr]]
print(weight_table.iloc[[0]].to_latex())

2. Min-max normalise the accessibility measures:

In [None]:
interestVarAggrAccess = [i + '_access' for i in interestVarAggr]
for col in interestVarAggrAccess:
    gdf[f'{col}_normalized'] = (gdf[col] - gdf[col].min()) / (gdf[col].max() - gdf[col].min())
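
Min-max normalisation divides by the range of the column, which is zero when every square has the same accessibility. A guarded variant might look like the sketch below; the helper name `minmax` and the choice to return zeros for a constant column are assumptions.

```python
import pandas as pd


def minmax(series):
    """Scale a series to [0, 1]; a constant series is mapped to zeros."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


normalised = minmax(pd.Series([2, 4, 6]))
```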

3. Compute the aggregated score, with and without the amenity weights:

In [None]:
gdf['CS_aggregated'] = sum(gdf[f'{col}_access_normalized'] * (1 - gdf[f'weight_{col}']) for col in interestVarAggr)

In [None]:
gdf['CS_aggregated_without_weight'] = sum(gdf[f'{col}_access_normalized'] for col in interestVarAggr)

In [None]:
gdf.head()

## 2SFCA results

In [None]:
hs.folium_color_map(gdf,"CS_aggregated_without_weight")

In [None]:
hs.folium_color_map(gdf,"CS_aggregated")

Save the results:

In [None]:
gdf.to_file('results_' + gdf_name, layer="cool", driver="GPKG")