# Examining route attributes
Previously, I only considered how these attributes compare across route alternatives. Here, let's instead look at the distance travelled on high-stress facilities, since this gives some evidence of people's willingness to travel on them.

We look at distance rather than percent of route because a short trip with only a brief two-block stretch on a busy road is different from a trip with several miles on a stressful road that ends up being a smaller portion of the trip. Essentially, tolerance to stress from motor vehicles should in theory not vary with the distance of the trip. We're trying to find user profiles by looking at the composition of links that riders actually put themselves on.

However, it should be noted that distance is limited by how much of each facility type exists in the network. Also, because sidewalks exist, this approach may run into issues when someone is matched to a high-stress road but was actually riding on the sidewalk.
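
The distance-versus-share argument above can be illustrated with a minimal sketch. The two trips and all of their numbers are made up for illustration (they are not drawn from the dataset), and the column names `length_mi` and `high_stress_mi` are hypothetical.

```python
import pandas as pd

# Hypothetical trips: total length and miles ridden on high-stress links
toy_trips = pd.DataFrame(
    {
        "length_mi": [1.0, 15.0],
        "high_stress_mi": [0.25, 3.0],
    },
    index=["short_trip", "long_trip"],
)

# Share of each trip spent on high-stress links
toy_trips["high_stress_share"] = toy_trips["high_stress_mi"] / toy_trips["length_mi"]

# The two measures rank the trips in opposite order
most_by_share = toy_trips["high_stress_share"].idxmax()
most_by_distance = toy_trips["high_stress_mi"].idxmax()
print(toy_trips)
print("highest share:", most_by_share, "| longest distance:", most_by_distance)
```

In this toy case the short trip has the larger share of high-stress riding, while the long trip has far more high-stress miles, which is the exposure this section is interested in.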

In [None]:
from pathlib import Path
import time
import geopandas as gpd
import numpy as np
import pickle
import networkx as nx
from shapely.geometry import MultiLineString
import pandas as pd
import math
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

from bikewaysim.paths import config
from bikewaysim.impedance_calibration import summarize_route, stochastic_optimization, post_calibration
from bikewaysim.routing import rustworkx_routing_funcs

In [None]:
links, turns_df, length_dict, geo_dict, turn_G = rustworkx_routing_funcs.import_calibration_network(config)

# just make this a function
with (config['calibration_fp']/'ready_for_calibration.pkl').open('rb') as fh:
    ready_for_calibration = pickle.load(fh)
print(len(ready_for_calibration),'trips')

# get the best performing full model so far
best_model = 'bootstrap_final,validation,0'
with (config['calibration_fp']/f'loss/{best_model}.pkl').open('rb') as fh:
    best_model = pickle.load(fh)

# reduce ready_for_calibration to the trips present in the best model
ready_for_calibration = {key:item for key,item in ready_for_calibration.items() if key in best_model.keys()}
print(len(ready_for_calibration),'trips')

In [None]:
#new pickles
with (config['cycleatl_fp']/'trips_4.pkl').open('rb') as fh:
    trips = pickle.load(fh)
trips.reset_index(drop=True,inplace=True)
trips = trips[trips['tripid'].isin(ready_for_calibration.keys())]
with (config['cycleatl_fp']/'users_4.pkl').open('rb') as fh:
    users = pickle.load(fh)
users = users[users['userid'].isin(trips['userid'])]

# recalculate the number of matched trips per user
users['matched_trips'] = users['userid'].map(trips.groupby('userid').size())

# Dealing with cycletracks and multi-use path sidepaths.
The problem is that when two links are parallel and close together, the map matching algorithm may not route on the parallel bike infrastructure. This is especially true when the infrastructure doesn't have good network connectivity. In this case, we want to acknowledge that there was a bicycle facility that the rider could have been on.

This happens in two instances:
1. Cycletracks
2. Multi-use paths that are essentially wide sidewalks

Some trips may still be matched to these features, so I think it's important not only to acknowledge that the road had an adjacent cycletrack/multi-use path, but also to have the adjacent cycletrack/multi-use path take on the attributes of the adjacent road. That way, it won't matter how the trip was matched.
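
As a minimal sketch of this two-way attribute transfer, consider a toy network with one road and the cycletrack beside it. The link ids (`r1`, `p1`) and values are hypothetical, and the column names only loosely mirror the ones used in this notebook; this is an illustration of the idea, not the exact procedure applied to the real network.

```python
import pandas as pd

# Hypothetical network: "r1" is a road, "p1" is the cycletrack running beside it
toy_links = pd.DataFrame({
    "linkid": ["r1", "p1"],
    "link_type": ["road", "bike"],
    "facility_fwd": [None, "cycletrack"],
    "speed": ["(30,40]", None],
})

# Hypothetical lookup table: road "r1" has sidepath "p1"
toy_sidepaths = pd.DataFrame({
    "linkid": ["r1"],
    "sidepath_linkid": ["p1"],
    "sidepath": ["cycletrack"],
})

toy_links = toy_links.merge(toy_sidepaths, on="linkid", how="left")

# 1) The road takes on the facility type of its sidepath
needs_facility = toy_links["sidepath_linkid"].notna() & toy_links["facility_fwd"].isna()
toy_links.loc[needs_facility, "facility_fwd"] = toy_links.loc[needs_facility, "sidepath"]

# 2) The sidepath takes on the speed category of its road
road_speed = (
    toy_links.dropna(subset=["sidepath_linkid"])
    .set_index("sidepath_linkid")["speed"]
)
toy_links["speed"] = toy_links["speed"].fillna(toy_links["linkid"].map(road_speed))

print(toy_links[["linkid", "facility_fwd", "speed"]])
```

After both steps, the road and the cycletrack carry the same facility and speed values, so the summary for a trip is the same whichever of the two links it was matched to.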

For an LTS style analysis, we want to know 

In [None]:
sidepaths = gpd.read_file(config['bicycle_facilities_fp']/'sidepaths.gpkg',layer='sidepaths',ignore_geometry=True)

First, add cycletrack / multi-use path attributes to streets

In [None]:
links = pd.merge(links,sidepaths,on='linkid',how='left')

In [None]:
# identify the roads with sidepath variables but no bicycle facility variables
cond = links['sidepath_linkid'].notna() & links['facility_fwd'].isna()

# assign sidepath to the road if it doesn't already have a facility
links.loc[cond,'facility_fwd'] = links.loc[cond,'sidepath']

# assign sidepath year if there is one
links.loc[cond & links['sidepath_year'].notna(),'year'] = links['sidepath_year']

Next, add street attributes to cycletracks / multi-use paths

In [None]:
# get the street attributes that we care about and then drop duplicates
# assign the highest value for each (NOTE: these sidepaths probably need to be split up in OSM)
cols = ['AADT','speed','lanes']
to_add = links.loc[links['linkid'].isin(set(sidepaths['linkid'])),['sidepath_linkid']+cols].drop_duplicates()
# retrieve the highest value present
to_add = to_add.groupby('sidepath_linkid').max()
to_add['link_type'] = 'road'

In [None]:
links = links.merge(to_add,left_on='linkid',right_index=True,how='left',suffixes=(None,'_new'))
# replaces na values in the to_add column with the links data
for col in to_add.columns:
    links[col] = links[f'{col}_new'].fillna(links[col])
links.drop(columns=[x for x in links.columns if x.endswith('_new')],inplace=True)

# Getting route attributes

In [None]:
links['speed']

In [None]:
# recode the attributes
links['speed'] = links['speed'].astype(str)
links.loc[links['speed'].isin(['(30,40]', '(40,inf)']),'speed'] = '(30,inf)'
links['speed'] = pd.Categorical(links['speed'],categories=['[0,30]','(30,inf)'],ordered=True)

links['lanes'] = links['lanes'].astype(int).astype(str)
links.loc[links['lanes']=='3','lanes'] = '3+'
links['lanes'] = pd.Categorical(links['lanes'],categories=['1','2','3+'],ordered=True)

links.loc[links['facility_fwd']=='buffered bike lane','facility_fwd'] = 'bike lane'

links['grade_cat'] = pd.Categorical(links['ascent_grade_cat'],categories=['[0,4)', '[4,6)', '[6,inf)'],ordered=True)

In [None]:
#set index for quick retrieval
links.set_index(['linkid'],inplace=True)
turns_df.set_index(['source_linkid','source_reverse_link','target_linkid','target_reverse_link'],inplace=True)

links['length_mi'] = links['length_ft'] / 5280 
links['facility_fwd'] = links['facility_fwd'].fillna('No facility')

In [None]:
#how much on the base case?
links['base_case'] = (links['facility_fwd'] == 'No facility') & (links['grade_cat'] == '[0,4)') & (links['speed'] == '[0,30]') & (links['lanes'] == '1') 

In [None]:
# item = ready_for_calibration[tripid]
# matched_edges = item['matched_edges']

def route_attributes(tripid,edge_list):

    record = {}

    record['tripid'] = tripid

    # get links traversed
    trip_links = links.loc[edge_list['linkid']] 
    route = [tuple(x) for x in edge_list.values]

    # get the turn movements
    trip_turns = [(route[i][0],route[i][1],route[i+1][0],route[i+1][1]) for i in range(0,len(route)-1)]
    trip_turns = [x for x in trip_turns if x[0] != x[2]]
    trip_turns = turns_df.loc[trip_turns]

    #general stats
    record['Length (miles)'] = trip_links['length_mi'].sum().round(1)
    record['Ascent (feet)'] = trip_links['ascent_ft'].sum() # ascent seems a little high
    record['Not on Road'] = trip_links.loc[trip_links['link_type']!='road','length_mi'].sum()
    record['Base Case'] = trip_links.loc[trip_links['base_case']==True,'length_mi'].sum()

    # turn stats
    record['Left Turns'] = trip_turns['left_turn'].sum()
    record['Right Turns'] = trip_turns['right_turn'].sum()
    record['Unsignalized Crossing'] = trip_turns['unsig_crossing'].sum()  # these appear to match up to real world
    record['Signalized Crossing'] = trip_turns['signalized'].sum()

    #bike facilities
    bike_attrs = trip_links[trip_links['facility_fwd'].isin(['multi use path', 'bike lane', 'cycletrack'])]
    bike_attrs = bike_attrs.groupby('facility_fwd')['length_mi'].sum().to_dict()
    bike_attrs = {f"{key.title()}":item for key, item in bike_attrs.items()}
    record.update(bike_attrs)

    # road variables
    road_attrs = trip_links[(trip_links['link_type']=='road')].copy()
    aadt = {f'AADT: {key}':item for key, item in road_attrs.groupby('AADT')['length_mi'].sum().to_dict().items()}
    lanes = {f'Lanes: {key}':item for key, item in road_attrs.groupby('lanes')['length_mi'].sum().to_dict().items()}
    speed = {f'Speed: {key}':item for key, item in road_attrs.groupby('speed')['length_mi'].sum().to_dict().items()}
    grade = {f'Grade: {key}':item for key, item in road_attrs.groupby('grade_cat')['length_mi'].sum().to_dict().items()}

    record.update(aadt)
    record.update(lanes)
    record.update(speed)
    record.update(grade)

    # # road variables (no bike facilities)
    # # remove if there's a bicycle facility
    # road_attrs = trip_links[(trip_links['link_type']=='road') & (trip_links['facility_fwd'].isin(['multi use path', 'bike lane', 'cycletrack', 'buffered bike lane'])==False)].copy()
    
    # aadt = {('aadt',str(key)+'_mi'):item for key, item in road_attrs.groupby('AADT')['length_mi'].sum().to_dict().items()}
    # lanes = {('lanes',str(key)+'_mi'):item for key, item in road_attrs.groupby('lanes')['length_mi'].sum().to_dict().items()}
    # speed = {('speed',str(key)+'_mi'):item for key, item in road_attrs.groupby('speed')['length_mi'].sum().to_dict().items()}

    # record.update(aadt)
    # record.update(lanes)
    # record.update(speed)

    # # road variables (w bike faciliies)
    # road_attrs_w_bikeaccom = trip_links[(trip_links['link_type']=='road') & (trip_links['facility_fwd'].isin(['multi use path', 'bike lane', 'cycletrack', 'buffered bike lane']))].copy()

    # aadt_bike = {('bike_aadt',str(key)+'_mi'):item for key, item in road_attrs_w_bikeaccom.groupby('AADT')['length_mi'].sum().to_dict().items()}
    # lanes_bike = {('bike_lanes',str(key)+'_mi'):item for key, item in road_attrs_w_bikeaccom.groupby('lanes')['length_mi'].sum().to_dict().items()}
    # speed_bike = {('bike_speed',str(key)+'_mi'):item for key, item in road_attrs_w_bikeaccom.groupby('speed')['length_mi'].sum().to_dict().items()}

    # record.update(aadt_bike)
    # record.update(lanes_bike)
    # record.update(speed_bike)

    return record

Trying to think of why 

In [None]:
# calculate chosen route attributes
chosen_route_attr = [route_attributes(key,item['matched_edges']) for key, item in ready_for_calibration.items()]
chosen_route_attr = pd.DataFrame.from_records(chosen_route_attr).fillna(0).round(2)
chosen_route_attr.set_index('tripid',inplace=True)

# calculate shortest route attributes
shortest_route_attr = [route_attributes(key,item['shortest_edges']) for key, item in ready_for_calibration.items()]
shortest_route_attr = pd.DataFrame.from_records(shortest_route_attr).fillna(0).round(2)
shortest_route_attr.set_index('tripid',inplace=True)

# calculate modeled route attributes
modeled_route_attr = [route_attributes(key,item['modeled_edges']) for key, item in best_model.items()]
modeled_route_attr = pd.DataFrame.from_records(modeled_route_attr).fillna(0).round(2)
modeled_route_attr.set_index('tripid',inplace=True)

In [None]:
# normalize route attributes by the length of the trip
chosen_route_attr_norm = chosen_route_attr.drop(columns=['Length (miles)']).div(chosen_route_attr['Length (miles)'],axis=0)
shortest_route_attr_norm = shortest_route_attr.drop(columns=['Length (miles)']).div(shortest_route_attr['Length (miles)'],axis=0)
modeled_route_attr_norm = modeled_route_attr.drop(columns=['Length (miles)']).div(modeled_route_attr['Length (miles)'],axis=0)

# add the length back in
chosen_route_attr_norm['Length (miles)'] = chosen_route_attr_norm.index.map(chosen_route_attr['Length (miles)'])
shortest_route_attr_norm['Length (miles)'] = shortest_route_attr_norm.index.map(shortest_route_attr['Length (miles)'])
modeled_route_attr_norm['Length (miles)'] = modeled_route_attr_norm.index.map(modeled_route_attr['Length (miles)'])

In [None]:
# find difference in percentage
chosen_aligned, shortest_aligned = chosen_route_attr_norm.align(shortest_route_attr_norm, fill_value=0)
chosen_minus_shortest = chosen_aligned - shortest_aligned

chosen_aligned, modeled_aligned = chosen_route_attr_norm.align(modeled_route_attr_norm, fill_value=0)
chosen_minus_modeled = chosen_aligned - modeled_aligned

Calculate sum of squared differences for each attribute
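
As a small worked example of this calculation, the sketch below uses two made-up trips (ids 101 and 102) and hypothetical attribute shares. It aligns the two tables so that an attribute missing from one of them is treated as zero, then sums the squared differences per attribute.

```python
import pandas as pd

# Hypothetical normalized attributes for two trips
toy_chosen = pd.DataFrame(
    {"Bike Lane": [0.5, 0.0], "Cycletrack": [0.25, 0.0]},
    index=[101, 102],
)
# The modeled routes never used a cycletrack, so that column is absent
toy_modeled = pd.DataFrame({"Bike Lane": [0.25, 0.5]}, index=[101, 102])

# Align rows and columns; attributes missing from one table are filled with 0
chosen_al, modeled_al = toy_chosen.align(toy_modeled, fill_value=0)

# Sum of squared differences per attribute (one value per column)
ssd = ((chosen_al - modeled_al) ** 2).sum()
print(ssd)
```

A smaller value for an attribute means the compared routes resemble the chosen routes more closely on that attribute.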

In [None]:
test1 = (chosen_minus_modeled ** 2).sum()
test1.name = 'chosen_minus_modeled'
test2 = (chosen_minus_shortest ** 2).sum()
test2.name = 'chosen_minus_shortest'
test3 = pd.concat([test1,test2],axis=1,ignore_index=False).round(1)
test3.to_csv(config['scratch_fp']/'sum_of_squared_difference.csv')

In [None]:
chosen_minus_modeled['Signalized Crossing'].max()

In [None]:
# # TODO Find squared difference for each attribute
# chosen_minus_modeled
# (chosen_minus_modeled ** 2).sum(axis=1).mean()
# (chosen_minus_shortest ** 2).sum(axis=1).mean()
# chosen_minus_shortest.describe()

# Create chosen route attributes plots

In [None]:
# link_pct_cols = [
#     'Lanes: 1',
#     'Lanes: 2',
#     'Lanes: 3+',
#     'Speed: [0,30]', 
#     'Speed: (30,inf)',
#     'AADT: [0,4k)',
#     'AADT: [4k,10k)', 
#     'AADT: [10k,inf)',
#     'Bike Lane',
#     'Cycletrack',
#     'Multi Use Path',
#     'Not on Road', 
#     'Base Case',
# ]

# # Loop through each column and create a histogram
# for column in link_pct_cols:
    
#     # Set up the figure and subplots
#     fig, ax = plt.subplots()

#     # Center the histograms around zero by setting limits
#     max_val = max(chosen_route_attr[column].max(), chosen_minus_shortest[column].max())
#     min_val = min(chosen_route_attr[column].min(), chosen_minus_shortest[column].min())
#     val = np.ceil(max(abs(max_val),abs(min_val)))

#     bins = np.arange(0,val,0.5)
#     # Set major ticks every 0.5 and minor ticks every 0.1
#     ax.xaxis.set_major_locator(MultipleLocator(1))
#     ax.xaxis.set_minor_locator(MultipleLocator(0.5))
#     ax.yaxis.set_major_locator(MultipleLocator(50))
#     ax.yaxis.set_minor_locator(MultipleLocator(10))   
    
#     # Plot histogram for chosen_route_attr
#     ax.hist(chosen_route_attr[column], bins=bins, alpha=0.3, color='blue', edgecolor='black')
    
#     ax.set_title(column)
#     ax.set_ylabel(f'Frequency (N={chosen_route_attr.shape[0]})')
#     ax.set_xlim(0, val)
#     # ax.legend()

#     # Set a common xlabel
#     ax.set_xlabel('Miles')

#     #save the figure
#     plt.savefig(config['figures_fp']/f"{column.replace(' ','_').replace(':','_')}_chosenattrs.png",dpi=300)

# Create difference in route attributes plots

In [None]:
link_pct_cols = [
    'Lanes: 1',
    'Lanes: 2',
    'Lanes: 3+',
    'Speed: [0,30]', 
    'Speed: (30,inf)',
    'AADT: [0,4k)',
    'AADT: [4k,10k)', 
    'AADT: [10k,inf)',
    'Grade: [0,4)',
    'Grade: [4,6)',
    'Grade: [6,inf)',
    'Bike Lane',
    'Cycletrack',
    'Multi Use Path',
    'Not on Road', 
    'Base Case',
]

# Loop through each column and create a histogram
for column in link_pct_cols:
    
    # Set up the figure and subplots
    fig, ax = plt.subplots()

    # Center the histograms around zero by setting limits
    max_val = max(chosen_minus_modeled[column].max(), chosen_minus_shortest[column].max())
    min_val = min(chosen_minus_modeled[column].min(), chosen_minus_shortest[column].min())
    val = np.ceil(max(abs(max_val),abs(min_val)))

    bins = np.arange(-1,1,0.05)
    # Set major x ticks every 0.25 and minor x ticks every 0.05
    ax.xaxis.set_major_locator(MultipleLocator(0.25))
    ax.xaxis.set_minor_locator(MultipleLocator(0.05))
    ax.yaxis.set_major_locator(MultipleLocator(50))
    ax.yaxis.set_minor_locator(MultipleLocator(10))   
    
    # Plot histogram for chosen_minus_modeled
    ax.hist(chosen_minus_modeled[column], bins=bins, alpha=0.3, color='blue', label='Chosen Minus Modeled', edgecolor='black')

    # Plot histogram for chosen_minus_shortest
    ax.hist(chosen_minus_shortest[column], bins=bins, alpha=0.3, color='grey', label='Chosen Minus Shortest', edgecolor='black')
    
    ax.set_title(column + ' (%)')
    ax.set_ylabel(f'Frequency (N={chosen_minus_modeled.shape[0]})')
    ax.set_xlim(min(val * -1, -1), max(val, 1))
    ax.legend()

    # Set a common xlabel
    ax.set_xlabel('Difference')

    #save the figure
    plt.savefig(config['figures_fp']/f"{column.replace(' ','_').replace(':','_')}_routeattrs.png",dpi=300)

In [None]:
turn_cols = [
    'Left Turns', 
    'Right Turns',
    'Unsignalized Crossing',
    'Signalized Crossing',
]

# Loop through each column and create a histogram
for column in turn_cols:
    
    # Set up the figure and subplots
    fig, ax = plt.subplots()

    # Center the histograms around zero by setting limits
    max_val = max(chosen_minus_modeled[column].max(), chosen_minus_shortest[column].max())
    min_val = min(chosen_minus_modeled[column].min(), chosen_minus_shortest[column].min())
    val = np.ceil(max(abs(max_val),abs(min_val)))

    bin_size = 0.25
    bins = np.arange(-1*val,val+bin_size,bin_size)
    # Set major x ticks every 1 and minor x ticks every 0.25
    ax.xaxis.set_major_locator(MultipleLocator(1))
    ax.xaxis.set_minor_locator(MultipleLocator(0.25))

    ax.yaxis.set_major_locator(MultipleLocator(50))
    ax.yaxis.set_minor_locator(MultipleLocator(10))   
    
    # Plot histogram for chosen_minus_modeled
    ax.hist(chosen_minus_modeled[column], bins=bins, alpha=0.3, color='blue', label='Chosen Minus Modeled', edgecolor='black')

    # Plot histogram for chosen_minus_shortest
    ax.hist(chosen_minus_shortest[column], bins=bins, alpha=0.3, color='grey', label='Chosen Minus Shortest', edgecolor='black')

    # Set the title and labels for each subplot
    ax.set_title(column + ' Per Mile')
    ax.set_ylabel(f'Frequency (N={chosen_minus_modeled.shape[0]})')
    ax.set_xlim(min(val * -1, -1), max(val, 1))  # Adjust these limits based on your data
    ax.legend()

    # Set a common xlabel
    ax.set_xlabel(f'Difference (Bin Size = {bin_size})')

    #save the figure
    plt.savefig(config['figures_fp']/f"{column.replace(' ','_')}_routeattrs.png",dpi=300)

In [None]:
column = 'Length (miles)'

# Set up the figure and subplots
fig, ax = plt.subplots()

# Center the histograms around zero by setting limits
max_val = max(chosen_minus_modeled[column].max(), chosen_minus_shortest[column].max())
min_val = min(chosen_minus_modeled[column].min(), chosen_minus_shortest[column].min())
val = np.ceil(max(abs(max_val),abs(min_val)))

bin_size = 0.5
bins = np.arange(-1*val,val+bin_size,bin_size)
ax.xaxis.set_major_locator(MultipleLocator(5))
ax.xaxis.set_minor_locator(MultipleLocator(0.5))
ax.yaxis.set_major_locator(MultipleLocator(50))
ax.yaxis.set_minor_locator(MultipleLocator(10))   

# Plot histogram for chosen_minus_modeled
ax.hist(chosen_minus_modeled[column], bins=bins, alpha=0.3, color='blue', label='Chosen Minus Modeled', edgecolor='black')

# Plot histogram for chosen_minus_shortest
ax.hist(chosen_minus_shortest[column], bins=bins, alpha=0.3, color='grey', label='Chosen Minus Shortest', edgecolor='black')

# Set the title and labels for each subplot
ax.set_title(column)
ax.set_ylabel(f'Frequency (N={chosen_minus_modeled.shape[0]})')
ax.set_xlim(min(val * -1, -1), max(val, 1))  # Adjust these limits based on your data
ax.legend()

# Set a common xlabel
ax.set_xlabel(f'Difference (Bin Size = {bin_size})')

#save the figure
plt.savefig(config['figures_fp']/f"{column.replace(' ','_')}_routeattrs.png",dpi=300)

In [None]:
column = 'Ascent (feet)'

# Set up the figure and subplots
fig, ax = plt.subplots()

# Center the histograms around zero by setting limits
max_val = max(chosen_minus_modeled[column].max(), chosen_minus_shortest[column].max())
min_val = min(chosen_minus_modeled[column].min(), chosen_minus_shortest[column].min())
val = np.ceil(max(abs(max_val),abs(min_val)))

bin_size = 5
bins = np.arange(-1*val,val+bin_size,bin_size)
ax.xaxis.set_major_locator(MultipleLocator(15))
ax.xaxis.set_minor_locator(MultipleLocator(5))
ax.yaxis.set_major_locator(MultipleLocator(50))
ax.yaxis.set_minor_locator(MultipleLocator(10))   

# Plot histogram for chosen_minus_modeled
ax.hist(chosen_minus_modeled[column], bins=bins, alpha=0.3, color='blue', label='Chosen Minus Modeled', edgecolor='black')

# Plot histogram for chosen_minus_shortest
ax.hist(chosen_minus_shortest[column], bins=bins, alpha=0.3, color='grey', label='Chosen Minus Shortest', edgecolor='black')

# Set the title and labels for each subplot
ax.set_title("Ascent (feet per mile)")
ax.set_ylabel(f'Frequency (N={chosen_minus_modeled.shape[0]})')
ax.set_xlim(min(val * -1, -1), max(val, 1))  # Adjust these limits based on your data
ax.legend()

# Set a common xlabel
ax.set_xlabel(f'Difference (Bin Size = {bin_size})')

#save the figure
plt.savefig(config['figures_fp']/f"{column.replace(' ','_')}_routeattrs.png",dpi=300)

In [None]:
# from tqdm import tqdm
# test = []
# for tripid, item in tqdm(ready_for_calibration.items()):
#     trip = item['matched_edges']

#     # get links traversed
#     trip_links = links.loc[trip['linkid']] 
#     route = [tuple(x) for x in trip.values]

#     # get the turn movements
#     trip_turns = [(route[i][0],route[i][1],route[i+1][0],route[i+1][1]) for i in range(0,len(route)-1)]
#     trip_turns = [x for x in trip_turns if x[0] != x[2]]
#     trip_turns = turns_df.loc[trip_turns]

#     #general stats
#     general_stats = pd.Series({
#         'length_mi': trip_links['length_mi'].sum().round(1),
#         'ascent_ft': trip_links['ascent_ft'].sum(), # ascent seems a little high
#     })
#     general_stats.index.name = 'general stats'
#     turn_stats = pd.Series({
#         'left_turns': trip_turns['left_turn'].sum(),
#         'right_turns': trip_turns['right_turn'].sum(),
#         'unsig_crossing': trip_turns['unsig_crossing'].sum(),  # these appear to match up to real world
#         'sig_crossings': trip_turns['signalized'].sum()
#     })
#     turn_stats.index.name = 'turn stats'

#     #bike facilities
#     bike_attrs = trip_links[trip_links['facility_fwd'].isin(['multi use path', 'bike lane', 'cycletrack', 'buffered bike lane'])]
#     bike_attrs = bike_attrs.groupby('facility_fwd')['length_mi'].sum()

#     # road variables (no bike facilities)
#     # remove if there's a bicycle facility
#     road_attrs = trip_links[(trip_links['link_type']=='road') & (trip_links['facility_fwd'].isin(['multi use path', 'bike lane', 'cycletrack', 'buffered bike lane'])==False)].copy()
#     # route_attrs = trip_links[trip_links['link_type']=='road'].groupby(['AADT','lanes','speed'])['length_mi'].sum()
#     aadt = road_attrs.groupby('AADT')['length_mi'].sum()
#     lanes = road_attrs.groupby('lanes')['length_mi'].sum()
#     speed = road_attrs.groupby('speed')['length_mi'].sum()

#     #group these
#     concat = [general_stats,bike_attrs,aadt,lanes,speed,turn_stats]
#     concat = pd.concat(concat,keys=[x.index.name for x in concat])

#     test.append(concat)
# # Define the number of columns to plot
# columns = chosen_minus_modeled.columns  # Assuming both DataFrames have the same columns
# num_columns = len(columns)

# # Set up the figure and subplots
# fig, axes = plt.subplots(num_columns, 1, figsize=(10, 100))

# # Loop through each column and create a histogram
# for i, column in enumerate(columns):
    
#     # Center the histograms around zero by setting limits
#     max_val = max(chosen_minus_modeled[column].max(), chosen_minus_shortest[column].max())
#     min_val = min(chosen_minus_modeled[column].min(), chosen_minus_shortest[column].min())
#     val = max(abs(max_val),abs(min_val))

#     bins = np.arange(-1,1,0.05)
    
#     # Plot histogram for chosen_minus_shortest
#     axes[i].hist(chosen_minus_shortest[column].round(2), bins=bins, alpha=0.5, color='grey', label='Chosen Minus Shortest', edgecolor='black')
    
#     # Plot histogram for chosen_minus_modeled
#     axes[i].hist(chosen_minus_modeled[column].round(2), bins=bins, alpha=0.5, color='blue', label='Chosen Minus Modeled', edgecolor='black')

#     # Set the title and labels for each subplot
#     axes[i].set_title(f'Histogram for {column}')
#     axes[i].set_ylabel('Frequency')
    
#     axes[i].set_xlim(min(val * -1, -1), max(val, 1))  # Adjust these limits based on your data

#     # Set major ticks every 0.5 and minor ticks every 0.1
#     axes[i].xaxis.set_major_locator(MultipleLocator(0.25))
#     axes[i].xaxis.set_minor_locator(MultipleLocator(0.05))

#     axes[i].legend()

# # Set a common xlabel
# axes[-1].set_xlabel('Value')

# # Add a legend to the last subplot
# # axes[-1].legend()

# # Show the plot
# plt.tight_layout()  # Adjust subplots to fit in the figure area.
# plt.show()
