# Heavy Truck Model
## FAF Summarization

Author: Maddie Hasani, Fehr & Peers <br/>
Reviewer: Fatemeh Ranaiefar, Fehr & Peers<br/>
Last update: 8/29/2023


## REQUIRED LIBRARIES

In [130]:
import os

import pandas as pd
import numpy as np
import xlsxwriter

## LOAD IN & CLEAN DATA

In [131]:
# 1. Load the model inputs workbook, skipping the user guide sheet
inputs_sandag_HTM = pd.ExcelFile('inputs_sandag_HTM.xlsx')
sheet_names = [sheet_name for sheet_name in inputs_sandag_HTM.sheet_names if sheet_name.lower() != 'userguide']

# 1.1 Load each sheet into a global DataFrame named after the lowercased sheet name
#     (the cells below use faf, faz_county, othermode_truck and commodity_group)
for sheet_name in sheet_names:
    globals()[sheet_name.lower()] = inputs_sandag_HTM.parse(sheet_name)

# 1.2 FAZ IDs in San Diego County; defined here because the filtering cells below use it
faz_san_diego = faz_county.loc[faz_county["County"] == "San Diego", "FAZ"]

# 2. Load the FAF data; the 'faf' input sheet stores the CSV file name and folder
faf_name = faf.loc[0, 'Name']
faf_path = faf.loc[0, 'Path']
full_faf_path = os.path.join(faf_path, faf_name + ".csv")

# Note: this overwrites the 'faf' input sheet with the FAF records themselves
faf = pd.read_csv(full_faf_path)


## Sum production/attraction tons for the 5 FAZs in SANDAG by mode

In [132]:
# Attractions: records whose destination FAZ is in San Diego
faf_sd_a = faf[faf["dms_dest"].isin(faz_san_diego)]
# Productions: records whose origin FAZ is in San Diego (intra-San Diego records fall in both)
faf_sd_p = faf[faf["dms_orig"].isin(faz_san_diego)]
faf_sd_a

|  | dms_orig | dms_dest | Mode | Commodity | distons_2017 | disvalue_2017 | fr_orig | fr_dest | fr_inmode | fr_outmode | Direction | Trade |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 244 | 209 | 607302 | 1 | 1 | 0.000228 | 0.01 |  |  |  |  | XI | Domestic |
| 245 | 209 | 607303 | 1 | 1 | 0.000407 | 0.02 |  |  |  |  | XI | Domestic |
| 246 | 209 | 607305 | 1 | 1 | 0.000283 | 0.01 |  |  |  |  | XI | Domestic |
| 247 | 209 | 607304 | 1 | 1 | 0.000198 | 0.01 |  |  |  |  | XI | Domestic |
| 248 | 209 | 607306 | 1 | 1 | 0.000399 | 0.02 |  |  |  |  | XI | Domestic |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3206869 | 5064 | 607306 | 5 | 43 | 0.400000 | 0.83 | 801.0 |  | 5.0 |  | II | Import |
| 3206870 | 4064 | 607306 | 4 | 43 | 0.080000 | 1.33 | 804.0 |  | 4.0 |  | II | Import |
| 3206871 | 4064 | 607306 | 4 | 43 | 0.050000 | 1.67 | 808.0 |  | 4.0 |  | II | Import |
| 3206872 | 5063 | 607306 | 5 | 43 | 0.210000 | 0.45 | 802.0 |  | 5.0 |  | II | Import |
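A minimal sketch of the attraction/production split on a made-up three-record table; the FAZ IDs and tonnages are hypothetical, and only the column names come from the FAF data above. It shows that a record with both ends in San Diego is kept in both subsets, so intra-San Diego flows count toward both `ton_a` and `ton_p`.

```python
import pandas as pd

# Hypothetical San Diego FAZ IDs and toy FAF-like records (not real FAF data)
faz_san_diego = pd.Series([607302, 607303])
faf = pd.DataFrame({
    "dms_orig":     [607302, 11,     607303],
    "dms_dest":     [607303, 607302, 151],
    "distons_2017": [10.0,   4.0,    2.5],
})

# Attractions: destination inside San Diego; productions: origin inside San Diego
faf_sd_a = faf[faf["dms_dest"].isin(faz_san_diego)]
faf_sd_p = faf[faf["dms_orig"].isin(faz_san_diego)]

# Row 0 (San Diego -> San Diego) is counted on both sides
print(faf_sd_a["distons_2017"].sum())  # 14.0 (rows 0 and 1)
print(faf_sd_p["distons_2017"].sum())  # 12.5 (rows 0 and 2)
```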


In [133]:
# Sum tons by mode for attractions (ton_a) and productions (ton_p);
# as_index=False already returns a fresh 0..n-1 index, so no reset_index is needed
result_a = faf_sd_a.groupby('Mode', as_index=False).agg(ton_a=('distons_2017', 'sum'))
result_p = faf_sd_p.groupby('Mode', as_index=False).agg(ton_p=('distons_2017', 'sum'))

# Merge side by side on Mode (inner join: keeps only modes present in both results)
ton_by_mode_sd = pd.merge(result_a, result_p, how="inner", on='Mode')
ton_by_mode_sd.to_csv('ton_by_mode_sd.csv')
ton_by_mode_sd

|  | Mode | ton_a | ton_p |
|---|---|---|---|
| 0 | 1 | 57665.096213 | 51035.791883 |
| 1 | 2 | 1133.924986 | 512.774382 |
| 2 | 3 | 338.982006 | 229.514966 |
| 3 | 4 | 39.43837 | 31.816959 |
| 4 | 5 | 1657.97812 | 640.853094 |
| 5 | 6 | 9013.923884 | 0.33 |
| 6 | 7 | 8.978598 | 117.980766 |
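The merge above uses `how="inner"`, which silently drops any mode that appears on only one side. A small sketch with hypothetical totals contrasting it with an outer join plus `fillna(0)`, one possible alternative if one-sided modes should be kept:

```python
import pandas as pd

# Toy per-mode totals (hypothetical values): mode 8 is attracted but never produced
result_a = pd.DataFrame({"Mode": [1, 2, 8], "ton_a": [5.0, 3.0, 1.0]})
result_p = pd.DataFrame({"Mode": [1, 2],    "ton_p": [4.0, 2.0]})

# how="inner" keeps only modes present on both sides, so mode 8 disappears
inner = pd.merge(result_a, result_p, how="inner", on="Mode")

# how="outer" keeps every mode; fillna(0) records the missing side as zero tons
outer = pd.merge(result_a, result_p, how="outer", on="Mode").fillna(0)

print(inner["Mode"].tolist())  # [1, 2]
print(outer["Mode"].tolist())  # [1, 2, 8]
```

With the FAF output above both joins give the same seven modes, so the choice only matters when a mode is one-sided.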


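## Clean up FAF data
1. Keep only the FAF modes that carry truck tonnage, and scale each record's tons by the truck share of its mode (from `othermode_truck`)
2. Delete columns that are not needed downstream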


In [134]:
# Truck share of tonnage for each FAF mode kept in the model (Mode_Num -> Percentage)
mode_to_include = othermode_truck.set_index('Mode_Num')['Percentage'].to_dict()
# Keep only those modes; .copy() makes df independent of faf and avoids SettingWithCopyWarning
df = faf[faf['Mode'].isin(mode_to_include.keys())].copy()
# Truck tons = FAF tons x truck share of the record's mode
df['ton'] = df['distons_2017'] * df['Mode'].map(mode_to_include)

# Delete columns not needed downstream
delete_col = ['distons_2017', 'disvalue_2017', 'Mode', 'fr_orig', 'fr_dest', 'fr_inmode', 'fr_outmode', 'Direction', 'Trade']
df = df.drop(columns=delete_col)

df


|  | dms_orig | dms_dest | Commodity | ton |
|---|---|---|---|---|
| 0 | 41 | 603706 | 1 | 2.500000 |
| 1 | 41 | 603707 | 1 | 3.020000 |
| 2 | 41 | 607100 | 1 | 2.220000 |
| 3 | 41 | 603700 | 1 | 2.830000 |
| 4 | 41 | 606503 | 1 | 2.580000 |
| ... | ... | ... | ... | ... |
| 3259206 | 132 | 151 | 43 | 0.140000 |
| 3259207 | 139 | 151 | 43 | 0.700000 |
| 3259214 | 211 | 151 | 43 | 0.000051 |
| 3259215 | 212 | 151 | 43 | 0.000171 |
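A sketch of the mode filter and truck-share scaling on toy data. The shares below are hypothetical stand-ins for the `othermode_truck` sheet, and `Percentage` is assumed to be a fraction (0 to 1), as the multiplication above implies. Taking `.copy()` of the filtered frame makes it an independent DataFrame, so adding columns to it does not raise pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical truck shares by FAF mode (stand-in for the othermode_truck sheet)
othermode_truck = pd.DataFrame({"Mode_Num": [1, 4], "Percentage": [1.0, 0.5]})
faf = pd.DataFrame({"Mode": [1, 2, 4], "distons_2017": [10.0, 6.0, 8.0]})

# Mode number -> truck share; modes absent from the lookup are dropped
mode_to_include = othermode_truck.set_index("Mode_Num")["Percentage"].to_dict()
df = faf[faf["Mode"].isin(list(mode_to_include))].copy()

# Scale FAF tons by the truck share of each record's mode
df["ton"] = df["distons_2017"] * df["Mode"].map(mode_to_include)

print(df["ton"].tolist())  # [10.0, 4.0]
```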


## Sum production/attraction tons for the 5 FAZs in SANDAG by 15 aggregated commodities

### Aggregate Commodity Level

In [135]:
# 1. Assign SANDAG commodity groups (CG) based on the SCTG commodity code
commodity_to_cg = commodity_group.set_index('SCTG')['CG'].to_dict()

# assign() returns a new DataFrame, so there is no chained-assignment warning;
# SCTG codes missing from commodity_group become NaN
df = df.assign(CG=df['Commodity'].map(commodity_to_cg)).drop(columns='Commodity')

# 2. Aggregate tonnage by origin, destination and commodity group
#    (groupby drops rows whose CG is NaN by default)
df = df.groupby(['dms_orig', 'dms_dest', 'CG'], as_index=False).agg({'ton': 'sum'})

df



|  | dms_orig | dms_dest | CG | ton |
|---|---|---|---|---|
| 0 | 11 | 151 | CG-10 | 0.000262 |
| 1 | 11 | 151 | CG-11 | 0.829865 |
| 2 | 11 | 151 | CG-13 | 0.330317 |
| 3 | 11 | 151 | CG-14 | 0.006504 |
| 4 | 11 | 151 | CG-2 | 0.004000 |
| ... | ... | ... | ... | ... |
| 429129 | 611500 | 611500 | CG-4 | 5.601951 |
| 429130 | 611500 | 611500 | CG-5 | 12.830000 |
| 429131 | 611500 | 611500 | CG-7 | 1.266035 |
| 429132 | 611500 | 611500 | CG-8 | 1.060387 |
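One caveat of this step: any SCTG code missing from `commodity_group` maps to NaN, and `groupby` drops NaN keys by default, so those tons disappear without an error. A sketch with a hypothetical lookup (SCTG 99 is a made-up, deliberately unmapped code) that measures the dropped tonnage before aggregating:

```python
import pandas as pd

# Hypothetical SCTG -> CG lookup (stand-in for the commodity_group sheet)
commodity_group = pd.DataFrame({"SCTG": [1, 2, 43], "CG": ["CG-1", "CG-1", "CG-15"]})
df = pd.DataFrame({
    "dms_orig":  [11, 11, 11, 11],
    "dms_dest":  [151, 151, 151, 151],
    "Commodity": [1, 2, 43, 99],
    "ton":       [1.0, 2.0, 0.5, 9.0],
})

commodity_to_cg = commodity_group.set_index("SCTG")["CG"].to_dict()
df = df.assign(CG=df["Commodity"].map(commodity_to_cg)).drop(columns="Commodity")

# Tons whose commodity has no CG (they become NaN and would be silently dropped)
print(df.loc[df["CG"].isna(), "ton"].sum())  # 9.0

# groupby excludes NaN keys by default (dropna=True)
agg = df.groupby(["dms_orig", "dms_dest", "CG"], as_index=False).agg({"ton": "sum"})
print(agg)
```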


In [136]:
# Attractions: records whose destination FAZ is in San Diego
df_sd_a = df[df["dms_dest"].isin(faz_san_diego)]
# Productions: records whose origin FAZ is in San Diego (intra-San Diego records fall in both)
df_sd_p = df[df["dms_orig"].isin(faz_san_diego)]
df_sd_a

|  | dms_orig | dms_dest | CG | ton |
|---|---|---|---|---|
| 621 | 11 | 607302 | CG-11 | 0.011380 |
| 622 | 11 | 607302 | CG-13 | 0.008162 |
| 623 | 11 | 607302 | CG-14 | 0.003529 |
| 624 | 11 | 607302 | CG-2 | 0.001363 |
| 625 | 11 | 607302 | CG-4 | 0.010000 |
| ... | ... | ... | ... | ... |
| 428781 | 611500 | 607306 | CG-4 | 0.140000 |
| 428782 | 611500 | 607306 | CG-5 | 0.000485 |
| 428783 | 611500 | 607306 | CG-7 | 0.272046 |
| 428784 | 611500 | 607306 | CG-8 | 0.963747 |


In [137]:
# Sum truck tons by commodity group for attractions (ton_a) and productions (ton_p);
# as_index=False already returns a fresh 0..n-1 index, so no reset_index is needed
result_a = df_sd_a.groupby('CG', as_index=False).agg(ton_a=('ton', 'sum'))
result_p = df_sd_p.groupby('CG', as_index=False).agg(ton_p=('ton', 'sum'))

# Merge side by side on CG (inner join: keeps only groups present in both results)
ton_by_cg_sd = pd.merge(result_a, result_p, how="inner", on='CG')
ton_by_cg_sd.to_csv('ton_by_cg_sd.csv')
ton_by_cg_sd

|  | CG | ton_a | ton_p |
|---|---|---|---|
| 0 | CG-1 | 2400.973537 | 3214.791428 |
| 1 | CG-10 | 9773.264524 | 7185.796723 |
| 2 | CG-11 | 2978.794095 | 1439.46633 |
| 3 | CG-12 | 3812.454926 | 4911.040712 |
| 4 | CG-13 | 1295.790099 | 767.887822 |
| 5 | CG-14 | 607.812713 | 457.330599 |
| 6 | CG-15 | 4.372672 | 47.125792 |
| 7 | CG-2 | 2011.518876 | 650.759479 |
| 8 | CG-4 | 17297.654689 | 17396.327653 |
| 9 | CG-5 | 5111.014939 | 8729.29388 |


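### Identify whether each end of an OD pair is in San Diego, the rest of California, or another state
San Diego FAZs keep their own FAZ ID as the label. These labels are used later in the OD distance calculation.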

In [138]:
# FAZ IDs in San Diego County, and all other FAZs listed in faz_county (rest of California)
faz_san_diego = faz_county.loc[faz_county["County"] == "San Diego", "FAZ"]
faz_non_sd = faz_county.loc[faz_county["County"] != "San Diego", "FAZ"]

# Default label: FAZs not listed in faz_county are outside California
df['county_orig'] = 'Other States'
df['county_dest'] = 'Other States'

# FAZs outside San Diego but within California are labeled "Rest of California"
df.loc[df['dms_orig'].isin(faz_non_sd), 'county_orig'] = 'Rest of California'
df.loc[df['dms_dest'].isin(faz_non_sd), 'county_dest'] = 'Rest of California'

# San Diego FAZs keep their own FAZ ID as the label (stored as a string)
sd_orig = df['dms_orig'].isin(faz_san_diego)
sd_dest = df['dms_dest'].isin(faz_san_diego)
df.loc[sd_orig, 'county_orig'] = df.loc[sd_orig, 'dms_orig'].astype(str)
df.loc[sd_dest, 'county_dest'] = df.loc[sd_dest, 'dms_dest'].astype(str)

df

|  | dms_orig | dms_dest | CG | ton | county_orig | county_dest |
|---|---|---|---|---|---|---|
| 0 | 11 | 151 | CG-10 | 0.000262 | Other States | Other States |
| 1 | 11 | 151 | CG-11 | 0.829865 | Other States | Other States |
| 2 | 11 | 151 | CG-13 | 0.330317 | Other States | Other States |
| 3 | 11 | 151 | CG-14 | 0.006504 | Other States | Other States |
| 4 | 11 | 151 | CG-2 | 0.004000 | Other States | Other States |
| ... | ... | ... | ... | ... | ... | ... |
| 429129 | 611500 | 611500 | CG-4 | 5.601951 | Rest of California | Rest of California |
| 429130 | 611500 | 611500 | CG-5 | 12.830000 | Rest of California | Rest of California |
| 429131 | 611500 | 611500 | CG-7 | 1.266035 | Rest of California | Rest of California |
| 429132 | 611500 | 611500 | CG-8 | 1.060387 | Rest of California | Rest of California |
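The same three-way labeling can be written as one vectorized call with `np.select`, shown here as an alternative sketch on hypothetical FAZ IDs (not the notebook's method). Conditions are checked in order and unmatched FAZs fall through to the default, mirroring the "Other States" fallback above.

```python
import numpy as np
import pandas as pd

# Hypothetical FAZ lists: two San Diego FAZs and one other California FAZ
faz_san_diego = pd.Series([607302, 607303])
faz_non_sd = pd.Series([611500])
dms = pd.Series([607302, 611500, 151])  # origin (or destination) FAZ IDs

# First matching condition wins; anything unmatched gets the default label
label = np.select(
    [dms.isin(faz_san_diego).to_numpy(), dms.isin(faz_non_sd).to_numpy()],
    [dms.astype(str).to_numpy(), "Rest of California"],
    default="Other States",
)
print(label.tolist())  # ['607302', 'Rest of California', 'Other States']
```

Assigning the result, e.g. `df['county_orig'] = np.select(...)` with `df['dms_orig']` in place of `dms`, would replace the `.loc` assignments.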


In [151]:
# Sum tons by origin and destination label (San Diego FAZ, Rest of California, Other States)
truck_ton = df.groupby(['county_orig', 'county_dest'], as_index=False).agg(ton = ('ton', 'sum'))
truck_ton.to_csv('truck_ton.csv')
truck_ton

|  | county_orig | county_dest | ton |
|---|---|---|---|
| 0 | 607302 | 607302 | 789.4585 |
| 1 | 607302 | 607303 | 1395.658536 |
| 2 | 607302 | 607304 | 666.331301 |
| 3 | 607302 | 607305 | 966.727721 |
| 4 | 607302 | 607306 | 1411.540407 |
| 5 | 607302 | Other States | 874.911144 |
| 6 | 607302 | Rest of California | 1548.048856 |
| 7 | 607303 | 607302 | 1395.828536 |
| 8 | 607303 | 607303 | 2477.33043 |
| 9 | 607303 | 607304 | 1179.078616 |


In [142]:
# long to wide format
truck_ton_matrix = truck_ton.pivot(index='county_orig', columns='county_dest', values='ton')
truck_ton_matrix.to_csv('truck_ton_matrix.csv')
truck_ton_matrix

| county_orig \ county_dest | 607302 | 607303 | 607304 | 607305 | 607306 | Other States | Rest of California |
|---|---|---|---|---|---|---|---|
| 607302 | 789.4585 | 1395.658536 | 666.331301 | 966.727721 | 1411.540407 | 874.911144 | 1548.048856 |
| 607303 | 1395.828536 | 2477.33043 | 1179.078616 | 1711.059306 | 2514.50171 | 1618.307713 | 2755.056016 |
| 607304 | 666.281301 | 1178.818616 | 564.692159 | 816.49803 | 1185.162698 | 722.284615 | 1267.594222 |
| 607305 | 966.637721 | 1710.619306 | 816.49803 | 1186.695968 | 1736.233365 | 1088.301374 | 1859.430157 |
| 607306 | 1411.430407 | 2513.86171 | 1185.202698 | 1736.303365 | 2611.653336 | 1820.991406 | 2846.750184 |
| Other States | 1381.610791 | 2509.275134 | 1070.973206 | 1674.733455 | 2895.441046 | 50486.350317 | 125538.800432 |
| Rest of California | 1949.967194 | 3582.961248 | 1573.542535 | 2428.247363 | 4175.274463 | 114875.896983 | 668115.664362 |
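`pivot` only reshapes: any origin/destination pair without flows is left as NaN, and it raises an error if a pair occurs more than once. A sketch with toy labels A and B (hypothetical values) showing `pivot_table` as an alternative that sums duplicates, fills gaps with 0, and can append row and column totals:

```python
import pandas as pd

# Toy long-format OD table (hypothetical values); the pair ("B", "A") is missing
truck_ton = pd.DataFrame({
    "county_orig": ["A", "A", "B"],
    "county_dest": ["A", "B", "B"],
    "ton":         [1.0, 2.0, 3.0],
})

# pivot() reshapes as-is and leaves missing OD pairs as NaN
wide = truck_ton.pivot(index="county_orig", columns="county_dest", values="ton")

# pivot_table() sums duplicate pairs, fills gaps with 0, and adds "Total" margins
wide_filled = truck_ton.pivot_table(
    index="county_orig", columns="county_dest", values="ton",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)
print(wide_filled)
```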
