*Data-preparation script for visualisation*

This script consists of 15 steps:
1. Import the packages
2. Read in all data files. The following need to be in the same folder as this script:
    - BMMS_overview.csv
    - _roadnames_list.csv
    - RMMS folder
3. Define the relative economic values of the trucks
4. Define three functions:
    - htm_import
    - process_traffic
    - average_lanes
5. Get the traffic data per road segment (Part 1)
6. Summarise the traffic data per road (Part 2)
7. Rename / frame the dataframes
8. Fix some of the missing data for segments
9. Check the data cleaning for missing segments
10. Link the road (bridge) data with the traffic data per segment
11. Check how much data is missing
12. Determine the vulnerability of bridges based on their scenario likelihood
13. Calculate the vulnerability of each road segment in the traffic data file
14. Normalise vulnerability and criticality
15. Save the dataframes to CSV files for use in the plotting scripts

The data preparation produces three final files, which are used for visualisation:
- Traffic_segment_data.csv
- Bridge_data_with_link_to_traffic.csv
- summary_data_traffic_per_road.csv


In [1]:
#STEP 1: import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import html5lib
import logging
warnings.filterwarnings('ignore')

In [20]:
#STEP 2 Read in all the data files
df_all_bmms = pd.read_csv('BMMS_overview.csv', encoding='utf-8',sep = ",")

#get list of all the roads 
roadnames = pd.read_csv('_roadnames_list.csv') # need to have roadnames list in same folder path 
roadnames_list = list(roadnames.columns.unique())

Unnamed: 0,road,km,Criticality,type,LRPName,name,length,condition,structureNr,roadName,...,width,constructionYear,spans,zone,circle,division,sub-division,lat,lon,EstimatedLoc
0,N1,1.800,,Box Culvert,LRP001a,.,11.30,A,117861,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,19.50,2005.0,2.0,Dhaka,Dhaka,Narayanganj,Narayanganj-1,23.698739,90.458861,interpolate
1,N1,4.925,,Box Culvert,LRP004b,.,6.60,A,117862,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,35.40,2006.0,1.0,Dhaka,Dhaka,Narayanganj,Narayanganj-1,23.694664,90.487775,interpolate
2,N1,8.976,,PC Girder Bridge,LRP008b,Kanch pur Bridge.,394.23,A,119889,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,,,,Dhaka,Dhaka,Narayanganj,Narayanganj-1,23.705060,90.523214,interpolate
3,N1,10.880,,Box Culvert,LRP010b,NOYAPARA CULVERT,6.30,A,112531,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,12.20,1992.0,2.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.694391,90.537574,interpolate
4,N1,10.897,,Box Culvert,LRP010c,ADUPUR CULVERT,6.30,A,112532,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,12.20,1984.0,2.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.694302,90.537707,interpolate
5,N1,11.296,,Box Culvert,LRP011a,NAYABARI KASPUR BOX CULVERT,8.30,A,101110,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,21.45,1986.0,2.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.692360,90.540918,interpolate
6,N1,12.239,,Box Culvert,LRP012a,KHAS PARA BOX CULVERT,9.30,A,101117,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,21.00,1986.0,2.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.688412,90.548559,interpolate
7,N1,12.253,,Box Culvert,LRP012b,DAWAN BAG BOX CULVERT,6.10,A,101119,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,20.60,1987.0,2.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.688320,90.548650,interpolate
8,N1,12.660,,PC Girder Bridge,LRP013a,Madanpur Bridge.(L),27.50,A,119897,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,,,,Dhaka,Dhaka,Narayanganj,Vitikandi,23.685583,90.551208,interpolate
9,N1,12.660,,PC Girder Bridge,LRP013a,MADAN PUR (R),26.30,A,109841,Dhaka (Jatrabari)-Comilla (Mainamati)-Chittago...,...,9.20,2003.0,1.0,Dhaka,Dhaka,Narayanganj,Vitikandi,23.685583,90.551208,interpolate


In [4]:
#STEP 3 Define the relative values of the trucks according to economic value equation, SEE chapter 1.2.2 for explanation.
Heavy_value = 30.0
Medium_value = 25.0
Small_value = 16.0
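
As a small illustration of how these three values are used later in process_traffic, the sketch below computes the "economic traffic" of one link as a weighted sum of its truck counts. The function name `economic_traffic` and the example counts are hypothetical and only serve to make the arithmetic explicit.

```python
# Relative truck values, copied from Step 3 of this script.
HEAVY_VALUE = 30.0
MEDIUM_VALUE = 25.0
SMALL_VALUE = 16.0


def economic_traffic(heavy, medium, small):
    """Weighted sum of truck counts, mirroring the formula in process_traffic."""
    return heavy * HEAVY_VALUE + medium * MEDIUM_VALUE + small * SMALL_VALUE


# Hypothetical daily counts for one link, for illustration only.
example = economic_traffic(heavy=100, medium=200, small=300)
print(example)  # 100*30 + 200*25 + 300*16 = 12800.0
```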

In [5]:
#STEP 4.1 - Load in functions
#function that takes the roadname as an argument and returns a panda dataframe with the traffic data for that road
def htm_import(roadname):
        
        #make a temporary dataframe that will be returned by the function
        df_current_road = pd.DataFrame()
        
        #get filename and open the .htm file (the with-block closes the file even if parsing fails)
        filename = "RMMS/{}.traffic.htm".format(roadname)
        with open(filename, "rb") as f:
            #find all the tables that contain the string of the roadname
            #rows 0 and 5 are skipped; the original set {0,0,5} is the same set as {0,5}
            list_road = pd.read_html(f, skiprows = {0,5}, header = 1, match = roadname)
        df_current_road = list_road[2] #third table in the file is the data frame with the traffic data

        #subset to remove all columns after the 25th column. 
        #The 26th column is a repeat of total AADT data in 25th column and after that there are a number of empty columns 
        #as a result of parsing from .htm that should also be removed
        column_remove =list(df_current_road.columns.values)[25:]
        df_current_road = df_current_road.drop(labels=column_remove, axis = 1)

        #rename the columns
        df_current_road.columns = ["Link no","Link Name","Start LRP","Start Offset","Start Chainage","End LRP","End Offset","End Chainage","Link Length", "Heavy Truck","Medium Truck","Small Truck","Large Bus","Medium Bus","Micro Bus","Utility","Car","Auto Rickshaw","Motor Cycle","Bi-Cycle","Cycle Rickshaw","Cart","Motorized Total","Non Motorized Total","Total Traffic"]       

        #include the roadname in the dataframe
        df_current_road['Road name'] = roadname
        return(df_current_road)

In [9]:
#STEP 4.2 - Load in functions
#calculates the economic value of the traffic on each 'link' and processes duplicates for L and R on a link:
#the average value of the L and R traffic is used, and then the R rows are removed
#also removes links for which there is no data at all, and logs a warning for every link that is removed
def process_traffic(df):

    end_index = len(df)
    drop_indices = [] #list containing indices of duplicates and empty rows to be removed
    for i in range(end_index):
        
        #check for missing data in the link
        if df.iloc[i]['Heavy Truck'] == 'NS' :
            #all the rows with NS data (i.e missing data) have NS values in this column
            #remove the 'bad data row' - but maybe change to NA instead?
            drop_indices.append(i)
            #report deletion to log
            logging.warning('%s link deleted due to no data',df.iloc[i]['Link no'] )
        elif  i != (end_index-1):
            if df.iloc[i+1]['Heavy Truck'] != 'NS':
     
                #no missing values so continue as normal...
                #EXTEND THIS CONDITION TO MAKE IT MORE ACCURATE (MAYBE NOT NECESSARY): same chainage/name etc also
                if 'L' in df.iloc[i]['Link no'] and 'R' in df.iloc[i+1]['Link no']:

                    if not (df.iloc[i+1]['Heavy Truck'] == 'NS'): #cannot average with next row's value if the next row is corrupt data


                                #use average of all the traffic data from L and R 
                                df.iloc[i, df.columns.get_loc('Heavy Truck')] = (float(df.iloc[i]['Heavy Truck'])+float(df.iloc[i+1]['Heavy Truck']))/2
                                df.iloc[i, df.columns.get_loc('Medium Truck')] = (float(df.iloc[i]['Medium Truck'])+float(df.iloc[i+1]['Medium Truck']))/2
                                df.iloc[i, df.columns.get_loc('Small Truck')] = (float(df.iloc[i]['Small Truck'])+float(df.iloc[i+1]['Small Truck']))/2

                                df.iloc[i, df.columns.get_loc('Large Bus')] = (float(df.iloc[i]['Large Bus'])+float(df.iloc[i+1]['Large Bus']))/2
                                df.iloc[i, df.columns.get_loc('Medium Bus')] = (float(df.iloc[i]['Medium Bus'])+float(df.iloc[i+1]['Medium Bus']))/2
                                df.iloc[i, df.columns.get_loc('Micro Bus')] = (float(df.iloc[i]['Micro Bus'])+float(df.iloc[i+1]['Micro Bus']))/2

                                df.iloc[i, df.columns.get_loc('Utility')] = (float(df.iloc[i]['Utility'])+float(df.iloc[i+1]['Utility']))/2

                                df.iloc[i, df.columns.get_loc('Car')] = (float(df.iloc[i]['Car'])+float(df.iloc[i+1]['Car']))/2
                                df.iloc[i, df.columns.get_loc('Auto Rickshaw')] = (float(df.iloc[i]['Auto Rickshaw'])+float(df.iloc[i+1]['Auto Rickshaw']))/2
                                df.iloc[i, df.columns.get_loc('Motor Cycle')] = (float(df.iloc[i]['Motor Cycle'])+float(df.iloc[i+1]['Motor Cycle']))/2
                                df.iloc[i, df.columns.get_loc('Bi-Cycle')] = (float(df.iloc[i]['Bi-Cycle'])+float(df.iloc[i+1]['Bi-Cycle']))/2        
                                df.iloc[i, df.columns.get_loc('Cycle Rickshaw')] = (float(df.iloc[i]['Cycle Rickshaw'])+float(df.iloc[i+1]['Cycle Rickshaw']))/2
                                df.iloc[i, df.columns.get_loc('Cart')] = (float(df.iloc[i]['Cart'])+float(df.iloc[i+1]['Cart']))/2

                                df.iloc[i, df.columns.get_loc('Motorized Total')] = (float(df.iloc[i]['Motorized Total'])+float(df.iloc[i+1]['Motorized Total']))/2        
                                df.iloc[i, df.columns.get_loc('Non Motorized Total')] = (float(df.iloc[i]['Non Motorized Total'])+float(df.iloc[i+1]['Non Motorized Total']))/2        
                                df.iloc[i, df.columns.get_loc('Total Traffic')] = (float(df.iloc[i]['Total Traffic'])+float(df.iloc[i+1]['Total Traffic']))/2        

                                #remove the duplicate (the R row at position i+1) and log the link that is dropped
                                drop_indices.append(i+1)
                                logging.warning('%s link deleted due to duplicate',df.iloc[i+1]['Link no'] )

                                #rename the Link no (e.g. N1-1R is removed and N1-1L is renamed N1-1)
                                df.iloc[i, df.columns.get_loc('Link no')] = df.iloc[i]['Link no'].replace('L','')

    df.drop(labels = drop_indices, inplace = True)
    
    for i in range(len(df)):      
        #calculate the "economic value of traffic" according to formula
        df.iloc[i, df.columns.get_loc("Economic Traffic")] = float(df.iloc[i]['Heavy Truck'])*Heavy_value+float(df.iloc[i]['Medium Truck'])*Medium_value + float(df.iloc[i]['Small Truck'])*Small_value


    return(df)
    #question: do we also want to rename any single R or L link numbers? Or is this okay? I think it is more informative this way...?
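
The left/right handling above can be illustrated on a toy list of records in plain Python. This is a minimal sketch, not the script's implementation: the helper name `merge_left_right` and the sample rows are hypothetical, and the pairing test is stricter than the one in process_traffic (it requires an L suffix, an R suffix and an identical base name), which is an assumption about how link numbers are formed.

```python
def merge_left_right(rows, count_keys):
    """Average an 'L' row with the 'R' row that follows it and drop 'NS' rows.

    rows: list of dicts holding 'Link no' plus the count columns (toy layout).
    """
    merged = []
    skip_next = False
    for i, row in enumerate(rows):
        if skip_next:
            # this row is the R half of a pair that was already averaged
            skip_next = False
            continue
        if row[count_keys[0]] == 'NS':
            continue  # no survey data for this link
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        is_pair = (
            nxt is not None
            and nxt[count_keys[0]] != 'NS'
            and row['Link no'].endswith('L')
            and nxt['Link no'].endswith('R')
            and row['Link no'][:-1] == nxt['Link no'][:-1]
        )
        out = dict(row)
        for key in count_keys:
            if is_pair:
                out[key] = (float(row[key]) + float(nxt[key])) / 2
            else:
                out[key] = float(row[key])
        if is_pair:
            out['Link no'] = row['Link no'][:-1]  # N1-1L becomes N1-1
            skip_next = True
        merged.append(out)
    return merged


# Hypothetical sample rows, for illustration only.
sample = [
    {'Link no': 'N1-1L', 'Heavy Truck': '100', 'Medium Truck': '40'},
    {'Link no': 'N1-1R', 'Heavy Truck': '200', 'Medium Truck': '60'},
    {'Link no': 'N1-2L', 'Heavy Truck': 'NS', 'Medium Truck': 'NS'},
    {'Link no': 'N1-2R', 'Heavy Truck': '50', 'Medium Truck': '10'},
]
result = merge_left_right(sample, ['Heavy Truck', 'Medium Truck'])
print([r['Link no'] for r in result])  # ['N1-1', 'N1-2R']
```

A single R link whose L counterpart has no data keeps its own name, which matches the question raised in the comment above about unpaired L or R links.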

In [8]:
#STEP 4.3 - Load in functions
#function to return the 'average number of lanes over a segment of road'...weighted by the length for which the road
#has that many lanes
#input arguments: start chainage of road segment (km), end chainage of road segment (km), segment length (km), data frame of lane/width data for that road
#output arguments: 'average number of lanes over that segment of road'
def average_lanes (a,b,segment_length, df):
    weighted_lanes = 0.0 
    
    for i in range(len(df)):
        start = df.iloc[i]['startChainage']
        end = df.iloc[i]['endChainage']
        lanes = df.iloc[i]['nrLanes']
        if not((a > end) or (b < start)):
            
            if a <= start and b >= end:
                #entire part of the lane segment is on the road segment
                weighted_lanes += (end-start)*lanes

            elif a >= start and b>= end:
                #end part of the lane segment is on the road segment
                weighted_lanes += (end-a)*lanes

            elif (a < start) and (b < end):
                #beginning part of the lane segment is on the road segment
                weighted_lanes += (b-start)*lanes

            elif (a >= start) and (b<= end):
                #entire road segment is within a lane segment
                weighted_lanes += (b-a)*lanes

        #else no part of that lane segment is on the road segment
    if weighted_lanes == 0.0:
        return (-1) #no lane data found that matched, -1 is the indicator of this error

    #finally, return the number of lanes, weighted by the length over which the road has that many lanes
    return (weighted_lanes/segment_length)
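
The four overlap cases in average_lanes can also be written as one expression: the length shared by the road segment [a, b] and a lane segment [start, end] is max(0, min(b, end) - max(a, start)). The sketch below uses that form on plain tuples. The name `average_lanes_overlap` and the sample data are hypothetical, and the segment length is taken as b - a here rather than passed in, which is an assumption.

```python
def average_lanes_overlap(a, b, lane_segments):
    """Length-weighted average number of lanes between chainage a and b (km).

    lane_segments: iterable of (start, end, lanes) tuples.
    Returns -1 when no lane data overlaps the road segment.
    """
    segment_length = b - a
    if segment_length <= 0:
        return -1
    weighted_lanes = 0.0
    for start, end, lanes in lane_segments:
        # shared length of [a, b] and [start, end]; zero when they do not overlap
        overlap = max(0.0, min(b, end) - max(a, start))
        weighted_lanes += overlap * lanes
    if weighted_lanes == 0.0:
        return -1
    return weighted_lanes / segment_length


# Hypothetical lane data: 2 lanes for km 0-4, 4 lanes for km 4-8, 2 lanes for km 8-12.
lane_data = [(0.0, 4.0, 2), (4.0, 8.0, 4), (8.0, 12.0, 2)]
print(average_lanes_overlap(2.0, 10.0, lane_data))  # (2*2 + 4*4 + 2*2) / 8 = 3.0
```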

In [10]:
#STEP 5 - Get traffic data for all road segments
#MAIN BLOCK OF CODE, PART 1: Road Segments

#create the output data frame with corresponding columns for output
df_traffic_allroads = pd.DataFrame(columns =["Road name","Link no","Link Name","Start LRP","Start Offset","Start Chainage","End LRP","End Offset","End Chainage","Link Length", "Economic Traffic","No Lanes","Heavy Truck","Medium Truck","Small Truck","Large Bus","Medium Bus","Micro Bus","Utility","Car","Auto Rickshaw","Motor Cycle","Bi-Cycle","Cycle Rickshaw","Cart","Motorized Total","Non Motorized Total","Total Traffic"])

#get list of all the roads 
roadnames = pd.read_csv('_roadnames_list.csv') # need to have roadnames list in same folder path 
roadnames_list = list(roadnames.columns.unique())

#iterate through each road and append it to the main dataframe using defined function htm_import
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
for roadname in roadnames_list:
    df_traffic_allroads = pd.concat([df_traffic_allroads, htm_import(roadname)], ignore_index = True, sort = False)


#call function to process the data from htm: ie. remove duplicates, and find the criticality of each segment
df_traffic_allroads = process_traffic(df_traffic_allroads)

#next section of code (for loop) calls the function to calculate the average number of lanes in each road segment
#iterate through each road segment on the database
road_name_previous = 'Na' #initialise for first comparison
for i in range(len(df_traffic_allroads)):
    
    road_name_current = df_traffic_allroads.iloc[i]['Road name']
    #check if segment of road is on a different road
    if (road_name_current != road_name_previous):
        #open the lane/width file of the new road (the with-block closes the file afterwards)
        filename = "RMMS/{}.widths.processed.txt".format(road_name_current)
        with open(filename, "rb") as f:
            df_lanes = pd.read_csv(f, sep='\t', lineterminator='\n')
        df_lanes.columns = ['roadNo', 'roadId', 'startChainage', 'endChainage', 'width','nrLanes']

    a = df_traffic_allroads.iloc[i]['Start Chainage']
    b = df_traffic_allroads.iloc[i]['End Chainage']
    length = df_traffic_allroads.iloc[i]['Link Length']
    
    #calculate average value of lanes using function and update the data frame
    df_traffic_allroads.iloc[i, df_traffic_allroads.columns.get_loc('No Lanes')] = average_lanes(a,b,length,df_lanes)
    
    #pass into memory for next comparison
    road_name_previous = road_name_current

#convert the traffic of every mode into traffic density (traffic per lane) for that mode
traffic_columns = ["Heavy Truck","Medium Truck","Small Truck","Large Bus","Medium Bus","Micro Bus","Utility","Car","Auto Rickshaw","Motor Cycle","Bi-Cycle","Cycle Rickshaw","Cart","Motorized Total","Non Motorized Total","Total Traffic"]
df = df_traffic_allroads
for i in range(len(df)):
    lanes = df.iloc[i]['No Lanes']

    for column in traffic_columns:
        col = df.columns.get_loc(column)
        if lanes == -1:
            #no lane data matched this segment: mark as NA using numpy.nan
            df.iloc[i, col] = np.nan
        else:
            #convert from traffic to traffic density
            df.iloc[i, col] = float(df.iloc[i, col])/lanes

   








In [9]:
#STEP 6 - Get summary of traffic data for all roads
#MAIN BLOCK OF CODE, PART 2: Summary per Road
#this section of code aggregates the data from the road segments to the total road (for each road)

df1 = df #road segment data frame (input), indexed with i
df2 = pd.DataFrame(columns =["Road name","Economic Traffic","No Lanes","Heavy Truck","Medium Truck","Small Truck","Large Bus","Medium Bus","Micro Bus","Utility","Car","Auto Rickshaw","Motor Cycle","Bi-Cycle","Cycle Rickshaw","Cart","Motorized Total","Non Motorized Total","Total Traffic"], index = range(1000))
df2 = df2.fillna(0) #same data frame aggregated to road level (output), indexed with j
j = 0
segment_length = 0
total_length = 0

road_name_previous = 'Na' #initialise for first comparison
for i in range(len(df1)):

#iterating through segments
    segment_length = df1.iloc[i]['Link Length'] #record segment length each time

    road_name_current = df1.iloc[i]['Road name']
    #check if segment of road is on a different road i.e. the next road in the df
    if (road_name_current != road_name_previous):
        total_length = 0 #reset the accumulated length for the new road
        j = j+1 #update to next row in road_summary dataframe for next road
        #set the road name for the row in road_summary
        df2.iloc[j, df2.columns.get_loc('Road name')] = road_name_current
    else:
        total_length += segment_length
    #note: total_length is accumulated but not used below, so the summary holds sums weighted by segment length
    if type(df1.iloc[i]['Economic Traffic'] ) != str:
        #sum the data from this segment to the data already in the summary file for that roads row
        df2.iloc[j, df2.columns.get_loc('No Lanes')] += (df1.iloc[i]['No Lanes'])*segment_length      
        df2.iloc[j, df2.columns.get_loc('Economic Traffic')] += (df1.iloc[i]['Economic Traffic']) *segment_length

        df2.iloc[j, df2.columns.get_loc('Heavy Truck')] += (df1.iloc[i]['Heavy Truck'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Medium Truck')] += (df1.iloc[i]['Medium Truck'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Small Truck')] += (df1.iloc[i]['Small Truck'])*segment_length

        df2.iloc[j, df2.columns.get_loc('Large Bus')] += (df1.iloc[i]['Large Bus'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Medium Bus')] += (df1.iloc[i]['Medium Bus'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Micro Bus')] += (df1.iloc[i]['Micro Bus'])*segment_length

        df2.iloc[j, df2.columns.get_loc('Utility')] += (df1.iloc[i]['Utility'])*segment_length
                    
        df2.iloc[j, df2.columns.get_loc('Car')] += (df1.iloc[i]['Car'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Auto Rickshaw')] += (df1.iloc[i]['Auto Rickshaw'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Motor Cycle')] += (df1.iloc[i]['Motor Cycle'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Bi-Cycle')] += (df1.iloc[i]['Bi-Cycle'])*segment_length      
        df2.iloc[j, df2.columns.get_loc('Cycle Rickshaw')] += (df1.iloc[i]['Cycle Rickshaw'])*segment_length
        df2.iloc[j, df2.columns.get_loc('Cart')] += (df1.iloc[i]['Cart'])*segment_length
                    
        df2.iloc[j, df2.columns.get_loc('Motorized Total')] += (df1.iloc[i]['Motorized Total'])*segment_length    
        df2.iloc[j, df2.columns.get_loc('Non Motorized Total')] += (df1.iloc[i]['Non Motorized Total'])*segment_length      
        df2.iloc[j, df2.columns.get_loc('Total Traffic')] += (df1.iloc[i]['Total Traffic'])*segment_length       

    
    #pass into memory for next comparison
    road_name_previous = road_name_current 
    
#save output file
df2.to_csv('summary_data_traffic_per_road.csv',sep= ',', header=True)            
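
The loop above accumulates value × segment length per road. Dividing such a sum by the total length of the road would give a length-weighted average per road. The sketch below shows that calculation on plain dictionaries. The helper name `length_weighted_average` and the sample segments are hypothetical, and the script as written stores the weighted sums without this division.

```python
from collections import defaultdict


def length_weighted_average(segments, value_key):
    """Return {road name: sum(value * length) / sum(length)} for the given column."""
    weighted_sum = defaultdict(float)
    total_length = defaultdict(float)
    for seg in segments:
        road = seg['Road name']
        weighted_sum[road] += seg[value_key] * seg['Link Length']
        total_length[road] += seg['Link Length']
    # roads with zero total length are left out to avoid dividing by zero
    return {
        road: weighted_sum[road] / total_length[road]
        for road in weighted_sum
        if total_length[road] > 0
    }


# Hypothetical segments, for illustration only.
segments = [
    {'Road name': 'N1', 'Link Length': 2.0, 'Total Traffic': 100.0},
    {'Road name': 'N1', 'Link Length': 6.0, 'Total Traffic': 200.0},
    {'Road name': 'N2', 'Link Length': 5.0, 'Total Traffic': 40.0},
]
averages = length_weighted_average(segments, 'Total Traffic')
print(averages)  # N1: (2*100 + 6*200) / 8 = 175.0, N2: 40.0
```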

In [22]:
#STEP 7 - Renaming / framing dataframes
df_traffic = df

#create an empty float column for summing the average delay time
df_all_bmms["average_delay"] = np.nan

#rename some columns in the traffic data for panda operations
df_traffic = df_traffic.rename(columns={'Total Traffic': 'Total_Traffic','Economic Traffic': 'Economic_Traffic','End Chainage': 'End_Chainage', 'No Lanes': 'No_Lanes','Start Chainage': 'Start_Chainage','Link no': 'Link_No','Link Name':'Link_Name',"Link Length": "Link_Length","Road name":"Road_Name"})

In [12]:
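#STEP 8 - fix some of the missing data for segments (90 segments did not align --> see report)
#if a segment does not start where the preceding segment on the same road ends, its start chainage is moved to that end chainage
start_col = df_traffic.columns.get_loc('Start_Chainage')
for row in range(1, len(df_traffic)):
    if df_traffic.Road_Name.iloc[row-1] == df_traffic.Road_Name.iloc[row]:
        preceding_segment_end_chainage = df_traffic.End_Chainage.iloc[row-1]
        if preceding_segment_end_chainage != df_traffic.Start_Chainage.iloc[row]:
            #assign through df_traffic.iloc directly to avoid chained assignment
            df_traffic.iloc[row, start_col] = preceding_segment_end_chainage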

In [23]:
#STEP 9 - CHECK THE DATA CLEANING SCRIPT FOR MISSING SEGMENTS --> no non-aligned segments should be present
df_traffic["missing_segment"] = ""
missing_col = df_traffic.columns.get_loc('missing_segment')
for row in range(1, len(df_traffic)):
    if df_traffic.Road_Name.iloc[row-1] == df_traffic.Road_Name.iloc[row]:
        if df_traffic.End_Chainage.iloc[row-1] != df_traffic.Start_Chainage.iloc[row]:
            #assign through df_traffic.iloc directly to avoid chained assignment
            df_traffic.iloc[row, missing_col] = True
df_traffic[df_traffic["missing_segment"] == True]

Unnamed: 0,Road_Name,Link_No,Link_Name,Start LRP,Start Offset,Start_Chainage,End LRP,End Offset,End_Chainage,Link_Length,...,Car,Auto Rickshaw,Motor Cycle,Bi-Cycle,Cycle Rickshaw,Cart,Motorized Total,Non Motorized Total,Total_Traffic,missing_segment
107,N102,N102-4,Companiganj (Int with Z1205) - Mirpur (Int wit...,LRP024,1242,25.473,LRP038,408,38.866,13.393,...,116.15,1269.61,242.86,62.3492,254.928,0,3040.02,317.277,3357.3,True
117,N104,N104-4,Selonia(Int.with Z1443)-Daganbhuiyan (Int.with...,LRP008,485,8.315,LRP014,994,14.809,6.494,...,87,2341.5,298.5,129,664,0,3899.5,793,4692.5,True
127,N105,N105-4,Purbachal Road- Ulukhola (Int.with Z3010),LRP020d,0,21.221,LRP028,895,28.610,7.389,...,394.054,560.694,304.853,49.9919,177.912,0,4308.62,227.904,4536.52,True
133,N106,N106-4,Munshihat (Int.with Z1629)-Raozan (Int.with Z1...,LRP024,828,24.297,LRP025,820,25.278,0.981,...,148,2740.5,539,288,471.5,0,4454,759.5,5213.5,True
162,N2,N2-4,Bhulta (Int.with R202)-Bhulta (Int.with R203),LRP011,915,11.111,LRP011,1781,11.977,0.866,...,561.452,1560.48,409.749,149.797,1221.62,0,6764.86,1371.42,8136.28,True
195,N207,N207-4,Moulvibazar (Int.with N208)-Sherpur,LRP043,12,42.917,LRPE,0,67.985,25.068,...,161.956,1088.38,195.942,59.1602,62.9364,0,2113.82,122.097,2235.92,True
198,N208,N208-4,Daudabad (Int.with Z2832)-Royal City Chottor(i...,LRP052,385,51.050,LRP054,677,53.342,2.292,...,405.5,1116.5,291,44,36.5,0,2664,80.5,2744.5,True
245,N302,N302-4,Ashulia(Int.with N511)-Zerabo (Int.with Z3007),LRP008,757,8.730,LRP008,3397,11.370,2.640,...,2415.05,1089.72,883.845,32.3899,369.156,0,11557.4,401.546,11959,True
252,N4,N4-4,Kaliakoir (Int.with R315)-Gorai(Int.with Z4011),LRP022,148,20.864,LRP027,492,26.218,5.354,...,1242,1173,660,68,649,0,13120,717,13837,True
268,N401,N401-4,Mymensingh engineering College(Int.withN309)-M...,LRP040,18,40.043,LRPE,0,46.980,6.937,...,102.744,631.333,210.444,97.3361,348.788,9.0126,1705.63,455.136,2160.77,True


In [28]:
#STEP 10 - RUN THIS CELL TO LINK THE ROAD DATA WITH THE TRAFFIC (PER SEGMENT DATA) USING THE LINK NUMBER (Link_No)
df_selected = df_all_bmms
df_selected["Road_segment_no"] =""
df_selected["Road_segment_label"] =""

for row_number in range(len(df_selected)):
    chainage = df_selected.chainage.iloc[row_number,]
    road_number = df_selected.road.iloc[row_number,]
    
    all_segments_in_road_df = df_traffic[df_traffic["Road_Name"] == road_number ]
    
    for segments in range(len(all_segments_in_road_df)):
        segment_start_chainage = all_segments_in_road_df.Start_Chainage.iloc[segments,]
        segment_end_chainage = all_segments_in_road_df.End_Chainage.iloc[segments,]
        segment_no = all_segments_in_road_df.Link_No.iloc[segments,]
        segment_label = all_segments_in_road_df.Link_Name.iloc[segments,]
                    
        #a bridge belongs to a segment if its chainage lies in [start chainage, end chainage)
        if segment_start_chainage <= chainage < segment_end_chainage:
            df_selected.loc[row_number, "Road_segment_no"] = segment_no
            df_selected.loc[row_number, "Road_segment_label"] = segment_label

df_BMSS_traffic_link = df_selected
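
The nested loop above compares every bridge with every segment of its road. When the segments of a road do not overlap, the same lookup can be done with a binary search on the sorted start chainages. The sketch below illustrates this in plain Python. The helper names `build_lookup` and `find_segment` are hypothetical, the non-overlap of segments is an assumption, and the sample chainages are illustrative values for road N1.

```python
from bisect import bisect_right


def build_lookup(segments):
    """segments: iterable of (road, start_chainage, end_chainage, link_no)."""
    lookup = {}
    for road, start, end, link_no in segments:
        lookup.setdefault(road, []).append((start, end, link_no))
    for road in lookup:
        lookup[road].sort()  # sort each road's segments by start chainage
    return lookup


def find_segment(lookup, road, chainage):
    """Return the link number whose [start, end) contains chainage, else ''."""
    candidates = lookup.get(road, [])
    starts = [start for start, _, _ in candidates]
    pos = bisect_right(starts, chainage) - 1  # last segment starting at or before chainage
    if pos >= 0:
        start, end, link_no = candidates[pos]
        if start <= chainage < end:
            return link_no
    return ''  # same "no segment" marker as the empty string used in Step 10


# Illustrative segments and bridge chainages.
lookup = build_lookup([
    ('N1', 0.0, 0.822, 'N1-1'),
    ('N1', 0.822, 4.175, 'N1-2R'),
    ('N1', 4.175, 7.181, 'N1-3'),
])
print(find_segment(lookup, 'N1', 1.8))    # N1-2R
print(find_segment(lookup, 'N1', 4.925))  # N1-3
print(find_segment(lookup, 'N1', 8.976))  # '' (beyond the last segment in this toy data)
```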

In [29]:
#STEP 11 - CHECK HOW MUCH MISSING DATA IS THERE
#3305 bridges are missing segment data at this point
number_of_missing_bridges_without_segment_data = (df_BMSS_traffic_link['Road_segment_no'].values == '').sum()
Missing_segment_data_brigdes_df = df_BMSS_traffic_link[df_BMSS_traffic_link['Road_segment_no'] == '']

#The bridges without segment data are mostly on Z roads, for which traffic data is often missing as well, so the impact on the system results should be limited
list_of_road_with_missing_segment_data = Missing_segment_data_brigdes_df.road.unique()

#drop the missing segments
df_BMSS_traffic_link = df_BMSS_traffic_link[df_BMSS_traffic_link["Road_segment_no"] != '']

In [30]:
#STEP 12 - Determine vulnerability of bridges based on their scenario likelihood --> SEE REPORT FOR EXPLANATION
scenario = {'Scenario': [1,2,3,4,5,6,7,8], 'A': [0,0,0,0,0,0,0.05,0.10], 
            'B': [0,0,0,0,0.05,0.10,0.10,0.20],'C':[0,0,0.05,0.10,0.10,0.20,0.20,0.40],'D':[0,0.05,0.10,0.20,0.20,0.40,0.40,0.80],
            'Scenario_likelihood': [0.222,0.194,0.167,0.139,0.111,0.083,0.056,0.028]}

scenario_df = pd.DataFrame(data=scenario)
scenario_df = scenario_df.drop(scenario_df.index[0])

scenario_df["A_overall"] = scenario_df.Scenario_likelihood *scenario_df.A
scenario_df["B_overall"] = scenario_df.Scenario_likelihood *scenario_df.B
scenario_df["C_overall"] = scenario_df.Scenario_likelihood *scenario_df.C
scenario_df["D_overall"] = scenario_df.Scenario_likelihood *scenario_df.D

sum_row = {col: scenario_df[col].sum() for col in scenario_df}
scenario_df_sum = pd.DataFrame(sum_row, index=["Total"])
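#pd.concat is used because DataFrame.append was removed in pandas 2.0
scenario_df = pd.concat([scenario_df, scenario_df_sum])
#---------> the "Total" row of A_overall is the vulnerability of a bridge in condition A (likewise for B, C and D)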
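
The totals in the scenario table are expected values: for each condition category, the failure probability in every scenario is multiplied by the likelihood of that scenario and the products are summed. The sketch below reproduces this arithmetic in plain Python with the numbers from Step 12. The function name `expected_vulnerability` is hypothetical. Scenario 1 is kept in the lists because all of its failure probabilities are zero, so it does not change the totals.

```python
# Scenario likelihoods and failure probabilities per condition, copied from Step 12.
likelihood = [0.222, 0.194, 0.167, 0.139, 0.111, 0.083, 0.056, 0.028]
failure = {
    'A': [0, 0, 0, 0, 0, 0, 0.05, 0.10],
    'B': [0, 0, 0, 0, 0.05, 0.10, 0.10, 0.20],
    'C': [0, 0, 0.05, 0.10, 0.10, 0.20, 0.20, 0.40],
    'D': [0, 0.05, 0.10, 0.20, 0.20, 0.40, 0.40, 0.80],
}


def expected_vulnerability(probabilities, likelihoods):
    """Sum of failure probability times scenario likelihood over all scenarios."""
    return sum(p * q for p, q in zip(probabilities, likelihoods))


totals = {cat: expected_vulnerability(probs, likelihood) for cat, probs in failure.items()}
for cat, value in totals.items():
    print(cat, round(value, 5))
# approximately A 0.0056, B 0.02505, C 0.07235, D 0.1544
```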

In [35]:
#STEP 13 - RUN THIS LINE OF CODE FOR CALCULATING THE VULNERABILITY OF EACH ROAD SEGMENT IN THE TRAFFIC DATA FILE
#add columns to df_traffic for output of vulnerability data
df_traffic["Total_A"] = ""
df_traffic["Total_B"] = ""
df_traffic["Total_C"] = ""
df_traffic["Total_D"] = ""
df_traffic["segment_vulnerability"] = ""
df_traffic["segment_vulnerability_length"] = ""

#iterate through all segments in the traffic file to get the vulnerability per segment
for i in range(len(df_traffic)):
    segment_name = df_traffic.iloc[i]['Link_No']
    
    #make small dataframe to get all bridges on that segment
    df_segments = df_BMSS_traffic_link[df_BMSS_traffic_link["Road_segment_no"] == segment_name]
    Bridge_count = df_segments.groupby(["condition"]).count()
    
    #create series object to count the bridges
    Bridge_count = Bridge_count["road"]
    categories = Bridge_count.index.values.tolist()
    
    #count the number of bridges per condition category (0 if a category does not occur on this segment)
    total_A_Bridges = Bridge_count.get('A', 0)
    total_B_Bridges = Bridge_count.get('B', 0)
    total_C_Bridges = Bridge_count.get('C', 0)
    total_D_Bridges = Bridge_count.get('D', 0)
    
    
    #determine segment vulnerability based on scenario likelihood and number of bridges on that segment
    segment_vulnerability = total_A_Bridges * scenario_df.iloc[7]['A_overall'] + total_B_Bridges * scenario_df.iloc[7]['B_overall'] + total_C_Bridges * scenario_df.iloc[7]['C_overall'] + total_D_Bridges * scenario_df.iloc[7]['D_overall'] 
        
    #store total bridges in segment traffic file    
    df_traffic.iloc[i, df_traffic.columns.get_loc('Total_A')] = total_A_Bridges
    df_traffic.iloc[i, df_traffic.columns.get_loc('Total_B')] = total_B_Bridges
    df_traffic.iloc[i, df_traffic.columns.get_loc('Total_C')] = total_C_Bridges
    df_traffic.iloc[i, df_traffic.columns.get_loc('Total_D')] = total_D_Bridges
    
    #store vulnerability in segment traffic file    
    df_traffic.iloc[i, df_traffic.columns.get_loc('segment_vulnerability')] = segment_vulnerability
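
The segment vulnerability is a weighted count: the number of bridges in each condition category times the per-bridge vulnerability of that category. The sketch below shows the same sum in plain Python. The name `segment_vulnerability_of` is hypothetical, and the weights are the "Total" values worked out by hand from the scenario table in Step 12, so they are an assumption to be checked against scenario_df.

```python
from collections import Counter

# Per-bridge vulnerability per condition category, worked out by hand from the
# scenario table in Step 12 (assumption: equal to the "Total" row of scenario_df).
WEIGHTS = {'A': 0.0056, 'B': 0.02505, 'C': 0.07235, 'D': 0.1544}


def segment_vulnerability_of(conditions, weights=WEIGHTS):
    """conditions: list with the condition letter of every bridge on one segment."""
    counts = Counter(conditions)
    # categories that do not occur on the segment count as zero
    return sum(counts.get(cat, 0) * weight for cat, weight in weights.items())


# A segment with 15 bridges, all in condition A.
print(round(segment_vulnerability_of(['A'] * 15), 4))  # 0.084
```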

In [36]:
#STEP 14 - NORMALIZING vulnerability & Criticality
max_vul = df_traffic.segment_vulnerability.max()
min_vul = df_traffic.segment_vulnerability.min()
max_cri = df_traffic.Economic_Traffic.max()
min_cri = df_traffic.Economic_Traffic.min()
#min-max scaling of the values to a scale from 0 to 1: subtract the minimum and divide by the range (max - min)
df_traffic['segment_vulnerability_normalized'] = (df_traffic.segment_vulnerability - min_vul) / (max_vul - min_vul)
df_traffic['Economic_Traffic_normalized'] = (df_traffic.Economic_Traffic - min_cri) / (max_cri - min_cri)

df_traffic

Unnamed: 0,Road_Name,Link_No,Link_Name,Start LRP,Start Offset,Start_Chainage,End LRP,End Offset,End_Chainage,Link_Length,...,Total_Traffic,missing_segment,Total_A,Total_B,Total_C,Total_D,segment_vulnerability,segment_vulnerability_length,segment_vulnerability_normalized,Economic_Traffic_normalized
0,N1,N1-1,Jatrabari - Int.with Z1101 (Left) (Left),LRPS,0,0.000,LRPS,822,0.822,0.822,...,3058.12,,0,0,0,0,0,,0,0.71328
2,N1,N1-2R,Int.with Z1101 - Signboard (Left) R111 (Right),LRPS,822,0.822,LRPS,4175,4.175,3.353,...,3719.64,,1,0,0,0,0.0056,,0.0135135,0.671287
3,N1,N1-3,Signboard - Shimrail (Left)R110 (Left),LRPS,4175,4.175,LRPS,7181,7.181,3.006,...,3233.96,,1,0,0,0,0.0056,,0.0135135,0.388635
5,N1,N1-4,Shimrail - Katchpur (Left)N2 (Left),LRPS,7181,7.181,LRP009,260,8.763,1.582,...,3036.55,,0,0,0,0,0,,0,0.386361
7,N1,N1-5,Katchpur - Madanpur (Left)N105 (Left),LRP009,260,8.763,LRP012,439,11.936,3.173,...,5509.78,,4,0,0,0,0.0224,,0.0540541,0.668758
9,N1,N1-6,Madanpur - Langalband (Left)Z1061 (Left),LRP012,439,11.936,LRP013,3411,15.935,3.999,...,5926.6,,7,0,0,0,0.0392,,0.0945946,0.668758
11,N1,N1-7,Langalband - Mograpara Chowrasta (Left)Z1089 (...,LRP013,3411,15.935,LRP013,7520,20.044,4.109,...,6058.5,,4,0,0,0,0.0224,,0.0540541,0.668758
13,N1,N1-8,Mograpara(Int.with Z1089)-Meghna Bridge West E...,LRP013,7520,20.044,LRP022,1935,23.564,3.520,...,3441.84,,7,0,0,0,0.0392,,0.0945946,0.589131
15,N1,N1-9,Meghna Bridge Satrt-Bhaberchar (Left) z1063 (L...,LRP022,1935,23.564,LRP031,162,30.936,7.372,...,3671.63,,15,0,0,0,0.084,,0.202703,0.589131
17,N1,N1-10,Bhaberchar(Int.with Z1063)-Daudkandi Bridge (L...,LRP031,162,30.936,LRP033,3664,36.271,5.335,...,3671.62,,18,0,0,0,0.1008,,0.243243,0.589131
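
Min-max scaling maps the smallest value to 0 and the largest to 1 by dividing by the range (max - min). Dividing by the maximum alone gives the same result only when the minimum is zero. The sketch below shows the scaling on a plain list. The name `min_max_normalize` and the sample values are hypothetical, missing values are represented by None, and at least one value is assumed to be present.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; None entries (missing data) are passed through."""
    present = [v for v in values if v is not None]
    lowest, highest = min(present), max(present)
    if highest == lowest:
        # all values are equal: the range is zero, so map everything to 0.0
        return [0.0 if v is not None else None for v in values]
    return [
        (v - lowest) / (highest - lowest) if v is not None else None
        for v in values
    ]


# Hypothetical values, for illustration only.
print(min_max_normalize([10.0, 25.0, None, 40.0]))  # [0.0, 0.5, None, 1.0]
```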


In [37]:
#STEP 15 - Save the dataframes to CSV files to use them for the plotting scripts
df_traffic.to_csv('Traffic_segment_data.csv', sep=',', encoding = 'UTF-8')
df_BMSS_traffic_link.to_csv('Bridge_data_with_link_to_traffic.csv', sep=',', encoding = 'UTF-8')