# Generate OD data from e-scooter positions

In this notebook we process the e-scooter position data collected with the notebook "1b_collect-escooter-data.ipynb" to generate Origin-Destination (OD) data, which we then use to define a trips data set.

## Setup

In [None]:
%run -i path.py
%run -i setup.py

## Functions

In [None]:
%run -i functions.py

In [None]:
# create a single dataframe with the collected data

start = 500
end = 555

start_day = '2021-05-26'
end_day = '2021-10-28'

frames = []

for i in range(start,end):
    
    print(i+1,'/',end)
    
    # load the data of TimeSlot i
    filename = f"TimeSlot{i}_Start{start_day}_End{end_day}.csv"
    frame = pd.read_csv(PATH['data_api'] + placeid + "/" + filename)

    # drop the overlapping data (rows with a duplicated 'id')
    frame = frame.drop_duplicates(['id'],ignore_index=True)
    
    # append to a list of df
    frames.append(frame)
    
    clear_output(wait=True)

In [None]:
# gdf of the edges of Turin street network
gdf_edges = gpd.read_file(PATH['data']+'graph_shapefile/Turin/edges.shp')

In [None]:
# find the unique values of 'code' across all the dataframes and collect them in a list

unique_code = find_unique_code(frames)

# remove the data in which 'code' refers to more than one e-scooter

code_list, doubles = remove_doubles(frames,unique_code)
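
The helpers find_unique_code and remove_doubles live in functions.py and are not shown here. As a rough illustration only, one plausible reading of "a code that refers to more than one e-scooter" is a 'code' associated with more than one 'id'. The sketch below assumes that reading, uses plain lists of records instead of dataframes, and the name split_codes is hypothetical:

```python
from collections import defaultdict

def split_codes(frames):
    """Split codes into unambiguous ones and 'doubles'.

    frames: list of snapshots, each a list of {'id': ..., 'code': ...}
    records (a hypothetical stand-in for the notebook's dataframes).
    """
    ids_per_code = defaultdict(set)
    for frame in frames:
        for record in frame:
            ids_per_code[record['code']].add(record['id'])

    # a code is kept only if it always points to the same vehicle id
    unique = sorted(c for c, ids in ids_per_code.items() if len(ids) == 1)
    doubles = sorted(c for c, ids in ids_per_code.items() if len(ids) > 1)
    return unique, doubles

# toy data: code 'X2' is attached to two different ids
toy_frames = [
    [{'id': 'a', 'code': 'X1'}, {'id': 'b', 'code': 'X2'}],
    [{'id': 'a', 'code': 'X1'}, {'id': 'c', 'code': 'X2'}],
]
toy_unique, toy_doubles = split_codes(toy_frames)
```

The actual criterion used by remove_doubles may differ; check functions.py before relying on this interpretation.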

In [None]:
print("Number of vehicles with unique code: ",len(code_list))
print("Number of codes that we delete because they are not unique: ",len(doubles))

The function OD_data(frames,code,d_T,t_T) looks up the positions of the vehicle identified by "code" in the list of dataframes "frames" and identifies the movements made by that vehicle. Two thresholds define a movement:
* "d_T" -> distance threshold (in meters): if the position of a vehicle changes by more than d_T meters from frames[i-1] to frames[i], we consider it a movement.
* "t_T" -> time threshold (in minutes): we do not consider a change in position a movement if the time between the detections in frames[i-1] and frames[i] is greater than t_T minutes.

It returns the coordinates of the Origin (O_lat and O_lon), the detection time at the Origin (O_time), the battery level at the Origin (O_battery), the corresponding values for the Destination (D_lat, D_lon, D_time and D_battery), and the codes that identify the vehicle that made these movements (codes).
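
To make the two thresholds concrete, here is a minimal sketch of the rule for a single vehicle. It is not the OD_data implementation from functions.py: the names od_sketch and haversine_m, the record keys ('lat', 'lon', 'time', 'battery') and the toy observations are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371008.8  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def od_sketch(observations, d_T, t_T):
    """Return the movements found in the time-ordered observations of ONE vehicle.

    A pair of consecutive observations is a movement if the position changed
    by more than d_T meters and the two detections are at most t_T minutes apart.
    """
    trips = []
    for prev, curr in zip(observations, observations[1:]):
        dist = haversine_m(prev['lat'], prev['lon'], curr['lat'], curr['lon'])
        gap_min = (curr['time'] - prev['time']).total_seconds() / 60
        if dist > d_T and gap_min <= t_T:
            trips.append({
                'O_lat': prev['lat'], 'O_lon': prev['lon'],
                'O_time': prev['time'], 'O_battery': prev['battery'],
                'D_lat': curr['lat'], 'D_lon': curr['lon'],
                'D_time': curr['time'], 'D_battery': curr['battery'],
            })
    return trips

t0 = datetime(2021, 5, 26, 8, 0)
toy_obs = [
    {'lat': 45.07, 'lon': 7.68, 'time': t0, 'battery': 80},
    # same position: not a movement
    {'lat': 45.07, 'lon': 7.68, 'time': t0 + timedelta(minutes=10), 'battery': 80},
    # about 1.1 km further north, 10 minutes later: a movement
    {'lat': 45.08, 'lon': 7.68, 'time': t0 + timedelta(minutes=20), 'battery': 72},
    # position changed, but 180 minutes later (> t_T): discarded
    {'lat': 45.09, 'lon': 7.68, 'time': t0 + timedelta(minutes=200), 'battery': 70},
]
toy_trips = od_sketch(toy_obs, d_T=100, t_T=90)
```

With d_T = 100 and t_T = 90, only the third observation closes a movement; the last one is dropped because the gap between detections exceeds t_T.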

In [None]:
# find OD_data for TimeSlot from i = start to i = end

d_T = 100 # threshold (in meters)
t_T = 90 # threshold (in minutes)

O_lat_tot = []
O_lon_tot = []
O_time_tot = []
O_battery_tot = []
D_lat_tot = []
D_lon_tot = []
D_time_tot = []
D_battery_tot = []
codes_tot = []

for j,code in enumerate(code_list):
    
    print(j+1,'/',len(code_list))
    
    O_lat,O_lon,O_time,O_battery,D_lat,D_lon,D_time,D_battery,codes = OD_data(frames,code,d_T ,t_T)
    
    # extend the running lists in place with the movements of this vehicle
    O_lat_tot += O_lat
    O_lon_tot += O_lon
    O_time_tot += O_time
    O_battery_tot += O_battery
    D_lat_tot += D_lat
    D_lon_tot += D_lon
    D_time_tot += D_time
    D_battery_tot += D_battery
    codes_tot += codes
    
    clear_output(wait=True)

In [None]:
data = {
    'O_lat' : O_lat_tot,
    'O_lon' : O_lon_tot,
    'D_lat' : D_lat_tot,
    'D_lon' : D_lon_tot,
    'O_time' : O_time_tot,
    'D_time' : D_time_tot,
    'O_battery' : O_battery_tot,
    'D_battery' : D_battery_tot,
    'code' : codes_tot
}

In [None]:
# build a dataframe from the OD data
OD_matrix0_end = pd.DataFrame(data, columns = list(data.keys()))

# sort the dataframe by O_time and renumber the rows
OD_matrix0_end = OD_matrix0_end.sort_values(by='O_time').reset_index(drop=True)

# add a column with O_battery - D_battery: we use this value to remove the movements made by the company (the battery level does not decrease)
OD_matrix0_end['batt_diff'] = OD_matrix0_end['O_battery']-OD_matrix0_end['D_battery']
OD_matrix0_end

In [None]:
# remove movements made by the company
# drop the rows whose destination battery level is > 98 (the battery did not decrease after the trip)
a = OD_matrix0_end[OD_matrix0_end['D_battery'] < 99]
# keep only the movements in which the battery level decreased
clean_OD = a[a['batt_diff']>0].reset_index(drop=True)
clean_OD

In [None]:
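# compute the great-circle (haversine) distance in meters between Origin and Destination
# (the column keeps the name 'euclid_distance', although the distance is not planar Euclidean)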
distance = list()

for i in range(len(clean_OD)):
    
    print(i+1,'/',len(clean_OD))

    distance_meters = haversine((clean_OD.iloc[i]['D_lat'],clean_OD.iloc[i]['D_lon']), (clean_OD.iloc[i]['O_lat'], clean_OD.iloc[i]['O_lon']), unit="m")
    
    distance.append(distance_meters)
    
    clear_output(wait=True)
    
clean_OD["euclid_distance"] = distance
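
The haversine formula gives a great-circle distance on a sphere rather than a planar Euclidean one. As a minimal standard-library sketch of what the call above computes, applied column-wise with zip instead of row by row: the helper haversine_m, the Earth radius constant and the sample coordinates are illustrative assumptions, not part of functions.py or of the haversine package.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)

def haversine_m(p1, p2):
    """Great-circle distance in meters between two (lat, lon) tuples in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# toy origin/destination columns (lat, lon)
origins = [(45.0, 7.68), (45.07, 7.68)]
destinations = [(46.0, 7.68), (45.07, 7.68)]

# one distance per OD pair, in the same argument order as the cell above
distances = [haversine_m(d, o) for d, o in zip(destinations, origins)]
```

One degree of latitude comes out at roughly 111.2 km and an unchanged position at 0 m, which is a quick sanity check on any distance column.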

In [None]:
# export the OD data
clean_OD.to_csv(PATH['data']+ placeid + "/" + "OD_data.csv")