# Drive Share Example Analysis

This is an outline for a combined documentation / example analysis of the driveshare.me project by Eric Hong.

## Downloading the Data

Data can be obtained as a JSON file from driveshare.me:
select `Download all curated paths` in the bottom right corner of the page.


This file contains an array of segment objects.  Each segment object contains its name, a short description, its boundaries in Encoded Polyline format, the number of paths recorded in that segment, and an array of path objects.  The path objects contain the path in Encoded Polyline format and a vehicle object with the information of the vehicle that drove that path.
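The segment boundaries and paths in this file are Encoded Polyline strings, which can be decoded without extra packages. Below is a minimal sketch of Google's Encoded Polyline Algorithm in Julia, assuming the default precision of 5 decimal places. `decode_polyline` is a hypothetical helper name, not a function from the DriveShare package; it returns `(latitude, longitude)` pairs in degrees.

```julia
"""
    decode_polyline(s; precision=5)

Hypothetical helper: decode a string in Google's Encoded Polyline format into a
vector of (latitude, longitude) tuples in degrees. Assumes 1e-5 precision unless
`precision` is overridden.
"""
function decode_polyline(s::AbstractString; precision::Int = 5)
    bytes = codeunits(s)
    scale = 10.0^precision
    coords = Tuple{Float64,Float64}[]
    idx, lat, lng = 1, 0, 0
    while idx <= length(bytes)
        delta = zeros(Int, 2)                     # latitude and longitude offsets
        for k in 1:2
            result, shift = 0, 0
            while true
                chunk = Int(bytes[idx]) - 63      # each character carries 5 data bits + a continuation bit
                idx += 1
                result |= (chunk & 31) << shift
                shift += 5
                chunk < 32 && break               # continuation bit (0x20) clear: last chunk
            end
            # The lowest bit marks a negative value.
            delta[k] = isodd(result) ? ~(result >> 1) : (result >> 1)
        end
        lat += delta[1]                           # each point is stored relative to the previous one
        lng += delta[2]
        push!(coords, (lat / scale, lng / scale))
    end
    return coords
end
```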

## Loading Data into Julia

In [None]:
using DriveShare

anon_paths = get_anonymized_paths()
total_duration_s = 0
total_distance_m = 0
for path_data in anon_paths
    total_duration_s += path_data["duration_s"]
    total_distance_m += path_data["distance_m"]
end
println("total duration (seconds): ", total_duration_s)
println("total distance (meters): ", total_distance_m)
#=
Total distance calculated is less than what is shown on driveshare.me
because get_anonymized_paths() only returns trip objects that contain paths.
As explained on https://developer.automatic.com/api-reference/: 
"In cases where no GPS signal was available during the trip, path may be returned as null."
=#
# original path for picture below: gmpwFlphmV}ClA{@ZyEjBWLYP[TYXYXW\OXQ\Q`@Ob@Qh@Mf@w@`D[lAOj@Of@O`@Qb@a@|@GL{AjCaClEUb@c@v@a@x@S\eAjBu@vAILgAnBsAxBMROPKJOLQLQJSJYL]JmFvAqD`AaAXcAXyA\w@RqAZUFWHKJGFIJY`@U^GLu@nBIZWx@IT

![Map of the original path](json.png)

In [None]:
high_level = get_highlevel_info() # DataFrame
println(high_level)
num_trips = size(high_level, 1)
println("There are a total of ", num_trips, " trips.")
# The start and end GPS points in `high_level` are rounded to two decimal places.
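To get a feel for how coarse that rounding is, the ground size of one 0.01° step can be estimated on a spherical Earth (mean radius 6,371 km, an approximation): roughly 1.1 km north-south, shrinking by cos(latitude) east-west, so rounding moves a point by at most about half of that in each direction. `rounding_cell_size_m` is a hypothetical helper for illustration only.

```julia
const EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius; spherical-Earth approximation

"""
    rounding_cell_size_m(lat_deg; digits=2)

Hypothetical helper: approximate north-south and east-west extent, in meters,
of one rounding step of 10^-digits degrees at latitude `lat_deg`.
"""
function rounding_cell_size_m(lat_deg::Real; digits::Int = 2)
    step_rad = deg2rad(10.0^(-digits))
    north_south = EARTH_RADIUS_M * step_rad                 # arc length along a meridian
    east_west = EARTH_RADIUS_M * step_rad * cosd(lat_deg)   # parallels shrink with cos(latitude)
    return (north_south, east_west)
end
```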

## Basic Data Analysis

In [None]:
# - number of trips                check
# - number of users                privacy?
# - total trip duration            check
# - types of vehicles              ? 
using Plots
plotlyjs()

utmx = [0.0,93.67476114335459,129.0068562407243,258.310421025474,272.85895561617616,289.04054244748704,306.85518177101807,324.2245617640426,341.5939422175931,358.51807028617145,370.6916806547999,384.49835693583094,398.8989497196487,412.5573579881509,428.1458045972207,441.3590147055917,494.50905559419397,520.6387710826328,535.4852285669396,549.7377730731774,563.0994243327007,577.797204302155,604.668793291095,610.9041802591173,679.4929802047665,777.6259734020132,794.4021856458403,821.4222109144029,847.700123395586,862.546280790132,914.953379365316,956.0776008604846,963.3522073698321,1017.3926200402618,1079.1528782167302,1089.3967643152835,1100.3827655445114,1108.3995123494976,1118.7915016088978,1130.2226079700063,1141.356706272697,1153.5299202143833,1169.11748975463,1186.4862818731617,1323.2093238184925,1425.4916522040094,1463.6434472231851,1502.8343336881364,1554.049436078264,1586.1145946147606,1632.8761918741343,1645.4944206642267,1659.448759970311,1667.4654366582604,1672.8098878380506,1679.7874547967103,1698.3449371558595,1714.5272043775867,1720.762673288609,1765.4509659462822,1774.8046284340885,1795.8872077058945,1804.3498648173481]
utmy = [0.0,0.0,-0.6406935752068108,-0.4922669548598293,0.3580870124049653,2.399274224038095,5.631280409774931,10.834724238066528,16.03812226959559,23.21294828963967,30.367863564726676,38.71357207745689,48.64038509793008,59.7480506255705,72.83704697677643,85.91608276114988,139.02231995783572,164.3893165942827,178.6587887780046,191.3470932874231,201.66368658526204,212.38043326826616,230.2513967923984,234.22379243563276,271.60507761566873,327.6557323644306,337.59102754484593,352.6988055511682,368.9873486419185,376.9412372722136,405.9652773837354,430.2069263187169,433.7884625481806,464.002038683725,495.82499688298327,500.99700307152705,504.9880764230471,507.3885000605535,509.7985597582355,511.81818146632673,513.0472991580918,513.8859783320714,514.3438507349563,513.2303652961298,501.5502321458573,492.8868800757915,490.27837876390817,487.27933127707394,481.5662146489903,478.5382593781705,473.59620683291865,472.46305371236963,471.72987402052365,474.1297749924206,475.7297035539032,478.5200342779345,486.8814487887755,495.23327908856413,499.20443024408024,532.9256278783096,542.0391580665012,560.2756369072218,567.0178574712417]
plot(utmx, utmy)

In [None]:
# I recommend Plots.jl with the PlotlyJS backend
# Plots:
# - {MPG, Cost, Distance, Score, Duration} vs. time for various users
# - Scatter plot of hard-brake rate / hard-acceleration rate / time over 70 mph / vehicle type, where each point is a user

## Path Data

In [None]:
# Extract the GPS points in UTM.
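The cells below read precomputed coordinates from `path_data["utm"]`. If only latitude/longitude pairs are at hand (for example from decoded polylines), here is a minimal sketch of the forward WGS84 to UTM projection using Snyder's transverse Mercator series formulas (USGS *Map Projections: A Working Manual*). `latlon_to_utm` and `utm_zone` are hypothetical helpers, not DriveShare functions; the sketch ignores the Norway/Svalbard zone exceptions and assumes latitudes inside the UTM range (about 80°S to 84°N). Within a single zone, coordinate differences are approximately distances in meters, which is what the separation-distance histogram relies on.

```julia
# Hypothetical helpers: minimal WGS84 latitude/longitude -> UTM projection
# (Snyder's series). Standard 6-degree zones only, no Norway/Svalbard exceptions.
const WGS84_A = 6378137.0              # semi-major axis (m)
const WGS84_F = 1 / 298.257223563      # flattening
const UTM_K0 = 0.9996                  # scale factor on the central meridian

utm_zone(lon_deg::Real) = min(floor(Int, (lon_deg + 180) / 6) + 1, 60)

function latlon_to_utm(lat_deg::Real, lon_deg::Real)
    zone = utm_zone(lon_deg)
    lon0 = deg2rad((zone - 1) * 6 - 180 + 3)      # central meridian of the zone
    phi, lam = deg2rad(lat_deg), deg2rad(lon_deg)

    e2 = WGS84_F * (2 - WGS84_F)                   # first eccentricity squared
    ep2 = e2 / (1 - e2)                            # second eccentricity squared
    N = WGS84_A / sqrt(1 - e2 * sin(phi)^2)        # prime-vertical radius of curvature
    T = tan(phi)^2
    C = ep2 * cos(phi)^2
    A = cos(phi) * (lam - lon0)

    # Meridian arc length from the equator to latitude phi.
    M = WGS84_A * ((1 - e2 / 4 - 3 * e2^2 / 64 - 5 * e2^3 / 256) * phi -
                   (3 * e2 / 8 + 3 * e2^2 / 32 + 45 * e2^3 / 1024) * sin(2 * phi) +
                   (15 * e2^2 / 256 + 45 * e2^3 / 1024) * sin(4 * phi) -
                   (35 * e2^3 / 3072) * sin(6 * phi))

    easting = UTM_K0 * N * (A + (1 - T + C) * A^3 / 6 +
                            (5 - 18 * T + T^2 + 72 * C - 58 * ep2) * A^5 / 120) + 500_000.0
    northing = UTM_K0 * (M + N * tan(phi) * (A^2 / 2 + (5 - T + 9 * C + 4 * C^2) * A^4 / 24 +
                         (61 - 58 * T + T^2 + 600 * C - 330 * ep2) * A^6 / 720))
    lat_deg < 0 && (northing += 10_000_000.0)      # false northing in the southern hemisphere
    return (easting, northing, zone)
end
```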

In [None]:
# Plot histogram of separation distance between GPS pts over all trips
# Euclidean distance between two UTM points; the result is in meters.
distance(a, b) = hypot(a[1] - b[1], a[2] - b[2])

separation_distances = Float64[]      # gaps under 150 m, plotted below
all_separation_distances = Float64[]  # every gap, outliers included
for path_data in anon_paths
    utm = path_data["utm"]
    for i = 1:length(utm)-1
        d = distance(utm[i], utm[i+1])
        push!(all_separation_distances, d)
        if d < 150          # larger gaps are outliers that make the histogram unreadable
            push!(separation_distances, d)
        end
    end
end
histogram(separation_distances, nbins=70)

In [None]:
# If there is any way to get a sense of how close the GPS points are to the lane centers, that would be awesome.
# Sorry.

In [None]:
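# Plot histogram of (separation distance) / (mean trip velocity) between GPS pts over all trips.
# Meters divided by meters/second gives seconds: this estimates the time between
# consecutive GPS points, assuming the car moves at its mean trip speed throughout.
separation_time = Float64[]
for path_data in anon_paths
    path_data["duration_s"] > 0 || continue   # skip zero-duration trips (avoids division by zero)
    mean_trip_velocity = path_data["distance_m"] / path_data["duration_s"] # meters/second
    utm = path_data["utm"]
    for i = 1:length(utm)-1
        t = distance(utm[i], utm[i+1]) / mean_trip_velocity # seconds
        if t < 15          # drop long gaps so the histogram stays readable
            push!(separation_time, t)
        end
    end
end
histogram(separation_time, nbins=70)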

In [None]:
# use Interact.jl @manipulate to select a trip out of full set of trips
# Plot trip GPS pts (axis-equal)

# Why Interact.jl? I've plotted one trip with PlotlyJS above, but I don't know how to make it axis-equal.
# (In Plots.jl, passing aspect_ratio=:equal to plot gives equal axis scaling.)

In [None]:
# Fit a 2D spline through the points using Dierckx.jl
# Plot GPS pts and spline fit (axis-equal)
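Dierckx.jl, as planned above, offers smoothing splines. As a dependency-free point of comparison, here is a minimal sketch of an *interpolating* natural cubic spline through the GPS points, parameterized by cumulative chord length so that x(t) and y(t) are fit separately. All function names are hypothetical, and this is a swapped-in technique rather than the Dierckx.jl fit: it passes exactly through every point, so GPS noise is not smoothed out.

```julia
using LinearAlgebra

# Second derivatives M of the natural cubic spline through (t[i], y[i]);
# "natural" means M[1] = M[end] = 0. Knots t must be strictly increasing.
function natural_spline_second_derivs(t::Vector{Float64}, y::Vector{Float64})
    n = length(t)
    M = zeros(n)
    n < 3 && return M                       # two points: straight line, no curvature
    h = diff(t)                             # knot spacings
    dl = h[2:n-2]                           # sub-diagonal
    d = 2 .* (h[1:n-2] .+ h[2:n-1])         # main diagonal
    du = h[2:n-2]                           # super-diagonal
    rhs = 6 .* ((y[3:n] .- y[2:n-1]) ./ h[2:n-1] .- (y[2:n-1] .- y[1:n-2]) ./ h[1:n-2])
    M[2:n-1] = Tridiagonal(dl, d, du) \ rhs
    return M
end

# Evaluate the spline at x; points outside [t[1], t[end]] use the nearest end segment.
function eval_spline(t, y, M, x)
    i = clamp(searchsortedlast(t, x), 1, length(t) - 1)
    h = t[i+1] - t[i]
    a, b = t[i+1] - x, x - t[i]
    return M[i] * a^3 / (6h) + M[i+1] * b^3 / (6h) +
           (y[i] / h - M[i] * h / 6) * a + (y[i+1] / h - M[i+1] * h / 6) * b
end

# Parametric spline through 2D points, parameterized by cumulative chord length.
function parametric_spline(xs::AbstractVector{<:Real}, ys::AbstractVector{<:Real};
                           samples_per_segment::Int = 10)
    px, py = Float64[xs[1]], Float64[ys[1]]
    for i in 2:length(xs)
        if (xs[i], ys[i]) != (px[end], py[end])    # drop repeated fixes (zero chord length)
            push!(px, xs[i]); push!(py, ys[i])
        end
    end
    n = length(px)
    n < 2 && return px, py
    t = zeros(n)
    for i in 2:n
        t[i] = t[i-1] + hypot(px[i] - px[i-1], py[i] - py[i-1])
    end
    Mx = natural_spline_second_derivs(t, px)
    My = natural_spline_second_derivs(t, py)
    tq = range(t[1], t[end], length = samples_per_segment * (n - 1) + 1)
    return [eval_spline(t, px, Mx, s) for s in tq], [eval_spline(t, py, My, s) for s in tq]
end
```

The returned `sx`, `sy` can be drawn over the raw points with `aspect_ratio=:equal` for an axis-equal plot.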

## Classification

See whether we can extract some simple features from trips and identify who they belong to. (BayesNet?)
Looking at the user ID is of course cheating, and it would be best not to use the car data for this either. Features like MPG, max and min acceleration, etc. are best.
Talk to me before doing this one.

What are the most important features? Does someone in our lab drive significantly differently?
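One possible starting point for the "BayesNet?" idea is the simplest Bayesian-network classifier, Gaussian naive Bayes, which treats each trip feature as conditionally independent given the driver. A minimal sketch, assuming each row of `X` is one trip's feature vector (e.g. MPG, max and min acceleration) and `y` holds the driver labels; `fit_gnb`, `predict_driver`, and the variance floor value are hypothetical choices for illustration, not part of the DriveShare package.

```julia
using Statistics

# Per-driver feature statistics for a Gaussian naive Bayes classifier.
struct GaussianNB
    classes::Vector{String}
    priors::Vector{Float64}      # P(driver), estimated from trip counts
    means::Matrix{Float64}       # drivers × features
    vars::Matrix{Float64}        # drivers × features
end

function fit_gnb(X::Matrix{Float64}, y::Vector{String}; var_floor::Float64 = 1e-9)
    classes = sort(unique(y))
    k, d = length(classes), size(X, 2)
    priors, means, vars = zeros(k), zeros(k, d), zeros(k, d)
    for (c, label) in enumerate(classes)
        rows = X[y .== label, :]                 # trips of this driver
        priors[c] = size(rows, 1) / size(X, 1)
        means[c, :] = vec(mean(rows, dims = 1))
        # The floor keeps a constant feature from giving zero variance.
        vars[c, :] = vec(var(rows, dims = 1, corrected = false)) .+ var_floor
    end
    return GaussianNB(classes, priors, means, vars)
end

# Return the driver with the highest log posterior (up to a shared constant).
function predict_driver(model::GaussianNB, x::AbstractVector{<:Real})
    best_c, best_score = 1, -Inf
    for c in eachindex(model.classes)
        mu, s2 = model.means[c, :], model.vars[c, :]
        loglik = -0.5 * sum(log.(2pi .* s2) .+ (x .- mu) .^ 2 ./ s2)
        score = log(model.priors[c]) + loglik
        if score > best_score
            best_c, best_score = c, score
        end
    end
    return model.classes[best_c]
end
```

Comparing the per-driver means and variances in the fitted model is one rough way to see which features separate drivers most.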