# Drive Share Example Analysis

This is an outline for a combined documentation and example analysis of the driveshare.me project by Eric Hong.

## Downloading the Data

Anonymized path data is available as a JSON file, and high-level data as a CSV file, from http://driveshare.me.
Click `anonymized paths` or `high level data` in the bottom right of the page.

Examples of how to use the data are shown below.

## Loading Data Directly into Julia

In [None]:
using DriveShare
anon_paths = get_dataset(:trace) # Anonymized paths now loaded.
nothing

ERROR (unhandled task failure): readcb: connection reset by peer (ECONNRESET)
 in yieldto at ./task.jl:71
 in wait at ./task.jl:371
 in wait at ./task.jl:286
 in wait_readnb at stream.jl:374
 in eof at stream.jl:96
 [inlined code] from /home/tim/.julia/v0.4/Requests/src/streaming.jl:66
 in anonymous at task.jl:63


The total distance calculated from this dataset is less than what is shown on driveshare.me
because `get_dataset(:trace)` only returns trip objects that contain paths.
As explained in the [api-reference](https://developer.automatic.com/api-reference/):
"In cases where no GPS signal was available during the trip, path may be returned as null."

![Ma Image](json.png)
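To make the effect of null paths concrete, here is a minimal sketch in Julia 1.x. The trip records and the field values are hypothetical and only mimic the shape described in the API reference; they are not taken from the real dataset.

```julia
# Hypothetical trip records: `path` is `nothing` (JSON null) when no GPS
# signal was available during the trip.
trips = [
    Dict("distance_m" => 1200.0, "path" => "encoded-path-a"),
    Dict("distance_m" => 800.0,  "path" => nothing),
    Dict("distance_m" => 500.0,  "path" => "encoded-path-b"),
]

# Keep only the trips that carry a path, mirroring what the text says
# `get_dataset(:trace)` returns.
with_path = filter(t -> t["path"] !== nothing, trips)

total_all  = sum(t["distance_m"] for t in trips)
total_path = sum(t["distance_m"] for t in with_path)

println("all trips:       ", total_all, " m")
println("trips with path: ", total_path, " m")
```

The second total is smaller because the trip without a path is dropped, which is the same reason the trace-based total falls short of the figure on the website.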

Here we extract the total duration and distance of all trips.

In [None]:
# Use Float64 accumulators so the totals keep one type throughout the loop.
total_duration_s = 0.0
total_distance_m = 0.0
for path_data in anon_paths
    total_duration_s += path_data["duration_s"]
    total_distance_m += path_data["distance_m"]
end
@printf("total duration (seconds): %15.3f\n", total_duration_s)
@printf("total distance (meters):  %15.3f\n", total_distance_m)

The high-level dataset is loaded the same way, but returns a `DataFrame`.

In [None]:
high_level = get_dataset(:highlevel) # DataFrame
names(high_level)

In [2]:
using DataFrames
high_level = readtable("/home/tim/Downloads/highlevel (1).csv");

In [3]:
using CrossfilterCharts
using DataFrames
df = DataFrame()
for sym in [:make, :model, :distance_m, :average_kmpl, :score_speeding, 
            :hard_brakes, :city_fraction, :highway_fraction]
    df[sym] = high_level[sym]
end

dc(df)

## Basic Data Analysis

In [16]:
@printf("%-50s %10d\n", "number of trips:", size(high_level, 1))
@printf("%-50s %10d\n", "number of unique vehicles:", length(unique(high_level[:vehicle_ID])))
@printf("%-50s %14.3f\n", "total duration [sec]", sum(high_level[:duration_s]))
@printf("%-50s %14.3f\n", "total distance [m]", sum(high_level[:distance_m]))
@printf("%-50s %10d\n", "number of unique vehicle models:", length(unique(high_level[:model])))

number of trips:                                         1759
number of unique vehicles:                                  5
total duration [sec]                                  1660843.400
total distance [m]                                   21452695.000
number of unique vehicle models:                            4
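The totals printed above are easier to read in hours and kilometers. The following sketch, written for Julia 1.x, copies the three printed values by hand and derives a few summary figures; the variable names are illustrative only.

```julia
# Totals copied from the printed summary above.
n_trips          = 1759
total_duration_s = 1660843.4
total_distance_m = 21452695.0

total_hours    = total_duration_s / 3600   # seconds -> hours
total_km       = total_distance_m / 1000   # meters -> kilometers
mean_trip_km   = total_km / n_trips        # average distance per trip
mean_speed_kmh = total_km / total_hours    # overall average speed

println("total hours:       ", round(total_hours, digits=1))
println("total km:          ", round(total_km, digits=1))
println("mean trip [km]:    ", round(mean_trip_km, digits=2))
println("mean speed [km/h]: ", round(mean_speed_kmh, digits=1))
```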


## Number of Trips per Vehicle

In [17]:
include("plot_helper.jl")
plot_number_of_trips_per_vehicle(high_level)

## Trip Distance per Vehicle


In [6]:
plot_histogram_per_vehcile(high_level, :distance_m)

## Fuel Cost Histogram per Vehicle 

In [7]:
plot_histogram_per_vehcile(high_level, :fuel_cost)

## Hard Accelerations per Vehicle

In [8]:
plot_histogram_per_vehcile(high_level, :hard_accels)

## Hard Brakes per Vehicle

In [9]:
plot_histogram_per_vehcile(high_level, :hard_brakes)

## Speeding Score per Vehicle

The `score_speeding` field is described as the "driving score for speeding".

In [10]:
plot_histogram_per_vehcile(high_level, :score_speeding)

## Vehicle Comparison

In [11]:
include("plot_helper.jl")
targets = [:distance_m, :duration_s, :average_kmpl, :score_speeding,  :hard_brakes, :city_fraction, :highway_fraction,
    :duration_over_70_s, :duration_over_75_s, :duration_over_80_s, :idling_time_s]
targetA = targets[1]
targetB = targets[11]
scatter_targets(high_level, targetA, targetB)

## Path Data

Here we create a histogram of separation distances between GPS points over all trips.

In [12]:
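# Collect the distance between each pair of consecutive UTM points, over all trips.
separation_distances = Float64[]
for path_data in anon_paths
    for i = 1:length(path_data["utm"])-1
        d = norm(path_data["utm"][i] - path_data["utm"][i+1])
        push!(separation_distances, d)
    end
end
# Only separations under 150 m are shown in the histogram.
histogram(filter(x->x < 150, separation_distances), nbins=70)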

LoadError: LoadError: UndefVarError: anon_paths not defined
while loading In[12], in expression starting on line 2
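Since the cell above could not run without `anon_paths`, the separation computation can be checked on a small synthetic path. This is a sketch for Julia 1.x, where `norm` lives in `LinearAlgebra`; the function name `separation_distances` and the coordinates are made up for illustration.

```julia
using LinearAlgebra  # provides `norm` in Julia 1.x

# Distance between each pair of consecutive points in a path.
function separation_distances(utm)
    return [norm(utm[i+1] - utm[i]) for i in 1:length(utm)-1]
end

# Hypothetical path in UTM-like coordinates (meters):
# a 3-4-5 triangle leg followed by a 6 m step north.
utm = [[0.0, 0.0], [3.0, 4.0], [3.0, 10.0]]
d = separation_distances(utm)
println(d)
```

A path with fewer than two points yields an empty result, so such trips contribute nothing to the histogram.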

Here we create a histogram of (separation distance) / (mean trip velocity) for consecutive GPS points over all trips. This ratio approximates the time elapsed between each pair of points.

In [13]:
# `counter` indexes into `separation_distances`, which was filled in the same trip order.
counter = 0
separation_times = Float64[]
for path_data in anon_paths
    mean_trip_velocity = path_data["distance_m"] / path_data["duration_s"] # meters/second
    for i = 1:length(path_data["utm"])-1
        counter += 1
        t = separation_distances[counter] / mean_trip_velocity # seconds
        push!(separation_times, t)
    end
end
histogram(filter(x->x < 15, separation_times), nbins=70)

LoadError: LoadError: UndefVarError: anon_paths not defined
while loading In[13], in expression starting on line 3
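The time approximation can also be written per trip, without a shared counter into a previously built array. The sketch below targets Julia 1.x and uses a hypothetical helper `approx_separation_times` with made-up trip values.

```julia
using LinearAlgebra  # provides `norm` in Julia 1.x

# Approximate the time between consecutive points as
# (separation distance) / (mean trip velocity).
function approx_separation_times(utm, distance_m, duration_s)
    mean_velocity = distance_m / duration_s   # meters/second
    return [norm(utm[i+1] - utm[i]) / mean_velocity for i in 1:length(utm)-1]
end

# Hypothetical trip: two segments of 50 m and 60 m, driven in 11 s overall,
# so the mean velocity is 10 m/s.
utm = [[0.0, 0.0], [30.0, 40.0], [30.0, 100.0]]
times = approx_separation_times(utm, 110.0, 11.0)
println(times)
```

When the reported trip distance equals the summed separations, as in this example, the approximate times add up to the trip duration. Because a single mean velocity is used for the whole trip, the estimate is too low for slow segments and too high for fast ones.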

In [14]:
# Use Interact.jl's @manipulate to select a trip out of the full set of trips.
# Plot the trip's GPS points (axis-equal).

# Why Interact.jl? I've plotted one trip with Plotly, but I don't know how to make it axis-equal.

utmx = Float64[]
utmy = Float64[]
path = anon_paths[1]["utm"]
for pt in path
    push!(utmx, pt[1])
    push!(utmy, pt[2])
end
plot(utmx, utmy, xlabel="x position [m]", ylabel="y position [m]", xlims=(-200,2000), ylims=(-200,1500))

LoadError: LoadError: UndefVarError: anon_paths not defined
while loading In[14], in expression starting on line 8
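One backend-independent way to approximate an axis-equal plot is to compute limits that span the same range on both axes, instead of hard-coding `xlims` and `ylims`. The helper `equal_span_limits` below is a hypothetical sketch for Julia 1.x, and the sample coordinates are made up. It only gives a true 1:1 scale when the plot area itself is square; Plots.jl also documents an `aspect_ratio` attribute that may be a more direct option.

```julia
# Return (xlims, ylims) with the same span on both axes, centered on the data.
# `pad` widens the span by a fraction so points do not sit on the border.
function equal_span_limits(xs, ys; pad=0.05)
    xmin, xmax = extrema(xs)
    ymin, ymax = extrema(ys)
    half = (1 + pad) * max(xmax - xmin, ymax - ymin) / 2
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return (cx - half, cx + half), (cy - half, cy + half)
end

# Hypothetical path coordinates in meters.
xs = [0.0, 1000.0, 1800.0]
ys = [0.0, 400.0, 900.0]
xl, yl = equal_span_limits(xs, ys; pad=0.0)
println(xl, " ", yl)
```

The resulting tuples can be passed as the `xlims` and `ylims` keyword arguments used in the cell above.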

In [15]:
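# Fit a spline through the points using Dierckx.jl.
# Note: Spline1D fits y as a function of x, so this is not a true 2D (parametric) fit.
# Plot GPS pts and spline fit (axis-equal)
using Dierckx

spl = Spline1D(utmx, utmy)
xs = 0:1800
spl_plot = Float64[]
for x in xs
    push!(spl_plot, spl(x))
end
# Plot the fit against the x values it was evaluated at, not against the array index.
p = plot(xs, spl_plot, xlabel="x position [m]", ylabel="y position [m]", xlims=(-200,2000), ylims=(-200,1500))
scatter!(p, utmx, utmy)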

LoadError: LoadError: ArgumentError: Dierckx not found in path
while loading In[15], in expression starting on line 3
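A driven path can double back in x, in which case y is not a function of x and a single `Spline1D(utmx, utmy)` fit is ill-posed. A common alternative is to parameterize the path by cumulative distance travelled and fit x and y against that parameter separately. The sketch below, for Julia 1.x, computes only the parameter and uses no spline library; the helper name `chord_length_parameter` and the sample points are made up.

```julia
# Cumulative chord length along a path:
# u[1] = 0 and u[k] is the distance travelled up to point k.
function chord_length_parameter(xs, ys)
    u = zeros(length(xs))
    for k in 2:length(xs)
        u[k] = u[k-1] + hypot(xs[k] - xs[k-1], ys[k] - ys[k-1])
    end
    return u
end

# Hypothetical path that doubles back in x, so y is not a function of x.
xs = [0.0, 3.0, 3.0, 0.0]
ys = [0.0, 4.0, 10.0, 14.0]
u = chord_length_parameter(xs, ys)
println(u)
```

With `u` in hand, one could fit two one-dimensional splines, x(u) and y(u), and evaluate both on a fine grid of `u` values to draw the curve. Note that `u` is strictly increasing only if no two consecutive points coincide, so duplicate points would need to be removed first.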