# Drive Share Example Analysis

This is an outline for a combined documentation and example analysis of the driveshare.me project by Eric Hong.

## Downloading the Data

Anonymized path data is available as a JSON file, and high-level data as a CSV file, from http://driveshare.me.
Click `anonymized paths` or `high level data` in the bottom right of the page.

Examples of how to use the data are shown below.

## Loading Data Directly into Julia

In [16]:
using DriveShare
anon_paths = get_dataset(:trace) # Anonymized paths now loaded.
typeof(anon_paths)

LoadError: LoadError: Expected end of input
Line: 0
Around: ...onth":8}]{"distance_m":20395.4...
                    ^

while loading In[16], in expression starting on line 2

The total distance calculated from the path data is less than what is shown on driveshare.me
because `get_dataset(:trace)` only returns trip objects that contain paths.
As explained in the [API reference](https://developer.automatic.com/api-reference/):
"In cases where no GPS signal was available during the trip, path may be returned as null."

![Ma Image](json.png)

Here we extract the total duration and distance of all trips.

In [17]:
total_duration_s = 0
total_distance_m = 0
for path_data in anon_paths
    total_duration_s += path_data["duration_s"]
    total_distance_m += path_data["distance_m"]
end
@printf("total duration (seconds): %15.3f\n", total_duration_s)
@printf("total distance (meters):  %15.3f\n", total_distance_m)

total duration (seconds):     1658100.900
total distance (meters):      21436943.600


The high-level dataset is loaded the same way, but `get_dataset` returns a DataFrame instead.

In [18]:
high_level = get_dataset(:highlevel) # DataFrame
names(high_level)

39-element Array{Symbol,1}:
 :trip_ID              
 :vehicle_ID           
 :start_time           
 :end_time             
 :distance_m           
 :duration_s           
 :fuel_cost            
 :fuel_volume          
 :average_kmpl         
 :average_from_epa_kmpl
 :score_events         
 :score_speeding       
 :hard_brakes          
 ⋮                     
 :battery_voltage      
 :active_dtcs          
 :start_city           
 :start_state          
 :start_country        
 :end_city             
 :end_state            
 :end_country          
 :start_lat            
 :start_long           
 :end_lat              
 :end_long             

In [19]:
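using CrossfilterCharts
using DataFrames
# Copy a subset of the high-level columns into a smaller DataFrame for charting.
df = DataFrame()
for sym in [:make, :model, :distance_m, :average_kmpl, :score_speeding, 
            :hard_brakes, :city_fraction, :highway_fraction]
    df[sym] = high_level[sym]
end

# Render interactive, linked charts for the selected columns.
dc(df)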

## Basic Data Analysis

In [20]:
@printf("%-50s %10d\n", "number of trips:", size(high_level, 1))
@printf("%-50s %10d\n", "number of unique vehicles:", length(unique(high_level[:vehicle_ID])))
@printf("%-50s %14.3f\n", "total duration [sec]", sum(high_level[:duration_s]))
@printf("%-50s %14.3f\n", "total distance [m]", sum(high_level[:distance_m]))
@printf("%-50s %10d\n", "number of unique vehicle models:", length(unique(high_level[:model])))

number of trips:                                         1764
number of unique vehicles:                                  5
total duration [sec]                                  1667220.600
total distance [m]                                   21522931.900
number of unique vehicle models:                            4
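
The totals above are slightly larger than the totals computed from the path data earlier. As a rough worked example, the sketch below subtracts one from the other, using the numbers copied from the two outputs in this document. It assumes the whole gap comes from trips whose path is null; the variable names are illustrative and not part of DriveShare.

```julia
# Totals copied from the outputs shown in this document.
path_distance_m = 21436943.6   # trips that contain path data
path_duration_s = 1658100.9
all_distance_m  = 21522931.9   # all trips in the high-level data
all_duration_s  = 1667220.6

# Assumption: the difference is entirely due to trips without path data.
missing_distance_m = all_distance_m - path_distance_m
missing_duration_s = all_duration_s - path_duration_s

println("distance without path data [m]: ", missing_distance_m)
println("duration without path data [s]: ", missing_duration_s)
println("share of total distance [%]:    ", 100 * missing_distance_m / all_distance_m)
```

Under that assumption, roughly 86 km and 2.5 hours of driving, well under one percent of the total, have no path data.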


## Number of Trips per Vehicle

In [21]:
include("plot_helper.jl")
plot_number_of_trips_per_vehicle(high_level)

## Trip Distance per Vehicle


In [22]:
plot_histogram_per_vehcile(high_level, :distance_m)

## Fuel Cost Histogram per Vehicle 

In [23]:
plot_histogram_per_vehcile(high_level, :fuel_cost)

## Hard Accelerations per Vehicle

In [24]:
plot_histogram_per_vehcile(high_level, :hard_accels)

## Hard Brakes per Vehicle

In [25]:
plot_histogram_per_vehcile(high_level, :hard_brakes)

## Speeding Score per Vehicle

The `score_speeding` column holds the "driving score for speeding".

In [26]:
plot_histogram_per_vehcile(high_level, :score_speeding)

## Vehicle Comparison

In [28]:
# Columns that can be compared against each other in a scatter plot.
targets = [:distance_m, :duration_s, :average_kmpl, :score_speeding, :hard_brakes, :city_fraction, :highway_fraction,
    :duration_over_70_s, :duration_over_75_s, :duration_over_80_s, :idling_time_s]
targetA = targets[5] # :hard_brakes
targetB = targets[8] # :duration_over_70_s
scatter_targets(high_level, targetA, targetB)

## Path Data

Here we create a histogram of separation distances between GPS points over all trips.

In [29]:
separation_distances = Float64[]
for path_data in anon_paths
    # Distance between each pair of consecutive UTM points in the trip.
    for i = 1:length(path_data["utm"])-1
        d = norm(path_data["utm"][i] - path_data["utm"][i+1])
        push!(separation_distances, d)
    end
end
# Only separations below 150 m are plotted.
histogram(filter(x->x < 150, separation_distances), nbins=70, xlabel="separation distance [m]", ylabel="count", size=(900,400), leg=false)

Here we create a histogram of the approximate time elapsed between consecutive GPS points over all trips, estimated as (separation distance) / (mean trip velocity).

In [30]:
counter = 0 # running index into separation_distances, which is ordered trip by trip
separation_times = Float64[]
for path_data in anon_paths
    mean_trip_velocity = path_data["distance_m"] / path_data["duration_s"] # meters/second
    for i = 1:length(path_data["utm"])-1
        counter += 1
        t = separation_distances[counter] / mean_trip_velocity # seconds
        push!(separation_times, t)
    end
end
# Only estimated times below 15 s are plotted.
histogram(filter(x->x < 15, separation_times), nbins=70, xlabel="approx. time between points [s]", ylabel="count", size=(900,400), leg=false)
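
To make the time estimate concrete, here is a minimal self-contained sketch on a single made-up trip. The `trip` dictionary and its values are hypothetical; only the key names `utm`, `distance_m`, and `duration_s` mirror the path data used above. It is written for Julia 1.x, where `norm` lives in `LinearAlgebra`.

```julia
using LinearAlgebra  # provides norm on Julia 1.x

# Hypothetical trip: UTM points (easting, northing) in meters, plus trip totals.
trip = Dict(
    "utm" => [[0.0, 0.0], [30.0, 40.0], [60.0, 80.0], [60.0, 180.0]],
    "distance_m" => 200.0,
    "duration_s" => 20.0,
)

# Straight-line distance between each pair of consecutive points.
pts = trip["utm"]
seps = [norm(pts[i+1] - pts[i]) for i in 1:length(pts)-1]

# Mean trip velocity in meters/second, then the approximate time between points.
v_mean = trip["distance_m"] / trip["duration_s"]
times = seps ./ v_mean

println("separation distances [m]: ", seps)
println("approx. times [s]:        ", times)
```

Because every segment is divided by the same mean velocity, the estimate cannot distinguish slow stretches from fast ones within a trip. It is only as good as the assumption of roughly constant speed.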

In [31]:
plot_drive(3)