# Problem Statement

BMTC's route dataset lists the bus routes, the bus stops each route covers, and the planned duration of each trip from origin to destination. How good are these duration estimates? Can we suggest improvements to the schedule based on how long trips take at different hours of the day?

# Motivation

Have you ever tried to use Google to get bus timings and estimated travel duration using public transport in Bangalore?  In many cases, it would give you clearly incorrect results.  

For instance, the images below show the travel times from Jayanagar to Majestic.

Google estimates a (car) ride from 4th Block Jayanagar to Majestic to take 33 minutes. Also, notice the traffic and the route taken.


<img src="img/car.png" style="height: 400px" title="car" height=400 />

The same route, Google claims, can be covered in 35 minutes by bus, which includes a 9-minute walk. Also, notice that the traffic doesn't have any reds and oranges!

<img src="img/bus.png" style="height: 400px" title="bus" height=400 />

Google seems to be just using the planned duration of trips provided by BMTC to estimate the duration of a trip. This is **evidently wrong**, and discourages me from taking a bus when I have to get somewhere on time. Can we help improve this situation? 

Also, it is common knowledge that trips between the same origins and destinations take different times based on the time of the day. BMTC's single estimate of a trip duration isn't helping people plan their trips effectively!

# Approach

Uber's [movement dataset](https://movement.uber.com/) for Bangalore provides various statistics about travel times between pairs of [wards in Bangalore](https://en.wikipedia.org/wiki/List_of_wards_in_Bangalore), by the hour of day.

BMTC's bus route data provides the list of bus-stops on each route, along with latitude/longitude information for those bus stops. 

We attempt to use these two datasets to get a sense of how accurate BMTC's estimates of trip duration are, and to help them improve these estimates if possible.

## Details

To get a rough estimate of each route's trip duration, we do the following.

1. Pick a route, and get a list of bus stops on that route. 
1. For each bus stop, figure out the ward name/number in which it lies. Uber's data gives us mean travel times between wards. 
1. Simplify the route to be a hop between wards, instead of hops between bus stops. We group all bus stops within a ward as one hop. 
1. Use Uber's data to get the mean travel time between consecutive wards on the route and add it up to get a *very very* rough estimate of the duration of the trip. 
1. Repeat this for all the routes and get a sense of how accurate/inaccurate the BMTC trip durations are.
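
The steps above can be sketched roughly as follows. This is only an illustration: the actual `utils` helpers are not shown here, so the function names, the `(source, destination) -> seconds` lookup, and the ward ids and travel times are all made up for the example.

```python
from itertools import groupby


def collapse_to_ward_hops(stop_wards):
    """Collapse consecutive stops that fall in the same ward into one hop."""
    return [ward for ward, _ in groupby(stop_wards)]


def estimate_seconds(ward_hops, mean_times):
    """Sum mean travel times over consecutive ward pairs.

    Returns (total_seconds, missing); missing is True if any ward pair
    has no entry in mean_times, in which case that pair is skipped.
    """
    total, missing = 0, False
    for src, dst in zip(ward_hops, ward_hops[1:]):
        seconds = mean_times.get((src, dst))
        if seconds is None:
            missing = True
            continue
        total += seconds
    return total, missing


# Hypothetical ward ids (one per bus stop) and travel times in seconds.
stop_wards = [168, 168, 169, 167, 167, 154]
mean_times = {(168, 169): 300, (169, 167): 420}  # (167, 154) deliberately absent

hops = collapse_to_ward_hops(stop_wards)
total, missing = estimate_seconds(hops, mean_times)
hours, minutes = divmod(total // 60, 60)
print(hops, (hours, minutes), missing)
```

Skipping a missing ward pair makes the total an underestimate, which is why it helps to carry a `missing` flag alongside the number.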

## Code

In [1]:
import pandas as pd
from utils import get_bmtc_routes, route_to_wards, estimate_travel_time

data = get_bmtc_routes()

BMTC's data provides the bus stops on each route as a JSON list, along with the planned time for the route and the route number. Below is a sample of what the data looks like.

In [2]:
data[:3][['route_no', 'time', 'map_json_content']]

|   | route_no | time | map_json_content |
|---|----------|------|------------------|
| 0 | 1 | 01:25 Min. | `[{"busstop": "Jayanagara 9th Block,JAYANAGARA ...` |
| 1 | 1E | 01:45 Min. | `[{"busstop": "JPNagara 6th Phase,JP NAGARA 6TH...` |
| 2 | 1F | 00:50 Min. | `[{"busstop": "BTM Layout,BTM Layout 2nd Stage,...` |
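
Both the `time` and `map_json_content` columns are strings, so they need a little parsing before they are useful. A minimal sketch, assuming that `time` is `hours:minutes` despite the `Min.` suffix and that every stop entry has a `busstop` key (the only key visible in the sample); the helper names and the sample JSON are made up for illustration.

```python
import json
import re


def parse_bmtc_time(text):
    """Convert a string like '01:25 Min.' to total minutes (assumes HH:MM)."""
    match = re.match(r"\s*(\d+):(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def stop_names(map_json_content):
    """Return the 'busstop' value of every entry in the JSON list."""
    return [stop["busstop"] for stop in json.loads(map_json_content)]


# Made-up JSON in the same shape as the sample rows.
sample = '[{"busstop": "Stop A"}, {"busstop": "Stop B"}]'
print(parse_bmtc_time("01:25 Min."))
print(stop_names(sample))
```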


We use the data downloaded from Uber using `scripts/download-uber-data.py`. The data contains mean travel times in seconds for different days of the week, and mean travel times for different periods of the day.

In [3]:
print('{:>10} -- {:^10} || {}'.format('Route no.', 'BMTC', 'Uber estimate'))
for index in range(15):
    r = data.loc[index]
    (hours, minutes), missing_data = estimate_travel_time(r)
    end = '*\n' if missing_data else '\n'
    print('{:>10} -- {} || {:02}:{:02} Min.'.format(r.route_no.strip(), r.time.strip(), hours, minutes), end=end)
print('* implies data for travel time for some ward pairs is missing')

 Route no. --    BMTC    || Uber estimate
         1 -- 01:25 Min. || 01:01 Min.
        1E -- 01:45 Min. || 01:17 Min.
        1F -- 00:50 Min. || 00:38 Min.
     CCC-1 -- 00:55 Min. || 00:19 Min.
     FDR-1 -- 01:20 Min. || 00:45 Min.
       G-1 -- 01:35 Min. || 01:00 Min.
       K-1 -- 02:10 Min. || 01:24 Min.
     KHC-1 -- 00:50 Min. || 00:39 Min.
      MF-1 -- 00:55 Min. || 00:15 Min.
    NLMF-1 -- 00:35 Min. || 00:20 Min.
     WFS-1 -- 00:55 Min. || 00:59 Min.
         2 -- 00:45 Min. || 00:44 Min.
        2A -- 00:45 Min. || 00:39 Min.
        2B -- 00:50 Min. || 00:54 Min.
        2D -- 00:40 Min. || 00:49 Min.
* implies data for travel time for some ward pairs is missing


For the first few routes in the BMTC dataset, we compare the BMTC-provided durations against our estimates from the Uber data. Wow! Some of BMTC's durations are roughly 3 to 4 times our estimates (CCC-1 and MF-1, for example), while for a few routes (WFS-1, 2B, 2D) our estimate is slightly higher. What's going on here?
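
To put a number on the gap, here is a small sketch that computes the ratio of BMTC's planned duration to the Uber-based estimate for a few rows copied from the output above. The `to_minutes` helper is ours, written just for this example, and it assumes the values are `hours:minutes`.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to total minutes."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


# (route, BMTC duration, Uber-based estimate), copied from the comparison output.
rows = [
    ("1", "01:25", "01:01"),
    ("CCC-1", "00:55", "00:19"),
    ("MF-1", "00:55", "00:15"),
    ("2D", "00:40", "00:49"),
]

ratios = {route: to_minutes(bmtc) / to_minutes(uber) for route, bmtc, uber in rows}
for route, ratio in ratios.items():
    print(f"{route:>6}: BMTC/Uber = {ratio:.2f}")
```

A ratio above 1 means BMTC's planned duration is longer than the Uber-based estimate; below 1 means it is shorter.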

Let's dig a little deeper, and see why our estimate for the first route looks the way it does!

In [4]:
route = data.loc[0]

In [5]:
route_1_wards = route_to_wards(route)
route_1_wards

[(168, 'Pattabhiram Nagar'),
 (169, 'Byrasandra'),
 (167, 'Yediyur'),
 (154, 'Basavanagudi'),
 (142, 'Sunkenahalli'),
 (140, 'Chamrajapet'),
 (139, 'K R Market'),
 (138, 'Chalavadipalya'),
 (95, 'Subhash Nagar'),
 (94, 'Gandhinagar'),
 (65, 'Kadu Malleshwar Ward'),
 (35, 'Aramane Nagara'),
 (45, 'Malleswaram'),
 (44, 'Marappana Palya'),
 (38, 'HMT Ward')]

We fetch the wards for each bus stop and get a unique list of ward-hops on the bus route.  They seem reasonable, on manual verification. 
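
One way to figure out which ward a bus stop lies in is a point-in-polygon test against each ward's boundary. The sketch below uses plain ray casting; it is not necessarily how `route_to_wards` does it (that code is not shown here), and the ward ids and square boundaries are toy values, not real Bangalore wards.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def ward_for_stop(lon, lat, wards):
    """Return the id of the first ward containing the point, or None."""
    for ward_id, polygon in wards.items():
        if point_in_polygon(lon, lat, polygon):
            return ward_id
    return None


# Toy wards: two adjacent squares (hypothetical ids and coordinates).
wards = {
    1: [(77.50, 12.90), (77.55, 12.90), (77.55, 12.95), (77.50, 12.95)],
    2: [(77.55, 12.90), (77.60, 12.90), (77.60, 12.95), (77.55, 12.95)],
}

print(ward_for_stop(77.52, 12.92, wards))
print(ward_for_stop(77.58, 12.93, wards))
print(ward_for_stop(77.70, 12.93, wards))
```

Stops that fall outside every boundary come back as `None`, so they can be dropped or checked by hand.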