# How Are Taxis Enhancing Singapore's Public Transport System?
<img src="images/overhead.jpg" width='300pix'>
<center>Image Credits: Ministry of Transport, Singapore</center>

## Introduction

Singapore is a city-state in Southeast Asia. It has an extensive public transport network comprising a rail network and a bus network. In addition, street-hail taxis and ride-sharing services such as Grab and Uber operate in the space between public and private transport. Singapore's transport system is technologically advanced and data driven. The Land Transport Authority (LTA) of Singapore is the regulatory authority for transportation. The LTA publishes a wide variety of transport-related datasets (static and dynamic/real-time) on [DataMall](https://www.mytransport.sg/content/mytransport/home/dataMall.html) for enterprises, third-party developers, and other members of the public to promote citizen co-creation of innovative and inclusive transport solutions. This rich data source makes Singapore well suited to a data science study.

Bus and rail systems have fixed routes and schedules, so they have deterministic coverage patterns that meet a fixed transportation demand. Taxicabs and ride-sharing services, on the other hand, have no fixed routes or schedules, so in some sense they meet the "ad hoc" demand.

**We are interested in finding out whether the supply pattern of taxicabs fills the gaps in the public transport network.**

## Existing Work

While public transport modelling for Singapore has been done before, to the best of our knowledge there has not been an analysis of taxis and public transport together.

Below are some previous studies done on the public transport network of Singapore.

1. Singapore in Motion: Insights on Public Transport Service Level Through Farecard and Mobile Data Analytics, IBM 2016
http://www.kdd.org/kdd2016/papers/files/SingaporeInMotion_v3.pdf
2. Time-Series Data Mining in Transportation: A Case Study on Singapore Public Train Commuter Travel Patterns, SMU 2014 


Two online projects made use of taxi data from the LTA DataMall for visualization:
      
1. [Taxi Router SG](https://github.com/cheeaun/taxirouter-sg) by [Lim Chee Aun](https://twitter.com/cheeaun). 
    * Taxi stands in Singapore.
    * All available taxis in the whole Singapore.
    * How many available taxis given a location query
    * How far is the nearest taxi stand given a location query
2. [TaxiSg](http://uzyn.github.io/taxisg/) by [U-Zyn Chua](https://twitter.com/uzyn). 
     * This app helps the commuter to understand the distribution of taxis during a historic window (ranging from 15 minutes to 2 weeks of historic data).
   

## Problem Statement

We are interested in finding out whether the supply of taxis complements or enhances the bus network. To do that, we study the following questions:
   
   1. Is there a difference between the density distribution of taxis (ad hoc requests made by commuters) and that of the bus/rail network (planned transportation network) over time?
   
   2. Are the taxis really filling the gaps in the public transport system?
   
   3. Does the distribution of taxis relative to the public transport system change between weekdays and weekends?
  

## Data Sets

Two real-time datasets were collected using the DataMall API between *03/10/2017 and 03/19/2017*.

1. Taxi Availability 
2. Bus Arrival

In addition, static datasets were collected to help understand the transport network and land use.

1. Urban Redevelopment Authority land use zone codes were used to determine how land is used, e.g. whether it is commercial or residential.
   <img src="./images/SG-Land-Use-Map-2030.png" width="700pix">
2. Bus terminus and interchange locations in Singapore
   <img src="./images/bus_interchange.png" width="800pix">

### Taxi Availability Dataset

### Description
Returns location coordinates of all Taxis that are currently available for hire. Does not include "Hired" or "Busy" Taxis. We polled the API every *1min* for this dataset. A total of **40982444** location records were collected.
                                                                                                             
| **Attributes** 	| **Description**                                                    	|
|----------------	|--------------------------------------------------------------------	|
| Latitude       	| provides the latitude of the location where the taxi is available  	| 
| Longitude      	| provides the longitude of the location where the taxi is available 	| 
| Date           	| provides the date when the taxi was available                      	| 
| Time           	| provides the time when the taxi was available                      	| 

### Bus Arrival Dataset

### Description
Returns real-time bus arrival information for bus services at a queried bus stop, including estimated time of arrival (ETA), estimated location, and load information. We polled the API for every bus stop in Singapore every *6min* for this dataset. A total of **6394212** bus stop arrival records were collected. From the bus stop records, we can generate a snapshot of the buses that are in service. Do note that we do not capture buses that are not tagged to a bus stop. This could mean that the API excludes special express services such as the ["Premium Bus Routes"](http://landtransportguru.net/bus/bus-services/premium/) or the night rider services.

                                                                                                                 
| **Attributes**    | **Description**                                                       |
|----------------   |--------------------------------------------------------------------   |
| ServiceNo         | Bus service number   | 
| Status         | Bus Status    | 
| Latitude | Estimated location coordinates of bus |
| Longitude | Estimated location coordinates of bus |
|Load|  Bus occupancy / crowding: Seats Available, Standing Available, Limited Standing|


### Data Limitations

We collected data from 03/10 to 03/20. However, due to technical issues such as maintenance of the API system, we have continuous data only from *03/14-03/19* for both buses and taxis. This period is used for the comparison studies.

The taxi API does not provide the identity of individual taxis, nor does it show which taxis are currently hired. Similarly, the bus API does not show when a bus is completely full.

**Thus we cannot accurately track when the demand of transportation exceeds the supply.**


### Challenges

About *20 GB* of data was collected during that period. The data was processed in Python, the main libraries being Pandas, GeoPandas, SciPy and Shapely. As there is a large amount of data to process, we needed to ensure that the runtime of the processing is subquadratic, i.e. grows more slowly than $O(n^2)$, with respect to the data size. We found 2 factors to be critical in ensuring that the workflows run within a reasonable time (< 1 hour).

1. Multiprocessing. We used the Multiprocess library and designed our workflow using the [map-reduce](https://en.wikipedia.org/wiki/MapReduce) model.

2. Fast spatial computation. Spatial index structures such as KD-trees were used to enable faster querying of points. This is more efficient than a naive nearest-neighbor comparison, which is quadratic. The underlying spatial indexes were provided by the *rtree* package (an R-tree index) and SciPy's KD-tree implementation. More details on KD-trees can be found at [KDTree Algorithm](https://en.wikipedia.org/wiki/K-d_tree) and in an intuitive video explanation of [how KDTree works](https://www.youtube.com/watch?v=Z4dNLvno-EY).
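To illustrate the second point, here is a minimal sketch of KD-tree queries with SciPy's `cKDTree`. The coordinates are made-up values in metres and the variable names are hypothetical; this is not the original processing code, only an example of the kind of query that avoids comparing every point with every other point.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected coordinates in metres (easting, northing).
bus_stops = np.array([
    [0.0, 0.0],
    [1000.0, 0.0],
    [0.0, 1000.0],
    [5000.0, 5000.0],
])
taxis = np.array([
    [100.0, 50.0],
    [900.0, 100.0],
    [4800.0, 5100.0],
])

# Build one tree per point set; construction is O(n log n).
stop_tree = cKDTree(bus_stops)
taxi_tree = cKDTree(taxis)

# Nearest bus stop for every taxi (distance in metres and stop index).
distances, nearest_idx = stop_tree.query(taxis, k=1)

# Number of taxis within 500 m of each bus stop.
taxis_near_stop = taxi_tree.query_ball_point(bus_stops, r=500.0)
counts = [len(idx) for idx in taxis_near_stop]
```

Each lookup walks the tree instead of scanning all points, which is what keeps the overall workload below quadratic for typical data.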

We found that the data is best processed on a desktop, as a laptop does not have sufficient memory for most of the processing. The data was processed on a machine with the following specifications.

1. 10-core Intel i7
2. 32 GB RAM
3. Ubuntu 16.04 64bit
4. Python 3

## Modelling Transportation Flow

We assume that the transport authority and bus operators have modelled the supply of buses such that it matches the demand. This is a reasonable assumption, as the LTA and the [Public Transport Council of Singapore](https://www.ptc.gov.sg/) review the public transport network regularly. The authority does not dictate the taxi supply pattern, but we assume that taxi drivers are savvy enough to go to locations with demand.

**Hence the supply pattern of both the buses and taxis are good estimates of the demand.**


### Coordinate Reference System Reprojection

LTA DataMall reports coordinates in WGS84, i.e. latitude and longitude. To simplify calculations when handling geospatial data, we reprojected all the collected data to [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system) coordinates. This is a Cartesian coordinate system in which 1 map unit represents 1 meter on the ground. The UTM projection zone for Singapore is 48N, which has the EPSG code 32648.
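In practice a GIS library would normally handle this reprojection, and the original pipeline most likely did so. To make the transformation itself concrete, the sketch below applies the standard truncated-series transverse Mercator formulas for UTM zone 48N using only the standard library. The function name `wgs84_to_utm48n` and the sample coordinate are assumptions made for illustration.

```python
import math

def wgs84_to_utm48n(lat_deg, lon_deg):
    """Project WGS84 latitude/longitude (degrees) to UTM zone 48N metres."""
    a = 6378137.0               # WGS84 semi-major axis in metres
    f = 1 / 298.257223563       # WGS84 flattening
    k0 = 0.9996                 # UTM scale factor on the central meridian
    lon0 = math.radians(105.0)  # central meridian of zone 48
    e2 = f * (2 - f)            # first eccentricity squared
    ep2 = e2 / (1 - e2)         # second eccentricity squared

    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    n = a / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    A = (lam - lon0) * math.cos(phi)

    # Meridional arc length from the equator to latitude phi.
    m = a * ((1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * phi
             - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * math.sin(2 * phi)
             + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * math.sin(4 * phi)
             - (35 * e2**3 / 3072) * math.sin(6 * phi))

    easting = 500000.0 + k0 * n * (
        A + (1 - t + c) * A**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * A**5 / 120)
    northing = k0 * (m + n * math.tan(phi) * (
        A**2 / 2 + (5 - t + 9 * c + 4 * c**2) * A**4 / 24
        + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * A**6 / 720))
    return easting, northing

# A made-up point roughly in central Singapore.
easting, northing = wgs84_to_utm48n(1.35, 103.8)
```

Once every record is in metres, distances and bandwidths such as "500 meters" can be used directly without any spherical geometry.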

### Kernel Density Estimation
       
Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Using KDE, we can model the underlying distribution of buses and taxis. The distribution can then be used for statistical analysis and prediction tasks. Here is a good introduction to [Kernel Density Estimation](https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/).

Each bus/taxi location is treated as a discrete sample of the underlying continuous distribution. As the taxi locations are recorded every 1-2 minutes and the bus locations every 6 minutes, we perform the estimate on every 15 minutes' worth of data for both, giving us about 3 bus snapshots and 7 taxi snapshots per estimate.
   
There are some factors to consider when performing the estimation:

   1. What kernel function to use.
   2. What bandwidth or window size to use?
   
The infographic below summarizes the factors to consider when designing a KDE.
  <a href="http://research.cs.tamu.edu/prism/lectures/pr/pr_l7.pdf"> <img src="images/KernelDensityInfographic.png"  width="700px"></a>
  
While there are many different kernels, we decided to use the Epanechnikov kernel, which is optimal in the mean squared error sense.

  <img src="images/600px-Kernels.svg.png" width=300px>
  
The next parameter to decide is the bandwidth of the kernel. The bandwidth determines the area of influence of each sample. A larger bandwidth gives smoother estimates, while a smaller bandwidth results in multiple distribution peaks.

The optimal bandwidth minimizes the error between the estimated probability density and the true density. However, we don't know the true density of the bus and taxi data. We decided to go with kernel bandwidths of 500 meters and 1 kilometer radius, since the speed limit for most roads in Singapore is 60km/h and within a snapshot of 1min it is reasonable to think that a vehicle would travel about 1 kilometer.
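The effect of the kernel and bandwidth choice can be seen in a small worked example. The sketch below is a plain NumPy implementation of a two-dimensional Epanechnikov KDE, $\hat{f}(x) = \frac{1}{n h^2} \sum_i K\left(\frac{x - x_i}{h}\right)$ with $K(u) = \frac{2}{\pi}(1 - \lVert u \rVert^2)$ for $\lVert u \rVert \le 1$ and 0 otherwise. The function name and the sample points are hypothetical, and the implementation used in the original analysis is not stated.

```python
import numpy as np

def epanechnikov_kde_2d(samples, query, bandwidth):
    """Evaluate a 2-D Epanechnikov kernel density estimate at the query points."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Squared distances scaled by the bandwidth, shape (n_query, n_samples).
    diff = query[:, None, :] - samples[None, :, :]
    u2 = (diff ** 2).sum(axis=2) / bandwidth ** 2
    # The kernel is zero outside the bandwidth radius.
    kernel = np.where(u2 <= 1.0, (2.0 / np.pi) * (1.0 - u2), 0.0)
    return kernel.sum(axis=1) / (len(samples) * bandwidth ** 2)

# Hypothetical vehicle positions and query locations in metres.
samples = np.array([[0.0, 0.0], [200.0, 0.0], [3000.0, 3000.0]])
query = np.array([[0.0, 0.0], [2400.0, 3000.0]])

density_500 = epanechnikov_kde_2d(samples, query, bandwidth=500.0)
density_1000 = epanechnikov_kde_2d(samples, query, bandwidth=1000.0)
```

With the 500 m bandwidth the second query point, 600 m from the nearest sample, receives zero density, while the 1 km bandwidth spreads some mass to it and lowers the peak at the first point. This is the smoothing trade-off described above.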
  

### Process for calculating the KDE
 
 1. For a stream of data points with timestamps and coordinates (for both buses and taxis), group them into time windows of 15 minutes. 
 
 2. For each group of data points in the time window, fit a KDE.
 
 3. As the KDE is a continuous distribution, after fitting a model it is evaluated at discrete locations sampled uniformly at intervals of 100m to generate the KDE plots. 
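The steps above can be sketched as follows. The column names (`timestamp`, `x`, `y`), the sample records and the helper `make_grid` are assumptions made for illustration; only the 15-minute window and the 100 m spacing come from the description.

```python
import numpy as np
import pandas as pd

# Hypothetical records: a timestamp plus projected coordinates in metres.
records = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-03-14 08:01", "2017-03-14 08:07", "2017-03-14 08:14",
        "2017-03-14 08:16", "2017-03-14 08:29",
    ]),
    "x": [366500.0, 366900.0, 367200.0, 370000.0, 370400.0],
    "y": [149200.0, 149300.0, 149100.0, 151000.0, 151200.0],
})

# Step 1: label each record with the start of its 15-minute window.
records["window"] = records["timestamp"].dt.floor("15min")
windows = {
    start: group[["x", "y"]].to_numpy()
    for start, group in records.groupby("window")
}

# Step 2 would fit one KDE to each array in `windows`.

# Step 3: a regular grid of evaluation points spaced 100 m apart.
def make_grid(x_min, x_max, y_min, y_max, step=100.0):
    xs = np.arange(x_min, x_max + step, step)
    ys = np.arange(y_min, y_max + step, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

grid = make_grid(366000.0, 367000.0, 149000.0, 149500.0)
```

Evaluating each window's KDE at the rows of `grid` yields one density image per 15-minute window, which is the kind of frame shown in the plots and videos.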

<table width="100%">
<tr>
<td> <center> Sampled locations </center> </td>
<td> <center> Generated KDE </center> </td>
</tr>
<tr> <td> <img src="images/kde_sample.png" width=400px> </td> 
<td> <img src="images/bus_14_2015.jpg" width=400px> </td>
</tr>
</table>

### Videos of KDE  

You can find the videos for [KDE for Bus with Bandwidth of 1 KM](https://www.youtube.com/watch?v=d6Tnzi37iLw) and
[KDE for Taxis with Bandwidth of 1 KM](https://youtu.be/C0NoSleuWt4)

<table width="100%">
<tr>
   <th>Taxi KDE for five days with 500 meters bandwidth</th>
   <th>Bus KDE for five days with 500 meters bandwidth</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_five_days_500.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_five_days_500.mp4" type="video/mp4">
</video>
</td>
</tr>


</table>



## Outliers

### Outlier 1: Changi Airport

One major outlier for taxis is Singapore's international airport, **Changi Airport**. Changi Airport is one of the biggest areas of concentration for taxis, as there is always demand. At the same time, not many buses go to this destination. Thus the density there is always high for taxis and low for buses. In our initial data exploration, this location heavily skewed our analysis, as it dominated all the other locations in Singapore. Hence we decided to remove the taxi locations at the airport from our analysis. Once this outlier is removed, we can see more hotspots for taxis.

<table width="100%">
<tr>
<td> <center> Taxi KDE  </center> </td>
<td> <center> Without Airport </center> </td>
</tr>
<tr> <td> <img src="images/taxi_heat.png" width=400px> </td> 
<td> <img src="images/taxi_na_heat.png" width=400px> </td>
</tr>
</table>

### Outlier 2: Sin Ming Area

We noticed another location where many taxis would appear at certain periods of the day. This was surprising, as the location, **Sin Ming Road**, is an industrial area. Upon further investigation, we realized that the location is a taxi maintenance hub that services all the major taxi providers. Taxis go there for repairs and fuel top-ups. Coincidentally, the Land Transport Authority has its taxi service office located there as well.

<table width="100%">
<tr>
<td> <center> Taxi KDE outlier #2  </center> </td>
<td> <center> Sin Ming area from Google Maps </center> </td>
</tr>
<tr> <td> <img src="images/taxi_out2.jpg" width=400px> </td> 
<td> <img src="images/sinming.png" width=400px> </td>
</tr>
</table>



## Analysis of Weekday Taxi and Bus Distribution

We chose *14-17 March* for the analysis of the taxi and bus distributions. We excluded 18-19 March as they fall on a weekend. A weekend vs weekday comparison is done in a later section.


First we plot the average number of buses and taxis for every 15-minute period throughout the day.

<img src="./images/tcount.png" width="800pix">

Immediately, we notice a huge difference between 12am and 4am. This is because buses in Singapore do not operate during that period. We can observe some patterns in the number of taxis vs buses. However, these numbers do not tell us the distribution across Singapore. For that we used the [KL Divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) to measure how far apart the bus and taxi spatio-temporal distributions are. Similar to generating the KDE plots, a uniform grid of locations is sampled from both the bus and taxi estimates at 1km intervals to generate a probability vector for every 15min interval. The KL divergence of the taxi distribution from the bus distribution is then computed.
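As a small illustration of this computation, the sketch below evaluates $D_{KL}(P \parallel Q) = \sum_i p_i \log \frac{p_i}{q_i}$ for two probability vectors defined on the same grid. Which of the two distributions plays the role of $P$, and the small constant `eps` added to avoid division by zero in empty grid cells, are assumptions of this example rather than details taken from the analysis.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(P || Q) in nats for two non-negative vectors on one grid."""
    # Adding eps keeps empty grid cells from producing log(0) or division by 0.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    # Normalise so that both vectors sum to one.
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical densities sampled at the same three grid cells.
taxi = [0.1, 0.4, 0.5]
bus = [0.3, 0.3, 0.4]

taxi_vs_bus = kl_divergence(taxi, bus)
bus_vs_taxi = kl_divergence(bus, taxi)
```

The two results differ, which shows that the KL divergence is not symmetric; the direction of comparison has to be fixed once and used consistently across all time windows.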

<img src="./images/KL1.png" width="1000pix">

We used an average smoothing window of size 4 (4 x 15mins = 1hr), as the original KL divergence was noisy. The noise arises because the bus and taxi APIs report data at different rates. For this plot, the larger the value of the KL divergence, the greater the dissimilarity between the 2 distributions. The target distribution is the bus distribution, so the KL divergence reports the distance of the taxi distribution to the bus distribution.
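A moving average of this kind can be sketched in a few lines. The input values are made up, and the handling of the series edges (here only full windows are kept) is an assumption, as the text does not say how the ends were treated.

```python
import numpy as np

def moving_average(values, window=4):
    """Average each run of `window` consecutive values (full windows only)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical KL divergence values, one per 15-minute interval.
kl_series = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0, 5.0, 9.0]
smoothed = moving_average(kl_series, window=4)
```

With a window of 4, each smoothed value summarises one hour of 15-minute estimates, so short spikes caused by a single sparse snapshot are damped.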

Here is a plot of the KL divergence for weekdays without the period of 0000 to 0400.
<img src="./images/KL2.png" width="800pix">

We can see roughly 6 distinct periods.

1. 12am to 7am
2. 7am to 11am
3. 11am to 3pm
4. 3pm to 7pm
5. 7pm to 10pm
6. 10pm to 12am

### Analysis of the periods

### 00:00 to 07:00 - Wee Hours

<table width="100%">
<tr><th>Taxi Traffic during wee hours</th><th>Bus Traffic during wee hours</th></tr>
<tr>
<td valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_wee_hours.mp4" type="video/mp4">
</video>
</td>
<td valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_wee_hours.mp4" type="video/mp4">
</video>
</td>
</tr>
</table>



**Bus**

The public transport network in Singapore does not run 24/7, hence there are no buses from 1am to 5am every day. However, bus locations are captured in those wee hours, as indicated by the bright spots in the bus video. We don't have enough evidence to conclude anything about these bright spots, but we speculate that they are buses whose location transponders are not turned off when they are parked for the night.

**Taxi**

This is the period where taxis become the proxy for public transport demand. The bright spots in the taxi video are the areas where taxis become available for a ride. We can see a pattern near the city areas, where taxis often become available during this period. If we include the airport, we can see the same pattern there as well. As there are meagre public transport facilities during the night, people generally commute via taxis and Uber. There is a nightly surcharge, which is regulated.

**Bus vs Taxi**
<img src="./images/KL0.png" width="400pix">

There is not much to compare during this period, as the two distributions are very different.

### Known Taxi Cycle in Singapore

Before we start the deeper analysis, it is important to know the taxi supply cycle. In Singapore, taxi fares are regulated and there is a fixed pricing structure that all drivers have to adhere to. There are additional surcharges through the day, and it is commonly observed that taxi drivers have adapted their routines to maximize the surcharges, i.e. taxi supply increases when there are additional surcharges that the customer has to pay. The pricing structure can be found [here](http://www.taxisingapore.com/taxi-fare/).  

We found a few sources online describing the taxi drivers daily trend. Here is a table on the taxi-ridership info on different time period. [Source-Quora](https://www.quora.com/Why-do-taxis-in-Singapore-change-shift-at-the-same-time).

| Time Period        | Surcharge                    | Potential Rider's Requests                                                                 | Trip Type           |
|--------------------|------------------------------|--------------------------------------------------------------------------------------------|---------------------|
| 5 AM - 6 AM        | YES - MIDNIGHT SURCHARGE     | Shift/service workers from suburbs to  different areas of the city                         | Long Trip           |
| 6 AM- 7:30 AM      | NO                           | School going / morning business operators                                                   | Short Trip          |
| 7:30 AM - 9:30 AM  | YES - MORNING PEAK SURCHARGE | Office going / tourist                                                                      | Depends on the Need |
| 9:30 AM - 11:00 AM | NO                           | Miscellaneous                                                                              | Depends on the Need |
| 11 AM - 3 PM       | NO                           | Miscellaneous/ lunch goers                                                                 | Short/Medium        |
| 3 PM - 5 PM        | NO                           | Shopping malls and tourist places                                                          | Depends on the Need |
| 5 PM - 6 PM        | NO                           | Early knockoff workers (mostly from city  to suburbs)                                      | Long Trip           |
| 6 PM - 12 AM       | YES - EVENING PEAK SURCHARGE | Riders mostly from city travelling to suburbs                                              | Long Trip           |
| 12 AM - 5 AM       | YES - MIDNIGHT SURCHARGE    | Adhoc requests around the city mostly riders  originating from city center/ Tourist places | Depends on the need |

A single taxi vehicle might have multiple drivers. Once or twice a day, the drivers swap in a process commonly called a *shift change*. During shift changes, the supply of taxis drops drastically; this is discussed [here](https://www.reddit.com/r/singapore/comments/4kga3c/beating_change_shift_taxis_drivers_at_game/#bottom-comments). We will talk more about shift changes later in a different section.

### 7:00 to 11:00 - The Office Hours

**Bus**

Buses follow a schedule. We don't see much change in the bus allocation (even the weekday and weekend patterns look similar). However, we found the bus data at 10 AM to be inconsistent with the previous window (09:45 AM) and the next window (10:15 AM) on the 15th, 16th, 17th, 18th and 19th. This anomaly is probably due to the bus API, as we consistently see fewer buses being reported at 10 AM on all days.


**Taxis**

<table width="100%">
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0700.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0715.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0730.jpg">
</td>
</tr>
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0745.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0800.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0815.jpg">
</td>
</tr>
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0830.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0845.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0900.jpg">
</td>
</tr>
</table>


As already mentioned, taxis show up in the data only when they are available. These 9 frames show the taxi availability during each 15-minute window from 7:00 AM to 9:00 AM. The snapshots were taken on a weekday. Initially, taxis seem to be available on the outskirts of the city, but availability progresses towards the city centre as time passes. At 9:00 AM we see a lot of taxis available in the CBD but not on the outskirts. We infer that people use taxis to travel from residential areas to commercial areas, and that they are generally office workers.
 
During the weekend (18, 19) the distribution of taxis in this period seems to be dispersed among different neighborhoods in Singapore, further supporting this hypothesis.

We also see the Sin Ming outlier in effect, as this is when most drivers go there for fuel before starting their day.


**Bus vs Taxi**

<img src="./images/KL3.png" width="400pix">

What we see here is a pattern where the taxi distribution follows the bus distribution. As time progresses from 7 AM to 11 AM, the distance, and hence the dissimilarity, between the two distributions reduces. We believe that at this time there is a lot of inbound movement of both buses and taxis towards the city, which means there is demand for transportation coming into the city.

<table width="100%">
<tr>
   <th>Taxi Distribution between 11 AM - 3 PM</th>
   <th>Bus Distribution between 11 AM - 3 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_mid_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_mid_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



### 11:00 to 15:00 - AM Off Peak

<img src="./images/KL4.png" width="400pix">

The distance between the two distributions drops. We speculate that this is because it is the off-peak period, so there are no specific concentrations of taxi demand. The taxis start to ply routes similar to those of the buses, since buses meet the baseline demand. However, we would need more data to support this concretely.

<table width="100%">
<tr>
   <th>Taxi Distribution between 3 PM - 4 PM</th>
   <th>Bus Distribution between 3 PM - 4 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_dull_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_dull_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



### 15:00 to 19:00 - Start of PM rush hour

<img src="./images/KL5.png" width="400pix">

**Bus vs Taxi**

15:00 to 16:00 is a time of slow to no business for taxis. Taxi drivers generally use this timeslot for personal breaks or meals wherever they happen to be. They may also get back to work at this point or change shifts. That is why we see available taxis dispersed all around the city, with one bright spot in the northern Singapore area of Yishun. Hence we see the KL divergence between the two distributions at its lowest.

From 17:00 onwards the PM rush starts to kick in and the distributions start to converge, as taxis appear at different places to pick up the early knock-off workers and customers from the malls, as suggested in the taxi cycle table.

<table width="100%">
<tr>
   <th>Taxi Distribution between 7 PM - 10 PM</th>
   <th>Bus Distribution between 7 PM - 10 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_eve_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_eve_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



### 19:00-22:00 

<img src="./images/KL6.png" width="400pix">

This is the time when taxis serve as proxies for public transport due to the demand from city office workers. Between 7 PM and 9 PM we see a lot of taxi availability within the city limits, and after 9 PM we see taxi availability dispersed across the city. Hence the divergence increases initially from 7 PM to 9 PM but eventually falls after 9 PM. During this time window (9 PM - 10 PM) the taxi distribution can be approximated by the bus distribution, which is also spread out across the suburbs.

<table width="100%">
<tr>
   <th>Taxi Distribution between 10 PM - 12 AM</th>
   <th>Bus Distribution between 10 PM - 12 AM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_night_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_night_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



### 22:00-00:00 Night Time
<img src="./images/KL7.png" width="400pix">

Continuing the trend from the previous time window, taxis are widely spread across the suburbs at 10 PM, but availability converges towards the center of the city by midnight.

Why do taxis move towards the city by midnight? Most of the suburban destinations a taxi reaches before its return trip to the city are residential areas, which do not have much potential for business. So we see inbound traffic towards the city: a hired taxi either completes its ride to the suburb and ends the day's business, or continues back to the city proper in search of business.

During this time period the KL divergence reduces, as the taxis and buses seem to be dispersed across all parts of the city. After 12 AM we see a steep increase in the KL divergence between the bus and taxi distributions because bus services cease by that time.

*The daily cycle then repeats.*


### Weekday vs Weekend

**Taxi**
<img src="./images/KL20.png" width="800pix" >
The taxi distributions differ significantly in the AM peak hour period. This further supports the hypothesis that taxis are meeting the demand of city-bound office workers. Surprisingly, there is no similar difference in the PM peak hour.

**Bus**
<img src="./images/KL21.png" width="800pix" >
There are no periods where the weekend buses differ much from the weekday buses. The initial difference is due to buses starting later on weekends.

## Further Studies

While wondering why there are no night-time public transport facilities when there is excellent public transport connectivity in the daytime, we ran into this [Reddit](https://www.reddit.com/r/singapore/comments/2jz2f5/why_is_singapores_night_public_transportation_so/) post.


## Conclusion

1. Taxis are definitely filling the gaps left by the public transport (bus) service. We identified this pattern in the midnight period (12 AM - 7 AM), the morning (7 AM - 11 AM) and the evening (7 PM - 12 AM).

2. There are not many public transport services originating from the city to the suburbs in the evening. 

3. From the KL divergence between the weekday and weekend taxi distributions, we conclude that office goers take taxis in the morning and use public transport services at night.



## Next Steps

1. Predict taxi demand from the existing distributions of buses and taxis.
2. Recommend that the Singapore transport authority introduce more public transport services during the night.
3. Suggest introducing dynamic pricing for taxis, both to compete with ride-sharing services and to adjust taxi supply to better meet demand, for example during the "shift-change" period.
