In [6]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Show/Hide Code."></form>''')

# How Are Taxis Enhancing Singapore's Public Transport System?

## Introduction
Bus and rail systems have fixed routes and schedules, so their coverage patterns are deterministic and meet the general transportation demand. Taxicabs and ride-sharing services have no fixed routes or schedules; in some sense, they meet the "ad hoc" demand. The goal of this project is to see whether the supply pattern of taxicabs fills the gaps in the public transport network, thereby enhancing it. 
The study is done for Singapore, a city-state that provides plentiful data on its public transport network. While public transport modelling has been done before, to the best of our knowledge there has not been an analysis of taxis and public transport together.

## Existing Work

1. Singapore in Motion: Insights on Public Transport Service Level Through Farecard and Mobile Data Analytics, IBM 2016
http://www.kdd.org/kdd2016/papers/files/SingaporeInMotion_v3.pdf
2. Time-Series Data Mining in Transportation: A Case Study on Singapore Public Train Commuter Travel Patterns, SMU 2014 

We were inspired by two other projects, both open source and built for commuters to use:
   
   
   1. [Taxi Router SG](https://github.com/cheeaun/taxirouter-sg) by [Lim Chee Aun](https://twitter.com/cheeaun). It showcases the following details:
       
       * Taxi stands in Singapore.
       * All available taxis across the whole of Singapore.
       * How many taxis are available around the commuter.
       * How far the nearest taxi stand is from the commuter.

   2. [TaxiSg](http://uzyn.github.io/taxisg/) by [U-Zyn Chua](https://twitter.com/uzyn). This app helps the commuter understand the distribution of taxis over a historic window (ranging from 15 minutes to 2 weeks of historic data).
   
   
   Both apps get their data from a government organisation, the Land Transport Authority (LTA) of Singapore. The LTA publishes a wide variety of transport-related datasets (static and dynamic / realtime) on its DataMall platform for enterprises, third-party developers, and other members of the public to promote citizen co-creation of innovative and inclusive transport solutions. 

## Problem Statement

   We wanted to answer the following questions and draw inferences from our results.
   
   1. Is there a difference between the density distribution of taxis (ad hoc requests made by commuters) and that of the bus/rail network (planned transportation network) over time?
   
   2. Are taxis really filling the gaps in the public transport system (for example, fully loaded buses)?
   
   3. Does the distribution of taxis relative to the public transport system change between weekdays and weekends?
  
   

## Data Sets
Two dynamic data sets were collected using the API between 03/15/2017 and 03/19/2017:

1. Taxi Availability 
2. Bus Arrival

Two static data sets were also collected:

1. Urban Redevelopment Authority (URA) Master Plan 2014 Subzone boundaries
2. MRT train schedule (work in progress)

The URA zone codes were used to determine land use, for example whether an area is commercial or residential.
<img src="images/ura_2014.png" width='500pix'>

## Taxi Availability Dataset

### Description
Returns location coordinates of all Taxis that are currently available for hire. Does not include "Hired" or "Busy" Taxis. We polled the API every *1min* for this dataset. A total of **40982444** location records were collected.
                                                                                                             
| **Attributes** 	| **Description**                                                    	|
|----------------	|--------------------------------------------------------------------	|
| Latitude       	| provides the latitude of the location where the taxi is available  	| 
| Longitude      	| provides the longitude of the location where the taxi is available 	| 
| Date           	| provides the date when the taxi was available                      	| 
| Time           | provides the time when the taxi was available                      | 



## Bus Arrival Dataset

### Description
Returns real-time Bus Arrival information for Bus Services at a queried Bus Stop, including: Estimated Time of Arrival (ETA), Estimated Location, Load info. We polled the API for every bus stop in Singapore every *6 min* for this dataset. A total of **6394212** bus stop arrival records were collected.

                                                                                                                 
| **Attributes**    | **Description**                                                       |
|----------------   |--------------------------------------------------------------------   |
| ServiceNo         | Bus service number   | 
| Status         | Bus Status    | 
| Latitude | Estimated location coordinates of bus |
| Longitude | Estimated location coordinates of bus |
|Load|  Bus occupancy / crowding: Seats Available, Standing Available, Limited Standing|


## Data Collection

    

### LTA Datamall

   The LTA publishes a wide variety of transport-related datasets (static and dynamic / realtime) on [DataMall](https://www.mytransport.sg/content/mytransport/home/dataMall.html) for enterprises, third-party developers, and other members of the public to promote citizen co-creation of innovative and inclusive transport solutions. 
 
Use of LTA’s datasets and APIs on DataMall constitutes acceptance of the [Singapore Open Data Licence](https://www.mytransport.sg/content/mytransport/home/dataMall/opendatalicence.html) and the [API Terms of Service](https://www.mytransport.sg/content/mytransport/home/dataMall/apitermsofservice.html).

### Dates of Collection
   We collected data continuously from 14th March 2017 to 19th March 2017 for both buses and taxis.

## Modelling transportation flow

### Projection from Lat Lon to UTM
   We changed the projection from regular (latitude, longitude) coordinates to Universal Transverse Mercator (UTM) for this project. You can find the basics of UTM [here](http://gisgeography.com/utm-universal-transverse-mercator-projection/). The primary reason is to avoid representing geolocations in a distorted manner and to calculate densities with bandwidths expressed in meters or kilometers.
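
The projection step can be sketched as follows. This is a minimal, hand-written version of the standard transverse Mercator series on the WGS84 ellipsoid, not the code used in the study; in practice a projection library would usually do this. The function name `latlon_to_utm` is hypothetical, and UTM zone 48 (central meridian 105°E, northern hemisphere) is an assumption chosen because it covers Singapore.

```python
import math

# WGS84 ellipsoid constants
A_AXIS = 6378137.0                  # semi-major axis in meters
FLATTENING = 1 / 298.257223563
E2 = FLATTENING * (2 - FLATTENING)  # first eccentricity squared
EP2 = E2 / (1 - E2)                 # second eccentricity squared
K0 = 0.9996                         # UTM scale factor on the central meridian


def latlon_to_utm(lat_deg, lon_deg, zone=48):
    """Project WGS84 latitude/longitude to UTM easting/northing in meters.

    Northern hemisphere only. Zone 48 is an assumption that covers Singapore.
    """
    lat = math.radians(lat_deg)
    lon0 = math.radians(zone * 6 - 183)   # central meridian of the zone
    dlon = math.radians(lon_deg) - lon0

    sin_lat, cos_lat, tan_lat = math.sin(lat), math.cos(lat), math.tan(lat)
    n = A_AXIS / math.sqrt(1 - E2 * sin_lat ** 2)
    t = tan_lat ** 2
    c = EP2 * cos_lat ** 2
    a = dlon * cos_lat

    # Meridional arc length from the equator to this latitude
    m = A_AXIS * (
        (1 - E2 / 4 - 3 * E2 ** 2 / 64 - 5 * E2 ** 3 / 256) * lat
        - (3 * E2 / 8 + 3 * E2 ** 2 / 32 + 45 * E2 ** 3 / 1024) * math.sin(2 * lat)
        + (15 * E2 ** 2 / 256 + 45 * E2 ** 3 / 1024) * math.sin(4 * lat)
        - (35 * E2 ** 3 / 3072) * math.sin(6 * lat)
    )

    easting = 500000.0 + K0 * n * (
        a
        + (1 - t + c) * a ** 3 / 6
        + (5 - 18 * t + t ** 2 + 72 * c - 58 * EP2) * a ** 5 / 120
    )
    northing = K0 * (
        m
        + n * tan_lat * (
            a ** 2 / 2
            + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
            + (61 - 58 * t + t ** 2 + 600 * c - 330 * EP2) * a ** 6 / 720
        )
    )
    return easting, northing


# Two illustrative points 0.01 degrees of latitude apart
e1, n1 = latlon_to_utm(1.35, 103.80)
e2, n2 = latlon_to_utm(1.36, 103.80)
print(round(n2 - n1, 1), "meters between the two points")
```

Once coordinates are in meters, a bandwidth such as 500 m means the same ground distance everywhere in the study area, which is what the density estimation below relies on.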

### Density Estimation
       
   To answer the questions in our problem statement, we started with a kernel density estimation (KDE) of taxis and buses over a 15-minute window slid across a period of five days. KDE helps us identify the latent distribution from which the taxi and bus data originate. Knowledge of this distribution would also help us predict taxi availability in the future based on the historic allocation of taxis and buses. We initially took a random 15-minute sample of bus data and taxi data (for the same dates) and plotted the distribution of the sample. Before getting into the design decisions behind the algorithm, we identified an outlier in the dataset.
   
   [image goes here]
   

### Outlier Removal 

  Changi Airport is one of the biggest taxi hubs in Singapore. At the same time, not many buses go to this destination; our reasoning is that commuters use the train to travel to the airport. If we compared the taxi and bus distributions with the airport included, our answers to the questions in the problem statement would be biased around the airport location. Hence we decided to remove taxis in the airport area. With the airport area removed, many other hotspot locations become visible in the sampled dataset.
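
A minimal sketch of this filtering step, assuming points are still in (latitude, longitude) form. The bounding box limits, the sample coordinates, and the helper names `in_airport_area` and `remove_airport_points` are illustrative assumptions, not the values or code used in the study.

```python
# Hypothetical bounding box around Changi Airport (degrees).
# These limits are illustrative assumptions only.
AIRPORT_BOX = {"lat_min": 1.33, "lat_max": 1.39, "lon_min": 103.97, "lon_max": 104.01}


def in_airport_area(lat, lon, box=AIRPORT_BOX):
    """Return True if the point falls inside the airport bounding box."""
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])


def remove_airport_points(points, box=AIRPORT_BOX):
    """Keep only the (lat, lon) points that lie outside the airport box."""
    return [(lat, lon) for lat, lon in points if not in_airport_area(lat, lon, box)]


# Approximate, made-up sample locations
sample = [
    (1.3644, 103.9915),  # near Changi Airport
    (1.2840, 103.8510),  # near the CBD
    (1.4290, 103.8350),  # near Yishun
]
filtered = remove_airport_points(sample)
print(len(sample), "points before,", len(filtered), "after")
```

A rectangular box is the simplest choice; a polygon of the airport subzone would be more precise at the cost of a point-in-polygon test.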
  
 KDE After Removal of Airport: (Buses vs Taxis)
 
<table width="100%">
<tr>
   <th>Bus Distribution</th>
   <th>Taxi Distribution after removing airport datapoints</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<img src="./images/bus_14_2015.jpg">
</td>
<td align="left" valign="top" width="50%">
<img src="./images/taxi_14_2015.jpg" >
</td>
</tr>
</table>
 
 
   
  
  
  
  
  


### Different Kernel Density Estimation

  
  Here is a primer on [Kernel Density Estimation](https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/). We chose KDE because it is non-parametric: it attempts to estimate the density directly from the data without assuming a particular form for the underlying distribution, and we cannot assume that real-life events follow a specific distribution. KDE does, however, leave us with two design questions:

   1. What bandwidth (window) should we use?
   2. What kernel function should we use to weight the spatial points that fall within the chosen bandwidth?
   
   The infographic below shows the factors to consider when designing a KDE.
   
  <a href="http://research.cs.tamu.edu/prism/lectures/pr/pr_l7.pdf"> <img src="images/KernelDensityInfographic.png"  width="700px"></a>
  

  
  The infographic shows the simplest kernel function of all, the Parzen window. However, the Parzen window weights every point within the window equally, regardless of its distance from the center, and it yields density estimates that have discontinuities. So we had to choose a kernel that smooths out these drawbacks. Though there are many [kernels](https://goo.gl/fCN79N), we decided to go with the Epanechnikov kernel, which is optimal in the sense of minimizing the asymptotic mean integrated squared error.
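
To make the difference concrete, here is a small sketch of the two kernels in one dimension, written in their standard textbook forms. The function names are our own for illustration.

```python
def uniform_kernel(u):
    """Parzen window: constant weight inside |u| <= 1, zero outside."""
    return 0.5 if abs(u) <= 1 else 0.0


def epanechnikov_kernel(u):
    """Epanechnikov kernel: weight falls off quadratically to zero at |u| = 1."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


# u is the distance from the kernel center divided by the bandwidth
for u in (0.0, 0.5, 0.9, 1.1):
    print(u, uniform_kernel(u), epanechnikov_kernel(u))
```

The uniform kernel jumps from 0.5 to 0 at the edge of the window, which is where the discontinuities come from, while the Epanechnikov weight shrinks smoothly to zero as a point approaches the edge.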



  <img src="images/600px-Kernels.svg.png">
  
  
  
  Having decided on a kernel, we have to decide on a free parameter, the bandwidth. If we increase the bandwidth (a radius for the spatial area), we might oversmooth the density that we are trying to estimate (even an Epanechnikov kernel with a wrong bandwidth gives a biased estimate). If we use a small bandwidth, we might undersmooth, i.e., we might get multiple spurious peaks in the spatial graph. Ideally we would find a bandwidth h that minimizes the error between the estimated density and the true density. But we don't know the true density of our spatial data, and finding it is a different problem altogether. So we decided to go with kernel bandwidths of 500 m and 1 km radius. We provide both analyses below.
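
The effect of the bandwidth can be illustrated with a one-dimensional toy example. The sample positions, the grid, and the three bandwidth values below are made up for illustration and are not taken from the taxi or bus data.

```python
def epanechnikov(u):
    """1D Epanechnikov kernel."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kde_1d(x, samples, h):
    """Density estimate at x: average of kernels of width h centred on each sample."""
    return sum(epanechnikov((x - s) / h) for s in samples) / (len(samples) * h)


def count_peaks(samples, h, grid):
    """Count strict local maxima of the estimated density along the grid."""
    dens = [kde_1d(x, samples, h) for x in grid]
    return sum(1 for i in range(1, len(dens) - 1)
               if dens[i] > dens[i - 1] and dens[i] > dens[i + 1])


# Made-up positions in meters along a line: two clusters about 3 km apart
samples = [0, 200, 400, 600, 3000, 3200, 3400, 3600]
grid = [x * 10 for x in range(-100, 501)]   # -1000 m to 5000 m in 10 m steps

peaks_small = count_peaks(samples, 50, grid)     # undersmoothed
peaks_medium = count_peaks(samples, 500, grid)   # one peak per cluster
peaks_large = count_peaks(samples, 3000, grid)   # oversmoothed
print(peaks_small, peaks_medium, peaks_large)
```

With these toy values, the 50 m bandwidth produces one peak per sample (8 peaks), the 500 m bandwidth recovers the two clusters, and the 3000 m bandwidth merges everything into a single peak.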
  
  
### Generating large scale KDE 


#### Methodology


 We have 3 steps here. 
 
 1. For a stream of data points with timestamps and UTM coordinates (for both buses and taxis), group them into time windows of 15 minutes.
 
 2. For each data point in a time window, find the surrounding points that fall within the given bandwidth (500 m or 1000 m) and score them with the kernel function (Epanechnikov). The distance between two points is the [Minkowski Distance](https://en.wikipedia.org/wiki/Minkowski_distance), which with p = 2 is simply the Euclidean distance. We used the [KDTree Algorithm](https://en.wikipedia.org/wiki/K-d_tree) to make the distance computation faster. Here is an intuitive video explanation of [how KDTree works](https://www.youtube.com/watch?v=Z4dNLvno-EY).
 
 3. After fitting the kernel density model with the given bandwidth and kernel function, we evaluate the data points under the model and calculate their total log probability.
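
The three steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: neighbours are found by brute force rather than with a k-d tree, the records are made up, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def window_start(ts, minutes=15):
    """Floor a timestamp to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def group_by_window(records):
    """Step 1: records are (timestamp, easting, northing) tuples."""
    windows = defaultdict(list)
    for ts, x, y in records:
        windows[window_start(ts)].append((x, y))
    return windows


def epanechnikov_density(point, samples, h):
    """Step 2: 2D Epanechnikov density at `point` with bandwidth h in meters.

    Brute-force neighbour search; a k-d tree would avoid scanning every sample.
    """
    px, py = point
    total = 0.0
    for sx, sy in samples:
        u2 = ((px - sx) ** 2 + (py - sy) ** 2) / (h * h)  # squared scaled Euclidean distance
        if u2 < 1.0:
            total += (2.0 / math.pi) * (1.0 - u2)         # 2/pi normalises the kernel in 2D
    return total / (len(samples) * h * h)


def total_log_probability(points, samples, h):
    """Step 3: sum of log densities of `points` under the KDE built from `samples`."""
    return sum(math.log(epanechnikov_density(p, samples, h)) for p in points)


# Made-up records: (timestamp, UTM easting, UTM northing)
records = [
    (datetime(2017, 3, 15, 7, 2), 366500.0, 149200.0),
    (datetime(2017, 3, 15, 7, 9), 366700.0, 149300.0),
    (datetime(2017, 3, 15, 7, 14), 366600.0, 149100.0),
    (datetime(2017, 3, 15, 7, 16), 371000.0, 152000.0),
]
windows = group_by_window(records)
first = windows[datetime(2017, 3, 15, 7, 0)]
score = total_log_probability(first, first, h=500.0)
print(len(windows), "windows; log probability of the first window:", score)
```

Scoring the same points that built the model keeps every density strictly positive, because each point lies at the center of its own kernel; scoring other points would need a guard against zero density outside the bandwidth.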
 
 
### Video Simulation of KDE 
-----------------------------

In [5]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi KDE for five days with 500 meters bandwidth</th>
   <th>Bus KDE for five days with 500 meters bandwidth</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_five_days_500.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_five_days_500.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



You can find the [KDE for Bus with Bandwidth of 1 KM](https://www.youtube.com/watch?v=d6Tnzi37iLw) and
[KDE for Taxis with Bandwidth of 1 KM](https://youtu.be/C0NoSleuWt4)

## Inference

### Comparison of Transport Densities

**12 AM to 6 AM - Wee Hours**

In [3]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi Traffic during wee hours </th>
   <th>Bus Traffic during wee hours</th>
</tr>
<tr>
<td valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_wee_hours.mp4" type="video/mp4">
</video>
</td>
<td valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_wee_hours.mp4" type="video/mp4">
</video>
</td>
</tr>
</table>



The public transport system in Singapore does not run 24/7, but it does offer minimal services around midnight between important interchanges and terminals. An interchange is a larger junction than a terminal. Here are the most important bus stations in Singapore. 

<img src="./images/Bus_terminus_interchange.jpg" height="750pix" width="750pix">


**Bus Movements**

The bright spots in the wee-hours bus traffic video are bus movements. We know that there are bus services between important interchanges in the city, but we don't have enough evidence to conclude that the bright dots in the video confirm the traffic patterns we expect.

**Taxi Movements**

 This is the period when taxis become a proxy for the public transport network. The bright spots in the taxi video are the areas where taxis become available for a ride. We can see a pattern near the CBD, where taxis often become available during this period. If we include the airport, we see the same pattern there as well. As public transport is meagre during the night, people generally commute via taxis and Uber. There is a night surcharge regulated by the government; you can find the details [here](http://www.taxisingapore.com/taxi-fare/). As a result, taxis are more expensive during the night hours.
 
 

Here is a table of taxi ridership by time period. Sources: [Quora](https://www.quora.com/Why-do-taxis-in-Singapore-change-shift-at-the-same-time) and [Taxi Surcharge info](http://www.taxisingapore.com/taxi-fare/).

| Time Period        | Surcharge                    | Potential Rider's Requests                                                                 | Trip Type           |
|--------------------|------------------------------|--------------------------------------------------------------------------------------------|---------------------|
| 5 AM - 6 AM        | YES - MIDNIGHT SURCHARGE     | Shift/Service Workers from Suburbs to  different areas of the city                         | Long Trip           |
| 6 AM- 7:30 AM      | NO                           | School Going/ Morning business operators                                                   | Short Trip          |
| 7:30 AM - 9:30 AM  | YES - MORNING PEAK SURCHARGE | Office Going/ Tourist                                                                      | Depends on the Need |
| 9:30 AM - 11:00 AM | NO                           | Miscellaneous                                                                              | Depends on the Need |
| 11 AM - 3 PM       | NO                           | Miscellaneous/ Lunch goers                                                                 | Short/Medium        |
| 3 PM - 5 PM        | NO                           | Shopping Malls and Tourist Places                                                          | Depends on the Need |
| 5 PM - 6 PM        | NO                           | Early knockoff Workers (mostly from city to suburbs)                                       | Long Trip           |
| 6 PM - 12 AM       | YES - EVENING PEAK SURCHARGE | Riders mostly from City travelling to suburbs                                              | Long Trip           |
| 12 AM - 5 AM       | YES - MID NIGHT SURCHARGE    | Adhoc Requests around the city mostly riders  originating from city center/ Tourist places | Depends on the need |


 
 While wondering why there are no night public transport services despite the excellent daytime connectivity, we ran into this [Reddit](https://www.reddit.com/r/singapore/comments/2jz2f5/why_is_singapores_night_public_transportation_so/) post. Sometimes commuters cannot get a taxi during the night hours either, because the drivers are on a [shift-change](https://www.reddit.com/r/singapore/comments/4kga3c/beating_change_shift_taxis_drivers_at_game/#bottom-comments). We will talk more about shift changes in a later section.
 


**7:00 AM to 11:00 AM - The Office Hours**

In [1]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi movement during office hours</th>
   <th>Bus movement during office hours</th>
</tr>
<tr>
<td valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_peak_hours.mp4" type="video/mp4">
</video>
</td>
<td valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_peak_hours.mp4" type="video/mp4">
</video>
</td>
</tr>
</table>



Here is an idea of where people live in Singapore and how the Land is utilized in Singapore.   
   
   <img src="./images/SG-Land-Use-Map-2030.png" height="750pix" width="750pix">


**Taxis:**


<table width="100%">
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0700.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0715.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0730.jpg">
</td>
</tr>
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0745.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0800.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0815.jpg">
</td>
</tr>
<tr>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0830.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0845.jpg">
</td>
<td align="left" valign="top" width="30%">
<img src="./images/demo/taxi_15_0900.jpg">
</td>
</tr>
</table>


 As already mentioned, taxis show up in the data only when they are available. These 9 frames show taxi availability for consecutive 15-minute windows from 7:00 AM to 9:00 AM on a weekday. Initially, taxis seem to be available on the outskirts of the city, but availability progresses towards the city centre as time passes. At 9:00 AM we see a lot of taxis available in the CBD but not on the outskirts. We infer that people, generally office-goers, use taxis to travel from residential areas to commercial areas. 
 
But during the weekend (18th and 19th), the distribution of taxis seems to be dispersed among different neighborhoods in Singapore. 

**Buses:**

 Buses follow a schedule. We don't see much change in the bus allocation (even the weekday and weekend patterns look similar). However, we found the bus data at 10 AM to be inconsistent with the previous window (09:45 AM) and the next window (10:15 AM) on all dates (15th to 19th).
 

**Buses Vs Taxi**

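We used the [KL Divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) to measure the dissimilarity between the latent bus and taxi distributions. Strictly speaking, KL divergence is asymmetric and therefore not a true distance metric; we use "distance" loosely below.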
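
As a rough illustration of the measure, the sketch below computes the KL divergence between two discrete distributions. It assumes both densities have been evaluated on the same grid of cells; how the study discretised its densities is not stated here, so the grid values, the `eps` floor, and the function name are illustrative assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions given as equal-length lists.

    Inputs are normalised to sum to 1; eps puts a floor under Q to avoid
    division by zero in cells where Q has no mass.
    """
    p_sum, q_sum = float(sum(p)), float(sum(q))
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / p_sum, qi / q_sum
        if pi > 0:                       # cells with no P mass contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


# Made-up densities over four grid cells
taxi_density = [0.10, 0.40, 0.30, 0.20]
bus_density = [0.25, 0.25, 0.25, 0.25]

print("KL(taxi || bus):", kl_divergence(taxi_density, bus_density))
print("KL(bus || taxi):", kl_divergence(bus_density, taxi_density))
```

The two directions give different values, which is why the order of the distributions should be fixed when comparing KL values across time windows.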


<img src="./images/KL711.jpg" width="60%">


What we see here is a pattern where the taxi distribution follows the bus distribution. As time progresses from 7 AM to 11 AM, the distance, and hence the dissimilarity, between the two distributions reduces. We believe that at this time there is a lot of inbound movement of both buses and taxis towards the city, which means there is demand for transportation into the city.

In [7]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi Distribution between 11 AM - 3 PM</th>
   <th>Bus Distribution between 11 AM - 3 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_mid_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_mid_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



<img src="./images/KLD11-3.jpg" width="60%">



The KL divergence graph shows that the dissimilarity between the taxi and bus distributions reduces over this period. But it is hard to draw the same conclusion from the data itself.

In [8]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi Distribution between 3 PM - 4 PM</th>
   <th>Bus Distribution between 3 PM - 4 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_dull_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_dull_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



<img src="./images/KLD3-4.jpg" width="60%">

This is the time of slow to no business. Generally, taxi drivers use this time slot for personal breaks or a meal, wherever they happen to be. They may also get back to work at this point or change shifts. That is why we see available taxis dispersed all around the city, with one bright spot in Yishun, in the north of Singapore. We will come back to this at the end. We can also see the bus distribution serving as a good proxy for the taxi distribution at this time; hence the KL divergence between the two distributions is at its lowest.


In [10]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi Distribution between 7 PM - 10 PM</th>
   <th>Bus Distribution between 7 PM - 10 PM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_eve_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_eve_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



<img src="./images/KLD7-10.jpg" width="60%">

This is the time when taxis serve as proxies for public transport due to demand. Between 7 PM and 9 PM we see a lot of taxi availability within the city limits, and after 9 PM taxi availability is dispersed across the city. Hence the KL divergence increases initially from 7 PM to 9 PM and then reduces after 9 PM. During the 9 PM - 10 PM window, the taxi distribution can be approximated by the bus distribution, which is also spread out across the suburbs. 

In [11]:
%%HTML
<table width="100%">
<tr>
   <th>Taxi Distribution between 10 PM - 12 AM</th>
   <th>Bus Distribution between 10 PM - 12 AM</th>
</tr>
<tr>
<td align="left" valign="top" width="50%">
<video width="475" height="300"  controls>
  <source src="./videos/taxi_night_hours.mp4" type="video/mp4">
</video>
</td>
<td align="left" valign="top" width="50%">
<video controls width="475" height="300" >
  <source src="./videos/bus_night_hours.mp4" type="video/mp4">
</video>
</td>
</tr>

</table>



<img src="./images/KLD10-12.jpg" width="60%">

Continuing the trend of the previous time window, taxis are widespread across the suburbs at 10 PM, but availability converges towards the center of the city by midnight.

Why do taxis move towards the city by midnight?

  Most of the suburban destinations that taxis reach before their return trip to the city are residential areas, where there is little scope for business. So we see inbound traffic towards the city: a hired taxi either completes the ride to the suburb and ends the day's business, or continues to the city proper in search of business.
  

During this period the KL distance reduces, as taxis and buses seem to be dispersed across all parts of the city. After 12 AM we see a steep increase in the KL distance between the bus and taxi distributions, because bus services cease by that time.

[KL Distance After 12 image here]

**Weekday vs Weekend KL Divergence Plots**

Images and Explanation by Hui

**Why taxis are highly available at any given point of time in the Yishun area**

Images and Explanation by Hui

## Code Appendix

### Taxi API Code

In [5]:
#!/home/bks4line/anaconda2/bin/python
# Author : Karthik Balasubramanian

import json
import urllib
try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3
import httplib2 as http #External library
import pandas as pd
import time
from datetime import datetime
from pytz import timezone
import os
#  please get your account keys and place here
headers = { 'AccountKey' : 'XXXXX','accept' : 'application/json'}

uri = 'http://datamall2.mytransport.sg/' #Resource URL
path = 'ltaodataservice/Taxi-Availability?$skip='
fmt =  '%Y-%m-%d_%H:%M:%S'
sg = timezone('Asia/Singapore')
my_path = os.getcwd()  # current working directory; also works outside IPython, unlike %pwd
dir_path = my_path+"/data"



def get_data_from_LTA(filename):
    
    global headers,uri,path,fmt,sg,dir_path

    
    #Build query string & specify type of API call
    
    final_list = []
    target = urlparse(uri + path+str(len(final_list)))

    
    
    method = 'GET'
    body = ''

    #Get handle to http
    h = http.Http()
    
    # Obtain results
    response, content = h.request(target.geturl(),method,body,headers)

    # Parse JSON to print
    jsonObj = json.loads(content)
    
    final_list.extend(jsonObj["value"])
    
    while(len(jsonObj["value"])>0):
        target = urlparse(uri + path+str(len(final_list)))
        # print target.geturl()
        response, content = h.request(target.geturl(),method,body,headers)
        jsonObj = json.loads(content)
        final_list.extend(jsonObj["value"])
    
    
    time_now_in_sg = datetime.now(sg)
    date_and_time_ff =  time_now_in_sg.strftime(fmt)
    date_and_time = date_and_time_ff.split("_")
    date_in_sg = [date_and_time[0]]*len(final_list)
    time_in_sg =  [date_and_time[1]]*len(final_list)
    
    df = pd.DataFrame(final_list)
    df['date'] = pd.Series(date_in_sg, index=df.index)
    df['time'] = pd.Series(time_in_sg, index=df.index)
    
    if not filename:
        filename =  dir_path+"/taxi_"+date_and_time_ff+".csv"
        df.to_csv(filename)
    else:
        file_size_exceed = float(os.path.getsize(filename))/float(5e+6)
        if file_size_exceed>1.0:
            print("file_size_exceed")
            filename = dir_path+"/taxi_"+date_and_time_ff+".csv"
            print("new file name {0}".format(filename))
            df.to_csv(filename)
        else:
            print("file size not exceeded")
            df.to_csv(filename, mode='a', header=False)

    return filename


#  run the below code 

 
# starttime =  time.time()
# filename = None
# # get_data_from_LTA(filename=None)
# while True:
#     filename = get_data_from_LTA(filename)
#     starttime =  time.mktime(datetime.now().timetuple())
#     time.sleep(50.0 - ((time.time() - starttime) % 60.0))

