We use the following data from the Transport for London (TfL) APIs:

* [Bike Station Locational Data](https://api.tfl.gov.uk/bikepoint)
* [Trip Data](https://cycling.data.tfl.gov.uk/)

The station data is raw data listing the ids of all bike stations in central London along with their lat/lon coordinates.
The trip data covers individual bike trips and provides the start and end date/time of each trip together with the start and end station ids.

The data needs processing before it can be used for analysis.
In particular, the ids in the bike station locational data have the form "BikePoints_x" (see the `url` column in the output below), where x is the station id, whereas the trip data contains only the bare id x. We will convert the locational data's ids into the same form as the trip ids.
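
The id conversion can be sketched on a small made-up table. The frame `stations_sample` and its two rows are hypothetical, with ids in the prefix-underscore-number form suggested by the `url` column of the station output further down. The final cast to `int` is an optional extra step, not part of the processing function in this notebook, and is only useful if the ids must compare equal to integer trip ids.

```python
import pandas as pd

# Hypothetical sample; the real ids come from the TfL bikepoint endpoint.
stations_sample = pd.DataFrame(
    {"id": ["BikePoints_84", "BikePoints_85"], "lat": [51.515937, 51.500647]}
)

# Keep only the part after the underscore.
stations_sample["id"] = stations_sample["id"].str.split("_").str[1]
string_ids = stations_sample["id"].tolist()

# Optional (assumption): cast to int so the ids match integer trip ids.
int_ids = stations_sample["id"].astype(int).tolist()

print(string_ids)  # ['84', '85']
print(int_ids)     # [84, 85]
```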

We will also add a `demand` column to the station data, initialised to the integer 0 for every station. This value represents the excess of supply or demand of bikes at the given station.

Some end station ids in the trip data are missing, and pandas reads these as NaN. We will convert these values to -1 to signify a missing id, then cast the column back to integers.
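
As a minimal sketch of this step, the made-up frame `trips_sample` below (not real trip data) shows why the cast is needed: a column containing NaN is read as float, so the ids only become integers again after the missing values are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing end station id.
trips_sample = pd.DataFrame({"EndStation Id": [807.0, np.nan, 446.0]})
dtype_before = str(trips_sample["EndStation Id"].dtype)

# Fill the gap with the sentinel -1, then cast float -> int64.
trips_sample["EndStation Id"] = (
    trips_sample["EndStation Id"].fillna(-1).astype(np.int64)
)
end_ids = trips_sample["EndStation Id"].tolist()

print(dtype_before)  # float64
print(end_ids)       # [807, -1, 446]
```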

Finally, we will convert the start and end dates of each trip to datetime objects.
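
The date conversion can be illustrated on a single made-up row (the frame `dates_sample` is hypothetical, with strings written in the day/month/year format that the processing function below expects). Once both columns are datetimes, subtracting them gives the trip length directly.

```python
import pandas as pd

# Hypothetical sample row in the "%d/%m/%Y %H:%M" format.
dates_sample = pd.DataFrame(
    {"Start Date": ["10/03/2018 12:14"], "End Date": ["10/03/2018 12:41"]}
)

# Parse each column with an explicit format so day and month are not swapped.
for col in ["Start Date", "End Date"]:
    dates_sample[col] = pd.to_datetime(dates_sample[col], format="%d/%m/%Y %H:%M")

# Datetime columns support arithmetic: trip length in seconds.
seconds = (dates_sample["End Date"] - dates_sample["Start Date"]).dt.total_seconds()
print(seconds.tolist())  # [1620.0]
```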


In [25]:
import pandas as pd
import glob
import os 
import numpy as np
from os import listdir
from os.path import isfile, join


def getAndProcessData():
    ##  Read data on bike stations and normalise the station ids  ##
    url = "https://api.tfl.gov.uk/bikepoint"
    bikeStations = pd.read_json(url)
    bikeStations["demand"] = 0  # Excess supply/demand of bikes, initialised to 0
    # Ids look like "<prefix>_<number>"; keep the number so they match the trip ids
    bikeStations["id"] = bikeStations["id"].str.split("_").str[1]

    ##  Read every trip file in the "_TfL Cycling Data" folder next to the notebook  ##
    notebook_path = os.path.abspath("Data_Wrangling.ipynb")
    tripsPath = os.path.join(os.path.dirname(notebook_path), "_TfL Cycling Data")
    onlyfiles = [f for f in listdir(tripsPath) if isfile(join(tripsPath, f))]
    li = []
    for filename in onlyfiles:
        # Build the path from tripsPath so reading does not depend on the working directory
        df = pd.read_csv(join(tripsPath, filename), index_col=None, header=0)
        li.append(df)
    trips = pd.concat(li, axis=0, ignore_index=True)

    trips['EndStation Id'] = trips['EndStation Id'].fillna(-1)        # End stations have some empty values, fill with -1
    trips['EndStation Id'] = trips['EndStation Id'].astype(np.int64)  # Cast to int, as columns with empty values default to float
    for col in ['End Date', 'Start Date']:
        trips[col] = pd.to_datetime(trips[col], format="%d/%m/%Y %H:%M")  # Convert dates to datetime objects
    return trips, bikeStations
    


In [26]:
trips,bikeStations = getAndProcessData()
bikeStations.head()


Unnamed: 0,$type,id,url,commonName,placeType,additionalProperties,children,childrenUrls,lat,lon,demand
0,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",84,/Place/BikePoints_84,"Breams Buildings, Holborn",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.515937,-0.111778,0
1,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",85,/Place/BikePoints_85,"Tanner Street, Bermondsey",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.500647,-0.0786,0
2,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",86,/Place/BikePoints_86,"Sancroft Street, Vauxhall",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.489479,-0.115156,0
3,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",87,/Place/BikePoints_87,"Devonshire Square, Liverpool Street",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.516468,-0.079684,0
4,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",88,/Place/BikePoints_88,"Bayley Street , Bloomsbury",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.518587,-0.132053,0


In [28]:
trips.head()


Unnamed: 0,Rental Id,Duration,Bike Id,End Date,EndStation Id,EndStation Name,Start Date,StartStation Id,StartStation Name
0,73768697,1620.0,7457.0,2018-03-10 12:41:00,807,"Bevington Road West, North Kensington",2018-03-10 12:14:00,647,"Richmond Way, Shepherd's Bush"
1,73703818,180.0,10899.0,2018-03-07 15:14:00,446,"York Hall, Bethnal Green",2018-03-07 15:11:00,722,"Finnis Street, Bethnal Green"
2,73729421,1020.0,4813.0,2018-03-08 13:57:00,159,"Great Marlborough Street, Soho",2018-03-08 13:40:00,797,"Ossulston Street, Somers Town"
3,73749828,240.0,8307.0,2018-03-09 08:57:00,578,"Hollybush Gardens, Bethnal Green",2018-03-09 08:53:00,719,"Victoria Park Road, Hackney Central"
4,73822143,240.0,8433.0,2018-03-13 07:00:00,578,"Hollybush Gardens, Bethnal Green",2018-03-13 06:56:00,719,"Victoria Park Road, Hackney Central"
