As a starting point for solving the bike repositioning problem, we want to understand how the supply of and demand for bikes change at individual bike stations over time. 

We have two DataFrames: the bike station spatial data, and the trip data, whose start and end locations correspond to bike stations. 

## Processing Demand/Supply 

We need a way to estimate the demand and supply at each station at different points in time. We will take two approaches to this. 

* Demand/Supply at end of 24 hours
* Demand/Supply at end of the week 

We can estimate these values in the following way. 

* Set the demand at each station to 0.
* Iterate over the hours 0-23 (for a day) or the weekdays 0-6 (for a week).
     * At each iteration, record each bike leaving a station as an increase in demand (+1) and each bike arriving at the station as a decrease in demand (-1).
* After the final iteration, divide the accumulated demand by the number of days or weeks, respectively.

This leaves us with a DataFrame containing the bike stations' locations and their average net demand at the end of the specified time period.
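As a minimal sketch of the steps above, consider a made-up table of three trips. The station ids "A", "B" and "C" and the names `toy_trips` and `n_days` are illustrative assumptions, not part of the dataset. Net demand per station is departures minus arrivals, divided by the number of days covered:

```python
import pandas as pd

# Hypothetical toy data: three trips between stations "A", "B" and "C"
toy_trips = pd.DataFrame({
    "StartStation Id": ["A", "A", "B"],
    "EndStation Id":   ["B", "C", "A"],
})
n_days = 2  # assumed number of days the toy data spans

departures = toy_trips["StartStation Id"].value_counts()  # bikes leaving: +1 each
arrivals = toy_trips["EndStation Id"].value_counts()      # bikes arriving: -1 each

# Stations that appear on only one side count as 0 on the other
net_demand = departures.sub(arrivals, fill_value=0) / n_days
print(net_demand.sort_index())
```

In this toy case station "A" ends with positive demand (more bikes leave than arrive), "B" is balanced, and "C" has excess supply.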


We first load the bike station and trip data. 

In [5]:
import pandas as pd 
import os 
import numpy as np 

# Build the paths from separate components so they also resolve outside Windows
bikeStations = pd.read_pickle(os.path.join(os.getcwd(), 'pickles', 'bikeStations.pkl'))
trips = pd.read_pickle(os.path.join(os.getcwd(), 'pickles', 'trips.pkl'))
bikeStations.head(2)


Unnamed: 0,$type,id,url,commonName,placeType,additionalProperties,children,childrenUrls,lat,lon,demand
0,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",84,/Place/BikePoints_84,"Breams Buildings, Holborn",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.515937,-0.111778,0
1,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",85,/Place/BikePoints_85,"Tanner Street, Bermondsey",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.500647,-0.0786,0


In [2]:
trips.head(2)

Unnamed: 0,Rental Id,Duration,Bike Id,End Date,EndStation Id,EndStation Name,Start Date,StartStation Id,StartStation Name
0,73768697,1620.0,7457.0,2018-03-10 12:41:00,807,"Bevington Road West, North Kensington",2018-03-10 12:14:00,647,"Richmond Way, Shepherd's Bush"
1,73703818,180.0,10899.0,2018-03-07 15:14:00,446,"York Hall, Bethnal Green",2018-03-07 15:11:00,722,"Finnis Street, Bethnal Green"


We create a function that iterates over a specific range and accumulates the net number of bikes leaving each station. 
This function produces a demand value where a positive value signifies demand for bikes and a negative value signifies an excess supply of bikes. 

In [9]:
def processTripsOverTime(trips, bikeStations, av, periods, t):
    """Return a copy of bikeStations with the average net demand per station.

    av      -- number of days (t == "D") or weeks (t == "W") to average over
    periods -- iterable of hours (0-23) or weekdays (0-6) to accumulate
    t       -- "D" to bucket trips by hour of day, "W" to bucket by day of week
    """
    if t not in ("D", "W"):
        raise ValueError('t must be "D" or "W"')
    bikeStations = bikeStations.copy()
    for i in periods:
        # Masks selecting the trips that end / start in bucket i
        if t == "W":
            timeMaskEnd = trips['End Date'].dt.day_of_week == i
            timeMaskStart = trips['Start Date'].dt.day_of_week == i
        else:
            timeMaskEnd = trips['End Date'].dt.hour == i
            timeMaskStart = trips['Start Date'].dt.hour == i

        # Count the trips ending and starting at each station in this bucket
        endCounts = trips.loc[timeMaskEnd, 'EndStation Id'].value_counts()
        startCounts = trips.loc[timeMaskStart, 'StartStation Id'].value_counts()

        # Station ids are compared as strings against bikeStations['id']
        endCounts.index = endCounts.index.map(str)
        startCounts.index = startCounts.index.map(str)
        # Name the counts explicitly: the default name of a value_counts()
        # result differs between pandas versions
        endCounts = endCounts.rename('EndStation Id')
        startCounts = startCounts.rename('StartStation Id')

        # Left-merge the end counts so invalid ids (e.g. -1) are ignored;
        # stations with no arrivals in this bucket get a count of 0
        bikeStations = pd.merge(bikeStations, endCounts, how='left', left_on='id', right_index=True)
        bikeStations['EndStation Id'] = bikeStations['EndStation Id'].fillna(0)
        bikeStations['demand'] -= bikeStations['EndStation Id']  # arrivals create a surplus
        bikeStations = bikeStations.drop(columns='EndStation Id')

        # Same for the start counts, but added, since the bikes are leaving
        bikeStations = pd.merge(bikeStations, startCounts, how='left', left_on='id', right_index=True)
        bikeStations['StartStation Id'] = bikeStations['StartStation Id'].fillna(0)
        bikeStations['demand'] += bikeStations['StartStation Id']
        bikeStations = bikeStations.drop(columns='StartStation Id')

    # Average the accumulated demand over the number of days or weeks
    bikeStations['demand'] = bikeStations['demand'] / av
    return bikeStations

The function takes a range object to iterate over, a divisor that specifies the number of days or weeks to average over, and a string specifying whether to bucket by day of the week ("W") or by hour of the day ("D"). 
It creates a time mask at each iteration and applies it to the trips DataFrame. 
It then counts the number of trips starting and ending at each station within that time bucket. 
After the final iteration, it divides the accumulated demand values by the number of days or weeks to obtain the average. 
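To illustrate a single iteration, here is a small sketch of the hour-8 bucket on made-up data. The `toy_trips` and `toy_stations` tables are hypothetical, and the sketch uses `Series.map` in place of the function's merge step for brevity; the counting logic is otherwise the same.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the trips DataFrame
toy_trips = pd.DataFrame({
    "Start Date": pd.to_datetime(["2018-03-07 08:05", "2018-03-07 08:40", "2018-03-07 17:10"]),
    "End Date": pd.to_datetime(["2018-03-07 08:20", "2018-03-07 09:05", "2018-03-07 17:30"]),
    "StartStation Id": [84, 84, 85],
    "EndStation Id": [85, 86, 84],
})
toy_stations = pd.DataFrame({"id": ["84", "85", "86"], "demand": [0, 0, 0]})

hour = 8
# Time masks: trips that start / end during the chosen hour
start_mask = toy_trips["Start Date"].dt.hour == hour
end_mask = toy_trips["End Date"].dt.hour == hour

starts = toy_trips.loc[start_mask, "StartStation Id"].value_counts()
ends = toy_trips.loc[end_mask, "EndStation Id"].value_counts()

# Match the string ids used in the stations table
starts.index = starts.index.map(str)
ends.index = ends.index.map(str)

# Look up each station's counts; stations with no trips in the bucket get 0
ids = toy_stations["id"]
toy_stations["demand"] += ids.map(starts).fillna(0) - ids.map(ends).fillna(0)
print(toy_stations)
```

Two trips leave station 84 during hour 8 and one arrives at station 85, so the demand column becomes 2, -1 and 0. The trip that ends at 09:05 falls into the next bucket.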


We can apply the function to estimate the net demand at the end of a week and at the end of 24 hours.
First, we calculate the number of days and weeks that the trips span. 

In [10]:
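first = trips["Start Date"].min()
last = trips["Start Date"].max()
def diff(start, end):
    # Number of whole weeks between start and end (any partial week is dropped)
    x = pd.to_datetime(end) - pd.to_datetime(start)
    return int(x / np.timedelta64(1, 'W'))
weeks = diff(first,last)
months = weeks /4   # rough approximation: four weeks per month
days = weeks * 7    # days covered by the whole weeks counted above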
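
Because the week count is truncated to whole weeks, `weeks * 7` can undercount the days in the data by up to six. As a hedged alternative, the span could be measured directly with pandas `Timedelta` arithmetic. The timestamps and variable names below are made up for illustration and stand in for the minimum and maximum of `trips["Start Date"]`:

```python
import pandas as pd

# Hypothetical timestamps standing in for the first and last trip start
first_example = pd.Timestamp("2018-03-01 00:00")
last_example = pd.Timestamp("2018-03-25 12:00")

span = last_example - first_example           # a pandas Timedelta
whole_weeks = span // pd.Timedelta(weeks=1)   # floor division gives whole weeks
whole_days = span // pd.Timedelta(days=1)     # floor division gives whole days
print(whole_weeks, whole_days, whole_weeks * 7)
```

Here the span is 24.5 days, so counting days directly gives 24, while three whole weeks times seven gives 21.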

Next we apply the function over 24 hours, passing in range(0,24) and the number of days. 

In [13]:
bikeStationsDemandDaily = processTripsOverTime(trips,bikeStations,days,range(0,24),"D")
bikeStationsDemandDaily.head()

Unnamed: 0,$type,id,url,commonName,placeType,additionalProperties,children,childrenUrls,lat,lon,demand
0,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",84,/Place/BikePoints_84,"Breams Buildings, Holborn",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.515937,-0.111778,-0.093452
1,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",85,/Place/BikePoints_85,"Tanner Street, Bermondsey",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.500647,-0.0786,-0.332738
2,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",86,/Place/BikePoints_86,"Sancroft Street, Vauxhall",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.489479,-0.115156,0.375595
3,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",87,/Place/BikePoints_87,"Devonshire Square, Liverpool Street",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.516468,-0.079684,-0.132738
4,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",88,/Place/BikePoints_88,"Bayley Street , Bloomsbury",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.518587,-0.132053,-0.314286


Next we apply the function over a week. 

In [14]:
bikeStationsDemandWeekly  = processTripsOverTime(trips,bikeStations,weeks,range(0,7),"W")
bikeStationsDemandWeekly.head()

Unnamed: 0,$type,id,url,commonName,placeType,additionalProperties,children,childrenUrls,lat,lon,demand
0,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",84,/Place/BikePoints_84,"Breams Buildings, Holborn",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.515937,-0.111778,-0.654167
1,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",85,/Place/BikePoints_85,"Tanner Street, Bermondsey",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.500647,-0.0786,-2.329167
2,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",86,/Place/BikePoints_86,"Sancroft Street, Vauxhall",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.489479,-0.115156,2.629167
3,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",87,/Place/BikePoints_87,"Devonshire Square, Liverpool Street",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.516468,-0.079684,-0.929167
4,"Tfl.Api.Presentation.Entities.Place, Tfl.Api.P...",88,/Place/BikePoints_88,"Bayley Street , Bloomsbury",BikePoint,[{'$type': 'Tfl.Api.Presentation.Entities.Addi...,[],[],51.518587,-0.132053,-2.2


Finally, we save the daily and weekly estimates as pickles for later use. 



In [15]:
# Build the paths from separate components so they also resolve outside Windows
bikeStationsDemandDaily.to_pickle(os.path.join(os.getcwd(), 'pickles', 'bikeStationsDemandDaily.pkl'))
bikeStationsDemandWeekly.to_pickle(os.path.join(os.getcwd(), 'pickles', 'bikeStationsDemandWeekly.pkl'))