# Performance Indicators

## Set Up

In [185]:
import seaborn as sns

%matplotlib inline

In [244]:
%run src/data/helper.py

In [245]:
%run src/data/periods.py

In [188]:
import pickle

# Load the parsed datasets, closing each file once it has been read
with open('data/parsed/readings_weather_dataset_final.p', 'rb') as f:
    readings = pickle.load(f)
with open('data/parsed/stations_dataset_final.p', 'rb') as f:
    stations = pickle.load(f)

## Introduction

The Service Level Agreement (SLA) signed by TfL and Serco outlines 34 performance indicators (PIs) to measure the quality of the provided service. In this research we focused on the PIs related to bicycle distribution: PIs 24, 25, 26 and 27. These PIs distinguish between stations and times of day as follows:

* **Station Priority**: Each docking station was classified as **Priority 1 or Priority 2** based on its importance for the overall operation of the London Cycle Hire Scheme. The 100 most used docking stations were given the Priority 1 classification, while the remaining stations were classified as Priority 2.
* **Peak Hours**: The hours during which crowding on the public transportation system and traffic on the roads are highest are called **Peak Hours**. The SLA defines them as 7:00 to 10:00 and 16:00 to 19:00; all other times are considered non-peak hours.
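
The peak-hour definition can be sketched as a small classifier. This is only an illustration: `classify_peak` is a hypothetical name, not the `is_peaktime` helper used later in this notebook, and it assumes that each window includes its start time, excludes its end time, and applies on every day of the week. The labels match the ones that appear in the notebook outputs.

```python
from datetime import datetime, time

# Peak windows as stated in the SLA (assumed half-open: start <= t < end)
MORNING_PEAK = (time(7, 0), time(10, 0))
EVENING_PEAK = (time(16, 0), time(19, 0))


def classify_peak(timestamp):
    """Label a timestamp as MORNING_PEAK, EVENING_PEAK or NON-PEAK."""
    t = timestamp.time()
    if MORNING_PEAK[0] <= t < MORNING_PEAK[1]:
        return 'MORNING_PEAK'
    if EVENING_PEAK[0] <= t < EVENING_PEAK[1]:
        return 'EVENING_PEAK'
    return 'NON-PEAK'


print(classify_peak(datetime(2016, 6, 21, 18, 18)))  # EVENING_PEAK
print(classify_peak(datetime(2016, 6, 22, 20, 25)))  # NON-PEAK
```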

## Empty Stations PIs

A docking station with no fully functional bicycles is considered to be **empty**. An empty docking station may still hold one or more bicycles marked for repair.

In [246]:
empty_entries = find_zero_periods(readings, 'NbBikes')

In [247]:
empty_groups = get_ellapsed_time(empty_entries, by='GroupId').sort_values(by=['Ellapsed'], ascending=False)
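
The helpers `find_zero_periods` and `get_ellapsed_time` are loaded from `src/data` and are not shown in this notebook. As a rough idea of what such a step might look like, the sketch below groups consecutive zero readings of each station into runs and measures each run in minutes. It is not the helpers' actual implementation: `zero_runs` is a hypothetical name, the `Timestamp` column is an assumption about the readings dataset, and a run is measured from its first to its last zero reading.

```python
import pandas as pd


def zero_runs(readings, col='NbBikes', time_col='Timestamp'):
    """Return one row per consecutive run of zero readings per station."""
    df = readings.sort_values(['Id', time_col]).reset_index(drop=True)
    is_zero = df[col].eq(0)
    # A new run starts when the zero flag flips or the station changes
    flag_changed = is_zero.astype(int).diff().ne(0)
    station_changed = df['Id'].ne(df['Id'].shift())
    df['GroupId'] = (flag_changed | station_changed).cumsum()
    runs = (df[is_zero]
            .groupby(['Id', 'GroupId'])[time_col]
            .agg(['min', 'max'])
            .reset_index())
    # Length of each run in whole minutes (assumed unit of `Ellapsed`)
    runs['Ellapsed'] = (runs['max'] - runs['min']).dt.total_seconds() // 60
    return runs
```

Measuring up to the last zero reading slightly underestimates each period; measuring up to the next non-zero reading would be an equally reasonable choice.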

The data was processed to compute the periods during which the stations were empty. The following dataframe shows the five longest periods, with `Ellapsed` given in minutes.

In [248]:
empty_groups[['Id','Period','Ellapsed']].head()

| |Id|Period|Ellapsed|
|-----|------|------|------|
|52455|BikePoints_86|(2016-05-27 12:57:50.593000, 2016-06-10 17:18:...|20420.0|
|49354|BikePoints_742|(2016-05-24 20:37:37.840000, 2016-06-06 16:23:...|18466.0|
|51394|BikePoints_791|(2016-06-03 12:36:28.897000, 2016-06-14 16:52:...|16096.0|
|51390|BikePoints_791|(2016-05-20 10:40:04.140000, 2016-05-31 12:59:...|15980.0|
|52112|BikePoints_817|(2016-06-03 11:27:24.940000, 2016-06-13 13:32:...|14525.0|


The data shows that several stations were empty for days at a time. It is very unlikely that these stations did not receive a single bike during these periods, as they had fairly regular activity before and after. Therefore, we assume that these empty periods were caused by malfunctions in the stations or in the data collection process and should be discarded.

To remove these erroneous readings, we delete any empty period of 720 minutes (12 hours) or longer. We consider this the worst-case scenario for an inactive station: some boroughs forbid the service provider from redistributing bicycles between 22:00 and 8:00 (10 hours), and up to 2 more hours might be needed for the redistribution to take place.

In [249]:
invalid_group_ids = empty_groups[empty_groups.Ellapsed >= 720].GroupId
empty_entries = empty_entries[~empty_entries.GroupId.isin(invalid_group_ids)]

The empty periods were further divided by day and by morning peak hours, evening peak hours, and non-peak hours.

In [250]:
empty_periods = get_ellapsed_time(empty_entries, by='PeriodId')
empty_periods = add_station_info(empty_periods, stations, ['Priority', 'Id'])
empty_periods['Day'] = empty_periods['Period'].apply(get_period_day)
empty_periods['PeakHours'] = empty_periods['Period'].apply(lambda x: is_peaktime(x)[1])
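
How the helper modules split a period is not shown in this notebook. A minimal sketch of one way to do it, assuming a period is a `(start, end)` pair of datetimes and that it should be cut at midnight and at each peak-hour boundary it crosses, could look as follows. `split_period` and `BOUNDARIES` are hypothetical names introduced for illustration.

```python
from datetime import datetime, time, timedelta

# Midnight plus the start and end of both peak windows
BOUNDARIES = [time(0, 0), time(7, 0), time(10, 0), time(16, 0), time(19, 0)]


def split_period(start, end):
    """Split (start, end) at every day or peak-hour boundary it crosses."""
    cuts = []
    day = start.date()
    while day <= end.date():
        for boundary in BOUNDARIES:
            cut = datetime.combine(day, boundary)
            if start < cut < end:
                cuts.append(cut)
        day += timedelta(days=1)
    edges = [start] + cuts + [end]
    # Pair consecutive edges into sub-periods
    return list(zip(edges[:-1], edges[1:]))


# An overnight period is cut once, at midnight
for piece_start, piece_end in split_period(datetime(2016, 5, 16, 21, 38),
                                           datetime(2016, 5, 17, 3, 51)):
    print(piece_start, piece_end)
```

Each resulting piece falls within a single calendar day and a single peak or non-peak window, so its minutes can be attributed to exactly one group.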

In [251]:
empty_periods[['Id','Period','Ellapsed','Priority','Day','PeakHours']].sample(5)

| |Id|Period|Ellapsed|Priority|Day|PeakHours|
|-----|------|------|------|------|------|------|
|39173|BikePoints_407|(2016-06-21 18:18:09.350000, 2016-06-21 18:33:12)|15.0|1.0|2016-06-21|EVENING_PEAK|
|13163|BikePoints_195|(2016-06-21 17:43:04.990000, 2016-06-21 17:48:...|5.0|2.0|2016-06-21|EVENING_PEAK|
|31991|BikePoints_357|(2016-05-31 16:55:36.033000, 2016-05-31 17:10:...|15.0|2.0|2016-05-31|EVENING_PEAK|
|50936|BikePoints_561|(2016-06-21 18:13:10.187000, 2016-06-21 18:18:...|5.0|2.0|2016-06-21|EVENING_PEAK|
|41556|BikePoints_436|(2016-06-22 20:25:37.133000, 2016-06-22 21:10:...|45.0|1.0|2016-06-22|NON-PEAK|


### PI 24 - Accumulated Empty Time

The number of minutes for which each docking station is empty over a calendar day shall be accumulated. These empty periods are then summed and grouped to assess the quality of the bicycle distribution according to the following criteria:

|Classification|Time|Acceptable Service Level|
|-----|------|------|
|Priority 1|Non-Peak Hours|Less than 3000 minutes for all stations|
|Priority 1|Peak Hours|Less than 1000 minutes for all stations per peak period |
|Priority 2|Non-Peak Hours|Less than 9000 minutes for all stations|
|Priority 2|Peak Hours|Less than 18000 minutes for all stations per peak period |
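
The criteria in the table can be expressed as a simple lookup. The sketch below is illustrative only: `PI24_LIMITS` and `meets_pi24` are hypothetical names, the peak limit is assumed to apply to the morning and evening peak separately, and the labels follow the `PeakHours` values used in this notebook.

```python
# Limits in minutes, transcribed from the table; the peak limits are
# assumed to apply to each peak period (morning, evening) on its own.
PI24_LIMITS = {
    (1, 'NON-PEAK'): 3000,
    (1, 'MORNING_PEAK'): 1000,
    (1, 'EVENING_PEAK'): 1000,
    (2, 'NON-PEAK'): 9000,
    (2, 'MORNING_PEAK'): 18000,
    (2, 'EVENING_PEAK'): 18000,
}


def meets_pi24(priority, peak_hours, minutes):
    """Return True when the accumulated empty time is below the limit."""
    return minutes < PI24_LIMITS[(int(priority), peak_hours)]


# Hypothetical totals, for illustration only
print(meets_pi24(1, 'MORNING_PEAK', 900))  # True: 900 is below 1000
print(meets_pi24(1, 'NON-PEAK', 3000))     # False: not strictly below 3000
```

Applied to daily totals grouped by day, priority and peak period, this gives one pass or fail flag per group.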




In [243]:
filter_by_id(readings, 'BikePoints_101').to_csv('chido.csv')

In [241]:
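from datetime import datetime

# Inspect the non-peak empty periods of one Priority 1 station on a single day
day_periods = empty_periods[empty_periods.Day == datetime(2016, 5, 16).date()]
non_peak = day_periods[day_periods.PeakHours == 'NON-PEAK']
priority1 = non_peak[non_peak.Priority == 1]
station_periods = filter_by_id(priority1, 'BikePoints_101')
for i, row in station_periods[0:20].iterrows():
    print('{} {}'.format(row.Ellapsed, row['Period']))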

175.0 (Timestamp('2016-05-16 01:00:43.297000'), Timestamp('2016-05-16 03:56:03.683000'))
55.0 (Timestamp('2016-05-16 20:23:11.830000'), Timestamp('2016-05-16 21:18:17.297000'))
20.0 (Timestamp('2016-05-16 19:37:00.460000'), Timestamp('2016-05-16 19:57:01.780000'))
373.0 (Timestamp('2016-05-16 21:38:20.420000'), Timestamp('2016-05-17 03:51:05.730000'))


In [252]:
# Sum only the elapsed minutes; the other columns are not numeric
pi24_results = empty_periods.groupby(['Day', 'Priority', 'PeakHours'])[['Ellapsed']].sum()

In [253]:
pi24_results

|Day|Priority|PeakHours|Ellapsed|
|-----|------|------|------|
|2016-05-16|1.0|EVENING_PEAK|1647.0|
|2016-05-16|1.0|MORNING_PEAK|857.0|
|2016-05-16|1.0|NON-PEAK|9368.0|
|2016-05-16|2.0|EVENING_PEAK|8840.0|
|2016-05-16|2.0|MORNING_PEAK|13651.0|
|2016-05-16|2.0|NON-PEAK|50382.0|
|2016-05-17|1.0|EVENING_PEAK|1994.0|
|2016-05-17|1.0|MORNING_PEAK|1068.0|
|2016-05-17|1.0|NON-PEAK|21661.0|
|2016-05-17|2.0|EVENING_PEAK|8096.0|


In [220]:
pi24_results

|Day|Priority|PeakHours|Ellapsed|
|-----|------|------|------|
|2016-05-16|1.0|EVENING_PEAK|1647.0|
|2016-05-16|1.0|MORNING_PEAK|857.0|
|2016-05-16|1.0|NON_PEAK|18428.0|
|2016-05-16|2.0|EVENING_PEAK|8840.0|
|2016-05-16|2.0|MORNING_PEAK|13651.0|
|2016-05-16|2.0|NON_PEAK|79052.0|
|2016-05-17|1.0|EVENING_PEAK|1994.0|
|2016-05-17|1.0|MORNING_PEAK|1068.0|
|2016-05-17|1.0|NON_PEAK|22310.0|
|2016-05-17|2.0|EVENING_PEAK|8096.0|


#### Priority 1, Non-Peak Hours

In [None]:
pi24_results.xs((1.0, 'NON-PEAK'), level=('Priority', 'PeakHours')).plot(kind='bar')

In [None]:
a = pi24_results.xs((1.0, 'NON-PEAK'), level=('Priority', 'PeakHours'))
a.reset_index(level=0, inplace=True)
ax = sns.barplot(x='Day', hue='Day', y="Ellapsed", data=a)

In [None]:
sns.distplot(empty_periods.Ellapsed, kde=False)

### PI 24. Bicycle Distribution – Empty Docking Stations

### PI 26. Empty Station Maximum Time Period – Priority 1 Docking Stations

In [None]:
empty_periods.describe()

In [None]:
sns.distplot(empty_periods.Ellapsed, kde=False)

In [None]:
highp_stationids = stations[stations['Priority'] == 1].Id
highp_readings = readings[readings['Id'].isin(highp_stationids)]