# Performance Indicators

## Set Up

In [92]:
import seaborn as sns

%matplotlib inline

In [165]:
%run src/data/helper.py

In [166]:
%run src/data/periods.py

In [95]:
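import pickle

# Load the parsed readings and station datasets, closing each file after use.
with open('data/parsed/readings_weather_dataset_final.p', 'rb') as f:
    readings = pickle.load(f)
with open('data/parsed/stations_dataset_final.p', 'rb') as f:
    stations = pickle.load(f)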

## Introduction

The Service Level Agreement (SLA) signed by TfL and Serco outlines 34 performance indicators (PIs) that measure the quality of the service provided. In this research we focused on the PIs related to bicycle distribution: PIs 24, 25, 26 and 27. These PIs distinguish between station priorities and between times of day:

* **Station Priority**: Each docking station was classified as **Priority 1** or **Priority 2** based on its importance to the overall operation of the London Cycle Hire Scheme. The 100 most used docking stations were classified as Priority 1, and the remaining stations as Priority 2.
* **Peak Hours**: The hours during which crowding on the public transport system and road traffic are highest are called **Peak Hours**. The SLA defines them as 7:00 to 10:00 and 16:00 to 19:00; all other times are considered non-peak hours.
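Because an empty period can straddle a peak boundary, its minutes have to be split between peak and non-peak hours before the PIs can be evaluated. The sketch below shows one way to do this, together with a priority assignment from usage counts. The function names `split_peak_minutes` and `priority_from_usage` are hypothetical, and the code assumes that each interval lies within a single calendar day and that station usage is available as a `{station_id: trips}` mapping.

```python
from datetime import datetime, time

# Peak windows as defined in the SLA: 07:00-10:00 and 16:00-19:00.
PEAK_WINDOWS = [(time(7, 0), time(10, 0)), (time(16, 0), time(19, 0))]


def split_peak_minutes(start, end):
    """Split a same-day interval [start, end) into (peak, non_peak) minutes."""
    total = (end - start).total_seconds() / 60.0
    peak = 0.0
    for w_start, w_end in PEAK_WINDOWS:
        # Anchor each peak window on the interval's calendar day and intersect.
        lo = max(start, datetime.combine(start.date(), w_start))
        hi = min(end, datetime.combine(start.date(), w_end))
        if hi > lo:
            peak += (hi - lo).total_seconds() / 60.0
    return peak, total - peak


def priority_from_usage(usage, n_priority1=100):
    """Return {station_id: 1 or 2}; the n most used stations get Priority 1."""
    ranked = sorted(usage, key=usage.get, reverse=True)
    top = set(ranked[:n_priority1])
    return {sid: 1 if sid in top else 2 for sid in usage}
```

For example, an empty period from 9:30 to 17:00 yields 90 peak minutes (9:30 to 10:00 and 16:00 to 17:00) and 360 non-peak minutes.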

## Empty Stations PI

A docking station with no fully functional bicycles is considered **empty**. An empty docking station may still hold one or more bicycles marked for repair.

Following the SLA, the number of whole minutes during which each docking station is empty is accumulated over a calendar day. For the avoidance of doubt, minutes do not start accumulating until a docking station has been empty for a full minute.
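The whole-minute rule can be read in more than one way. A minimal sketch, assuming each empty period is floored to whole minutes on its own (so a period shorter than a full minute adds nothing), could look like this; `accumulated_empty_minutes` is a hypothetical helper, not part of `helper.py`.

```python
import math
from datetime import datetime, timedelta


def accumulated_empty_minutes(periods):
    """Sum the whole empty minutes over a day's (start, end) periods.

    Assumption: each period is floored to whole minutes independently, so a
    period shorter than one full minute contributes nothing.
    """
    total = 0
    for start, end in periods:
        minutes = (end - start).total_seconds() / 60.0
        total += math.floor(minutes)
    return total


# Example: periods of 30 s, 90 s and 5 min contribute 0 + 1 + 5 minutes.
noon = datetime(2016, 5, 16, 12, 0)
example = [
    (noon, noon + timedelta(seconds=30)),
    (noon, noon + timedelta(seconds=90)),
    (noon, noon + timedelta(minutes=5)),
]
print(accumulated_empty_minutes(example))  # 6
```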

In [167]:
empty_entries = find_zero_periods(readings, 'NbBikes', group=False)

The readings have been processed to find the periods during which each station was empty. The resulting dataframe holds one row per timestamp, tagged with the station `Id` and the `PeriodId` of the empty period it belongs to.

In [168]:
empty_entries.head()

| |Id|PeriodId|Timestamp|
|-----|------|------|------|
|0|BikePoints_359|dc765159-c6c8-4e13-b889-c5eef6a6c216|2016-05-16 17:16:40.373000|
|1|BikePoints_359|dc765159-c6c8-4e13-b889-c5eef6a6c216|2016-05-16 18:06:46.650000|
|2|BikePoints_359|af7946e6-a10c-4abc-af65-0a55ccf788b1|2016-05-16 20:13:09.770000|
|3|BikePoints_359|af7946e6-a10c-4abc-af65-0a55ccf788b1|2016-05-16 23:59:59.999999|
|4|BikePoints_359|3f24903e-07ca-427c-bb3b-e263d9688448|2016-05-17 00:00:00.000000|
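`find_zero_periods` is defined in `src/data/helper.py`, which is not reproduced here. As an illustration only, one common way to find empty periods in a time series is to flag the readings where `NbBikes` is zero and number each run of consecutive zero readings per station. The function name `zero_runs` and the assumption that `readings` has `Id` and `Timestamp` columns are hypothetical.

```python
import pandas as pd


def zero_runs(readings, column="NbBikes"):
    """Return the zero-valued readings, each tagged with the run it belongs to.

    A run is a maximal sequence of consecutive readings of the same station
    in which `column` is zero; RunId increases by one for every new run.
    """
    readings = readings.sort_values(["Id", "Timestamp"])
    is_zero = readings[column].eq(0)
    # A run starts at a zero reading whose predecessor is non-zero or
    # belongs to a different station.
    new_station = readings["Id"].ne(readings["Id"].shift())
    starts = is_zero & (~is_zero.shift(fill_value=False) | new_station)
    run_id = starts.cumsum()
    return readings.loc[is_zero].assign(RunId=run_id[is_zero])
```

The start and end of each run (its first and last `Timestamp`) then delimit one empty period.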


### PI 24 - Accumulated Empty Time

For each calendar day, the number of minutes during which each docking station is empty is accumulated. These daily totals are then summed across stations, grouped by station priority and by peak/non-peak hours, and each group is assessed against the following acceptable service levels:

|Classification|Time|Acceptable Service Level|
|-----|------|------|
|Priority 1|Non-Peak Hours|Less than 3000 minutes for all stations|
|Priority 1|Peak Hours|Less than 1000 minutes for all stations per peak period |
|Priority 2|Non-Peak Hours|Less than 9000 minutes for all stations|
|Priority 2|Peak Hours|Less than 18000 minutes for all stations per peak period |
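Assuming each empty period has already been split into peak and non-peak minutes and tagged with its station priority, checking PI 24 amounts to a grouped sum compared with the limits in the table. In the sketch below, `TimeClass` (with values `"peak"` and `"non-peak"`) and `pi24_compliance` are hypothetical names; `Priority` and `Ellapsed` mirror the columns used in this notebook.

```python
import pandas as pd

# Acceptable service levels for PI 24 (minutes), copied from the table above.
PI24_LIMITS = {
    (1, "non-peak"): 3000,
    (1, "peak"): 1000,
    (2, "non-peak"): 9000,
    (2, "peak"): 18000,
}


def pi24_compliance(periods):
    """Sum empty minutes per (Priority, TimeClass) and test them against PI 24.

    Assumes `periods` covers a single calendar day and, for peak rows, a single
    peak period, since the peak limits apply per peak period.
    """
    totals = periods.groupby(["Priority", "TimeClass"])["Ellapsed"].sum()
    return {
        (int(priority), time_class): bool(total < PI24_LIMITS[(int(priority), time_class)])
        for (priority, time_class), total in totals.items()
    }
```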




In [None]:
empty_periods = group_ellapsed(empty_entries)
empty_periods = add_station_info(empty_periods, stations, ['Priority', 'Id'])
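`group_ellapsed` is likewise defined in `helper.py`. Judging from the `empty_entries` output above, each empty period appears as a start row and an end row sharing a `PeriodId`, and periods are cut at midnight. Under that assumption, the length of each period can be sketched with a `groupby` over `PeriodId`; the function name `period_durations` is hypothetical, and its `Ellapsed` column (in minutes) mirrors the one used below.

```python
import pandas as pd


def period_durations(entries):
    """Collapse start/end rows into one row per (Id, PeriodId) with its length.

    Hypothetical stand-in for group_ellapsed: assumes the earliest and latest
    Timestamp of each PeriodId delimit the empty period.
    """
    entries = entries.assign(Timestamp=pd.to_datetime(entries["Timestamp"]))
    bounds = entries.groupby(["Id", "PeriodId"])["Timestamp"].agg(["min", "max"])
    bounds = bounds.rename(columns={"min": "Start", "max": "End"}).reset_index()
    # Length of each empty period in minutes (the unit used for the 720 cut-off).
    bounds["Ellapsed"] = (bounds["End"] - bounds["Start"]).dt.total_seconds() / 60
    return bounds
```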

In [None]:
for i, row in empty_periods.iterrows():
    # Show each period alongside the calendar day it falls on.
    print(i, row['Period'], get_period_day(row['Period']))

In [None]:
#empty_periods['PeakHours'].apply(lambda x: get_peak_hours(x))
empty_periods

The data shows that several stations were empty for entire days. It is very unlikely that these stations received no bikes at all during these periods, since they had fairly regular activity before and after. We therefore assume that these empty periods were caused by malfunctions in the stations or in the data collection process, and discard them. To remove these erroneous readings, we delete any empty period longer than 720 minutes (12 hours). This is what we consider the worst-case scenario for an inactive station: the 10-hour restriction imposed by some boroughs, which forbids the service provider from redistributing bicycles between 22:00 and 8:00, plus the 2 hours that might be needed for the redistribution to take place.
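The 720-minute cut-off follows from simple time arithmetic. The sketch below reproduces it with `datetime`, using an arbitrary date so that the overnight window wraps around midnight correctly; `is_plausible` is a hypothetical helper that keeps periods up to the threshold, which matches the `> 720` filter in the next cell.

```python
from datetime import datetime, timedelta

# Overnight ban on redistribution in some boroughs: 22:00 to 08:00 next day.
# The calendar date is arbitrary; only the difference matters.
ban_start = datetime(2016, 5, 16, 22, 0)
ban_end = datetime(2016, 5, 17, 8, 0)
# Time that might be needed to complete the redistribution once it is allowed.
redistribution = timedelta(hours=2)

threshold_minutes = ((ban_end - ban_start) + redistribution).total_seconds() / 60
print(threshold_minutes)  # 720.0


def is_plausible(elapsed_minutes, threshold=threshold_minutes):
    """True for empty periods short enough to be kept in the analysis."""
    return elapsed_minutes <= threshold
```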

In [None]:
empty_periods = empty_periods[~(empty_periods['Ellapsed'] > 720)]

In [None]:
empty_periods.describe()

In [None]:
# Histogram of empty-period lengths (histplot replaces the deprecated distplot).
sns.histplot(empty_periods['Ellapsed'])

### PI 24. Bicycle Distribution – Empty Docking Stations

### PI 26. Empty Station Maximum Time Period – Priority 1 Docking Stations

In [None]:
highp_stationids = stations[stations['Priority'] == 1].Id
highp_readings = readings[readings['Id'].isin(highp_stationids)]
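PI 26 concerns the longest time a Priority 1 station stays empty. The SLA threshold for this PI is not reproduced here, but the quantity to compare against it can be sketched as a per-station maximum over `empty_periods`. `max_empty_per_station` is a hypothetical helper, and the code assumes `Ellapsed` holds minutes, as in the filtering step above.

```python
import pandas as pd


def max_empty_per_station(periods, priority=1):
    """Longest empty period (minutes) of each station of the given priority."""
    subset = periods[periods["Priority"] == priority]
    longest = subset.groupby("Id")["Ellapsed"].max()
    # Sort so the worst-served stations come first.
    return longest.sort_values(ascending=False)
```

Applied to `empty_periods`, this would list the Priority 1 stations ordered by their longest empty spell.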