# Project on Tartu Smart Bike data analysis
## Introduction to Data Science - LTAT.02.002


# Table of Contents
1. [Data exploring](#exploring)  
    1.1. [Performing the exploration on the routes dataset from June](#exploring1)
2. [Data cleaning](#cleaning)  
    2.1. [Removing rows with missing and incorrect values in columns 'startstationname' and 'endstationname'](#cleaning1)  
    2.2. [Removing rides where start and end station are the same (and rides that are less than 100 m.)](#cleaning2)  
    2.3. [Correcting spelling differences in Membership column](#cleaning3)  
    2.4. [Converting 'unlockedat' and 'lockedat' from string to date](#cleaning4)  
    2.5. [Adding 'weekday' as a column to datasets](#cleaning5)  
    2.6. [Exporting cleaned datasets to csv](#cleaning6)  
    2.7. [Merging all 4 route datasets](#merging)  
3. [Data analysis](#analysis)  
    3.1. [Creating a new dataset for stations](#analysis1)  
    3.2. [Finding the most popular stations](#analysis2)  
    3.3. [Distribution of memberships throughout four months](#analysis3)  
    3.4. [The average length and duration of bike rides and number of violations within 4 months](#analysis4)  
    3.5. [Bike usage distribution throughout the day in 2-hour intervals](#analysis5)  
        3.5.1. [Workdays vs weekend](#analysis51)  
        3.5.2. [Bike usage distribution on every weekday](#analysis52)  
    3.6. [Heatmap: Length of rides based on weekday and starting station](#heatmap)
   

## 1. Data exploring <a name="exploring"></a>
**Data used:** Confidential Tartu Smart Bike rental and location data (June 2019 - September 2019)

In [1]:
import os
import pandas as pd
import numpy as np
# Reading in the data from CSV files.
# The CSV files are in a folder named 'data_for_IDS2019_project_team_W17'
# that is in the same folder as the project root folder 'tartu-smart-bike'.
# os.path.join builds the paths so that they work on any operating system.
DATA_DIR = os.path.join("..", "data_for_IDS2019_project_team_W17", "bicycle_data")
data_routes_june = pd.read_csv(os.path.join(DATA_DIR, "routes_201906.csv"), sep=',')
data_routes_july = pd.read_csv(os.path.join(DATA_DIR, "routes_201907.csv"), sep=',')
data_routes_august = pd.read_csv(os.path.join(DATA_DIR, "routes_201908.csv"), sep=',')
data_routes_september = pd.read_csv(os.path.join(DATA_DIR, "routes_201909.csv"), sep=',')

datasets = [data_routes_june, data_routes_july, data_routes_august, data_routes_september]

### 1.1 Performing the exploration on the routes dataset from June <a name="exploring1"></a>

In [2]:
data_routes_june.dtypes

route_code            int64
cyclenumber           int64
unlockedat           object
unlockedattime       object
lockedat             object
lockedattime         object
startstationname     object
endstationname       object
rfidnumber           object
length              float64
DurationMinutes     float64
CycleType            object
costs                 int64
Membership           object
dtype: object

In [3]:
for column in data_routes_june.columns:
    print(data_routes_june[column].value_counts())
    print()

1560100206883    2
1560103709436    2
1560098082028    2
                ..
1560157867987    1
1560089716694    1

(Output truncated: only the counts for the first column, `route_code`, were captured. The 22 route codes listed with a count of 2 appear twice in the June data; all other route codes appear once.)

In [4]:
# Missing values in the datasets are marked as '-- Määramata --' (Estonian for "undetermined") and 'Undetermined'
def missing_values(data):
    for column in data.columns:
        total_missing = 0
        # Only text columns can contain the markers
        if isinstance(data[column].iloc[0], str):
            # regex=False: match the markers literally; na=False: NaN cells are not counted
            total_missing += data[column].str.contains('-- Määramata --', regex=False, na=False).sum()
            total_missing += data[column].str.contains('Undetermined', regex=False, na=False).sum()
        if total_missing > 0:
            print("Number of missing values in column '" + column + "': " + str(total_missing))

In [5]:
for data in datasets:
    missing_values(data)
    print()

Number of missing values in column 'startstationname': 400
Number of missing values in column 'endstationname': 254

Number of missing values in column 'startstationname': 306
Number of missing values in column 'endstationname': 261

Number of missing values in column 'startstationname': 495
Number of missing values in column 'endstationname': 539

Number of missing values in column 'startstationname': 490
Number of missing values in column 'endstationname': 628
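The counting in `missing_values` can also be done without an explicit loop over columns. A minimal sketch, assuming the markers only ever appear as the whole cell value (exact matches, unlike the substring search with `str.contains` above); `count_missing`, `MISSING_MARKERS` and the toy rows are hypothetical:

```python
import pandas as pd

# Text markers used for missing station names (exact cell values assumed)
MISSING_MARKERS = ['-- Määramata --', 'Undetermined']

def count_missing(data):
    """Return the number of exact marker matches per column, keeping only columns with matches."""
    counts = data.isin(MISSING_MARKERS).sum()
    return counts[counts > 0]

# Made-up rows for illustration, not taken from the real datasets
toy = pd.DataFrame({
    'startstationname': ['Uueturu', '-- Määramata --', 'Undetermined', 'Soola'],
    'endstationname': ['Soola', 'Eeden', '-- Määramata --', 'Uueturu'],
    'length': [1.2, 0.8, 2.5, 1.0],
})
result = count_missing(toy)
print(result)
```

Numeric columns such as `length` never equal a marker string, so they simply drop out of the result.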



## 2. Data cleaning <a name="cleaning"></a>
### 2.1. Removing rows with missing and incorrect values in columns 'startstationname' and 'endstationname' <a name="cleaning1"></a>

In [6]:
def remove_missing_rows(data):
    # Missing values in the datasets are marked as '-- Määramata --' and 'Undetermined'
    missing = ['-- Määramata --', 'Undetermined']
    # Stations 'Warehouse', 'Töökoda', 'SPARK Demo' and 'Delta' are not publicly usable
    not_public = ['Warehouse', 'Töökoda', 'SPARK Demo', 'Delta']
    invalid = missing + not_public
    
    # Drop every ride that starts or ends at a missing or non-public station
    mask = data['startstationname'].isin(invalid) | data['endstationname'].isin(invalid)
    data.drop(data[mask].index, inplace=True)
    
    # Removing all rows where there are NaN values
    data.dropna(inplace=True)

In [7]:
for data in datasets:
    remove_missing_rows(data)

### 2.2. Removing rides where start and end station are the same (and rides that are less than 100 m) <a name="cleaning2"></a>
Rides with the same start and end station are 'rides' in which the bike is undocked and then re-docked at the same station without actually being ridden anywhere.  
We are not sure whether rides shorter than 100 m should also be removed. In some of them the start and end stations differ and the ride lasts around 20 minutes, which suggests that the distance measurement was not working. Alternatively, these may be bikes that were being reallocated between stations, in which case no distance travelled is recorded. For this reason the 100 m filter is commented out in the code below.

In [8]:
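def remove_incorrect_rides(data):
    # Rides shorter than 100 m ('length' is in km); disabled for now, see the note above
    #indices = data[(data['length'] < 0.100)].index
    #data.drop(indices, inplace=True)
    # Rides that start and end at the same station
    indices = data[(data['startstationname'] == data['endstationname'])].index
    data.drop(indices, inplace=True)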

In [9]:
for data in datasets:
    remove_incorrect_rides(data)
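Before deciding whether to drop rides under 100 m, it can help to look at them separately from the round trips. A sketch, assuming `length` is in kilometres (as the commented-out 0.100 threshold suggests); `flag_suspect_rides` and the toy rows are hypothetical:

```python
import pandas as pd

def flag_suspect_rides(data, min_length_km=0.1):
    """Return boolean masks for same-station rides and very short rides ('length' in km)."""
    same_station = data['startstationname'] == data['endstationname']
    too_short = data['length'] < min_length_km  # 0.1 km = 100 m
    return same_station, too_short

# Made-up rides for illustration
toy = pd.DataFrame({
    'startstationname': ['Soola', 'Eeden', 'Raudteejaam', 'Uueturu'],
    'endstationname':   ['Soola', 'Uueturu', 'Rahu', 'Pargi'],
    'length':           [0.0, 0.05, 1.8, 2.3],
    'DurationMinutes':  [1.0, 20.0, 9.0, 12.0],
})
same_station, too_short = flag_suspect_rides(toy)
# Short rides between different stations: the cases the text is unsure about
print(toy[too_short & ~same_station])
```

Inspecting `DurationMinutes` for these rows is one way to tell broken distance measurements from reallocations.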

### 2.3. Correcting spelling differences in Membership column <a name="cleaning3"></a>

In [10]:
data_routes_june.Membership.value_counts()

3 month free membership            17867
3-Month Free Membership            14017
Free Membership (with bus card)     8281
1 month free membership             6679
1-month Free Membership             2530
1 Year Membership                      5
1 mo                                   1
Name: Membership, dtype: int64

In [11]:
def correct_spelling(data):
    # Map each spelling variant to one canonical label
    canonical = {
        "3 month free membership": "3-Month Free Membership",
        "1 month free membership": "1-month Free Membership",
        "1 DAY MEMBERSHIP": "1 Day Membership",
        "1 WEEK MEMBERSHIP": "1 Week Membership",
        "1 YEAR MEMBERSHIP": "1 Year Membership",
    }
    data.Membership = data.Membership.replace(canonical)
    # 'Free M' and '1 mo' appear to be truncated labels that cannot be mapped reliably, so these rides are dropped
    data.drop(data[data.Membership.isin(['Free M', '1 mo'])].index, inplace=True)

In [12]:
for data in datasets:
    correct_spelling(data)

In [13]:
data_routes_june.Membership.value_counts()

3-Month Free Membership            31884
1-month Free Membership             9209
Free Membership (with bus card)     8281
1 Year Membership                      5
Name: Membership, dtype: int64
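The variants fixed by `correct_spelling` differ only in letter case and hyphenation, so they can also be detected automatically. A minimal sketch using the June labels shown before cleaning; the `normalise_label` helper is hypothetical, and truncated values such as '1 mo' still need manual handling:

```python
import pandas as pd

def normalise_label(label):
    """Lower-case a membership label and treat hyphens as spaces."""
    return label.lower().replace('-', ' ')

# Membership labels as they appear in the June data before cleaning
labels = pd.Series(['3 month free membership', '3-Month Free Membership',
                    'Free Membership (with bus card)', '1 month free membership',
                    '1-month Free Membership', '1 Year Membership'])
groups = labels.groupby(labels.map(normalise_label)).unique()
# Normalised keys that cover more than one original label are spelling variants
variants = groups[groups.map(len) > 1]
print(variants)
```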

### 2.4. Converting 'unlockedat' and 'lockedat' from string to date <a name="cleaning4"></a>

In [14]:
def string_to_date(data):
    data.unlockedat = pd.to_datetime(data.unlockedat, format="%Y-%m-%d")
    data.lockedat = pd.to_datetime(data.lockedat, format="%Y-%m-%d")

In [15]:
for data in datasets:
    string_to_date(data)

### 2.5. Adding 'weekday' as a column to datasets <a name="cleaning5"></a>
Weekday is based on unlocking date.

In [16]:
def add_weekday(data):
    data.insert(2,"weekday",data['unlockedat'].dt.day_name())

In [17]:
for data in datasets:
    add_weekday(data)

In [18]:
data_routes_june.weekday.value_counts()

Monday       13977
Sunday       12327
Tuesday       9277
Wednesday     7125
Saturday      6668
Thursday         5
Name: weekday, dtype: int64
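Because the weekday is taken from `unlockedat`, a ride that starts before midnight and ends after it is attributed to the day it started. A small sketch with made-up dates (2019-06-09 was a Sunday) that applies the same `insert` and `dt.day_name()` calls as `add_weekday`:

```python
import pandas as pd

toy = pd.DataFrame({
    'unlockedat': pd.to_datetime(['2019-06-09', '2019-06-10', '2019-06-10'], format='%Y-%m-%d'),
    'lockedat':   pd.to_datetime(['2019-06-10', '2019-06-10', '2019-06-10'], format='%Y-%m-%d'),
})
# The weekday comes from the unlocking date only, so the first ride
# (Sunday -> Monday) is counted as a Sunday ride
toy.insert(2, 'weekday', toy['unlockedat'].dt.day_name())
print(toy)
```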

### 2.6. Exporting cleaned datasets to csv <a name="cleaning6"></a>
The cleaned data is saved outside the project root directory because the data is confidential. A directory named "Cleaned data" must exist on the same level as the project root folder.

In [19]:
import os
# Portable path to the "Cleaned data" directory next to the project root folder
clean_dir = os.path.join("..", "Cleaned data")
data_routes_june.to_csv(os.path.join(clean_dir, "routes_2019_06_clean.csv"), index=False)
data_routes_july.to_csv(os.path.join(clean_dir, "routes_2019_07_clean.csv"), index=False)
data_routes_august.to_csv(os.path.join(clean_dir, "routes_2019_08_clean.csv"), index=False)
data_routes_september.to_csv(os.path.join(clean_dir, "routes_2019_09_clean.csv"), index=False)
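Instead of requiring the "Cleaned data" directory to exist beforehand, the export can create it. A sketch; `export_clean` is a hypothetical helper, and the month keys are an assumption chosen to reproduce the file names above. The usage example writes toy frames to a temporary directory rather than to the real location:

```python
import os
import tempfile
import pandas as pd

def export_clean(datasets_by_month, out_dir):
    """Write each DataFrame to out_dir/routes_2019_MM_clean.csv, creating out_dir if needed."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for month, data in datasets_by_month.items():
        path = os.path.join(out_dir, 'routes_2019_%s_clean.csv' % month)
        data.to_csv(path, index=False)
        paths.append(path)
    return paths

# Usage with toy data and a temporary directory instead of '../Cleaned data'
toy = {'06': pd.DataFrame({'length': [1.2]}), '07': pd.DataFrame({'length': [0.8]})}
with tempfile.TemporaryDirectory() as tmp:
    written = export_clean(toy, os.path.join(tmp, 'Cleaned data'))
    print([os.path.basename(p) for p in written])
```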

### 2.7. Merging all 4 route datasets <a name="merging"></a>

In [20]:
for data in datasets:
    print(data.shape)

(49379, 15)
(51917, 15)
(51106, 15)
(50268, 15)


In [21]:
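# ignore_index=True gives every ride a unique row label instead of repeating each month's labels
routes_data = pd.concat(datasets, ignore_index=True)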
routes_data.shape

(202670, 15)
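The merged row count equals the sum of the monthly row counts (49379 + 51917 + 51106 + 50268 = 202670), which is a cheap sanity check worth keeping in the notebook. A sketch on toy frames:

```python
import pandas as pd

parts = [pd.DataFrame({'length': [1.2, 0.8]}), pd.DataFrame({'length': [2.5]})]
# ignore_index=True gives the merged frame a fresh 0..n-1 index
merged = pd.concat(parts, ignore_index=True)
# No rows may be lost or duplicated by the merge
assert len(merged) == sum(len(p) for p in parts)
print(merged.index.is_unique)
```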

## 3. Data analysis <a name="analysis"></a>

### 3.1. Creating a new dataset for stations <a name="analysis1"></a>

In [22]:
pd.set_option('display.max_rows', 500)

# number of bike unlockings per station
unlockings_per_station = routes_data.startstationname.value_counts()
print(unlockings_per_station)
print()

# number of bike lockings per station
lockings_per_station = routes_data.endstationname.value_counts()
print(lockings_per_station)

Uueturu                 11964
Soola                    7940
Eeden                    6172
Pirogovi plats           5941
Vabadussild              4735
Aleksandri               4667
Lõunakeskus              4624
Turusild                 4403
Veeriku                  4346
Annelinna keskus         4202
Anne saun                4071
Raudteejaam              4034
Rahu                     3959
Magistri                 3929
Annelinna kiir           3865
Aparaaditehas            3764
Tarmeko                  3705
Roosi                    3672
Riiamäe                  3647
Annelinn                 3632
Kaunase puiestee         3506
Kalda tee                3360
Raatuse kool             3205
Sõpruse sild             3133
Pargi                    3110
Vanemuise park           3025
Sõbra                    2969
Näituse                  2955
Papli                    2944
Kuu                      2900
Raatuse                  2896
Karete                   2882
AHHAA                    2862
...  (output truncated)

In [23]:
stations = pd.DataFrame()
stations['Name'] = unlockings_per_station.index
stations['Unlockings'] = unlockings_per_station.values
# Look up the number of lockings for each station by name; stations where no ride ended get 0.
# Assigning the whole column at once avoids pandas' SettingWithCopyWarning.
stations['Lockings'] = stations['Name'].map(lockings_per_station).fillna(0).astype(int)
stations


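|   | Name | Unlockings | Lockings |
|---|------|-----------:|---------:|
| 0 | Uueturu | 11964 | 11938 |
| 1 | Soola | 7940 | 8004 |
| 2 | Eeden | 6172 | 6091 |
| 3 | Pirogovi plats | 5941 | 6057 |
| 4 | Vabadussild | 4735 | 4746 |
| 5 | Aleksandri | 4667 | 4730 |
| 6 | Lõunakeskus | 4624 | 4727 |
| 7 | Turusild | 4403 | 4361 |
| 8 | Veeriku | 4346 | 4185 |
| 9 | Annelinna keskus | 4202 | 4268 |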
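The same stations table can be built in one step by aligning the two `value_counts` results on the station name. A sketch on made-up rides; stations that appear in only one of the two columns get 0 instead of NaN:

```python
import pandas as pd

toy_routes = pd.DataFrame({
    'startstationname': ['Uueturu', 'Uueturu', 'Soola', 'Eeden'],
    'endstationname':   ['Soola', 'Eeden', 'Uueturu', 'Raudteejaam'],
})
# concat aligns both counts on the station name (the Series index)
stations = pd.concat({'Unlockings': toy_routes.startstationname.value_counts(),
                      'Lockings': toy_routes.endstationname.value_counts()},
                     axis=1).fillna(0).astype(int)
stations = stations.sort_values('Unlockings', ascending=False).rename_axis('Name').reset_index()
print(stations)
```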


### 3.2. Finding the most popular stations <a name="analysis2"></a>
We wanted to compute bike `usage per dock` for each station, so that the most popular stations could be identified without the ranking being skewed by station size. However, we did not have data on the actual sizes (number of docks) of the bike stations.
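If dock counts became available, the per-dock measure could be computed as sketched below. This is an illustration only: the dock numbers are invented placeholders, `usage_per_dock` is a hypothetical helper, and counting both unlockings and lockings as "usage" is our assumption. The unlocking and locking counts are taken from the table in section 3.1.

```python
import pandas as pd

def usage_per_dock(stations, docks):
    """Unlockings plus lockings per dock; `docks` maps station name -> number of docks."""
    usage = stations.set_index('Name')[['Unlockings', 'Lockings']].sum(axis=1)
    # Stations without a known dock count get NaN and are dropped
    return (usage / docks.reindex(usage.index)).dropna().sort_values(ascending=False)

stations = pd.DataFrame({'Name': ['Uueturu', 'Soola', 'Eeden'],
                         'Unlockings': [11964, 7940, 6172],
                         'Lockings': [11938, 8004, 6091]})
docks = pd.Series({'Uueturu': 30, 'Soola': 10, 'Eeden': 15})  # HYPOTHETICAL dock counts
ranking = usage_per_dock(stations, docks)
print(ranking)
```

With these made-up sizes the smaller Soola station would rank above Uueturu, which is exactly the size effect the per-dock measure is meant to expose.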

### 3.3. Distribution of memberships throughout four months <a name="analysis3"></a>
For a better overview we decided to divide memberships into 3 groups:
1. Free memberships
2. Paid memberships
3. With bus card

In [24]:
def new_membership_value(row):
    if row['Membership'] in ("3-Month Free Membership", "1-month Free Membership"):
        return "Free memberships"
    if row['Membership'] == "Free Membership (with bus card)":
        return "With bus card"
    return "Paid memberships"

def add_column_liikmelisus(data):
    # Adding the new column 'Liikmelisus' (Estonian for "membership") with the membership group
    data['Liikmelisus'] = data.apply(new_membership_value, axis=1)
    

In [None]:
# Adding column 'Liikmelisus' to every month's dataset
for data in datasets:
    add_column_liikmelisus(data)
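Because the grouping depends only on the `Membership` value, a dictionary lookup with `Series.map` gives the same result as the row-wise `apply` and is typically much faster on around 50 000 rows per month. A sketch; `GROUPS` and `membership_group` are hypothetical names:

```python
import pandas as pd

# Membership label -> group; any label not listed counts as a paid membership,
# mirroring new_membership_value
GROUPS = {'3-Month Free Membership': 'Free memberships',
          '1-month Free Membership': 'Free memberships',
          'Free Membership (with bus card)': 'With bus card'}

def membership_group(membership):
    return membership.map(GROUPS).fillna('Paid memberships')

toy = pd.Series(['3-Month Free Membership', 'Free Membership (with bus card)',
                 '1 Year Membership', '1-month Free Membership'])
print(membership_group(toy).tolist())
```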

In [None]:
import os
import matplotlib.pyplot as plt
%matplotlib inline 
plt.style.use("ggplot")  # Default plot style
months = ["June", "July", "August", "September"]
# Fixed colour per group, so that each group keeps its colour in every month's chart
group_colors = {"Free memberships": "mediumblue",
                "With bus card": "dodgerblue",
                "Paid memberships": "slateblue"}
for month, data in zip(months, datasets):
    counts = data['Liikmelisus'].value_counts(normalize=True)
    colors = [group_colors[group] for group in counts.index]
    plt.subplots(figsize=(30, 15))
    plt.pie(counts, labels=None, colors=colors, autopct="%1.2f%%",
            textprops={"family": "serif", "weight": "normal", "color": "w", "size": 36})
    plt.title(month, fontdict={'fontsize': 40})
    plt.legend(bbox_to_anchor=(1, 0.65), prop={'size': 30}, labels=counts.index)
    plt.savefig(os.path.join('Memberships', month + '.pdf'))
    plt.show()


### 3.4. The average length and duration of bike rides and number of violations within 4 months <a name="analysis4"></a>
A violation occurs when the bike is unlocked for more than 5 hours straight.

In [None]:
print("The average length of rides:", round(routes_data["length"].mean(), 3), "km")
print("The average duration of rides:", round(routes_data["DurationMinutes"].mean(), 3), "minutes", "\n")

print("Length of the longest ride:", round(routes_data["length"].max(), 3), "km")
print("Duration of the longest ride:", round(routes_data["DurationMinutes"].max(), 3), "minutes")

In [None]:
print("Number of violations:", routes_data["DurationMinutes"][routes_data["DurationMinutes"] > 300].count())
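A raw count does not show how common violations are; the share of all rides is easier to compare between months. A sketch on made-up durations, with the 5-hour rule expressed as 300 minutes (a ride of exactly 300 minutes is not a violation, matching the `> 300` condition above); `violation_stats` is a hypothetical helper:

```python
import pandas as pd

VIOLATION_MINUTES = 5 * 60  # violation: bike unlocked for more than 5 hours

def violation_stats(durations):
    """Return (number of violations, share of all rides in percent)."""
    is_violation = durations > VIOLATION_MINUTES
    return int(is_violation.sum()), round(is_violation.mean() * 100, 2)

durations = pd.Series([12.0, 300.0, 301.5, 45.0, 720.0])
print(violation_stats(durations))
```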

### 3.5. Bike usage distribution throughout the day in 2-hour intervals <a name="analysis5"></a>
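Every unlocking falls into exactly one interval if the intervals are treated as half-open, so a ride unlocked at 02:00 belongs to 02-04. A sketch, assuming `unlockedattime` holds strings of the form HH:MM or HH:MM:SS; `interval_label` and `LABELS` are hypothetical names:

```python
import pandas as pd

# Labels of the twelve 2-hour intervals: '00-02', '02-04', ..., '22-24'
LABELS = ['%02d-%02d' % (h, h + 2) for h in range(0, 24, 2)]

def interval_label(times):
    """Map 'HH:MM[:SS]' strings to their 2-hour interval label, [start, end)."""
    hours = times.str.split(':').str[0].astype(int)
    return (hours // 2).map(lambda i: LABELS[i])

times = pd.Series(['00:15:00', '01:59:59', '02:00:00', '17:30:10', '23:59:00'])
print(interval_label(times).tolist())
```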


### 3.5.1 Workdays vs Weekend <a name="analysis51"></a>

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt; plt.rcdefaults()

# Labels of the twelve 2-hour intervals
INTERVALS = ('00-02', '02-04', '04-06', '06-08', '08-10', '10-12',
             '12-14', '14-16', '16-18', '18-20', '20-22', '22-24')

def usage_by_interval(data, days):
    # Count the unlockings per 2-hour interval for rides that started on one of `days`.
    # The intervals are half-open, e.g. 02:00 belongs to '02-04'.
    rides = data[data.weekday.isin(days)]
    hours = rides.unlockedattime.str.split(':').str[0].astype(int)
    return (hours // 2).value_counts().reindex(range(12), fill_value=0)

def makePlot(performance, name):
    y_pos = np.arange(len(INTERVALS))
    plt.subplots(figsize=(10, 7))
    plt.bar(y_pos, performance, align='center', alpha=0.5, color='blue')
    plt.xticks(y_pos, INTERVALS)
    plt.ylabel('Usage %')
    plt.title(name)
    plt.savefig(os.path.join('Bike usage', name + '.pdf'))
    plt.show()

# Row ranges of the July, August and September datasets used in this analysis
selected_rides = pd.concat([data_routes_july.iloc[:22429],
                            data_routes_august.iloc[15356:42506],
                            data_routes_september.iloc[4342:37439]])

weekdays = [["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"], ["Saturday", "Sunday"]]
names = ["Workdays", "Weekend"]

for days, name in zip(weekdays, names):
    counts = usage_by_interval(selected_rides, days)
    # Share of each interval in percent of all unlockings on these days
    performance = counts / counts.sum() * 100
    makePlot(performance, name)



### 3.5.2 Bike usage distribution on every weekday <a name="analysis52"></a>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt; plt.rcdefaults()

def usage_by_interval(data, day):
    # Count the unlockings per 2-hour interval for rides that started on `day`.
    # The intervals are half-open, e.g. 02:00 belongs to the interval starting at 02.
    rides = data[data.weekday == day]
    hours = rides.unlockedattime.str.split(':').str[0].astype(int)
    return (hours // 2).value_counts().reindex(range(12), fill_value=0)

def makePlot(performance, day, color):
    y_pos = np.arange(len(performance))
    plt.bar(y_pos, performance, align='center', alpha=0.5, color=color)
    # Ticks sit on the bar boundaries and show the start hour of each interval
    plt.xticks(y_pos - 0.5, ['%02d' % h for h in range(0, 24, 2)])
    plt.ylabel('Usage')
    plt.title(day)
    plt.show()

# Same row ranges of the July, August and September datasets as in section 3.5.1
selected_rides = pd.concat([data_routes_july.iloc[:22429],
                            data_routes_august.iloc[15356:42506],
                            data_routes_september.iloc[4342:37439]])

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
colors = ['black', 'orange', 'yellow', 'blue', 'brown', 'green', 'red']

for day, color in zip(weekdays, colors):
    makePlot(usage_by_interval(selected_rides, day), day, color)



### 3.6. Heatmap: Length of rides based on weekday and starting station <a name="heatmap"></a>

In [None]:
import seaborn as sns
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
routes_data.weekday = pd.Categorical(routes_data.weekday,categories=weekdays)
routes_data = routes_data.sort_values('weekday')
crosstab_df = pd.crosstab(routes_data.weekday, routes_data.startstationname, values=routes_data.length, aggfunc='mean').round(2)
#crosstab_df
plt.subplots(figsize=(30,15))
sns.heatmap(crosstab_df, vmin=1.2, annot=True, square=True, linewidths=0.5, fmt='.1f', 
            cbar=True, cbar_kws={'fraction':0.006}, cmap='YlGnBu')
plt.title("Average lengths of bike rides based on weekday and starting station", {'fontsize':20}, pad = 13)
plt.xlabel("")
plt.ylabel("")
plt.savefig('ride_lengths.pdf')
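`pd.crosstab` with `values` and `aggfunc='mean'` computes the mean ride length for each (weekday, starting station) pair; pairs without any rides become NaN and show up as empty cells in the heatmap. A small sketch on made-up rides shows the equivalent `groupby` formulation:

```python
import pandas as pd

toy = pd.DataFrame({
    'weekday': ['Monday', 'Monday', 'Sunday', 'Sunday'],
    'startstationname': ['Soola', 'Soola', 'Soola', 'Eeden'],
    'length': [1.0, 2.0, 4.0, 3.0],
})
# Mean ride length for every (weekday, start station) pair
ct = pd.crosstab(toy.weekday, toy.startstationname, values=toy.length, aggfunc='mean')
# The same table via groupby: group by both keys, average, then spread stations into columns
gb = toy.groupby(['weekday', 'startstationname'])['length'].mean().unstack()
print(ct)
print(gb)
```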