6 MONTHS OF CYCLING DATA

As a cycling aficionado, I have gathered ride data from two sources: rides on my own bicycles, recorded with a bicycle speedometer, and rides on bicing (Barcelona's bicycle sharing scheme), obtained from their website. My bicycle rides were made from 01/07/2019 to 31/12/2019.

I recorded the speedometer data every morning after my daily ride. It includes: date, elapsed time, trip distance, average speed, maximum speed, calorie consumption, carbon offset and total distance. The data obtained from the bicing website includes: date, initial time, end time, duration and cost.

The purpose of this project is to use this data to learn about my cycling habits during the six-month period.

I would like to:
•	Estimate the carbon offset, trip distance, average speed, calorie consumption and total distance of the bicing usage by using data from the bicycle speedometer.
•	Find the bicing, speedometer and combined total distance.
•	Find which day / week / month had the most / mean / least bicycle use.
•	Find which bicycle was used the most.
•	Find how much total carbon offset was contributed by using the bicycle, and which month was the highest contributor.
•	Find how many total calories were burned, and whether the weather affects this number.
FILES:
bici_data.csv
bicing_data.csv


In [111]:
# import libraries numpy, datetime and pandas
import numpy as np
import pandas as pd
import datetime as dt


In [112]:
# read CSV files 
speedometer = pd.read_csv('bici_data.csv')
bike_sharing = pd.read_csv('bicing_data.csv')

In [113]:
# print the first 5 rows to check that the files loaded correctly
print(speedometer.head())
print(bike_sharing.head())

     Fecha elapsed time  trip distance  average speed  maximun speed  \
0   6/8/19      1.50.13          34.57           18.8           38.8   
1   6/9/19      1.07.39          17.23           15.3           32.7   
2  6/10/19       .47.47          16.45           20.7           36.8   
3  6/11/19          NaN            NaN            NaN            NaN   
4  6/12/19      1.45.41          35.15           19.9           39.5   

   calorie consumption  carbon offset  total distance  
0                498.0           5.18            95.0  
1                200.0           2.58           112.0  
2                252.0           2.46           129.0  
3                  NaN            NaN             NaN  
4                533.0           5.27           164.0  
        Fecha   inicio       fin  Duracion Coste
0  28/12/2019  9:58:42  10:05:53  12:07:11     0
1  28/12/2019  6:38:35   6:41:57  12:03:22  0:35
2  28/12/2019  4:11:11   4:14:35  12:03:24  0:35
3  28/12/2019  2:24:55   2:44:15  1

In [114]:
# rename the columns
# for speedometer = ['date', 'elapsed_time', 'trip_distance', 'average_speed', 'maximum_speed', 'calorie_consumption',
# 'carbon_offset', 'total_distance']
# for bike_sharing = ['date', 'initial_time', 'end_time', 'elapsed_time', 'cost']

In [115]:
print(speedometer.columns)
print(bike_sharing.columns)

Index(['Fecha', 'elapsed time', 'trip distance', 'average speed',
       'maximun speed', 'calorie consumption', 'carbon offset',
       'total distance'],
      dtype='object')
Index(['Fecha', 'inicio', 'fin', 'Duracion', 'Coste'], dtype='object')


In [158]:
speedometer.columns = ['date', 'elapsed_time', 'trip_distance', 'average_speed', 'maximum_speed', 'calorie_consumption', 
'carbon_offset', 'total_distance']
bike_sharing.columns = ['date','initial_time','end_time', 'elapsed_time', 'cost']

In [159]:
print(speedometer.columns)
print(bike_sharing.columns)

Index(['date', 'elapsed_time', 'trip_distance', 'average_speed',
       'maximum_speed', 'calorie_consumption', 'carbon_offset',
       'total_distance'],
      dtype='object')
Index(['date', 'initial_time', 'end_time', 'elapsed_time', 'cost'], dtype='object')


In [120]:
# for speedometer: drop nan values

In [121]:
speedometer = speedometer.dropna()

In [122]:
print(speedometer.head())

      date elapsed_time  trip_distance  average_speed  maximum_speed  \
0   6/8/19      1.50.13          34.57           18.8           38.8   
1   6/9/19      1.07.39          17.23           15.3           32.7   
2  6/10/19       .47.47          16.45           20.7           36.8   
4  6/12/19      1.45.41          35.15           19.9           39.5   
5  6/13/19      1.55.31          35.05           18.2           40.3   

   calorie_consumption  carbon_offset  total_distance  
0                498.0           5.18            95.0  
1                200.0           2.58           112.0  
2                252.0           2.46           129.0  
4                533.0           5.27           164.0  
5                489.0           5.25           199.0  


In [128]:
# for speedometer: replace '.' with ':' in 'elapsed_time'


In [153]:
# regex=False treats '.' as a literal dot rather than the regex wildcard
speedometer['elapsed_time'] = speedometer['elapsed_time'].str.replace('.', ':', regex=False)

In [154]:
print(speedometer.head())

      date elapsed_time  trip_distance  average_speed  maximum_speed  \
0   6/8/19      1:50:13          34.57           18.8           38.8   
1   6/9/19      1:07:39          17.23           15.3           32.7   
2  6/10/19       :47:47          16.45           20.7           36.8   
4  6/12/19      1:45:41          35.15           19.9           39.5   
5  6/13/19      1:55:31          35.05           18.2           40.3   

   calorie_consumption  carbon_offset  total_distance  
0                498.0           5.18            95.0  
1                200.0           2.58           112.0  
2                252.0           2.46           129.0  
4                533.0           5.27           164.0  
5                489.0           5.25           199.0  


In [157]:
# for bike_sharing: remove the leading '12:' from the elapsed_time (duration) column


In [161]:
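# anchor the pattern with ^ so only a leading '12:' is removed, not a '12:' in the minutes position
bike_sharing['elapsed_time'] = bike_sharing['elapsed_time'].str.replace(r'^12:', '', regex=True)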

In [162]:
print(bike_sharing.head())

         date initial_time  end_time elapsed_time  cost
0  28/12/2019      9:58:42  10:05:53        07:11     0
1  28/12/2019      6:38:35   6:41:57        03:22  0:35
2  28/12/2019      4:11:11   4:14:35        03:24  0:35
3  28/12/2019      2:24:55   2:44:15        19:20     0
4  28/12/2019      1:58:12   2:08:08        09:56  0:35
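
Before any per-day or per-month question can be answered, the 'date' columns need to become real dates. The two files appear to use different conventions, so the sketch below parses each with an explicit format. The sample frames are small stand-ins built from rows printed above, and both formats are assumptions inferred from those rows rather than confirmed by the sources.

```python
import pandas as pd

# Stand-in frames built from rows shown in the outputs above;
# in the notebook these would be the full speedometer and bike_sharing frames.
speedometer_sample = pd.DataFrame({'date': ['6/8/19', '6/9/19', '6/13/19']})
bike_sharing_sample = pd.DataFrame({'date': ['28/12/2019', '28/12/2019']})

# Assumption: speedometer dates are month/day/two-digit year
# ('6/13/19' cannot be read as day/month).
speedometer_sample['date'] = pd.to_datetime(
    speedometer_sample['date'], format='%m/%d/%y')

# Assumption: bicing dates are day/month/four-digit year.
bike_sharing_sample['date'] = pd.to_datetime(
    bike_sharing_sample['date'], format='%d/%m/%Y')

print(speedometer_sample['date'].dt.month.tolist())
print(bike_sharing_sample['date'].dt.day.tolist())
```

Passing an explicit format avoids pandas guessing a different day/month order for each file.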


In [None]:
# for speedometer: create a function that gives the total sum, max, min, average per day and month for elapsed_time,
#trip_distance, average speed, maximum speed, calorie consumption, carbon offset, total distance
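
One possible shape for the summary function described in the comment above is a group-by on a day or month period. This is a minimal sketch, not the notebook's final implementation: the name `summarize` is hypothetical, the sample rows are copied from the speedometer output, and the 'date' column is assumed to have been parsed as month/day/year.

```python
import pandas as pd

def summarize(df, columns, freq):
    """Return sum, max, min and mean of `columns` per period.

    freq is a pandas period alias: 'D' for days, 'M' for months.
    """
    periods = df['date'].dt.to_period(freq)
    return df.groupby(periods)[columns].agg(['sum', 'max', 'min', 'mean'])

# Stand-in frame built from rows shown in the speedometer output.
sample = pd.DataFrame({
    'date': pd.to_datetime(['6/8/19', '6/9/19', '6/10/19'], format='%m/%d/%y'),
    'trip_distance': [34.57, 17.23, 16.45],
    'calorie_consumption': [498.0, 200.0, 252.0],
})

monthly = summarize(sample, ['trip_distance', 'calorie_consumption'], 'M')
print(monthly)
```

The 'total_distance' column looks like a running odometer reading (each value grows by roughly the trip distance), so its per-period maximum is probably more meaningful than its sum.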

In [176]:
# dt.datetime() expects integers (year, month, day, ...), so passing the whole column raised
# "TypeError: cannot convert the series to <class 'int'>". Durations are parsed with pd.to_timedelta instead.
# Rides under one hour are recorded without the hour (':47:47'), so a leading '0' is added first.
elapsed = speedometer['elapsed_time'].str.strip().str.replace(r'^:', '0:', regex=True)
elapsed = pd.to_timedelta(elapsed)

total_elapsed_time = elapsed.sum()
max_elapsed_time = elapsed.max()
min_elapsed_time = elapsed.min()

print(total_elapsed_time)
print(max_elapsed_time)
print(min_elapsed_time)

In [None]:
# for bike_sharing: create a function that populates new columns [elapsed_time, trip_distance, average speed, 
#maximum speed, calorie consumption, carbon offset, total distance] based on speedometer averages 
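
The estimate described above could work as follows: take the mean speed and the per-kilometre calorie and carbon rates from the speedometer rides, then apply them to each bicing duration. The sketch below uses stand-in rows copied from the outputs and rests on several assumptions: distances are in km, speeds are in km/h, bicing durations are 'MM:SS', and bicing rides resemble the speedometer rides closely enough for the averages to transfer.

```python
import pandas as pd

# Stand-in rows copied from the outputs above.
speedometer_sample = pd.DataFrame({
    'trip_distance': [34.57, 17.23, 16.45],        # assumed km
    'average_speed': [18.8, 15.3, 20.7],           # assumed km/h
    'calorie_consumption': [498.0, 200.0, 252.0],
    'carbon_offset': [5.18, 2.58, 2.46],
})
bike_sharing_sample = pd.DataFrame({'elapsed_time': ['07:11', '03:22', '19:20']})

# Rates derived from the speedometer rides.
total_km = speedometer_sample['trip_distance'].sum()
mean_speed = speedometer_sample['average_speed'].mean()
calories_per_km = speedometer_sample['calorie_consumption'].sum() / total_km
carbon_per_km = speedometer_sample['carbon_offset'].sum() / total_km

# Convert 'MM:SS' to hours by prefixing a zero hour field.
hours = pd.to_timedelta(
    '00:' + bike_sharing_sample['elapsed_time']).dt.total_seconds() / 3600

# Populate the estimated columns.
bike_sharing_sample['average_speed'] = mean_speed
bike_sharing_sample['trip_distance'] = hours * mean_speed
bike_sharing_sample['calorie_consumption'] = (
    bike_sharing_sample['trip_distance'] * calories_per_km)
bike_sharing_sample['carbon_offset'] = (
    bike_sharing_sample['trip_distance'] * carbon_per_km)

print(bike_sharing_sample)
```

Short city trips on shared bikes are likely slower than long rides, so these figures are probably upper estimates.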

In [None]:
# for bike_sharing: create a function that gives the total sum, max, min, average per day and month for elapsed_time,
#trip_distance, average speed, maximum speed, calorie consumption, carbon offset, total distance

In [None]:
# for speedometer and bike_sharing: select graph(s) 

I will draw conclusions from the totals of the two data sets.

In [3]:
# combine speedometer and bike_sharing into a combined file
# combined = ['date', 'elapsed_time', 'average_speed', 'calorie_consumption', 'carbon_offset', 'total_distance']
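
One way to build the combined table is to stack the two frames with `pd.concat` once their shared columns have matching names and types. The sketch below is illustrative only: the sample rows come from the outputs above, the 'source' column is a hypothetical addition for telling the two origins apart, and columns missing from the bicing data are left empty rather than estimated.

```python
import pandas as pd

# Stand-in frames with dates and durations already parsed.
speedometer_sample = pd.DataFrame({
    'date': pd.to_datetime(['6/8/19', '6/9/19'], format='%m/%d/%y'),
    'elapsed_time': pd.to_timedelta(['1:50:13', '1:07:39']),
    'trip_distance': [34.57, 17.23],
})
bike_sharing_sample = pd.DataFrame({
    'date': pd.to_datetime(['28/12/2019', '28/12/2019'], format='%d/%m/%Y'),
    'elapsed_time': pd.to_timedelta(['0:07:11', '0:03:22']),
})

# Stack the rows; 'source' (hypothetical column) records where each ride came from.
combined = pd.concat(
    [speedometer_sample.assign(source='speedometer'),
     bike_sharing_sample.assign(source='bicing')],
    ignore_index=True,
)

# Total riding time per source.
time_by_source = combined.groupby('source')['elapsed_time'].sum()
print(combined)
print(time_by_source)
```

Columns that exist in only one frame, such as 'trip_distance' here, come out as NaN for the other source until they are estimated.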

In [None]:
# for combined: calculate summary statistics, including totals and means

In [None]:
# for combined: determine which graph(s) suit the data best

CONCLUSION