# Overview of the Dataset
This analysis uses the 'London Bike Sharing Dataset', which combines three sources: TfL's cycling data, weather data from freemeteo.com, and the UK government's bank holiday information. The hourly records loaded in the code section run from January 4, 2015, to January 3, 2017, and include timestamps, bike share counts, weather conditions, and temporal flags for holidays and weekends. Together they describe bike-sharing patterns in London under different meteorological and temporal conditions.

# Objective
The long-term goal of this project is to predict future bike share trends in London. This initial analysis explores the historical data to uncover the patterns that influence bike-sharing habits. These insights could help urban planners, bike-sharing companies, and policy makers optimize and promote sustainable urban transportation.

# Methodology
The analysis proceeded in three stages:

## Data Cleaning
Initial steps involved handling duplicates and missing values to ensure data integrity.
## Data Transformation
Key transformations included normalizing humidity values and renaming columns for clarity. Additionally, categorical data such as weather conditions and seasons were mapped to meaningful labels.
## Exploratory Data Analysis (EDA)
The EDA phase focused on understanding variable distributions and unique value counts.
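
As a rough illustration of the EDA step, the sketch below computes summary statistics and unique value counts with pandas. The `sample` frame is only a five-row excerpt copied from the first rows of `london_merged.csv` shown in the code section, so the numbers are not representative of the full dataset.

```python
import pandas as pd

# Five-row excerpt of london_merged.csv (first rows shown in the notebook output)
sample = pd.DataFrame({
    "cnt": [182, 138, 134, 72, 47],
    "t1": [3.0, 3.0, 2.5, 2.0, 2.0],
    "hum": [93.0, 93.0, 96.5, 100.0, 93.0],
    "weather_code": [3.0, 1.0, 1.0, 1.0, 1.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
summary = sample.describe()

# Number of distinct values in each column
unique_counts = sample.nunique()

print(summary)
print(unique_counts)
```

On the full dataframe the same two calls would be applied to `bikes` instead of `sample`.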

# Key Insights and Findings
Several trends and patterns emerged from the analysis:
- Bike shares fluctuate noticeably with weather conditions and seasons, which suggests a link between meteorological factors and biking habits.
- Weekdays and weekends show distinct biking patterns, possibly influenced by commuting routines and leisure activities.
- No significant outliers were observed in the dataset, which suggests consistent and reliable recording of bike sharing data.
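
Comparisons like these can be sketched in pandas with a `groupby`. The example below is a minimal illustration, not the method used for the Tableau dashboard: `sample` holds only the nine rows visible in the notebook output (five weekend rows and four weekday rows, recorded at different hours), so the resulting means are not a fair comparison and only show the mechanics.

```python
import pandas as pd

# Rows copied from the head and tail of the notebook output (not representative)
sample = pd.DataFrame({
    "cnt": [182, 138, 134, 72, 47, 1042, 541, 337, 224],
    "is_weekend": [1.0] * 5 + [0.0] * 4,
    "weather_code": [3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 4.0, 4.0, 4.0],
})

# Mean hourly bike shares for weekdays (0.0) versus weekends (1.0)
mean_by_weekend = sample.groupby("is_weekend")["cnt"].mean()

# Mean hourly bike shares per weather code
mean_by_weather = sample.groupby("weather_code")["cnt"].mean()

print(mean_by_weekend)
print(mean_by_weather)
```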

# Visualizations
The project uses Tableau for visualizations:
Advanced visualizations in Tableau offered a more dynamic and interactive representation of bike-sharing trends over time and across different conditions.
While predictive modeling wasn't the focus of this initial analysis, it's identified as a key area for future exploration.
## Link to Tableau Public dashboard
https://public.tableau.com/views/LondonBikes_17052555645380/LondonBikeRides?:language=en-US&:display_count=n&:origin=viz_share_link

# Code

# Data Analysis with Python (pandas)

## Initial Setup and Data Loading

In [1]:
# import the pandas library
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


/kaggle/input/london_merged.csv


In [2]:
# read the csv file as a pandas dataframe
bikes = pd.read_csv("/kaggle/input/london_merged.csv")
bikes

Unnamed: 0,timestamp,cnt,t1,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season
0,2015-01-04 00:00:00,182,3.0,2.0,93.0,6.0,3.0,0.0,1.0,3.0
1,2015-01-04 01:00:00,138,3.0,2.5,93.0,5.0,1.0,0.0,1.0,3.0
2,2015-01-04 02:00:00,134,2.5,2.5,96.5,0.0,1.0,0.0,1.0,3.0
3,2015-01-04 03:00:00,72,2.0,2.0,100.0,0.0,1.0,0.0,1.0,3.0
4,2015-01-04 04:00:00,47,2.0,0.0,93.0,6.5,1.0,0.0,1.0,3.0
...,...,...,...,...,...,...,...,...,...,...
17409,2017-01-03 19:00:00,1042,5.0,1.0,81.0,19.0,3.0,0.0,0.0,3.0
17410,2017-01-03 20:00:00,541,5.0,1.0,81.0,21.0,4.0,0.0,0.0,3.0
17411,2017-01-03 21:00:00,337,5.5,1.5,78.5,24.0,4.0,0.0,0.0,3.0
17412,2017-01-03 22:00:00,224,5.5,1.5,76.0,23.0,4.0,0.0,0.0,3.0


## Data Exploration

In [3]:
bikes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17414 entries, 0 to 17413
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   timestamp     17414 non-null  object 
 1   cnt           17414 non-null  int64  
 2   t1            17414 non-null  float64
 3   t2            17414 non-null  float64
 4   hum           17414 non-null  float64
 5   wind_speed    17414 non-null  float64
 6   weather_code  17414 non-null  float64
 7   is_holiday    17414 non-null  float64
 8   is_weekend    17414 non-null  float64
 9   season        17414 non-null  float64
dtypes: float64(8), int64(1), object(1)
memory usage: 1.3+ MB
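
The `info()` output above reports `timestamp` with dtype `object`, meaning the values are stored as strings. The notebook leaves them that way; if time-based features were needed, one possible approach (an assumption, not part of the original analysis) is to parse the column with `pd.to_datetime`. The `hour` and `day_name` columns below are hypothetical names introduced for illustration.

```python
import pandas as pd

# Three timestamps copied from the notebook output
sample = pd.DataFrame({
    "timestamp": [
        "2015-01-04 00:00:00",
        "2015-01-04 01:00:00",
        "2017-01-03 22:00:00",
    ],
    "cnt": [182, 138, 224],
})

# Parse the strings into datetime64 values using the format seen in the data
sample["timestamp"] = pd.to_datetime(sample["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Derive illustrative time features (hypothetical column names)
sample["hour"] = sample["timestamp"].dt.hour
sample["day_name"] = sample["timestamp"].dt.day_name()

print(sample.dtypes)
print(sample)
```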


In [4]:
bikes.shape

(17414, 10)

## Data Cleaning

In [5]:
# drop duplicate rows
bikes.drop_duplicates(inplace = True)

In [6]:
# drop rows with missing values
bikes.dropna(inplace = True)
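
Before dropping anything, it can help to count how many rows would be affected. The sketch below shows one way to do that; the duplicated row and the missing value in `sample` are injected purely for illustration and do not occur in the real data.

```python
import pandas as pd

# Illustrative frame: one exact duplicate row and one missing value are injected
sample = pd.DataFrame({
    "timestamp": [
        "2015-01-04 00:00:00",
        "2015-01-04 01:00:00",
        "2015-01-04 01:00:00",
        "2015-01-04 02:00:00",
    ],
    "cnt": [182, 138, 138, None],
})

# Count fully duplicated rows (the first occurrence is not counted)
n_duplicates = int(sample.duplicated().sum())

# Count missing values per column
n_missing = sample.isna().sum()

# Same cleaning as in the notebook, but returning a new frame instead of inplace
cleaned = sample.drop_duplicates().dropna().reset_index(drop=True)

print(n_duplicates)
print(n_missing)
print(cleaned)
```

Running the two counts on `bikes` would confirm directly whether the drops remove any rows.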

In [7]:
# There are no duplicates or null values
bikes

Unnamed: 0,timestamp,cnt,t1,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season
0,2015-01-04 00:00:00,182,3.0,2.0,93.0,6.0,3.0,0.0,1.0,3.0
1,2015-01-04 01:00:00,138,3.0,2.5,93.0,5.0,1.0,0.0,1.0,3.0
2,2015-01-04 02:00:00,134,2.5,2.5,96.5,0.0,1.0,0.0,1.0,3.0
3,2015-01-04 03:00:00,72,2.0,2.0,100.0,0.0,1.0,0.0,1.0,3.0
4,2015-01-04 04:00:00,47,2.0,0.0,93.0,6.5,1.0,0.0,1.0,3.0
...,...,...,...,...,...,...,...,...,...,...
17409,2017-01-03 19:00:00,1042,5.0,1.0,81.0,19.0,3.0,0.0,0.0,3.0
17410,2017-01-03 20:00:00,541,5.0,1.0,81.0,21.0,4.0,0.0,0.0,3.0
17411,2017-01-03 21:00:00,337,5.5,1.5,78.5,24.0,4.0,0.0,0.0,3.0
17412,2017-01-03 22:00:00,224,5.5,1.5,76.0,23.0,4.0,0.0,0.0,3.0


In [8]:
# count the unique values in the weather_code column
bikes.weather_code.value_counts()

weather_code
1.0     6150
2.0     4034
3.0     3551
7.0     2141
4.0     1464
26.0      60
10.0      14
Name: count, dtype: int64

In [9]:
# count the unique values in the season column
bikes.season.value_counts()

season
0.0    4394
1.0    4387
3.0    4330
2.0    4303
Name: count, dtype: int64

In [10]:
# converting humidity from a percentage (0-100) to a fraction between 0 and 1
bikes.hum = bikes.hum / 100
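
One caveat with this cell is that re-running it divides the column by 100 a second time. A minimal sketch of a guard is shown below; `to_fraction` is a hypothetical helper, and the `max() > 1` check is a heuristic that assumes the raw data contains at least one reading above 1%.

```python
import pandas as pd

# Humidity values copied from the first rows of the dataset (percent scale)
sample = pd.DataFrame({"hum": [93.0, 93.0, 96.5, 100.0, 93.0]})


def to_fraction(hum: pd.Series) -> pd.Series:
    """Scale 0-100 values to 0-1; leave already-scaled values untouched."""
    if hum.max() > 1:
        return hum / 100
    return hum


sample["hum"] = to_fraction(sample["hum"])
sample["hum"] = to_fraction(sample["hum"])  # second call leaves values unchanged

print(sample["hum"].tolist())
```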

## Data Transformation

In [11]:
# Renaming columns to make them more informative
new_cols_dict ={
    'timestamp':'time',
    'cnt':'count_bike_share', 
    't1':'temp_C',
    't2':'temp_feels_like_C',
    'hum':'humidity',
    'wind_speed':'wind_speed_kph',
    'weather_code':'weather_category',
    'is_holiday':'is_holiday',
    'is_weekend':'is_weekend',
    'season':'season'
}

# Renaming the columns to the specified column names
bikes.rename(new_cols_dict, axis=1, inplace=True)
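
By default `rename` silently ignores keys that do not match an existing column, so a typo in the dictionary goes unnoticed. As an optional variation (not what the notebook does), passing `errors="raise"` makes pandas raise a `KeyError` instead. The two-column `sample` frame and the misspelled key `"tmp1"` are illustrative.

```python
import pandas as pd

# Two of the original column names, with values from the first rows
sample = pd.DataFrame({"cnt": [182, 138], "t1": [3.0, 3.0]})

# Valid mapping: every key exists, so the rename succeeds
renamed = sample.rename(
    columns={"cnt": "count_bike_share", "t1": "temp_C"},
    errors="raise",
)

# Mapping with a deliberate typo ("tmp1" instead of "t1") raises KeyError
try:
    sample.rename(columns={"tmp1": "temp_C"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(renamed.columns))
print(typo_detected)
```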

In [12]:
# creating a season dictionary so that we can map the integers 0-3 to the actual written values
season_dict = {
    0.0:'spring',
    1.0:'summer',
    2.0:'autumn',
    3.0:'winter'
}

# creating a weather dictionary so that we can map the integers to the actual written values
weather_category_dict = {
    1.0:'Clear',
    2.0:'Scattered clouds',
    3.0:'Broken clouds',
    4.0:'Cloudy',
    7.0:'Rain',
    10.0:'Rain with thunderstorm',
    26.0:'Snowfall'
}


# mapping the values 0-3 to the seasons
bikes.season = bikes.season.map(season_dict)

# mapping the values to weather category
bikes.weather_category = bikes.weather_category.map(weather_category_dict)
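
`Series.map` with a dictionary returns `NaN` for any value that has no key, so an unexpected code would quietly become a missing label. The sketch below shows one way to list such codes before overwriting the column; the code `94.0` is made up for illustration and is not one of the seven codes reported by `value_counts()` above.

```python
import pandas as pd

# Same mapping as in the notebook
weather_category_dict = {
    1.0: "Clear",
    2.0: "Scattered clouds",
    3.0: "Broken clouds",
    4.0: "Cloudy",
    7.0: "Rain",
    10.0: "Rain with thunderstorm",
    26.0: "Snowfall",
}

# Illustrative codes; 94.0 is a made-up value with no dictionary entry
codes = pd.Series([3.0, 1.0, 7.0, 26.0, 94.0])

mapped = codes.map(weather_category_dict)

# Codes that produced NaN because they are missing from the dictionary
unmapped = sorted(codes[mapped.isna()].unique())

print(mapped.tolist())
print(unmapped)
```

An empty `unmapped` list indicates that every code received a label.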

In [13]:
# checking our dataframe to see if the mappings have worked
bikes.head()

Unnamed: 0,time,count_bike_share,temp_C,temp_feels_like_C,humidity,wind_speed_kph,weather_category,is_holiday,is_weekend,season
0,2015-01-04 00:00:00,182,3.0,2.0,0.93,6.0,Broken clouds,0.0,1.0,winter
1,2015-01-04 01:00:00,138,3.0,2.5,0.93,5.0,Clear,0.0,1.0,winter
2,2015-01-04 02:00:00,134,2.5,2.5,0.965,0.0,Clear,0.0,1.0,winter
3,2015-01-04 03:00:00,72,2.0,2.0,1.0,0.0,Clear,0.0,1.0,winter
4,2015-01-04 04:00:00,47,2.0,0.0,0.93,6.5,Clear,0.0,1.0,winter


## Data Export

In [14]:
# writing the final dataframe to an Excel file for the Tableau visualisations (file 'london_bikes_modified.xlsx', sheet 'bike_data')
bikes.to_excel('london_bikes_modified.xlsx', sheet_name='bike_data')
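
Note that `to_excel` writes the dataframe index as an extra, unnamed first column unless `index=False` is passed, and it needs an Excel writer engine such as `openpyxl` to be installed. The sketch below illustrates the effect of `index=False` using `to_csv` and an in-memory buffer so that it runs without extra dependencies; `to_excel` accepts the same `index` parameter. The three-column `sample` frame is an illustrative excerpt.

```python
import io

import pandas as pd

# Small excerpt of the transformed dataframe
sample = pd.DataFrame({
    "time": ["2015-01-04 00:00:00", "2015-01-04 01:00:00"],
    "count_bike_share": [182, 138],
    "season": ["winter", "winter"],
})

# Write without the index so that no unnamed column is created
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Read the data back to confirm that only the named columns were written
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(list(round_trip.columns))
```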

# Creating a dashboard using Tableau

In [15]:
%%HTML
<div class='tableauPlaceholder' id='viz1705308157480' style='position: relative'><noscript><a href='#'><img alt='London Bike Rides ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Lo&#47;LondonBikes_17052555645380&#47;LondonBikeRides&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='LondonBikes_17052555645380&#47;LondonBikeRides' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Lo&#47;LondonBikes_17052555645380&#47;LondonBikeRides&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1705308157480');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='1177px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

# Conclusion
The analysis sheds light on the dynamic nature of bike-sharing in London, which is influenced by factors such as weather, time, and holidays. These insights underscore the potential of data-driven approaches in urban mobility planning. Future work could include predictive modeling to forecast bike share demand, further analysis of external factors such as public events, and a comparative study across different cities or time frames to generalize the findings.