___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" alt="CLRSWY"></p>

___

# WELCOME!

Welcome to the **"Bike Demand Visualization Project"**, the capstone project of the Data Visualization lessons. In recent years, many cities have offered free or affordable access to bicycles for short urban trips as an alternative to motorized public transport and private vehicles. The goal is to reduce traffic congestion, noise, and air pollution.

The aim of this project is to reveal the patterns in the historical London bike-share data using visualization tools.

This will allow us to X-ray the data as part of the EDA process before setting up a machine learning model.



---
---

# Dataset Description



#### Features

- timestamp - timestamp field for grouping the data
- cnt - the count of new bike shares
- t1 - real temperature in °C
- t2 - "feels like" temperature in °C
- hum - humidity in percentage
- wind_speed - wind speed in km/h
- weather_code - category of the weather
- is_holiday - boolean field - 1 holiday / 0 non-holiday
- is_weekend - boolean field - 1 weekend / 0 weekday
- season - category field, meteorological seasons: 0 = spring; 1 = summer; 2 = fall; 3 = winter.

**"weather_code" category description:**
* 1 = Clear; mostly clear, but some values include haze, fog, patches of fog, or fog in the vicinity
* 2 = Scattered clouds / few clouds
* 3 = Broken clouds
* 4 = Cloudy
* 7 = Rain / light rain shower / light rain
* 10 = Rain with thunderstorm
* 26 = Snowfall
* 94 = Freezing fog
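For readable plot labels, the numeric codes above can be mapped to names. This is a minimal sketch: the dictionaries `WEATHER_LABELS` and `SEASON_LABELS` and their short label strings are illustrative choices, and the tiny `sample` DataFrame holds made-up values with the dataset's column names (it is named `sample` so it does not overwrite the notebook's `df`).

```python
import pandas as pd

# Hypothetical lookup tables built from the category descriptions above
WEATHER_LABELS = {
    1: "Clear",
    2: "Scattered clouds",
    3: "Broken clouds",
    4: "Cloudy",
    7: "Rain",
    10: "Thunderstorm",
    26: "Snowfall",
    94: "Freezing fog",
}
SEASON_LABELS = {0: "spring", 1: "summer", 2: "fall", 3: "winter"}

# Made-up rows with the same column names as the dataset
sample = pd.DataFrame({"weather_code": [1, 7, 26, 3], "season": [3, 1, 3, 2]})

# map() returns NaN for any code missing from the dictionary,
# which makes unexpected codes easy to spot
sample["weather_label"] = sample["weather_code"].map(WEATHER_LABELS)
sample["season_label"] = sample["season"].map(SEASON_LABELS)
print(sample)
```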
---

As always, the first task is to explore the data: get to know the features and detect missing values, outliers, etc. Review the data from different angles and over different time breakdowns. For example, visualize the distribution of bike shares by day of the week; this graph makes it easy to observe how people's behavior changes from day to day. Likewise, you can analyze the data by hour, month, season, etc. You can also analyze the correlations between variables with a heatmap.


# Tasks


### 1. Import libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
from pylab import rcParams
sns.set_style("darkgrid")
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

### 2. Read the dataset

In [None]:
df = pd.read_csv("../input/london-bike-sharing-dataset/london_merged.csv")
df.head()

In [None]:
df.info()

### 3. Check for missing values and duplicate rows

In [None]:
df.duplicated().value_counts()

In [None]:
df.isnull().sum()

### 4. Plot the distribution of the discrete features (season, holiday, weekend and weather code)

In [None]:
# season
sns.countplot(x="season", data=df);


In [None]:
# is_holiday
sns.countplot(x="is_holiday", data=df);

In [None]:
# is_weekend
ax = sns.countplot(x="is_weekend", data=df);
for p in ax.patches:
    ax.annotate((p.get_height()), (p.get_x()+0.2, p.get_height()+20));

In [None]:
# weather_code
ax = sns.countplot(x="weather_code", data=df);

for p in ax.patches:
    ax.annotate( str(p.get_height()),  (p.get_x()+0.2, p.get_height()+20) );

In [None]:
df.season.value_counts()

In [None]:
sns.countplot(data = df, x = "weather_code");

code = df.weather_code.value_counts().sort_index()
for index, value in enumerate(code):
    plt.text(index, value, str(value), ha="center", va="bottom")
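The plots in step 4 show counts as bars; the same numbers, plus each category's share of the rows, can also be tabulated directly. A minimal sketch on made-up sample data; the helper name `category_shares` is hypothetical.

```python
import pandas as pd

def category_shares(frame, column):
    """Return row counts and percentage share for each category of `column`."""
    counts = frame[column].value_counts().sort_index()
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": shares})

# Made-up rows standing in for the real hourly records
sample = pd.DataFrame({
    "season": [0, 0, 1, 2, 3, 3, 3, 1],
    "is_holiday": [0, 0, 0, 1, 0, 0, 0, 0],
})

for col in ["season", "is_holiday"]:
    print(category_shares(sample, col), "\n")
```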


### 5. Check the data type of each variable, convert `timestamp` to datetime, and set it as the index.

In [None]:
df.info()

In [None]:
df.timestamp = pd.to_datetime(df.timestamp)
df.set_index("timestamp",inplace=True)
df.head()
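The conversion and indexing can also be done while reading the file, since `pd.read_csv` accepts `parse_dates` and `index_col`. A minimal sketch, using an in-memory CSV with made-up values in place of `london_merged.csv`:

```python
import io
import pandas as pd

# Made-up rows in the same layout as the first columns of the dataset
csv_text = """timestamp,cnt,t1
2015-01-04 00:00:00,100,3.0
2015-01-04 01:00:00,80,3.0
2015-01-04 02:00:00,60,2.5
"""

# Parse the timestamp column as datetime and use it as the index in one step
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"], index_col="timestamp")
print(sample.index)
print(sample)
```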

- The timestamp covers a full two years. Let's do feature engineering by extracting `year, month, day_of_month, day_of_week, hour` from it.

### 6. Feature engineering: extract new columns (day of the week, day of the month, hour, month, season, year, etc.)

You can use `strftime('%Y-%m')` to build the year-month column.

In [None]:
df["year"] = df.index.year
df["month"] = df.index.month
df["day_of_month"] = df.index.day
df["day_of_week"] = df.index.dayofweek  # Monday=0 ... Sunday=6
df["hour"] = df.index.hour

df["year-month"] = df.index.strftime("%Y-%m")
df.head()
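A quick way to confirm what each extracted column means: `dayofweek` counts Monday as 0 and Sunday as 6, and `strftime('%Y-%m')` yields strings such as `'2015-01'`. A minimal sketch on three hand-picked timestamps (made-up `cnt` values), using the same column names as the cell above:

```python
import pandas as pd

# Hand-picked timestamps: 2015-01-04 was a Sunday, 2015-01-05 a Monday,
# 2016-12-31 a Saturday
idx = pd.DatetimeIndex(["2015-01-04 00:00", "2015-01-05 08:00", "2016-12-31 23:00"], name="timestamp")
sample = pd.DataFrame({"cnt": [10, 20, 30]}, index=idx)

sample["year"] = sample.index.year
sample["month"] = sample.index.month
sample["day_of_month"] = sample.index.day
sample["day_of_week"] = sample.index.dayofweek  # Monday=0 ... Sunday=6
sample["hour"] = sample.index.hour
sample["year-month"] = sample.index.strftime("%Y-%m")
print(sample)
```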


### 7. Visualize the correlation with a heatmap

In [None]:
# for all variables (numeric_only=True skips the string "year-month" column)
plt.figure(figsize=(20,10))
sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm", annot=True)

In [None]:
df.corr(numeric_only=True)[["cnt"]]

In [None]:
# for the target variable
plt.figure(figsize=(2,6))
sns.heatmap(df.corr(numeric_only=True)[["cnt"]].sort_values(by="cnt", ascending=False), annot=True, cmap='BrBG', vmin=-1)
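Recent pandas versions raise an error in `DataFrame.corr()` when string columns such as `year-month` are present, unless `numeric_only=True` is passed. The sketch below also ranks the features by the absolute value of their Pearson correlation with `cnt`, so strong negative relationships are not buried at the bottom. The sample values are made up.

```python
import pandas as pd

# Made-up rows: cnt rises with t1 and tends to fall with hum
sample = pd.DataFrame({
    "cnt": [100, 200, 300, 400],
    "t1": [5.0, 10.0, 15.0, 20.0],
    "hum": [80.0, 90.0, 60.0, 70.0],
    "year-month": ["2015-01", "2015-01", "2015-02", "2015-02"],  # non-numeric
})

# numeric_only=True skips the string column instead of raising an error
corr_with_target = sample.corr(numeric_only=True)["cnt"].drop("cnt")

# Order by strength of the relationship, regardless of sign
ranked = corr_with_target.reindex(corr_with_target.abs().sort_values(ascending=False).index)
print(ranked)
```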

### 8. Visualize the correlation of the target variable and the other features with a barplot

In [None]:
df.corr(numeric_only=True)[["cnt"]].sort_values(by="cnt", ascending=False)

In [None]:
# with pandas (DataFrame.plot creates its own figure, so pass figsize to the plot call)
df.corr(numeric_only=True)[["cnt"]].sort_values(by="cnt").plot.barh(figsize=(10,6))

In [None]:
df.corr(numeric_only=True)["cnt"].sort_values().plot.barh();

### 9. Plot bike shares over time with a lineplot

In [None]:
plt.figure(figsize=(20,6))
sns.lineplot(x=df.index, y="cnt", data=df)
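With two years of hourly records, the raw line is very dense. One option, not part of the original task, is to resample the `DatetimeIndex` to daily totals before plotting. A minimal sketch on a made-up two-day hourly series:

```python
import numpy as np
import pandas as pd

# 48 made-up hourly counts spanning two days
idx = pd.date_range("2015-01-04", periods=48, freq=pd.offsets.Hour())
sample = pd.DataFrame({"cnt": np.arange(48)}, index=idx)

# Sum the hourly counts within each calendar day
daily = sample["cnt"].resample("D").sum()
print(daily)
# On the real data: df["cnt"].resample("D").sum().plot(figsize=(20, 6))
```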

### 10. Plot bike shares by month and by year-month (use lineplot, pointplot, barplot)

In [None]:
df.info()

In [None]:
df_sum = pd.DataFrame(df.groupby("year-month").cnt.sum())
df_sum

In [None]:
# with lineplot
plt.figure(figsize=(20,5))
sns.lineplot(x="year-month",y="cnt", data = df_sum)
plt.xticks(rotation=90);

In [None]:
# alternative solutions
plt.figure(figsize=(17,4))
sns.lineplot(data = df, x = "year-month", y = "cnt", ci = None, estimator = sum)
plt.xticks(rotation = 90)
plt.show()
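Because the data spans more than one calendar year, a year-by-month pivot table puts the same month of different years side by side. A minimal sketch on made-up rows; `aggfunc="sum"` mirrors the `estimator = sum` used in the plot above.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2015-01-10", "2015-01-20", "2015-02-10", "2016-01-10", "2016-02-10"])
sample = pd.DataFrame({"cnt": [10, 20, 30, 40, 50]}, index=idx)
sample["year"] = sample.index.year
sample["month"] = sample.index.month

# Rows: years, columns: months, cells: total count
monthly = sample.pivot_table(index="year", columns="month", values="cnt", aggfunc="sum")
print(monthly)
```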

In [None]:
# with lineplot
plt.figure(figsize=(15,5))
sns.lineplot(x= "month", y="cnt", data=df)

In [None]:
# with pointplot
plt.figure(figsize=(15,5))
sns.pointplot(x= "month", y="cnt", data=df, ci=100)

In [None]:
# with barplot
plt.figure(figsize=(15,5))
sns.barplot(x= "month", y="cnt", data=df, ci=95)

### 11. Plot bike shares by hour (holidays, weekends, seasons)

In [None]:
# with lineplot (*whether it is a holiday or not*)
plt.figure(figsize=(15,4))
sns.lineplot(x= "hour", y="cnt", data=df, hue="is_holiday")

In [None]:
# with lineplot (*You may want to see seasonal breakdowns*)
plt.figure(figsize=(15,4))
sns.lineplot(x= "hour", y="cnt", data=df, hue="season")

- season: meteorological seasons, 0 = spring; 1 = summer; 2 = fall; 3 = winter.

In [None]:
# with pointplot
fig, ax = plt.subplots(ncols=1, nrows=4, figsize = (18,15))

sns.pointplot(data=df, x="hour", y="cnt", ax = ax[0])
sns.pointplot(data=df, x="hour", y="cnt", ax = ax[1], hue="is_holiday")
sns.pointplot(data=df, x="hour", y="cnt", ax = ax[2], hue="is_weekend")
sns.pointplot(data=df, x="hour", y="cnt", ax = ax[3], hue="season");

In [None]:
# alternative solution (newer seaborn versions accept only `data` positionally, so x/y are keywords)
plt.figure(figsize=(20,20))
plt.subplot(4,1,1)
sns.pointplot(x=df.hour, y=df.cnt);
plt.subplot(4,1,2)
sns.pointplot(x=df.hour, y=df.cnt, hue=df.is_holiday);
plt.subplot(4,1,3)
sns.pointplot(x=df.hour, y=df.cnt, hue=df.is_weekend);
plt.subplot(4,1,4)
sns.pointplot(x=df.hour, y=df.cnt, hue=df.season);
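The pointplots above show the mean `cnt` per hour. The table behind them can be computed with `pivot_table`, which makes it easy to compare, for example, weekday and weekend rush hours side by side. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for two hours, split by weekday (0) / weekend (1)
sample = pd.DataFrame({
    "hour": [8, 8, 8, 8, 17, 17],
    "is_weekend": [0, 0, 1, 1, 0, 1],
    "cnt": [3000, 3200, 400, 600, 2800, 900],
})

# Mean count for every (hour, is_weekend) combination
hourly_profile = sample.pivot_table(index="hour", columns="is_weekend", values="cnt", aggfunc="mean")
print(hourly_profile)
```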

### 12. Plot bike shares by day of the week
- You may want to see whether it is a holiday or not

In [None]:
sns.barplot(x="day_of_week", y="cnt", data=df);

In [None]:
# with barplot
sns.barplot(x="day_of_week", y="cnt", data=df, hue="is_weekend")

In [None]:
# with pointplot
fig , ax = plt.subplots(2,1, figsize=(18,9))

sns.pointplot(data=df, x="day_of_week", y="cnt", ax=ax[0])
sns.pointplot(data=df, x="day_of_week", y="cnt", ax=ax[1], hue="season")
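Since `dayofweek` encodes Monday as 0, replacing the numbers with day names makes the results easier to read. A minimal sketch; the `DAY_NAMES` list is an illustrative helper and the rows are made up.

```python
import pandas as pd

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # position 0 = Monday

sample = pd.DataFrame({"day_of_week": [0, 0, 5, 6, 6], "cnt": [10, 30, 5, 8, 12]})

# Mean count per day, then swap the numeric index for day names
by_day = sample.groupby("day_of_week")["cnt"].mean()
by_day.index = [DAY_NAMES[d] for d in by_day.index]
print(by_day)
```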


### 13. Plot bike shares by day of the month

In [None]:
#with lineplot
plt.figure(figsize=(15, 5))
sns.lineplot(data=df, x="day_of_month", y="cnt")

pd.DataFrame(df.groupby("day_of_month").cnt.mean().astype("int")).T

In [None]:
#with lineplot
df_cnt =pd.DataFrame(df.groupby("day_of_month").cnt.mean().astype("int"))

plt.figure(figsize=(15, 5))
sns.lineplot(data=df_cnt, x="day_of_month", y="cnt")


### 14. Plot bike shares by year
### Plot bike shares on holidays by season

In [None]:
len(df)

In [None]:
# with barplot
plt.figure(figsize=(10,6))
sns.barplot(data=df[df["is_holiday"]==1], x="season",y="cnt")
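The heading also asks for bike shares by year, which the cell above does not plot. A minimal sketch of yearly totals, grouped directly on the `DatetimeIndex` so it works whether or not a `year` column exists; the rows are made up. A year that the data covers only partially will naturally show a much smaller total.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2015-03-01 08:00", "2015-07-01 08:00", "2016-05-01 08:00", "2017-01-01 08:00"])
sample = pd.DataFrame({"cnt": [100, 200, 150, 50]}, index=idx)

# Group on the index's year attribute and sum the counts
yearly = sample.groupby(sample.index.year)["cnt"].sum()
print(yearly)
# On the real data: df.groupby(df.index.year)["cnt"].sum().plot(kind="bar")
```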

### 15. Visualize the distribution of bike shares by weekday/weekend with a pie chart and barplot

In [None]:
df.is_weekend.value_counts()

In [None]:
# pie chart (sort_index keeps 0 = weekday, 1 = weekend aligned with the labels)
fig, ax = plt.subplots(figsize=(6,6))

ax.pie(df.is_weekend.value_counts().sort_index(),
       labels=["weekday", "weekend"],
       labeldistance=0.4,
       autopct="%.1f%%");

In [None]:
# alternative solution
labels = ["weekday", "weekend"]
df.is_weekend.value_counts().sort_index().plot(kind='pie', labels=labels, autopct='%1.2f%%')

In [None]:
bölüm = df.is_weekend.value_counts().sum()/100  # divisor that turns counts into percentages
plt.figure(figsize=(7,6))
sns.countplot(data = df, x = "is_weekend");
# sort_index() matches the bar order (0 = weekday, 1 = weekend)
for index,value in enumerate(df.is_weekend.value_counts().sort_index()):
    plt.text(index, value, f"%{value/bölüm:.3}-{value}", ha="center", va="bottom")

In [None]:
fig,ax = plt.subplots(figsize=(8,6))
sns.countplot(data=df,x="is_weekend",ax=ax)
for p in ax.patches:
    ax.annotate((p.get_height()), (p.get_x()+0.45, p.get_height()+1))
    ax.annotate("%"+str(round((p.get_height()/(df.is_weekend.count()))*100,2)), (p.get_x()+0.25, p.get_height()+10));

In [None]:
fig, ax = plt.subplots(figsize=(7,5))
sns.countplot(x='is_weekend', data=df, ax=ax)

for bar in ax.patches:
    ax.annotate("%" + str(round(100*bar.get_height()/len(df), 1)) + ' - ' + str(bar.get_height()),
                (bar.get_x()+0.2, bar.get_height()+10),
                size=12)
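Note that the pie chart and countplots above count hourly records (rows), not bikes. To see how the total number of bike shares splits between weekdays and weekends, sum `cnt` per group instead. A minimal sketch on made-up rows, comparing both measures:

```python
import pandas as pd

sample = pd.DataFrame({"is_weekend": [0, 0, 0, 1, 1], "cnt": [300, 200, 100, 100, 100]})

# Share of rows (what value_counts measures) vs. share of bike shares (sum of cnt)
row_share = sample["is_weekend"].value_counts(normalize=True).sort_index() * 100
cnt_share = sample.groupby("is_weekend")["cnt"].sum() / sample["cnt"].sum() * 100

summary = pd.DataFrame({"rows_%": row_share, "bike_shares_%": cnt_share}).round(1)
summary.index = ["weekday", "weekend"]
print(summary)
```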

### 16. Plot the distribution of weather codes by season

In [None]:
# with countplot
sns.countplot(x="weather_code", data=df)

In [None]:
# with catplot
sns.catplot(x="weather_code", data=df, col="season", kind="count")
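`pd.crosstab` gives the table behind the faceted countplot; with `normalize="columns"` each season's counts become proportions, which are easier to compare when seasons contain different numbers of rows. A minimal sketch on made-up rows:

```python
import pandas as pd

sample = pd.DataFrame({
    "weather_code": [1, 1, 7, 2, 1, 26, 26, 7],
    "season": [1, 1, 1, 1, 3, 3, 3, 3],
})

# Rows: weather codes, columns: seasons; each column sums to 1
weather_by_season = pd.crosstab(sample["weather_code"], sample["season"], normalize="columns")
print(weather_by_season.round(2))
```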

### 17. Visualize all the continuous variables with histograms and scatterplots

In [None]:
fig,axes=plt.subplots(2,2, figsize=(10,10))


axes[0,0].hist(x="t1",data=df,edgecolor="black",linewidth=2,color='#ff5500')
axes[0,0].set_title("t1")

axes[0,1].hist(x="t2",data=df,edgecolor="black",linewidth=2,color='#00bbff')
axes[0,1].set_title("t2")

axes[1,0].hist(x="wind_speed",data=df,edgecolor="black",linewidth=2,color='#00aa55')
axes[1,0].set_title("wind_speed")

axes[1,1].hist(x="hum",data=df,edgecolor="black",linewidth=2,color='#ffffff')
axes[1,1].set_title("humidity")


In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize = (10,11))
sns.histplot(data=df, x ="t1", ax = ax[0][0], bins = 10, stat = "count", color = "orangered", edgecolor = "black", linewidth = 2).set_title("t1", fontsize = 13)
sns.histplot(data=df, x ="t2", ax = ax[0][1], bins = 10, stat = "count", color = "deepskyblue", edgecolor = "black", linewidth = 2).set_title("t2", fontsize = 13)
sns.histplot(data=df, x ="wind_speed", ax = ax[1][0], bins = 10, stat = "count", color = "mediumseagreen", edgecolor = "black", linewidth = 2).set_title("wind_speed", fontsize = 13)
sns.histplot(data=df, x ="hum", ax = ax[1][1], bins = 10, stat = "count",  color = "white", edgecolor = "black", linewidth = 2).set_title("humidity", fontsize = 13)
plt.show()

In [None]:
plt.figure(figsize=(15,10))

plt.subplot(211)
plt.title("t1 vs. humidity by season")

sns.scatterplot(x="t1",y="hum",data=df, hue='season',palette="coolwarm")

plt.subplot(212)
plt.title("t1 vs. wind speed by season")
sns.scatterplot(x="t1",y="wind_speed",data=df, hue='season',palette="coolwarm");
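`bins=10` in the histograms above splits each variable's range into 10 equal-width bins. `np.histogram` returns the same kind of bin counts and edges as numbers, which is handy for checking skew or sparse tails. A minimal sketch on made-up humidity values:

```python
import numpy as np

hum = np.array([40.0, 55.0, 60.0, 70.0, 72.0, 80.0, 85.0, 90.0, 95.0, 100.0])

# 10 equal-width bins spanning min..max; the last bin includes its right edge
counts, edges = np.histogram(hum, bins=10)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} - {right:6.1f}: {n}")
```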