In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import datetime

# Data Load

In [None]:
data = pd.read_csv("../input/london-bike-sharing-dataset/london_merged.csv")
data.head()

In [None]:
data.info()

The data has 10 columns and a total of 17,414 rows.

# Check NA

In [None]:
data.isnull().sum()

The data is already fairly clean: no column contains null values.

# EDA

Let's first look at the relationship between each pair of columns with a scatter plot matrix.  <br>
This is an efficient way to start because it gives a quick visual hint of how the columns relate to each other. <br>
Plotting every row would mean drawing far too many points, so the plot uses a random sample of 1,000 rows.  <br>

In [None]:
data_sample = data.sample(1000)

p = sns.PairGrid(data=data_sample, vars=['t1', 't2', 'hum', 'wind_speed', 'weather_code', 'is_holiday', 'is_weekend','season', 'cnt'])
p.map_diag(plt.hist)
p.map_offdiag(plt.scatter)
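
Because the sample is drawn at random, the plot changes on every run. The sketch below shows one way to make the sample repeatable with the `random_state` argument of `DataFrame.sample`. The `demo` frame and its values are made up for illustration and are not the bike-sharing data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real table (values are invented for illustration).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "t1": rng.normal(12, 5, size=5000),
    "cnt": rng.integers(0, 5000, size=5000),
})

# A fixed random_state makes sample() return the same rows every time.
sample_a = demo.sample(n=1000, random_state=42)
sample_b = demo.sample(n=1000, random_state=42)

# Both samples contain exactly the same rows in the same order.
print(sample_a.index.equals(sample_b.index))
```

On the real data, the equivalent call would be `data.sample(1000, random_state=42)`; the seed value itself is arbitrary.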

A few things stand out from the plot. <br> 
First, t1 and t2 have a clear linear relationship. This makes sense because they are the actual temperature and the "feels like" temperature.
Second, wind_speed and t2 seem to have some kind of relationship. 
Several of the remaining variables are categorical, so let's draw a different kind of plot to examine them. <br>
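
A visual impression of linearity can be backed up with a correlation matrix. The following is a minimal sketch on synthetic data: the `demo` frame is invented so that `t2` tracks `t1` closely and `wind_speed` is independent, which is an assumption for illustration and not a property of the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic columns (assumed relationships, for illustration only).
t1 = rng.normal(12, 5, size=n)
demo = pd.DataFrame({
    "t1": t1,
    "t2": t1 - 1 + rng.normal(0, 1, size=n),   # follows t1 with small noise
    "wind_speed": rng.uniform(0, 40, size=n),  # unrelated to temperature here
})

# Pearson correlation between every pair of numeric columns.
corr = demo.corr(method="pearson")
print(corr.round(2))
```

On the real data, the same check would be `data[['t1', 't2', 'wind_speed']].corr()`.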

I think the exploratory data analysis stage is the time to form many hypotheses, using imagination and guesswork, and to plot them. <br>

In [None]:
# Parse the timestamp once, then extract the parts with the .dt accessor
data['timestamp'] = pd.to_datetime(data['timestamp'], format='%Y-%m-%d %H:%M:%S')
data['month'] = data['timestamp'].dt.month
data['day'] = data['timestamp'].dt.day
data['hour'] = data['timestamp'].dt.hour

I split the timestamp into month, day, and hour so that each can be explored separately. <br>
Bicycle usage should look different depending on whether people are commuting to work or going on a weekend picnic, so looking at finer time detail should reveal different patterns. <br>

In [None]:
figure, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)
figure.set_size_inches(12, 8)

sns.boxplot(data=data, y='cnt', ax=ax1)
sns.boxplot(data=data, x='month', y='cnt', ax=ax2)
sns.boxplot(data=data, x='hour', y='cnt', ax=ax3)
sns.boxplot(data=data, x='day', y='cnt', ax=ax4)


* First, looking at monthly rental volume with the year divided into four quarters, the start of the third quarter (July) has the most rentals. <br>
This is probably because July has the best weather. <br>
* Second, looking at hour of day, rentals are high around 8:00 and around 17:00-18:00, which is likely tied to commuting times. <br>
* Third, data like this often shows a beginning-of-month or end-of-month effect, but that does not seem to be the case here. <br>
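
The readings above come from eyeballing box plots; a grouped median gives the same information as numbers. This is a rough sketch on synthetic data, where the counts are invented with peaks at hours 8 and 17 purely to show the mechanics.

```python
import numpy as np
import pandas as pd

# 60 days of synthetic hourly timestamps (not the real data).
offsets = pd.to_timedelta(np.arange(24 * 60), unit="h")
demo = pd.DataFrame({"timestamp": pd.Timestamp("2015-01-01") + offsets})
demo["hour"] = demo["timestamp"].dt.hour

# Made-up counts: a flat base load with peaks at 8:00 and 17:00.
demo["cnt"] = 100 + 900 * demo["hour"].isin([8, 17]).astype(int)

# Median rentals per hour, then the two busiest hours.
hourly_median = demo.groupby("hour")["cnt"].median()
top_hours = sorted(hourly_median.nlargest(2).index.tolist())
print(top_hours)
```

On the real data, `data.groupby('month')['cnt'].median()` and `data.groupby('day')['cnt'].median()` would give the matching numbers for the month and day panels.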

In [None]:
fig,(ax1, ax2, ax3, ax4, ax5)= plt.subplots(nrows=5)
fig.set_size_inches(18,25)

sns.pointplot(data=data, x='hour', y='cnt', ax=ax1)
sns.pointplot(data=data, x='hour', y='cnt', hue='is_holiday', ax=ax2)
sns.pointplot(data=data, x='hour', y='cnt', hue='is_weekend', ax=ax3)
sns.pointplot(data=data, x='hour', y='cnt', hue='season', ax=ax4)
sns.pointplot(data=data, x='hour', y='cnt', hue='weather_code',ax=ax5)

These are very interesting results. <br>
* First, on non-holidays the commuting peaks are clearly visible, while on holidays most usage is around lunchtime. Weekends show a similar trend. <br>
* Second, by season, rentals rank summer, fall, spring, winter, which probably follows how good the weather is. <br>
* Third, by weather, the better the weather, the more rentals there are, and when it rains there are few. <br>
However, unlike rain, when it snows there are still people who rent during rush hour.
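
The weekday versus weekend contrast can also be summarized as a table of mean rentals per hour. The sketch below uses a tiny hand-made frame whose numbers are invented so that weekdays peak in the morning and weekends around midday; it only illustrates the `pivot_table` call, not the real values.

```python
import pandas as pd

# Tiny invented dataset: two hours, weekday (0) vs weekend (1).
demo = pd.DataFrame({
    "hour":       [8, 8, 13, 13, 8, 8, 13, 13],
    "is_weekend": [0, 0, 0, 0, 1, 1, 1, 1],
    "cnt":        [900, 1100, 300, 500, 200, 400, 800, 1000],
})

# Rows are hours, columns are the is_weekend flag, cells are mean rentals.
profile = demo.pivot_table(index="hour", columns="is_weekend",
                           values="cnt", aggfunc="mean")

# Hour with the highest mean rentals for each group.
peak_hour = profile.idxmax()
print(profile)
print(peak_hour)
```

Swapping `is_weekend` for `is_holiday`, `season`, or `weather_code` on the real `data` frame would produce the tables behind the other point plots.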

# Conclusion

This was a simple exploration of the data. <br>
By dividing the data into various segments, we can see that bicycle usage differs by hour of day, weekend, holiday, and so on. <br>
Based on these results, a machine learning algorithm could be used to predict bicycle usage.  <br>
However, I think it is worth creating derived variables through deeper data exploration before applying such algorithms. <br>
When I have time, I'll analyze the data a little further, build some derived variables, and try modeling.<br>
**Feedback is always welcome.** 