# 3D Plotting with Matplotlib
## Milestone 1: Getting Your Data Ready

### Objective
In this milestone, you will read the SeoulBikeData.csv file from the UCI ML repository into a pandas DataFrame and extract the columns that will be used to create 3D plots in subsequent milestones.

### Importance of the Project
Data is the most important ingredient of any data analysis, data visualization, data science, or machine learning project. Identifying and extracting the relevant columns from a full dataset is a prerequisite for finding useful insights about the question you want to answer. This milestone extracts the data that we will use for plotting.

### 1. Import libraries

In [21]:
import pandas as pd
import matplotlib.pyplot as plt

### 2. Read data from the csv file into a pandas DataFrame

In [2]:
data_loc = "./data/"

In [3]:
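data = pd.read_csv(
    data_loc + "SeoulBikeData.csv",
    encoding="unicode_escape",  # the raw file is not valid UTF-8 (note the ° in column names)
    parse_dates=["Date"],
    date_format="%d/%m/%Y",     # day/month/year; this argument requires pandas 2.0 or later
)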
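
The `date_format` argument is only available in pandas 2.0 and later. As a hedged alternative for older versions, the sketch below parses day/month/year strings after loading, using `pd.to_datetime`. The inline CSV text and the `sample` name are invented for illustration and are not part of the dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same day/month/year layout as the Date column.
csv_text = "Date,Hour\n01/12/2017,0\n02/12/2017,1\n"

sample = pd.read_csv(io.StringIO(csv_text))

# Parse explicitly so that 01/12/2017 is read as 1 December, not 12 January.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

print(sample.dtypes)
print(sample["Date"].dt.month.tolist())
```

Passing the format explicitly avoids relying on pandas to guess whether the day or the month comes first.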

### 3. Do preliminary data exploration on your DataFrame.

#### See the number of rows and columns in your data.

In [4]:
data.shape

(8760, 14)

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   Date                       8760 non-null   datetime64[ns]
 1   Rented Bike Count          8760 non-null   int64         
 2   Hour                       8760 non-null   int64         
 3   Temperature(°C)            8760 non-null   float64       
 4   Humidity(%)                8760 non-null   int64         
 5   Wind speed (m/s)           8760 non-null   float64       
 6   Visibility (10m)           8760 non-null   int64         
 7   Dew point temperature(°C)  8760 non-null   float64       
 8   Solar Radiation (MJ/m2)    8760 non-null   float64       
 9   Rainfall(mm)               8760 non-null   float64       
 10  Snowfall (cm)              8760 non-null   float64       
 11  Seasons                    8760 non-null   object        
 12  Holida

#### See the column names.

In [6]:
data.columns

Index(['Date', 'Rented Bike Count', 'Hour', 'Temperature(°C)', 'Humidity(%)',
       'Wind speed (m/s)', 'Visibility (10m)', 'Dew point temperature(°C)',
       'Solar Radiation (MJ/m2)', 'Rainfall(mm)', 'Snowfall (cm)', 'Seasons',
       'Holiday', 'Functioning Day'],
      dtype='object')

#### See the contents of your DataFrame for at least the first 5 rows

In [7]:
data.head(5)

| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Seasons | Holiday | Functioning Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-12-01 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 1 | 2017-12-01 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 2 | 2017-12-01 | 173 | 2 | -6.0 | 39 | 1.0 | 2000 | -17.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 3 | 2017-12-01 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 4 | 2017-12-01 | 78 | 4 | -6.0 | 36 | 2.3 | 2000 | -18.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |


### 4. Compute the mean temperature from the DataFrame

In [8]:
mean_temp = data["Temperature(°C)"].mean()
mean_temp

12.882922374429223

#### Round the mean temperature value to 2 decimal places.

In [9]:
mean_temp = mean_temp.round(2)
mean_temp

12.88
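
`Series.mean()` returns a NumPy float, which is why the `.round(2)` method is available in the cell above; a plain Python `float` has no such method. A minimal sketch on made-up values (the `temps` Series is hypothetical) showing that the built-in `round` gives the same result:

```python
import pandas as pd

# Hypothetical temperatures, not taken from the dataset.
temps = pd.Series([10.0, 12.5, 16.0])

mean_value = temps.mean()                      # NumPy float, roughly 12.8333
rounded_method = mean_value.round(2)           # NumPy scalar method
rounded_builtin = round(float(mean_value), 2)  # built-in round works on any float

print(rounded_method, rounded_builtin)
```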

### 5. Group the DataFrame by the Hour column and compute the mean temperature (along with the other numeric columns) for each hour group

In [10]:
# Keep Hour plus the numeric columns, then average each column within every hour.
cols = ['Rented Bike Count', 'Hour', 'Temperature(°C)', 'Humidity(%)',
        'Wind speed (m/s)', 'Visibility (10m)', 'Dew point temperature(°C)',
        'Solar Radiation (MJ/m2)', 'Rainfall(mm)', 'Snowfall (cm)']
mean_data_by_hour = data[cols].groupby("Hour", as_index=False).mean()
mean_data_by_hour

| | Hour | Rented Bike Count | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 541.460274 | 11.286301 | 64.99726 | 1.453699 | 1433.380822 | 4.543014 | 0.0 | 0.145205 | 0.061644 |
| 1 | 1 | 426.183562 | 10.923288 | 66.128767 | 1.28411 | 1414.813699 | 4.470959 | 0.0 | 0.062192 | 0.073699 |
| 2 | 2 | 301.630137 | 10.591507 | 67.306849 | 1.223836 | 1382.59726 | 4.438904 | 0.0 | 0.096986 | 0.076438 |
| 3 | 3 | 203.331507 | 10.293699 | 68.136986 | 1.197534 | 1358.635616 | 4.40274 | 0.0 | 0.146849 | 0.076712 |
| 4 | 4 | 132.591781 | 10.026301 | 68.731507 | 1.21726 | 1339.284932 | 4.308219 | 0.0 | 0.155616 | 0.076438 |
| 5 | 5 | 139.082192 | 9.768767 | 69.523288 | 1.136712 | 1333.758904 | 4.294795 | 0.0 | 0.080274 | 0.078356 |
| 6 | 6 | 287.564384 | 9.560548 | 70.20274 | 1.118904 | 1311.69863 | 4.203288 | 0.006384 | 0.161096 | 0.085205 |
| 7 | 7 | 606.005479 | 9.581096 | 69.232877 | 1.187123 | 1298.706849 | 4.039726 | 0.093973 | 0.139726 | 0.092877 |
| 8 | 8 | 1015.70137 | 10.176986 | 66.263014 | 1.280274 | 1319.523288 | 3.790137 | 0.354959 | 0.162192 | 0.1 |
| 9 | 9 | 645.983562 | 11.37589 | 60.419178 | 1.466849 | 1332.183562 | 3.470959 | 0.769945 | 0.279452 | 0.113973 |


In [11]:
mean_data_by_hour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Hour                       24 non-null     int64  
 1   Rented Bike Count          24 non-null     float64
 2   Temperature(°C)            24 non-null     float64
 3   Humidity(%)                24 non-null     float64
 4   Wind speed (m/s)           24 non-null     float64
 5   Visibility (10m)           24 non-null     float64
 6   Dew point temperature(°C)  24 non-null     float64
 7   Solar Radiation (MJ/m2)    24 non-null     float64
 8   Rainfall(mm)               24 non-null     float64
 9   Snowfall (cm)              24 non-null     float64
dtypes: float64(9), int64(1)
memory usage: 2.0 KB
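
To make the grouping step concrete, here is a small sketch on made-up readings. With `as_index=False`, the grouping key stays a regular column instead of becoming the index, which matches `Hour` appearing as column 0 in the `info()` output above. The `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical readings: two observations for each of two hours.
toy = pd.DataFrame({
    "Hour": [0, 0, 1, 1],
    "Temperature(°C)": [10.0, 12.0, 8.0, 9.0],
    "Rented Bike Count": [100, 300, 50, 150],
})

# One row per hour; every other column is averaged within its group.
by_hour = toy.groupby("Hour", as_index=False).mean()
print(by_hour)
```

Note that integer columns such as `Rented Bike Count` become floats after averaging, as they do in the real data.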


#### Round the mean values for each group to 2 decimal places

In [12]:
mean_data_by_hour = mean_data_by_hour.round(2)
mean_data_by_hour

| | Hour | Rented Bike Count | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 541.46 | 11.29 | 65.0 | 1.45 | 1433.38 | 4.54 | 0.0 | 0.15 | 0.06 |
| 1 | 1 | 426.18 | 10.92 | 66.13 | 1.28 | 1414.81 | 4.47 | 0.0 | 0.06 | 0.07 |
| 2 | 2 | 301.63 | 10.59 | 67.31 | 1.22 | 1382.6 | 4.44 | 0.0 | 0.1 | 0.08 |
| 3 | 3 | 203.33 | 10.29 | 68.14 | 1.2 | 1358.64 | 4.4 | 0.0 | 0.15 | 0.08 |
| 4 | 4 | 132.59 | 10.03 | 68.73 | 1.22 | 1339.28 | 4.31 | 0.0 | 0.16 | 0.08 |
| 5 | 5 | 139.08 | 9.77 | 69.52 | 1.14 | 1333.76 | 4.29 | 0.0 | 0.08 | 0.08 |
| 6 | 6 | 287.56 | 9.56 | 70.2 | 1.12 | 1311.7 | 4.2 | 0.01 | 0.16 | 0.09 |
| 7 | 7 | 606.01 | 9.58 | 69.23 | 1.19 | 1298.71 | 4.04 | 0.09 | 0.14 | 0.09 |
| 8 | 8 | 1015.7 | 10.18 | 66.26 | 1.28 | 1319.52 | 3.79 | 0.35 | 0.16 | 0.1 |
| 9 | 9 | 645.98 | 11.38 | 60.42 | 1.47 | 1332.18 | 3.47 | 0.77 | 0.28 | 0.11 |


### 6. Filter the data on the hours when mean temperature was greater than the overall mean temperature

In [13]:
mean_temp

12.88

In [14]:
mask_higher = (mean_data_by_hour["Temperature(°C)"] > mean_temp)

In [15]:
mean_data_by_hour_higher = mean_data_by_hour[mask_higher][['Hour','Rented Bike Count', 'Temperature(°C)']].copy().reset_index(drop = True)
mean_data_by_hour_higher

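| | Hour | Rented Bike Count | Temperature(°C) |
|---|---|---|---|
| 0 | 10 | 527.82 | 12.91 |
| 1 | 11 | 600.85 | 14.31 |
| 2 | 12 | 699.44 | 15.46 |
| 3 | 13 | 733.25 | 16.26 |
| 4 | 14 | 758.82 | 16.82 |
| 5 | 15 | 829.19 | 17.04 |
| 6 | 16 | 930.62 | 16.9 |
| 7 | 17 | 1138.51 | 16.25 |
| 8 | 18 | 1502.93 | 15.3 |
| 9 | 19 | 1195.15 | 14.28 |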
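
A boolean mask is a Series of `True`/`False` values aligned with the rows of the frame; indexing with it keeps only the `True` rows. The sketch below uses invented values and a hypothetical `threshold` to show that a mask and its complement `~mask` split the rows with no overlap. This assumes the column has no missing values, in which case `~mask` is equivalent to `<=`.

```python
import pandas as pd

# Hypothetical hourly means, not taken from the dataset.
toy = pd.DataFrame({
    "Hour": [8, 12, 16, 22],
    "Temperature(°C)": [10.2, 15.5, 16.9, 12.0],
})
threshold = 12.88

mask = toy["Temperature(°C)"] > threshold

# Rows where the mask is True, and rows where it is False.
warmer = toy[mask].reset_index(drop=True)
cooler = toy[~mask].reset_index(drop=True)

print(warmer["Hour"].tolist(), cooler["Hour"].tolist())
```

`reset_index(drop=True)` renumbers each subset from 0 and discards the old row labels, which is why the filtered tables in this milestone start at index 0.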


In [16]:
mean_data_by_hour_higher.to_csv(data_loc + 'mean_data_by_hour_higher.csv', index=False)

### 7. Filter the data on the hours when mean temperature was less than or equal to the overall mean temperature

In [17]:
mean_temp

12.88

In [18]:
mask_lower = (mean_data_by_hour["Temperature(°C)"] <= mean_temp)

In [19]:
mean_data_by_hour_lower = mean_data_by_hour[mask_lower][['Hour','Rented Bike Count', 'Temperature(°C)']].copy().reset_index(drop= True)
mean_data_by_hour_lower

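| | Hour | Rented Bike Count | Temperature(°C) |
|---|---|---|---|
| 0 | 0 | 541.46 | 11.29 |
| 1 | 1 | 426.18 | 10.92 |
| 2 | 2 | 301.63 | 10.59 |
| 3 | 3 | 203.33 | 10.29 |
| 4 | 4 | 132.59 | 10.03 |
| 5 | 5 | 139.08 | 9.77 |
| 6 | 6 | 287.56 | 9.56 |
| 7 | 7 | 606.01 | 9.58 |
| 8 | 8 | 1015.7 | 10.18 |
| 9 | 9 | 645.98 | 11.38 |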


In [20]:
mean_data_by_hour_lower.to_csv(data_loc + 'mean_data_by_hour_lower.csv', index=False)
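
The `index` argument of `to_csv` is a boolean flag, and in Python any non-empty string, including `'False'`, is truthy. The sketch below writes a two-row frame to in-memory buffers instead of files (an assumption made to keep it self-contained). It shows how a written index comes back as an `Unnamed: 0` column when the CSV is re-read, and how `index=False` avoids that.

```python
import io
import pandas as pd

toy = pd.DataFrame({"Hour": [10, 11], "Temperature(°C)": [12.91, 14.31]})

with_index = io.StringIO()
toy.to_csv(with_index, index=True)       # default behaviour: the index is written

without_index = io.StringIO()
toy.to_csv(without_index, index=False)   # boolean False: the index is skipped

# Re-read both buffers and compare the resulting column names.
cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()

print(bool("False"))   # a non-empty string is truthy
print(cols_with)
print(cols_without)
```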