> > > > > # **EXPLORATORY DATA ANALYSIS USING SQL**

> > > > > > **BIKE SHARING DEMAND IN SEOUL, SOUTH KOREA**

> > > > ![Intro Image](https://i.ibb.co/t8scxrY/Intro-Image2.jpg)

At a hypothetical bike-sharing startup in Seoul, South Korea, that is scaling quickly, demand has been fluctuating a lot. On some days there are not enough usable bikes available, and on others there are too many. If the company could predict demand in advance, it could avoid both situations.

Therefore, in this project I will attempt to:

- Compare the average number of bikes rented by the time of day (morning, afternoon, and evening) across the four different seasons.
- Create views to analyse the relationship between metrics like temperature and the number of bikes rented.
- Find which variables correlate most with the number of rentals and how strong these relationships are.

This dataset (Usage Stats) consists of the number of public bikes rented in Seoul's bike sharing system at each hour. It also includes information about the weather and the time, such as whether it was a public holiday.

**\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_**

**USAGE STATS**

| Column | Type | Meaning |
|---|---|---|
| Date | date | The date of the recorded observation, ranging from 2017-12-01 to 2018-12-01. |
| Rented_Bike_Count | smallint | The number of bikes rented during the 60 minutes that follow the start of a given hour. |
| Hour | tinyint | The hour of the day at which counting starts, from 0 to 23. |
| Temperature_C | float | The temperature in degrees Celsius (0° is the freezing point of water and 100° its boiling point). |
| Humidity | tinyint | The amount of water vapour in the air, as a percentage. |
| Wind_speed_m_s | float | The wind speed, measured in metres per second (m/s). |
| Rainfall_mm | float | Total rainfall depth during a given hour, in millimetres (mm). |
| Snowfall_cm | float | The depth of snow, in centimetres (cm). |
| Seasons | varchar(50) | The season of the year: Summer, Autumn, Winter or Spring. |
| Holiday | varchar(50) | [Description in progress] |
| Functioning_Day | varchar(50) | Whether the bike-sharing service was operating (Yes or No). |

**\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_**

# **\[BASIC EXPLORATION\]**

### **DATASET OVERVIEW**

In [55]:
SELECT TOP 4 * FROM UsageStats

Date,Rented_Bike_Count,Hour,Temperature_C,Humidity,Wind_speed_m_s,Rainfall_mm,Snowfall_cm,Seasons,Holiday,Functioning_Day
2017-12-01,254,0,-5.199999809265137,37,2.200000047683716,0,0,Winter,No Holiday,Yes
2017-12-01,204,1,-5.5,38,0.800000011920929,0,0,Winter,No Holiday,Yes
2017-12-01,173,2,-6.0,39,1.0,0,0,Winter,No Holiday,Yes
2017-12-01,107,3,-6.199999809265137,40,0.8999999761581421,0,0,Winter,No Holiday,Yes


### **SUMMARY STATISTICS**

As far as I know, T-SQL has no single built-in function that summarises all (or selected) columns at once, so I improvised a query that generates the key statistics (plus a few customisable extras) for every metric in one result set.

In [1]:
------------------------------------------ [COUNT OF NULL VALUES] --------------------------------
/* The CASE statement works around the lack of a built-in function to count null values. */
SELECT '1- Null Values' 'Statistics',
    SUM(case when Rented_Bike_Count is null then 1 else 0 end) Rented_Bike_Count, SUM(case when Temperature_C is null then 1 else 0 end) Temperature_C,
    SUM(case when Humidity is null then 1 else 0 end) Humidity, SUM(case when Wind_speed_m_s is null then 1 else 0 end) Wind_speed_m_s,
    SUM(case when Visibility_10m is null then 1 else 0 end) Visibility_10m, SUM(case when Rainfall_mm is null then 1 else 0 end) Rainfall_mm,
    SUM(case when Snowfall_cm is null then 1 else 0 end) Snowfall_cm
FROM UsageStats
------------------------------------------ [MINIMUM] --------------------------------------------
UNION /* A series of UNIONs stacks each statistic as one row, giving a matrix-style result. */
SELECT '2- Minimum',
    MIN(Rented_Bike_Count), MIN(Temperature_C), MIN(Humidity), MIN(Wind_speed_m_s), MIN(Visibility_10m), MIN(Rainfall_mm), MIN(Snowfall_cm)
FROM UsageStats
------------------------------------------ [MEDIAN] ---------------------------------------------
UNION /* In T-SQL, PERCENTILE_DISC needs WITHIN GROUP (ORDER BY ...) plus OVER(); it repeats the value on every row and UNION removes the duplicates. */
SELECT '3- Median',
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Rented_Bike_Count)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Temperature_C)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Humidity)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Wind_speed_m_s)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Visibility_10m)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Rainfall_mm)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Snowfall_cm)OVER()
FROM UsageStats
------------------------------------------ [MAXIMUM] ---------------------------------
UNION
SELECT '4- Maximum',
    MAX(Rented_Bike_Count), MAX(Temperature_C), MAX(Humidity), MAX(Wind_speed_m_s), MAX(Visibility_10m), MAX(Rainfall_mm), MAX(Snowfall_cm)
FROM UsageStats
------------------------------------------ [AVERAGE] --------------------------------------------
UNION
SELECT '5- Average',
    AVG(Rented_Bike_Count), AVG(Temperature_C), AVG(Humidity), AVG(Wind_speed_m_s), AVG(Visibility_10m), AVG(Rainfall_mm), AVG(Snowfall_cm)
FROM UsageStats
------------------------------------------ [STANDARD DEVIATION] ---------------------------------
UNION
SELECT '6- Standard Deviation',
    STDEV(Rented_Bike_Count), STDEV(Temperature_C), STDEV(Humidity), STDEV(Wind_speed_m_s), STDEV(Visibility_10m), STDEV(Rainfall_mm),
    STDEV(Snowfall_cm)
FROM UsageStats
ORDER BY [Statistics]

Statistics,Rented_Bike_Count,Temperature_C,Humidity,Wind_speed_m_s,Visibility_10m,Rainfall_mm,Snowfall_cm
1- Null Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2- Minimum,0.0,-17.799999237060547,0.0,0.0,27.0,0.0,0.0
3- Median,504.0,13.699999809265137,57.0,1.5,1698.0,0.0,0.0
4- Maximum,3556.0,39.400001525878906,98.0,7.400000095367432,2000.0,35.0,8.800000190734863
5- Average,704.0,12.882922378630637,58.0,1.7249086756965135,1436.0,0.1486872147666673,0.0750684932860881
6- Standard Deviation,644.9974677392156,11.94482523726802,20.36241330156561,1.0362999914838886,608.298711984019,1.1281929694562725,0.4367461821932454


**COMMENT ON RESULTS:** 

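- There are no null values in the metrics, so every observation can be used when investigating patterns.
- The mean and median are close to each other for temperature, humidity and wind speed, suggesting that these are more symmetrically distributed. The exceptions are `Rented_Bike_Count` (median 504, mean 704) and `Visibility_10m` (median 1698, mean 1436), which look skewed.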
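
One caveat about the Average row: in T-SQL, `AVG` over an integer column returns an integer, so any fractional part is dropped. That is probably why `Rented_Bike_Count` (smallint) and `Humidity` (tinyint) show whole-number averages. A minimal sketch of the effect, using made-up values rather than rows from `UsageStats`:

```sql
-- Toy values for illustration only; they are not taken from UsageStats.
WITH Toy AS (
    SELECT CAST(v.n AS smallint) AS Rented_Bike_Count
    FROM (VALUES (1), (2)) AS v(n)
)
SELECT
    AVG(Rented_Bike_Count)                AS IntegerAverage, -- integer result: 1
    AVG(CAST(Rented_Bike_Count AS float)) AS FloatAverage    -- float result: 1.5
FROM Toy;
```

Casting inside `AVG` keeps the decimals, at the cost of slightly changing the figures reported in the matrix.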

**\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_**

# **\[TIME SERIES DECOMPOSITION\]**

### **WHAT WERE THE BUSIEST AND QUIETEST MONTHS?**

Starting simple, let's first identify the three busiest and the three quietest months by total rentals.

In [70]:
WITH cte AS (
    SELECT
        DATENAME(MONTH, date) as MonthName,
        SUM(Rented_Bike_Count) as MonthTotal,
        RANK()OVER(ORDER BY SUM(Rented_Bike_Count) DESC) AS Rank 
    FROM UsageStats
    GROUP BY DATENAME(MONTH, date)
            )

SELECT MonthName, MonthTotal
FROM cte
WHERE Rank < 4 OR Rank > 9
ORDER BY MonthTotal DESC

MonthName,MonthTotal
June,896887
July,734460
May,707088
December,185330
February,151833
January,150006


### **HOW DO WEEKS PERFORM THROUGHOUT THE YEAR DEPENDING ON EACH SEASON?**

As the results of the last query show, all three winter months ranked as the quietest, while two summer months and the month preceding them ranked as the busiest. This suggests that demand may differ between seasons.

Therefore, I now retrieve the average rental activity for each day of the week and split it into the four seasons, to observe how it changes throughout the recorded year. With this, I hope to start forming hypotheses about what customers use the bikes for.

In [4]:
SELECT y.DayName, Summer, Autumn, Winter, Spring,
       AVG(TotalofDay) AS 'Year'
FROM
    (SELECT
        DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayName,
        SUM(Rented_Bike_Count) AS TotalofDay
     FROM UsageStats GROUP BY date) y
------------------------------------------ [SUMMER] -----------------------------------------
     JOIN (SELECT DayName, AVG(TotalofDay) AS 'Summer' /* Average fetched by day not hour */
           FROM
               (SELECT /* Subquery to fetch the total count per day instead of per hour */
                      DATEPART(WEEKDAY,date) AS DayNumber,
                      DATENAME(WEEKDAY,date) AS DayName,
                      SUM(Rented_Bike_Count) AS TotalofDay
                FROM UsageStats WHERE seasons = 'Summer' GROUP BY date
               ) s1
           GROUP BY DayName, DayNumber
          ) s ON y.DayName = s.DayName
------------------------------------------ [AUTUMN] -----------------------------------------
     JOIN (SELECT DayName, AVG(TotalofDay) AS 'Autumn'
           FROM
                 (SELECT /* Same subquery process repeated to all other seasons */
                        DATEPART(WEEKDAY,date) AS DayNumber,
                        DATENAME(WEEKDAY,date) AS DayName,
                        SUM(Rented_Bike_Count) AS TotalofDay
                 FROM UsageStats WHERE seasons = 'Autumn' GROUP BY date
                ) a1
            GROUP BY DayName, DayNumber
           ) a ON y.DayName = a.DayName
------------------------------------------ [WINTER] -----------------------------------------
     JOIN (SELECT DayName, AVG(TotalofDay) AS 'Winter'
           FROM
               (SELECT
                      DATEPART(WEEKDAY,date) AS DayNumber,
                      DATENAME(WEEKDAY,date) AS DayName,
                      SUM(Rented_Bike_Count) AS TotalofDay
                FROM UsageStats WHERE seasons = 'Winter' GROUP BY date
               ) w1
           GROUP BY DayName, DayNumber
          ) w ON y.DayName = w.DayName
------------------------------------------ [SPRING] -----------------------------------------
     JOIN (SELECT DayName, AVG(TotalofDay) AS 'Spring'
           FROM (SELECT
                       DATEPART(WEEKDAY,date) AS DayNumber,
                       DATENAME(WEEKDAY,date) AS DayName,
                       SUM(Rented_Bike_Count) AS TotalofDay
                 FROM UsageStats WHERE seasons = 'Spring' GROUP BY date
                ) sp1
            GROUP BY DayName, DayNumber
          ) sp ON y.DayName = sp.DayName
--------------------------------------- [END OF JOINS] --------------------------------------
GROUP BY y.DayName, Summer, Autumn, Winter, Spring, DayNumber
ORDER BY DayNumber

DayOfWeek,Summer,Autumn,Winter,Spring,Year
Sunday,22326,18159,4079,15450,15003
Monday,22187,23011,5514,19421,17533
Tuesday,24329,16257,5908,19550,16511
Wednesday,26091,22434,5333,17214,17768
Thursday,25155,20099,5785,14590,16576
Friday,27968,17366,6285,19330,17930
Saturday,25424,20362,5013,17313,17028


I exported this view to Excel and applied _transpose_ and _conditional formatting_ to make the matrix of numbers easier to read.

![Image](https://i.ibb.co/tpw49ps/Weekly-Analysis.jpg)

**COMMENT ON RESULTS:** Interestingly, weekdays, led by Friday (the top performer over the year), were in higher demand than weekends most of the time across all seasons. Moreover, summer demand was, on average, almost 50% higher than the 12-month average, whereas winter struggled with a drop of almost 70% on every day of the week against the same benchmark.
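
The four season joins in the query above could probably be collapsed into a single pass with conditional aggregation: a `CASE` inside `AVG` yields NULL for rows from other seasons, and `AVG` ignores NULLs. The sketch below assumes the same column names as `UsageStats` but runs on a handful of made-up rows, so its numbers mean nothing.

```sql
-- Made-up rows standing in for UsageStats (illustration only).
WITH Toy AS (
    SELECT CAST(v.d AS date) AS [Date], v.Seasons, v.Rented_Bike_Count
    FROM (VALUES
        ('2018-01-05', 'Winter', 120), ('2018-01-05', 'Winter', 80),
        ('2018-01-12', 'Winter', 150),
        ('2018-07-06', 'Summer', 900), ('2018-07-06', 'Summer', 700),
        ('2018-07-07', 'Summer', 500)
    ) AS v(d, Seasons, Rented_Bike_Count)
),
Daily AS (  -- one row per day, like the per-day subqueries in the original
    SELECT [Date], Seasons, SUM(Rented_Bike_Count) AS TotalOfDay
    FROM Toy
    GROUP BY [Date], Seasons
)
SELECT
    DATENAME(WEEKDAY, [Date]) AS DayName,
    AVG(CASE WHEN Seasons = 'Summer' THEN TotalOfDay END) AS Summer,
    AVG(CASE WHEN Seasons = 'Winter' THEN TotalOfDay END) AS Winter,
    AVG(TotalOfDay) AS [Year]
FROM Daily
GROUP BY DATENAME(WEEKDAY, [Date]), DATEPART(WEEKDAY, [Date])
ORDER BY DATEPART(WEEKDAY, [Date]);
```

Autumn and Spring columns would follow the same pattern. One difference from the inner joins: a weekday with no rows for a season comes back with NULL in that column instead of being dropped.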

### **WHAT ABOUT THE DISTRIBUTION THROUGHOUT A WHOLE DAY?**

Let's now check how these activity levels behaved throughout the day, assuming that people might prefer using the bikes when there is daylight and when places around Seoul are open to visit (business hours).

In [12]:
------------------ [YEAR HOURLY AVERAGE]-------------------
SELECT
    y.hour, Summer, Autumn, Winter, Spring,
    AVG(Rented_Bike_Count) AS 'Year'
FROM UsageStats y
------------------------- [SUMMER]------------------------
    JOIN (SELECT hour, AVG(Rented_Bike_Count) AS 'Summer'
          FROM UsageStats
          WHERE seasons = 'Summer' GROUP BY hour
         ) s ON y.hour = s.hour
------------------------- [AUTUMN]------------------------
    JOIN (SELECT hour, AVG(Rented_Bike_Count) AS 'Autumn'
          FROM UsageStats
          WHERE seasons = 'Autumn' GROUP BY hour
         ) a ON y.hour = a.hour
------------------------- [WINTER]------------------------
    JOIN (SELECT hour, AVG(Rented_Bike_Count) AS 'Winter'
          FROM UsageStats
          WHERE seasons = 'Winter' GROUP BY hour
         ) w ON y.hour = w.hour
------------------------- [SPRING]------------------------
    JOIN (SELECT hour, AVG(Rented_Bike_Count) AS 'Spring'
          FROM UsageStats
          WHERE seasons = 'Spring' GROUP BY hour
         ) sp ON y.hour = sp.hour
--------------------- [END OF JOINS]----------------------
GROUP BY y.hour, Summer, Autumn, Winter, Spring
ORDER BY y.hour

hour,Summer,Autumn,Winter,Spring,Year
0,899,623,165,470,541
1,698,485,159,356,426
2,505,331,117,247,301
3,342,225,77,164,203
4,223,148,50,105,132
5,245,143,51,113,139
6,485,316,92,251,287
7,902,702,209,601,606
8,1418,1197,422,1013,1015
9,911,755,254,655,645


![Image](https://i.ibb.co/rmz7WZS/Hourly-Analysis.jpg)

**COMMENT ON RESULTS:**

- Demand consistently above the average appears to start at around 4pm and lasts until about 8pm in winter and about 10pm in summer.
- The count decreases significantly after 11pm, hitting the bottom between 4am and 5am.
- Spikes in the number of rented bikes can be seen close to both the sunrise (≈7:40am) and sunset (≈5:15pm) [times](https://www.timeanddate.com/sun/south-korea/seoul?month=12&year=2017) in Seoul.
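
For the morning, afternoon and evening comparison set out in the introduction, the hours could be grouped into buckets before averaging. The boundaries below (6 to 11, 12 to 17, 18 to 23, and everything else as night) are an assumption made for illustration, not something defined by the dataset, and the rows are made up.

```sql
-- Made-up rows; the bucket boundaries are an assumption for illustration.
WITH Toy AS (
    SELECT v.[Hour], v.Seasons, v.Rented_Bike_Count
    FROM (VALUES
        (8,  'Summer', 1400), (14, 'Summer', 900), (19, 'Summer', 1600),
        (8,  'Winter', 400),  (14, 'Winter', 250), (19, 'Winter', 300)
    ) AS v([Hour], Seasons, Rented_Bike_Count)
),
Bucketed AS (
    SELECT Seasons, Rented_Bike_Count,
           CASE
               WHEN [Hour] BETWEEN 6  AND 11 THEN '1- Morning'
               WHEN [Hour] BETWEEN 12 AND 17 THEN '2- Afternoon'
               WHEN [Hour] BETWEEN 18 AND 23 THEN '3- Evening'
               ELSE '4- Night'
           END AS TimeOfDay
    FROM Toy
)
SELECT Seasons, TimeOfDay,
       AVG(CAST(Rented_Bike_Count AS float)) AS AvgHourlyRentals -- cast avoids integer truncation
FROM Bucketed
GROUP BY Seasons, TimeOfDay
ORDER BY Seasons, TimeOfDay;
```

The numeric prefixes in the labels follow the same trick as the summary statistics matrix: they make a plain `ORDER BY` return the buckets in chronological order.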

In [56]:
SELECT
    Seasons, Functioning_Day, Holiday,
    SUM(Rented_Bike_Count) AS 'Total of Bikes Rented'
FROM UsageStats 
GROUP BY Seasons, Functioning_Day, Holiday
ORDER BY [Total of Bikes Rented] DESC

Seasons,Functioning_Day,Holiday,Total of Bikes Rented
Summer,Yes,No Holiday,2234171
Autumn,Yes,No Holiday,1698984
Spring,Yes,No Holiday,1566167
Winter,Yes,No Holiday,457097
Autumn,Yes,Holiday,91018
Summer,Yes,Holiday,49063
Spring,Yes,Holiday,45742
Winter,Yes,Holiday,30072
Spring,No,No Holiday,0
Autumn,No,No Holiday,0


**\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_**

# **\[ANALYSING CORRELATION OF KEY METRICS\]**

**CATEGORISING ENVIRONMENTAL INFLUENCERS**

With so many dimensions and spread-out data points, it is difficult to judge the point at which people feel motivated to rent a bike, or are driven away from renting one, to cycle around Seoul. So we may ask: do these cyclists prefer humid air, or do they dislike getting sweaty? Does summer feel too hot, or do they not mind it? How strong does the wind have to be to stop people hopping on a bike? Let's clear up how weather and climate influence these behaviours; we may then find some leads for further investigation.

But first, because each dimension has its own range of continuous values, and each end of that range represents a different human sensation or real-world outcome, I decided to categorise them all into three levels of intensity. _Intensity 0_ represents mostly comfortable or calm conditions, and _Intensity 2_ represents challenging conditions for cyclists, such as more extreme weather. A legend placed below the query output helps with interpreting the results. The ranges chosen for each intensity level are based on external resources from government websites, scientific papers and blogs, which are linked in the legend headers.

Then, after grouping the data into bins, I retrieve the _<u>average hourly count of rented bikes</u>_ for each _Intensity_ level in the query output.

In [41]:
-- [CREATING CTEs FOR KEY METRICS AND BINNING THEM INTO THE SAME CATEGORY NAMES] --
/* Plain comparisons leave no gaps between bins (e.g. a value between -0.001 and 0 no longer falls through to 'Intensity 0'). */
WITH t AS (SELECT *, CASE
                WHEN Temperature_C < 0 THEN 'Intensity 2'     /* below 0 */
                WHEN Temperature_C <= 20 THEN 'Intensity 1'   /* 0 to 20 */
                ELSE 'Intensity 0' END AS Temperature         /* above 20 */
           FROM UsageStats),
     h AS (SELECT *, CASE
                WHEN Humidity > 70 THEN 'Intensity 2'         /* above 70 */
                WHEN Humidity > 30 THEN 'Intensity 1'         /* above 30, up to 70 */
                ELSE 'Intensity 0' END AS HumidWeather        /* 30 or less */
           FROM UsageStats),
     w AS (SELECT *, CASE
                WHEN Wind_speed_m_s > 5 THEN 'Intensity 2'    /* above 5 */
                WHEN Wind_speed_m_s > 2.5 THEN 'Intensity 1'  /* above 2.5, up to 5 */
                ELSE 'Intensity 0' END AS WindSpeed           /* 2.5 or less */
           FROM UsageStats),
     r AS (SELECT *, CASE
                WHEN Rainfall_mm > 10 THEN 'Intensity 2'      /* above 10 */
                WHEN Rainfall_mm > 2 THEN 'Intensity 1'       /* above 2, up to 10 */
                ELSE 'Intensity 0' END AS Rainfall            /* 2 or less */
           FROM UsageStats),
     s AS (SELECT *, CASE
                WHEN Snowfall_cm > 5 THEN 'Intensity 2'       /* above 5 */
                WHEN Snowfall_cm >= 2.5 THEN 'Intensity 1'    /* 2.5 to 5 */
                ELSE 'Intensity 0' END AS Snowfall            /* below 2.5 */
           FROM UsageStats)

------ [FETCHING THE CATEGORIES AND ADDING THE DERIVED TABLES UNDERNEATH AS EXTRA COLUMNS] -------
SELECT 
    Temperature AS 'Intensity Level', AVG(Rented_Bike_Count) 'Temperature',
    h1.Avg4Humid AS 'Humidity', w1.Avg4Wind AS 'Wind_Speed',
    r1.Avg4Rain AS 'Rainfall',  s1.Avg4Snow AS 'Snowfall'
FROM t

----------- [JOINING THE COMMON TABLE EXPRESSIONS AND FETCHING AVERAGE HOURLY COUNT OF RENTALS FOR EACH DIMENSION] -------------
    JOIN (SELECT HumidWeather, AVG(Rented_Bike_Count) Avg4Humid FROM h GROUP BY HumidWeather) h1 ON t.Temperature = h1.HumidWeather
    JOIN (SELECT WindSpeed, AVG(Rented_Bike_Count) Avg4Wind FROM w GROUP BY WindSpeed) w1        ON t.Temperature = w1.WindSpeed
    JOIN (SELECT Rainfall, AVG(Rented_Bike_Count) Avg4Rain FROM r GROUP BY Rainfall) r1          ON t.Temperature = r1.Rainfall
    JOIN (SELECT Snowfall, AVG(Rented_Bike_Count) Avg4Snow FROM s GROUP BY Snowfall) s1          ON t.Temperature = s1.Snowfall

---[ENDING QUERY]---
GROUP BY Temperature, h1.Avg4Humid, w1.Avg4Wind, r1.Avg4Rain, s1.Avg4Snow

Intensity Level,Temperature,Humidity,Wind_Speed,Rainfall,Snowfall
Intensity 0,1117,708,682,715,710
Intensity 1,598,805,794,86,159
Intensity 2,197,495,591,85,111


**OUTPUT LEGEND:**

| **Level** | [**Temperature**](https://thinkmetric.uk/basics/temperature) | [**Humidity**](https://yourairexperts.com/blog/how-dry-air-affects-home/#:~:text=The%2520ideal%2520relative%2520humidity%2520level,of%2520moisture%2520as%2520warm%2520air.) | [**Wind Speed**](https://www.weather.gov/pqr/wind) | [**Rainfall**](https://water.usgs.gov/edu/activity-howmuchrain-metric.html#:~:text=Slight%20rain%3A%20Less%20than%200.5,than%208%20mm%20per%20hour.) | [**Snowfall**](https://www.sciencedirect.com/science/article/pii/S1470160X20311882#:~:text=A%2520day%2520is%2520remarked%2520as,if%2520higher%2520than%252010%2520mm.) |
|---|---|---|---|---|---|
| Intensity 0 | Warm to Hot (above 20 °C) | Dry Air (30% or less) | Calmness to Light Air (2.5 m/s or less) | Slight Shower (2 mm or less) | Light Snow (below 2.5 cm) |
| Intensity 1 | Cool to Chilly (0 to 20 °C) | Normal (above 30%, up to 70%) | Light to Gentle Breeze (above 2.5, up to 5 m/s) | Moderate Shower (above 2, up to 10 mm) | Medium Snow (2.5 to 5 cm) |
| Intensity 2 | Freezing Cold (below 0 °C) | Humid Air (above 70%) | Moderate to Strong Breeze (above 5 m/s) | Heavy Shower (above 10 mm) | Heavy Snow (above 5 cm) |

**COMMENT ON RESULTS:** 

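- It is important to note that the dataset doesn't contain any recorded events with wind speed over 7.4 m/s (the maximum in the summary statistics). Which could be explained by Seoul having its windiest month
- Biggest influencers: rainfall, snowfall and temperature. Average hourly rentals fall from 715 to 86 once rainfall reaches Intensity 1, from 710 to 111 at the heaviest snowfall level, and from 1117 to 197 when the temperature drops below freezing. Humidity and wind speed move the average far less.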
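
The binned averages show the direction of each effect but not its strength. One way to quantify it, sketched here under the assumption that a linear relationship is a reasonable first approximation, is Pearson's correlation coefficient built from plain aggregates: the population covariance divided by the product of the two population standard deviations. The rows are made up; on the real table the same expression would be pointed at `UsageStats`.

```sql
-- Made-up, perfectly linear rows, so r should come out at (or very near) 1.
WITH Toy AS (
    SELECT CAST(v.t AS float) AS Temperature_C,
           CAST(v.n AS float) AS Rented_Bike_Count
    FROM (VALUES (1, 2), (2, 4), (3, 6), (4, 8)) AS v(t, n)
)
SELECT
    (AVG(Temperature_C * Rented_Bike_Count)
        - AVG(Temperature_C) * AVG(Rented_Bike_Count))             -- population covariance
    / NULLIF(STDEVP(Temperature_C) * STDEVP(Rented_Bike_Count), 0) -- NULL if a column is constant
    AS PearsonR
FROM Toy;
```

Repeating the expression for humidity, wind speed, rainfall and snowfall would give one coefficient per metric, which could then be ranked by absolute value.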

**\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_**

# **\[CONCLUSIONS\]**

- The patterns observed in the time series analysis suggest that bike sharing in Seoul's winter may be favoured more by commuters than by holidaymakers or other casual travellers.
- In a nutshell, weekdays on average performed better than weekends in every season, with Summer and Autumn being the seasons in highest demand among cyclists.

In [56]:
SELECT
    AVG(Rented_Bike_Count)
FROM UsageStats


(No column name)
704
