# **Exploratory and Time Series Analysis in SQL**

![Intro Image](.\Materials\IntroImage2.jpg)

An important part

**USAGE STATS**

| Column | Type | Meaning |
|---|---|---|
| Date | date | The date of the recorded observation, ranging from 2017-12-01 to 2018-12-01. |
| Rented_Bike_Count | smallint | The count of bikes rented within the hour that starts at `Hour` (the 60 minutes that follow it). |
| Hour | tinyint | The hour of the day at which counting starts, from 0 to 23. |
| Temperature_C | float | The temperature in degrees Celsius, where 0° is the freezing point of water and 100° its boiling point. |
| Humidity | tinyint | Amount of water vapor in the air, as a percentage. |
| Wind_speed_m_s | float | The strength of the natural wind affecting the environment, measured in meters per second (m/s). |
| Visibility_10m | smallint | Greatest distance at which a black object can be recognised when observed against a bright background. |
| Rainfall_mm | float | Total rainfall depth during a given hour, expressed in millimeters (mm). |
| Snowfall_cm | float | The depth of snow, measured in centimeters (cm). |
| Seasons | varchar(50) | The divisions of the year: Summer, Autumn, Winter and Spring. |
| Holiday | varchar(50) | [Description in progress] |
| Functioning_Day | varchar(50) | A day on which one usually works. |

# **BASIC EXPLORATION**

### **DATASET OVERVIEW**

In [81]:
SELECT TOP 4 * FROM UsageStats

Date,Rented_Bike_Count,Hour,Temperature_C,Humidity,Wind_speed_m_s,Visibility_10m,Rainfall_mm,Snowfall_cm,Seasons,Holiday,Functioning_Day
2017-12-01,254,0,-5.199999809265137,37,2.200000047683716,2000,0,0,Winter,No Holiday,Yes
2017-12-01,204,1,-5.5,38,0.800000011920929,2000,0,0,Winter,No Holiday,Yes
2017-12-01,173,2,-6.0,39,1.0,2000,0,0,Winter,No Holiday,Yes
2017-12-01,107,3,-6.199999809265137,40,0.8999999761581421,2000,0,0,Winter,No Holiday,Yes


### **SUMMARY STATISTICS**

As far as I know, there is no single built-in T-SQL function that summarises all (or selected) columns at once, so I improvised a query that generates the key statistics (plus a few customisable extras) for every numeric column in one result set.

In [1]:

------------------------------------------ [COUNT OF NULL VALUES]--------------------------------
/* The CASE statement works around the lack of a built-in function to count null values. */
SELECT '1- Null Values' 'Statistics',
    SUM(case when Rented_Bike_Count is null then 1 else 0 end) Rented_Bike_Count, SUM(case when Temperature_C is null then 1 else 0 end) Temperature_C,
    SUM(case when Humidity is null then 1 else 0 end) Humidity, SUM(case when Wind_speed_m_s is null then 1 else 0 end) Wind_speed_m_s,
    SUM(case when Visibility_10m is null then 1 else 0 end) Visibility_10m, SUM(case when Rainfall_mm is null then 1 else 0 end) Rainfall_mm,
    SUM(case when Snowfall_cm is null then 1 else 0 end) Snowfall_cm
FROM UsageStats
------------------------------------------ [MINIMUM] --------------------------------------------
UNION /* I had to use a series of UNION clauses to arrange all key stats in a Matrix style. */
SELECT '2- Minimum',
    MIN(Rented_Bike_Count), MIN(Temperature_C), MIN(Humidity), MIN(Wind_speed_m_s), MIN(Visibility_10m), MIN(Rainfall_mm), MIN(Snowfall_cm)
FROM UsageStats
------------------------------------------ [MEDIAN] ---------------------------------------------
UNION /* The functions and syntax below follow the MS documentation to achieve the intended result; OVER() repeats the median on every row and UNION collapses the duplicates. */
SELECT '3- Median',
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Rented_Bike_Count)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Temperature_C)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Humidity)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Wind_speed_m_s)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Visibility_10m)OVER(), PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Rainfall_mm)OVER(),
    PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY Snowfall_cm)OVER()
FROM UsageStats
------------------------------------------ [MAXIMUM DATA POINT] ---------------------------------
UNION
SELECT '4- Maximum',
    MAX(Rented_Bike_Count), MAX(Temperature_C), MAX(Humidity), MAX(Wind_speed_m_s), MAX(Visibility_10m), MAX(Rainfall_mm), MAX(Snowfall_cm)
FROM UsageStats
------------------------------------------ [AVERAGE] --------------------------------------------
UNION
SELECT '5- Average',
    AVG(Rented_Bike_Count), AVG(Temperature_C), AVG(Humidity), AVG(Wind_speed_m_s), AVG(Visibility_10m), AVG(Rainfall_mm), AVG(Snowfall_cm)
FROM UsageStats
------------------------------------------ [STANDARD DEVIATION] ---------------------------------
UNION
SELECT '6- Standard Deviation',
    STDEV(Rented_Bike_Count), STDEV(Temperature_C), STDEV(Humidity), STDEV(Wind_speed_m_s), STDEV(Visibility_10m), STDEV(Rainfall_mm),
    STDEV(Snowfall_cm)
FROM UsageStats
ORDER BY [Statistics]

Statistics,Rented_Bike_Count,Temperature_C,Humidity,Wind_speed_m_s,Visibility_10m,Rainfall_mm,Snowfall_cm
1- Null Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2- Minimum,0.0,-17.799999237060547,0.0,0.0,27.0,0.0,0.0
3- Median,504.0,13.699999809265137,57.0,1.5,1698.0,0.0,0.0
4- Maximum,3556.0,39.400001525878906,98.0,7.400000095367432,2000.0,35.0,8.800000190734863
5- Average,704.0,12.882922378630637,58.0,1.7249086756965135,1436.0,0.1486872147666673,0.0750684932860881
6- Standard Deviation,644.9974677392156,11.94482523726802,20.36241330156561,1.0362999914838886,608.298711984019,1.1281929694562725,0.4367461821932454
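
Note that the averages for the integer columns in the output above (704, 58 and 1436) are whole numbers. In T-SQL, `AVG` over an integer column returns an integer, so the fractional part is discarded. The following is a minimal sketch of one way to keep the decimals, assuming the column types listed under USAGE STATS; the output aliases are made up for illustration. The last column shows `COUNT(*) - COUNT(column)` as an alternative way of counting nulls.

```sql
-- Sketch only: cast integer columns to float before averaging so that
-- AVG does not return a truncated integer. Aliases are illustrative.
SELECT
    AVG(CAST(Rented_Bike_Count AS float)) AS Avg_Rented_Bike_Count,
    AVG(CAST(Humidity AS float))          AS Avg_Humidity,
    AVG(CAST(Visibility_10m AS float))    AS Avg_Visibility_10m,
    -- COUNT(column) skips NULLs, so the difference is the NULL count.
    COUNT(*) - COUNT(Rented_Bike_Count)   AS Null_Rented_Bike_Count
FROM UsageStats;
```

The same `CAST` could be applied inside the '5- Average' branch of the summary query if fractional averages are wanted there.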


**COMMENT ON RESULTS:** 

-

# **TIME SERIES ANALYSIS**

## **HOW DO WEEKS PERFORM IN EACH SEASON?**  

### **BIKE RENTAL DEMAND THROUGHOUT THE WEEK SEGREGATED BY SEASONS**

In [93]:
-- DISTRIBUTION OF RENTAL ACTIVITY ACROSS A WEEK

SELECT y.DayOfWeek, AVG(TotalofDay) AS 'Year', Summer, Autumn, Winter, Spring
FROM
    (SELECT
        DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayOfWeek,
        SUM(Rented_Bike_Count) AS TotalofDay
     FROM UsageStats GROUP BY date) y

JOIN (SELECT DayOfWeek, AVG(TotalofDay) AS 'Summer'
      FROM (SELECT
                 DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayOfWeek,
                 SUM(Rented_Bike_Count) AS TotalofDay
           FROM UsageStats WHERE seasons = 'Summer' GROUP BY date) stemp
      GROUP BY DayOfWeek, DayNumber
) s ON y.DayOfWeek = s.DayOfWeek

JOIN (SELECT DayOfWeek, AVG(TotalofDay) AS 'Autumn'
      FROM (SELECT
                 DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayOfWeek,
                 SUM(Rented_Bike_Count) AS TotalofDay
           FROM UsageStats WHERE seasons = 'Autumn' GROUP BY date) atemp
      GROUP BY DayOfWeek, DayNumber
) a ON y.DayOfWeek = a.DayOfWeek

JOIN (SELECT DayOfWeek, AVG(TotalofDay) AS 'Winter'
      FROM (SELECT
                 DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayOfWeek,
                 SUM(Rented_Bike_Count) AS TotalofDay
           FROM UsageStats WHERE seasons = 'Winter' GROUP BY date) wtemp
      GROUP BY DayOfWeek, DayNumber
) w ON y.DayOfWeek = w.DayOfWeek

JOIN (SELECT DayOfWeek, AVG(TotalofDay) AS 'Spring'
      FROM (SELECT
                 DATEPART(WEEKDAY,date) AS DayNumber, DATENAME(WEEKDAY,date) AS DayOfWeek,
                 SUM(Rented_Bike_Count) AS TotalofDay
           FROM UsageStats WHERE seasons = 'Spring' GROUP BY date) sptemp
      GROUP BY DayOfWeek, DayNumber
) sp ON y.DayOfWeek = sp.DayOfWeek

GROUP BY y.DayOfWeek, Summer, Autumn, Winter, Spring, DayNumber ORDER BY DayNumber

DayOfWeek,Year,Summer,Autumn,Winter,Spring
Sunday,15003,22326,18159,4079,15450
Monday,17533,22187,23011,5514,19421
Tuesday,16511,24329,16257,5908,19550
Wednesday,17768,26091,22434,5333,17214
Thursday,16576,25155,20099,5785,14590
Friday,17930,27968,17366,6285,19330
Saturday,17028,25424,20362,5013,17313


**COMMENT ON RESULTS:** 

- Summer demand was roughly 50% above the yearly average on most days of the week (Mondays, at about 27% above, are the exception).
- Winter, in contrast, saw demand drop by about 70% on every day of the week when compared with the yearly average.
- Fridays were, as expected, the busiest day over the whole year and in Summer and Winter. Autumn saw Mondays steal the spotlight, and in Spring Tuesdays and Mondays edged slightly ahead of Fridays.
- In a nutshell, weekdays generally performed better than weekends, with Sundays being the weakest day in the yearly average.
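
The five-way join above could probably be condensed with conditional aggregation. The sketch below is not the query that produced the results shown; it assumes that each date belongs to exactly one season (hence `MIN(Seasons)`), and the CTE name `DailyTotals` is made up for illustration.

```sql
-- Sketch only: total rentals per date first, then one conditional
-- average per season. A CASE without ELSE yields NULL, and AVG ignores
-- NULLs, so each season column only averages its own days.
WITH DailyTotals AS (
    SELECT
        [Date],
        MIN(Seasons)           AS Season,      -- assumes one season per date
        SUM(Rented_Bike_Count) AS TotalOfDay
    FROM UsageStats
    GROUP BY [Date]
)
SELECT
    DATENAME(WEEKDAY, [Date]) AS DayOfWeek,
    AVG(TotalOfDay)                                      AS [Year],
    AVG(CASE WHEN Season = 'Summer' THEN TotalOfDay END) AS Summer,
    AVG(CASE WHEN Season = 'Autumn' THEN TotalOfDay END) AS Autumn,
    AVG(CASE WHEN Season = 'Winter' THEN TotalOfDay END) AS Winter,
    AVG(CASE WHEN Season = 'Spring' THEN TotalOfDay END) AS Spring
FROM DailyTotals
GROUP BY DATENAME(WEEKDAY, [Date]), DATEPART(WEEKDAY, [Date])
ORDER BY DATEPART(WEEKDAY, [Date]);
```

`DATEPART(WEEKDAY, ...)` depends on the session's `DATEFIRST` setting, so the row order may start on a different day than Sunday on other servers.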

In [71]:
-- AVERAGE HOURLY RENTAL COUNT PER DAY OF WEEK
SELECT
    DATEPART(WEEKDAY, date) AS DayNumber,
    DATENAME(WEEKDAY, date) AS DayOfWeek,
    AVG(Rented_Bike_Count) AS n
FROM UsageStats
GROUP BY DATEPART(WEEKDAY, date), DATENAME(WEEKDAY, date)

DayNumber,DayOfWeek,n
1,Sunday,625
4,Wednesday,740
2,Monday,730
6,Friday,747
3,Tuesday,687
5,Thursday,690
7,Saturday,709


**COMMENT ON RESULTS:** Surprisingly, Sundays are clearly the quietest day: their hourly average (625) sits about 11% below the overall hourly average of 704 (a daily total below the overall 16,907). Fridays prove to be the busiest days, with bike sharing trips about 6% above the same average. Saturdays (709) land close to the overall average, below Mondays, Wednesdays and Fridays but above Tuesdays and Thursdays.
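
The gap between each weekday and the overall average can also be computed in the query itself rather than by hand. This is a rough sketch under the same table and column names; the aliases `AvgPerHour` and `PctVsOverall` are made up for illustration.

```sql
-- Sketch only: hourly average per weekday and its percentage difference
-- from the overall hourly average. CAST avoids integer truncation.
SELECT
    DATENAME(WEEKDAY, [Date]) AS DayOfWeek,
    AVG(CAST(Rented_Bike_Count AS float)) AS AvgPerHour,
    100.0 * (
        AVG(CAST(Rented_Bike_Count AS float))
        / (SELECT AVG(CAST(Rented_Bike_Count AS float)) FROM UsageStats)
        - 1
    ) AS PctVsOverall          -- negative means below the overall average
FROM UsageStats
GROUP BY DATENAME(WEEKDAY, [Date]), DATEPART(WEEKDAY, [Date])
ORDER BY DATEPART(WEEKDAY, [Date]);
```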

In [55]:
-- DISTRIBUTION OF RENTAL ACTIVITY ACROSS THE DAY
SELECT 
    Hour,
    AVG(Rented_Bike_Count) AS 'Average Rent Count'
FROM UsageStats
GROUP BY Hour
ORDER BY Hour

Hour,Average Rent Count
0,541
1,426
2,301
3,203
4,132
5,139
6,287
7,606
8,1015
9,645


**COMMENT ON RESULTS:** Usage stays consistently above the average count (\>706) only between 1pm and 10pm, and it decreases significantly in the early hours of the morning (hitting the bottom between 2am and 6am). Spikes in the number of rented bikes can also be observed around both the December sunrise (≈7:40am) and sunset (≈5:15pm) [times](https://www.timeanddate.com/sun/south-korea/seoul?month=12&year=2017) in Seoul.
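
To see at a glance which hours sit above the overall average, each hourly figure can be flagged in the query. A minimal sketch, assuming the same table; the CTE name `Hourly` and the `VsOverall` labels are made up for illustration.

```sql
-- Sketch only: compare each hour's average with the overall average.
WITH Hourly AS (
    SELECT
        [Hour],
        AVG(CAST(Rented_Bike_Count AS float)) AS AvgRentCount
    FROM UsageStats
    GROUP BY [Hour]
)
SELECT
    [Hour],
    AvgRentCount,
    CASE
        WHEN AvgRentCount > (SELECT AVG(CAST(Rented_Bike_Count AS float))
                             FROM UsageStats)
            THEN 'Above average'
        ELSE 'At or below average'
    END AS VsOverall
FROM Hourly
ORDER BY [Hour];
```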

In [57]:
-- DISTRIBUTION OF RENTAL ACTIVITY ACROSS THE SEASONS
SELECT 
    seasons,
    AVG(Rented_Bike_Count) AS 'Average Rent Count'
FROM UsageStats
GROUP BY seasons

seasons,Average Rent Count
Winter,225
Summer,1034
Spring,730
Autumn,819


# **ANALYSING KEY METRICS**

**CATEGORISING ENVIRONMENTAL INFLUENCERS**

Let's now clear up any doubts we might have about how weather and climate can influence cyclists' behaviour.

- Ignore dimensions with very low usability.

| Level | [Temperature](https://thinkmetric.uk/basics/temperature) | Humidity | [Wind Speed](https://www.weather.gov/pqr/wind) | Visibility |
|---|---|---|---|---|
| Intensity 0 | Mildly Cold | Humid Air | Calm to Light Air | |
| Intensity 1 | Very Chilly | Average | Light to Gentle Breeze | |
| Intensity 2 | Mostly Unbearable | Dry Air | Moderate to Strong Breeze | |
| Intensity 3 | Mostly Unbearable | Dry Air | Moderate to Strong Breeze | |

In [47]:
-- [WORK IN PROGRESS - CATEGORY MATRIX OF ALL ENVIRONMENT DIMENSIONS]
WITH t AS (SELECT *, CASE
                WHEN Temperature_C BETWEEN 0 AND 9 THEN 'Intensity 0'
                WHEN Temperature_C BETWEEN -8.000 AND -0.001 THEN 'Intensity 1'
                ELSE 'Intensity 2' END AS Temperature
           FROM UsageStats),
     h AS (SELECT *, CASE
                WHEN Humidity BETWEEN 70 AND 100 THEN 'Intensity 2'
                WHEN Humidity BETWEEN 40 AND 69.99 THEN 'Intensity 1'
                ELSE 'Intensity 0' END AS HumidWeather
           FROM UsageStats),
     w AS (SELECT *, CASE
                WHEN Wind_speed_m_s BETWEEN 4.150 AND 7.000 THEN 'Intensity 2'
                WHEN Wind_speed_m_s BETWEEN 1.700 AND 4.149 THEN 'Intensity 1'
                ELSE 'Intensity 0' END AS WindSpeed
           FROM UsageStats)

/* With all dimensions grouped into bins, I can now start arranging the
   average hourly rental counts of each of them into a single table */

SELECT
    Temperature AS 'Intensity/AvgRentals',
    AVG(Rented_Bike_Count) Temperature,
    h1.Avg4Humid AS HumidWeather,
    w1.Avg4Wind AS WindSpeed
FROM t
    JOIN (SELECT HumidWeather, AVG(Rented_Bike_Count) Avg4Humid
          FROM h GROUP BY HumidWeather) h1 ON t.Temperature = h1.HumidWeather
    JOIN (SELECT WindSpeed, AVG(Rented_Bike_Count) Avg4Wind
            FROM w GROUP BY WindSpeed) w1 ON t.Temperature = w1.WindSpeed

GROUP BY Temperature, h1.Avg4Humid, w1.Avg4Wind


Intensity/AvgRentals,Temperature,HumidWeather,WindSpeed
Intensity 0,275,254,231
Intensity 1,219,234,222
Intensity 2,157,173,238


**COMMENT ON RESULTS:** 

- It is important to note that the dataset doesn't contain any recorded events with wind speed over 7.4 m/s (the maximum shown in the summary statistics), which could be explained by Seoul having its windiest month
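
One caveat about the bins in the work-in-progress query: `BETWEEN` is inclusive on both ends, so a value that falls between two ranges (for example a wind speed between 4.149 and 4.150, or anything above 7.000) ends up in the `ELSE` branch. For wind speed that branch is 'Intensity 0', which means the strongest winds would be grouped with the calmest. A possible way around this is to use half-open thresholds, sketched below for wind speed only. The cut-offs are the ones from the query above; the `Observations` and `AvgRentals` aliases are made up for illustration.

```sql
-- Sketch only: half-open thresholds so that every wind speed lands in
-- exactly one bin. CASE stops at the first matching WHEN, so anything
-- at or above 4.15 m/s goes to 'Intensity 2'.
WITH w AS (
    SELECT
        Rented_Bike_Count,
        CASE
            WHEN Wind_speed_m_s < 1.7  THEN 'Intensity 0'
            WHEN Wind_speed_m_s < 4.15 THEN 'Intensity 1'
            ELSE 'Intensity 2'
        END AS WindSpeed
    FROM UsageStats
)
SELECT
    WindSpeed,
    COUNT(*) AS Observations,                       -- bin size, to judge reliability
    AVG(CAST(Rented_Bike_Count AS float)) AS AvgRentals
FROM w
GROUP BY WindSpeed
ORDER BY WindSpeed;
```

The same pattern would apply to the temperature bins, where the current `ELSE 'Intensity 2'` also catches every temperature above 9 °C.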

In [56]:
SELECT
    AVG(Rented_Bike_Count)
FROM UsageStats


(No column name)
704


**CONCLUSIONS**

- The patterns observed in the time series analysis suggest that bike sharing in Seoul's winter may be favoured more by commuters than holidaymakers or other casual travellers.
-