# Fifty SQL Exercises

> Functions

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

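## Exercise Guidelines

- At the start of every exercise set, the four learning databases are loaded into the environment.
- SQL can therefore refer to tables in any of the four learning databases without specifying the database.
- Write the SQL that produces the expected result between the two single-line comments that mark the start and the end of the SQL query.
- You can first write SQL that matches the expected result in SQLiteStudio or DBeaver on your own computer, then copy and paste it into the exercise.
- To run the tests, choose Kernel -> Restart & Run All -> Restart and Run All Cells from the menu at the top.
- You can run the tests after each exercise, or after finishing all of them.
- An exercise that stays idle for more than 10 minutes disconnects automatically; click the exercise link again to restart it.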
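
Tables can be named without a database prefix because SQLite looks up an unqualified table name in the connected database first and then in each attached database. A prefix is only needed when two databases hold a table with the same name. The following is a minimal sketch of that behavior, assuming two in-memory databases and the hypothetical tables `teams` and `countries` rather than the course's real files:

```python
import sqlite3

# Hypothetical stand-ins for the course databases: both live in memory.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("ATTACH ':memory:' AS extra")

# One table in the main database, one in the attached database.
demo_conn.execute("CREATE TABLE teams (name TEXT)")
demo_conn.execute("CREATE TABLE extra.countries (name TEXT)")
demo_conn.execute("INSERT INTO teams VALUES ('Lakers')")
demo_conn.execute("INSERT INTO extra.countries VALUES ('Taiwan')")

# The attached table can be queried with or without the database prefix.
unqualified = demo_conn.execute("SELECT name FROM countries").fetchall()
qualified = demo_conn.execute("SELECT name FROM extra.countries").fetchall()
print(unqualified, qualified)
demo_conn.close()
```

Both queries return the same single row, which is why the exercises can write `FROM time_series` instead of `FROM covid19.time_series`.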

In [2]:
import sqlite3
import unittest
import json
import os
import numpy as np
import pandas as pd
conn = sqlite3.connect('../databases/nba.db')
conn.execute("""ATTACH '../databases/covid19.db' AS covid19""")
conn.execute("""ATTACH '../databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH '../databases/imdb.db' AS imdb""")

<sqlite3.Cursor at 0x22924f47030>

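## 09. From the `players` table in the `nba` database, derive a calculated column `bmi` from `heightMeters` and `weightKilograms` using the formula below, and use the `ROUND` function to round `bmi` to 2 decimal places. Refer to the expected query result below.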

\begin{equation}
BMI = \frac{weight_{kg}}{height_{m}^2}
\end{equation}

- Expected input: a SQL query.
- Expected output: a query result of shape (484, 3).

```
     heightMeters  weightKilograms    bmi
0            2.06            113.4  26.72
1            2.01            108.0  26.73
2            2.03            106.6  25.87
3            2.08            120.2  27.78
4            1.98             97.5  24.87
..            ...              ...    ...
479          2.01            104.3  25.82
480          2.08            106.1  24.52
481          1.78             88.5  27.93
482          1.98             90.7  23.14
483          1.96             83.9  21.84

[484 rows x 3 columns]
```

In [3]:
calculate_rounded_bmi_from_players =\
"""
-- Start of SQL query
SELECT heightMeters,
       weightKilograms,
       ROUND(weightKilograms / heightMeters / heightMeters, 2) AS bmi
  FROM players
-- End of SQL query
"""
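
To see what `ROUND` changes, the sketch below computes the BMI with and without rounding. It assumes a hypothetical in-memory `players` table that holds only the first two rows shown in the expected result; the real table has 484 rows.

```python
import sqlite3

# Hypothetical two-row copy of the players table (values taken from the expected result).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE players (heightMeters REAL, weightKilograms REAL)")
demo_conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(2.06, 113.4), (2.01, 108.0)],
)

# Dividing by the height twice is the same as dividing by the height squared.
bmi_rows = demo_conn.execute("""
SELECT weightKilograms / heightMeters / heightMeters AS raw_bmi,
       ROUND(weightKilograms / heightMeters / heightMeters, 2) AS bmi
  FROM players
""").fetchall()
for raw_bmi, bmi in bmi_rows:
    print(raw_bmi, bmi)
demo_conn.close()
```

The second column keeps two decimal places, matching the `bmi` values 26.72 and 26.73 in the expected result.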

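## 10. From the `career_summaries` table in the `nba` database, derive the assist-to-turnover ratio from the `assists` and `turnovers` columns using the formula below, and use the `CAST` function so that the derived column has the floating-point type `REAL`. Refer to the expected query result below.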

\begin{equation}
\text{Assists Turnover Ratio} = \frac{Assists}{Turnovers}
\end{equation}

Note: `NaN` or `None` in an exercise's expected query result represents the missing value `NULL`.

- Expected input: a SQL query.
- Expected output: a query result of shape (484, 3).

```
     assists  turnovers  assist_turnover_ratio
0     9669.0     4576.0               2.112981
1     3327.0     2981.0               1.116068
2      729.0      804.0               0.906716
3     1615.0     3225.0               0.500775
4     4965.0     2180.0               2.277523
..       ...        ...                    ...
479      4.0        3.0               1.333333
480      0.0        1.0               0.000000
481    112.0       39.0               2.871795
482      1.0        0.0                    NaN
483     12.0        7.0               1.714286

[484 rows x 3 columns]
```

In [5]:
calculate_ast_to_ratio_from_career_summaries =\
"""
-- Start of SQL query
SELECT assists,
       turnovers,
       CAST(assists AS REAL) / turnovers AS assist_turnover_ratio
  FROM career_summaries
-- End of SQL query
"""
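
The `CAST` matters because SQLite performs integer division when both operands are integers, and it returns `NULL` when dividing by zero. The sketch below illustrates both effects, assuming a hypothetical in-memory `career_summaries` table with `INTEGER` columns and two rows borrowed from the expected result. The column types of the real table are an assumption here.

```python
import sqlite3

# Hypothetical two-row table; INTEGER column types are an assumption.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE career_summaries (assists INTEGER, turnovers INTEGER)")
demo_conn.executemany(
    "INSERT INTO career_summaries VALUES (?, ?)",
    [(4, 3), (1, 0)],
)

# Compare integer division with division after casting to REAL.
ratio_rows = demo_conn.execute("""
SELECT assists / turnovers AS integer_ratio,
       CAST(assists AS REAL) / turnovers AS real_ratio
  FROM career_summaries
""").fetchall()
print(ratio_rows)
demo_conn.close()
```

For the row with 4 assists and 3 turnovers, the integer ratio is truncated to 1 while the cast version keeps the fraction. For the row with 0 turnovers, both columns are `NULL`, which Python shows as `None` and pandas shows as `NaN`.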

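## 11. From the `time_series` table in the `covid19` database, use the `STRFTIME` function on the `Date` column to find the distinct months in the time series. Refer to the expected query result below.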

- Expected input: a SQL query.
- Expected output: a query result of shape (15, 1).

```
   distinct_year_month
0              2020-01
1              2020-02
2              2020-03
3              2020-04
4              2020-05
5              2020-06
6              2020-07
7              2020-08
8              2020-09
9              2020-10
10             2020-11
11             2020-12
12             2021-01
13             2021-02
14             2021-03
```

In [7]:
find_distinct_year_month_from_time_series =\
"""
-- Start of SQL query
SELECT DISTINCT STRFTIME('%Y-%m', Date) AS distinct_year_month
  FROM time_series
-- End of SQL query
"""
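
`STRFTIME('%Y-%m', ...)` reduces each date to its year and month, and `DISTINCT` then collapses the repeats. A small sketch, assuming a hypothetical in-memory `time_series` table with four ISO-formatted dates (the `ORDER BY` is added only to make the output order explicit):

```python
import sqlite3

# Hypothetical four-row table; the real table stores one row per country and date.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE time_series (Date TEXT)")
demo_conn.executemany(
    "INSERT INTO time_series VALUES (?)",
    [("2020-01-22",), ("2020-01-23",), ("2020-02-01",), ("2021-03-31",)],
)

# Format every date as YYYY-MM, then keep each month once.
month_rows = demo_conn.execute("""
SELECT DISTINCT STRFTIME('%Y-%m', Date) AS distinct_year_month
  FROM time_series
 ORDER BY distinct_year_month
""").fetchall()
months = [row[0] for row in month_rows]
print(months)
demo_conn.close()
```

The two January dates collapse into a single `2020-01` entry, so four input rows yield three months.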

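## 12. From the `presidential` table in the `twElection2020` database, use an aggregate function to total how many people voted in the presidential and vice-presidential election. Refer to the expected query result below.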

- Expected input: a SQL query.
- Expected output: a query result of shape (1, 1).

```
   total_presidential_votes
0                  14300940
```

In [9]:
summarize_total_votes_from_presidential =\
"""
-- Start of SQL query
SELECT SUM(votes) AS total_presidential_votes
  FROM presidential
-- End of SQL query
"""
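
An aggregate function such as `SUM` collapses all rows into a single value, which is why the expected shape is (1, 1). The sketch below contrasts `SUM` with `COUNT(*)` on a hypothetical in-memory `presidential` table. The vote counts are made up for illustration, and the `NULL` row is included to show that `SUM` skips missing values.

```python
import sqlite3

# Hypothetical three-row table with made-up vote counts, one of them missing.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE presidential (votes INTEGER)")
demo_conn.executemany(
    "INSERT INTO presidential VALUES (?)",
    [(100,), (250,), (None,)],
)

# SUM adds the non-NULL values; COUNT(*) counts every row.
total_votes, row_count = demo_conn.execute("""
SELECT SUM(votes) AS total_presidential_votes,
       COUNT(*) AS row_count
  FROM presidential
""").fetchone()
print(total_votes, row_count)
demo_conn.close()
```

Counting rows and summing a column answer different questions: the exercise asks for the number of votes cast, so `SUM(votes)` is the aggregate that fits.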

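## 13. From the `daily_report` table in the `covid19` database, use aggregate functions to total the worldwide confirmed cases, recoveries, and deaths as of 2021-03-31. Refer to the expected query result below.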

- Expected input: a SQL query.
- Expected output: a query result of shape (1, 3).

```
   total_confirmed  total_recovered  total_deaths
0        128822735         73070921       2815166
```

In [11]:
summarize_totals_from_daily_report =\
"""
-- Start of SQL query
SELECT SUM(Confirmed) AS total_confirmed,
       SUM(Recovered) AS total_recovered,
       SUM(Deaths) AS total_deaths
  FROM daily_report
-- End of SQL query
"""
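
Several aggregates can share one `SELECT`, and the result is still a single row with one column per aggregate. A sketch on a hypothetical in-memory `daily_report` table with made-up counts, reading the column aliases back from the cursor:

```python
import sqlite3

# Hypothetical two-row table; the numbers are made up for illustration.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE daily_report (Confirmed INTEGER, Recovered INTEGER, Deaths INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO daily_report VALUES (?, ?, ?)",
    [(10, 4, 1), (20, 6, 2)],
)

# Three aggregates in one query produce one row with three columns.
cursor = demo_conn.execute("""
SELECT SUM(Confirmed) AS total_confirmed,
       SUM(Recovered) AS total_recovered,
       SUM(Deaths) AS total_deaths
  FROM daily_report
""")
column_names = [column[0] for column in cursor.description]
totals = cursor.fetchone()
print(column_names, totals)
demo_conn.close()
```

The `AS` aliases become the column names of the result, which is how the expected output gets its `total_confirmed`, `total_recovered`, and `total_deaths` headers.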

## Run the tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

In [12]:
class TestFunctions(unittest.TestCase):
    def test_09_calculate_rounded_bmi_from_players(self):
        rounded_bmi_from_players = pd.read_sql(calculate_rounded_bmi_from_players, conn)
        self.assertEqual(rounded_bmi_from_players.shape, (484, 3))
        column_values = rounded_bmi_from_players.iloc[:, 2].values
        first_value = str(column_values[0])
        self.assertTrue(len(first_value) == 5)
    def test_10_calculate_ast_to_ratio_from_career_summaries(self):
        ast_to_ratio_from_career_summaries = pd.read_sql(calculate_ast_to_ratio_from_career_summaries, conn)
        self.assertEqual(ast_to_ratio_from_career_summaries.shape, (484, 3))
        variable_dtype = str(ast_to_ratio_from_career_summaries.iloc[:, 2].dtype)
        self.assertEqual(variable_dtype, 'float64')     
    def test_11_find_distinct_year_month_from_time_series(self):
        distinct_year_month_from_time_series = pd.read_sql(find_distinct_year_month_from_time_series, conn)
        self.assertEqual(distinct_year_month_from_time_series.shape, (15, 1))
        column_values = set(distinct_year_month_from_time_series.iloc[:, 0].values)
        self.assertTrue('2020-01' in column_values)
        self.assertTrue('2021-03' in column_values)
    def test_12_summarize_total_votes_from_presidential(self):
        total_votes_from_presidential = pd.read_sql(summarize_total_votes_from_presidential, conn)
        self.assertEqual(total_votes_from_presidential.shape, (1, 1))
        column_value = total_votes_from_presidential.iloc[0, 0]
        self.assertEqual(column_value, 14300940)
    def test_13_summarize_totals_from_daily_report(self):
        totals_from_daily_report = pd.read_sql(summarize_totals_from_daily_report, conn)
        self.assertEqual(totals_from_daily_report.shape, (1, 3))
        row_values = set(totals_from_daily_report.iloc[0, :].values)
        self.assertTrue(128822735 in row_values)
        self.assertTrue(73070921 in row_values)
        self.assertTrue(2815166 in row_values)
        
suite = unittest.TestLoader().loadTestsFromTestCase(TestFunctions)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)
cwd = os.getcwd()
print(cwd)
folder_name = cwd.split("\\")[-1]
print(folder_name)
with open("../exercise_index.json", "r",encoding="utf-8") as content:
    exercise_index = json.load(content)
chapter_name = exercise_index[folder_name]

test_09_calculate_rounded_bmi_from_players (__main__.TestFunctions) ... ok
test_10_calculate_ast_to_ratio_from_career_summaries (__main__.TestFunctions) ... ok
test_11_find_distinct_year_month_from_time_series (__main__.TestFunctions) ... ok
test_12_summarize_total_votes_from_presidential (__main__.TestFunctions) ... ok
test_13_summarize_totals_from_daily_report (__main__.TestFunctions) ... 

D:\coding\classroom-hahow-sqlfifty\05-functions
05-functions


ok

----------------------------------------------------------------------
Ran 5 tests in 0.320s

OK
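
The test cell looks up the chapter name by taking the last component of the working directory, and the separator to split on differs between operating systems. One separator-independent way to get that component is sketched below with `pathlib`. The Windows path is the one printed in the output above, while the POSIX path is a hypothetical example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Working directory as printed in the test output (Windows style).
windows_cwd = r"D:\coding\classroom-hahow-sqlfifty\05-functions"
# Hypothetical equivalent location on macOS or Linux.
posix_cwd = "/home/user/classroom-hahow-sqlfifty/05-functions"

# .name returns the final path component without manual splitting.
windows_folder = PureWindowsPath(windows_cwd).name
posix_folder = PurePosixPath(posix_cwd).name
print(windows_folder, posix_folder)
```

In a running notebook, `pathlib.Path.cwd().name` would apply the same idea to the actual working directory on whichever system it runs.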


In [13]:
print("In the '{}' chapter, out of {} SQL exercises you answered {} correctly.".format(chapter_name, number_of_test_runs, number_of_successes))

In the '函數' chapter, out of 5 SQL exercises you answered 5 correctly.
