# Task 05 - Part 1 of 2: SQL Aggregations & GROUP BY

**Course:** Database Applications Development  
**Lesson:** 05 - SQL Aggregations, Grouping, and Excel Export (in Part 2) 

---

## Instructions

Complete all exercises in this notebook. You will:
1. Write SQL queries using aggregate functions
2. Use GROUP BY to analyze categories
3. Export 4 query results to Excel files (in Part 2)
4. Answer analysis questions

**Resources:**
- Lesson materials (dbApps05_AggregationsGrouping.md)
- Walkthrough notebook (dbApps05_Walkthrough.ipynb)
- SQL Reference Guide (updated with aggregations)

**Submission:**
1. Complete all TODO sections
2. Verify all cells run without errors
3. Check that Excel files were created (in Part 2)
4. Push to GitHub: `databaseApplications/dbApps05TasksPart1.ipynb`

Let's practice aggregations!

---

## Setup

In [16]:
import pandas as pd
import sqlite3

# Connect to database
conn = sqlite3.connect('nba_5seasons.db')
print("✅ Connected to database")

✅ Connected to database


---

## Part 1: Basic Aggregate Functions (10 queries)

Practice using COUNT, SUM, AVG, MIN, and MAX without GROUP BY.

**Hints:**
- Remember to use AS to name your result columns
- Aggregate functions work on ALL rows that match your WHERE clause
- Use ROUND(AVG(column), 1) to round decimals
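
The hints above can be tried on a tiny in-memory table before running them against `nba_5seasons.db`. This is only a minimal sketch: the `demo_games` table and its rows are made up for illustration and are not part of the course database.

```python
import sqlite3

# Hypothetical toy data, NOT the course database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_games (season TEXT, pts INTEGER)")
demo.executemany(
    "INSERT INTO demo_games VALUES (?, ?)",
    [("2021-22", 100), ("2021-22", 111), ("2021-22", 120), ("2020-21", 90)],
)

# Aggregates only see the rows that pass the WHERE clause;
# AS gives each result column a readable name.
row = demo.execute(
    """
    SELECT COUNT(*)           AS games,
           SUM(pts)           AS total_pts,
           ROUND(AVG(pts), 1) AS avg_pts,
           MIN(pts)           AS low,
           MAX(pts)           AS high
    FROM demo_games
    WHERE season = '2021-22'
    """
).fetchone()
print(row)
demo.close()
```

Because the `2020-21` row is filtered out by WHERE, it does not affect any of the five aggregates.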

### Query 1: Count All Teams

**Task:** How many teams are in the database?

**Hint:** Use COUNT(*) on the `teams` table.

In [17]:
# TODO: Write your query
query_1 = """
SELECT COUNT(*) AS total_teams 
FROM teams
"""
result_1 = pd.read_sql(query_1, conn)
display(result_1)

Unnamed: 0,total_teams
0,30


### Query 2: Count Player Season Records

**Task:** How many player-season records exist for 2021-22?

**Hint:** COUNT(*) from `player_season_stats` WHERE season = '2021-22'

In [18]:
# TODO: Write your query
query_2 = """
SELECT COUNT(*) AS player_records_2021_22
FROM player_season_stats 
WHERE season = '2021-22'
"""
result_2 = pd.read_sql(query_2, conn)
display(result_2)

Unnamed: 0,player_records_2021_22
0,605


### Query 3: Total Points (All Teams)

**Task:** What were the total combined points scored by all teams in all 2021-22 games?

**Hint:** SUM(pts) from `team_game_stats` WHERE season = '2021-22'

In [19]:
# TODO: Write your query
query_3 = """
SELECT SUM(pts) AS total_combined_points
FROM team_game_stats 
WHERE season = '2021-22'
"""
result_3 = pd.read_sql(query_3, conn)
display(result_3)

Unnamed: 0,total_combined_points
0,272115


### Query 4: Average Points Per Game (League-Wide)

**Task:** What was the league-wide average points per game in 2021-22?

**Hint:** AVG(pts), remember to round to 1 decimal place (`team_game_stats` table)

In [20]:
# TODO: Write your query
query_4 = """
SELECT ROUND(AVG(pts), 1) AS league_avg_ppg
FROM team_game_stats 
WHERE season = '2021-22'
"""
result_4 = pd.read_sql(query_4, conn)
display(result_4)

Unnamed: 0,league_avg_ppg
0,110.6


### Query 5: Highest and Lowest Scores

**Task:** Find the highest and lowest points scored in any single game during 2021-22.

**Hint:** Use both MAX(pts) and MIN(pts) in one query (`team_game_stats` table)

In [21]:
# TODO: Write your query
query_5 = """
SELECT MAX(pts) AS highest_score, MIN(pts) AS lowest_score
FROM team_game_stats
WHERE season = '2021-22'
"""
result_5 = pd.read_sql(query_5, conn)
display(result_5)

Unnamed: 0,highest_score,lowest_score
0,158,75


### Query 6: Lakers Total Points

**Task:** How many total points did the Lakers (team_id = 1610612747) score in 2021-22?

**Hint:** SUM(pts) with WHERE for team_id AND season (`team_game_stats` table)

In [22]:
# TODO: Write your query
query_6 = """
SELECT SUM(pts) AS lakers_total_points
FROM team_game_stats
WHERE team_id = 1610612747 AND season = '2021-22'
"""
result_6 = pd.read_sql(query_6, conn)
display(result_6)

Unnamed: 0,lakers_total_points
0,9192


### Query 7: Warriors Average Points

**Task:** What was the Warriors' (team_id = 1610612744) average points per game in 2021-22?

**Hint:** AVG(pts), round to 1 decimal (`team_game_stats` table)

In [23]:
# TODO: Write your query
query_7 = """
SELECT ROUND(AVG(pts), 1) AS warriors_avg_ppg
FROM team_game_stats
WHERE team_id = 1610612744 AND season = '2021-22'
"""
result_7 = pd.read_sql(query_7, conn)
display(result_7)

Unnamed: 0,warriors_avg_ppg
0,111.0


### Query 8: Complete Summary Statistics

**Task:** Create a summary with COUNT, SUM, AVG, MIN, and MAX for 'pts' for all games in 2021-22.

**Hint:** Use all 5 aggregate functions in one SELECT statement from the `team_game_stats` table

In [24]:
# TODO: Write your query
query_8 = """
SELECT 
    COUNT(*) AS total_games,
    SUM(pts) AS total_points,
    ROUND(AVG(pts), 1) AS avg_points,
    MIN(pts) AS min_points,
    MAX(pts) AS max_points
FROM team_game_stats
WHERE season = '2021-22'
"""
result_8 = pd.read_sql(query_8, conn)
display(result_8)

Unnamed: 0,total_games,total_points,avg_points,min_points,max_points
0,2460,272115,110.6,75,158


### Query 9: Count Teams by State

**Task:** How many teams are located in California?

**Hint:** COUNT(*) from `teams` WHERE state = 'California'

In [25]:
# TODO: Write your query
query_9 = """
SELECT COUNT(*) AS california_teams
FROM teams
WHERE state = 'California'
"""
result_9 = pd.read_sql(query_9, conn)
display(result_9)

Unnamed: 0,california_teams
0,4


### Query 10: Oldest Team

**Task:** What is the earliest year_founded in the `teams` table?

**Hint:** MIN(year_founded)

In [26]:
# TODO: Write your query
query_10 = """
SELECT MIN(year_founded) AS earliest_year
FROM teams
"""
result_10 = pd.read_sql(query_10, conn)
display(result_10)

Unnamed: 0,earliest_year
0,1946


---

## Part 2: GROUP BY Queries (8 queries)

Practice grouping data and aggregating by category.

**Remember:**
- Every non-aggregated column in SELECT must be in GROUP BY
- GROUP BY creates separate groups for aggregation
- Use ORDER BY to sort your results
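
As a rough illustration of these rules, the sketch below groups a made-up `demo_games` table by team. The table, team labels, and scores are assumptions for demonstration only and do not come from `nba_5seasons.db`.

```python
import sqlite3

# Hypothetical toy data, NOT the course database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_games (team TEXT, pts INTEGER)")
demo.executemany(
    "INSERT INTO demo_games VALUES (?, ?)",
    [("A", 100), ("A", 110), ("B", 90), ("B", 95), ("B", 130)],
)

# 'team' is the only non-aggregated column, so it appears in GROUP BY.
# Each distinct team becomes one output row; ORDER BY sorts those rows.
rows = demo.execute(
    """
    SELECT team, COUNT(*) AS games, MAX(pts) AS best
    FROM demo_games
    GROUP BY team
    ORDER BY best DESC
    """
).fetchall()
for r in rows:
    print(r)
demo.close()
```

One caveat worth knowing: SQLite does not raise an error when a non-aggregated column is missing from GROUP BY. It returns a value from some row in the group, so a mistake here can go unnoticed.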

### Query 11: Games Per Team

**Task:** How many games did each team play in 2021-22?

**Hint:** SELECT team_id, COUNT(*) ... GROUP BY team_id

In [27]:
# TODO: Write your query
query_11 = """
SELECT team_id, COUNT(*) AS games_played
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
"""

result_11 = pd.read_sql(query_11, conn)
display(result_11.head(10))  # Show first 10 teams

Unnamed: 0,team_id,games_played
0,1610612737,82
1,1610612738,82
2,1610612739,82
3,1610612740,82
4,1610612741,82
5,1610612742,82
6,1610612743,82
7,1610612744,82
8,1610612745,82
9,1610612746,82


### Query 12: Average Points By Team

**Task:** Calculate average points per game for each team in 2021-22. Sort by highest average first.

**Hint:** GROUP BY team_id, ORDER BY avg_points DESC

In [28]:
# TODO: Write your query
query_12 = """
SELECT team_id, AVG(pts) AS avg_points
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
ORDER BY avg_points DESC
"""

result_12 = pd.read_sql(query_12, conn)
display(result_12.head(10))  # Show first 10 teams

Unnamed: 0,team_id,avg_points
0,1610612750,115.939024
1,1610612763,115.609756
2,1610612749,115.487805
3,1610612766,115.329268
4,1610612756,114.829268
5,1610612737,113.939024
6,1610612762,113.609756
7,1610612759,113.158537
8,1610612751,112.902439
9,1610612743,112.719512


### Query 13: Team Performance with Names

**Task:** Show team name, games played, and average points for each team in 2021-22.

**Hint:** JOIN teams with team_game_stats, then GROUP BY

Recall from the walkthrough how JOIN operations work. Here's an example:

SELECT
    t.full_name as team,
    t.city,
    COUNT(tgs.game_id) as games_played
FROM teams t
JOIN team_game_stats tgs ON t.team_id = tgs.team_id
WHERE tgs.season = '2021-22'
GROUP BY t.team_id, t.full_name, t.city
ORDER BY games_played DESC
LIMIT 10
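
The same JOIN-then-GROUP BY pattern can be checked on two small stand-in tables. This is a sketch under stated assumptions: `demo_teams`, `demo_stats`, and their contents are hypothetical and only mimic the shape of `teams` and `team_game_stats`.

```python
import sqlite3

# Hypothetical stand-ins for the teams / team_game_stats tables
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE demo_teams (team_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE demo_stats (team_id INTEGER, game_id INTEGER, pts INTEGER);
    INSERT INTO demo_teams VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO demo_stats VALUES (1, 10, 100), (1, 11, 120), (2, 10, 90);
    """
)

# JOIN attaches the team name to every stats row,
# then GROUP BY collapses those rows to one per team.
rows = demo.execute(
    """
    SELECT t.full_name          AS team,
           COUNT(s.game_id)     AS games_played,
           ROUND(AVG(s.pts), 1) AS avg_points
    FROM demo_teams t
    JOIN demo_stats s ON t.team_id = s.team_id
    GROUP BY t.team_id, t.full_name
    ORDER BY avg_points DESC
    """
).fetchall()
for r in rows:
    print(r)
demo.close()
```

Grouping by `team_id` as well as `full_name` keeps two teams apart even if they happened to share a name.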

In [29]:
# TODO: Write your query
query_13 = """
SELECT 
    t.full_name AS team, 
    COUNT(tgs.game_id) AS games_played, 
    ROUND(AVG(tgs.pts), 1) AS avg_points
FROM teams t
JOIN team_game_stats tgs ON t.team_id = tgs.team_id
WHERE tgs.season = '2021-22'
GROUP BY t.team_id, t.full_name
ORDER BY avg_points DESC
"""

result_13 = pd.read_sql(query_13, conn)
display(result_13.head(10))

Unnamed: 0,team,games_played,avg_points
0,Minnesota Timberwolves,82,115.9
1,Memphis Grizzlies,82,115.6
2,Milwaukee Bucks,82,115.5
3,Charlotte Hornets,82,115.3
4,Phoenix Suns,82,114.8
5,Atlanta Hawks,82,113.9
6,Utah Jazz,82,113.6
7,San Antonio Spurs,82,113.2
8,Brooklyn Nets,82,112.9
9,Denver Nuggets,82,112.7


### Query 14: Total Points By Team

**Task:** Calculate total points scored by each team in 2021-22. Include team name.

**Hint:** SUM(pts), JOIN `team_game_stats` with the `teams` table, GROUP BY team

In [30]:
# TODO: Write your query
query_14 = """
SELECT t.full_name, SUM(tgs.pts) AS total_season_points
FROM teams t
JOIN team_game_stats tgs ON t.team_id = tgs.team_id
WHERE tgs.season = '2021-22'
GROUP BY t.full_name
"""

result_14 = pd.read_sql(query_14, conn)
display(result_14.head(30))

Unnamed: 0,full_name,total_season_points
0,Atlanta Hawks,9343
1,Boston Celtics,9164
2,Brooklyn Nets,9258
3,Charlotte Hornets,9457
4,Chicago Bulls,9152
5,Cleveland Cavaliers,8839
6,Dallas Mavericks,8858
7,Denver Nuggets,9243
8,Detroit Pistons,8596
9,Golden State Warriors,9102


### Query 15: Season High by Team

**Task:** Find each team's highest-scoring game in 2021-22. Include team name and sort by season high, highest first.

**Hint:** MAX(pts), JOIN, GROUP BY, ORDER BY DESC

In [31]:
# TODO: Write your query
query_15 = """
SELECT t.full_name, MAX(tgs.pts) AS season_high
FROM teams t
JOIN team_game_stats tgs ON t.team_id = tgs.team_id
WHERE tgs.season = '2021-22'
GROUP BY t.full_name
ORDER BY season_high DESC
"""

result_15 = pd.read_sql(query_15, conn)
display(result_15.head(10))

Unnamed: 0,full_name,season_high
0,Charlotte Hornets,158
1,San Antonio Spurs,157
2,Washington Wizards,153
3,Los Angeles Clippers,153
4,Memphis Grizzlies,152
5,Brooklyn Nets,150
6,Minnesota Timberwolves,149
7,Los Angeles Lakers,146
8,Houston Rockets,146
9,Boston Celtics,145


### Query 16: Win Count by Team

**Task:** Count how many wins each team had in 2021-22.

**Hint:** Use SUM(CASE WHEN wl = 'W' THEN 1 ELSE 0 END) to count wins
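
To see why the CASE expression counts wins, here is a minimal sketch on made-up data. The `demo_results` table and its rows are hypothetical; only the `wl` column name is borrowed from the hint.

```python
import sqlite3

# Hypothetical toy data, NOT the course database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_results (team TEXT, wl TEXT)")
demo.executemany(
    "INSERT INTO demo_results VALUES (?, ?)",
    [("A", "W"), ("A", "L"), ("A", "W"), ("B", "L"), ("B", "L")],
)

# CASE turns each row into 1 (win) or 0 (anything else);
# SUM then adds those 1s and 0s up within each group.
rows = demo.execute(
    """
    SELECT team,
           SUM(CASE WHEN wl = 'W' THEN 1 ELSE 0 END) AS wins,
           COUNT(*) AS games
    FROM demo_results
    GROUP BY team
    ORDER BY wins DESC
    """
).fetchall()
for r in rows:
    print(r)
demo.close()
```

Filtering with `WHERE wl = 'W'` and using COUNT(*) would also count wins, but a team with no wins would then disappear from the result instead of showing 0.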

In [32]:
# TODO: Write your query
query_16 = """
SELECT t.full_name, SUM(CASE WHEN tgs.wl = 'W' THEN 1 ELSE 0 END) AS total_wins
FROM teams t
JOIN team_game_stats tgs ON t.team_id = tgs.team_id
WHERE tgs.season = '2021-22'
GROUP BY t.full_name
ORDER BY total_wins DESC
"""
result_16 = pd.read_sql(query_16, conn)
display(result_16.head(10))

Unnamed: 0,full_name,total_wins
0,Phoenix Suns,64
1,Memphis Grizzlies,56
2,Miami Heat,53
3,Golden State Warriors,53
4,Dallas Mavericks,52
5,Philadelphia 76ers,51
6,Milwaukee Bucks,51
7,Boston Celtics,51
8,Utah Jazz,49
9,Toronto Raptors,48


### Query 17: Teams by State

**Task:** Count how many teams are in each state.

**Hint:** GROUP BY state from teams table

In [33]:
# TODO: Write your query
query_17 = """
SELECT state, COUNT(*) AS team_count
FROM teams
GROUP BY state
ORDER BY team_count DESC
"""
result_17 = pd.read_sql(query_17, conn)
display(result_17)

Unnamed: 0,state,team_count
0,California,4
1,Texas,3
2,New York,2
3,Florida,2
4,Wisconsin,1
5,Utah,1
6,Tennessee,1
7,Pennsylvania,1
8,Oregon,1
9,Ontario,1


### Query 18: Players Per Team

**Task:** Count how many player-season records each team has for 2021-22.

**Hint:** COUNT(*) from player_season_stats, GROUP BY team_id

In [34]:
# TODO: Write your query
query_18 = """
SELECT team_id, COUNT(*) AS player_count
FROM player_season_stats
WHERE season = '2021-22'
GROUP BY team_id
"""

result_18 = pd.read_sql(query_18, conn)
display(result_18.head(10))

Unnamed: 0,team_id,player_count
0,1610612737,21
1,1610612738,22
2,1610612739,22
3,1610612740,18
4,1610612741,20
5,1610612742,21
6,1610612743,20
7,1610612744,16
8,1610612745,16
9,1610612746,16


## You've completed Part 1 - Nice Work!

## Now move on to Part 2!