# AI Lesson 04b Part 2: GROUP BY and HAVING
## **INSTRUCTOR SOLUTIONS**

**Course:** Applications of Artificial Intelligence  
**Focus:** Grouping Data and Filtering Groups  

Complete solutions for all 16 queries plus 3 Excel exports.

---

In [None]:
# Setup (the Excel exports in Part 7 also need an Excel writer engine such as openpyxl installed)
import pandas as pd
import sqlite3

conn = sqlite3.connect('nba_5seasons.db')
print("✅ Connected to NBA database")

## Part 1: Introduction to GROUP BY Solutions

In [None]:
# Query 1: SOLUTION - Count games per season
query = """
SELECT season, COUNT(*) as game_count
FROM team_game_stats
GROUP BY season
ORDER BY season
"""

result = pd.read_sql(query, conn)
display(result)

In [None]:
# Query 2: SOLUTION - Count wins and losses
query = """
SELECT wl, COUNT(*) as count
FROM team_game_stats
GROUP BY wl
"""

result = pd.read_sql(query, conn)
display(result)

## Part 2: GROUP BY with Different Aggregations Solutions

In [None]:
# Query 3: SOLUTION - Average points per season
query = """
SELECT season, AVG(pts) as avg_points
FROM team_game_stats
GROUP BY season
ORDER BY season
"""

result = pd.read_sql(query, conn)
display(result)

In [None]:
# Query 4: SOLUTION - Total points per season
query = """
SELECT season, SUM(pts) as total_points
FROM team_game_stats
GROUP BY season
ORDER BY season
"""

result = pd.read_sql(query, conn)
display(result)

In [None]:
# Query 5: SOLUTION - Max and min points per season
query = """
SELECT 
    season,
    MAX(pts) as highest_score,
    MIN(pts) as lowest_score
FROM team_game_stats
GROUP BY season
ORDER BY season
"""

result = pd.read_sql(query, conn)
display(result)

## Part 3: Team Performance Analysis Solutions

In [None]:
# Query 6: SOLUTION - Games per team
query = """
SELECT team_id, COUNT(*) as games_played
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
ORDER BY games_played DESC
"""

result = pd.read_sql(query, conn)
display(result.head(10))

In [None]:
# Query 7: SOLUTION - Average points per team
query = """
SELECT 
    team_id,
    AVG(pts) as avg_points_per_game
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
ORDER BY avg_points_per_game DESC
LIMIT 10
"""

result = pd.read_sql(query, conn)
print("Top 10 scoring teams:")
display(result)

In [None]:
# Query 8: SOLUTION - Comprehensive team stats
query = """
SELECT 
    team_id,
    COUNT(*) as games,
    AVG(pts) as avg_pts,
    AVG(ast) as avg_ast,
    AVG(reb) as avg_reb
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
ORDER BY avg_pts DESC
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

## Part 4: GROUP BY with Multiple Columns Solutions

In [None]:
# Query 9: SOLUTION - Wins/losses per team
query = """
SELECT 
    team_id,
    wl,
    COUNT(*) as count
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id, wl
ORDER BY team_id, wl
"""

result = pd.read_sql(query, conn)
display(result.head(20))

In [None]:
# Query 10: SOLUTION - Season and win/loss stats
query = """
SELECT 
    season,
    wl,
    AVG(pts) as avg_points
FROM team_game_stats
GROUP BY season, wl
ORDER BY season, wl
"""

result = pd.read_sql(query, conn)
display(result)

## Part 5: HAVING - Filtering Groups Solutions

In [None]:
# Query 11: SOLUTION - Teams with 40+ wins
query = """
SELECT 
    team_id,
    COUNT(*) as wins
FROM team_game_stats
WHERE season = '2021-22' AND wl = 'W'
GROUP BY team_id
HAVING COUNT(*) >= 40
ORDER BY wins DESC
"""

result = pd.read_sql(query, conn)
print("Teams with 40+ wins:")
display(result)

In [None]:
# Query 12: SOLUTION - High-scoring teams
query = """
SELECT 
    team_id,
    AVG(pts) as avg_points,
    COUNT(*) as games
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
HAVING AVG(pts) >= 110
ORDER BY avg_points DESC
"""

result = pd.read_sql(query, conn)
print("Teams averaging 110+ points:")
display(result)

In [None]:
# Query 13: SOLUTION - Assist leaders
query = """
SELECT 
    team_id,
    AVG(ast) as avg_assists
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
HAVING AVG(ast) >= 25
ORDER BY avg_assists DESC
"""

result = pd.read_sql(query, conn)
print("Teams with great ball movement:")
display(result)

## Part 6: Player Statistics Solutions

In [None]:
# Query 14: SOLUTION - Players with multiple seasons
query = """
SELECT 
    player_id,
    COUNT(DISTINCT season) as seasons_played
FROM player_season_stats
GROUP BY player_id
HAVING COUNT(DISTINCT season) > 1
ORDER BY seasons_played DESC
LIMIT 20
"""

result = pd.read_sql(query, conn)
display(result)

In [None]:
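# Query 15: SOLUTION - Career averages
# Note: AVG here is an unweighted mean over each player's rows in player_season_stats,
# and COUNT(*) equals the number of seasons only if the table has one row per player per season.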
query = """
SELECT 
    player_id,
    COUNT(*) as seasons,
    AVG(pts) as career_avg_pts,
    AVG(reb) as career_avg_reb,
    AVG(ast) as career_avg_ast
FROM player_season_stats
GROUP BY player_id
HAVING COUNT(*) >= 2
ORDER BY career_avg_pts DESC
LIMIT 20
"""

result = pd.read_sql(query, conn)
display(result)

## Part 7: Excel Exports Solutions

In [None]:
# EXPORT 1: SOLUTION - Team Performance Summary
query = """
SELECT 
    team_id,
    COUNT(*) as games_played,
    SUM(CASE WHEN wl = 'W' THEN 1 ELSE 0 END) as wins,
    SUM(CASE WHEN wl = 'L' THEN 1 ELSE 0 END) as losses,
    AVG(pts) as avg_pts,
    AVG(reb) as avg_reb,
    AVG(ast) as avg_ast,
    AVG(stl) as avg_stl,
    AVG(blk) as avg_blk,
    AVG(tov) as avg_tov
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id
ORDER BY wins DESC
"""

team_summary = pd.read_sql(query, conn)
team_summary.to_excel('team_performance_2021-22.xlsx', index=False, sheet_name='Team Stats')

print(f"✅ Exported {len(team_summary)} teams")
display(team_summary.head(10))

In [None]:
# EXPORT 2: SOLUTION - Season Comparison
query = """
SELECT 
    season,
    COUNT(*) as total_games,
    AVG(pts) as avg_points,
    AVG(fg3m) as avg_three_pointers,
    AVG(ast) as avg_assists,
    AVG(reb) as avg_rebounds,
    MAX(pts) as highest_score,
    MIN(pts) as lowest_score
FROM team_game_stats
GROUP BY season
ORDER BY season
"""

season_comparison = pd.read_sql(query, conn)
season_comparison.to_excel('season_trends.xlsx', index=False, sheet_name='Season Trends')

print(f"✅ Exported {len(season_comparison)} seasons")
display(season_comparison)

In [None]:
# EXPORT 3: SOLUTION - Win/Loss Features
query = """
SELECT 
    team_id,
    wl,
    COUNT(*) as games,
    AVG(pts) as avg_pts,
    AVG(fgm) as avg_fgm,
    AVG(fg3m) as avg_fg3m,
    AVG(ftm) as avg_ftm,
    AVG(reb) as avg_reb,
    AVG(ast) as avg_ast,
    AVG(stl) as avg_stl,
    AVG(blk) as avg_blk,
    AVG(tov) as avg_tov
FROM team_game_stats
WHERE season = '2021-22'
GROUP BY team_id, wl
ORDER BY team_id, wl
"""

win_features = pd.read_sql(query, conn)
win_features.to_excel('team_win_loss_features.xlsx', index=False, sheet_name='Features')
win_features.to_csv('team_win_loss_features.csv', index=False)

print("✅ Exported win/loss features")
display(win_features.head(10))

## Part 8: Complete Analysis Solution

In [None]:
# Query 16: SOLUTION - Playoff-caliber teams
query = """
SELECT 
    team_id,
    COUNT(*) as wins,
    AVG(pts) as avg_pts,
    AVG(ast) as avg_ast,
    AVG(reb) as avg_reb,
    AVG(tov) as avg_tov,
    MAX(pts) as best_game
FROM team_game_stats
WHERE season = '2021-22' AND wl = 'W'
GROUP BY team_id
HAVING COUNT(*) >= 45
ORDER BY wins DESC
"""

result = pd.read_sql(query, conn)
print("Playoff-caliber teams (45+ wins):")
display(result)

In [None]:
# Cleanup
conn.close()
print("✅ Connection closed")

---
## Summary for Instructors

**Queries covered:**
- 16 GROUP BY queries with progressive difficulty
- Single-column grouping (season, team_id, wl)
- Multi-column grouping (team_id + wl, season + wl)
- HAVING clause for filtering groups
- 3 Excel exports for analysis
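
To show students the difference between single-column and multi-column grouping, a small self-contained sketch can help. It uses a hypothetical in-memory table named `toy_games` with invented rows, not the course database, so the counts are easy to verify by hand.

```python
import sqlite3

# Hypothetical toy data (NOT the course database): one row per game
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE toy_games (team_id INTEGER, wl TEXT)")
demo.executemany(
    "INSERT INTO toy_games VALUES (?, ?)",
    [(1, "W"), (1, "W"), (1, "L"), (2, "W"), (2, "L")],
)

# Single-column grouping: one output row per team
by_team = demo.execute(
    "SELECT team_id, COUNT(*) FROM toy_games "
    "GROUP BY team_id ORDER BY team_id"
).fetchall()

# Multi-column grouping: one output row per (team, result) combination
by_team_wl = demo.execute(
    "SELECT team_id, wl, COUNT(*) FROM toy_games "
    "GROUP BY team_id, wl ORDER BY team_id, wl"
).fetchall()

demo.close()

print(by_team)     # [(1, 3), (2, 2)]
print(by_team_wl)  # [(1, 'L', 1), (1, 'W', 2), (2, 'L', 1), (2, 'W', 1)]
```

Adding `wl` to the GROUP BY splits each team's single row into one row per result, which is the shape Query 9 and Export 3 produce.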

**Common mistakes:**
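1. Selecting a per-group column next to an aggregate but forgetting GROUP BY; in SQLite this collapses the whole table into a single row
2. Putting an aggregate condition (e.g. `COUNT(*) >= 40`) in WHERE instead of HAVING
3. Wrong clause order; the required order is SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
4. Leaving a non-aggregated SELECT column out of GROUP BY; SQLite does not raise an error here and fills the column from an arbitrary row in the group, so the mistake is easy to miss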
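
Two of these mistakes can be reproduced in a few lines, which may be useful when a student's query "runs but looks wrong". This is a minimal sketch against a hypothetical in-memory table `toy_games` with invented rows, not the course database.

```python
import sqlite3

# Hypothetical toy data (NOT the course database)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE toy_games (team_id INTEGER, wl TEXT, pts INTEGER)")
demo.executemany(
    "INSERT INTO toy_games VALUES (?, ?, ?)",
    [(1, "W", 110), (1, "W", 120), (1, "L", 100), (2, "W", 105), (2, "L", 95)],
)

# Missing GROUP BY: SQLite runs the query but returns ONE row
# for the whole table instead of one row per team.
no_group = demo.execute("SELECT team_id, COUNT(*) FROM toy_games").fetchall()

# Aggregate condition in WHERE: SQLite rejects the statement.
try:
    demo.execute(
        "SELECT team_id, COUNT(*) FROM toy_games "
        "WHERE COUNT(*) >= 3 GROUP BY team_id"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# Corrected version: the aggregate condition belongs in HAVING.
fixed = demo.execute(
    "SELECT team_id, COUNT(*) FROM toy_games "
    "GROUP BY team_id HAVING COUNT(*) >= 3"
).fetchall()

demo.close()

print(len(no_group), no_group[0][1])  # 1 5
print(where_error is not None)        # True
print(fixed)                          # [(1, 3)]
```

The first case is the harder one to catch in class because no error is raised; checking the row count of the result is a quick diagnostic.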

**Excel files:**
- team_performance_2021-22.xlsx
- season_trends.xlsx
- team_win_loss_features.xlsx and team_win_loss_features.csv

**Key concepts:**
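- WHERE filters individual rows before grouping
- GROUP BY puts rows that share the same values in the grouping column(s) into one group and computes each aggregate once per group
- HAVING filters whole groups after aggregation, so it can test aggregates such as COUNT(*) or AVG(pts)
- SUM(CASE WHEN ... THEN 1 ELSE 0 END) counts only the rows that match a condition (used for wins and losses in Export 1)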
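
The four concepts above can be combined in one query. The sketch below assumes a hypothetical in-memory table `toy_games` with invented rows (not the course database) and is small enough to check by hand.

```python
import sqlite3

# Hypothetical toy data (NOT the course database)
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE toy_games (season TEXT, team_id INTEGER, wl TEXT, pts INTEGER)"
)
demo.executemany(
    "INSERT INTO toy_games VALUES (?, ?, ?, ?)",
    [
        ("2021-22", 1, "W", 110),
        ("2021-22", 1, "W", 120),
        ("2021-22", 1, "L", 100),
        ("2021-22", 2, "W", 105),
        ("2021-22", 2, "L", 95),
        ("2020-21", 2, "W", 130),  # removed by WHERE before grouping
        ("2020-21", 2, "W", 125),  # removed by WHERE before grouping
    ],
)

rows = demo.execute(
    """
    SELECT
        team_id,
        COUNT(*) AS games,
        SUM(CASE WHEN wl = 'W' THEN 1 ELSE 0 END) AS wins,  -- conditional count
        AVG(pts) AS avg_pts
    FROM toy_games
    WHERE season = '2021-22'                                -- row filter
    GROUP BY team_id                                        -- one group per team
    HAVING SUM(CASE WHEN wl = 'W' THEN 1 ELSE 0 END) >= 2   -- group filter
    ORDER BY team_id
    """
).fetchall()

demo.close()

print(rows)  # [(1, 3, 2, 110.0)]
```

Team 2 has three wins across both seasons, but only one remains after the WHERE filter, so HAVING drops it. Removing the WHERE clause would bring team 2 back into the result.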