# AI Lesson 04a Part 1: Your First SQL Queries

**Course:** Applications of Artificial Intelligence  
**Focus:** Learning SQL Basics Step-by-Step  

---

## Welcome to SQL!

Today you'll write your first SQL queries to explore NBA data. Don't worry - we'll go slowly and explain everything!

**What you'll learn:**
- How to connect to a database
- How to see what tables exist
- How to SELECT data
- How to use WHERE to filter
- How to use ORDER BY to sort
- How to use LIMIT to control output

**Take your time!** Read each explanation before running the code.

---

## Part 1: Connecting to the Database

First, we need to connect to our NBA database. This is the SQL equivalent of opening a file: the whole database lives in one file, `nba_5seasons.db`.

### Query 1: Import Libraries and Connect

We need two libraries:
- `pandas` - for working with data
- `sqlite3` - for connecting to SQLite databases (it ships with Python)

**Run this cell to get started:**

In [4]:
# Import the libraries we need
import pandas as pd
import sqlite3

# Connect to the NBA database
conn = sqlite3.connect('nba_5seasons.db')

print("✅ Connected to the database!")
print("You're ready to write SQL queries.")

✅ Connected to the database!
You're ready to write SQL queries.


**What just happened?**
- We imported pandas and sqlite3
- We created a connection (`conn`) to the database
- This connection lets us run SQL queries

**Keep `conn` open** - we'll use it in every query!
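
One detail worth knowing: `sqlite3.connect` creates a new, empty database file when the path does not exist, so a typo in the filename can "connect" successfully and then show no tables. A minimal sketch of a guard, assuming a hypothetical helper named `connect_existing` (it is not part of the lesson or of `sqlite3`):

```python
import os
import sqlite3
import tempfile


def connect_existing(path):
    """Open a SQLite file only if it already exists (hypothetical helper)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No database file at {path!r}")
    return sqlite3.connect(path)


# Try it in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "typo_in_name.db")  # made-up filename
    try:
        connect_existing(missing)
        opened = True
    except FileNotFoundError:
        opened = False
    created = os.path.exists(missing)  # the guard never created the file

print(opened, created)
```

In the notebook you could call `connect_existing('nba_5seasons.db')` in place of `sqlite3.connect(...)` if you want an early, clear error when the file is missing.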

---
## Part 2: Exploring the Database

Before we can query data, we need to know what tables exist!

### Query 2: See All Tables

This special query shows what tables are in the database.

**Fill in the blank:**

In [5]:
# Query to see all tables
# sqlite_master is SQLite's built-in metadata table that lists tables/views/indexes.
# pd.read_sql runs the SQL and returns the results as a pandas DataFrame.
query = """
SELECT name
FROM sqlite_master
WHERE type = 'table'
"""

# Run the query and show results
result = pd.read_sql(query, conn)
display(result)     # display() is a Jupyter function that shows DataFrames in a nicely formatted table

Unnamed: 0,name
0,teams
1,players
2,team_game_stats
3,player_season_stats


**Hint:** `pd.read_sql(query, conn)` - you need the query and the connection!

**What you should see:** 4 tables - players, teams, player_season_stats, team_game_stats
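
To see the columns inside a table as well, SQLite offers `PRAGMA table_info`. The sketch below builds a tiny throwaway in-memory database (its two tables and their columns are made up for illustration, not the real NBA schema) and uses plain `sqlite3` calls instead of pandas so it runs without the course file:

```python
import sqlite3

# ":memory:" gives a temporary database that disappears when closed.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (team_id INTEGER, full_name TEXT, state TEXT)")
demo.execute("CREATE TABLE players (player_id INTEGER, full_name TEXT)")

# Same metadata query as above: one row per table.
tables = sorted(
    row[0]
    for row in demo.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in demo.execute("PRAGMA table_info(teams)")]
demo.close()

print(tables)
print(columns)
```

With the lesson's connection, `pd.read_sql("PRAGMA table_info(teams)", conn)` should give the same information as a DataFrame.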

### Query 3: Look at the teams Table

Let's see what the `teams` table looks like. We'll get the first 5 rows.

**SQL Pattern:**
```sql
SELECT *           -- Get all columns
FROM table_name    -- From this table
LIMIT 5            -- Only show 5 rows
```

**Now you try - fill in the blanks:**

In [10]:
# Get first 5 teams
# SELECT * returns all columns (quick way to preview a table).
# LIMIT caps how many rows come back (keeps output small while testing).
query = """
SELECT *
FROM teams
LIMIT 5
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,team_id,full_name,abbreviation,nickname,city,state,year_founded
0,1610612737,Atlanta Hawks,ATL,Hawks,Atlanta,Georgia,1949
1,1610612738,Boston Celtics,BOS,Celtics,Boston,Massachusetts,1946
2,1610612739,Cleveland Cavaliers,CLE,Cavaliers,Cleveland,Ohio,1970
3,1610612740,New Orleans Pelicans,NOP,Pelicans,New Orleans,Louisiana,2002
4,1610612741,Chicago Bulls,CHI,Bulls,Chicago,Illinois,1966


**Hint:** Table name is `teams`, and we want `5` rows

**What you should see:** Columns like team_id, full_name, city, state, year_founded

### Query 4: Look at team_game_stats

Now let's peek at the game stats table. Same pattern!

**Fill in the blanks:**

In [11]:
# Get first 3 game records
query = """
SELECT *
FROM team_game_stats
LIMIT 3
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,season,game_id,team_id,game_date,matchup,wl,pts,fgm,fga,fg3m,...,ftm,fta,oreb,dreb,reb,ast,stl,blk,tov,plus_minus
0,2021-22,22100002,1610612744,2021-10-19,GSW @ LAL,W,121,41,93,14,...,25,30,9,41,50,30,9,2,17,7
1,2021-22,22100001,1610612751,2021-10-19,BKN @ MIL,L,104,37,84,17,...,13,23,5,39,44,19,3,9,13,-23
2,2021-22,22100002,1610612747,2021-10-19,LAL vs. GSW,L,114,45,95,15,...,9,19,5,40,45,21,7,4,18,-7


**Hint:** Table is `team_game_stats`, show just `3` rows

**What you should see:** Columns like season, game_date, pts, reb, ast, wl (win/loss)

### Query 5: Look at player_season_stats

One more table to explore!

**Your turn - write the whole query:**

In [12]:
# Get first 5 player season records
query = """
SELECT *
FROM player_season_stats
LIMIT 5
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,season,player_id,team_id,gp,min,pts,reb,ast,stl,blk,tov,fg_pct,fg3_pct,ft_pct
0,2021-22,203932,1610612743,75,2375.418333,1126.0,439.0,188.0,44.0,44.0,133.0,0.52,0.335,0.743
1,2021-22,1630565,1610612755,6,16.983333,2.0,1.0,0.0,0.0,2.0,2.0,0.2,0.0,0.0
2,2021-22,1628988,1610612756,63,1020.75,400.0,122.0,153.0,42.0,9.0,67.0,0.447,0.379,0.868
3,2021-22,1630174,1610612738,52,573.878333,196.0,89.0,22.0,20.0,5.0,31.0,0.396,0.27,0.808
4,2021-22,1630598,1610612760,50,1208.75,416.0,178.0,68.0,30.0,10.0,54.0,0.463,0.304,0.729


**Hint:** Use the same pattern - SELECT * FROM player_season_stats LIMIT 5

**What you should see:** Columns like player_id, season, gp (games played), pts, reb, ast

---
## Part 3: SELECT - Choosing Specific Columns

Instead of `SELECT *` (all columns), we can pick just the columns we want!

### Query 6: Get Team Names Only

Let's get just the team names from the teams table.

**SQL Pattern:**
```sql
SELECT column1, column2    -- List the columns you want
FROM table_name
```

**Fill in the blank:**

In [15]:
# Get just team names
query = """
SELECT full_name
FROM teams
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name
0,Atlanta Hawks
1,Boston Celtics
2,Cleveland Cavaliers
3,New Orleans Pelicans
4,Chicago Bulls
5,Dallas Mavericks
6,Denver Nuggets
7,Golden State Warriors
8,Houston Rockets
9,Los Angeles Clippers


**Hint:** The column for team name is `full_name`

**What you should see:** A single column with team names

### Query 7: Get Team Name and City

Now let's get TWO columns: team name AND city.

**Remember:** Separate columns with commas!

**Fill in the blank:**

In [16]:
# Get team names and cities
query = """
SELECT full_name, city
FROM teams
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city
0,Atlanta Hawks,Atlanta
1,Boston Celtics,Boston
2,Cleveland Cavaliers,Cleveland
3,New Orleans Pelicans,New Orleans
4,Chicago Bulls,Chicago
5,Dallas Mavericks,Dallas
6,Denver Nuggets,Denver
7,Golden State Warriors,San Francisco
8,Houston Rockets,Houston
9,Los Angeles Clippers,Los Angeles


**Hint:** `full_name, city` - don't forget the comma!

**What you should see:** Two columns - full_name and city

### Query 8: Get Team Name, City, and State

Let's add a third column!

**Fill in the blanks:**

In [17]:
# Get team names, cities, and states
query = """
SELECT full_name, city, state
FROM teams
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city,state
0,Atlanta Hawks,Atlanta,Georgia
1,Boston Celtics,Boston,Massachusetts
2,Cleveland Cavaliers,Cleveland,Ohio
3,New Orleans Pelicans,New Orleans,Louisiana
4,Chicago Bulls,Chicago,Illinois
5,Dallas Mavericks,Dallas,Texas
6,Denver Nuggets,Denver,Colorado
7,Golden State Warriors,San Francisco,California
8,Houston Rockets,Houston,Texas
9,Los Angeles Clippers,Los Angeles,California


**Hint:** `full_name, city, state` - commas between each column!
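
The columns come back in the order you list them after SELECT, not in the order they are stored in the table. A small sketch on a made-up two-row `teams` table (an in-memory stand-in, not the course database; the `?` placeholders are only used to load the sample rows):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (full_name TEXT, city TEXT, state TEXT)")
demo.executemany(
    "INSERT INTO teams VALUES (?, ?, ?)",
    [
        ("Dallas Mavericks", "Dallas", "Texas"),
        ("Denver Nuggets", "Denver", "Colorado"),
    ],
)

# Ask for state first, then full_name; city is left out entirely.
cur = demo.execute("SELECT state, full_name FROM teams")
column_names = [d[0] for d in cur.description]  # names of the returned columns
rows = cur.fetchall()
demo.close()

print(column_names)
print(rows)
```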

### Query 9: Get Game Stats Columns

From the team_game_stats table, get: game_date, pts (points), and wl (win/loss)

**Don't forget LIMIT to keep it manageable!**

In [18]:
# Get game date, points, and win/loss
query = """
SELECT game_date, pts, wl
FROM team_game_stats
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,121,W
1,2021-10-19,104,L
2,2021-10-19,114,L
3,2021-10-19,127,W
4,2021-10-20,121,L
5,2021-10-20,97,L
6,2021-10-20,110,W
7,2021-10-20,106,L
8,2021-10-20,97,L
9,2021-10-20,98,L


**Hint:** `game_date, pts, wl`

### Query 10: Get Player Stats Columns

From player_season_stats, get: player_id, pts, reb, ast

**Your turn - write the full query:**

In [19]:
# Get player stats
query = """
SELECT player_id, pts, reb, ast
FROM player_season_stats
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,player_id,pts,reb,ast
0,203932,1126.0,439.0,188.0
1,1630565,2.0,1.0,0.0
2,1628988,400.0,122.0,153.0
3,1630174,196.0,89.0,22.0
4,1630598,416.0,178.0,68.0
5,1627846,34.0,27.0,7.0
6,1630278,0.0,0.0,0.0
7,1629678,146.0,89.0,25.0
8,1629958,2.0,0.0,0.0
9,201143,701.0,530.0,232.0


**Hint:** SELECT player_id, pts, reb, ast FROM player_season_stats LIMIT 10

---
## Part 4: WHERE - Filtering Rows

WHERE lets us filter to only the rows we want!

### Query 11: Find California Teams

Let's find all teams in California.

**SQL Pattern:**
```sql
SELECT columns
FROM table
WHERE column = 'value'    -- Text needs single quotes!
```

**Fill in the blanks:**

In [20]:
# Find California teams
query = """
SELECT full_name, city, state
FROM teams
WHERE state = 'California'
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city,state
0,Golden State Warriors,San Francisco,California
1,Los Angeles Clippers,Los Angeles,California
2,Los Angeles Lakers,Los Angeles,California
3,Sacramento Kings,Sacramento,California


**Hint:** `state = 'California'` - Don't forget the quotes around California!

**What you should see:** 4 teams (Lakers, Clippers, Warriors, Kings)
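
Text comparisons with `=` are case-sensitive in SQLite by default, so the value inside the quotes has to match the stored text exactly. A minimal sketch on a made-up in-memory table (three sample rows, not the full NBA data):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (full_name TEXT, state TEXT)")
demo.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [
        ("Los Angeles Lakers", "California"),
        ("Sacramento Kings", "California"),
        ("San Antonio Spurs", "Texas"),
    ],
)

# Exact spelling and capitalization: both California rows match.
exact = demo.execute(
    "SELECT full_name FROM teams WHERE state = 'California'"
).fetchall()

# Lowercase 'california' is a different string, so nothing matches.
wrong_case = demo.execute(
    "SELECT full_name FROM teams WHERE state = 'california'"
).fetchall()
demo.close()

print(len(exact), len(wrong_case))  # → 2 0
```

If a WHERE filter returns an empty table, checking the capitalization of the quoted value is a good first step.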

### Query 12: Find Texas Teams

Now find teams in Texas!

**Same pattern, different state:**

In [22]:
# Find Texas teams
query = """
SELECT full_name, city
FROM teams
WHERE state = 'Texas'
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city
0,Dallas Mavericks,Dallas
1,Houston Rockets,Houston
2,San Antonio Spurs,San Antonio


**Hint:** `'Texas'` - with quotes!

### Query 13: Find Wins Only

From team_game_stats, find only games that were wins (wl = 'W')

**Fill in the blanks:**

In [23]:
# Find only wins
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE wl = 'W'
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,121,W
1,2021-10-19,127,W
2,2021-10-20,110,W
3,2021-10-20,124,W
4,2021-10-20,107,W
5,2021-10-20,98,W
6,2021-10-20,94,W
7,2021-10-20,124,W
8,2021-10-20,117,W
9,2021-10-20,123,W


**Hint:** `wl = 'W'` - single quotes around the W!

### Query 14: Find High-Scoring Games

Find games where pts (points) >= 120

**Note:** Numbers DON'T need quotes!

**Fill in the blanks:**

In [24]:
# Find high-scoring games (120+ points)
# Use >, <, >=, <= to filter numeric columns by a threshold.
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE pts >= 120
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,121,W
1,2021-10-19,127,W
2,2021-10-20,121,L
3,2021-10-20,121,L
4,2021-10-20,124,W
5,2021-10-20,134,L
6,2021-10-20,124,W
7,2021-10-20,122,L
8,2021-10-20,123,W
9,2021-10-20,123,W


**Hint:** `pts >= 120` - no quotes for numbers!
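
The difference between `>=` and `>` only shows up at the threshold itself. A quick sketch with three made-up scores (119, 120, 121) in an in-memory table:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE games (pts INTEGER)")
demo.executemany("INSERT INTO games VALUES (?)", [(119,), (120,), (121,)])

# >= keeps the 120-point game; > drops it.
at_least_120 = sorted(
    row[0] for row in demo.execute("SELECT pts FROM games WHERE pts >= 120")
)
more_than_120 = sorted(
    row[0] for row in demo.execute("SELECT pts FROM games WHERE pts > 120")
)
demo.close()

print(at_least_120)   # → [120, 121]
print(more_than_120)  # → [121]
```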

### Query 15: Find 2021-22 Season Games

Find games from the '2021-22' season.

**Note:** '2021-22' is stored as TEXT, so it needs quotes!

In [25]:
# Find 2021-22 season games
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22'
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,127,W
1,2021-10-19,104,L
2,2021-10-19,121,W
3,2021-10-19,114,L
4,2021-10-20,122,L
5,2021-10-20,123,W
6,2021-10-20,94,W
7,2021-10-20,88,L
8,2021-10-20,134,L
9,2021-10-20,138,W


**Hint:** `'2021-22'` - needs quotes because it's text!
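
Leaving the quotes off here does not raise an error, which makes the mistake easy to miss: SQLite reads `2021-22` as the subtraction 2021 minus 22. A minimal sketch on a made-up in-memory `games` table:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE games (season TEXT, pts INTEGER)")
demo.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("2021-22", 121), ("2021-22", 104), ("2022-23", 110)],
)

# With quotes: compares against the text '2021-22'.
quoted = demo.execute(
    "SELECT pts FROM games WHERE season = '2021-22'"
).fetchall()

# Without quotes: compares against the number 1999, so no row matches.
unquoted = demo.execute(
    "SELECT pts FROM games WHERE season = 2021-22"
).fetchall()

arithmetic = demo.execute("SELECT 2021-22").fetchone()[0]
demo.close()

print(len(quoted), len(unquoted), arithmetic)  # → 2 0 1999
```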

---
## Part 5: Combining WHERE with AND

You can combine multiple conditions with AND!

### Query 16: Wins with 100+ Points

Find games that were BOTH:
- Wins (wl = 'W')
- Scored 100+ points

**SQL Pattern:**
```sql
WHERE condition1 AND condition2
```

**Fill in the blanks:**

In [26]:
# Find wins with 100+ points
# AND combines multiple filters (all conditions must be true).
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE wl = 'W' AND pts >= 100
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,121,W
1,2021-10-19,127,W
2,2021-10-20,110,W
3,2021-10-20,124,W
4,2021-10-20,107,W
5,2021-10-20,124,W
6,2021-10-20,117,W
7,2021-10-20,123,W
8,2021-10-20,123,W
9,2021-10-20,138,W


**Hint:** `pts >= 100`

### Query 17: 2021-22 Season Wins

Find games that are BOTH:
- From 2021-22 season
- Wins

**Fill in the blanks:**

In [27]:
# Find 2021-22 wins
# Filtering by season prevents mixing games from different years.
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22' AND wl = 'W'
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,127,W
1,2021-10-19,121,W
2,2021-10-20,123,W
3,2021-10-20,94,W
4,2021-10-20,138,W
5,2021-10-20,98,W
6,2021-10-20,132,W
7,2021-10-20,124,W
8,2021-10-20,117,W
9,2021-10-20,123,W


**Hint:** `wl = 'W'`

### Query 18: High-Scoring Wins in 2021-22

Find games with ALL THREE:
- Season = '2021-22'
- Wins
- 120+ points

**Your turn:**

In [28]:
# Find high-scoring wins in 2021-22
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22' AND wl = 'W' AND pts >= 120
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-19,127,W
1,2021-10-19,121,W
2,2021-10-20,123,W
3,2021-10-20,138,W
4,2021-10-20,132,W
5,2021-10-20,124,W
6,2021-10-20,123,W
7,2021-10-20,124,W
8,2021-10-21,137,W
9,2021-10-22,123,W


**Hint:** `season = '2021-22' AND wl = 'W' AND pts >= 120`
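
Each condition you add with AND can only keep the result the same size or make it smaller, because a row has to pass every condition. A sketch that counts matching rows on four made-up games (in-memory table, not the real data):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE games (season TEXT, wl TEXT, pts INTEGER)")
demo.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [
        ("2021-22", "W", 127),
        ("2021-22", "W", 110),
        ("2021-22", "L", 125),
        ("2022-23", "W", 130),
    ],
)


def count_rows(where_clause):
    """Return how many sample games pass the given WHERE clause."""
    sql = "SELECT pts FROM games WHERE " + where_clause
    return len(demo.execute(sql).fetchall())


one = count_rows("season = '2021-22'")
two = count_rows("season = '2021-22' AND wl = 'W'")
three = count_rows("season = '2021-22' AND wl = 'W' AND pts >= 120")
demo.close()

print(one, two, three)  # → 3 2 1
```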

---
## Part 6: ORDER BY - Sorting Results

ORDER BY lets us sort the results!

### Query 19: Teams Sorted by Name

Get all teams, sorted alphabetically by name.

**SQL Pattern:**
```sql
SELECT columns
FROM table
ORDER BY column    -- ASC (ascending) is default
```

**Fill in the blank:**

In [29]:
# Teams sorted alphabetically
# ORDER BY sorts results (ASC low→high, DESC high→low).
query = """
SELECT full_name, city
FROM teams
ORDER BY full_name
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city
0,Atlanta Hawks,Atlanta
1,Boston Celtics,Boston
2,Brooklyn Nets,Brooklyn
3,Charlotte Hornets,Charlotte
4,Chicago Bulls,Chicago
5,Cleveland Cavaliers,Cleveland
6,Dallas Mavericks,Dallas
7,Denver Nuggets,Denver
8,Detroit Pistons,Detroit
9,Golden State Warriors,San Francisco


**Hint:** `full_name` - sorts A to Z by default

### Query 20: Games Sorted by Points (Lowest to Highest)

Show games sorted by points (ascending = lowest first)

**Fill in the blanks:**

In [30]:
# Games sorted by points (lowest first)
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22'
ORDER BY pts
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-29,75,L
1,2022-01-08,75,L
2,2022-01-25,75,L
3,2021-11-22,77,L
4,2022-03-09,77,L
5,2021-11-04,78,L
6,2021-11-12,78,L
7,2022-04-08,78,L
8,2021-10-27,79,L
9,2021-11-03,79,L


**Hint:** `pts` or `pts ASC` - both work!

### Query 21: Games Sorted by Points (Highest to Lowest)

Show games sorted by points DESCENDING (highest first)

**Use DESC for descending!**

**Fill in the blank:**

In [31]:
# Top 10 highest-scoring games
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22'
ORDER BY pts DESC
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2022-01-26,158,W
1,2022-02-25,157,W
2,2022-02-25,153,L
3,2022-04-01,153,W
4,2021-12-02,152,W
5,2022-03-15,150,W
6,2022-03-14,149,W
7,2021-11-27,146,W
8,2022-04-10,146,W
9,2021-12-04,145,W


**💡 Hint:** `pts DESC` - DESC means highest to lowest
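
ORDER BY and LIMIT work as a pair: the sort decides which rows come first, and LIMIT keeps only that many from the top. A small sketch with three made-up scores in an in-memory table:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE games (pts INTEGER)")
demo.executemany("INSERT INTO games VALUES (?)", [(98,), (121,), (110,)])

ascending = [r[0] for r in demo.execute("SELECT pts FROM games ORDER BY pts")]
descending = [r[0] for r in demo.execute("SELECT pts FROM games ORDER BY pts DESC")]

# Sorting first, then keeping one row, gives the single highest score.
highest = demo.execute(
    "SELECT pts FROM games ORDER BY pts DESC LIMIT 1"
).fetchone()[0]
demo.close()

print(ascending)   # → [98, 110, 121]
print(descending)  # → [121, 110, 98]
print(highest)     # → 121
```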

### Query 22: Teams by State, Then Name

Sort by state first, then by team name within each state.

**You can ORDER BY multiple columns!**

In [32]:
# Teams sorted by state, then name
query = """
SELECT full_name, city, state
FROM teams
ORDER BY state, full_name
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city,state
0,Phoenix Suns,Phoenix,Arizona
1,Golden State Warriors,San Francisco,California
2,Los Angeles Clippers,Los Angeles,California
3,Los Angeles Lakers,Los Angeles,California
4,Sacramento Kings,Sacramento,California
5,Denver Nuggets,Denver,Colorado
6,Washington Wizards,Washington,District of Columbia
7,Miami Heat,Miami,Florida
8,Orlando Magic,Orlando,Florida
9,Atlanta Hawks,Atlanta,Georgia


**Hint:** `state, full_name` - sort by state first, then name
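
With two sort columns, the second one only matters among rows that tie on the first. A sketch using four team rows typed in by hand (an in-memory stand-in for the `teams` table, inserted in scrambled order on purpose):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (full_name TEXT, state TEXT)")
demo.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [
        ("Sacramento Kings", "California"),
        ("Phoenix Suns", "Arizona"),
        ("Los Angeles Lakers", "California"),
        ("Golden State Warriors", "California"),
    ],
)

# Arizona sorts before California; the three California teams
# are then put in alphabetical order by name.
ordered = demo.execute(
    "SELECT state, full_name FROM teams ORDER BY state, full_name"
).fetchall()
demo.close()

for state, name in ordered:
    print(state, "-", name)
```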

---
## Part 7: Putting It All Together

Now let's combine everything: SELECT, WHERE, ORDER BY, and LIMIT!

### Query 23: 5 Lowest-Scoring Losses in 2021-22

Find the 5 lowest-scoring losses from 2021-22.

**Remember the order:**
1. SELECT
2. FROM
3. WHERE
4. ORDER BY
5. LIMIT

**Fill in the blanks:**

In [33]:
# 5 lowest-scoring losses
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE season = '2021-22' AND wl = 'L'
ORDER BY pts
LIMIT 5
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2021-10-29,75,L
1,2022-01-08,75,L
2,2022-01-25,75,L
3,2021-11-22,77,L
4,2022-03-09,77,L


**Hint:** wl = 'L', ORDER BY pts ASC (or just pts), LIMIT 5
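
The clause order is a rule, not just a habit: SQLite rejects a query whose clauses are out of order. A minimal sketch on a made-up in-memory table, using plain `sqlite3` so the error type is easy to see:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE games (pts INTEGER, wl TEXT)")
demo.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [(75, "L"), (130, "W"), (77, "L")],
)

good = "SELECT pts FROM games WHERE wl = 'L' ORDER BY pts LIMIT 1"
bad = "SELECT pts FROM games WHERE wl = 'L' LIMIT 1 ORDER BY pts"  # LIMIT too early

lowest_loss = demo.execute(good).fetchone()[0]

try:
    demo.execute(bad)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)  # SQLite reports a syntax error
demo.close()

print(lowest_loss)
print(error_message)
```

When you see a syntax error, comparing your query against the five-step order above is usually the quickest check.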

### Query 24: Top 10 Highest-Scoring Wins

Find the 10 highest-scoring wins from any season.

**Your turn - write the full query:**

In [34]:
# Top 10 highest-scoring wins
query = """
SELECT game_date, pts, wl
FROM team_game_stats
WHERE wl = 'W'
ORDER BY pts DESC
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,game_date,pts,wl
0,2023-02-24,176,W
1,2025-03-27,162,W
2,2022-01-26,158,W
3,2022-02-25,157,W
4,2023-04-09,157,W
5,2023-11-21,157,W
6,2024-04-14,157,W
7,2023-11-01,155,W
8,2024-12-26,155,W
9,2024-01-03,154,W


**Hint:**
```sql
SELECT game_date, pts, wl
FROM team_game_stats
WHERE wl = 'W'
ORDER BY pts DESC
LIMIT 10
```

### Query 25: California Teams Sorted by Age

Find California teams, sorted by year_founded (oldest first).

**Challenge - write it all yourself!**

In [35]:
# California teams, oldest first
query = """
SELECT full_name, city, year_founded
FROM teams
WHERE state = 'California'
ORDER BY year_founded
"""

result = pd.read_sql(query, conn)
display(result)

Unnamed: 0,full_name,city,year_founded
0,Golden State Warriors,San Francisco,1946
1,Los Angeles Lakers,Los Angeles,1948
2,Sacramento Kings,Sacramento,1948
3,Los Angeles Clippers,Los Angeles,1970


**Hint:**
```sql
SELECT full_name, city, year_founded
FROM teams
WHERE state = 'California'
ORDER BY year_founded
```
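
In the output above, the Lakers and the Kings share the same `year_founded` (1948). When rows tie on the sort column, their relative order is not guaranteed, so a second sort column can act as a tie-breaker. A sketch with those four rows typed into an in-memory table (values copied from the output above, inserted in scrambled order):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (full_name TEXT, year_founded INTEGER)")
demo.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [
        ("Sacramento Kings", 1948),
        ("Los Angeles Clippers", 1970),
        ("Los Angeles Lakers", 1948),
        ("Golden State Warriors", 1946),
    ],
)

# Oldest first; teams founded in the same year are sorted by name.
ordered = demo.execute(
    "SELECT full_name, year_founded FROM teams ORDER BY year_founded, full_name"
).fetchall()
demo.close()

for name, year in ordered:
    print(year, name)
```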

---
## Cleanup

Always close your database connection when you're done!

In [36]:
# Close the connection
conn.close()
print("✅ Database connection closed")
print("Great work today!")

✅ Database connection closed
Great work today!
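
If you would rather not rely on remembering `conn.close()`, Python's `contextlib.closing` can close the connection for you at the end of a `with` block. A minimal sketch on an in-memory database (note that `with sqlite3.connect(...)` on its own manages transactions and does not close the connection, which is why `closing` is used here):

```python
import sqlite3
from contextlib import closing

# The connection is closed automatically when the block ends.
with closing(sqlite3.connect(":memory:")) as demo:
    value = demo.execute("SELECT 1").fetchone()[0]

# Using a closed connection raises sqlite3.ProgrammingError.
try:
    demo.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)  # → 1 False
```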


---
## Congratulations!

You just wrote 25 SQL queries! You learned:

✅ How to connect to a database  
✅ How to explore tables with SELECT *  
✅ How to choose specific columns with SELECT  
✅ How to filter rows with WHERE  
✅ How to combine conditions with AND  
✅ How to sort results with ORDER BY  
✅ How to limit output with LIMIT  

---

## Key Takeaways

**Remember:**
1. Text needs `'single quotes'` - numbers don't
2. Clause order: SELECT → FROM → WHERE → ORDER BY → LIMIT
3. `DESC` for high to low, `ASC` (or nothing) for low to high
4. Use AND to combine multiple conditions
5. Always test with LIMIT when learning!

---

## Next Steps

Ready for more? In the next lesson (ai04aTasks Part 2), you'll:
- Write more complex queries
- Work with player statistics
- Export data to Excel
- Create datasets for machine learning!