# Introduction to SQL Queries

# Pair Programming Exercise for DSE5002
HD Sheets,  Feb. 5 2025

We will connect to the chinook database and work through some queries.

Each of these queries is made against a single table; we will see later how to join tables together and create subqueries.

The commands used in queries are:

SELECT -- specify the variables or columns required

FROM -- specify the table to obtain data from

LIMIT -- restrict the number of rows returned to a desired total N

WHERE -- a filtering operation carried out on rows; we can use AND, OR and NOT within the WHERE

ORDER BY -- a sorting operation; it can sort ascending or descending, and we can sort on multiple variables

GROUP BY -- a grouping operation

HAVING -- a filtering operation on groups

MAX(), MIN(), AVG(), SUM(), COUNT() are aggregating functions used with GROUP BY.
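GROUP BY, HAVING and the aggregating functions are not exercised in the queries below, so here is a minimal sketch of how they fit together with the other clauses. It uses Python's built-in sqlite3 module and a made-up table (`toy_invoice` and its rows are assumptions for illustration, not the chinook data), so it runs without the Postgres server.

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_invoice (invoice_id INTEGER, billing_country TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO toy_invoice VALUES (?, ?, ?)",
    [
        (1, "Germany", 2.0),
        (2, "Germany", 4.0),
        (3, "Norway", 4.0),
        (4, "France", 6.0),
        (5, "France", 10.0),
    ],
)

# Clause order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
query = """
    SELECT billing_country, COUNT(*) AS n_invoices, SUM(total) AS total_sales
    FROM toy_invoice
    WHERE total > 1              -- filters individual rows before grouping
    GROUP BY billing_country     -- one output row per country
    HAVING COUNT(*) >= 2         -- filters whole groups after aggregation
    ORDER BY total_sales DESC
    LIMIT 5
"""
grouped_rows = conn.execute(query).fetchall()
print(grouped_rows)
conn.close()
```

Norway has only one invoice, so the HAVING clause drops that group, while WHERE would have acted on single rows before any grouping took place.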



# Source material

"Learning SQL", Beaulieu,  O'Reilly 2005

https://www.sqlitetutorial.net/ - explains queries using the chinook database, albeit in the SQLite database system. The SELECT syntax used for queries is fairly standard across most SQL databases; the other aspects and commands seem to vary more from one server program to another.

That said, there are minor differences in table and column names between the chinook database in postgres and the SQLite tutorial; watch for underscores and pluralization (track vs tracks, etc.). I have fixed all the examples shown here.

# Connect to the Chinook Database

and figure out what we have in it


Set up the required libraries

In [65]:
!pip install sqlalchemy



In [66]:
#import psycopg2
import sqlalchemy

# we will want Pandas for the data frame structure

import pandas as pd
from dotenv import find_dotenv, dotenv_values
import os

In [67]:
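# Alter this to reflect your username and password; this is for postgres on the same machine

# NOTE: this takes the second entry of the .env file as the password, so it depends on the order of lines in that file
keys = list(dotenv_values(find_dotenv('.env')).items())
os.environ['POSTGRES_PASS'] = keys[1][1]

# double quotes on the outside, so the single quotes inside the braces are valid on Python versions before 3.12
engine=sqlalchemy.create_engine(f"postgresql://Lab_03:{os.getenv('POSTGRES_PASS')}@localhost:5432/chinook")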
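If the password contains characters such as `@`, `/` or `:`, pasting it directly into the connection URL can confuse the URL parsing. The following is a minimal sketch of one way to guard against that, using `urllib.parse.quote_plus` from the standard library. The password value is a made-up placeholder; the user, host and database names are copied from the cell above.

```python
from urllib.parse import quote_plus

# Hypothetical password containing characters that are special inside a URL
raw_password = "p@ss/word:1"

# Percent-encode the password so '@', '/' and ':' cannot be mistaken for URL delimiters
safe_password = quote_plus(raw_password)

connection_url = f"postgresql://Lab_03:{safe_password}@localhost:5432/chinook"
print(connection_url)
```

The resulting string can be passed to `sqlalchemy.create_engine` in the same way as the URL built above.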

# What tables do we have?

We can use a SELECT command to look up the table_name values in information_schema, in a table called tables.
information_schema is a schema built into postgres to hold information about the database. It holds a lot of info, but our table names are all in the first 15 lines.

In [68]:
pd.read_sql_query("SELECT table_name  FROM information_schema.tables LIMIT 15",engine)

Unnamed: 0,table_name
0,employee
1,genre
2,invoice
3,pg_type
4,album
5,artist
6,customer
7,invoice_line
8,playlist
9,media_type


In [69]:
# Looking at the customer table, but only first 5 rows

pd.read_sql_query("SELECT * FROM customer LIMIT 5",engine)

Unnamed: 0,customer_id,first_name,last_name,company,address,city,state,country,postal_code,phone,fax,email,support_rep_id
0,1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
1,2,Leonie,Köhler,,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,+49 0711 2842222,,leonekohler@surfeu.de,5
2,3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3
3,4,Bjørn,Hansen,,Ullevålsveien 14,Oslo,,Norway,0171,+47 22 44 22 22,,bjorn.hansen@yahoo.no,4
4,5,František,Wichterlová,JetBrains s.r.o.,Klanova 9/506,Prague,,Czech Republic,14700,+420 2 4172 5555,+420 2 4172 5555,frantisekw@jetbrains.com,4


In [70]:
#restrict this to only customer_id, first and last names


pd.read_sql_query("SELECT customer_id, first_name, last_name FROM customer LIMIT 8",engine)

Unnamed: 0,customer_id,first_name,last_name
0,1,Luís,Gonçalves
1,2,Leonie,Köhler
2,3,François,Tremblay
3,4,Bjørn,Hansen
4,5,František,Wichterlová
5,6,Helena,Holý
6,7,Astrid,Gruber
7,8,Daan,Peeters


# *QUESTION/ACTION*

Figure out what the table "invoice" looks like; display the first 5 lines of it so you can see the content.

In [71]:
pd.read_sql_query("SELECT * FROM invoice LIMIT 5",engine)

Unnamed: 0,invoice_id,customer_id,invoice_date,billing_address,billing_city,billing_state,billing_country,billing_postal_code,total
0,1,2,2021-01-01,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98
1,2,4,2021-01-02,Ullevålsveien 14,Oslo,,Norway,0171,3.96
2,3,8,2021-01-03,Grétrystraat 63,Brussels,,Belgium,1000,5.94
3,4,14,2021-01-06,8210 111 ST NW,Edmonton,AB,Canada,T6G 2C7,8.91
4,5,23,2021-01-11,69 Salem Street,Boston,MA,USA,2113,13.86


# *Question/Action*

Show the variables customer_id,  billing_country and total for the first 12 lines of invoice

In [72]:
pd.read_sql_query("SELECT customer_id, billing_country, total FROM invoice LIMIT 12",engine)

Unnamed: 0,customer_id,billing_country,total
0,2,Germany,1.98
1,4,Norway,3.96
2,8,Belgium,5.94
3,14,Canada,8.91
4,23,USA,13.86
5,37,Germany,0.99
6,38,Germany,1.98
7,40,France,1.98
8,42,France,3.96
9,46,Ireland,5.94


# Ordering or Sorting Results

In [73]:
pd.read_sql_query("SELECT * FROM track ORDER BY milliseconds LIMIT 12",engine)

Unnamed: 0,track_id,name,album_id,media_type_id,genre_id,composer,milliseconds,bytes,unit_price
0,2461,É Uma Partida De Futebol,200,1,1,Samuel Rosa,1071,38747,0.99
1,168,Now Sports,18,1,4,,4884,161266,0.99
2,170,A Statistic,18,1,4,,6373,211997,0.99
3,178,Oprah,18,1,4,,6635,224313,0.99
4,3304,Commercial 1,258,1,17,L. Muggerud,7941,319888,0.99
5,172,The Real Problem,18,1,4,,11650,387360,0.99
6,3310,Commercial 2,258,1,17,L. Muggerud,21211,850698,0.99
7,2241,Bossa,184,1,17,,29048,967098,0.99
8,1086,Casinha Feliz,85,1,10,Gilberto Gil,32287,1039615,0.99
9,246,Mateus Enter,24,1,7,Chico Science,33149,1103013,0.99


In [74]:
# reversed order sort

# add DESC to sort descending, ASC to sort ascending

pd.read_sql_query("SELECT * FROM track ORDER BY milliseconds DESC LIMIT 12",engine)

Unnamed: 0,track_id,name,album_id,media_type_id,genre_id,composer,milliseconds,bytes,unit_price
0,2820,Occupation / Precipice,227,3,19,,5286953,1054423946,1.99
1,3224,Through a Looking Glass,229,3,21,,5088838,1059546140,1.99
2,3244,"Greetings from Earth, Pt. 1",253,3,20,,2960293,536824558,1.99
3,3242,The Man With Nine Lives,253,3,20,,2956998,577829804,1.99
4,3227,"Battlestar Galactica, Pt. 2",253,3,20,,2956081,521387924,1.99
5,3226,"Battlestar Galactica, Pt. 1",253,3,20,,2952702,541359437,1.99
6,3243,Murder On the Rising Star,253,3,20,,2935894,551759986,1.99
7,3228,"Battlestar Galactica, Pt. 3",253,3,20,,2927802,554509033,1.99
8,3248,Take the Celestra,253,3,20,,2927677,512381289,1.99
9,3239,Fire In Space,253,3,20,,2926593,536784757,1.99


In [75]:
# sort by two variables

pd.read_sql_query("SELECT * FROM track ORDER BY composer ASC, milliseconds DESC LIMIT 12",engine)

Unnamed: 0,track_id,name,album_id,media_type_id,genre_id,composer,milliseconds,bytes,unit_price
0,2108,Children Of The Grave,174,1,3,"A. F. Iommi, W. Ward, T. Butler, J. Osbourne",357067,11626740,0.99
1,2109,Paranoid,174,1,3,"A. F. Iommi, W. Ward, T. Butler, J. Osbourne",176352,5729813,0.99
2,2107,Iron Man,174,1,3,"A. F. Iommi, W. Ward, T. Butler, J. Osbourne",172120,5609799,0.99
3,1908,New Rhumba,157,1,2,A. Jamal,276871,8980400,0.99
4,415,Astronomy,35,1,3,A.Bouchard/J.Bouchard/S.Pearlman,397531,13065612,0.99
5,2589,Hard To Handle,210,1,6,A.Isbell/A.Jones/O.Redding,206994,6786304,0.99
6,3427,Fanfare for the Common Man,296,2,24,Aaron Copland,198064,3211245,0.99
7,3357,OAM's Blues,267,5,2,Aaron Goldberg,266936,4292028,0.99
8,20,Overdose,4,1,1,AC/DC,369319,12066294,0.99
9,17,Let There Be Rock,4,1,1,AC/DC,366654,12021261,0.99


# *Question/Action*

Sort invoices by billing_city (ascending) and total purchase (descending). Show the invoice_id, billing_city and total.

In [76]:
pd.read_sql_query("SELECT invoice_id, billing_city, total FROM invoice ORDER BY billing_city ASC, total DESC;",engine)

Unnamed: 0,invoice_id,billing_city,total
0,390,Amsterdam,13.86
1,206,Amsterdam,8.94
2,32,Amsterdam,8.91
3,184,Amsterdam,3.96
4,379,Amsterdam,1.98
...,...,...,...
407,388,Yellowknife,5.94
408,366,Yellowknife,3.96
409,343,Yellowknife,1.98
410,148,Yellowknife,1.98


# Distinct

Selects only the unique values of a variable

In [77]:
# look at the Distinct cities in our customer list

pd.read_sql_query("""SELECT DISTINCT city 
                    FROM customer
                    ORDER BY city
                    LIMIT 20;"""
                     ,engine)

Unnamed: 0,city
0,Amsterdam
1,Bangalore
2,Berlin
3,Bordeaux
4,Boston
5,Brasília
6,Brussels
7,Budapest
8,Buenos Aires
9,Chicago
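DISTINCT can also be applied to more than one column, in which case it keeps each distinct combination of values rather than each distinct value of a single column. A minimal sketch, using Python's built-in sqlite3 module and a made-up table (`toy_customer` and its rows are assumptions for illustration, not the chinook data):

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_customer (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO toy_customer VALUES (?, ?)",
    [
        ("Paris", "France"),
        ("Paris", "France"),   # exact duplicate of the row above
        ("Paris", "USA"),      # same city name, different country
        ("Berlin", "Germany"),
    ],
)

# DISTINCT on one column: each city name appears once
distinct_cities = conn.execute(
    "SELECT DISTINCT city FROM toy_customer ORDER BY city"
).fetchall()

# DISTINCT on two columns: each (city, country) pair appears once
distinct_pairs = conn.execute(
    "SELECT DISTINCT city, country FROM toy_customer ORDER BY city, country"
).fetchall()

print(distinct_cities)
print(distinct_pairs)
conn.close()
```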


# *Question/Action*

Find the list of distinct composers listed in track, and sort them.

In [78]:
pd.read_sql_query("""
                  SELECT DISTINCT composer
                  FROM track 
                  ORDER BY composer
                  """,
                  engine
                  )

Unnamed: 0,composer
0,"A. F. Iommi, W. Ward, T. Butler, J. Osbourne"
1,A. Jamal
2,A.Bouchard/J.Bouchard/S.Pearlman
3,A.Isbell/A.Jones/O.Redding
4,Aaron Copland
...,...
849,Willie Dixon
850,"Willie Dixon, C. Burnett"
851,Wolfgang Amadeus Mozart
852,"Wright, Waters"


# Where

WHERE is a filter that allows us to keep only the rows that meet some desired condition.

Notice that the SELECT command itself controls which columns are shown; WHERE works on the rows.

# Comparison Operators

=, !=, <, >, >=, <=

Note: equality is a single equal sign, "=", in postgres.

# Logical Operators

AND, NOT, OR

# Other tests

ALL - true if all of the comparisons are true

ANY - true if any of the comparisons is true

BETWEEN - tests whether a value lies within a range

IN - tests for membership in a list of values

LIKE - used on strings, tests whether they match a pattern






In [79]:
# we can select a specific album id for the tracks

pd.read_sql_query("""SELECT name, milliseconds,bytes,album_id
                     FROM track
                     WHERE album_id=6""", engine)

Unnamed: 0,name,milliseconds,bytes,album_id
0,All I Really Want,284891,9375567,6
1,You Oughta Know,249234,8196916,6
2,Perfect,188133,6145404,6
3,Hand In My Pocket,221570,7224246,6
4,Right Through You,176117,5793082,6
5,Forgiven,300355,9753256,6
6,You Learn,239699,7824837,6
7,Head Over Feet,267493,8758008,6
8,Mary Jane,280607,9163588,6
9,Ironic,229825,7598866,6


In [80]:
# we can select a specific album id for the tracks and restrict to relatively short tracks

pd.read_sql_query("""SELECT name, milliseconds,bytes,album_id
                     FROM track
                     WHERE album_id=6 AND milliseconds<250000""", engine)

Unnamed: 0,name,milliseconds,bytes,album_id
0,You Oughta Know,249234,8196916,6
1,Perfect,188133,6145404,6
2,Hand In My Pocket,221570,7224246,6
3,Right Through You,176117,5793082,6
4,You Learn,239699,7824837,6
5,Ironic,229825,7598866,6
6,Not The Doctor,227631,7604601,6


# *Question/Action*

Find out how many invoice totals were over 25

In [81]:
pd.read_sql_query("""SELECT total
                     FROM invoice 
                     WHERE total > 25""", engine)

Unnamed: 0,total
0,25.86
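The query above lists the matching rows, so the answer is read off from the number of rows returned. To have the database return the number directly, COUNT(*) can be combined with the same WHERE clause. A minimal sketch, using Python's built-in sqlite3 module and a made-up table (`toy_invoice` and its rows are assumptions for illustration, not the chinook data):

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_invoice (invoice_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO toy_invoice VALUES (?, ?)",
    [(1, 5.94), (2, 25.86), (3, 13.86), (4, 30.0)],
)

# COUNT(*) returns a single row holding the number of rows that pass the WHERE filter
n_large = conn.execute(
    "SELECT COUNT(*) FROM toy_invoice WHERE total > 25"
).fetchone()[0]

print(n_large)
conn.close()
```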


# LIKE

In [82]:
# The LIKE operator allows partial text matching

# note the use of the doubled percent signs %%
# also note that this is case sensitive

pd.read_sql_query("""SELECT name, album_id, composer 
                     FROM track
                     WHERE composer LIKE '%%Smith%%'""",engine)

Unnamed: 0,name,album_id,composer
0,Restless and Wild,3,"F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D..."
1,Princess of the Dawn,3,Deaffy & R.A. Smith-Diesel
2,Killing Floor,19,Adrian Smith
3,Machine Men,19,Adrian Smith
4,2 Minutes To Midnight,95,Adrian Smith/Bruce Dickinson
...,...,...,...
92,Savior,195,Anthony Kiedis/Chad Smith/Flea/John Frusciante
93,Dancing Barefoot,234,Ivan Kral/Patti Smith
94,Take the Box,322,Luke Smith
95,What Is It About Men,322,"Delroy ""Chris"" Cooper, Donovan Jackson, Earl C..."
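The doubled percent signs are most likely needed because the database driver treats a single `%` in the query text as the start of a parameter placeholder. One way to sidestep this is to pass the pattern as a bound parameter, so it is treated as data rather than as part of the SQL text. A minimal sketch, using Python's built-in sqlite3 module and a made-up table (`toy_track` and its rows are assumptions for illustration, not the chinook data). The placeholder style depends on the driver: sqlite3 uses `?`, while psycopg2 uses `%s`.

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_track (name TEXT, composer TEXT)")
conn.executemany(
    "INSERT INTO toy_track VALUES (?, ?)",
    [
        ("Song A", "Adrian Smith"),
        ("Song B", "Ivan Kral/Patti Smith"),
        ("Song C", "Aaron Copland"),
    ],
)

# Single percent signs: the pattern is passed as data, not written into the SQL text
pattern = "%Smith%"
matches = conn.execute(
    "SELECT name FROM toy_track WHERE composer LIKE ? ORDER BY name",
    (pattern,),
).fetchall()

print(matches)
conn.close()
```

Note that SQLite's LIKE ignores case for ASCII letters by default, unlike postgres, so this sketch illustrates parameter binding only and not the case sensitivity mentioned above.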


# *Question/Action*

Use the LIKE operator to find all the invoice entries from Ireland

Be sure to use LIKE; the = test would work here too, but practice using LIKE

In [83]:
pd.read_sql_query("""SELECT billing_country
                     FROM invoice
                     WHERE billing_country LIKE '%%Ireland%%'""", engine)

Unnamed: 0,billing_country
0,Ireland
1,Ireland
2,Ireland
3,Ireland
4,Ireland
5,Ireland
6,Ireland


# IN

Tests for membership in a list

The example also filters out tracks whose name contains "Wall", using AND NOT combined with LIKE

In [84]:
pd.read_sql_query("""SELECT
                        name,
                        album_id,
                        media_type_id
                    FROM
                        track
                    WHERE
                        media_type_id IN (2, 3) AND NOT(name LIKE '%%Wall%%');""",engine)

Unnamed: 0,name,album_id,media_type_id
0,Fast As a Shark,3,2
1,Restless and Wild,3,2
2,Princess of the Dawn,3,2
3,Welcome to the Jungle,90,2
4,It's So Easy,90,2
...,...,...,...
444,"There's No Place Like Home, Pt. 2",261,3
445,"There's No Place Like Home, Pt. 3",261,3
446,"Band Members Discuss Tracks from ""Revelations""",271,3
447,Branch Closing,251,3


# AND

In [85]:
pd.read_sql_query("""SELECT
                      billing_address,
                      billing_city,
                      total
                    FROM
                      invoice
                    WHERE
                      billing_city= 'New York'
                    AND total > 5
                    ORDER BY
                      total;""",engine)

Unnamed: 0,billing_address,billing_city,total
0,627 Broadway,New York,5.94
1,627 Broadway,New York,8.91
2,627 Broadway,New York,13.86


In [86]:
pd.read_sql_query("""SELECT * FROM invoice LIMIT 5""",engine)

Unnamed: 0,invoice_id,customer_id,invoice_date,billing_address,billing_city,billing_state,billing_country,billing_postal_code,total
0,1,2,2021-01-01,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98
1,2,4,2021-01-02,Ullevålsveien 14,Oslo,,Norway,0171,3.96
2,3,8,2021-01-03,Grétrystraat 63,Brussels,,Belgium,1000,5.94
3,4,14,2021-01-06,8210 111 ST NW,Edmonton,AB,Canada,T6G 2C7,8.91
4,5,23,2021-01-11,69 Salem Street,Boston,MA,USA,2113,13.86


# OR

Using AND and OR together

In [87]:
pd.read_sql_query("""SELECT
                      billing_address,
                      billing_city,
                      total
                    FROM
                      invoice
                    WHERE
                      (billing_city= 'New York' OR billing_city= 'Chicago')
                    AND total > 5
                    ORDER BY
                      total;""",engine)

Unnamed: 0,billing_address,billing_city,total
0,162 E Superior Street,Chicago,5.94
1,627 Broadway,New York,5.94
2,162 E Superior Street,Chicago,7.96
3,162 E Superior Street,Chicago,8.91
4,627 Broadway,New York,8.91
5,627 Broadway,New York,13.86
6,162 E Superior Street,Chicago,15.86
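The parentheses in the query above matter, because AND binds more tightly than OR. Without them the condition is read as `billing_city = 'New York' OR (billing_city = 'Chicago' AND total > 5)`. A minimal sketch of the difference, using Python's built-in sqlite3 module and a made-up table (`toy_invoice` and its rows are assumptions for illustration, not the chinook data):

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_invoice (billing_city TEXT, total REAL)")
conn.executemany(
    "INSERT INTO toy_invoice VALUES (?, ?)",
    [
        ("New York", 1.98),
        ("New York", 8.91),
        ("Chicago", 0.99),
        ("Chicago", 7.96),
    ],
)

# With parentheses: the total > 5 test applies to both cities
with_parens = conn.execute("""
    SELECT billing_city, total FROM toy_invoice
    WHERE (billing_city = 'New York' OR billing_city = 'Chicago') AND total > 5
    ORDER BY total
""").fetchall()

# Without parentheses: the total > 5 test applies to Chicago only
without_parens = conn.execute("""
    SELECT billing_city, total FROM toy_invoice
    WHERE billing_city = 'New York' OR billing_city = 'Chicago' AND total > 5
    ORDER BY total
""").fetchall()

print(with_parens)
print(without_parens)
conn.close()
```

In the second query the small New York invoice slips through, because the `total > 5` test is attached only to the Chicago condition.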


# BETWEEN

Looks for a range of values

In [88]:
pd.read_sql_query("""SELECT
                        invoice_id,
                        billing_address,
                        total
                    FROM
                        invoice
                    WHERE
                        total BETWEEN 14.91 AND 18.86
                    ORDER BY
                        total; """,engine)

Unnamed: 0,invoice_id,billing_address,total
0,193,Berger Straße 10,14.91
1,208,Ullevålsveien 14,15.86
2,103,162 E Superior Street,15.86
3,313,"68, Rue Jouvence",16.86
4,306,Klanova 9/506,16.86
5,88,"Calle Lira, 198",17.91
6,201,319 N. Frances Street,18.86
7,89,"Rotenturmstraße 4, 1010 Innere Stadt",18.86
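Note that the output above includes rows equal to both endpoints (14.91 and 18.86): BETWEEN is inclusive, so `x BETWEEN a AND b` behaves like `x >= a AND x <= b`. A minimal sketch checking this, using Python's built-in sqlite3 module and a made-up table (`toy_invoice` and its rows are assumptions for illustration, not the chinook data):

```python
import sqlite3

# In-memory SQLite database with a small, made-up table (not chinook data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_invoice (invoice_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO toy_invoice VALUES (?, ?)",
    [(1, 13.86), (2, 14.91), (3, 16.86), (4, 18.86), (5, 21.86)],
)

# BETWEEN keeps rows equal to either endpoint
between_ids = [row[0] for row in conn.execute("""
    SELECT invoice_id FROM toy_invoice
    WHERE total BETWEEN 14.91 AND 18.86
    ORDER BY invoice_id
""")]

# The same filter written with explicit comparisons
explicit_ids = [row[0] for row in conn.execute("""
    SELECT invoice_id FROM toy_invoice
    WHERE total >= 14.91 AND total <= 18.86
    ORDER BY invoice_id
""")]

print(between_ids, explicit_ids)
conn.close()
```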


In [89]:
#NOT BETWEEN
#
# excluding a range


pd.read_sql_query("""SELECT
                        invoice_id,
                        billing_address,
                        total
                    FROM
                        invoice
                    WHERE
                        total NOT BETWEEN 1 AND 20
                    ORDER BY
                        total; """,engine)

Unnamed: 0,invoice_id,billing_address,total
0,405,541 Del Medio Avenue,0.99
1,13,1600 Amphitheatre Parkway,0.99
2,20,110 Raeburn Pl,0.99
3,27,5112 48 Street,0.99
4,34,"Praça Pio X, 119",0.99
5,41,C/ San Bernardo 85,0.99
6,48,796 Dundas Street West,0.99
7,55,Grétrystraat 63,0.99
8,62,3 Chatham Street,0.99
9,69,319 N. Frances Street,0.99


In [90]:
# shut down the engine to close the connection

engine.dispose()