# Practical SQL: Chapter Practice Notebook

This Jupyter notebook collects the SQL queries and exercises covered in each chapter of the book "Practical SQL: A Beginner's Guide to Storytelling with Data" by Anthony DeBarros.

The first part of this notebook imports a couple of dependencies, along with the personal credentials that let us connect our Jupyter Notebook to PostgreSQL.

In [1]:
import psycopg2
import pandas as pd
from sql_data import db, usr, pwd

In [2]:
# Connecting to the PostgreSQL database
conn = psycopg2.connect(
    host = "localhost",
    database = db,
    user = usr,
    password = pwd,
    port = 5432
)

def execute_query(connection, query):
    """Run a query and return its result set as a pandas DataFrame."""
    connection.autocommit = True
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        results = cursor.fetchall()
        column_names = [col[0] for col in cursor.description]
        return pd.DataFrame(results, columns=column_names)
    except psycopg2.Error as e:
        # psycopg2.Error is the base class of OperationalError and the other driver errors
        print(f"The error '{e}' occurred.")
    finally:
        # Close the cursor; keep the connection open so later cells can reuse it
        cursor.close()
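The helper above calls `fetchall()` unconditionally, which only works when the last statement returns rows. Below is a minimal sketch of a more defensive variant. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the name `run_query` and the `libraries` table are hypothetical. It relies on the DB-API convention that `cursor.description` is `None` when a statement produces no result set.

```python
import sqlite3

def run_query(connection, query, params=()):
    """Return (column_names, rows) for row-returning statements, else None."""
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)
        if cursor.description is None:
            # Nothing to fetch (e.g. CREATE, INSERT, UPDATE): just commit
            connection.commit()
            return None
        column_names = [col[0] for col in cursor.description]
        return column_names, cursor.fetchall()
    finally:
        # The cursor is closed either way; the connection stays open
        cursor.close()

# Hypothetical in-memory table, used only to exercise the helper
demo = sqlite3.connect(":memory:")
created = run_query(demo, "CREATE TABLE libraries (libname TEXT, visits INTEGER)")
run_query(demo, "INSERT INTO libraries VALUES (?, ?)", ("Main Library", 120))
selected = run_query(demo, "SELECT libname, visits FROM libraries")
```

The same check should carry over to a psycopg2 cursor, but that is an assumption to verify against your own setup rather than something this notebook tests.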

## Chapter 8: Extracting Information By Grouping and Summarizing

This chapter requires us to build two tables based on the 2009 and 2014 Public Library Surveys conducted by the Institute of Museum and Library Services (IMLS). Please refer to the book's resources, as detailed in the readme page, to get the CSV files needed to fill the two library tables used from here on.

In [3]:
# Exploring the Library Data Using Aggregate Functions
# Counting Rows and Values Using COUNT()

query = """
SELECT COUNT(*)
FROM pls_fy2014_pupld14a;
"""

execute_query(conn, query)

Unnamed: 0,count
0,9305


In [4]:
query = """
SELECT COUNT(*)
FROM pls_fy2009_pupld09a;
"""

execute_query(conn, query)

Unnamed: 0,count
0,9299


In [5]:
# Counting the non-NULL values in a column that has no NOT NULL constraint
query = """
SELECT COUNT(salaries)
FROM pls_fy2014_pupld14a;
"""
execute_query(conn,query)

Unnamed: 0,count
0,5983


In [6]:
# Using the DISTINCT keyword to see how many unique values a column contains
# This query counts every non-NULL value in libname, duplicates included
query = """
SELECT COUNT(libname)
FROM pls_fy2014_pupld14a;
"""
execute_query(conn,query)

Unnamed: 0,count
0,9305


In [7]:
# This query returns the number of unique values in the same column. The number is smaller because some library names are duplicated.
query = """
SELECT COUNT(DISTINCT(libname))
FROM pls_fy2014_pupld14a;
"""
execute_query(conn,query)

Unnamed: 0,count
0,8515
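The three forms of `COUNT()` used above behave differently with NULLs and duplicates. The sketch below shows this on a tiny made-up table. It uses the standard-library `sqlite3` module instead of PostgreSQL so that it runs anywhere, and the table `libs` and its rows are hypothetical, not taken from the survey data.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE libs (libname TEXT, salaries INTEGER)")
demo.executemany(
    "INSERT INTO libs VALUES (?, ?)",
    [
        ("Oxford Public Library", 50000),
        ("Oxford Public Library", None),  # duplicate name, missing salary
        ("Elm Street Library", 42000),
        ("Harbor Library", None),         # missing salary
    ],
)

# COUNT(*) counts rows, COUNT(col) skips NULLs,
# COUNT(DISTINCT col) counts each distinct non-NULL value once
counts = demo.execute(
    "SELECT COUNT(*), COUNT(salaries), COUNT(DISTINCT libname) FROM libs"
).fetchone()
print(counts)
```

Here the row count is 4, the salary count is 2 because two salaries are NULL, and the distinct name count is 3 because one name appears twice.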


In [8]:
# Finding Maximum and Minimum Values Using MAX() and MIN()
query = """
SELECT MAX(visits), MIN(visits)
FROM pls_fy2014_pupld14a;
"""
execute_query(conn, query)

Unnamed: 0,max,min
0,17729020,-3


In [9]:
# Aggregating Data Using GROUP BY

# Combining GROUP BY clause with COUNT()
query = """
SELECT stabr, COUNT(*)
FROM pls_fy2014_pupld14a
GROUP BY stabr
ORDER BY COUNT(*) DESC
LIMIT 5;
"""
execute_query(conn, query)

Unnamed: 0,stabr,count
0,NY,756
1,IL,625
2,TX,556
3,IA,543
4,PA,455


In [10]:
# Aggregating data from multiple columns using GROUP BY
query = """
SELECT stabr, stataddr, COUNT(*)
FROM pls_fy2014_pupld14a
GROUP BY stabr, stataddr
ORDER BY stabr ASC, COUNT(*) DESC;
"""

execute_query(conn, query)

Unnamed: 0,stabr,stataddr,count
0,AK,00,70
1,AK,15,10
2,AK,07,5
3,AL,00,221
4,AL,07,3
...,...,...,...
101,WI,07,6
102,WI,15,3
103,WV,00,93
104,WV,15,4


In [11]:
# Aggregating data from multiple joined tables

# No grouping yet: here we sum visits across the joined 2014 and 2009 tables to see the overall trend in library visits
query = """
SELECT SUM(pls14.visits) AS visits_2014,
    SUM(pls09.visits) AS visits_2009
FROM pls_fy2014_pupld14a AS pls14
JOIN pls_fy2009_pupld09a AS pls09
    ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >=0 AND pls09.visits >= 0;
"""

execute_query(conn, query)

Unnamed: 0,visits_2014,visits_2009
0,1417299241,1585455205


In [12]:
# Aggregating data to compare trends by state
query = """
SELECT
    pls14.stabr,
    SUM(pls14.visits) AS visits_2014,
    SUM(pls09.visits) AS visits_2009,
    ROUND( (CAST(SUM(pls14.visits) AS DECIMAL(10, 1)) - SUM(pls09.visits)) /
        SUM(pls09.visits) * 100, 2) AS pct_change
FROM pls_fy2014_pupld14a AS pls14
JOIN pls_fy2009_pupld09a AS pls09
    ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >=0 AND pls09.visits >= 0
GROUP BY pls14.stabr
ORDER BY pct_change DESC
LIMIT 5
;
"""

execute_query(conn, query)

Unnamed: 0,stabr,visits_2014,visits_2009,pct_change
0,GU,103593,60763,70.49
1,DC,4230790,2944774,43.67
2,LA,17242110,15591805,10.58
3,MT,4582604,4386504,4.47
4,AL,17113602,16933967,1.06
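The `pct_change` column is (new - old) / old * 100, and the `CAST` to `DECIMAL` matters because dividing one integer sum by another in PostgreSQL truncates the result. As a rough cross-check, the sketch below reproduces two rows of the output above in plain Python. The function name `pct_change` is chosen here for illustration, and `Decimal` stands in for the SQL decimal type.

```python
from decimal import Decimal

def pct_change(new, old):
    """Percent change from old to new, rounded to two decimal places."""
    return round((Decimal(new) - Decimal(old)) / Decimal(old) * 100, 2)

# Visit totals copied from the GU and DC rows of the output above
gu = pct_change(103593, 60763)
dc = pct_change(4230790, 2944774)

# Integer division, as would happen without the CAST, discards the fraction
truncated = (103593 - 60763) // 60763 * 100

print(gu, dc, truncated)
```

The two decimal results match the table (70.49 and 43.67), while the integer version collapses to 0.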


In [13]:
# Filtering an Aggregate Query Using HAVING
query = """
SELECT
    pls14.stabr,
    SUM(pls14.visits) AS visits_2014,
    SUM(pls09.visits) AS visits_2009,
    ROUND( (CAST(SUM(pls14.visits) AS DECIMAL(10, 1)) - SUM(pls09.visits)) /
        SUM(pls09.visits) * 100, 2) AS pct_change
FROM pls_fy2014_pupld14a AS pls14
JOIN pls_fy2009_pupld09a AS pls09
    ON pls14.fscskey = pls09.fscskey
WHERE pls14.visits >=0 AND pls09.visits >= 0
GROUP BY pls14.stabr
HAVING SUM(pls14.visits) > 50000000
ORDER BY pct_change DESC
LIMIT 5
;
"""

execute_query(conn, query)

Unnamed: 0,stabr,visits_2014,visits_2009,pct_change
0,TX,72876601,78838400,-7.56
1,CA,162787836,182181408,-10.65
2,OH,82495138,92402369,-10.72
3,NY,106453546,119810969,-11.15
4,IL,72598213,82438755,-11.94
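The query above filters twice: `WHERE` removes individual rows before grouping, and `HAVING` removes whole groups after the sums are computed. The sketch below illustrates the order of operations on a made-up table. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the table `visits_demo`, its values, and the threshold of 50 are all hypothetical.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE visits_demo (stabr TEXT, visits INTEGER)")
demo.executemany(
    "INSERT INTO visits_demo VALUES (?, ?)",
    [("TX", 40), ("TX", 30), ("TX", -3), ("GU", 10), ("NY", 90)],
)

rows = demo.execute(
    """
    SELECT stabr, SUM(visits) AS total_visits
    FROM visits_demo
    WHERE visits >= 0          -- row filter: drops the -3 row before summing
    GROUP BY stabr
    HAVING SUM(visits) > 50    -- group filter: drops GU, whose total is 10
    ORDER BY total_visits DESC
    """
).fetchall()
print(rows)
```

TX ends up with 70 rather than 67 because the negative row never reaches the sum, and GU disappears only after its total has been computed.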


## Chapter 9: Inspecting and Modifying Data

For this chapter, we will use a directory of U.S. meat, poultry, and egg producers provided by the Food Safety and Inspection Service (FSIS), an agency within the U.S. Department of Agriculture.

These first queries should give us some insight into our dataset:

In [14]:
# Reviewing our imported data
query = """
SELECT COUNT(*)
FROM meat_poultry_egg_inspect;
"""

execute_query(conn, query)

Unnamed: 0,count
0,6287


In [15]:
# Looking for duplicates
query = """
SELECT company,
    street,
    city,
    st,
    COUNT(*) AS address_count
FROM meat_poultry_egg_inspect
GROUP BY company, street, city, st
HAVING COUNT(*) > 1
ORDER BY company, street, city, st;
"""

execute_query(conn, query)

Unnamed: 0,company,street,city,st,address_count
0,Acre Station Meat Farm,17076 Hwy 32 N,Pinetown,NC,2
1,Beltex Corporation,3801 North Grove Street,Fort Worth,TX,2
2,Cloverleaf Cold Storage,111 Imperial Drive,Sanford,NC,2
3,"Crete Core Ingredients, LLC",2220 County Road I,Crete,NE,2
4,"Crider, Inc.",1 Plant Avenue,Stillmore,GA,3
5,"Dimension Marketing & Sales, Inc.",386 West 9400 South,Sandy,UT,2
6,"Foster Poultry Farms, A California Corporation",6648 Highway 15 North,Farmerville,LA,2
7,"Freezer & Dry Storage, LLC",21740 Trolley Industrial Drive,Taylor,MI,2
8,JBS Souderton Inc.,249 Allentown Road,Souderton,PA,2
9,KB Poultry Processing LLC,15024 Sandstone Dr.,Utica,MN,2


In [16]:
# Checking for missing values
query = """
SELECT st,
    COUNT(*) AS state_count
FROM meat_poultry_egg_inspect
GROUP BY st
ORDER BY st NULLS FIRST
LIMIT 10
"""

execute_query(conn, query)

Unnamed: 0,st,state_count
0,AK,17
1,AL,94
2,AR,87
3,AS,1
4,AZ,37
5,CA,666
6,CO,121
7,CT,55
8,DC,2
9,DE,22


In [17]:
# Looking at rows that have no value in the st column
query = """
SELECT est_number,
    company,
    city,
    st,
    zip
FROM meat_poultry_egg_inspect
WHERE st IS NULL;
"""

execute_query(conn, query)

Unnamed: 0,est_number,company,city,st,zip


In [18]:
# Checking for inconsistent data values
query = """
SELECT company,
    COUNT(*)
FROM meat_poultry_egg_inspect
GROUP BY company
ORDER BY company ASC;
"""

execute_query(conn, query)

Unnamed: 0,company,count
0,121 In-Flight Catering LLC,1
1,165368 C. Corporation,1
2,1732 Meats LLC,1
3,"1st Original Texas Chili Company, Inc.",1
4,290 West Bar & Grill,1
...,...,...
5581,"Zrile Bros. Packing Co.,Inc",1
5582,"Zummo Meat Co., Inc.",1
5583,"Zwanenberg Food Group (USA), Inc.",1
5584,Zweigle's Inc.,1


In [19]:
# Checking for malformed values using LENGTH()
# Because the zip column was read in as a number rather than as text, it lost its leading zeros
query = """
SELECT LENGTH(zip),
    COUNT(*) AS length_count
FROM meat_poultry_egg_inspect
GROUP BY LENGTH(zip)
ORDER BY LENGTH(zip);
"""

execute_query(conn, query)

Unnamed: 0,length,length_count
0,4,496
1,5,5791


In [20]:
# Examining which states have rows that lost leading zeros
query = """
SELECT st,
    COUNT(*) AS st_count
FROM meat_poultry_egg_inspect
WHERE LENGTH(zip) < 5
GROUP BY st
ORDER BY st;
"""

execute_query(conn, query)

Unnamed: 0,st,st_count
0,CT,55
1,MA,101
2,ME,24
3,NH,18
4,NJ,244
5,RI,27
6,VT,27


For the following part of the chapter, a column named st_copy was added to the meat_poultry_egg_inspect table and filled with an identical copy of the st column.
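The statements that created the copy are not shown in this notebook. A minimal sketch of the usual pattern follows: add a column, fill it from the original, and only then modify the original. It runs against an in-memory `sqlite3` database rather than PostgreSQL, and the table `inspect_demo` and the row `M1` are hypothetical.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE inspect_demo (est_number TEXT, st TEXT)")
demo.executemany(
    "INSERT INTO inspect_demo VALUES (?, ?)",
    [("V18677A", None), ("M1", "TX")],
)

# Step 1: add the backup column and copy the current values into it
demo.execute("ALTER TABLE inspect_demo ADD COLUMN st_copy TEXT")
demo.execute("UPDATE inspect_demo SET st_copy = st")

# Step 2: modify the original column; the backup keeps the old state
demo.execute("UPDATE inspect_demo SET st = 'MN' WHERE est_number = 'V18677A'")

rows = demo.execute(
    "SELECT est_number, st, st_copy FROM inspect_demo ORDER BY est_number"
).fetchall()
print(rows)
```

If an update goes wrong, running `UPDATE inspect_demo SET st = st_copy` would put the original values back.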

In [21]:
# Modifying Tables, Columns, and Data
# Updating the rows where the values were missing
query = """
UPDATE meat_poultry_egg_inspect
SET st = 'MN'
WHERE est_number = 'V18677A';

UPDATE meat_poultry_egg_inspect
SET st = 'AL'
WHERE est_number = 'M45319+P45319';

UPDATE meat_poultry_egg_inspect
SET st = 'WI'
WHERE est_number = 'M263A+P263A+V263A';

SELECT est_number,
    company,
    city,
    st,
    zip
FROM meat_poultry_egg_inspect
WHERE st IS NULL; 
"""

# The last part of the query just inspects the table to find NULL values - after our update there should be none
execute_query(conn, query)


Unnamed: 0,est_number,company,city,st,zip


For the following part of the chapter, a column named company_standard was added to the meat_poultry_egg_inspect table to hold an identical copy of the company column.

In [23]:
# Updating values for consistency
query = """
UPDATE meat_poultry_egg_inspect
SET company_standard = company;

SELECT * FROM meat_poultry_egg_inspect LIMIT 5;
"""

execute_query(conn, query)

Unnamed: 0,est_number,company,street,city,st,zip,phone,grant_date,activities,dbas,st_copy,company_standard,zip_copy
0,M6714+P6714,A. Concepcion Hnos,105 Aguila Street,Mayaguez,PR,680,(809) 832-4145,1993-02-09,"Meat Processing, Poultry Processing",,PR,A. Concepcion Hnos,680
1,M44941+P44941,"Alcor Foods, Inc.",Ave. Hostos WF-8 Santa Juanita,Bayamon,PR,956,(787) 718-9449,2016-06-20,"Meat Processing, Poultry Processing",,PR,"Alcor Foods, Inc.",956
2,M40232+P40232,"American Butchers, Inc.",Diana Street Lot #5 Amelia Industrial Park,Guaynabo,PR,968,(787) 774-8802,2011-05-11,"Meat Processing, Poultry Processing",,PR,"American Butchers, Inc.",968
3,I319,"Ballester Hermanos, Inc.","Ballester Hermanos, Inc. Westgate Industrial P...",Catano,PR,962,(787) 788-4110,2016-07-13,Imported Product,,PR,"Ballester Hermanos, Inc.",962
4,M896+P896,Bupik Inc.,1052 Nueva Palma 1052-54 Ba. Tras Talleres,San Juan,PR,907,(787) 717-7739,2016-04-15,"Meat Processing, Poultry Processing",Bocadito CG; Don Alfonso; La Borinquena,PR,Bupik Inc.,907


In [24]:
# Updating and standardizing all rows that contain the string 'Armour' to appear in company_standard as Armour_Eckrich Meats
query = """
UPDATE meat_poultry_egg_inspect
SET company_standard = 'Armour_Eckrich Meats'
WHERE company LIKE '%Armour%';

SELECT company, company_standard
FROM meat_poultry_egg_inspect
WHERE company LIKE '%Armour%';
"""

execute_query(conn, query)

Unnamed: 0,company,company_standard
0,"Armour-Eckrich Meats, Inc.",Armour_Eckrich Meats
1,"Armour-Eckrich Meats, LLC",Armour_Eckrich Meats
2,Armour-Eckrich Meats LLC,Armour_Eckrich Meats
3,"Armour-Eckrich Meats, LLC",Armour_Eckrich Meats
4,"Armour - Eckrich Meats, LLC",Armour_Eckrich Meats
5,Armour-Eckrich Meats LLC,Armour_Eckrich Meats
6,Armour-Eckrich Meats LLC,Armour_Eckrich Meats


For the following part of the chapter, a column named zip_copy was added to the meat_poultry_egg_inspect table to hold an identical copy of the zip column.

In [26]:
query = """
UPDATE meat_poultry_egg_inspect
SET zip_copy = zip;

SELECT * FROM meat_poultry_egg_inspect LIMIT 5;
"""

execute_query(conn, query)

Unnamed: 0,est_number,company,street,city,st,zip,phone,grant_date,activities,dbas,st_copy,company_standard,zip_copy
0,M6714+P6714,A. Concepcion Hnos,105 Aguila Street,Mayaguez,PR,680,(809) 832-4145,1993-02-09,"Meat Processing, Poultry Processing",,PR,A. Concepcion Hnos,680
1,M44941+P44941,"Alcor Foods, Inc.",Ave. Hostos WF-8 Santa Juanita,Bayamon,PR,956,(787) 718-9449,2016-06-20,"Meat Processing, Poultry Processing",,PR,"Alcor Foods, Inc.",956
2,M40232+P40232,"American Butchers, Inc.",Diana Street Lot #5 Amelia Industrial Park,Guaynabo,PR,968,(787) 774-8802,2011-05-11,"Meat Processing, Poultry Processing",,PR,"American Butchers, Inc.",968
3,I319,"Ballester Hermanos, Inc.","Ballester Hermanos, Inc. Westgate Industrial P...",Catano,PR,962,(787) 788-4110,2016-07-13,Imported Product,,PR,"Ballester Hermanos, Inc.",962
4,M896+P896,Bupik Inc.,1052 Nueva Palma 1052-54 Ba. Tras Talleres,San Juan,PR,907,(787) 717-7739,2016-04-15,"Meat Processing, Poultry Processing",Bocadito CG; Don Alfonso; La Borinquena,PR,Bupik Inc.,907


In [27]:
# Repairing ZIP codes using concatenation: restoring two leading zeros for PR and VI
query = """
UPDATE meat_poultry_egg_inspect
SET zip = '00' || zip
WHERE st IN('PR', 'VI') AND LENGTH(zip) = 3;

SELECT * FROM meat_poultry_egg_inspect LIMIT 5;
"""

execute_query(conn, query)

Unnamed: 0,est_number,company,street,city,st,zip,phone,grant_date,activities,dbas,st_copy,company_standard,zip_copy
0,M6714+P6714,A. Concepcion Hnos,105 Aguila Street,Mayaguez,PR,680,(809) 832-4145,1993-02-09,"Meat Processing, Poultry Processing",,PR,A. Concepcion Hnos,680
1,M44941+P44941,"Alcor Foods, Inc.",Ave. Hostos WF-8 Santa Juanita,Bayamon,PR,956,(787) 718-9449,2016-06-20,"Meat Processing, Poultry Processing",,PR,"Alcor Foods, Inc.",956
2,M40232+P40232,"American Butchers, Inc.",Diana Street Lot #5 Amelia Industrial Park,Guaynabo,PR,968,(787) 774-8802,2011-05-11,"Meat Processing, Poultry Processing",,PR,"American Butchers, Inc.",968
3,I319,"Ballester Hermanos, Inc.","Ballester Hermanos, Inc. Westgate Industrial P...",Catano,PR,962,(787) 788-4110,2016-07-13,Imported Product,,PR,"Ballester Hermanos, Inc.",962
4,M896+P896,Bupik Inc.,1052 Nueva Palma 1052-54 Ba. Tras Talleres,San Juan,PR,907,(787) 717-7739,2016-04-15,"Meat Processing, Poultry Processing",Bocadito CG; Don Alfonso; La Borinquena,PR,Bupik Inc.,907


In [28]:
# Updating the remaining rows that lack a leading zero
query = """
UPDATE meat_poultry_egg_inspect
SET zip = '0' || zip
WHERE st IN('CT', 'MA', 'ME', 'NH', 'NJ', 'RI', 'VT') AND LENGTH(zip) = 4;

SELECT * FROM meat_poultry_egg_inspect LIMIT 5;
"""

execute_query(conn, query)

Unnamed: 0,est_number,company,street,city,st,zip,phone,grant_date,activities,dbas,st_copy,company_standard,zip_copy
0,M6714+P6714,A. Concepcion Hnos,105 Aguila Street,Mayaguez,PR,680,(809) 832-4145,1993-02-09,"Meat Processing, Poultry Processing",,PR,A. Concepcion Hnos,680
1,M44941+P44941,"Alcor Foods, Inc.",Ave. Hostos WF-8 Santa Juanita,Bayamon,PR,956,(787) 718-9449,2016-06-20,"Meat Processing, Poultry Processing",,PR,"Alcor Foods, Inc.",956
2,M40232+P40232,"American Butchers, Inc.",Diana Street Lot #5 Amelia Industrial Park,Guaynabo,PR,968,(787) 774-8802,2011-05-11,"Meat Processing, Poultry Processing",,PR,"American Butchers, Inc.",968
3,I319,"Ballester Hermanos, Inc.","Ballester Hermanos, Inc. Westgate Industrial P...",Catano,PR,962,(787) 788-4110,2016-07-13,Imported Product,,PR,"Ballester Hermanos, Inc.",962
4,M896+P896,Bupik Inc.,1052 Nueva Palma 1052-54 Ba. Tras Talleres,San Juan,PR,907,(787) 717-7739,2016-04-15,"Meat Processing, Poultry Processing",Bocadito CG; Don Alfonso; La Borinquena,PR,Bupik Inc.,907


In [32]:
# Making sure that every ZIP code is now five characters long
query = """
SELECT LENGTH(zip),
    COUNT(zip)
FROM meat_poultry_egg_inspect
GROUP BY LENGTH(zip);
"""

execute_query(conn, query)

Unnamed: 0,length,count
0,5,6287
