# SQL Data Query Language - One Page

This one pager is a reference for all SQL DQL (Data Query Language) commands.

Notes are not added to every example, only to concepts that I consider less used (or less understood), such as correlated subqueries. Each note tries to explain the idea in concise and simple terms.

Above all, this is intended to be a quick but comprehensive reference and reminder of all SQL data query language commands.

Some examples are my own, written to aid understanding; many are from online tutorials.

**CONTENTS**

**1. SQL Query Order**

**2. Sorting**

**3. Limiting**

**4. Filtering**

**5. Joining Tables**

**6. Grouping**

**7. SubQuery**

**8. Set Operators**

**9. Common Table Expressions**

**10. Pivot**

**11. Expressions**

# SQL Query Order

**SQL queries are evaluated in the following order:**

- FROM - choose and join tables
- WHERE - filters the data
- GROUP BY - aggregates the data
- HAVING - filters the aggregated data
- SELECT - returns the final data
- ORDER BY - sorts the final data
- LIMIT - limits sorted data to a row count (SQL Server, the dialect used in most examples here, uses TOP or OFFSET ... FETCH instead)
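The order above can be traced in a small runnable sketch. It assumes SQLite (through Python's built-in `sqlite3` module) rather than SQL Server, and a made-up `orders` table; the numbered comments mark when each clause is logically applied.

```python
import sqlite3

# In-memory database with illustrative data (not from the original examples)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 50), ('ann', 200), ('bob', 300), ('bob', 10), ('cat', 20);
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total   -- 5. SELECT
    FROM orders                             -- 1. FROM
    WHERE amount >= 20                      -- 2. WHERE filters rows
    GROUP BY customer                       -- 3. GROUP BY aggregates
    HAVING SUM(amount) > 100                -- 4. HAVING filters groups
    ORDER BY total DESC                     -- 6. ORDER BY
    LIMIT 2                                 -- 7. LIMIT
""").fetchall()

print(rows)  # [('bob', 300), ('ann', 250)]
```

The row `('bob', 10)` is dropped by WHERE before grouping, while the `cat` group is dropped by HAVING after grouping.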

# Sorting - Order By

`
SELECT 
    first_name
FROM
    sales.customers
ORDER BY 
    first_name DESC
`

ORDER BY multiple columns - sorts by city first, then by first name within each city, e.g. 'New York, Brian', 'New York, Jane'

`
ORDER BY 
    city,
    first_name
`

ORDER BY expression

`
SELECT
    first_name
FROM
    sales.customers
ORDER BY 
    LEN(first_name) DESC
`

# Limiting - Offset and Fetch

OFFSET - means skip the first n rows

FETCH FIRST - means how many rows to fetch

- In SQL Server, OFFSET and FETCH are part of the ORDER BY clause, so the data is put in its correct order before the results are limited.

`SELECT 
    team_name,
    points
FROM 
    prem_table
ORDER BY 
    points DESC
OFFSET 4 ROWS
FETCH FIRST 13 ROWS ONLY`

# Limiting - Select Top

Get the top n rows, the top n percent of rows, or the top n rows WITH TIES (if several rows tie for the last place, all of the tied rows are returned)

Start the query with these options and continue with a normal query

`
SELECT TOP 10
SELECT TOP 1 PERCENT
SELECT TOP 3 WITH TIES
`

# Filtering - Distinct

`
SELECT DISTINCT 
    country
FROM 
    sales.customers
`

Use SELECT DISTINCT with multiple cols

`
SELECT DISTINCT
    city, state
FROM
    sales.customers
`

# Filtering - And Or

- AND has higher precedence than OR, so AND conditions are evaluated first
- This query acts as: brand_id = 1 OR (brand_id = 2 AND list_price > 40)

`
SELECT 
    product_name,
    brand_id,
    list_price
FROM 
    production.products
WHERE
    brand_id = 1
OR
    brand_id = 2
AND 
    list_price > 40
`

Use parentheses to make sure you retrieve the correct data

`
WHERE
    (brand_id = 1 OR brand_id = 2)
AND
    list_price > 40
`
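To see the precedence rule change the result, here is a minimal sketch. It assumes SQLite via Python's `sqlite3` module and four invented products; only the WHERE condition differs between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_name TEXT, brand_id INTEGER, list_price REAL);
    INSERT INTO products VALUES
        ('A', 1, 30), ('B', 2, 50), ('C', 2, 30), ('D', 1, 60);
""")

base = "SELECT product_name FROM products WHERE {} ORDER BY product_name"

# AND binds tighter than OR: brand_id = 1 OR (brand_id = 2 AND list_price > 40)
no_parens = conn.execute(
    base.format("brand_id = 1 OR brand_id = 2 AND list_price > 40")
).fetchall()

# Parentheses force the OR to be evaluated first
with_parens = conn.execute(
    base.format("(brand_id = 1 OR brand_id = 2) AND list_price > 40")
).fetchall()

print(no_parens)    # [('A',), ('B',), ('D',)]
print(with_parens)  # [('B',), ('D',)]
```

Product A (brand 1, price 30) only appears without parentheses, because there the price condition applies to brand 2 alone.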

**Use the IN operator** - equivalent to many OR statements

`
SELECT 
    product_id,
    brand_id
FROM
    products
WHERE
    brand_id IN(1,2,3,4)
`

# Filtering - Between/ Not Between

`
SELECT
    product_id,
    product_name,
    list_price
FROM
    production.products
WHERE
    list_price NOT BETWEEN 149.99 AND 199.99
`

Between dates example

Use a string literal in 'YYYYMMDD' format

`
WHERE
    order_date BETWEEN '20170115' AND '20170117'
`

# Filtering - Like

`
WHERE
    last_name LIKE 'z%'
`

`
WHERE
    last_name LIKE '%er'
`

`
WHERE
    last_name LIKE 't%s'
`

`
WHERE
    last_name LIKE '_u%'
`

Wildcard [] matches any single character listed inside the brackets

`
WHERE
    last_name LIKE '[ZY]%'
`

[^A-X] matches any single character that is not in the range A to X, so here the name must not start with A-X

`
WHERE
    last_name LIKE '[^A-X]%'
`

The ESCAPE clause lets us search for wildcard characters such as '%' as literal characters

`
WHERE 
    comment LIKE '%30!%%' ESCAPE '!';
`
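The ESCAPE pattern can be tried out with the sketch below, which assumes SQLite via Python's `sqlite3` module and an invented `feedback` table. SQLite's LIKE understands `%`, `_` and ESCAPE, but not the `[]` character classes shown earlier, which are SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedback (comment TEXT);
    INSERT INTO feedback VALUES
        ('30% off today'), ('up to 30% cheaper'), ('300 items'), ('30 days left');
""")

# '!%' is a literal percent sign; the outer '%' are still wildcards
rows = conn.execute("""
    SELECT comment
    FROM feedback
    WHERE comment LIKE '%30!%%' ESCAPE '!'
    ORDER BY comment
""").fetchall()

print(rows)  # [('30% off today',), ('up to 30% cheaper',)]
```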

# Filtering - Column Aliases

`
SELECT
    first_name + ' ' + last_name AS full_name
`

Wrap the alias in quotes if it contains spaces

`
SELECT
    first_name + ' ' + last_name AS 'Full Name'
`

Use aliases for joining tables

`
SELECT
    c.customer_id,
    first_name
FROM
    sales.customers c
INNER JOIN sales.orders o ON o.customer_id = c.customer_id
`

# Joining Tables - Inner Join

Find all in both tables

`
SELECT 
    c.id candidate_id,
    c.full_name candidate_name,
    e.id employee_id,
    e.full_name employee_name
FROM 
    candidates c
    INNER JOIN 
        employees e
        ON c.id = e.id
`

Use joins to get information from other tables.

Often we want to retrieve a name instead of an id

`
SELECT 
    product_name, 
    category_name,
    list_price
FROM 
    production.products p
    INNER JOIN production.categories c
        ON p.category_id = c.category_id
`

# Joining Tables - Left Join

Return ALL candidates, not just those with a match in the employees table - NULL will appear where there is no matching employee

`
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    LEFT JOIN hr.employees e 
        ON e.fullname = c.fullname
`

List all product names with their order_id (NULL will appear for unsold products, i.e. products without an order_id)

`
SELECT
    product_name,
    order_id
FROM
    production.products p
    LEFT JOIN sales.order_items o 
        ON o.product_id = p.product_id
    ORDER BY
        order_id;
`
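A minimal sketch of the unsold-products case, assuming SQLite via Python's `sqlite3` module and invented rows (the schema prefixes are dropped because SQLite has no `production` or `sales` schemas). SQL NULL comes back as Python `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'bike'), (2, 'helmet'), (3, 'lock');
    INSERT INTO order_items VALUES (10, 1), (11, 1), (12, 3);
""")

rows = conn.execute("""
    SELECT p.product_name, o.order_id
    FROM products p
    LEFT JOIN order_items o ON o.product_id = p.product_id
    ORDER BY p.product_name, o.order_id
""").fetchall()

# Products with no matching order item keep their row, with NULL for order_id
unsold = [name for name, order_id in rows if order_id is None]

print(rows)    # [('bike', 10), ('bike', 11), ('helmet', None), ('lock', 12)]
print(unsold)  # ['helmet']
```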

# Joining Tables - Right Join

Opposite of left join

Return ALL employees (the right table). The candidate columns show the matching candidate, or NULL if the employee does not appear in the candidates table.

`
SELECT 
    c.id candidate_id,
    c.full_name candidate_name,
    e.id employee_id,
    e.full_name employee_name
FROM 
    candidates c
    RIGHT JOIN employees e
        ON c.full_name = e.full_name
`

# Joining Tables - Full Outer Join

Combination of left and right joins

Shows all candidates from the left table and all employees from the right table. Rows that match appear side by side; NULL fills the columns of whichever table has no match.

`
SELECT  
    c.id candidate_id,
    c.fullname candidate_name,
    e.id employee_id,
    e.fullname employee_name
FROM 
    hr.candidates c
    FULL JOIN hr.employees e 
        ON e.fullname = c.fullname;
`

RESULTS LIKE THIS:

1 John Doe 1 John Doe

2 Jane Smith NULL NULL

NULL NULL 2 Pete Jones

# Joining Tables - Cross Join

Called a CARTESIAN JOIN

Will create all possible combinations.

Often used as part of a query that will be used with a sub query

Store and Product match is a good example - will show all stores and product combinations...

`
SELECT
    s.store_id,
    p.product_id,
    ISNULL(sales, 0) sales
FROM
    sales.stores s
CROSS JOIN production.products p
    LEFT JOIN (....SUB QUERY...)
`

# Joining Tables - Self Join

Two versions of same table used (with aliases)

Often used for hierarchy queries such as employees and managers in the same table.

`
SELECT
    e.full_name employee,
    m.full_name manager
FROM 
    employees e
    LEFT JOIN employees m
    ON m.employee_id = e.manager_id
`
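The employee and manager hierarchy can be sketched as follows, assuming SQLite via Python's `sqlite3` module and a four-person table invented for illustration. The LEFT JOIN keeps the top-level employee, whose manager is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, full_name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dan', 2);
""")

# e = the employee row, m = the same table read again as the manager row
rows = conn.execute("""
    SELECT e.full_name AS employee, m.full_name AS manager
    FROM employees e
    LEFT JOIN employees m ON m.employee_id = e.manager_id
    ORDER BY e.employee_id
""").fetchall()

print(rows)
# [('Alice', None), ('Bob', 'Alice'), ('Carol', 'Alice'), ('Dan', 'Bob')]
```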

# Grouping - Group By

COUNT the number of orders per customer per year, e.g. (customer 1, 2016, 2 orders), (1, 2017, 1), (2, 2017, 2), (2, 2018, 1) etc.

`
SELECT
    customer_id,
    YEAR (order_date) order_year,
    COUNT (order_id) order_placed
FROM
    sales.orders
WHERE
    customer_id IN (1, 2)
GROUP BY
    customer_id,
    YEAR (order_date)
ORDER BY
    customer_id; 
`

COUNT customers in every city

`
SELECT
    city,
    COUNT (customer_id) customer_count
FROM
    sales.customers
GROUP BY
    city
ORDER BY
    city;
`

Get AVG list price (join used to retrieve brand name)

`
SELECT
    brand_name,
    AVG (list_price) avg_price
FROM
    production.products p
INNER JOIN production.brands b ON b.brand_id = p.brand_id
WHERE
    model_year = 2018
GROUP BY
    brand_name
ORDER BY
    brand_name;
`

Get net value of every order using SUM()

`
SELECT
    order_id,
    SUM (
        quantity * list_price * (1 - discount)
    ) net_value
FROM
    sales.order_items
GROUP BY
    order_id;
`

# Grouping - Having

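HAVING is similar to the WHERE clause, but it filters the groups produced by GROUP BY rather than individual rows

NOTE - HAVING is evaluated before SELECT, so column aliases defined in SELECT are not yet available; the aggregate calculation itself is used instead.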


`
SELECT 
    salesperson_name,
    SUM(sales) sales_total
FROM 
    sales
GROUP BY 
    salesperson_name
HAVING 
    SUM(sales) < 2000
`

`
SELECT
    category_id,
    MAX (list_price) max_list_price,
    MIN (list_price) min_list_price
FROM
    production.products
GROUP BY
    category_id
HAVING
    MAX (list_price) > 4000 OR MIN (list_price) < 500;
`

`
SELECT
    category_id,
    AVG (list_price) avg_list_price
FROM
    production.products
GROUP BY
    category_id
HAVING
    AVG (list_price) BETWEEN 500 AND 1000;
`

# Grouping - Grouping Sets

GROUPING SETS allow us to define multiple groupings within the same query, without having to combine several GROUP BY queries with lengthy UNION ALL statements.

`
SELECT
	brand,
	category,
	SUM (sales) sales
FROM
	sales.sales_summary
GROUP BY
	GROUPING SETS (
		(brand, category),
		(brand),
		(category),
		()
	)
ORDER BY
	brand,
	category;
`

The above query returns aggregated results for each brand and category combination, for each brand alone, for each category alone, and for neither brand nor category (total sales):

**Brand Category Sales**

NULL NULL 20000

NULL Bikes 10000

BrandA NULL 5000

BrandA Bikes 2000 
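For comparison, this is the longhand UNION ALL form that GROUPING SETS replaces. It is a sketch assuming SQLite via Python's `sqlite3` module (SQLite has no GROUPING SETS, so the longhand is the only option there); the sales figures are invented to line up with the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_summary (brand TEXT, category TEXT, sales INTEGER);
    INSERT INTO sales_summary VALUES
        ('BrandA', 'Bikes', 2000), ('BrandA', 'Helmets', 3000),
        ('BrandB', 'Bikes', 8000), ('BrandB', 'Helmets', 7000);
""")

# One SELECT per grouping set: (brand, category), (brand), (category), ()
rows = conn.execute("""
    SELECT brand, category, SUM(sales) FROM sales_summary GROUP BY brand, category
    UNION ALL
    SELECT brand, NULL, SUM(sales) FROM sales_summary GROUP BY brand
    UNION ALL
    SELECT NULL, category, SUM(sales) FROM sales_summary GROUP BY category
    UNION ALL
    SELECT NULL, NULL, SUM(sales) FROM sales_summary
""").fetchall()

for row in rows:
    print(row)
```

The nine rows include `('BrandA', 'Bikes', 2000)`, `('BrandA', None, 5000)`, `(None, 'Bikes', 10000)` and `(None, None, 20000)`.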


# Grouping - Cube

CUBE is a natural follow-on from GROUPING SETS

CUBE creates all possible grouping sets from the given columns

Ideal if you have more than 2 columns - writing out every combination of 3 columns with GROUPING SETS means listing 8 sets...

`
SELECT
    d1,
    d2,
    d3,
    aggregate_function (c4)
FROM
    table_name
GROUP BY
    CUBE (d1, d2, d3);  
`

A partial cube always groups by brand and applies CUBE only to category, so each brand gets its per-category rows plus a brand total row where category is NULL.

**RESULTS:**

brand + category 

brand + NULL (total brand sales)

`
SELECT
    brand,
    category,
    SUM (sales) sales
FROM
    sales.sales_summary
GROUP BY
    brand,
    CUBE(category);
`

# Grouping - Roll Up

ROLLUP is a subclause of GROUP BY - it assumes a hierarchy among the columns - it creates fewer grouping sets than CUBE

Often used for accounting totals across a hierarchy, such as year, quarter, month

ROLLUP(d1,d2,d3) produces:

- (d1, d2, d3)
- (d1, d2)
- (d1)
- ()

`
SELECT
    brand,
    category,
    SUM (sales) sales
FROM
    sales.sales_summary
GROUP BY
    ROLLUP(brand, category);
`

The above query returns:

brand + category

brand + NULL ( brand total)

NULL + NULL (overall total)

**RESULTS:** 

BrandA CategoryA 2000

BrandA NULL 4000

NULL NULL 6000


`
SELECT
    category,
    brand,
    SUM (sales) sales
FROM
    sales.sales_summary
GROUP BY
    ROLLUP (category, brand)
`

The above produces the following results:
    
category + brand

category + NULL (category total)

NULL + NULL (overall total)

`
SELECT
    brand,
    category,
    SUM (sales) sales
FROM
    sales.sales_summary
GROUP BY
    brand,
    ROLLUP (category);
`

The above is a 'partial' ROLLUP and will only return:

brand + category

brand + NULL (brand total sales)

# SubQuery - Overview

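A non-correlated subquery is evaluated first, and its result is passed to the outer query.

In this example, the subquery obtaining the customer_id of every customer in New York is evaluated first, and the outer query then returns the orders placed by those customers.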

`
SELECT
    order_id,
    order_date,
    customer_id
FROM
    sales.orders
WHERE
    customer_id IN (
        SELECT
            customer_id
        FROM
            sales.customers
        WHERE
            city = 'New York'
    )
ORDER BY
    order_date DESC;
`

A subquery can be used as a column expression where a single result is returned:

`
SELECT
    order_id,
    order_date,
    (
        SELECT
            MAX (list_price)
        FROM
            sales.order_items i
        WHERE
            i.order_id = o.order_id
    ) AS max_list_price
FROM
    sales.orders o
ORDER BY order_date desc;
`

Subquery can be used with:

IN

ANY

ALL

EXISTS/ NOT EXISTS

FROM

# SubQuery - Correlated SubQuery

A correlated subquery relies on values from the outer query.  A correlated subquery is also called a repeating subquery.

A correlated subquery is executed repeatedly, once for each row assessed by the outer query.

**This query finds products whose list price is equal to the highest list price of products in the same category:**

1. The subquery takes the category_id of the current outer product (via the subquery's WHERE clause).

2. The subquery then finds the MAX list_price for that category.

3. The outer query then assesses whether the product's list_price matches the MAX list_price for its category_id.

**EXAMPLE SUBQUERY RESULTS:**

**Highest list_price in each category**

600 (category 1)

900 (category 2 etc.)

1800 (category 3 etc.)

**OUTER QUERY PRODUCT BY PRODUCT PROCESS:**

**Blue Mountain Bike, 600, 1**

---> Subquery finds max list_price in **category_id = 1** (matched in WHERE clause) -> 600

---> Does the list_price of this product match the MAX list_price in category_id = 1 -> **YES - RETURN Blue Mountain Bike** 

**Red Mountain Bike, 500, 1**

---> Subquery finds max list_price in **category_id = 1** (matched in WHERE clause) -> 600

---> Does the list_price of this product match the MAX list_price in category_id = 1 -> **NO**

**Yellow Mountain Bike, 900, 2**

---> Subquery finds max list_price in **category_id = 2** (matched in WHERE clause) -> 900

---> Does the list_price of this product match the MAX list_price in category_id = 2 -> **YES - RETURN Yellow Mountain Bike** 

`
SELECT
    product_name,
    list_price,
    category_id
FROM
    production.products p1
WHERE
    list_price IN (
        SELECT
            MAX (p2.list_price)
        FROM
            production.products p2
        WHERE
            p2.category_id = p1.category_id
        GROUP BY
            p2.category_id
    )
ORDER BY
    category_id,
    product_name;
`
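The walkthrough above can be run as a sketch, assuming SQLite via Python's `sqlite3` module and the same three bikes. The GROUP BY inside the subquery is left out here because the WHERE clause already restricts it to a single category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_name TEXT, list_price INTEGER, category_id INTEGER);
    INSERT INTO products VALUES
        ('Blue Mountain Bike', 600, 1),
        ('Red Mountain Bike', 500, 1),
        ('Yellow Mountain Bike', 900, 2);
""")

# The subquery runs once per outer row, using that row's category_id (p1)
rows = conn.execute("""
    SELECT product_name, list_price, category_id
    FROM products p1
    WHERE list_price = (
        SELECT MAX(p2.list_price)
        FROM products p2
        WHERE p2.category_id = p1.category_id
    )
    ORDER BY category_id, product_name
""").fetchall()

print(rows)
# [('Blue Mountain Bike', 600, 1), ('Yellow Mountain Bike', 900, 2)]
```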

# SubQuery - Exists

- EXISTS is typically used with a correlated subquery that refers to the outer query.
- EXISTS returns TRUE if the subquery returns at least one row; the values the subquery selects do not matter.
- ---> If **TRUE** the outer row is returned, if **FALSE** the row is not returned.
- In this example, the **WHERE** clause in the subquery matches orders to the outer customer_id, and the HAVING clause keeps that group only if the customer has more than two orders.
- Every row from the outer query is checked this way, so the query returns the customers who have placed more than two orders.

`
SELECT
    customer_id,
    first_name,
    last_name
FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT
            COUNT (*)
        FROM
            sales.orders o
        WHERE
            customer_id = c.customer_id
        GROUP BY
            customer_id
        HAVING
            COUNT (*) > 2
    )
ORDER BY
    first_name,
    last_name;
`
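A small check of this behaviour, assuming SQLite via Python's `sqlite3` module and invented customers: Ann has three orders, Bob has two and Cat has none. The subquery selects the constant `1` to stress that EXISTS only cares whether a row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

rows = conn.execute("""
    SELECT c.first_name
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
        GROUP BY o.customer_id
        HAVING COUNT(*) > 2      -- group survives only with more than 2 orders
    )
    ORDER BY c.first_name
""").fetchall()

print(rows)  # [('Ann',)]
```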

- Sometimes a simpler IN subquery (not correlated) can achieve the same result as EXISTS. The two queries below return the same orders.

`
SELECT
    *
FROM
    sales.orders
WHERE
    customer_id IN (
        SELECT
            customer_id
        FROM
            sales.customers
        WHERE
            city = 'San Jose'
    )
ORDER BY
    customer_id,
    order_date;
`

`
SELECT
    *
FROM
    sales.orders o
WHERE
    EXISTS (
        SELECT
            customer_id
        FROM
            sales.customers c
        WHERE
            o.customer_id = c.customer_id
        AND city = 'San Jose'
    )
ORDER BY
    o.customer_id,
    order_date;
`

**NOTE:** With JOIN, you actively want to get columns from another table.

EXISTS merely checks they exist in another table!

# SubQuery - Any

ANY is used with comparison operators (=, >, < etc.) to compare each row of the outer query with the values returned by an inner query. The condition is TRUE if it holds for at least one of those values.

This query finds any product that was sold with a quantity of 2 or more in an order.

`
SELECT
    product_name,
    list_price
FROM
    production.products
WHERE
    product_id = ANY (
        SELECT
            product_id
        FROM
            sales.order_items
        WHERE
            quantity >= 2
    )
ORDER BY
    product_name;
`

# SubQuery - All

ALL is used with comparison operators (=, >, < etc.) to compare each row of the outer query with the values returned by an inner query.

The key here is that the condition must hold for ALL of those values to return TRUE and include the outer row in the result set.

This query returns all products whose list price is greater than every one of the average list prices calculated per brand.

`
SELECT
    product_name,
    list_price
FROM
    production.products
WHERE
    list_price > ALL (
        SELECT
            AVG (list_price) avg_list_price
        FROM
            production.products
        GROUP BY
            brand_id
    )
ORDER BY
    list_price;
`

# Set Operators - Union, Union All

- JOIN combines columns, UNION combines rows
- UNION removes duplicates, UNION ALL maintains duplicates.

UNION EXAMPLE

A = 1, 2

B = 2, 3

A UNION B = 1, 2, 3

- Here we get all staff and customers into one set
- Duplicates will be removed here

`
SELECT
    first_name,
    last_name
FROM
    sales.staffs
UNION
SELECT
    first_name,
    last_name
FROM
    sales.customers;
`

- UNION ALL can be useful when counting staff and customers in total (if you want staff who are also customers to be counted twice in the figures)

`
SELECT
    first_name,
    last_name
FROM
    sales.staffs
UNION ALL
SELECT
    first_name,
    last_name
FROM
    sales.customers;
`
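The difference is easy to see with one person who is both staff and customer. This sketch assumes SQLite via Python's `sqlite3` module and invented names; only the set operator changes between the two runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staffs (first_name TEXT, last_name TEXT);
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    INSERT INTO staffs VALUES ('Ann', 'Lee'), ('Bob', 'Ray');
    INSERT INTO customers VALUES ('Bob', 'Ray'), ('Cat', 'Fox');
""")

query = """
    SELECT first_name, last_name FROM staffs
    {op}
    SELECT first_name, last_name FROM customers
    ORDER BY first_name, last_name
"""

union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(len(union_rows))      # 3 - Bob Ray appears once
print(len(union_all_rows))  # 4 - Bob Ray appears twice
```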

# Set Operators - Intersect

- Both queries must return the same number of columns, in the same order, with compatible data types, for INTERSECT to work (the column names do not need to match).
- INTERSECT finds items that appear in both.
- This example finds players who appeared in both player of the year tables.

`
SELECT 
    player_name
FROM 
    player_of_the_year_2021
INTERSECT
SELECT 
    player_name
FROM 
    player_of_the_year_2020
`

# Set Operators - Except

- EXCEPT tells you which items from TABLE 1 do not appear in TABLE 2
- This query tells us which staff members have not taken the health and safety exam at work.

`
SELECT 
    first_name,
    last_name
FROM
    employees
EXCEPT
SELECT
    first_name,
    last_name
FROM 
    completed_health_and_safety
`
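EXCEPT and INTERSECT can be compared side by side on the same two tables. The sketch assumes SQLite via Python's `sqlite3` module and a single invented `first_name` column to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT);
    CREATE TABLE completed_health_and_safety (first_name TEXT);
    INSERT INTO employees VALUES ('Ann'), ('Bob'), ('Cat');
    INSERT INTO completed_health_and_safety VALUES ('Ann'), ('Cat'), ('Dan');
""")

query = """
    SELECT first_name FROM employees
    {op}
    SELECT first_name FROM completed_health_and_safety
    ORDER BY first_name
"""

# In the first query but not the second
not_taken = conn.execute(query.format(op="EXCEPT")).fetchall()
# In both queries
in_both = conn.execute(query.format(op="INTERSECT")).fetchall()

print(not_taken)  # [('Bob',)]
print(in_both)    # [('Ann',), ('Cat',)]
```

Dan appears in neither result: he is not in the first table, so EXCEPT never considers him and INTERSECT finds no match.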

# Common Table Expressions

CTE -> Temporary named result set - can be used with SELECT, INSERT, UPDATE, DELETE or MERGE

- More readable than subqueries
- Uses WITH keyword
- Specifying column names is optional. If not specified, the column name given in the query is used

**NOTE: This is an example for syntax revision, not the most optimal way to retrieve sales totals for each sales person for 2019.**

`
WITH sales_person_yearly_total (sales_person_id, sales, year) AS (
SELECT
    id,
    SUM(sales_total),
    YEAR(sales_date)
FROM 
    sales
GROUP BY 
    id,
    YEAR(sales_date)
)
SELECT 
    sales_person_id,
    sales,
    year
FROM 
    sales_person_yearly_total
WHERE
    year = 2019
`

- An example of CTE in use would be to calculate the COUNT of a number of rows in the CTE... then use the outer query to calculate an average number of sales.

`
WITH cte_sales AS (
    SELECT 
        staff_id, 
        COUNT(*) order_count  
    FROM
        sales.orders
    WHERE 
        YEAR(order_date) = 2018
    GROUP BY
        staff_id
)
SELECT
    AVG(order_count) average_orders_by_staff
FROM 
    cte_sales;
`
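A runnable version of the CTE above, as a sketch under two assumptions: it uses SQLite via Python's `sqlite3` module, and because SQLite has no `YEAR()` function the year filter is written with `strftime('%Y', ...)`. The six orders are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, staff_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 1, '2018-01-05'), (2, 1, '2018-03-10'), (3, 1, '2018-07-21'),
        (4, 2, '2018-02-14'), (5, 3, '2017-11-30'), (6, 3, '2017-12-01');
""")

# The CTE counts 2018 orders per staff member; the outer query averages the counts
avg_orders = conn.execute("""
    WITH cte_sales AS (
        SELECT staff_id, COUNT(*) AS order_count
        FROM orders
        WHERE strftime('%Y', order_date) = '2018'
        GROUP BY staff_id
    )
    SELECT AVG(order_count) FROM cte_sales
""").fetchone()[0]

print(avg_orders)  # 2.0 (staff 1 has 3 orders, staff 2 has 1)
```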

- You can set up multiple CTEs to be used by an outer query; separate them with commas and write WITH only once....

`
WITH cte_1 AS (
SELECT ....
),
cte_2 AS (
SELECT ...
)
SELECT 
    col_a,
    col_b
FROM 
    cte_1
    INNER JOIN cte_2 ON...
WHERE...
ORDER BY...
`

# Pivot

- Pivoting works much like pivot tables in Pandas and Excel spreadsheets: the distinct values of one column become new columns, other columns form the rows, and each cell holds an aggregated total for that cross section, such as year (row).. category (col) -> total

- The initial aim is to show the COUNT(product_id) with each category as a column instead of a row. 
- We essentially pivot the category rows into columns (we will be able to add row groups afterwards)
- NOTE - you have to use a CTE or a derived table as the source, as done here:

**CatA, CatB, CatC, CatD**

50,25,60,35

`
SELECT * FROM   
(
    SELECT 
        category_name, 
        product_id
    FROM 
        products p
        INNER JOIN categories c 
            ON c.category_id = p.category_id
) t 
PIVOT(
    COUNT(product_id) 
    FOR category_name IN (
        [CatA], 
        [CatB], 
        [CatC], 
        [CatD]
    )
) AS pivot_table;
`

- Any additional column in the base data (model_year in SELECT statement here) will generate ROW GROUPS in the pivot table.

**YEAR CatA, CatB, CatC, CatD**

2019,30,15,25,10

2020,20,10,5,10

2021,0,0,30,15

`
SELECT * FROM   
(
    SELECT 
        category_name, 
        product_id,
        model_year
    FROM 
        products p
        INNER JOIN categories c 
            ON c.category_id = p.category_id
) t 
PIVOT(
    COUNT(product_id) 
    FOR category_name IN (
        [CatA], 
        [CatB], 
        [CatC], 
        [CatD]
    )
) AS pivot_table;
`
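PIVOT is SQL Server syntax. In engines without it, the same shape is commonly produced with conditional aggregation, which is a different technique from the PIVOT operator above. This sketch assumes SQLite via Python's `sqlite3` module, invented rows, and only two category columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, category_name TEXT, model_year INTEGER);
    INSERT INTO products VALUES
        (1, 'CatA', 2019), (2, 'CatA', 2019), (3, 'CatB', 2019),
        (4, 'CatA', 2020);
""")

# One SUM(CASE ...) per output column; model_year forms the row groups
rows = conn.execute("""
    SELECT
        model_year,
        SUM(CASE WHEN category_name = 'CatA' THEN 1 ELSE 0 END) AS CatA,
        SUM(CASE WHEN category_name = 'CatB' THEN 1 ELSE 0 END) AS CatB
    FROM products
    GROUP BY model_year
    ORDER BY model_year
""").fetchall()

print(rows)  # [(2019, 2, 1), (2020, 1, 0)]
```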

# Expressions - Case

- CASE adds if, else logic to queries.
- For example, you could return the word 'PASS' if a score was over 70, or 'FAIL' if it was less than or equal to 70.
- The example here uses SUM() over a CASE expression that yields 1 or 0 for every row, so each row is counted in either the 'Pass' or the 'Fail' column based on the condition provided with CASE.

`
SELECT 
    SUM(CASE
        WHEN test_score > 70
        THEN 1
        ELSE 0
    END) AS 'Pass',
    SUM(CASE
        WHEN test_score <= 70
        THEN 1
        ELSE 0
    END) AS 'Fail'
FROM 
    test_scores
`

- You can create new columns with CASE as well as returning the original columns.
- In this example, a new column called 'outcome' is created with three possible values - 'Pass', 'Possible retake' and 'Fail'

`
SELECT 
    name, 
    score,
    CASE
        WHEN score > 70
            THEN 'Pass'
        WHEN score <= 70 AND score >= 60
            THEN 'Possible retake'
        WHEN score <60 
            THEN 'Fail'
    END outcome
FROM 
    test_scores
`

# Expressions - Coalesce

- COALESCE helps handle NULL values effectively.
- It returns the first value in a list of inputs that is not NULL
- If all values are NULL, then it will return NULL

- In this example we handle phone numbers that might be null, so we want to return a more user friendly string.

`
SELECT 
    first_name,
    last_name,
    COALESCE(phone,'N/A')
FROM ........
`

- This clever use of COALESCE handles a situation where employees have their salary recorded as either an hourly, weekly or monthly rate.
- The table holds NULL in the weekly and monthly columns if the rate is hourly (and likewise for the other rate types).
- Any calculation on a NULL rate returns NULL, so COALESCE skips it and returns the monthly salary calculated from the first non-NULL rate found.

`
SELECT
    staff_id,
    COALESCE(
        hourly_rate*22*8, 
        weekly_rate*4, 
        monthly_rate
    ) monthly_salary
FROM
    salaries;
`

NOTE: - COALESCE and CASE are similar when CASE is used like this.....

`
COALESCE(e1,e2,e3)
......
CASE
    WHEN e1 IS NOT NULL THEN e1
    WHEN e2 IS NOT NULL THEN e2
    ELSE e3
END
`
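The salary query can be checked with a sketch that assumes SQLite via Python's `sqlite3` module and four invented staff rows, one per rate type plus one with no rate at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salaries (
        staff_id INTEGER, hourly_rate INTEGER, weekly_rate INTEGER, monthly_rate INTEGER
    );
    INSERT INTO salaries VALUES
        (1, 10, NULL, NULL),
        (2, NULL, 500, NULL),
        (3, NULL, NULL, 3000),
        (4, NULL, NULL, NULL);
""")

# NULL * anything is NULL, so COALESCE falls through to the first usable rate
rows = conn.execute("""
    SELECT
        staff_id,
        COALESCE(hourly_rate * 22 * 8, weekly_rate * 4, monthly_rate) AS monthly_salary
    FROM salaries
    ORDER BY staff_id
""").fetchall()

print(rows)  # [(1, 1760), (2, 2000), (3, 3000), (4, None)]
```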

# Expressions - NULLIF

- NULLIF returns NULL if the two values passed in are equal; otherwise it returns the first value.
- The WHERE clause in this example gets all clients whose email is NULL, and also treats blank emails as NULL

`
SELECT 
    name,
    email
FROM 
    clients
WHERE
    NULLIF(email, '') IS NULL
`

- NULLIF can also be written as a CASE expression...

`
SELECT 
    NULLIF(a,b)
........ equal to .....
CASE
    WHEN a=b
    THEN NULL
    ELSE a
END
`
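The equivalence can be confirmed on a small table. This is a sketch assuming SQLite via Python's `sqlite3` module and three invented clients: one with an email, one with a blank email and one with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (name TEXT, email TEXT);
    INSERT INTO clients VALUES
        ('Ann', 'ann@example.com'), ('Bob', ''), ('Cat', NULL);
""")

rows = conn.execute("""
    SELECT
        name,
        NULLIF(email, '') AS via_nullif,
        CASE WHEN email = '' THEN NULL ELSE email END AS via_case
    FROM clients
    ORDER BY name
""").fetchall()

# Blank and NULL emails both end up as NULL (None in Python)
missing = [name for name, via_nullif, via_case in rows if via_nullif is None]

print(rows)
print(missing)  # ['Bob', 'Cat']
```

Both expressions give the same value on every row, which is why the CASE form needs its ELSE branch.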