Advanced SQL study
SELECT grade_level,
AVG(gpa) AS avg_gpa
FROM students
WHERE school_lunch = 'Yes'
GROUP BY grade_level
HAVING avg_gpa < 3.3
ORDER BY grade_level;SELECT: Columns to DisplayFROM: Tables to pull data fromWHERE: Criteria to filter rows byGROUP BY: Column to group the rows byHAVING: Criteria to filter grouped rows byORDER BY: Column to sort values by
"Start Fridays With Grandma's Homemade Oatmeal!"
FROMJOINWHEREGROUP BYHAVINGSELECTORDER BYLIMIT
SELECT DISTINCT grade_level
FROM students;DISTINCT: Return unique values
SELECT COUNT(DISTINCT grade_level)
FROM students;SELECT MAX(gpa) - MIN(gpa) AS gpa_range
FROM students;COUNT,SUM,MAX,AVG,MIN,MAX: Aggregated functions are used to make calc.
SELECT *
FROM students
WHERE grade_level < 12 AND school_lunch = 'Yes';AND,OR,NOT: Logical Operators
SELECT *
FROM students
WHERE grade_level IN (10, 11, 12);SELECT *
FROM students
WHERE email LIKE '%.com';IN,LIKE,BETWEEN,AND,IS NULL: Comparison Keywords
SELECT student_name, gpa
FROM students
ORDER BY gpa DESC;ASCandDESC:ORDER BYkeywords
SELECT student_name, gpa
FROM students
LIMIT 10;LIMIT: Number of rows (TOPfor SQLServer andFETCH FIRSTfor OracleDB)
SELECT student_name,
grade_level,
CASE
WHEN grade_level = 9
THEN 'Freshman'
WHEN grade_level = 10
THEN 'Sophomore'
WHEN grade_level = 11
THEN 'Junior'
ELSE 'Senior'
END AS student_class
FROM students;CASE WHEN,THEN,WHEN,ELSE: IF-ELSE logic in SQL
JOIN: adds related columns from one table to another, based on common columnsUNION: stacks rows from multiple tables with the same column structure
adds related columns from one table to another, based on common columns
- The tables must have at least one column with matching values
- basic
JOINoptions includeINNER JOIN,LEFT JOIN,RIGHT JOINandFULL OUTER JOIN
SELECT *
FROM happiness_score hs
INNER JOIN country_stats cs
ON hs.country = cs.country;INNER: Return Records that exists in BOTH tables, and EXCLUDES unmatched records from either table;LEFT: Return ALL records from the LEFT table, and any matching records from the RIGHT table;RIGHT: Return ALL records from the RIGHT table, and any matching records from the LEFT table;FULL OUTER: Return ALL records from BOTH tables, INCLUDING non-matching records.
Not Supported Joins:
-
SQLite does not support
RIGHT JOINandFULL OUTER JOIN; -
MySQL does not support
FULL OUTER JOIN; -
CROSS JOIN: Return all combinations of rows within two or more tables
WARNING: produces an enormous output (number of rows of one table times the number of the rows of the other table)
Stack multiple tables or queries on top of one another
UNIONremoves duplicate values, whileUNION ALLretains them;UNION ALLruns FASTER thanUNION;
Is query nested within a main query
- Subqueries starts with parentesis and a
SELECTstatement; - the code on the subquery always runs first than the code on the main query;
SELECT *
FROM happiness_scores
WHERE happiness_score >
(
SELECT AVG(happiness_score)
FROM happiness_scores
) Where the subquery can occur?
- Calculations in the
SELECTclause - As part of a
JOINin theFROMclause - Filtering in the
WHEREandHAVINGclauses
A common table experssion (CTE) creates a named, temporary output that can be referenced within another query
Why CTEs instead of subqueries?
- Readability
- Reuseability
- Recursiveness
WITH avg_hs AS (
SELECT AVG(happiness_score) as avg_hs
FROM happiness_scores
)
SELECT *
FROM happiness_scores,
avg_hs
WHERE happiness_score > avg_hsConsider subqueries for simple queries and old RDBMS
Used to apply a function to a "window" of data
- Window: groups of rows
How are window functions different than GROUP BY?
- Aggregate Functions: Collapse the rows in each group and apply calculations
- Window Functions: Leave the rows as they are and apply calculations
Aggregate Function:
SELECT country,
AVG(happiness_score) AS avg_hs
FROM happiness_scores
GROUP BY countryWindow Function:
SELECT country,
year,
happiness_score,
ROW_NUMBER() OVER (
PARTITION BY country
ORDER BY year
) AS row_num
FROM happiness_scores4 main components:
- Function to Apply (required):
ROW_NUMBER,FIRST_VALUE,LAG, etc - Window Function Declaration (required):
OVER - Define Window (optional):
PARTITION BYcolumn, blank = entire table - Sorting the rows (optional, required in Oracle and MSSQL):
ORDER BYcolumn
Basic Window Function
ROW_NUMBER() OVER (
PARTITION BY country
ORDER BY happiness_score
)| Function | Category |
|---|---|
| Row Numbering | ROW_NUMBER RANK DENSE_RANK |
| Value Within a Window | FIRST_VALUE LAST_VALUE NTH_VALUE |
| Value Relative to a Row | LEAD LAG |
| Aggregate Functions | SUM, AVG, COUNT MIN, MAX |
| Statistical Functions | NTILE CUME_DIST PERCENT_RANK |
ROW_NUMBER: assigns a unique sequential index to each row within the partition.RANK: assigns the same rank to tied values, leaving gaps in the ranking sequence.DENSE_RANK: assigns the same rank to tied values, keeping the sequence continuous with no gaps.
FIRST_VALUE(column): retrieves the first value in the window based on the defined order.LAST_VALUE(column): retrieves the last value in the window.NTH_VALUE(column, N)(MSSQL does not support): retrieves the value at the N-th position in the window.
LEAD(column): returns the value from the next row relative to the current row.LAG(column): returns the value from the previous row relative to the current row.
SUM,AVG,COUNT: compute cumulative or windowed totals, averages, and row counts over the defined window.MIN,MAX: return the minimum or maximum value within the window frame.
NTILE(n)(SQLite does not support): divides rows into n approximately equal groups and assigns a bucket number to each row.CUME_DIST: returns the cumulative distribution of a value, representing the proportion of rows with values less than or equal to the current row.PERCENT_RANK: returns the relative rank of a row as a percentage, based on its position within the partition.