Comprehensive SQL Learning Guide for Data Analytics | Mastery in 50+ SQL Queries
The relational model is founded on logic.
- Author: Kintsugi-Programmer
Disclaimer: The content presented here is a curated blend of my personal learning journey, experiences, open-source documentation, and invaluable knowledge gained from diverse sources. I do not claim sole ownership over all the material; this is a community-driven effort to learn, share, and grow together.
- kintsugi-stack-sql
- Master SQL for Data Analysis
- Overview
- Getting Started with SQL
- Part 1: SELECT Statements and Basic Queries
- Part 2: WHERE Clauses - Filtering Data
- Part 3: Logical Operators - AND, OR, NOT
- Part 4: Conditional Logic with CASE
- Part 5: JOINs - Combining Multiple Tables
- Part 6: GROUP BY and Aggregation Functions
- Advanced Tips and Best Practices
- Learning Resources and Next Steps
- Quick Reference
- Advanced SQL & Database Design for Backend Engineering
- Part 7: DDL — Building and Managing Tables
- Part 8: Data Integrity and Constraints
- Part 9: Transactions and Concurrency (ACID)
- Part 10: Advanced Analytics (Window Functions & CTEs)
- Part 11: Subqueries and Set Operations
- Part 12: Advanced Grouping
- Part 13: Performance Tuning (Indexes & Explain Plans)
- Part 14: Programmable SQL (Procedures, Functions, Triggers)
- Part 15: Modern SQL & Data Modeling
- Part 16: Administration & Security
- Part 17: Best Practices & Anti-Patterns
- Prac
- Conclusion
This guide covers mastery of SQL through 50 practical queries. Topics include SELECT operations, WHERE clauses, logical operators (AND, OR, NOT), JOINs, GROUP BY, ORDER BY, and advanced SQL techniques. The guide uses the "Awesome Chocolates" database with real-world examples and practical tips.
Tool Used: MySQL Workbench / BeeKeeper Studio
Database: Awesome Chocolates (must be downloaded and loaded)
Instructions: Refer to video description for database setup instructions
- Click the Plus SQL button in the corner to open the query editor
- View output results directly in the workbench grid
- Execute queries using Ctrl + Enter or the Run command
Note: Shortcuts differ depending on your system and SQL management tool:
- MySQL Workbench: Ctrl + Enter
- SQL Server Management Studio (SSMS): Different shortcut
- Oracle: Different shortcut
Whenever you are getting started, take some minutes to know your db, it will help you build better sql queries !!!
The database contains four main tables:
- Geography table
- People table
- Products table
- Sales table
Before writing queries, understand your data structure.
View all tables in database:
SHOW TABLES;| Tables_in_awesome chocolates |
|---|
| geo |
| people |
| products |
| sales |
...
View table structure and columns:
DESCRIBE sales;| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| SPID | text | YES | null | ||
| GeoID | text | YES | null | ||
| PID | text | YES | null | ||
| SaleDate | datetime | YES | null | ||
| Amount | int | YES | null |
...
Actual mysql console response :
mysql> desc sales;
+-----------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+----------+------+-----+---------+-------+
| SPID | text | YES | | NULL | |
| GeoID | text | YES | | NULL | |
| PID | text | YES | | NULL | |
| SaleDate | datetime | YES | | NULL | |
| Amount | int | YES | | NULL | |
| Customers | int | YES | | NULL | |
| Boxes | int | YES | | NULL | |
+-----------+----------+------+-----+---------+-------+
7 rows in set (0.01 sec)
mysql>
What this returns:
- Column names
- Data types
- Additional information about each column
Critical Rule: When writing SQL queries, you must be familiar with underlying tables and their relationships. Without this knowledge, writing SQL becomes extremely difficult.
Query Limit: MySQL Workbench displays a maximum of 1000 rows at a time, even if the table contains more data.
Example: Sales table contains ~7000 rows, but only 1000 display in the workbench
When to Adjust Limits:
- When building queries: You only need to verify it works correctly
- When exporting data: You may need to remove or increase the limit to see all data
- When using data elsewhere: Export to Power BI or other systems
Query:
SELECT * FROM sales;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP01 | G4 | P04 | 2021-01-01 00:00:00 | 8414 | 276 | 495 |
| SP02 | G3 | P14 | 2021-01-01 00:00:00 | 532 | 317 | 54 |
| SP12 | G2 | P08 | 2021-01-01 00:00:00 | 5376 | 178 | 269 |
| SP01 | G4 | P15 | 2021-01-01 00:00:00 | 259 | 32 | 22 |
| SP19 | G2 | P18 | 2021-01-01 00:00:00 | 5530 | 4 | 179 |
...
Explanation:
SELECT *means select everything (all columns)FROM salesspecifies the table name- Displays all rows and all columns from the sales table
Output: Complete sales table with all data visible in grid format
Query:
SELECT saledate, amount, customers FROM sales;| SaleDate | Amount | Customers |
|---|---|---|
| 2021-01-01 00:00:00 | 8414 | 276 |
| 2021-01-01 00:00:00 | 532 | 317 |
| 2021-01-01 00:00:00 | 5376 | 178 |
| 2021-01-01 00:00:00 | 259 | 32 |
| 2021-01-01 00:00:00 | 5530 | 4 |
...
Explanation:
- Specify only the columns you want to see
- Columns are separated by commas
- Result shows only these three columns for all rows
Auto-Suggest Feature: Type the table name first, then add columns. This enables auto-suggest to help prevent spelling mistakes.
Better Practice:
SELECT
FROM salesThen come back and add columns between SELECT and FROM.
Query:
SELECT amount, customers, geoid FROM sales;| Amount | Customers | GeoID |
|---|---|---|
| 8414 | 276 | G4 |
| 532 | 317 | G3 |
| 5376 | 178 | G2 |
| 259 | 32 | G4 |
| 5530 | 4 | G2 |
...
Explanation:
- Columns do not have to appear in their original database order
- They display in the order you specify in the SELECT statement
- Results are rearranged automatically based on your specification
Calculate Amount Per Box:
SELECT
saledate,
amount,
boxes,
amount / boxes
FROM sales;| SaleDate | Amount | Boxes | Amount/Boxes |
|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | 495 | 16.9980 |
| 2021-01-01 00:00:00 | 532 | 54 | 9.8519 |
| 2021-01-01 00:00:00 | 5376 | 269 | 19.9851 |
| 2021-01-01 00:00:00 | 259 | 22 | 11.7727 |
| 2021-01-01 00:00:00 | 5530 | 179 | 30.8939 |
...
Explanation:
- You can perform arithmetic operations directly in SELECT statements
- Operations include addition (+), subtraction (-), multiplication (*), division (/)
- Results appear as an extra column in the output
- The calculated column name defaults to the operation itself
Problem: Column name amount / boxes is not user-friendly
Solution: Add Column Aliases Using AS:
SELECT
saledate,
amount,
boxes,
amount / boxes AS 'amount_per_box'
FROM sales;| SaleDate | Amount | Boxes | amount_per_box |
|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | 495 | 16.9980 |
| 2021-01-01 00:00:00 | 532 | 54 | 9.8519 |
| 2021-01-01 00:00:00 | 5376 | 269 | 19.9851 |
| 2021-01-01 00:00:00 | 259 | 22 | 11.7727 |
| 2021-01-01 00:00:00 | 5530 | 179 | 30.8939 |
...
Alternative Without AS Keyword:
SELECT
saledate,
amount,
boxes,
amount / boxes 'amount_per_box'
FROM sales;| SaleDate | Amount | Boxes | amount_per_box |
|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | 495 | 16.9980 |
| 2021-01-01 00:00:00 | 532 | 54 | 9.8519 |
| 2021-01-01 00:00:00 | 5376 | 269 | 19.9851 |
| 2021-01-01 00:00:00 | 259 | 22 | 11.7727 |
| 2021-01-01 00:00:00 | 5530 | 179 | 30.8939 |
Note: Both methods produce identical results. The AS keyword creates a synonym for the column.
The alias makes your output more readable and professional. Many times you want to give calculated columns proper names for clarity.
Definition: The WHERE clause allows you to impose conditions on your query results.
Concept: WHERE clause in SQL is like filtering in Excel. Set filter criteria to show only specific data.
Importance: WHERE clauses are one of the most important aspects of SQL for data analysis.
Query:
SELECT * FROM sales
WHERE amount > 10000;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP06 | G4 | P01 | 2021-01-01 00:00:00 | 12894 | 115 | 478 |
| SP10 | G1 | P06 | 2021-01-01 00:00:00 | 15596 | 32 | 975 |
| SP25 | G6 | P05 | 2021-01-01 00:00:00 | 14273 | 335 | 752 |
| SP18 | G2 | P21 | 2021-01-04 00:00:00 | 19229 | 64 | 1013 |
| SP23 | G1 | P16 | 2021-01-05 00:00:00 | 17248 | 163 | 664 |
...
Explanation:
WHEREspecifies the filter conditionamount > 10000means only rows where amount is greater than 10,000- All rows where amount is NOT greater than 10,000 are excluded
- Only qualifying rows display in results
Comparison Operators:
>Greater than<Less than=Equal to>=Greater than or equal to<=Less than or equal to!=or<>Not equal to
Query:
SELECT * FROM sales
WHERE amount > 10000
ORDER BY amount;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP02 | G1 | P07 | 2021-09-17 00:00:00 | 10010 | 257 | 358 |
| SP01 | G2 | P17 | 2021-08-30 00:00:00 | 10017 | 163 | 835 |
| SP21 | G3 | P22 | 2021-11-18 00:00:00 | 10017 | 111 | 418 |
| SP18 | G5 | P18 | 2021-10-27 00:00:00 | 10017 | 77 | 1113 |
| SP23 | G5 | P03 | 2021-05-06 00:00:00 | 10024 | 32 | 627 |
...
Explanation:
- Filters results to show only amounts greater than 10,000
- Orders results by amount in ascending order (lowest to highest)
- Results start at 10,000+ and increase gradually
Ascending Order (Default):
SELECT * FROM sales
WHERE amount > 10000
ORDER BY amount;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP02 | G1 | P07 | 2021-09-17 00:00:00 | 10010 | 257 | 358 |
| SP01 | G2 | P17 | 2021-08-30 00:00:00 | 10017 | 163 | 835 |
| SP21 | G3 | P22 | 2021-11-18 00:00:00 | 10017 | 111 | 418 |
| SP18 | G5 | P18 | 2021-10-27 00:00:00 | 10017 | 77 | 1113 |
| SP23 | G5 | P03 | 2021-05-06 00:00:00 | 10024 | 32 | 627 |
...
Descending Order (Highest to Lowest):
SELECT * FROM sales
WHERE amount > 10000
ORDER BY amount DESC;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP23 | G4 | P03 | 2021-02-12 00:00:00 | 27146 | 329 | 1939 |
| SP18 | G5 | P10 | 2021-04-22 00:00:00 | 25207 | 22 | 1483 |
| SP03 | G5 | P22 | 2021-06-02 00:00:00 | 24633 | 39 | 986 |
| SP18 | G1 | P11 | 2021-10-21 00:00:00 | 24451 | 472 | 1112 |
| SP18 | G2 | P16 | 2021-03-30 00:00:00 | 24367 | 272 | 3481 |
...
Query:
SELECT * FROM sales
WHERE geoid = 'g1'
ORDER BY pid, amount DESC;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP14 | G1 | P01 | 2022-02-25 00:00:00 | 22897 | 43 | 1347 |
| SP21 | G1 | P01 | 2022-01-07 00:00:00 | 18130 | 24 | 1008 |
| SP11 | G1 | P01 | 2021-01-27 00:00:00 | 17402 | 43 | 697 |
| SP08 | G1 | P01 | 2021-09-13 00:00:00 | 16681 | 274 | 596 |
| SP08 | G1 | P01 | 2022-01-10 00:00:00 | 16121 | 55 | 896 |
...
Explanation:
- Filters to show only GeoID 'g1' records
- First sorts by product ID (pid)
- Within each product ID, sorts by amount in descending order
- You can specify multiple ORDER BY columns separated by commas
Result Structure:
- All p01 items grouped together, sorted by amount (highest first)
- Then p02 items grouped together, sorted by amount
- Then p03, p04, etc.
Scenario: Find all sales with amount > 10,000 in the year 2022
Query Method 1 - Using Date Comparison:
SELECT * FROM sales
WHERE amount > 10000
AND saledate >= '2022-01-01';
-- use quote in dates `2022-01-01' : correct, 2022-01-01 : incorrect| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP15 | G3 | P11 | 2022-01-05 00:00:00 | 14553 | 152 | 910 |
| SP16 | G3 | P22 | 2022-01-28 00:00:00 | 10255 | 53 | 733 |
| SP04 | G4 | P05 | 2022-01-28 00:00:00 | 16800 | 92 | 800 |
| SP05 | G1 | P02 | 2022-01-21 00:00:00 | 16121 | 487 | 621 |
| SP21 | G2 | P22 | 2022-01-11 00:00:00 | 12481 | 177 | 1041 |
...
Explanation:
- Filters to amount greater than 10,000
- AND date is within 2022 or later
- Both conditions must be true for rows to display
MySQL Date Format: YYYY-MM-DD
Date Example: 2022-01-01 represents January 1st, 2022
Query Method 2 - Using YEAR Function:
SELECT saledate, amount
FROM sales
WHERE amount > 10000
AND YEAR(saledate) = 2022
ORDER BY amount DESC;| saledate | amount |
|---|---|
| 2022-02-16 00:00:00 | 23912 |
| 2022-03-15 00:00:00 | 23184 |
| 2022-02-25 00:00:00 | 22897 |
| 2022-03-01 00:00:00 | 22603 |
| 2022-02-25 00:00:00 | 22155 |
...
Explanation:
YEAR()is a built-in SQL function that extracts the year from a date- Works similar to YEAR function in Excel
- Returns a number, so no quotes needed around 2022
- More flexible than date comparison method
Advantage: Cleaner and more maintainable than parsing dates manually
Scenario: Find all sales with boxes between 0 and 50
Query Method 1 - Using AND:
SELECT * FROM sales
WHERE boxes > 0
AND boxes <= 50;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP01 | G4 | P15 | 2021-01-01 00:00:00 | 259 | 32 | 22 |
| SP14 | G5 | P16 | 2021-01-01 00:00:00 | 1036 | 370 | 37 |
| SP12 | G6 | P09 | 2021-01-04 00:00:00 | 147 | 9 | 11 |
| SP04 | G1 | P20 | 2021-01-06 00:00:00 | 644 | 116 | 34 |
| SP10 | G2 | P01 | 2021-01-08 00:00:00 | 420 | 196 | 14 |
...
Query Method 2 - Using BETWEEN(it includes limits):
SELECT * FROM sales
WHERE boxes BETWEEN 0 AND 50;| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP01 | G4 | P15 | 2021-01-01 00:00:00 | 259 | 32 | 22 |
| SP14 | G5 | P16 | 2021-01-01 00:00:00 | 1036 | 370 | 37 |
| SP12 | G6 | P09 | 2021-01-04 00:00:00 | 147 | 9 | 11 |
| SP04 | G1 | P20 | 2021-01-06 00:00:00 | 644 | 116 | 34 |
| SP10 | G2 | P01 | 2021-01-08 00:00:00 | 420 | 196 | 14 |
...
Explanation:
- BETWEEN is inclusive on both ends
- Range includes 0 and 50
- Both methods produce the same results
- BETWEEN is more concise and readable
Note: Both methods are valid. Use whichever is more comfortable for you.
Scenario: Find all sales that occurred on Fridays
Query:
SELECT
saledate,
amount,
boxes,
WEEKDAY(saledate) AS day_of_week
FROM sales
WHERE WEEKDAY(saledate) = 4;| SaleDate | Amount | Boxes | day_of_week |
|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | 495 | 4 |
| 2021-01-01 00:00:00 | 532 | 54 | 4 |
| 2021-01-01 00:00:00 | 5376 | 269 | 4 |
| 2021-01-01 00:00:00 | 259 | 22 | 4 |
| 2021-01-01 00:00:00 | 5530 | 179 | 4 |
Explanation:
WEEKDAY()is a built-in function that returns day of week as a number- Weekday numbering: 0=Monday, 1=Tuesday, 2=Wednesday, 3=Thursday, 4=Friday, 5=Saturday, 6=Sunday
- Friday = 4, so the condition is
WEEKDAY(SaleDate) = 4 - Can also create alias
AS day_of_weekfor clarity
Important Note: When using a calculated column in WHERE clause, you cannot reference the alias. You must repeat the calculation:
-- CORRECT:
WHERE WEEKDAY(SaleDate) = 4
-- INCORRECT (will give error):
WHERE day_of_week = 4Scenario: Find all people in either "Delish" or "Juices" team
Query Method 1 - Using Multiple OR Conditions:
SELECT * FROM people
WHERE team = 'Delish'
OR team = 'Juices';| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Wilone O'Kielt | SP04 | Delish | Hyderabad |
| Gigi Bohling | SP05 | Delish | Hyderabad |
| Curtice Advani | SP06 | Delish | Hyderabad |
| Kaine Padly | SP07 | Delish | Hyderabad |
| Andria Kimpton | SP09 | Jucies | Hyderabad |
...
Explanation:
- Shows results where team equals 'Delish' OR team equals 'Juices'
- Either condition being true includes the row
- Person cannot be in both teams, so OR is appropriate
Limitation: If you need many possible values, OR conditions become repetitive and difficult to maintain
Query Method 2 - Using IN (Cleaner Approach):
SELECT * FROM people
WHERE team IN ('Delish', 'Juices');| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Wilone O'Kielt | SP04 | Delish | Hyderabad |
| Gigi Bohling | SP05 | Delish | Hyderabad |
| Curtice Advani | SP06 | Delish | Hyderabad |
| Kaine Padly | SP07 | Delish | Hyderabad |
| Andria Kimpton | SP09 | Jucies | Hyderabad |
...
Explanation:
INis shorthand for multiple OR conditions- Specify multiple values in parentheses, separated by commas
- All text values must be in single quotes
- More flexible when you have many possible values
- More readable and maintainable
Advantage over OR: When you need 5, 7, or 10+ possible values, IN is much cleaner than chaining multiple OR conditions.
Scenario: Find all people whose name begins with 'B'
Query:
SELECT * FROM people
WHERE salesperson LIKE 'B%';| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Barr Faughny | SP01 | Yummies | Hyderabad |
| Brien Boise | SP10 | Jucies | Wellington |
| Beverie Moffet | SP19 | Jucies | Seattle |
| Benny Karolovsky | SP32 | Jucies | Paris |
...
Explanation:
LIKEoperator performs pattern matchingB%means: starts with 'B', followed by anything (%)%is a wildcard meaning "any character, zero or more times"- Names starting with B: Boris, Bonnie, etc.
Pattern Matching Examples:
Find names starting with B:
SELECT * FROM people
WHERE salesperson LIKE 'B%';| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Barr Faughny | SP01 | Yummies | Hyderabad |
| Brien Boise | SP10 | Jucies | Wellington |
| Beverie Moffet | SP19 | Jucies | Seattle |
| Benny Karolovsky | SP32 | Jucies | Paris |
Find names containing B anywhere:
SELECT * FROM people
WHERE salesperson LIKE '%B%';| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Barr Faughny | SP01 | Yummies | Hyderabad |
| Gigi Bohling | SP05 | Delish | Hyderabad |
| Ches Bonnell | SP08 | Hyderabad | |
| Brien Boise | SP10 | Jucies | Wellington |
| Marney O'Breen | SP16 | Jucies | Wellington |
...
Find names ending with B:
SELECT * FROM people
WHERE salesperson LIKE '%l';| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Ches Bonnell | SP08 | Hyderabad | |
| Oby Sorrel | SP20 | Jucies | Seattle |
| Van Tuxwell | SP23 | Yummies | Seattle |
...
Find names with B as second character:
LIKE '_B%'| Salesperson | SPID | Team | Location |
|---|---|---|---|
| Oby Sorrel | SP20 | Jucies | Seattle |
| Ebonee Roxburgh | SP28 | Paris |
...
(Where _ means exactly one character)
Scenario: Find sales where the team is NOT equal to a specific value
Query:
select * from products
where Category != 'Bars';
-- OR --
select * from products
where Category <> 'Bars';
-- OR --
select * from products
where not (Category = 'Bars');| PID | Product | Category | Size | Cost_per_box |
|---|---|---|---|---|
| P02 | 50% Dark Bites | Bites | LARGE | 2.57 |
| P06 | Eclairs | Bites | LARGE | 2.24 |
| P07 | Drinking Coco | Other | LARGE | 1.62 |
| P10 | Spicy Special Slims | Bites | LARGE | 5.79 |
| P11 | After Nines | Bites | LARGE | 4.43 |
| P14 | White Choc | Other | SMALL | 0.16 |
...
Explanation:
!=and<>both mean "not equal to"NOToperator reverses a condition- All three queries produce identical results
Purpose: Create categorizations or conditional logic within SELECT statements
Scenario: Categorize sales amounts into different tiers:
- Under $1,000: "Under 1K"
- $1,000 to $5,000: "Under 5K"
- $5,000 to $10,000: "Under 10K"
- Over $10,000: "10K or More"
Query:
select SaleDate,Amount ,case when Amount < 1000 then 'Under 1K' when Amount < 5000 then 'Under 2K' when Amount < 10000 then 'Under 10K' else '10K or More' end as amount_category from sales ;| SaleDate | Amount | amount_category |
|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Under 10K |
| 2021-01-01 00:00:00 | 532 | Under 1K |
| 2021-01-01 00:00:00 | 5376 | Under 10K |
| 2021-01-01 00:00:00 | 259 | Under 1K |
| 2021-01-01 00:00:00 | 5530 | Under 10K |
...
Explanation:
CASEbegins the conditional logicWHEN condition THEN resultchecks each condition sequentially- Conditions are evaluated in order from top to bottom
- First matching condition returns its result
ELSEprovides default value if no conditions matchENDterminates the CASE statement- Assign alias
AS amount_categoryto name the result column
You can use multiple WHEN statements and combine conditions as needed.
Query Structure Best Practices:
For clarity in longer queries, break CASE statements into multiple lines:
SELECT
saledate,
amount
,CASE
WHEN amount < 1000 THEN 'Under 1K'
WHEN amount < 5000 THEN 'Under 5K'
WHEN amount < 10000 THEN 'Under 10K'
ELSE '10K or More'
END AS amount_category
FROM sales;| SaleDate | Amount | amount_category |
|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Under 10K |
| 2021-01-01 00:00:00 | 532 | Under 1K |
| 2021-01-01 00:00:00 | 5376 | Under 10K |
| 2021-01-01 00:00:00 | 259 | Under 1K |
| 2021-01-01 00:00:00 | 5530 | Under 10K |
...
- Create numeric categorizations
- Create text-based categorizations
- Use in WHERE clause to filter on categorizations
- Map values for reporting and analysis
- Create custom display values
Definition: JOINs combine data from multiple tables based on related columns.
Concept: Similar to VLOOKUP in Excel, but SQL uses more optimized methods.
Critical Prerequisite: Understand how tables are linked through foreign keys before attempting JOINs.
Table Relationships:
- Sales table contains:
SPID(links to People table) - Sales table contains:
pid(links to Products table) - Sales table contains:
GeoID(links to Geography table)
Principle: IDs that appear in multiple tables represent the same entity and can be used to JOIN tables.
Scenario: Show sales data with the person's name instead of just ID
Without JOIN - Problem:
SELECT * FROM sales;
-- Shows sp01, sp02, etc. but we don't know who these people are| SPID | GeoID | PID | SaleDate | Amount | Customers | Boxes |
|---|---|---|---|---|---|---|
| SP01 | G4 | P04 | 2021-01-01 00:00:00 | 8414 | 276 | 495 |
| SP02 | G3 | P14 | 2021-01-01 00:00:00 | 532 | 317 | 54 |
| SP12 | G2 | P08 | 2021-01-01 00:00:00 | 5376 | 178 | 269 |
| SP01 | G4 | P15 | 2021-01-01 00:00:00 | 259 | 32 | 22 |
| SP19 | G2 | P18 | 2021-01-01 00:00:00 | 5530 | 4 | 179 |
...
With JOIN - Solution:
select
s.SaleDate,
s.Amount,
p.Salesperson
from sales s
join people p on p.SPID = s.SPID;| SaleDate | Amount | Salesperson |
|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Barr Faughny |
| 2021-01-01 00:00:00 | 532 | Dennison Crosswaite |
| 2021-01-01 00:00:00 | 5376 | Karlen McCaffrey |
| 2021-01-01 00:00:00 | 259 | Barr Faughny |
| 2021-01-01 00:00:00 | 5530 | Beverie Moffet |
...
select
s.SaleDate,
s.Amount,
p.Salesperson,
s.SPID,
p.SPID
from sales s
join people p on p.SPID = s.SPID
;| SaleDate | Amount | Salesperson | SPID |
|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Barr Faughny | SP01 |
| 2021-01-01 00:00:00 | 532 | Dennison Crosswaite | SP02 |
| 2021-01-01 00:00:00 | 5376 | Karlen McCaffrey | SP12 |
| 2021-01-01 00:00:00 | 259 | Barr Faughny | SP01 |
| 2021-01-01 00:00:00 | 5530 | Beverie Moffet | SP19 |
Explanation:
FROM sales sstarts with sales table, aliased as 's'JOIN people padds the people table, aliased as 'p'ON p.SPID = s.SPIDspecifies the join condition - matching IDss.andp.prefixes specify which table a column comes from- When IDs match, data from both tables appears on the same row
Purpose: Shorten long table names and make queries more readable
Syntax:
FROM sales s
-- OR --
FROM sales AS sConsistency Practice: Use consistent aliases within your query. Example:
sfor salespfor peopleprfor productsgfor geo
Why Qualify Columns with Table Prefix?
When columns have the same name in multiple tables, prefix with table alias:
s.SPID -- sales table's SPID
p.SPID -- people table's SPIDBest Practice: Even when not required, qualify columns for clarity.
JOIN (INNER JOIN):
- Returns only rows where IDs match in BOTH tables
- If sales table has SPID that doesn't exist in people table, that row is excluded
LEFT JOIN:
- Returns ALL rows from the left table (first table after FROM)
- Includes matching rows from the right table
- If no match found, right table columns show as blank/NULL
- Used to preserve all data from the primary table
Visual Representation:
Sales table (LEFT) ----------- People table (RIGHT)
[Keep ALL from sales]
[If match in people → Include people data]
[If no match in people → Show blank for people columns]
When to Use LEFT JOIN: Most common in business situations because you want to preserve all sales data even if the person's record doesn't exist in the people table.
select
s.SaleDate,
s.Amount,
pr.Product
from sales s
left join products pr on pr.PID = s.PID
;| SaleDate | Amount | Product |
|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Raspberry Choco |
| 2021-01-01 00:00:00 | 532 | White Choc |
| 2021-01-01 00:00:00 | 5376 | 99% Dark & Pure |
| 2021-01-01 00:00:00 | 259 | Baker's Choco Chips |
| 2021-01-01 00:00:00 | 5530 | Manuka Honey Choco |
...
select
s.SaleDate,
s.Amount,
pr.Product
from sales s
right join products pr on pr.PID = s.PID
;| SaleDate | Amount | Product |
|---|---|---|
| 2022-03-23 00:00:00 | 637 | Milk Bars |
| 2022-02-25 00:00:00 | 2730 | Milk Bars |
| 2022-02-24 00:00:00 | 9023 | Milk Bars |
| 2022-03-04 00:00:00 | 1155 | Milk Bars |
| 2022-03-30 00:00:00 | 1120 | Milk Bars |
...
Scenario: Show sales under $500 for people in the "Delish" team
Query:
SELECT
s.SaleDate,
s.amount,
p.salesperson,
p.team
FROM sales s
JOIN people p ON p.SPID = s.SPID
WHERE s.amount < 500
AND p.team = 'Delish';| SaleDate | Amount | Salesperson | Team |
|---|---|---|---|
| 2021-01-08 00:00:00 | 364 | Gigi Bohling | Delish |
| 2021-01-14 00:00:00 | 35 | Jan Morforth | Delish |
| 2021-01-15 00:00:00 | 308 | Curtice Advani | Delish |
| 2021-02-04 00:00:00 | 182 | Camilla Castle | Delish |
| 2021-02-04 00:00:00 | 392 | Jan Morforth | Delish |
...
Explanation:
- JOIN combines the tables
- WHERE clause filters results on joined data
- Can filter on any column from either table
for fun, see the empty stuff !!!
select
s.SaleDate,
s.Amount,
p.Salesperson,
p.Team
from sales s
join people p on p.SPID = s.SPID
where s.Amount< 500
;| SaleDate | Amount | Salesperson | Team |
|---|---|---|---|
| 2021-01-01 00:00:00 | 259 | Barr Faughny | Yummies |
| 2021-01-04 00:00:00 | 147 | Karlen McCaffrey | Yummies |
| 2021-01-08 00:00:00 | 420 | Brien Boise | Jucies |
| 2021-01-08 00:00:00 | 364 | Gigi Bohling | Delish |
| 2021-01-08 00:00:00 | 357 | Ches Bonnell | |
| 2021-01-12 00:00:00 | 189 | Husein Augar | Yummies |
| 2021-01-12 00:00:00 | 490 | Barr Faughny | Yummies |
| 2021-01-14 00:00:00 | 35 | Jan Morforth | Delish |
| 2021-01-15 00:00:00 | 308 | Curtice Advani | Delish |
| 2021-01-18 00:00:00 | 238 | Brien Boise | Jucies |
| 2021-01-19 00:00:00 | 161 | Ches Bonnell | |
| 2021-01-20 00:00:00 | 343 | Brien Boise | Jucies |
| 2021-01-20 00:00:00 | 126 | Kelci Walkden | Yummies |
| 2021-01-21 00:00:00 | 280 | Ches Bonnell | |
| 2021-01-21 00:00:00 | 427 | Marney O'Breen | Jucies |
| 2021-01-22 00:00:00 | 168 | Oby Sorrel | Jucies |
| 2021-01-25 00:00:00 | 42 | Ches Bonnell | |
| 2021-01-25 00:00:00 | 343 | Mallorie Waber |
...
select
s.SaleDate,
s.Amount,
p.Salesperson,
p.Team
from sales s
join people p on p.SPID = s.SPID
where s.Amount< 500
and p.Team = ''
; | SaleDate | Amount | Salesperson | Team |
|---|---|---|---|
| 2021-01-08 00:00:00 | 357 | Ches Bonnell | |
| 2021-01-19 00:00:00 | 161 | Ches Bonnell | |
| 2021-01-21 00:00:00 | 280 | Ches Bonnell | |
| 2021-01-25 00:00:00 | 42 | Ches Bonnell | |
| 2021-01-25 00:00:00 | 343 | Mallorie Waber |
...
Scenario: Show sales with person name, product name, and team
Query:
SELECT
s.SaleDate,
s.amount,
p.salesperson,
pr.product,
p.team
FROM sales s
JOIN people p ON p.SPID = s.SPID
JOIN products pr ON pr.pid = s.pid;| SaleDate | Amount | Salesperson | Product | Team |
|---|---|---|---|---|
| 2021-01-01 00:00:00 | 8414 | Barr Faughny | Raspberry Choco | Yummies |
| 2021-01-01 00:00:00 | 532 | Dennison Crosswaite | White Choc | Yummies |
| 2021-01-01 00:00:00 | 5376 | Karlen McCaffrey | 99% Dark & Pure | Yummies |
| 2021-01-01 00:00:00 | 259 | Barr Faughny | Baker's Choco Chips | Yummies |
| 2021-01-01 00:00:00 | 5530 | Beverie Moffet | Manuka Honey Choco | Jucies |
| 2021-01-01 00:00:00 | 2184 | Rafaelita Blaksland | 85% Dark Bars |
...
Explanation:
- Chain multiple JOINs by adding additional JOIN clauses
- Each JOIN specifies its own ON condition
- Data from all three tables appears in result
- Order of JOINs: START with FROM clause, then add each JOIN sequentially
Problem: Some people records might not have a team assigned (blank or NULL)
Query:
SELECT
s.SaleDate,
s.amount,
p.salesperson,
p.team
FROM sales s
LEFT JOIN people p ON p.SPID = s.SPID
WHERE s.amount < 500
AND p.team IS NULL; -- p.team = ''| SaleDate | Amount | Salesperson | Team |
|---|---|---|---|
| 2021-01-08 00:00:00 | 357 | Ches Bonnell | |
| 2021-01-19 00:00:00 | 161 | Ches Bonnell | |
| 2021-01-21 00:00:00 | 280 | Ches Bonnell | |
| 2021-01-25 00:00:00 | 42 | Ches Bonnell | |
| 2021-01-25 00:00:00 | 343 | Mallorie Waber |
...
Explanation:
IS NULLchecks for NULL values (true null, not blank spaces)IS NOT NULLchecks for non-null values
Database Nuance: Null vs Blank
- NULL: Truly no value assigned (appears as "NULL" in results)
- Blank: Empty string or spaces (appears as empty in results)
Different filtering required for each:
- NULL:
WHERE column IS NULL - Blank:
WHERE column = ''
Complete Query with Multiple Filters:
SELECT
s.SaleDate,
s.amount,
p.salesperson,
pr.product,
g.geo
FROM sales s
JOIN people p ON p.SPID = s.SPID
JOIN products pr ON pr.pid = s.pid
JOIN geo g ON g.GeoID = s.GeoID
WHERE s.amount < 500
AND p.team = ''
AND g.geo IN ('New Zealand', 'India')
ORDER BY s.SaleDate;| SaleDate | Amount | Salesperson | Product | Geo |
|---|---|---|---|---|
| 2021-01-01 00:00:00 | 294 | Ches Bonnell | Peanut Butter Cubes | New Zealand |
| 2021-01-01 00:00:00 | 294 | Ches Bonnell | Peanut Butter Cubes | India |
| 2021-01-01 00:00:00 | 294 | Rafaelita Blaksland | Peanut Butter Cubes | New Zealand |
| 2021-01-01 00:00:00 | 294 | Rafaelita Blaksland | Peanut Butter Cubes | India |
| 2021-01-01 00:00:00 | 294 | Mallorie Waber | Peanut Butter Cubes | New Zealand |
...
Explanation:
- Combines data from 4 tables
- Multiple WHERE conditions filter results
- ORDER BY sorts by date chronologically
- Results show only sales meeting all criteria
helps to make pivot reports !!!
Purpose: Summarize data at a higher level by grouping rows and applying aggregation functions.
Concept: Similar to Pivot Table in Excel - takes detailed data and creates summary reports.
When to Use: You have data at a detailed level but want to see it at a higher level.
Common SQL aggregation functions:
| Function | Purpose |
|---|---|
SUM() |
Add values together |
AVG() |
Calculate average |
COUNT() |
Count number of rows |
MIN() |
Find minimum value |
MAX() |
Find maximum value |
Scenario: Total sales amount by geo
Query:
SELECT
GeoID,
SUM(amount) AS total_amount
FROM sales
GROUP BY GeoID;| GeoID | sum(amount) |
|---|---|
| G4 | 7435918 |
| G3 | 7350091 |
| G2 | 7012523 |
| G1 | 7310254 |
| G6 | 7189609 |
| G5 | 7263151 |
...
Explanation:
SUM(amount)adds up all amounts within each groupGROUP BY GeoIDcreates one row for each unique GeoID value- Results show: g1, g2, g3, g4, etc. with their total amounts
Query:
SELECT
GeoID,
SUM(amount) AS total_amount,
AVG(amount) AS average_amount,
SUM(boxes) AS total_boxes
FROM sales
GROUP BY GeoID;Explanation:
- Multiple aggregation functions on same GROUP BY
- Each function operates within the group
- Results show summary statistics for each geo
Scenario: Total sales amount by country (using geo table)
Query:
SELECT
g.geo,
SUM(s.amount) AS total_amount,
AVG(s.amount) AS average_amount,
SUM(s.boxes) AS total_boxes
FROM sales s
JOIN geo g ON g.GeoID = s.GeoID
GROUP BY g.geo;Explanation:
- JOIN merges tables first
- Then GROUP BY operates on joined data
- GROUP BY uses the column being displayed (geo name from geo table)
- Results are more readable with actual country names instead of IDs
Scenario: Total sales by product category AND team
Query:
SELECT
pr.category,
p.team,
SUM(s.boxes) AS total_boxes,
SUM(s.amount) AS total_amount
FROM sales s
JOIN people p ON p.SPID = s.SPID
JOIN products pr ON pr.pid = s.pid
GROUP BY pr.category, p.team
ORDER BY pr.category, p.team;Explanation:
GROUP BY pr.category, p.teamgroups by two levels- Creates combinations like: Bars-Team1, Bars-Team2, Chocolate-Team1, etc.
- Any column displayed must either be:
- In the GROUP BY clause, OR
- Wrapped in an aggregation function (SUM, AVG, COUNT, etc.)
Critical Rule: You cannot display columns not in GROUP BY unless they're aggregated. This will cause an error.
Correct Structure: Display = GROUP BY + Aggregated Columns
Query:
SELECT
pr.category,
p.team,
SUM(s.boxes) AS total_boxes,
SUM(s.amount) AS total_amount
FROM sales s
JOIN people p ON p.SPID = s.SPID
JOIN products pr ON pr.pid = s.pid
WHERE p.team IS NOT NULL
GROUP BY pr.category, p.team
ORDER BY total_amount DESC;Explanation:
WHEREfilters BEFORE grouping- Removes blank team records before aggregation
- Results only include non-null teams
Note: WHERE filters individual rows BEFORE aggregation. For filtering AFTER aggregation, use HAVING (not covered in detail here).
Query:
SELECT
g.geo,
SUM(s.amount) AS total_amount
FROM sales s
JOIN geo g ON g.GeoID = s.GeoID
GROUP BY g.geo
ORDER BY total_amount DESC;Explanation:
ORDER BY total_amount DESCsorts results by aggregated column- DESC = descending (highest to lowest)
- Highest total amount countries appear first
Scenario: Show only top 10 products by sales
Query:
SELECT
pr.product,
SUM(s.amount) AS total_amount
FROM sales s
JOIN products pr ON pr.pid = s.pid
GROUP BY pr.product
ORDER BY total_amount DESC
LIMIT 10;Explanation:
LIMIT 10restricts output to first 10 rows- Works on sorted data, so first 10 are the TOP 10
- Applies AFTER ordering
Query Execution Order:
- FROM/JOIN (get data and combine tables)
- WHERE (filter rows)
- GROUP BY (aggregate)
- ORDER BY (sort)
- LIMIT (restrict output)
1. Write FROM clause first:
SELECT
FROM table_nameThen fill in SELECT columns using auto-suggest.
2. Format for readability:
- Put SELECT, FROM, WHERE, GROUP BY, ORDER BY on separate lines
- Use proper indentation
- Use table aliases consistently
3. Use meaningful aliases:
FROM sales s -- Clear alias
JOIN people p ON ... -- Consistent naming4. Always qualify columns:
s.amount -- Clear which table
p.salesperson -- No ambiguity5. Include semicolons:
SELECT * FROM sales;
SELECT * FROM people;Semicolon marks end of statement, allows multiple queries in one file.
Mistake 1: Forgetting GROUP BY
-- WRONG:
SELECT category, amount FROM sales;
-- Error or incomplete results
-- CORRECT:
SELECT category, SUM(amount) FROM sales GROUP BY category;Mistake 2: Using unaggregated column in GROUP BY
-- WRONG:
SELECT category, product, SUM(amount)
FROM sales
GROUP BY category;
-- Error: product not in GROUP BY
-- CORRECT:
SELECT category, product, SUM(amount)
FROM sales
GROUP BY category, product;Mistake 3: Referencing alias in WHERE clause
-- WRONG:
SELECT amount AS amt
WHERE amt > 1000;
-- Error: alias not recognized in WHERE
-- CORRECT:
SELECT amount AS amt
WHERE amount > 1000;Mistake 4: Missing ON condition in JOIN
-- WRONG:
SELECT * FROM sales JOIN people;
-- Result: Cartesian product (too many rows!)
-- CORRECT:
SELECT * FROM sales
JOIN people ON people.SPID = sales.SPID;NULL vs Blank:
- NULL: No value assigned in database
- Blank: Empty string or spaces
Checking for NULL:
WHERE column IS NULL -- Check for true NULL
WHERE column IS NOT NULL -- Check for non-NULL
WHERE column != NULL -- WRONG! Always falseImportant: Use IS NULL, not = NULL. Never use != NULL.
Extract components:
YEAR(SaleDate) -- Returns year as number
MONTH(SaleDate) -- Returns month as number
DAY(SaleDate) -- Returns day as number
WEEKDAY(SaleDate) -- Returns day of week (0-6)Date format: YYYY-MM-DD (Year-Month-Day)
To save queries:
- Go to File menu
- Select "Save Script"
- Choose location and filename
- File saves as .sql format
Purpose: Reuse queries later, share with colleagues, maintain query library.
- Master the basics covered in this guide
- Practice with provided homework problems (shown in video)
- Study JOINs deeper - read additional resources for complex scenarios
- Learn HAVING clause - for filtering aggregated data
- Explore advanced functions - window functions, subqueries, etc.
After mastering SQL queries, use your data with:
- Power BI: Data visualization and analysis
- Excel: Data analysis and reporting
- Python: Data science and machine learning
- Tableau: Advanced visualization
Key Principle: Learning SQL requires consistent practice. Without practice, you will forget most concepts.
Practice Strategy:
- Download provided homework problems
- Solve easy problems first
- Progress to hard problems
- Hard problems require investigation beyond basic concepts
- Consistent practice builds mastery
> Greater than
< Less than
= Equal to
>= Greater than or equal to
<= Less than or equal to
!= Not equal to
<> Not equal to (alternative)
AND Both conditions must be true
OR At least one condition must be true
NOT Reverses condition
IN Value is in list
BETWEEN Value is between two numbers
LIKE Pattern matching
SUM(column) Total of values
AVG(column) Average of values
COUNT(column) Number of rows
MIN(column) Minimum value
MAX(column) Maximum value
SELECT Specify columns to return
FROM Specify primary table
WHERE Filter rows before grouping
JOIN Combine multiple tables
ON Specify join condition
GROUP BY Group rows for aggregation
ORDER BY Sort results
LIMIT Restrict number of rows
SELECT columns
FROM table
WHERE conditions
GROUP BY columns
ORDER BY columns
LIMIT number;Data Definition Language (DDL) defines the structure of your database.
CREATE TABLE LoyaltyProgram (
LoyaltyID INT AUTO_INCREMENT,
SPID VARCHAR(10),
Points INT DEFAULT 0,
Status VARCHAR(20),
EnrollmentDate DATETIME,
PRIMARY KEY (LoyaltyID)
);Used to modify existing table structures.
-- Add a new column
ALTER TABLE LoyaltyProgram ADD COLUMN LastUpdated DATETIME;
-- Modify an existing column type
ALTER TABLE LoyaltyProgram MODIFY COLUMN Status VARCHAR(50);
-- Rename a column
ALTER TABLE LoyaltyProgram RENAME COLUMN Status TO TierLevel;- DROP: Deletes table structure + data (irreversible).
- TRUNCATE: Deletes data, keeps structure.
- DELETE: Removes rows (transaction-safe).
DROP TABLE OldSalesData;
TRUNCATE TABLE TempLog;Constraints ensure accuracy and reliability of your data.
CREATE TABLE OrderAudit (
AuditID INT PRIMARY KEY,
SPID VARCHAR(10),
ActionDate DATETIME,
FOREIGN KEY (SPID) REFERENCES People(SPID)
);ALTER TABLE LoyaltyProgram
ADD CONSTRAINT UQ_SPID UNIQUE (SPID);
ALTER TABLE LoyaltyProgram
ADD CONSTRAINT CHK_Points CHECK (Points >= 0);A transaction is a single logical unit of work.
- Atomicity — All or nothing
- Consistency — Always valid state
- Isolation — No interference
- Durability — Permanent after commit
START TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE ID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE ID = 2;
COMMIT; -- or ROLLBACK;- Read Uncommitted — allows dirty reads
- Read Committed — prevents dirty reads
- Repeatable Read — consistent rows
- Serializable — highest safety, lowest performance
SELECT
Salesperson,
Amount,
ROW_NUMBER() OVER(ORDER BY Amount DESC) AS RowNum,
RANK() OVER(ORDER BY Amount DESC) AS RankVal,
DENSE_RANK() OVER(ORDER BY Amount DESC) AS DenseRankVal
FROM Sales;SELECT
SaleDate,
Amount,
LAG(Amount, 1) OVER(ORDER BY SaleDate) AS PreviousSale,
LEAD(Amount, 1) OVER(ORDER BY SaleDate) AS NextSale
FROM Sales;SELECT
SaleDate,
Amount,
SUM(Amount) OVER(ORDER BY SaleDate) AS RunningTotal
FROM Sales;WITH HighValueSales AS (
SELECT * FROM Sales WHERE Amount > 5000
)
SELECT * FROM HighValueSales;WITH RECURSIVE EmployeeTree AS (
SELECT ID, Name, ManagerID
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ID, e.Name, e.ManagerID
FROM Employees e
JOIN EmployeeTree et ON e.ManagerID = et.ID
)
SELECT * FROM EmployeeTree;SELECT * FROM People p
WHERE EXISTS (
SELECT 1 FROM Sales s
WHERE s.SPID = p.SPID AND s.Amount > 10000
);UNION— distinctUNION ALL— includes duplicatesINTERSECT— common rowsEXCEPT / MINUS— rows in A not in B
SELECT
GeoID,
PID,
SUM(Amount)
FROM Sales
GROUP BY GeoID, PID WITH ROLLUP;- ROLLUP → hierarchical subtotals
- CUBE → all combinations
- GROUPING SETS → custom combinations
- Clustered Index — sorts actual rows
- Non-Clustered Index — separate structure pointing to data
- WHERE
- JOIN
- ORDER BY
- Frequent INSERT/UPDATE/DELETE
- Large storage cost
EXPLAIN SELECT * FROM Sales WHERE Amount > 5000;Look for:
- Table Scan (bad)
- Index Seek / Range Scan (good)
DELIMITER //
CREATE PROCEDURE GetSalesByRegion(IN regionName VARCHAR(50))
BEGIN
SELECT * FROM Sales s
JOIN Geo g ON s.GeoID = g.GeoID
WHERE g.Geo = regionName;
END //
DELIMITER ;CREATE FUNCTION CalculateTax(amount DECIMAL(10,2))
RETURNS DECIMAL(10,2)
DETERMINISTIC
RETURN amount * 0.15;CREATE TRIGGER BeforeSalesInsert
BEFORE INSERT ON Sales
FOR EACH ROW
BEGIN
IF NEW.Amount < 0 THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'Amount cannot be negative';
END IF;
END;CREATE TABLE ProductsJSON (
ID INT,
Attributes JSON
);
SELECT
ID,
JSON_EXTRACT(Attributes, '$.color') AS Color
FROM ProductsJSON;- Fact Tables: Sales, Metrics
- Dimension Tables: People, Products, Geo
- Type 1: overwrite
- Type 2: add rows with
StartDate,EndDate
Useful for large time-series datasets.
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'password123';
GRANT SELECT, INSERT ON AwesomeChocolates.*
TO 'app_user'@'localhost';
REVOKE DELETE ON AwesomeChocolates.*
FROM 'app_user'@'localhost';- Logical — mysqldump
- Physical — underlying files
- Use explicit column names
- Avoid
SELECT * - Use meaningful aliases
- Comment complex logic
❌ Bad
SELECT * FROM A, B WHERE A.id = B.id;✔️ Good
SELECT * FROM A JOIN B ON A.id = B.id;Use JOIN instead of performing queries inside loops.
Break out into normalized tables.
Avoid:
WHERE YEAR(SaleDate) = 2022;Use:
WHERE SaleDate BETWEEN '2022-01-01' AND '2022-12-31';“The goal is to turn data into information, and information into insight.” — Carly Fiorina
INTERMEDIATE PROBLEMS
👉 You need to combine various concepts covered in the video to solve these
1. Print details of shipments (sales) where amounts are > 2,000 and boxes are <100?
2. How many shipments (sales) each of the sales persons had in the month of January 2022?
3. Which product sells more boxes? Milk Bars or Eclairs?
4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?
5. Which shipments had under 100 customers & under 100 boxes? Did any of them occur on Wednesday?
HARD PROBLEMS
👉 These require concepts not covered in the video
1. What are the names of salespersons who had at least one shipment (sale) in the first 7 days of January 2022?
2. Which salespersons did not make any shipments in the first 7 days of January 2022?
3. How many times we shipped more than 1,000 boxes in each month?
4. Did we ship at least one box of ‘After Nines’ to ‘New Zealand’ on all the months?
5. India or Australia? Who buys more chocolate boxes on a monthly basis?
INTERMEDIATE PROBLEMS:
— 1. Print details of shipments (sales) where amounts are > 2,000 and boxes are <100?
select * from sales where amount > 2000 and boxes < 100;
— 2. How many shipments (sales) each of the sales persons had in the month of January 2022?
select p.Salesperson, count(*) as ‘Shipment Count’
from sales s
join people p on s.spid = p.spid
where SaleDate between ‘2022-1-1’ and ‘2022-1-31’
group by p.Salesperson;
— 3. Which product sells more boxes? Milk Bars or Eclairs?
select pr.product, sum(boxes) as ‘Total Boxes’
from sales s
join products pr on s.pid = pr.pid
where pr.Product in (‘Milk Bars’, ‘Eclairs’)
group by pr.product;
— 4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?
select pr.product, sum(boxes) as ‘Total Boxes’
from sales s
join products pr on s.pid = pr.pid
where pr.Product in (‘Milk Bars’, ‘Eclairs’)
and s.saledate between ‘2022-2-1’ and ‘2022-2-7’
group by pr.product;
— 5. Which shipments had under 100 customers & under 100 boxes? Did any of them occur on Wednesday?
select * from sales
where customers < 100 and boxes < 100;
select *,
case when weekday(saledate)=2 then ‘Wednesday Shipment’
else ”
end as ‘W Shipment’
from sales
where customers < 100 and boxes < 100;
HARD PROBLEMS:
— What are the names of salespersons who had at least one shipment (sale) in the first 7 days of January 2022?
select distinct p.Salesperson
from sales s
join people p on p.spid = s.SPID
where s.SaleDate between ‘2022-01-01’ and ‘2022-01-07’;
— Which salespersons did not make any shipments in the first 7 days of January 2022?
select p.salesperson
from people p
where p.spid not in
(select distinct s.spid from sales s where s.SaleDate between ‘2022-01-01’ and ‘2022-01-07’);
— How many times we shipped more than 1,000 boxes in each month?
select year(saledate) ‘Year’, month(saledate) ‘Month’, count(*) ‘Times we shipped 1k boxes’
from sales
where boxes>1000
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
— Did we ship at least one box of ‘After Nines’ to ‘New Zealand’ on all the months?
set @product_name = ‘After Nines’;
set @country_name = ‘New Zealand’;
select year(saledate) ‘Year’, month(saledate) ‘Month’,
if(sum(boxes)>1, ‘Yes’,’No’) ‘Status’
from sales s
join products pr on pr.PID = s.PID
join geo g on g.GeoID=s.GeoID
where pr.Product = @product_name and g.Geo = @country_name
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
— India or Australia? Who buys more chocolate boxes on a monthly basis?
select year(saledate) ‘Year’, month(saledate) ‘Month’,
sum(CASE WHEN g.geo=’India’ = 1 THEN boxes ELSE 0 END) ‘India Boxes’,
sum(CASE WHEN g.geo=’Australia’ = 1 THEN boxes ELSE 0 END) ‘Australia Boxes’
from sales s
join geo g on g.GeoID=s.GeoID
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
SQL is a powerful tool for data analysis. The 50+ queries and concepts covered in this guide provide a strong foundation for working with databases. Success with SQL requires:
- Understanding table structures and relationships
- Mastering basic clauses (SELECT, WHERE, FROM)
- Using JOINs to combine related data
- Aggregating data with GROUP BY
- Consistent practice and exploration
End-of-File
The KintsugiStack repository, authored by Kintsugi-Programmer, is less a comprehensive resource and more an Artifact of Continuous Research and Deep Inquiry into Computer Science and Software Engineering. It serves as a transparent ledger of the author's relentless pursuit of mastery, from the foundational algorithms to modern full-stack implementation.
Made with 💚 Kintsugi-Programmer
