# SQL Data Analysis Workshop

This workshop is designed to help you build practical SQL skills by analyzing real-world business data.

The tasks are divided into 4 levels to gradually progress from basic data retrieval to advanced analysis. Each level focuses on specific SQL concepts and challenges, allowing you to practice query writing, data filtering, aggregations, joins, and subqueries.
- Level 1: Basic SQL queries (SELECT, WHERE, ORDER BY, COUNT).
- Level 2: Aggregations (SUM, AVG, GROUP BY) and basic joins.
- Level 3: Complex joins, subqueries, and multi-table analysis.
- Level 4: Advanced analytical tasks focused on complex queries, subqueries, and aggregations.

This structured approach ensures you build confidence and expertise in SQL, preparing you for real-world data analysis challenges. 🚀


---
## Set up the work environment

#### 1. Library imports

In [4]:
# Imports
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

#### 2. Connecting to the Database


In [5]:
import mysql.connector

# Establish the connection
connection = mysql.connector.connect(
    host="localhost",
    user="root",
    password="302112",
    database="bike_1"
)

# Create a cursor object
cursor = connection.cursor()
print("Connection established successfully.")

Connection established successfully.
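
The connection cell above keeps the password in the notebook itself. A minimal sketch of reading the same settings from environment variables instead; the `BIKE_DB_*` variable names and the `db_config` helper are hypothetical, not part of the workshop.

```python
import os

def db_config(env=os.environ):
    """Build keyword arguments for mysql.connector.connect.

    The BIKE_DB_* names are hypothetical; use whatever fits your setup.
    """
    return {
        "host": env.get("BIKE_DB_HOST", "localhost"),
        "user": env.get("BIKE_DB_USER", "root"),
        "password": env["BIKE_DB_PASSWORD"],  # no default: fail loudly if unset
        "database": env.get("BIKE_DB_NAME", "bike_1"),
    }

# Demonstration with an explicit mapping instead of the real environment
config = db_config({"BIKE_DB_PASSWORD": "example-only"})
print({k: v for k, v in config.items() if k != "password"})

# With a running MySQL server this would be used as:
# connection = mysql.connector.connect(**config)
```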


#### 3. Setup for SQL Magic

In [6]:
!pip install ipython-sql


In [7]:
%load_ext sql




In [8]:
%sql mysql+mysqlconnector://root:302112@localhost:3306/bike_1



---
## Level 1: Beginner – Basic Queries (using SQL Magic)


### Task 1: Retrieve all rows and columns from the customers table.



In [None]:
%%sql
-- Task 1: all rows and columns from customers
SELECT * FROM customers;


customer_id,first_name,last_name,phone,email,street,city,state,zip_code
1,Debra,Burks,,debra.burks@yahoo.com,9273 Thorne Ave.,Orchard Park,NY,14127
2,Kasha,Todd,,kasha.todd@yahoo.com,910 Vine Street,Campbell,CA,95008
3,Tameka,Fisher,,tameka.fisher@aol.com,769C Honey Creek St.,Redondo Beach,CA,90278
4,Daryl,Spence,,daryl.spence@aol.com,988 Pearl Lane,Uniondale,NY,11553
5,Charolette,Rice,(916) 381-6003,charolette.rice@msn.com,107 River Dr.,Sacramento,CA,95820
6,Lyndsey,Bean,,lyndsey.bean@hotmail.com,769 West Road,Fairport,NY,14450
7,Latasha,Hays,(716) 986-3359,latasha.hays@hotmail.com,7014 Manor Station Rd.,Buffalo,NY,14215
8,Jacquline,Duncan,,jacquline.duncan@yahoo.com,15 Brown St.,Jackson Heights,NY,11372
9,Genoveva,Baldwin,,genoveva.baldwin@msn.com,8550 Spruce Drive,Port Washington,NY,11050
10,Pamelia,Newman,,pamelia.newman@gmail.com,476 Chestnut Ave.,Monroe,NY,10950


### Task 2: Retrieve customers who live in the city "New York".

In [32]:
%%sql
-- Task 2: customers who live in New York
SELECT * FROM customers
WHERE city = 'New York';


customer_id,first_name,last_name,phone,email,street,city,state,zip_code
16,Emmitt,Sanchez,(212) 945-8823,emmitt.sanchez@hotmail.com,461 Squaw Creek Road,New York,NY,10002
178,Genoveva,Tyler,(212) 152-6381,genoveva.tyler@gmail.com,8121 Windfall Ave.,New York,NY,10002
327,Sharie,Alvarez,(212) 211-7621,sharie.alvarez@msn.com,987 West Leatherwood Dr.,New York,NY,10002
411,Octavia,Case,(212) 171-1335,octavia.case@aol.com,40 Charles Road,New York,NY,10002
854,Phylis,Adkins,(212) 325-9145,phylis.adkins@msn.com,7781 James Ave.,New York,NY,10002
927,Guillermo,Hart,(212) 652-7198,guillermo.hart@hotmail.com,81 Indian Summer Drive,New York,NY,10002
1016,Shenna,Benton,(212) 578-2912,shenna.benton@msn.com,57 Shadow Brook Road,New York,NY,10002


### Task 3: Retrieve all products sorted by their price in descending order.



In [None]:
%%sql
-- Task 3: products sorted by price, highest first
-- (products has no `price` column, MySQL error 1054; list_price is assumed, as in order_items)
SELECT * FROM products
ORDER BY list_price DESC;


### Task 4: Find Orders by Status


#### Task 4.1: Retrieve all orders with status "Completed".


In [None]:
%%sql
-- Task 4.1: orders with status "Completed"
SELECT * FROM orders
WHERE status = 'Completed';


RuntimeError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'status' in 'where clause'
[SQL: SELECT * FROM orders
WHERE status = 'Completed';]
(Background on this error at: https://sqlalche.me/e/20/f405)


#### Task 4.2: Retrieve count of orders with status "Rejected".
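
The Task 4.1 query failed with error 1054 because `orders` has no `status` column. A sketch of both status tasks on made-up rows, assuming the table stores a numeric `order_status` code with 4 meaning Completed and 3 meaning Rejected, as in the public BikeStores sample schema. That mapping is an assumption; running `DESCRIBE orders;` against the real database would confirm the column name and type.

```python
import sqlite3

# Toy stand-in for the orders table (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_status INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 4), (2, 3), (3, 4), (4, 1), (5, 3), (6, 3)])

COMPLETED, REJECTED = 4, 3  # assumed status codes, not confirmed by the notebook

# All completed orders
completed = conn.execute(
    "SELECT order_id FROM orders WHERE order_status = ? ORDER BY order_id",
    (COMPLETED,),
).fetchall()

# Count of rejected orders
rejected_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE order_status = ?", (REJECTED,)
).fetchone()[0]

print(completed, rejected_count)
```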



### Task 5: Count how many customers exist in the customers table.

In [None]:
%%sql
-- Task 5: total number of customers
SELECT COUNT(*) AS total_customers
FROM customers;



total_customers
1445


---
## Level 2: Intermediate – Aggregations and Basic Joins
##### (Tasks 1, 2, 3 using pandas read_sql; Tasks 4, 5 using execute() and fetchall() in mysql.connector)
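
The two access styles named above can be sketched side by side. This uses Python's built-in `sqlite3` with made-up rows so it runs without the MySQL server; the cursor calls mirror the DB-API pattern that mysql.connector also follows, and the pandas part is skipped if pandas is not installed.

```python
import sqlite3

try:
    import pandas as pd  # optional; only needed for the read_sql variant
except ImportError:
    pd = None

# Toy stand-in for order_items (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, list_price REAL)")
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, 2, 100.0), (1, 1, 50.0), (2, 3, 10.0)])

query = "SELECT SUM(quantity * list_price) AS total_sales FROM order_items"

# Cursor style: execute(), then fetchall() returns a list of tuples
cursor = conn.cursor()
cursor.execute(query)
rows = cursor.fetchall()
print(rows)

# pandas style: the same query comes back as a DataFrame
if pd is not None:
    df = pd.read_sql(query, conn)
    print(df)
```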

### Task 1: Calculate the total revenue (sum of list_price * quantity) from the order_items table.


In [None]:
%%sql
-- Task 1: total revenue across all order items
SELECT SUM(quantity * list_price) AS total_sales
FROM order_items;


total_sales
8578988.88


### Task 2: Calculate the average price of products in the products table.


In [None]:
%%sql
-- Task 2: average product price
-- (products has no `price` column, MySQL error 1054; list_price is assumed, as in order_items)
SELECT AVG(list_price) AS avg_price
FROM products;


### Task 3: Retrieve the number of orders placed by each customer.


In [None]:
%%sql
-- Task 3: number of orders per customer
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;


customer_id,order_count
1,3
2,3
3,3
4,3
5,3
6,3
7,3
8,3
9,3
10,3


### Task 4: Join Orders with Customers


#### Task 4.1: Retrieve customer names along with their order IDs.

In [None]:
%%sql
-- Task 4.1: customer names with their order IDs
SELECT c.first_name, c.last_name, o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;


first_name,last_name,order_id
Debra,Burks,599
Debra,Burks,1555
Debra,Burks,1613
Kasha,Todd,692
Kasha,Todd,1084
Kasha,Todd,1509
Tameka,Fisher,1468
Tameka,Fisher,1496
Tameka,Fisher,1612
Daryl,Spence,700


#### Task 4.2: Retrieve Customer Names with Total Orders
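
A minimal sketch for this task on made-up rows, using the `customers` and `orders` column names from Task 4.1. The `LEFT JOIN` is a design choice so that customers without any orders still appear with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Roe');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# COUNT(o.order_id) ignores the NULLs produced by the LEFT JOIN
rows = conn.execute("""
SELECT c.first_name, c.last_name, COUNT(o.order_id) AS total_orders
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_orders DESC, c.customer_id
""").fetchall()

for row in rows:
    print(row)
```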

### Task 5: Retrieve products with stock quantities less than 50.

In [None]:
%%sql
-- Task 5: number of products per category
SELECT category_id, COUNT(*) AS product_count
FROM products
GROUP BY category_id;


category_id,product_count
1,59
2,30
3,78
4,10
5,24
6,60
7,60
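
The task as worded asks for products with stock below 50. A sketch of that filter on made-up rows, assuming a `stocks` table with `store_id`, `product_id`, and `quantity` columns (as in the BikeStores sample schema; the notebook does not show this table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE stocks (store_id INTEGER, product_id INTEGER, quantity INTEGER);  -- assumed layout
INSERT INTO products VALUES (1, 'Bike A'), (2, 'Bike B');
INSERT INTO stocks VALUES (1, 1, 60), (2, 1, 5), (1, 2, 49), (2, 2, 50);
""")

# One row per store/product pair whose quantity is strictly below 50
rows = conn.execute("""
SELECT s.store_id, p.product_name, s.quantity
FROM stocks s
JOIN products p ON p.product_id = s.product_id
WHERE s.quantity < 50
ORDER BY s.store_id, p.product_id
""").fetchall()

print(rows)
```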


---

## Level 3: Advanced – Complex Joins and Subqueries
(using Pandas read_sql)

### Task 1: Retrieve the Top 5 Products with the Highest Total Sales Revenue


In [None]:
%%sql
-- Task 1: five most expensive products
-- (products has no `price` column, MySQL error 1054; list_price is assumed, as in order_items)
SELECT product_name, list_price
FROM products
ORDER BY list_price DESC
LIMIT 5;
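
The task as worded ranks products by total sales revenue rather than by unit price. A sketch of that ranking on made-up rows, with revenue taken as `quantity * list_price` from `order_items`, the same definition the notebook uses elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER, list_price REAL);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, f"P{i}") for i in range(1, 7)])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0), (1, 2, 2, 100.0), (2, 3, 3, 100.0), (2, 4, 4, 100.0),
    (3, 5, 5, 100.0), (3, 6, 6, 100.0), (4, 1, 10, 100.0),
])

# Sum revenue per product, then keep the five largest totals
top5 = conn.execute("""
SELECT p.product_name, SUM(oi.quantity * oi.list_price) AS revenue
FROM products p
JOIN order_items oi ON oi.product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY revenue DESC
LIMIT 5
""").fetchall()

print(top5)
```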


### Task 2: Find the Customers Who Placed the Most Orders in June and November 2018


In [None]:
%%sql
-- Task 2: customers who have never placed an order
SELECT *
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);


customer_id,first_name,last_name,phone,email,street,city,state,zip_code
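
The task as worded counts orders per customer within June and November 2018. A sketch on made-up rows, using half-open date ranges, which behave the same in SQLite (ISO date strings) and MySQL (`DATE` columns). In MySQL, `YEAR(order_date) = 2018 AND MONTH(order_date) IN (6, 11)` would be an alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
INSERT INTO orders VALUES
    (1, 1, '2018-06-03'), (2, 1, '2018-11-20'),
    (3, 2, '2018-06-15'), (4, 2, '2018-07-01'), (5, 2, '2017-06-10');
""")

# Keep only orders dated in June 2018 or November 2018, then count per customer
rows = conn.execute("""
SELECT c.first_name, c.last_name, COUNT(*) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE (o.order_date >= '2018-06-01' AND o.order_date < '2018-07-01')
   OR (o.order_date >= '2018-11-01' AND o.order_date < '2018-12-01')
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count DESC, c.customer_id
""").fetchall()

print(rows)
```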


### Task 3: List All Stores with Their Total Stock Quantities for All Products


In [None]:
%%sql
-- Task 3: order items with their product names
SELECT oi.order_id, p.product_name, oi.quantity, oi.list_price
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id;


order_id,product_name,quantity,list_price
4,Ritchey Timberwolf Frameset - 2016,2,749.99
18,Ritchey Timberwolf Frameset - 2016,2,749.99
26,Ritchey Timberwolf Frameset - 2016,1,749.99
59,Ritchey Timberwolf Frameset - 2016,1,749.99
66,Ritchey Timberwolf Frameset - 2016,1,749.99
93,Ritchey Timberwolf Frameset - 2016,1,749.99
97,Ritchey Timberwolf Frameset - 2016,1,749.99
98,Ritchey Timberwolf Frameset - 2016,1,749.99
114,Ritchey Timberwolf Frameset - 2016,1,749.99
121,Ritchey Timberwolf Frameset - 2016,2,749.99
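
The task as worded totals stock per store. A sketch on made-up rows, assuming `stores(store_id, store_name)` and `stocks(store_id, product_id, quantity)` tables; neither appears in the notebook, so the names are assumptions based on the BikeStores sample schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);           -- assumed layout
CREATE TABLE stocks (store_id INTEGER, product_id INTEGER, quantity INTEGER);  -- assumed layout
INSERT INTO stores VALUES (1, 'Store A'), (2, 'Store B'), (3, 'Store C');
INSERT INTO stocks VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
""")

# LEFT JOIN keeps stores with no stock rows; COALESCE turns their NULL sum into 0
rows = conn.execute("""
SELECT s.store_name, COALESCE(SUM(k.quantity), 0) AS total_stock
FROM stores s
LEFT JOIN stocks k ON k.store_id = s.store_id
GROUP BY s.store_id, s.store_name
ORDER BY s.store_id
""").fetchall()

print(rows)
```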


### Task 4: Retrieve Staff Members Who Work at Stores Located in a Specific State
Target states: California (CA), New York (NY), Texas (TX)


In [None]:
%%sql
-- Task 4: number of orders per store
SELECT store_id, COUNT(*) AS order_count
FROM orders
GROUP BY store_id;


store_id,order_count
1,348
2,1093
3,174
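
The task as worded looks up staff by the state of their store. A sketch on made-up rows, assuming `staffs` carries a `store_id` and `stores` carries a `state` column (not shown in the notebook). The helper `staff_in_state` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
CREATE TABLE staffs (staff_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, store_id INTEGER);
INSERT INTO stores VALUES (1, 'Store A', 'CA'), (2, 'Store B', 'NY'), (3, 'Store C', 'TX');
INSERT INTO staffs VALUES (1, 'Ann', 'Lee', 1), (2, 'Bo', 'Kim', 2),
                          (3, 'Cy', 'Roe', 3), (4, 'Di', 'Fox', 2);
""")

def staff_in_state(conn, state):
    """Return (first_name, last_name, store_name) for staff at stores in `state`."""
    # sqlite3 uses ? placeholders; mysql.connector uses %s instead
    return conn.execute("""
        SELECT f.first_name, f.last_name, s.store_name
        FROM staffs f
        JOIN stores s ON s.store_id = f.store_id
        WHERE s.state = ?
        ORDER BY f.staff_id
    """, (state,)).fetchall()

ny_staff = staff_in_state(conn, "NY")
print(ny_staff)
```

Passing the state as a bound parameter rather than formatting it into the SQL string keeps the query reusable for CA, NY, and TX.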


### Task 5: Identify the Categories of Products with the Highest Total Revenue


In [None]:
%%sql
-- Task 5: category with the most products
SELECT category_id, COUNT(*) AS product_count
FROM products
GROUP BY category_id
ORDER BY product_count DESC
LIMIT 1;


category_id,product_count
3,78
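
The task as worded ranks categories by revenue rather than by product count. A sketch on made-up rows, assuming a `categories(category_id, category_name)` table, which the notebook does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);  -- assumed layout
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER, list_price REAL);
INSERT INTO categories VALUES (1, 'Road'), (2, 'Mountain');
INSERT INTO products VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO order_items VALUES (1, 1, 1, 500.0), (1, 2, 2, 100.0), (2, 3, 1, 200.0);
""")

# Revenue flows from order_items through products up to the category
rows = conn.execute("""
SELECT c.category_name, SUM(oi.quantity * oi.list_price) AS revenue
FROM categories c
JOIN products p ON p.category_id = c.category_id
JOIN order_items oi ON oi.product_id = p.product_id
GROUP BY c.category_id, c.category_name
ORDER BY revenue DESC
""").fetchall()

print(rows)
```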


---
## Level 4: Expert – Advanced Analytical Queries
(using Pandas read_sql)

### Task 1: Find the Store with the Highest Total Revenue
Retrieve the store name and total revenue (sum of list_price * quantity) across all orders.




In [24]:
%%sql
-- Task 1: total revenue per store
SELECT o.store_id, SUM(oi.quantity * oi.list_price) AS revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.store_id;


store_id,revenue
1,1790145.91
2,5826242.21
3,962600.76
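
The query above reports revenue per `store_id`; the task also asks for the store name and only the top store. A sketch of that variant on made-up rows, assuming a `stores(store_id, store_name)` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);  -- assumed layout
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, store_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, list_price REAL);
INSERT INTO stores VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO orders VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO order_items VALUES (1, 1, 300.0), (2, 2, 100.0), (3, 1, 250.0);
""")

# Join in the store name, then keep only the highest-revenue row
top_store = conn.execute("""
SELECT s.store_name, SUM(oi.quantity * oi.list_price) AS revenue
FROM stores s
JOIN orders o ON o.store_id = s.store_id
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY s.store_id, s.store_name
ORDER BY revenue DESC
LIMIT 1
""").fetchone()

print(top_store)
```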


### Task 2: Find all products that have never been ordered.


In [25]:
%%sql
-- Task 2: the customer who spent the most money
SELECT c.customer_id, c.first_name, c.last_name,
       SUM(oi.quantity * oi.list_price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY c.customer_id
ORDER BY total_spent DESC
LIMIT 1;


customer_id,first_name,last_name,total_spent
10,Pamelia,Newman,37801.84
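
The task as worded asks for products that never appear in an order. A sketch using `NOT EXISTS` on made-up rows; a `LEFT JOIN ... WHERE oi.product_id IS NULL` would be an equivalent formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'Bike A'), (2, 'Bike B'), (3, 'Bike C');
INSERT INTO order_items VALUES (1, 1), (2, 1), (3, 3);
""")

# A product qualifies when no order_items row references it
never_ordered = conn.execute("""
SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id
)
ORDER BY p.product_id
""").fetchall()

print(never_ordered)
```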


### Task 3: Identify the Customers Who Placed the Fewest Orders
Retrieve customer names and their total order counts, ordered in ascending order of order count.




In [26]:
%%sql
-- Task 3: best-selling product (highest quantity sold)
SELECT p.product_id, p.product_name, SUM(oi.quantity) AS total_sold
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_sold DESC
LIMIT 1;


product_id,product_name,total_sold
6,Surly Ice Cream Truck Frameset - 2016,167
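
The task as worded lists customers by order count in ascending order. A sketch on made-up rows; picking out the customers tied at the minimum is done in Python here, which is one option among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Roe');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3);
""")

# Order counts per customer, smallest first
rows = conn.execute("""
SELECT c.first_name, c.last_name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count ASC, c.customer_id
""").fetchall()

# Everyone sharing the minimum count (ties included)
fewest = [r for r in rows if r[2] == rows[0][2]]
print(rows)
print(fewest)
```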


### Task 4: Analyze Monthly Revenue for the Last Year (Specific to 2017)


In [27]:
%%sql
-- Task 4: staff members who processed the most orders
SELECT s.staff_id, s.first_name, s.last_name, COUNT(o.order_id) AS orders_processed
FROM staffs s
JOIN orders o ON s.staff_id = o.staff_id
GROUP BY s.staff_id
ORDER BY orders_processed DESC;


staff_id,first_name,last_name,orders_processed
6,Marcelene,Boyer,553
7,Venita,Daniel,540
3,Genna,Serrano,184
2,Mireya,Copeland,164
8,Kali,Vargas,88
9,Layla,Terrell,86
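
The task as worded breaks 2017 revenue down by month. A sketch on made-up rows; SQLite's `strftime('%Y-%m', ...)` stands in for MySQL's `DATE_FORMAT(o.order_date, '%Y-%m')`, which the notebook uses in its Level 4 Task 5 cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, list_price REAL);
INSERT INTO orders VALUES (1, '2017-01-05'), (2, '2017-01-20'), (3, '2017-02-10'),
                          (4, '2018-01-03'), (5, '2016-12-31');
INSERT INTO order_items VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 1, 70.0),
                               (4, 1, 999.0), (5, 1, 999.0);
""")

# Restrict to 2017 with a half-open range, then group by year-month
monthly_2017 = conn.execute("""
SELECT strftime('%Y-%m', o.order_date) AS month,
       SUM(oi.quantity * oi.list_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.order_date >= '2017-01-01' AND o.order_date < '2018-01-01'
GROUP BY month
ORDER BY month
""").fetchall()

print(monthly_2017)
```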


### Task 5: Find Products That Are Low in Stock Across All Stores
Identify products where the total stock quantity (across all stores) is less than 100.

In [28]:
%%sql
-- Task 5: the month with the highest sales
SELECT DATE_FORMAT(o.order_date, '%Y-%m') AS month,
       SUM(oi.quantity * oi.list_price) AS revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY month
ORDER BY revenue DESC
LIMIT 1;


month,revenue
2018-04,909179.47
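
The task as worded sums stock across stores and keeps products below 100 units. A sketch on made-up rows, assuming a `stocks(store_id, product_id, quantity)` table. The threshold goes in `HAVING` because it applies to the aggregated total, not to individual rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE stocks (store_id INTEGER, product_id INTEGER, quantity INTEGER);  -- assumed layout
INSERT INTO products VALUES (1, 'Bike A'), (2, 'Bike B');
INSERT INTO stocks VALUES (1, 1, 40), (2, 1, 30), (3, 1, 20), (1, 2, 60), (2, 2, 50);
""")

# Total stock per product across all stores, filtered after aggregation
low_stock = conn.execute("""
SELECT p.product_name, SUM(s.quantity) AS total_stock
FROM products p
JOIN stocks s ON s.product_id = p.product_id
GROUP BY p.product_id, p.product_name
HAVING SUM(s.quantity) < 100
ORDER BY total_stock
""").fetchall()

print(low_stock)
```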



---

## Level 5: Additional Advanced SQL
(using Pandas read_sql)

### Task 1: Aggregate Sales by City and State
Write a query to calculate the total sales, average order value, and maximum order value for customers in each city and state.


In [29]:
%%sql
-- Extra Task 1: top 3 customers by number of orders
SELECT c.customer_id, c.first_name, c.last_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count DESC
LIMIT 3;


customer_id,first_name,last_name,order_count
1,Debra,Burks,3
2,Kasha,Todd,3
3,Tameka,Fisher,3
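
The task as worded needs a value per order before aggregating by location. A sketch on made-up rows: a derived table computes each order's total, and the outer query groups by city and state. Order value is taken as `quantity * list_price`, ignoring any discount, which matches the revenue definition used elsewhere in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, list_price REAL);
INSERT INTO customers VALUES (1, 'Buffalo', 'NY'), (2, 'Buffalo', 'NY'), (3, 'Campbell', 'CA');
INSERT INTO orders VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO order_items VALUES (1, 1, 100.0), (1, 1, 50.0), (2, 1, 250.0), (3, 2, 100.0);
""")

rows = conn.execute("""
SELECT c.city, c.state,
       SUM(t.order_total) AS total_sales,
       AVG(t.order_total) AS avg_order_value,
       MAX(t.order_total) AS max_order_value
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN (
    -- one row per order with its total value
    SELECT order_id, SUM(quantity * list_price) AS order_total
    FROM order_items
    GROUP BY order_id
) t ON t.order_id = o.order_id
GROUP BY c.city, c.state
ORDER BY total_sales DESC
""").fetchall()

print(rows)
```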


### Task 2: Find Top-Selling Products in 2018
Write a query to identify the products with the highest sales volume in the year 2018.




In [30]:
%%sql
-- Extra Task 2: the product with the highest revenue (quantity × price)
SELECT p.product_id, p.product_name, SUM(oi.quantity * oi.list_price) AS revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY revenue DESC
LIMIT 1;


product_id,product_name,revenue
7,Trek Slash 8 27.5 - 2016,615998.46
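
The task as worded ranks products by units sold within 2018. A sketch on made-up rows, reading "sales volume" as the sum of `quantity`; that reading is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES (1, 'Bike A'), (2, 'Bike B');
INSERT INTO orders VALUES (1, '2018-03-01'), (2, '2018-09-09'), (3, '2017-05-05');
INSERT INTO order_items VALUES (1, 1, 2), (1, 2, 1), (2, 2, 4), (3, 1, 50);
""")

# Only orders dated in 2018 contribute to the unit totals
top_2018 = conn.execute("""
SELECT p.product_name, SUM(oi.quantity) AS units_sold
FROM order_items oi
JOIN orders o ON o.order_id = oi.order_id
JOIN products p ON p.product_id = oi.product_id
WHERE o.order_date >= '2018-01-01' AND o.order_date < '2019-01-01'
GROUP BY p.product_id, p.product_name
ORDER BY units_sold DESC
""").fetchall()

print(top_2018)
```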


### Task 3: Calculate Month-over-Month Sales Growth
Write a query to calculate the month-over-month sales growth across all years in the data.


In [31]:
%%sql
-- Extra Task 3: number of customers in each city
SELECT city, COUNT(*) AS total_customers
FROM customers
GROUP BY city
ORDER BY total_customers DESC;


city,total_customers
Mount Vernon,20
Scarsdale,17
Ballston Spa,17
Canandaigua,14
Longview,13
Ossining,13
Floral Park,13
Sunnyside,12
Astoria,12
Richmond Hill,12
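
The task as worded compares each month's revenue with the previous month's. A sketch on made-up rows using the `LAG` window function, which needs SQLite 3.25 or later here and MySQL 8.0 or later on the server; in MySQL the month label would come from `DATE_FORMAT(o.order_date, '%Y-%m')` instead of `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, list_price REAL);
INSERT INTO orders VALUES (1, '2017-01-10'), (2, '2017-02-10'), (3, '2017-03-10');
INSERT INTO order_items VALUES (1, 1, 100.0), (2, 1, 150.0), (3, 1, 120.0);
""")

growth = conn.execute("""
WITH monthly AS (
    SELECT strftime('%Y-%m', o.order_date) AS month,
           SUM(oi.quantity * oi.list_price) AS revenue
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY month
)
SELECT month,
       revenue,
       -- percentage change versus the previous month; NULL for the first month
       ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
             / LAG(revenue) OVER (ORDER BY month), 2) AS growth_pct
FROM monthly
ORDER BY month
""").fetchall()

for row in growth:
    print(row)
```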
