
# Workflow Practice

In this notebook, you’ll practice connecting to a SQLite database, creating tables from CSV files using Pandas, and writing SQL queries to explore the data.

The dataset comes from the [Bike Store Sample Database](https://www.kaggle.com/datasets/dillonmyrick/bike-store-sample-database) by Dillon Myrick. It models a fictional bike retailer with multiple stores, products, customers, and staff. Each table connects to others using foreign keys such as `customer_id`, `store_id`, and `product_id`.

You’ll:
- Connect to a local SQLite database
- Create tables using `DataFrame.to_sql()`
- Write and test SQL queries using `pd.read_sql()`

All of your work will take place directly in this notebook. Each question prompt is written below as a Markdown cell, followed by an empty code cell for you to write your query.



## Step 1: Connect to the Database

Run the following cell to connect to (or create) a SQLite database called `bike_store.db`.  
If the file doesn’t exist yet, SQLite will automatically create it.


In [2]:
import sqlite3
import pandas as pd

In [3]:
connection = sqlite3.connect("bike_store.db")
connection

<sqlite3.Connection at 0x29942c5d300>
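
The create-if-missing behavior can be checked in isolation. The sketch below is only an illustration: it uses a throwaway temporary directory and a hypothetical file name `demo.db` rather than `bike_store.db`, and it writes one small table before checking that the file exists on disk.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this demonstration.
    db_path = os.path.join(tmp, "demo.db")
    existed_before = os.path.exists(db_path)

    # Connecting to a path that does not exist yet creates a new database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    conn.close()  # close before the temporary directory is cleaned up

    exists_after = os.path.exists(db_path)

print(existed_before, exists_after)
```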


## Step 2: Create Tables from CSV Files

The `data/` folder contains one CSV file per table.  
Use `pandas.read_csv()` and `DataFrame.to_sql()` to load each file into your database.

You only need to do this once.  
After that, you’ll be able to run queries against your newly created tables.


In [4]:
# Example for one file
customers = pd.read_csv("data/customers.csv")
customers.to_sql("customers", connection, if_exists="replace", index=False)

1445
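
To see what `if_exists="replace"` and `index=False` do without touching the real files, here is a minimal sketch on a made-up two-row DataFrame in an in-memory database. The table name `toy_customers` and its contents are assumptions for illustration only.

```python
import sqlite3
import pandas as pd

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Made-up rows standing in for a CSV file.
toy = pd.DataFrame({"customer_id": [1, 2], "city": ["Austin", "Boston"]})

# Running to_sql twice with if_exists="replace" drops and recreates the table,
# so the rows are not duplicated.
toy.to_sql("toy_customers", conn, if_exists="replace", index=False)
toy.to_sql("toy_customers", conn, if_exists="replace", index=False)

result = pd.read_sql("SELECT * FROM toy_customers", conn)
columns = list(result.columns)  # index=False means no extra index column
row_count = len(result)
print(columns, row_count)
```

With `if_exists="append"` the second call would add the rows again, which is why `"replace"` is the safer choice when a cell may be re-run.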

In [5]:
# Repeat for all other files in the data folder, or use a loop.
csv_files = ['data/brands.csv', 'data/categories.csv', 'data/order_items.csv',
             'data/orders.csv', 'data/products.csv', 'data/staffs.csv',
             'data/stocks.csv', 'data/stores.csv']
for file in csv_files:
    df = pd.read_csv(file)
    # Derive the table name from the file path, e.g. 'data/brands.csv' -> 'brands'
    table_name = file.replace('data/', '').replace('.csv', '')
    df.to_sql(name=table_name, con=connection, if_exists='replace', index=False)

### Verify Your Tables

Run a query to make sure your tables were created successfully.

In [6]:

pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", connection)


Unnamed: 0,name
0,data/brands
1,data/categories
2,data/order_items
3,data/orders
4,data/products
5,data/staffs
6,data/stocks
7,data/stores
8,customers
9,brands
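
The listing above includes tables whose names start with `data/`, presumably left over from an earlier run in which the folder prefix was not stripped from the table name. The sketch below shows one way such tables could be dropped. It runs against a toy in-memory database, and the two table definitions in it are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy setup: one stale table with a slash in its name, one correctly named table.
conn.execute('CREATE TABLE "data/brands" (brand_id INTEGER)')
conn.execute("CREATE TABLE brands (brand_id INTEGER)")

# Find every table whose name starts with 'data/'.
stale = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'data/%'"
)]

for name in stale:
    # The name must be double-quoted because it contains a slash.
    conn.execute(f'DROP TABLE "{name}"')

remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
print(remaining)
```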


## Step 3: Test a Simple Query

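Before starting the exercises, confirm your connection and tables are working by running a simple query against one of the tables. The cell below selects the `discount` column from `order_items`; previewing the first few rows of the `customers` table works just as well.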

In [7]:

pd.read_sql("SELECT discount from order_items", connection)


Unnamed: 0,discount
0,0.20
1,0.07
2,0.05
3,0.05
4,0.20
...,...
4717,0.07
4718,0.20
4719,0.20
4720,0.07
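
To preview only the first few rows of a table instead of a whole column, a `LIMIT` clause can be added to the query. The following is a small sketch on a made-up table named `toy_customers`; it uses the plain `sqlite3` cursor, but the same SQL string could be passed to `pd.read_sql()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_customers (customer_id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO toy_customers VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1, 11)],  # ten made-up rows
)

# LIMIT caps the number of rows returned, which keeps previews short.
preview = conn.execute("SELECT * FROM toy_customers LIMIT 5").fetchall()
print(len(preview))
```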


### Q1. List all customers and their cities.

Return the first name, last name, and city of each customer. Sort alphabetically by last name and then by first name.

In [8]:
# Your query here
query1 = """ 
"""

### Q2. Show all products and their prices.

Display each product name along with its list price. Sort by price in descending order.

In [9]:
# Your query here

### Q3. Find all customers from California.

Return first name, last name, city, and state for all customers whose state is 'CA'. Sort alphabetically by last name.

In [10]:
# Your query here

### Q4. Count how many products are in each category.

Return the category name and the number of products in that category. Sort from the highest count to the lowest.

In [11]:
# Your query here
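
As a hint for the general pattern, counting rows per group combines a join, `COUNT`, and `GROUP BY`. The sketch below uses made-up tables (`toy_categories`, `toy_products`), not the bike store data, so the table and column names are assumptions you would need to adapt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_categories (category_id INTEGER, category_name TEXT);
    CREATE TABLE toy_products (product_id INTEGER, category_id INTEGER);
    INSERT INTO toy_categories VALUES (1, 'Road'), (2, 'Kids');
    INSERT INTO toy_products VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

# One output row per category, with the number of matching products.
counts = conn.execute("""
    SELECT c.category_name, COUNT(p.product_id) AS product_count
    FROM toy_categories c
    JOIN toy_products p ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY product_count DESC
""").fetchall()
print(counts)
```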

### Q5. Find all orders placed in 2018.

List the order ID, order date, and customer ID for orders made during the year 2018. Sort by order date.

In [12]:
# Your query here
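
SQLite has no dedicated date type, so dates loaded from a CSV are typically stored as text. Assuming the dates are in ISO `YYYY-MM-DD` form (an assumption worth checking against the actual `orders` table), text comparison sorts them chronologically and a year can be selected with a range filter. The sketch below shows this on a made-up `toy_orders` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO toy_orders VALUES (?, ?)",
    [(1, "2017-12-31"), (2, "2018-06-15"), (3, "2018-01-01"), (4, "2019-01-01")],
)

# Half-open range: from 1 January 2018 up to, but not including, 1 January 2019.
orders_2018 = conn.execute("""
    SELECT order_id, order_date
    FROM toy_orders
    WHERE order_date >= '2018-01-01' AND order_date < '2019-01-01'
    ORDER BY order_date
""").fetchall()
print(orders_2018)
```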

### Q6. Show each order with its total number of items.

Join the `orders` and `order_items` tables. Group by order ID and return the number of items per order.

In [13]:
# Your query here

### Q7. List total revenue per store.

Revenue = quantity * list_price * (1 - discount). Join `orders`, `order_items`, and `stores`, group by store name, and return total revenue.

In [19]:
pd.set_option('display.float_format', '{:.2f}'.format)

# Your query here
query7 = """
SELECT
    SUM(order_items.quantity * order_items.list_price * (1 - order_items.discount)) AS total_revenue,
    stores.store_name
FROM order_items
JOIN orders ON order_items.order_id = orders.order_id
JOIN stores ON orders.store_id = stores.store_id
GROUP BY stores.store_name
"""
pd.read_sql(query7, connection)

Unnamed: 0,total_revenue,store_name
0,5215751.28,Baldwin Bikes
1,867542.24,Rowlett Bikes
2,1605823.04,Santa Cruz Bikes
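
As a sanity check on the revenue formula, it helps to run it on a few rows small enough to verify by hand. The sketch below uses a made-up `toy_items` table in which `discount` is a fraction (0.25 means 25% off), matching how the `discount` column appears earlier in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_items (store TEXT, quantity INTEGER, list_price REAL, discount REAL)"
)
conn.executemany(
    "INSERT INTO toy_items VALUES (?, ?, ?, ?)",
    [
        ("A", 2, 100.0, 0.25),  # 2 * 100 * 0.75 = 150
        ("A", 1, 200.0, 0.5),   # 1 * 200 * 0.50 = 100
        ("B", 1, 80.0, 0.0),    # 1 *  80 * 1.00 =  80
    ],
)

# Same formula as in Q7: quantity * list_price * (1 - discount), summed per store.
revenue = dict(conn.execute("""
    SELECT store, SUM(quantity * list_price * (1 - discount))
    FROM toy_items
    GROUP BY store
""").fetchall())
print(revenue)
```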


### Q8. Find the top 5 customers who spent the most overall.

Join `customers`, `orders`, and `order_items`. Sum the total spending per customer and return the top five spenders.

In [15]:
# Your query here

### Q9. Show the best-selling product in each category.

Join `products`, `order_items`, and `categories`. For each category, identify the product with the highest total quantity sold.

In [22]:
# Your query here
query9 = """ 
WITH category_product_sales AS (
    SELECT
        c.category_name,
        p.product_name,
        SUM(oi.quantity) AS total_sold,
        RANK() OVER (
            PARTITION BY c.category_name
            ORDER BY SUM(oi.quantity) DESC
        ) AS rnk
    FROM categories c
    JOIN products p ON c.category_id = p.category_id
    JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY c.category_name, p.product_name
)
SELECT category_name, product_name, total_sold
FROM category_product_sales
WHERE rnk = 1
ORDER BY category_name;
"""
pd.read_sql(query9, connection)

Unnamed: 0,category_name,product_name,total_sold
0,Children Bicycles,Electra Girl's Hawaii 1 (20-inch) - 2015/2016,154
1,Comfort Bicycles,Electra Townie Original 7D - 2015/2016,148
2,Cruisers Bicycles,Electra Cruiser 1 (24-Inch) - 2016,157
3,Cyclocross Bicycles,Surly Straggler 650b - 2016,151
4,Electric Bikes,Trek Conduit+ - 2016,145
5,Mountain Bikes,Surly Ice Cream Truck Frameset - 2016,167
6,Road Bikes,Trek Domane SLR 6 Disc - 2017,43


In [21]:
pd.read_sql(""" 
SELECT
        c.category_name,
        p.product_name,
        SUM(oi.quantity) AS total_sold,
        RANK() OVER (
            PARTITION BY c.category_name
            ORDER BY SUM(oi.quantity) DESC
        ) AS rnk
    FROM categories c
    JOIN products p ON c.category_id = p.category_id
    JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY c.category_name, p.product_name
""", connection)


Unnamed: 0,category_name,product_name,total_sold,rnk
0,Children Bicycles,Electra Girl's Hawaii 1 (20-inch) - 2015/2016,154,1
1,Children Bicycles,Electra Girl's Hawaii 1 (16-inch) - 2015/2016,145,2
2,Children Bicycles,Electra Cruiser 1 (24-Inch) - 2016,139,3
3,Children Bicycles,"Electra Girl's Hawaii 1 16"" - 2017",47,4
4,Children Bicycles,Electra Townie 7D (20-inch) - Boys' - 2017,40,5
...,...,...,...,...
302,Road Bikes,Trek Domane SL Frameset Women's - 2018,1,47
303,Road Bikes,Trek Domane SL 5 Women's - 2018,1,47
304,Road Bikes,Trek Domane ALR 3 - 2018,1,47
305,Road Bikes,Trek CrossRip 2 - 2018,1,47


In [None]:
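# Alternative without a window function. Selecting product_name alongside
# max(total_sold) relies on SQLite's bare-column behavior: the bare column is
# taken from the row that holds the maximum. Ties are not reported.
pd.read_sql("""
WITH category_product_sales AS (
    SELECT
        c.category_name,
        p.product_name,
        SUM(oi.quantity) AS total_sold
    FROM categories c
    JOIN products p ON c.category_id = p.category_id
    JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY c.category_name, p.product_name
)
SELECT category_name, product_name, max(total_sold)
FROM category_product_sales
GROUP BY category_name
""", connection)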

Unnamed: 0,category_name,product_name,max(total_sold)
0,Children Bicycles,Electra Girl's Hawaii 1 (20-inch) - 2015/2016,154
1,Comfort Bicycles,Electra Townie Original 7D - 2015/2016,148
2,Cruisers Bicycles,Electra Cruiser 1 (24-Inch) - 2016,157
3,Cyclocross Bicycles,Surly Straggler 650b - 2016,151
4,Electric Bikes,Trek Conduit+ - 2016,145
5,Mountain Bikes,Surly Ice Cream Truck Frameset - 2016,167
6,Road Bikes,Trek Domane SLR 6 Disc - 2017,43
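
The `RANK()` query and the `max()` query return the same rows here, but they can differ when two products tie for first place in a category. The sketch below illustrates this on a made-up `sales` table in which products A and B tie. It assumes a SQLite version with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, product TEXT, total_sold INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Road", "A", 10), ("Road", "B", 10), ("Road", "C", 3), ("Kids", "D", 7)],
)

# RANK() gives every tied product rank 1, so both A and B are returned.
ranked = conn.execute("""
    SELECT category, product
    FROM (
        SELECT category, product,
               RANK() OVER (PARTITION BY category ORDER BY total_sold DESC) AS rnk
        FROM sales
    )
    WHERE rnk = 1
    ORDER BY category, product
""").fetchall()

# GROUP BY with max() returns exactly one row per category; which of the
# tied products appears in the bare column is not something to rely on.
bare = conn.execute("""
    SELECT category, product, MAX(total_sold)
    FROM sales
    GROUP BY category
    ORDER BY category
""").fetchall()

print(ranked)
print(bare)
```

If ties matter for the analysis, the window-function version is the safer choice. It is also the more portable one, since many other databases reject a bare column next to an aggregate.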


### Q10. Identify the employees (staff) who processed the most orders.

Join `staffs` and `orders`. Count the number of orders handled by each staff member and return the results sorted by highest total.

In [17]:
# Your query here