
# Ungraded Lab: Join Practice Lab

## 📋 Overview
Welcome to BookCycle's data analysis team! In this lab, you'll explore advanced SQL techniques by learning how to use JOIN operations to combine data from multiple tables. You'll help BookCycle's management understand the relationships between customers, transactions, and books, providing valuable insights for business decisions.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Implement INNER JOIN to combine data from two related tables
- Use LEFT JOIN to include all records from one table and matching records from another
- Write multi-table joins to answer complex business questions
- Apply joins to real-world scenarios in a book retail context

## 📚 Dataset Information
You'll be working with three main tables in the BookCycle database:
1. <b>customers:</b> Contains customer information including IDs, join dates, and preferences
2. <b>transactions:</b> Records of book purchases, including transaction details and customer IDs
3. <b>books:</b> Inventory information about the books, including titles, authors, and prices


## 🖥️ Activities

### Activity 1: Understanding INNER JOIN

BookCycle wants to analyze customer purchases by combining customer and transaction data.

<b>Step 1:</b> Import the necessary libraries and connect to the database:

In [1]:
import sqlite3
import pandas as pd

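# Setting up the database. DO NOT edit the code given below
try:
    from db_setup import setup_database
    setup_database()
except ModuleNotFoundError:
    # db_setup is only available in the course environment.
    # Elsewhere, the next cell builds a small sample database instead.
    print("db_setup not found; run the next cell to create a sample database.")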

In [2]:
conn = sqlite3.connect('bookcycle.db')
cursor = conn.cursor()

# Drop tables if they exist to start fresh
cursor.execute("DROP TABLE IF EXISTS customers")
cursor.execute("DROP TABLE IF EXISTS transactions")
cursor.execute("DROP TABLE IF EXISTS books")

# Create customers table
cursor.execute("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    join_date TEXT,
    preferred_store TEXT
);
""")

# Create transactions table
cursor.execute("""
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    book_id INTEGER,
    date_time TEXT,
    sale_price REAL,
    store_location TEXT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id)
);
""")

# Create books table
cursor.execute("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT,
    author TEXT,
    genre TEXT,
    price REAL
);
""")

# Insert sample data into customers
customers_data = [
    (1, 'Alice', 'Smith', '2023-01-15', 'Downtown'),
    (2, 'Bob', 'Johnson', '2023-02-20', 'Uptown'),
    (3, 'Charlie', 'Brown', '2023-03-10', 'Downtown'),
    (4, 'Diana', 'Prince', '2023-04-05', 'Midtown'),
    (5, 'Eve', 'Adams', '2023-05-01', 'Uptown')
]
cursor.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", customers_data)

# Insert sample data into books
books_data = [
    (101, 'The Great Adventure', 'A. Author', 'Fiction', 15.99),
    (102, 'Data Science for All', 'B. Data', 'Non-Fiction', 29.99),
    (103, 'Mystery on Elm Street', 'C. Thriller', 'Mystery', 12.50),
    (104, 'Cooking with Love', 'D. Chef', 'Cooking', 22.00),
    (105, 'Gardening Basics', 'E. Green', 'Gardening', 18.75)
]
cursor.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", books_data)

# Insert sample data into transactions
transactions_data = [
    (1001, 1, 101, '2023-01-20 10:00:00', 15.99, 'Downtown'),
    (1002, 2, 102, '2023-02-25 11:30:00', 29.99, 'Uptown'),
    (1003, 1, 103, '2023-03-01 14:15:00', 12.50, 'Downtown'),
    (1004, 3, 101, '2023-03-15 09:45:00', 15.99, 'Downtown'),
    (1005, 4, 104, '2023-04-10 16:00:00', 22.00, 'Midtown'),
    (1006, 2, 105, '2023-05-05 13:00:00', 18.75, 'Uptown'),
    (1007, 1, 102, '2023-05-10 10:30:00', 29.99, 'Downtown'),
    (1008, 3, 103, '2023-05-12 11:00:00', 12.50, 'Downtown')
]
cursor.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", transactions_data)

conn.commit()
conn.close()
print("Sample database 'bookcycle.db' created and populated successfully!")

Sample database 'bookcycle.db' created and populated successfully!


In [3]:
conn = sqlite3.connect('bookcycle.db')

<b>Step 2:</b> Let's start with a simple INNER JOIN to get each customer's ID and join date along with their transaction details:

In [6]:
query = """
SELECT c.customer_id, c.join_date, t.transaction_id, t.date_time, t.sale_price
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
ORDER BY t.transaction_id  -- without ORDER BY, LIMIT may return any 5 rows
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

Unnamed: 0,customer_id,join_date,transaction_id,date_time,sale_price
0,1,2023-01-15,1001,2023-01-20 10:00:00,15.99
1,2,2023-02-20,1002,2023-02-25 11:30:00,29.99
2,1,2023-01-15,1003,2023-03-01 14:15:00,12.5
3,3,2023-03-10,1004,2023-03-15 09:45:00,15.99
4,4,2023-04-05,1005,2023-04-10 16:00:00,22.0


<b>Step 3: Try it yourself:</b> Write a query to get the customer's preferred store along with their transaction details:

In [7]:
query = """
SELECT c.customer_id, c.first_name, c.last_name, c.preferred_store, t.transaction_id, t.date_time, t.sale_price
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
ORDER BY t.transaction_id  -- without ORDER BY, LIMIT may return any 5 rows
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

Unnamed: 0,customer_id,first_name,last_name,preferred_store,transaction_id,date_time,sale_price
0,1,Alice,Smith,Downtown,1001,2023-01-20 10:00:00,15.99
1,2,Bob,Johnson,Uptown,1002,2023-02-25 11:30:00,29.99
2,1,Alice,Smith,Downtown,1003,2023-03-01 14:15:00,12.5
3,3,Charlie,Brown,Downtown,1004,2023-03-15 09:45:00,15.99
4,4,Diana,Prince,Midtown,1005,2023-04-10 16:00:00,22.0


 <b>üí° Tip:</b> Remember to include the new column in your SELECT statement and keep the join condition the same.

### Activity 2: Exploring LEFT JOIN

BookCycle wants to identify customers who haven't made any purchases yet.

<b>Step 1:</b> Here's an example of a LEFT JOIN to get all customers and their transactions (if any):

In [8]:
query = """
SELECT c.customer_id, c.join_date, t.transaction_id
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id
ORDER BY c.customer_id, t.transaction_id  -- without ORDER BY, LIMIT may return any 10 rows
LIMIT 10;
"""

df = pd.read_sql_query(query, conn)
display(df)

Unnamed: 0,customer_id,join_date,transaction_id
0,1,2023-01-15,1001.0
1,1,2023-01-15,1003.0
2,1,2023-01-15,1007.0
3,2,2023-02-20,1002.0
4,2,2023-02-20,1006.0
5,3,2023-03-10,1004.0
6,3,2023-03-10,1008.0
7,4,2023-04-05,1005.0
8,5,2023-05-01,


<b>Step 2: Try it yourself:</b> Write a query to find customers who haven't made any purchases:

In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

 <b>üí° Tip:</b> Use a WHERE clause to filter for NULL transaction_id values.

### Activity 3: Multi-table Joins

BookCycle wants to analyze which books are popular in different store locations.

<b>Step 1:</b> Here's an example of joining three tables (customers, transactions, and books) to count how many times each book was purchased at each store location:

In [9]:
query = """
SELECT t.store_location, b.title, COUNT(*) as purchase_count
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
JOIN books b ON t.book_id = b.book_id
GROUP BY t.store_location, b.title
ORDER BY t.store_location, b.title  -- without ORDER BY, LIMIT may return any 5 groups
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

Unnamed: 0,store_location,title,purchase_count
0,Downtown,Data Science for All,1
1,Downtown,Mystery on Elm Street,2
2,Downtown,The Great Adventure,2
3,Midtown,Cooking with Love,1
4,Uptown,Data Science for All,1


<b>Step 2:</b> Write a query to find the most popular book (by purchase count) for each store location:

In [10]:
query = """
SELECT rb.store_location, rb.title, rb.purchase_count
FROM (
    SELECT
        t.store_location,
        b.title,
        COUNT(*) AS purchase_count
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    JOIN books b ON t.book_id = b.book_id
    GROUP BY t.store_location, b.title
) AS rb
WHERE rb.purchase_count = (
    -- Get the max purchase count for each store_location
    SELECT MAX(sub.purchase_count)
    FROM (
        SELECT
            t2.store_location,
            b2.title,
            COUNT(*) AS purchase_count
        FROM customers c2
        JOIN transactions t2 ON c2.customer_id = t2.customer_id
        JOIN books b2 ON t2.book_id = b2.book_id
        GROUP BY t2.store_location, b2.title
    ) AS sub
    WHERE sub.store_location = rb.store_location
)
ORDER BY rb.store_location, rb.title;

"""

df = pd.read_sql_query(query, conn)
display(df)

Unnamed: 0,store_location,title,purchase_count
0,Downtown,Mystery on Elm Street,2
1,Downtown,The Great Adventure,2
2,Midtown,Cooking with Love,1
3,Uptown,Data Science for All,1
4,Uptown,Gardening Basics,1


<b>üí° Tip:</b> Here you are using GROUP BY, ORDER BY, and a subquery, which you will learn more about in upcoming modules.

#### Close the Connection
It's good practice to close the database connection when you're done.

In [None]:
# Close the database connection
conn.close()
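
If you would rather not rely on remembering the final `close()` call, one option is `contextlib.closing` from the standard library, which closes the connection when the `with` block ends, even if a query raises an error. A minimal sketch, assuming a throwaway in-memory database (`demo` is an illustrative name):

```python
import sqlite3
from contextlib import closing

# closing() calls demo.close() automatically when the block exits.
with closing(sqlite3.connect(":memory:")) as demo:
    demo.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
    demo.execute("INSERT INTO customers VALUES (1)")
    customer_count = demo.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# The connection is closed now, so further queries are rejected.
try:
    demo.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(customer_count, still_open)
```

Using a `sqlite3` connection directly as a context manager (`with conn:`) manages transactions, not the connection's lifetime, which is why `closing` is used here.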

## ✅ Success Checklist
- Successfully implemented INNER JOIN to combine customer and transaction data
- Used LEFT JOIN to identify customers without purchases
- Created a multi-table join to analyze book popularity by store location
- All queries run without errors and produce meaningful results

## 🔍 Common Issues & Solutions

- Problem: Join returning unexpected number of rows
    - Solution: Double-check your join conditions and ensure you're not creating unintended Cartesian products

- Problem: Column ambiguity errors
    - Solution: Always qualify column names with table aliases when joining tables
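
Both issues can be reproduced on a small scale. The sketch below assumes a throwaway in-memory database with three customers and three transactions (`demo` and the reduced tables are illustrative, not the lab's `bookcycle.db`).

```python
import sqlite3

# Hypothetical in-memory tables: 3 customers, 3 transactions.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (5, 'Eve');
INSERT INTO transactions VALUES (1001, 1), (1002, 2), (1003, 1);
""")

# Issue 1: no join condition pairs every customer with every transaction.
cross_count = demo.execute(
    "SELECT COUNT(*) FROM customers c, transactions t"
).fetchone()[0]

# With the join condition, only matching pairs remain.
joined_count = demo.execute("""
SELECT COUNT(*)
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id
""").fetchone()[0]

# Issue 2: customer_id exists in both tables, so an unqualified
# reference is rejected by SQLite.
try:
    demo.execute("""
    SELECT customer_id
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    """)
    ambiguity_error = None
except sqlite3.OperationalError as exc:
    ambiguity_error = str(exc)

demo.close()
print(cross_count, joined_count, ambiguity_error)
```

Comparing a query's row count against what you expect from the join condition is a quick way to spot an accidental Cartesian product, and writing `c.customer_id` or `t.customer_id` resolves the ambiguity error.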

## ➡️ Summary
Great job completing this lab on SQL joins! You've gained valuable skills in combining data from multiple tables, which is crucial for comprehensive data analysis in real-world scenarios.

### 🔑 Key Points
- INNER JOIN combines rows from two tables based on a matching condition
- LEFT JOIN returns all rows from the left table and matching rows from the right table, filling the right table's columns with NULL where there is no match
- Multi-table joins allow for complex analyses across various data entities