### EXAMPLE:

In [None]:
# Step 1: Import necessary libraries
import sqlite3
import pandas as pd

# Step 2: Connect to a SQLite database
conn = sqlite3.connect('sales_data.db')
cursor = conn.cursor()

# Step 3: Create tables
cursor.execute('''
CREATE TABLE IF NOT EXISTS sales (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_id INTEGER,
    sale_date TEXT,
    amount REAL
);
''')

cursor.execute('''
CREATE TABLE IF NOT EXISTS products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT
);
''')

# Step 4: Insert data
sales_data = [
    (1, 101, 1, '2023-01-01', 150.00),
    (2, 102, 2, '2023-01-02', 200.50),
    (3, 101, 3, '2023-01-03', 75.25),
    (4, 103, 1, '2023-01-04', 150.00),
    (5, 102, 2, '2023-01-05', 200.50)
]
cursor.executemany("INSERT OR IGNORE INTO sales VALUES (?, ?, ?, ?, ?)", sales_data)

products_data = [
    (1, 'Laptop'),
    (2, 'Monitor'),
    (3, 'Mouse')
]
cursor.executemany("INSERT OR IGNORE INTO products VALUES (?, ?)", products_data)

conn.commit()
print("Database populated successfully!")

# Step 5: Simple SQL query
query_1 = "SELECT * FROM sales WHERE amount > 150;"
df_high_sales = pd.read_sql_query(query_1, conn)
print("\n--- Sales with Amount > $150 ---")
print(df_high_sales)

# Step 6: JOIN query
query_2 = """
SELECT s.order_id, s.sale_date, s.amount, p.product_name
FROM sales AS s
JOIN products AS p
ON s.product_id = p.product_id;
"""
df_sales = pd.read_sql_query(query_2, conn)
print("\n--- Sales with Product Names ---")
print(df_sales)

# Step 7: GROUP BY query
query_3 = """
SELECT customer_id, SUM(amount) AS total_amount
FROM sales
GROUP BY customer_id
ORDER BY total_amount DESC;
"""
df_summary = pd.read_sql_query(query_3, conn)
print("\n--- Total Sales per Customer ---")
print(df_summary)

# Step 8: HAVING query (filter aggregated results)
query_4 = """
SELECT customer_id, SUM(amount) AS total_amount
FROM sales
GROUP BY customer_id
HAVING SUM(amount) > 200
ORDER BY total_amount DESC;
"""
df_having = pd.read_sql_query(query_4, conn)
print("\n--- Customers with Total Sales > $200 ---")
print(df_having)

# Step 9: Close connection
conn.close()
print("\nConnection to database closed.")


Database populated successfully!

--- Sales with Amount > $150 ---
   order_id  customer_id  product_id   sale_date  amount
0         2          102           2  2023-01-02   200.5
1         5          102           2  2023-01-05   200.5

--- Sales with Product Names ---
   order_id   sale_date  amount product_name
0         1  2023-01-01  150.00       Laptop
1         2  2023-01-02  200.50      Monitor
2         3  2023-01-03   75.25        Mouse
3         4  2023-01-04  150.00       Laptop
4         5  2023-01-05  200.50      Monitor

--- Total Sales per Customer ---
   customer_id  total_amount
0          102        401.00
1          101        225.25
2          103        150.00

--- Customers with Total Sales > $200 ---
   customer_id  total_amount
0          102        401.00
1          101        225.25

Connection to database closed.
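
Step 8 filters on an aggregate with `HAVING`. As a contrast, here is a minimal sketch of how `WHERE` and `HAVING` act at different stages of a query. It assumes a throwaway in-memory database (`:memory:`) and a trimmed three-column `sales` table, neither of which is part of the example above; the amounts are copied from Step 4.

```python
import sqlite3

# Assumption: an in-memory database, so sales_data.db is not touched
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 101, 150.00), (2, 102, 200.50), (3, 101, 75.25),
     (4, 103, 150.00), (5, 102, 200.50)],
)

# WHERE drops individual rows before they are grouped
where_rows = conn.execute("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE amount > 100
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# HAVING drops whole groups after SUM() has been computed
having_rows = conn.execute("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    GROUP BY customer_id
    HAVING SUM(amount) > 200
    ORDER BY customer_id
""").fetchall()

conn.close()

print(where_rows)   # [(101, 150.0), (102, 401.0), (103, 150.0)]
print(having_rows)  # [(101, 225.25), (102, 401.0)]
```

With `WHERE`, customer 101 stays in the result but the 75.25 order is excluded from the sum. With `HAVING`, every order is summed and customer 103 is dropped because the group total is not above 200.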


# Key Takeaways: SQL

1. **Relational Database Structure**
   - Data is organized into **tables**, consisting of **rows** (records) and **columns** (attributes).  
   - **Primary Keys (PK)** uniquely identify rows, and **Foreign Keys (FK)** link tables to maintain **referential integrity**.

2. **Core SQL Commands**
   - `CREATE TABLE` – define a new table.  
   - `INSERT INTO` – add rows of data.  
   - `SELECT` – retrieve data.  
   - `WHERE` – filter rows.  
   - `UPDATE` / `DELETE` – modify or remove data.  
   - `DROP TABLE` – remove a table permanently.  

3. **The “Big 6” Clauses of a SELECT Statement (Plus LIMIT)**
   - **SELECT:** Choose columns.
   - **FROM:** Specify tables.
   - **WHERE:** Filter rows.
   - **GROUP BY:** Group rows for aggregation.
   - **HAVING:** Filter aggregated results.
   - **ORDER BY:** Sort results.
   - **LIMIT** (not one of the six, but often appended): Restrict the number of rows returned.

4. **JOINs for Combining Tables**
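   - **INNER JOIN:** Only rows with a match in both tables.
   - **LEFT JOIN:** All rows from the left table, plus matching rows from the right table (NULL where there is no match).
   - **RIGHT JOIN:** All rows from the right table, plus matching rows from the left table (NULL where there is no match).
   - **FULL OUTER JOIN:** All rows from both tables, with NULL for missing matches. In SQLite, RIGHT JOIN and FULL OUTER JOIN require version 3.39.0 or later.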

5. **Aggregations and Filtering**
   - Use `SUM()`, `COUNT()`, `AVG()`, `MIN()`, `MAX()` for aggregation.  
   - Use `GROUP BY` to summarize data per category.  
   - Use `HAVING` to filter **after aggregation**. `WHERE` runs before grouping, so it cannot reference aggregate results such as `SUM(amount)`.

6. **SQL in Python with Pandas**
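   - `sqlite3`, part of Python's standard library, creates a **lightweight, file-based database** that works in Colab or any other Python environment.
   - Use `pd.read_sql_query()` to **load SQL query results directly into a DataFrame** for analysis.
   - Combining SQL + Pandas lets the database handle filtering, joining, and aggregation, while Pandas handles **further analysis** of the results in Python.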

7. **Best Practices**
   - Always include a **WHERE** clause in `UPDATE` and `DELETE` statements unless you intend to change every row.
   - Use **table aliases** for readability in JOINs.  
   - Use **LIMIT** when exploring large datasets to preview data efficiently.  
   - Test queries on **sample data** before running on full datasets.  

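The first best practice, scoping every `UPDATE` with a `WHERE` clause, can be paired with a check on how many rows were changed before committing. This is one possible pattern rather than a prescribed one; the in-memory database, the two-column `sales` table, and the new amount of 80.00 are assumptions made for illustration.

```python
import sqlite3

# Assumption: in-memory database with a simplified two-column sales table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 150.00), (2, 200.50), (3, 75.25)],
)
conn.commit()

# Target one row by primary key; without WHERE, every row would be updated
cursor = conn.execute(
    "UPDATE sales SET amount = ? WHERE order_id = ?",
    (80.00, 3),
)

# rowcount reports how many rows the statement changed;
# commit only if exactly one row was affected, otherwise undo
if cursor.rowcount == 1:
    conn.commit()
else:
    conn.rollback()

amounts = conn.execute(
    "SELECT order_id, amount FROM sales ORDER BY order_id"
).fetchall()
conn.close()

print(amounts)  # [(1, 150.0), (2, 200.5), (3, 80.0)]
```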
---

**Conclusion:**  
By mastering table creation, data insertion, SELECT statements, JOINs, aggregation, and integration with Pandas, you can perform **complex data analysis** efficiently in SQL and Python. This chapter lays the foundation for building **real-world data pipelines and analytical workflows**.


## Knowledge Check

<iframe src="https://docs.google.com/forms/d/e/1FAIpQLSduyV-41gyQSCvhPZVweI7VZjrayWSMa2OFB-ra-BsTnRPgeQ/viewform?embedded=true" width="100%" height="800px" frameborder="0" style="min-height: 800px; height: 100vh">Loading…</iframe>


