# SQL REVIEW

### Important SQL keywords:

```sql
-- DDL (Data Definition Language)
CREATE       -- create tables, databases
ALTER        -- modify table structure, rename tables/columns
DROP         -- delete tables, databases

-- DML (Data Manipulation Language)
INSERT       -- add rows
UPDATE       -- change existing rows
DELETE       -- remove rows
SELECT       -- read/query data

-- Other Key Clauses
WHERE        -- filter rows
ORDER BY     -- sort rows
GROUP BY     -- group rows
HAVING       -- filter groups
JOIN         -- combine rows from other tables
UNION        -- merge result sets
LIMIT        -- restrict result count (SQL Server uses TOP instead)
DISTINCT     -- remove duplicates
IN, LIKE, BETWEEN, IS NULL -- special filters

-- Table constraints
PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, DEFAULT, CHECK
```

### Data types
```sql
-- Numbers
INT / INTEGER
FLOAT / REAL / DOUBLE  -- approximate (floating point), may show rounding error
DECIMAL(p, s)  -- exact precision (e.g., money)

-- Strings
VARCHAR(n)     -- variable length
CHAR(n)        -- fixed length
TEXT           -- long text

-- Dates and Times
DATE
TIME
TIMESTAMP
DATETIME       -- MySQL / SQL Server (PostgreSQL uses TIMESTAMP)

-- Boolean
BOOLEAN / BOOL

-- Others
BLOB           -- binary large object
```
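The notebook below runs on SQLite, which treats these declared types as loose "affinities" rather than strict rules. Here is a minimal sketch, assuming SQLite via Python's `sqlite3` module; the table `t` and its columns are made up for the demo. It shows that `DECIMAL` is stored as a floating-point `REAL`, `VARCHAR(n)` lengths are not enforced, and an `INT` column will keep text it cannot convert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hypothetical demo table: one column per type we want to inspect
conn.execute("CREATE TABLE t (price DECIMAL(10, 2), code VARCHAR(3), qty INT)")
# SQLite does not enforce VARCHAR lengths, and an INT column keeps text it cannot convert
conn.execute("INSERT INTO t VALUES (?, ?, ?)", (19.99, "TOO-LONG", "abc"))

# typeof() reports the storage class SQLite actually used for each value
row = conn.execute("SELECT typeof(price), length(code), typeof(qty) FROM t").fetchone()
print(row)  # ('real', 8, 'text')
```

Other databases such as PostgreSQL do enforce these types, so the behavior above is SQLite-specific.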

### Creating tables

```sql
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(150) UNIQUE,
  age INT DEFAULT 18,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT NOT NULL,
  product VARCHAR(100) NOT NULL,
  quantity INT DEFAULT 1,
  order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id)
);
```
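You can watch these constraints reject bad rows. This is a minimal sketch, assuming SQLite through Python's built-in `sqlite3` module, the same engine the notebook part uses. The `try_insert` helper is hypothetical and exists only for this demo. SQLite ignores `FOREIGN KEY` clauses unless `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces FOREIGN KEY constraints when this pragma is on (it is off by default)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(150) UNIQUE,
  age INT DEFAULT 18,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT NOT NULL,
  product VARCHAR(100) NOT NULL,
  quantity INT DEFAULT 1,
  order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id)
);
""")

# age is omitted, so DEFAULT 18 fills it in
conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')")

def try_insert(sql):
    """Hypothetical helper: run one statement and report which constraint (if any) rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = {
    "default age": conn.execute("SELECT age FROM users WHERE id = 1").fetchone()[0],
    "duplicate email": try_insert("INSERT INTO users (id, name, email) VALUES (2, 'Al', 'alice@example.com')"),
    "missing name": try_insert("INSERT INTO users (id, email) VALUES (3, 'x@example.com')"),
    "unknown user_id": try_insert("INSERT INTO orders (id, user_id, product) VALUES (1, 99, 'Widget')"),
    "valid order": try_insert("INSERT INTO orders (id, user_id, product) VALUES (2, 1, 'Widget')"),
}
for check, outcome in results.items():
    print(f"{check}: {outcome}")
```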

### Altering Tables

```sql
-- Add column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Change a column's default (syntax varies by database; this is PostgreSQL/MySQL style, and SQLite does not support it)
ALTER TABLE users ALTER COLUMN age SET DEFAULT 21;

-- Drop column
ALTER TABLE users DROP COLUMN phone;

-- Rename column or table
ALTER TABLE users RENAME COLUMN name TO full_name;
ALTER TABLE users RENAME TO members;
```
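Support for `ALTER TABLE` differs a lot between databases. Here is a minimal sketch, assuming SQLite via Python's `sqlite3`. It sticks to `ADD COLUMN` and `RENAME TO`, which every SQLite version supports. `RENAME COLUMN` needs SQLite 3.25.0 or newer, and `DROP COLUMN` needs 3.35.0 or newer; you can check `sqlite3.sqlite_version` to see which version you have. The `column_names` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INT DEFAULT 18)")

# Add a column: supported by every SQLite version
conn.execute("ALTER TABLE users ADD COLUMN phone VARCHAR(20)")

# Rename the table: also supported by every SQLite version
conn.execute("ALTER TABLE users RENAME TO members")

def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

print(sqlite3.sqlite_version)
print(column_names("members"))  # ['id', 'name', 'age', 'phone']
print(conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall())  # [('members',)]
```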

### Inserting data into table

inserting one row:

```sql
INSERT INTO users (id, name, email, age)
VALUES (1, 'Alice', 'alice@example.com', 25);
```

bulk insert (separate each row's parenthesized values with a comma):
```sql
INSERT INTO users (id, name, email, age)
VALUES 
(2, 'Bob', 'bob@example.com', 30),
(3, 'Eve', 'eve@example.com', 22);
```
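From Python, the usual way to do a bulk insert is to pass a list of tuples to `executemany` with `?` placeholders, rather than building one long `VALUES` string. A minimal sketch, assuming SQLite via Python's `sqlite3` module and the `users` columns above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, email TEXT, age INT)")

new_users = [
    (2, "Bob", "bob@example.com", 30),
    (3, "Eve", "eve@example.com", 22),
]
# One ? placeholder per column; the driver binds each value safely (no manual quoting)
conn.executemany("INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)", new_users)
conn.commit()

print(conn.execute("SELECT name FROM users ORDER BY id").fetchall())  # [('Bob',), ('Eve',)]
```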

### Selecting Data

`SELECT` returns rows (with the columns you ask for) from the given table.

selecting all columns:
```sql
SELECT * FROM users;
```

selecting specific columns:
```sql
SELECT name, email FROM users;
```

filtering rows with WHERE:
note: `WHERE` keeps only the rows that satisfy the condition, and it is applied to individual rows before any grouping
```sql
SELECT * FROM users WHERE age > 25;
SELECT * FROM users WHERE name LIKE 'A%';
SELECT * FROM users WHERE email IS NOT NULL; -- use IS NULL / IS NOT NULL; `= NULL` never matches any row
SELECT * FROM users WHERE age BETWEEN 18 AND 30;
```
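This sketch shows why `= NULL` fails. A comparison with `NULL` evaluates to unknown, so the row never passes the filter. It assumes SQLite via Python's `sqlite3`, and the two sample rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.com"), (2, "NoEmail", None)],  # Python None is stored as NULL
)

# `email = NULL` evaluates to NULL (unknown) for every row, so nothing matches
eq_null = conn.execute("SELECT name FROM users WHERE email = NULL").fetchall()
# IS NULL / IS NOT NULL are the correct tests for missing values
is_null = conn.execute("SELECT name FROM users WHERE email IS NULL").fetchall()
is_not_null = conn.execute("SELECT name FROM users WHERE email IS NOT NULL").fetchall()

print(eq_null)      # []
print(is_null)      # [('NoEmail',)]
print(is_not_null)  # [('Alice',)]
```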

ordering results:
```sql
SELECT * FROM users ORDER BY age ASC;
SELECT * FROM users ORDER BY created_at DESC;
```

grouping with GROUP BY (rows that share a value are collapsed into one group, so aggregate functions like COUNT, SUM and AVG run once per group):
```sql
SELECT age, COUNT(*) AS num_users
FROM users
GROUP BY age;
```

filtering groups with GROUP BY and HAVING:
```sql
SELECT age, COUNT(*) AS num_users
FROM users
GROUP BY age
HAVING COUNT(*) > 1;
```
note: `HAVING` filters whole groups after aggregation, whereas `WHERE` filters individual rows before grouping
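The order of evaluation matters: `WHERE` removes rows first, then `GROUP BY` forms groups, then `HAVING` removes groups. This minimal sketch assumes SQLite via Python's `sqlite3` and uses made-up sample rows. Dropping one row with `WHERE` shrinks a group enough that `HAVING` then removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", 25), (2, "Bob", 30), (3, "Eve", 25), (4, "Dan", 30), (5, "Kim", 40)],
)

# HAVING only: keep ages shared by more than one user
having_only = conn.execute(
    "SELECT age, COUNT(*) FROM users GROUP BY age HAVING COUNT(*) > 1 ORDER BY age"
).fetchall()

# WHERE first drops Eve's row, so the age-25 group has only one member and HAVING removes it
where_then_having = conn.execute(
    "SELECT age, COUNT(*) FROM users WHERE name <> 'Eve' "
    "GROUP BY age HAVING COUNT(*) > 1 ORDER BY age"
).fetchall()

print(having_only)        # [(25, 2), (30, 2)]
print(where_then_having)  # [(30, 2)]
```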

### Joins

inner join (only rows that have a match in both tables):
```sql
SELECT *
FROM orders
INNER JOIN users ON orders.user_id = users.id;
```

left join (every row from the left table; right-side columns are NULL where there is no match):
```sql
SELECT *
FROM users
LEFT JOIN orders ON users.id = orders.user_id;
```

right join (every row from the right table; left-side columns are NULL where there is no match):
```sql
SELECT *
FROM users
RIGHT JOIN orders ON users.id = orders.user_id;
```
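
full outer join (every row from both tables, matched where possible):
```sql
SELECT *
FROM users
FULL OUTER JOIN orders ON users.id = orders.user_id;
```

cross join (Cartesian product: every row of one table paired with every row of the other; `roles` stands in for any second table):
```sql
SELECT *
FROM users
CROSS JOIN roles;
```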
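Here is a small runnable comparison of the join types on the `users`/`orders` schema above. It is a minimal sketch, assuming SQLite via Python's `sqlite3`, with made-up sample rows. It sticks to INNER, LEFT and CROSS joins because SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT, product TEXT);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Eve');
INSERT INTO orders VALUES (10, 1, 'Widget'), (11, 1, 'Gadget'), (12, 2, 'Thing');
""")

# INNER JOIN: Eve has no orders, so she does not appear
inner = conn.execute("""
    SELECT users.name, orders.product
    FROM orders
    INNER JOIN users ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()

# LEFT JOIN: every user appears; Eve's order columns come back as NULL (None in Python)
left = conn.execute("""
    SELECT users.name, orders.product
    FROM users
    LEFT JOIN orders ON users.id = orders.user_id
    ORDER BY users.id, orders.id
""").fetchall()

# CROSS JOIN: every user paired with every order
cross_count = conn.execute("SELECT COUNT(*) FROM users CROSS JOIN orders").fetchone()[0]

print(inner)        # [('Alice', 'Widget'), ('Alice', 'Gadget'), ('Bob', 'Thing')]
print(left)         # [('Alice', 'Widget'), ('Alice', 'Gadget'), ('Bob', 'Thing'), ('Eve', None)]
print(cross_count)  # 9 (3 users x 3 orders)
```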

## 1. Import Required Libraries
We'll use `sqlite3` for SQL and `pandas` to load the CSV.

```python
import sqlite3
import pandas as pd
```

## 2. Load a Simple CSV Dataset
Let's create a tiny CSV in memory for this demo. In practice, you would use `pd.read_csv('filename.csv')`.

## Side-by-side: Pandas vs SQL with Users and Products
We'll load users and products from CSV, then show how to do the same operations in pandas and SQL.

```python
import pandas as pd
import sqlite3
from io import StringIO

users_csv = '''id,name,email,age
1,Alice,alice@example.com,25
2,Bob,bob@example.com,30
3,Eve,eve@example.com,22
4,David,david@example.com,28
'''
products_csv = '''id,name,price,stock
1,Widget,19.99,100
2,Gadget,29.99,50
3,Thing,9.99,200
'''
users_df = pd.read_csv(StringIO(users_csv))
products_df = pd.read_csv(StringIO(products_csv))
```

```python
users_df
```

```python
products_df
```

### Create SQLite DB, tables, and insert data

```python
import csv

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE users (id INT PRIMARY KEY, name TEXT, email TEXT, age INT)''')
c.execute('''CREATE TABLE products (id INT PRIMARY KEY, name TEXT, price REAL, stock INT)''')

# write the CSV text to a real file so we can read it back as we would in practice
with open('users.csv', 'w') as f:
    f.write(users_csv)

with open('users.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip header row containing column names
    for row in reader:
        c.execute(
            "INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)",
            # csv.reader gives us lists, so we refer to each column by index
            (int(row[0]), row[1], row[2], int(row[3]))
        )

with open('products.csv', 'w') as f:
    f.write(products_csv)

with open('products.csv', newline='') as f:
    # we use DictReader here to turn each row of our CSV into a dictionary.
    # csv.reader would work as well, but we'd need to skip the first row
    # since it holds the column names. DictReader uses those column names
    # as the keys of each row's dictionary.
    reader = csv.DictReader(f)
    for row in reader:
        c.execute(
            "INSERT INTO products (id, name, price, stock) VALUES (?, ?, ?, ?)",
            # csv.DictReader gives us a dictionary for each row, so we refer to each column by name
            (int(row['id']), row['name'], float(row['price']), int(row['stock']))
        )

# save the inserts
conn.commit()

# We could've also turned our pandas DataFrames into SQL tables using the following:
# users_df.to_sql('users', conn, if_exists='replace', index=False)
# products_df.to_sql('products', conn, if_exists='replace', index=False)
```

### Select all users/products (Pandas vs SQL)

```python
# Pandas
users_df
```

```python
products_df
```

```python
# SQL: this command lets us run a query against an SQL database
# and gives us back the results as a DataFrame!
pd.read_sql_query('SELECT * FROM users', conn)
```

### SELECT specific columns: name and age from users

```python
# Pandas
users_df[["name", "age"]]
```

```python
# SQL (via pandas)
pd.read_sql_query('SELECT name, age FROM users', conn)
```

```python
# Pure Python
c.execute('SELECT name, age FROM users')
print(c.fetchall())
```

### Filter users age > 25 (Pandas vs SQL)

```python
# Pandas
users_df[users_df['age'] > 25]
```

```python
# SQL
pd.read_sql_query('SELECT * FROM users WHERE age > 25', conn)
```

```python
# Pure Python
c.execute('SELECT * FROM users WHERE age > 25')
print(c.fetchall())
```
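When the filter value comes from a variable (for example, user input), pass it as a bound parameter rather than formatting it into the SQL string. This guards against SQL injection and quoting bugs. A minimal sketch, assuming SQLite via `sqlite3` and pandas; `min_age` is a hypothetical variable. `pd.read_sql_query` forwards its `params` argument to the driver's `?` placeholders.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", 25), (2, "Bob", 30), (3, "Eve", 22), (4, "David", 28)],
)

min_age = 25  # hypothetical value, e.g. read from user input

# Cursor: values go in a tuple, never into the SQL string itself
rows = conn.execute("SELECT name FROM users WHERE age > ? ORDER BY id", (min_age,)).fetchall()

# pandas: the same placeholder, filled from `params`
df = pd.read_sql_query("SELECT name FROM users WHERE age > ? ORDER BY id", conn, params=(min_age,))

print(rows)                 # [('Bob',), ('David',)]
print(df["name"].tolist())  # ['Bob', 'David']
```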

### SELECT with ORDER BY and LIMIT: top 2 oldest users

```python
users_df.sort_values('age', ascending=False).head(2)
```

```python
pd.read_sql_query('SELECT * FROM users ORDER BY age DESC LIMIT 2', conn)
```

```python
c.execute('SELECT * FROM users ORDER BY age DESC LIMIT 2')
print(c.fetchall())
```

### WHERE with multiple conditions: users age > 22 and name starts with 'A' or 'D'

```python
users_df[(users_df['age'] > 22) & (users_df['name'].str.startswith(('A', 'D')))]
```

```python
pd.read_sql_query("SELECT * FROM users WHERE age > 22 AND (name LIKE 'A%' OR name LIKE 'D%')", conn)
```

```python
c.execute("SELECT * FROM users WHERE age > 22 AND (name LIKE 'A%' OR name LIKE 'D%')")
print(c.fetchall())
```

### WHERE with IN, BETWEEN, IS NULL (add a row with NULL age for demo)

```python
c.execute("INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)", (5, 'NullGuy', 'nullguy@example.com', None))
conn.commit()
```

```python
# IN
pd.read_sql_query("SELECT * FROM users WHERE name IN ('Alice', 'Eve')", conn)
```

```python
# BETWEEN
pd.read_sql_query("SELECT * FROM users WHERE age BETWEEN 23 AND 29", conn)
```

```python
# IS NULL
pd.read_sql_query("SELECT * FROM users WHERE age IS NULL", conn)
```

```python
# Pure Python for IS NULL
c.execute("SELECT * FROM users WHERE age IS NULL")
print(c.fetchall())
```
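These filters have direct pandas counterparts: `Series.isin` for `IN`, `Series.between` for `BETWEEN` (inclusive on both ends by default), and `Series.isna` for `IS NULL`. A minimal sketch, assuming a hypothetical `users_with_null` DataFrame that mirrors the SQL table after the `NullGuy` row was inserted. The `users_df` above does not contain that row.

```python
import pandas as pd

# Hypothetical DataFrame matching the SQL table, including the NULL-age row
users_with_null = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Eve", "David", "NullGuy"],
    "age": [25, 30, 22, 28, None],  # None becomes NaN in a numeric column
})

in_names = users_with_null[users_with_null["name"].isin(["Alice", "Eve"])]  # IN
between = users_with_null[users_with_null["age"].between(23, 29)]           # BETWEEN (inclusive)
is_null = users_with_null[users_with_null["age"].isna()]                    # IS NULL

print(in_names["name"].tolist())  # ['Alice', 'Eve']
print(between["name"].tolist())   # ['Alice', 'David']
print(is_null["name"].tolist())   # ['NullGuy']
```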

### Group by: count users by age (Pandas vs SQL)

```python
# Pandas
users_df.groupby('age').size().reset_index(name='num_users')
```

```python
# SQL
pd.read_sql_query('SELECT age, COUNT(*) as num_users FROM users GROUP BY age', conn)
```

```python
# Pure Python
c.execute('SELECT age, COUNT(*) as num_users FROM users GROUP BY age')
print(c.fetchall())
```
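One difference to watch for: SQL `GROUP BY` puts all `NULL` keys into a single group, so the SQL result above includes a row for the `NULL` age. pandas `groupby` drops `NaN` keys by default. Passing `dropna=False` (available in pandas 1.1 and later) keeps them. A minimal sketch with a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical sample with one missing age
ages = pd.DataFrame({"name": ["Alice", "Bob", "NullGuy"], "age": [25, 25, None]})

# Default: the NaN key is dropped, so NullGuy is not counted anywhere
default_groups = ages.groupby("age").size().reset_index(name="num_users")
# dropna=False keeps a NaN group, matching SQL's single NULL group
sql_like = ages.groupby("age", dropna=False).size().reset_index(name="num_users")

print(default_groups)  # one group: age 25
print(sql_like)        # two groups: age 25 and NaN
```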

### GROUP BY with Aggregate Functions and HAVING

```python
# SUM and AVG on products
pd.read_sql_query('SELECT SUM(price) as total_price, AVG(price) as avg_price FROM products', conn)
```

```python
# HAVING (SQLite also accepts the alias `num_users` here, but repeating COUNT(*) is portable)
pd.read_sql_query('SELECT age, COUNT(*) as num_users FROM users GROUP BY age HAVING COUNT(*) > 1', conn)
```
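For comparison, here is a minimal pandas sketch of the same aggregates. `SUM`/`AVG` map to `Series.agg(["sum", "mean"])`. `HAVING` maps to "group, count, then filter the counted result". The `products` and `users` DataFrames are small hypothetical samples. `users` deliberately repeats age 25 so that the `HAVING` filter keeps something, since every age in the notebook's data is unique.

```python
import pandas as pd

products = pd.DataFrame({
    "name": ["Widget", "Gadget", "Thing"],
    "price": [19.99, 29.99, 9.99],
})
# Hypothetical users with a repeated age
users = pd.DataFrame({"name": ["Alice", "Bob", "Eve", "David"], "age": [25, 30, 25, 28]})

# SELECT SUM(price), AVG(price) FROM products
totals = products["price"].agg(["sum", "mean"])

# SELECT age, COUNT(*) AS num_users FROM users GROUP BY age HAVING COUNT(*) > 1
counts = users.groupby("age").size().reset_index(name="num_users")
having = counts[counts["num_users"] > 1]  # filter after aggregating, like HAVING

print(round(totals["sum"], 2), round(totals["mean"], 2))  # 59.97 19.99
print(having)
```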

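## Examples of Joins Between Users and Products (Pandas, SQL via Pandas, Pure Python)

Note: `users` and `products` have no real relationship. Joining `users.id` to `products.id` simply pairs each user with whichever product happens to share its id, so treat it as a syntax demonstration only. A real schema would join through an orders table with a `user_id` foreign key, like the `orders` table in the SQL section above.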

### INNER JOIN: users and products on id (Pandas)

```python
# both tables have a `name` column, so suffixes label which one is which
pd.merge(users_df, products_df, on='id', how='inner', suffixes=('_user', '_product'))
```

### INNER JOIN: users and products on id (SQL via Pandas)

```python
pd.read_sql_query(
    'SELECT users.*, products.name AS product_name, products.price '
    'FROM users INNER JOIN products ON users.id = products.id',
    conn,
)
```

### INNER JOIN: users and products on id (Pure Python)

```python
c.execute('SELECT users.*, products.name, products.price FROM users INNER JOIN products ON users.id = products.id')
print(c.fetchall())
```

### LEFT JOIN: all users and their product if exists (Pandas)

```python
# both tables have a `name` column, so suffixes label which one is which
pd.merge(users_df, products_df, on='id', how='left', suffixes=('_user', '_product'))
```

### LEFT JOIN: all users and their product if exists (SQL via Pandas)

```python
pd.read_sql_query(
    'SELECT users.*, products.name AS product_name, products.price '
    'FROM users LEFT JOIN products ON users.id = products.id',
    conn,
)
```

### LEFT JOIN: all users and their product if exists (Pure Python)

```python
c.execute('SELECT users.*, products.name, products.price FROM users LEFT JOIN products ON users.id = products.id')
print(c.fetchall())
```

### Close the SQLite connection

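When we're done, we close the database connection to release its resources. Because this database lives in memory (`':memory:'`), closing the connection also discards all of its data.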

```python
conn.close()
```
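To make sure the connection is closed even if a query raises an error, you can wrap it in `contextlib.closing`. A common surprise is that using a `sqlite3` connection directly in a `with` block only commits or rolls back the transaction; it does not close the connection. A minimal sketch, assuming an in-memory SQLite database and a hypothetical `notes` table:

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() runs when the block exits, even on an exception
with closing(sqlite3.connect(":memory:")) as conn:
    # `with conn:` wraps a transaction: commit on success, rollback on an exception.
    # It does NOT close the connection, which is why closing() is still needed.
    with conn:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

print(count)  # 1

# Any use after closing raises sqlite3.ProgrammingError
try:
    conn.execute("SELECT 1")
except sqlite3.ProgrammingError as exc:
    print("closed:", exc)
```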