# SQL Module Test

This test covers all topics from the SQL module:
- SQLite basics (connecting, creating tables)
- CRUD operations (INSERT, SELECT, UPDATE, DELETE)
- Advanced queries (WHERE, ORDER BY, GROUP BY, JOINs)
- Pandas SQL integration (read_sql, to_sql)

**Instructions:**
1. Run the setup cell below to create the test database
2. Answer each question in the provided code cell
3. Each question specifies what output or result is expected
4. Do not modify the setup cell

## Setup - Run This Cell First

This creates the test database with sample data.

In [None]:
import sqlite3
import pandas as pd
from datetime import date

# Create an in-memory database for testing
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create tables
cursor.executescript('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        budget REAL NOT NULL
    );
    
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL NOT NULL,
        hire_date TEXT NOT NULL,
        FOREIGN KEY (department_id) REFERENCES departments (id)
    );
    
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        start_date TEXT NOT NULL,
        end_date TEXT,
        FOREIGN KEY (department_id) REFERENCES departments (id)
    );
''')

# Insert sample data - Departments
departments_data = [
    (1, 'Engineering', 500000),
    (2, 'Marketing', 200000),
    (3, 'Sales', 300000),
    (4, 'Human Resources', 150000),
    (5, 'Research', 400000)
]
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', departments_data)

# Insert sample data - Employees
employees_data = [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Carol White', 2, 72000, '2019-11-20'),
    (4, 'David Brown', 3, 68000, '2022-01-10'),
    (5, 'Eva Martinez', 1, 110000, '2018-05-22'),
    (6, 'Frank Wilson', 2, 65000, '2023-02-14'),
    (7, 'Grace Lee', 3, 78000, '2020-09-30'),
    (8, 'Henry Taylor', 4, 55000, '2021-04-05'),
    (9, 'Ivy Chen', 5, 92000, '2019-08-12'),
    (10, 'Jack Davis', 5, 88000, '2022-07-18'),
    (11, 'Karen Miller', 1, 78000, '2023-01-25'),
    (12, 'Leo Garcia', 3, 71000, '2020-12-01')
]
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', employees_data)

# Insert sample data - Projects
projects_data = [
    (1, 'Website Redesign', 1, '2023-01-01', '2023-06-30'),
    (2, 'Mobile App', 1, '2023-03-15', '2023-12-31'),
    (3, 'Marketing Campaign', 2, '2023-02-01', '2023-04-30'),
    (4, 'Sales Training', 3, '2023-04-01', '2023-05-15'),
    (5, 'AI Research', 5, '2023-01-01', None),
    (6, 'Product Launch', 2, '2023-06-01', '2023-08-31'),
    (7, 'Data Pipeline', 1, '2023-07-01', None)
]
cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?)', projects_data)

conn.commit()
print("Database setup complete!")
print("Tables created: departments, employees, projects")
print(f"Departments: {len(departments_data)} rows")
print(f"Employees: {len(employees_data)} rows")
print(f"Projects: {len(projects_data)} rows")

---
## Part 1: SQLite Basics (Questions 1-3)

### Question 1: Exploring Table Structure

Write a query to retrieve the schema information for the `employees` table using SQLite's `PRAGMA table_info()` command.

Print the column names and their data types.
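
As a reminder of what `PRAGMA table_info()` returns, here is a minimal sketch on a hypothetical `books` table (the table, its columns, and the `demo_conn` name are assumptions for illustration and are not part of the test database). Each result row has the form `(cid, name, type, notnull, dflt_value, pk)`.

```python
import sqlite3

# Hypothetical demo table, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL)"
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = demo_conn.execute("PRAGMA table_info(books)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name}: {col_type}")

demo_conn.close()
```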

In [None]:
# Your answer here


### Question 2: Creating a New Table

Create a new table called `skills` with the following structure:
- `id`: INTEGER, primary key
- `employee_id`: INTEGER, foreign key referencing employees(id)
- `skill_name`: TEXT, not null
- `proficiency_level`: INTEGER, with values from 1 to 5

After creating the table, verify it exists by listing all tables in the database.
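
For reference, a sketch of the general pattern on hypothetical `authors` and `books` tables (all names here are assumptions, not the required answer). The `CHECK` constraint is one possible way to restrict an integer column to a range, and `sqlite_master` is the catalog table SQLite uses to list schema objects.

```python
import sqlite3

# Hypothetical demo schema, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE authors (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER,
        title TEXT NOT NULL,
        rating INTEGER CHECK (rating BETWEEN 1 AND 5),
        FOREIGN KEY (author_id) REFERENCES authors (id)
    );
""")

# sqlite_master holds one row per table, index, view, and trigger.
tables = [
    row[0]
    for row in demo_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)

demo_conn.close()
```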

In [None]:
# Your answer here


### Question 3: Database Connection Context Manager

Write a function called `get_employee_count()` that:
1. Creates a new connection to an in-memory SQLite database
2. Creates an employees table with columns: id (INTEGER PRIMARY KEY), name (TEXT)
3. Inserts 3 sample employees
4. Returns the total count of employees
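5. Uses the connection as a context manager so the transaction is committed on success or rolled back on error, and closes the connection explicitly (in `sqlite3`, `with conn:` does not close the connection)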

Call the function and print the result.
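
One subtlety worth keeping in mind: in Python's `sqlite3` module, `with conn:` manages the transaction (commit on success, rollback on exception) but leaves the connection open. The sketch below, on a hypothetical `fruits` table, shows one way to combine it with `contextlib.closing` so the connection is also closed. The function and table names are assumptions for illustration only.

```python
import sqlite3
from contextlib import closing

def count_demo_rows():
    # closing() guarantees close(); "with demo_conn" only commits or rolls back.
    with closing(sqlite3.connect(":memory:")) as demo_conn:
        with demo_conn:
            demo_conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT)")
            demo_conn.executemany(
                "INSERT INTO fruits (name) VALUES (?)", [("apple",), ("pear",)]
            )
        # The connection is still open here, so it can be queried.
        return demo_conn.execute("SELECT COUNT(*) FROM fruits").fetchone()[0]

demo_count = count_demo_rows()
print(demo_count)
```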

In [None]:
# Your answer here


---
## Part 2: CRUD Operations (Questions 4-6)

### Question 4: INSERT - Adding New Records

Insert a new department called 'Finance' with a budget of 250000.

Then insert two new employees into this Finance department:
- 'Maria Santos', salary 82000, hire_date '2023-06-15'
- 'Nathan Park', salary 76000, hire_date '2023-08-01'

Use parameterized queries to prevent SQL injection. Print the new records to verify.
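
A short sketch of parameterized inserts on hypothetical `teams` and `players` tables (names assumed for illustration). The `?` placeholders let the driver bind values safely, so a name containing a quote such as `O'Neil` needs no manual escaping, and `cursor.lastrowid` gives the id of the row just inserted.

```python
import sqlite3

# Hypothetical demo tables, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
demo_conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL, team_id INTEGER)"
)

# Insert the parent row and capture its generated id.
cur = demo_conn.execute("INSERT INTO teams (name) VALUES (?)", ("Falcons",))
team_id = cur.lastrowid

# Insert child rows with bound parameters; values are never formatted into the SQL text.
demo_conn.executemany(
    "INSERT INTO players (name, team_id) VALUES (?, ?)",
    [("O'Neil", team_id), ("Kim", team_id)],
)
demo_conn.commit()

players = demo_conn.execute(
    "SELECT name, team_id FROM players WHERE team_id = ? ORDER BY id", (team_id,)
).fetchall()
print(players)

demo_conn.close()
```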

In [None]:
# Your answer here


### Question 5: UPDATE - Modifying Records

The Engineering department has received a budget increase. Update the budget to 600000.

Also, give all employees in the Engineering department a 10% raise.

Print the updated department and employee records to verify the changes.
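
As an illustration of a percentage update restricted to one group, here is a sketch on hypothetical `teams` and `players` tables (all names and values are assumptions). It selects the target group with a subquery and reads `cursor.rowcount` to see how many rows changed. The `ROUND` call is an optional choice to avoid floating-point noise in the stored values.

```python
import sqlite3

# Hypothetical demo data, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL, team_id INTEGER, wage REAL);
    INSERT INTO teams VALUES (1, 'Falcons'), (2, 'Otters');
    INSERT INTO players VALUES (1, 'Ana', 1, 1000), (2, 'Ben', 1, 2000), (3, 'Cy', 2, 3000);
""")

# Raise wages by 10% for one team only; other teams are untouched.
cur = demo_conn.execute(
    "UPDATE players SET wage = ROUND(wage * 1.10, 2) "
    "WHERE team_id = (SELECT id FROM teams WHERE name = ?)",
    ("Falcons",),
)
demo_conn.commit()
updated_rows = cur.rowcount

wages = demo_conn.execute("SELECT name, wage FROM players ORDER BY id").fetchall()
print(updated_rows, wages)

demo_conn.close()
```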

In [None]:
# Your answer here


### Question 6: DELETE - Removing Records

Delete all projects that ended before July 1, 2023 (`end_date` is not NULL and `end_date < '2023-07-01'`).

Print how many projects were deleted and list the remaining projects.
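
Two behaviors are relevant here, sketched on a hypothetical `tasks` table (names and data are assumptions). ISO-formatted dates stored as text (`YYYY-MM-DD`) compare correctly as strings, and a comparison against NULL is never true, so rows with a NULL date are not matched by `<`. `cursor.rowcount` reports how many rows the `DELETE` removed.

```python
import sqlite3

# Hypothetical demo data, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL, due_date TEXT);
    INSERT INTO tasks VALUES
        (1, 'Draft', '2023-01-31'),
        (2, 'Review', '2023-09-30'),
        (3, 'Backlog', NULL);
""")

# Rows with a NULL due_date are not deleted: NULL < '2023-07-01' is not true.
cur = demo_conn.execute("DELETE FROM tasks WHERE due_date < ?", ("2023-07-01",))
demo_conn.commit()
deleted = cur.rowcount

remaining = demo_conn.execute("SELECT title FROM tasks ORDER BY id").fetchall()
print(f"Deleted {deleted} row(s); remaining: {remaining}")

demo_conn.close()
```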

In [None]:
# Your answer here


---
## Part 3: Advanced Queries (Questions 7-10)

### Question 7: Filtering with WHERE and ORDER BY

Write a query to find all employees who:
- Were hired after January 1, 2020
- Have a salary greater than 70000

Order the results by salary in descending order.

Display the employee name, salary, and hire_date.

In [None]:
# Your answer here


### Question 8: GROUP BY with Aggregate Functions

Write a query that shows for each department:
- Department name
- Number of employees
- Average salary (rounded to 2 decimal places)
- Highest salary
- Lowest salary

Only include departments that have more than 1 employee.
Order by average salary descending.
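
Recall that `WHERE` filters rows before grouping, while `HAVING` filters groups after aggregation. A sketch on a hypothetical `orders` table (names and data are assumptions for illustration):

```python
import sqlite3

# Hypothetical demo data, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL);
    INSERT INTO orders VALUES
        (1, 'ana', 10.0), (2, 'ana', 20.5),
        (3, 'ben', 7.0),
        (4, 'cy', 30.0), (5, 'cy', 50.0);
""")

# HAVING drops groups with a single order; ORDER BY can reference the alias.
summary = demo_conn.execute("""
    SELECT customer,
           COUNT(*) AS order_count,
           ROUND(AVG(amount), 2) AS avg_amount,
           MAX(amount) AS max_amount,
           MIN(amount) AS min_amount
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) > 1
    ORDER BY avg_amount DESC
""").fetchall()
for row in summary:
    print(row)

demo_conn.close()
```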

In [None]:
# Your answer here


### Question 9: INNER JOIN

Write a query using INNER JOIN to display:
- Employee name
- Department name
- Project name(s) they are working on (based on their department)

Only include active projects (projects with no end_date), so that only employees whose departments have at least one active project appear.

In [None]:
# Your answer here


### Question 10: LEFT JOIN and Subqueries

Write a query to find all departments and their total employee salary cost.

Use a LEFT JOIN to ensure departments with no employees are also included (showing 0 for salary cost).

Also include a column showing if the total salary cost exceeds 50% of the department's budget (show 'Over Budget Risk' or 'Within Budget').
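
The combination of `LEFT JOIN`, `COALESCE`, and `CASE` can be sketched on hypothetical `teams` and `players` tables (all names, values, and labels are assumptions). `SUM` over a group with no matching rows yields NULL, which `COALESCE(..., 0)` turns into 0, and `CASE` derives a label from the aggregate.

```python
import sqlite3

# Hypothetical demo data; 'Herons' deliberately has no players.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL, cap REAL NOT NULL);
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL, team_id INTEGER, wage REAL);
    INSERT INTO teams VALUES (1, 'Falcons', 1000), (2, 'Otters', 1000), (3, 'Herons', 1000);
    INSERT INTO players VALUES (1, 'Ana', 1, 400), (2, 'Ben', 1, 300), (3, 'Cy', 2, 200);
""")

# LEFT JOIN keeps every team; COALESCE replaces the NULL sum for empty teams.
report = demo_conn.execute("""
    SELECT t.name,
           COALESCE(SUM(p.wage), 0) AS total_wage,
           CASE
               WHEN COALESCE(SUM(p.wage), 0) > 0.5 * t.cap THEN 'over half'
               ELSE 'within half'
           END AS status
    FROM teams AS t
    LEFT JOIN players AS p ON p.team_id = t.id
    GROUP BY t.id, t.name, t.cap
    ORDER BY t.id
""").fetchall()
for row in report:
    print(row)

demo_conn.close()
```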

In [None]:
# Your answer here


---
## Part 4: Pandas SQL Integration (Questions 11-14)

### Question 11: Reading SQL Data into Pandas

Use `pd.read_sql()` to load the employees table into a DataFrame.

Then use pandas methods to:
1. Display basic statistics for the salary column
2. Show the distribution of employees by department_id

In [None]:
# Your answer here


### Question 12: Complex Query with Pandas

Use `pd.read_sql()` with a JOIN query to create a DataFrame containing:
- Employee name
- Department name
- Salary
- Department budget

Add a new calculated column called 'salary_to_budget_ratio' showing what percentage of the department budget each employee's salary represents.

Display the top 5 employees by this ratio.
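
A sketch of the general workflow on hypothetical `teams` and `players` tables (names and data are assumptions): load a JOIN result with `pd.read_sql()`, derive a percentage column in pandas, and pick the largest values with `nlargest()`.

```python
import sqlite3
import pandas as pd

# Hypothetical demo data, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT, cap REAL);
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER, wage REAL);
    INSERT INTO teams VALUES (1, 'Falcons', 1000), (2, 'Otters', 500);
    INSERT INTO players VALUES (1, 'Ana', 1, 400), (2, 'Ben', 1, 100), (3, 'Cy', 2, 250);
""")

# Column aliases avoid a name clash between players.name and teams.name.
demo_df = pd.read_sql(
    """
    SELECT p.name AS player, t.name AS team, p.wage, t.cap
    FROM players AS p
    JOIN teams AS t ON t.id = p.team_id
    """,
    demo_conn,
)

# Derived column computed in pandas rather than in SQL.
demo_df["wage_pct_of_cap"] = (demo_df["wage"] / demo_df["cap"] * 100).round(2)
top = demo_df.nlargest(2, "wage_pct_of_cap")
print(top)

demo_conn.close()
```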

In [None]:
# Your answer here


### Question 13: Writing DataFrame to SQL

Create a pandas DataFrame with the following performance review data:

| employee_id | review_date | rating | comments |
|-------------|-------------|--------|----------|
| 1 | 2023-06-15 | 5 | Excellent work on the website redesign |
| 2 | 2023-06-15 | 4 | Strong technical skills |
| 3 | 2023-06-15 | 4 | Great marketing campaigns |
| 5 | 2023-06-15 | 5 | Outstanding leadership |

Use `to_sql()` to write this DataFrame to a new table called 'performance_reviews'.

Then query the table to verify the data was written correctly.
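
A minimal round-trip sketch on a hypothetical `notes` table (the DataFrame, table, and column names are assumptions). `index=False` keeps the DataFrame index out of the table, and `if_exists="replace"` lets the cell be re-run without a "table already exists" error.

```python
import sqlite3
import pandas as pd

# Hypothetical demo data, unrelated to the test database.
demo_conn = sqlite3.connect(":memory:")
notes_df = pd.DataFrame(
    {
        "item_id": [1, 2],
        "noted_on": ["2023-01-05", "2023-01-06"],
        "score": [3, 5],
    }
)

# Write the DataFrame to a new table, then read it back to verify.
notes_df.to_sql("notes", demo_conn, index=False, if_exists="replace")
round_trip = pd.read_sql("SELECT * FROM notes ORDER BY item_id", demo_conn)
print(round_trip)

demo_conn.close()
```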

In [None]:
# Your answer here


### Question 14: Pandas SQL Analysis Challenge

Using pandas and SQL together, answer this business question:

**"Which department has the best return on investment in terms of projects per dollar of salary spent?"**

Calculate:
1. Total salary cost per department
2. Number of projects per department
3. Projects per $100,000 of salary cost

Display the results sorted by efficiency (projects per salary cost) descending.

Hint: You may need multiple queries or joins, and you can use pandas for the final calculations.

In [None]:
# Your answer here


---
## Cleanup

In [None]:
# Close the database connection when done
conn.close()
print("Database connection closed.")

---
## End of Test

Make sure you have answered all 14 questions before submitting.