# Optional: Working with Databases

This notebook covers storing and querying experimental data in SQL databases, using SQLite and pandas.

## Learning Objectives

1. Understand when to use databases vs files
2. Create and query SQLite databases
3. Use pandas with SQL
4. Design simple database schemas

In [None]:
! pip install -q pycse
from pycse.colab import pdf

In [None]:
import sqlite3
import pandas as pd
import numpy as np

## When to Use Databases

| Use Case | Files | Database |
|----------|-------|----------|
| Small datasets (<10K rows) | ✓ | |
| Large datasets | | ✓ |
| Multiple users accessing data | | ✓ |
| Complex queries | | ✓ |
| Data integrity is important | | ✓ |
| One-time analysis | ✓ | |

In [None]:
# Create an SQLite database
conn = sqlite3.connect('experiments.db')
cursor = conn.cursor()

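# Create the experiments table (IF NOT EXISTS makes this cell safe to re-run).
# id is an INTEGER PRIMARY KEY, so SQLite assigns it automatically on insert.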
cursor.execute('''
CREATE TABLE IF NOT EXISTS experiments (
    id INTEGER PRIMARY KEY,
    date TEXT,
    temperature REAL,
    pressure REAL,
    catalyst TEXT,
    conversion REAL
)
''')

conn.commit()
print("Database created!")
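The schema above accepts any value in every column. Where data integrity matters, constraints can be declared in the schema so that the database itself rejects bad rows. The following is a minimal sketch, assuming an in-memory database and a hypothetical table name `experiments_checked`; the specific constraints (no missing values, conversion between 0 and 1) are illustrative assumptions rather than part of the original schema.

```python
import sqlite3

# In-memory database, so this sketch does not touch experiments.db
demo_conn = sqlite3.connect(':memory:')

# Hypothetical stricter variant of the experiments table (names are assumptions)
demo_conn.execute('''
CREATE TABLE experiments_checked (
    id INTEGER PRIMARY KEY,
    date TEXT NOT NULL,
    temperature REAL NOT NULL,
    pressure REAL NOT NULL,
    catalyst TEXT NOT NULL,
    conversion REAL CHECK (conversion BETWEEN 0 AND 1)
)
''')

insert_sql = '''
INSERT INTO experiments_checked (date, temperature, pressure, catalyst, conversion)
VALUES (?, ?, ?, ?, ?)
'''

# A valid row is accepted
demo_conn.execute(insert_sql, ('2024-01-15', 350, 5.0, 'Pt/Al2O3', 0.72))

# A conversion above 1 violates the CHECK constraint and is rejected
try:
    demo_conn.execute(insert_sql, ('2024-01-16', 400, 5.0, 'Pt/Al2O3', 1.5))
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print(f"Rejected: {err}")

n_rows = demo_conn.execute('SELECT COUNT(*) FROM experiments_checked').fetchone()[0]
print(f"Rows stored: {n_rows}")
demo_conn.close()
```

Putting the rule in the schema means every program that writes to the table is held to it, which is harder to guarantee with plain files.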

In [None]:
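# Insert data. The ? placeholders let sqlite3 bind each value safely
# instead of formatting it into the SQL string.
# Note: re-running this cell inserts the same rows again.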
experiments = [
    ('2024-01-15', 350, 5.0, 'Pt/Al2O3', 0.72),
    ('2024-01-16', 400, 5.0, 'Pt/Al2O3', 0.85),
    ('2024-01-17', 350, 10.0, 'Pt/Al2O3', 0.78),
    ('2024-01-18', 400, 10.0, 'Pd/Al2O3', 0.82),
    ('2024-01-19', 450, 5.0, 'Pd/Al2O3', 0.91),
]

cursor.executemany('''
INSERT INTO experiments (date, temperature, pressure, catalyst, conversion)
VALUES (?, ?, ?, ?, ?)
''', experiments)

conn.commit()
print(f"Inserted {len(experiments)} records")
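The explicit `conn.commit()` above can also be handled by using the connection as a context manager: `sqlite3` commits when the `with` block finishes normally and rolls back if an exception escapes it. A small sketch of that behavior, assuming an in-memory database and a hypothetical `runs` table made up for illustration:

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    'CREATE TABLE runs (id INTEGER PRIMARY KEY, conversion REAL NOT NULL)'
)

# Successful block: committed automatically when the with block exits
with demo_conn:
    demo_conn.execute('INSERT INTO runs (conversion) VALUES (?)', (0.72,))

# Failing block: the second insert violates NOT NULL,
# so the whole batch (including the first insert) is rolled back
try:
    with demo_conn:
        demo_conn.execute('INSERT INTO runs (conversion) VALUES (?)', (0.85,))
        demo_conn.execute('INSERT INTO runs (conversion) VALUES (?)', (None,))
except sqlite3.IntegrityError:
    print("Batch rolled back")

n_committed = demo_conn.execute('SELECT COUNT(*) FROM runs').fetchone()[0]
print(f"Rows committed: {n_committed}")
demo_conn.close()
```

Note that the `with` block manages the transaction only; it does not close the connection, so `close()` is still called explicitly.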

In [None]:
# Query with pandas
df = pd.read_sql_query('SELECT * FROM experiments', conn)
df

In [None]:
# Filtered queries
high_conversion = pd.read_sql_query('''
SELECT * FROM experiments 
WHERE conversion > 0.8
ORDER BY conversion DESC
''', conn)

print("High conversion experiments:")
high_conversion
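The threshold in the query above is written directly into the SQL string. When a filter value comes from a Python variable, `pd.read_sql_query()` accepts a `params` argument that binds values to `?` placeholders, in the same way as the insert earlier. A minimal sketch, assuming an in-memory database holding a reduced two-column copy of the table; the variable names `threshold` and `catalyst` are illustrative:

```python
import sqlite3
import pandas as pd

# In-memory database with a reduced copy of the experiments table
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE experiments (catalyst TEXT, conversion REAL)')
demo_conn.executemany(
    'INSERT INTO experiments VALUES (?, ?)',
    [('Pt/Al2O3', 0.72), ('Pt/Al2O3', 0.85), ('Pd/Al2O3', 0.91)],
)

# Filter values held in Python variables (illustrative names)
threshold = 0.8
catalyst = 'Pt/Al2O3'

# params supplies one value per ? placeholder, in order
filtered = pd.read_sql_query(
    'SELECT * FROM experiments WHERE conversion > ? AND catalyst = ?',
    demo_conn,
    params=(threshold, catalyst),
)
print(filtered)
demo_conn.close()
```

Binding values this way avoids building SQL with f-strings, which is error-prone with quoted text and unsafe with untrusted input.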

In [None]:
# Aggregation
summary = pd.read_sql_query('''
SELECT catalyst, 
       COUNT(*) as n_experiments,
       AVG(conversion) as avg_conversion,
       MAX(conversion) as max_conversion
FROM experiments
GROUP BY catalyst
''', conn)

print("Summary by catalyst:")
summary

In [None]:
# Append a DataFrame to the existing table.
# Column names must match the table's columns; id is assigned by SQLite.
new_data = pd.DataFrame({
    'date': ['2024-01-20', '2024-01-21'],
    'temperature': [375, 425],
    'pressure': [7.5, 7.5],
    'catalyst': ['Pt/Al2O3', 'Pt/Al2O3'],
    'conversion': [0.79, 0.88]
})

new_data.to_sql('experiments', conn, if_exists='append', index=False)
print("Data appended!")
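The `if_exists` argument controls what `to_sql()` does when the table is already present: `'append'` adds rows, `'replace'` drops the table and recreates it from the DataFrame, and `'fail'` (the default) raises an error. A short sketch of the first two, assuming an in-memory database and a small two-column DataFrame named `batch` for illustration:

```python
import sqlite3
import pandas as pd

# In-memory database, so this sketch does not touch experiments.db
demo_conn = sqlite3.connect(':memory:')

batch = pd.DataFrame({
    'catalyst': ['Pt/Al2O3', 'Pt/Al2O3'],
    'conversion': [0.79, 0.88],
})

def count_rows(conn):
    """Return the number of rows currently in the experiments table."""
    return int(pd.read_sql_query('SELECT COUNT(*) AS n FROM experiments', conn)['n'].iloc[0])

# 'append' creates the table if it is missing, then adds rows on every call
batch.to_sql('experiments', demo_conn, if_exists='append', index=False)
batch.to_sql('experiments', demo_conn, if_exists='append', index=False)
n_after_append = count_rows(demo_conn)

# 'replace' drops the existing table and writes only the new rows
batch.to_sql('experiments', demo_conn, if_exists='replace', index=False)
n_after_replace = count_rows(demo_conn)

print(f"After two appends: {n_after_append} rows")
print(f"After replace: {n_after_replace} rows")
demo_conn.close()
```

Because `'replace'` recreates the table from the DataFrame, any constraints or primary key defined in the original `CREATE TABLE` statement are lost, so `'append'` is usually the safer choice for a table with a designed schema.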

In [None]:
# Clean up: close the connection, then delete the database file
import os

conn.close()
os.remove('experiments.db')
print("Database cleaned up")

## Summary

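- SQLite is a lightweight, file-based database for local use; Python's standard library supports it through `sqlite3`
- Use `pd.read_sql_query()` to load query results into a pandas DataFrame
- Use `df.to_sql()` to save a DataFrame to a database table
- SQL queries enable complex filtering and aggregation

For multi-user or production workloads, consider a client-server database such as PostgreSQL or MySQL.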