# Reading Data from MySQL and SQLite in Python

This Jupyter Notebook demonstrates how to read data from MySQL and SQLite databases using Python, with a focus on handling large datasets efficiently. We'll use `mysql-connector-python` for MySQL and the built-in `sqlite3` module for SQLite. The notebook covers:
- Basic database connections and queries
- Handling large datasets with chunking and pagination
- Connection pooling for MySQL
- Visualizing data with matplotlib
- Best practices for large databases

## Prerequisites
- Install required libraries: `pip install mysql-connector-python pandas matplotlib`
- Set up MySQL and SQLite databases (see Setup section).
- Replace placeholders (`your_username`, `your_password`) with actual MySQL credentials.

## Setup

### Install Libraries
```bash
pip install mysql-connector-python pandas matplotlib
```

### MySQL Database Setup
Run the following SQL commands in your MySQL client to create a sample database:
```sql
CREATE DATABASE sales_db;
USE sales_db;
CREATE TABLE sales (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product VARCHAR(100),
    amount DECIMAL(10, 2),
    sale_date DATE,
    INDEX idx_sale_date (sale_date)
);
```

### SQLite Database Setup
Create a SQLite database file (`sales.db`) with the following Python code:


In [None]:
import sqlite3

# Create SQLite database and table
conn = sqlite3.connect('sales.db')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE IF NOT EXISTS sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product TEXT,
        amount REAL,
        sale_date TEXT
    )
''')
conn.commit()
conn.close()
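
The table created above starts out empty, so the queries later in this notebook return nothing until rows are added. The following is a minimal sketch for seeding synthetic rows with `executemany`. The helper name `seed_sales`, the product names, and the date range are illustrative assumptions, not part of the original setup, and the demo runs against an in-memory database rather than `sales.db`.

```python
import random
import sqlite3
from datetime import date, timedelta

def seed_sales(connection, n_rows=1000, seed=42):
    """Insert n_rows synthetic rows into sales and return how many were inserted."""
    rng = random.Random(seed)                 # fixed seed keeps the data reproducible
    products = ["Widget", "Gadget", "Gizmo"]  # hypothetical product names
    start = date(2022, 1, 1)                  # hypothetical start of the date range
    rows = [
        (
            rng.choice(products),
            round(rng.uniform(5, 500), 2),
            (start + timedelta(days=rng.randrange(730))).isoformat(),
        )
        for _ in range(n_rows)
    ]
    # One executemany call inserts all rows with bound parameters
    connection.executemany(
        "INSERT INTO sales (product, amount, sale_date) VALUES (?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)

# Demo on an in-memory database with the same schema as sales.db
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product TEXT,
        amount REAL,
        sale_date TEXT
    )
""")
inserted = seed_sales(demo, n_rows=100)
total = demo.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(inserted, total)
demo.close()
```

To seed the real file, pass `sqlite3.connect('sales.db')` instead of the in-memory connection.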

## MySQL: Reading Data

### Basic Connection and Query
Connect to MySQL and fetch a small subset of data from the `sales` table.

In [None]:
import mysql.connector
from mysql.connector import Error

connection = None
cursor = None
try:
    connection = mysql.connector.connect(
        host='localhost',
        database='sales_db',
        user='your_username',
        password='your_password'
    )
    query = "SELECT * FROM sales WHERE sale_date >= '2023-01-01' LIMIT 10"
    cursor = connection.cursor()
    cursor.execute(query)
    rows = cursor.fetchall()
    for row in rows:
        print(row)
except Error as e:
    print(f"Error: {e}")
finally:
    # Both names stay None if connect() itself failed, so guard before closing
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()

### Handling Large Datasets
Stream rows in fixed-size chunks with `fetchmany` on an unbuffered cursor, so the full result set never has to sit in memory at once. For page-by-page access, use `LIMIT` and `OFFSET`.

In [None]:
import mysql.connector
import pandas as pd
from mysql.connector import Error

def fetch_in_chunks(query, chunk_size=1000):
    connection = None
    cursor = None
    try:
        connection = mysql.connector.connect(
            host='localhost',
            database='sales_db',
            user='your_username',
            password='your_password'
        )
        cursor = connection.cursor(buffered=False)  # Unbuffered: rows are streamed
        cursor.execute(query)
        columns = [desc[0] for desc in cursor.description]
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            yield pd.DataFrame(chunk, columns=columns)
    except Error as e:
        print(f"Error: {e}")
    finally:
        # Either name may still be None if connect() or cursor() failed
        if cursor is not None:
            cursor.close()
        if connection is not None:
            connection.close()

# Example: Process large dataset
query = "SELECT * FROM sales"
for df_chunk in fetch_in_chunks(query, chunk_size=1000):
    print(df_chunk.head())  # Process chunk (e.g., save to CSV, analyze)

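# Pagination Example
def fetch_paginated(page, page_size=1000):
    offset = (page - 1) * page_size
    # ORDER BY keeps page boundaries stable between calls;
    # page_size and offset are bound as parameters rather than formatted in
    query = "SELECT * FROM sales ORDER BY id LIMIT %s OFFSET %s"
    connection = mysql.connector.connect(
        host='localhost',
        database='sales_db',
        user='your_username',
        password='your_password'
    )
    try:
        cursor = connection.cursor()
        cursor.execute(query, (page_size, offset))
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
        cursor.close()
        # Built from the cursor because recent pandas versions warn when
        # pd.read_sql receives a raw DBAPI connection other than sqlite3
        return pd.DataFrame(rows, columns=columns)
    finally:
        connection.close()  # Runs even if the query fails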

# Fetch page 1
df_page = fetch_paginated(page=1)
print(df_page)

### Connection Pooling
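A connection pool keeps a fixed number of connections open and reuses them, which avoids the cost of reconnecting for every query. Calling `close()` on a pooled connection returns it to the pool instead of closing it.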

In [None]:
from mysql.connector.pooling import MySQLConnectionPool

pool_config = {
    "pool_name": "mypool",
    "pool_size": 5,
    "host": "localhost",
    "database": "sales_db",
    "user": "your_username",
    "password": "your_password"
}
db_pool = MySQLConnectionPool(**pool_config)

# Fetch data using pooled connection
connection = db_pool.get_connection()
cursor = connection.cursor()
cursor.execute("SELECT COUNT(*) FROM sales")
count = cursor.fetchone()[0]
print(f"Total rows: {count}")
cursor.close()
connection.close()
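
The cell above returns the connection to the pool only if every statement succeeds. One way to make that robust is a small context manager, sketched below. The name `pooled_cursor` is a hypothetical helper, not part of `mysql-connector-python`; it only assumes the pool exposes `get_connection()`, as `MySQLConnectionPool` does.

```python
from contextlib import contextmanager

@contextmanager
def pooled_cursor(pool):
    """Yield a cursor from a pooled connection and always hand the connection back."""
    connection = pool.get_connection()
    cursor = None
    try:
        cursor = connection.cursor()
        yield cursor
    finally:
        if cursor is not None:
            cursor.close()
        # For a pooled connection, close() returns it to the pool
        connection.close()

# Usage with the db_pool created above (requires a running MySQL server):
# with pooled_cursor(db_pool) as cursor:
#     cursor.execute("SELECT COUNT(*) FROM sales")
#     print(cursor.fetchone()[0])
```

Because the cleanup sits in `finally`, a failing query inside the `with` block still releases the connection, so the pool is not exhausted by errors.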

## SQLite: Reading Data

### Basic Connection and Query
Connect to SQLite and fetch data from the `sales` table.

In [None]:
import sqlite3

connection = None
try:
    connection = sqlite3.connect('sales.db')
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM sales WHERE sale_date >= '2023-01-01' LIMIT 10")
    rows = cursor.fetchall()
    for row in rows:
        print(row)
except sqlite3.Error as e:
    print(f"Error: {e}")
finally:
    # connection stays None if connect() itself failed
    if connection is not None:
        connection.close()
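
The query above hard-codes the date in the SQL string. When the filter value comes from user input or a variable, binding it as a parameter is safer. The sketch below uses `sqlite3`'s `?` placeholders; the function name `sales_since` and the sample rows are illustrative assumptions. With `mysql-connector-python` the placeholder is `%s` instead of `?`.

```python
import sqlite3

def sales_since(connection, start_date, limit=10):
    """Return up to `limit` rows with sale_date >= start_date, oldest first."""
    query = (
        "SELECT id, product, amount, sale_date "
        "FROM sales WHERE sale_date >= ? ORDER BY sale_date LIMIT ?"
    )
    # Values are passed separately from the SQL text, never formatted into it
    return connection.execute(query, (start_date, limit)).fetchall()

# Demo on an in-memory database with a few hypothetical rows
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product TEXT,
        amount REAL,
        sale_date TEXT
    )
""")
demo.executemany(
    "INSERT INTO sales (product, amount, sale_date) VALUES (?, ?, ?)",
    [
        ("Widget", 9.99, "2022-12-31"),
        ("Gadget", 19.50, "2023-01-01"),
        ("Gizmo", 5.00, "2023-06-15"),
    ],
)
rows = sales_since(demo, "2023-01-01")
for row in rows:
    print(row)
demo.close()
```

The comparison works on `TEXT` dates because ISO 8601 strings (`YYYY-MM-DD`) sort in chronological order.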

### Handling Large Datasets
Use `fetchmany` to process large datasets in chunks.

In [None]:
import sqlite3
import pandas as pd

def fetch_in_chunks(query, chunk_size=1000):
    connection = None
    try:
        connection = sqlite3.connect('sales.db')
        cursor = connection.cursor()
        cursor.execute(query)
        columns = [desc[0] for desc in cursor.description]
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            yield pd.DataFrame(chunk, columns=columns)
    except sqlite3.Error as e:
        print(f"Error: {e}")
    finally:
        # connection stays None if connect() itself failed
        if connection is not None:
            connection.close()

# Example: Process large dataset
query = "SELECT * FROM sales"
for df_chunk in fetch_in_chunks(query, chunk_size=1000):
    print(df_chunk.head())  # Process chunk
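
The MySQL section pairs chunking with `LIMIT`/`OFFSET` pagination; the same idea carries over to SQLite. The following is a minimal sketch, assuming the `sales` schema from the Setup section. The generator name `iter_pages` is hypothetical, and the demo uses an in-memory database with five placeholder rows.

```python
import sqlite3

def iter_pages(connection, page_size=1000):
    """Yield lists of rows from sales, one page at a time, ordered by id."""
    query = (
        "SELECT id, product, amount, sale_date "
        "FROM sales ORDER BY id LIMIT ? OFFSET ?"
    )
    page = 0
    while True:
        rows = connection.execute(query, (page_size, page * page_size)).fetchall()
        if not rows:
            break  # an empty page means there is nothing left
        yield rows
        page += 1

# Demo: five rows read in pages of two
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product TEXT,
        amount REAL,
        sale_date TEXT
    )
""")
demo.executemany(
    "INSERT INTO sales (product, amount, sale_date) VALUES (?, ?, ?)",
    [("Item", float(i), "2023-01-01") for i in range(5)],
)
page_sizes = [len(page) for page in iter_pages(demo, page_size=2)]
print(page_sizes)
demo.close()
```

The `ORDER BY id` matters: without an explicit order, the database is free to return rows in a different sequence on each call, so pages could overlap or skip rows.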

## Visualization

Visualize sample data (e.g., row counts) from MySQL and SQLite databases using `matplotlib`.

In [None]:
import matplotlib.pyplot as plt

# Sample data (replace with actual row counts)
databases = ['MySQL', 'SQLite']
row_counts = [1000000, 500000]  # Example counts

plt.bar(databases, row_counts, color=['#1E90FF', '#FF6347'])
plt.title('Database Row Counts')
plt.xlabel('Database')
plt.ylabel('Rows')
plt.show()

## Best Practices for Large Databases

- **Indexing**: Create indexes on frequently queried columns (e.g., `sale_date`).
- **Chunking**: Use `fetchmany` or pandas chunking to process large datasets.
- **Pagination**: Implement `LIMIT` and `OFFSET` for controlled data retrieval.
- **Connection Pooling**: Use for MySQL to manage multiple connections efficiently.
- **Error Handling**: Wrap database calls in try-except-finally blocks so that connections are closed even when a query fails.
- **Optimize Queries**: Avoid `SELECT *`; specify required columns.
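
The MySQL schema in this notebook indexes `sale_date`, but the SQLite table does not. As a sketch of the indexing advice above, the cell below adds the same index in SQLite and asks the planner how it would run a date-filtered query. It uses an in-memory stand-in for `sales.db`; the exact wording of the plan varies between SQLite versions, and the planner, not the code, decides whether the index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for sales.db
conn.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product TEXT,
        amount REAL,
        sale_date TEXT
    )
""")

# Same index name as in the MySQL setup
conn.execute("CREATE INDEX IF NOT EXISTS idx_sale_date ON sales (sale_date)")

# List the indexes SQLite now knows about for this table (name is column 1)
index_names = [row[1] for row in conn.execute("PRAGMA index_list('sales')")]
print(index_names)

# Ask the planner how it would execute a filtered query;
# the human-readable detail is the last column of each plan row
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, product, amount FROM sales WHERE sale_date >= ?",
    ("2023-01-01",),
).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)

conn.close()
```

A plan line that mentions `idx_sale_date` indicates the index is being used; a line that only says the table is scanned means every row is read.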

## Notes
- For large datasets, adjust `chunk_size` based on available memory.
- Store credentials securely (e.g., in environment variables) for production use.
- Consider using `SQLAlchemy` for more advanced database operations.
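
As a sketch of the environment-variable advice above, the helper below builds the keyword arguments for `mysql.connector.connect` from the environment instead of hard-coding them. The function name `mysql_config_from_env` and the variable names `MYSQL_HOST`, `MYSQL_DATABASE`, `MYSQL_USER`, and `MYSQL_PASSWORD` are assumptions chosen for illustration; use whatever names your deployment defines.

```python
import os

def mysql_config_from_env(environ=os.environ):
    """Build connection kwargs from environment variables (names are hypothetical)."""
    required = ("MYSQL_USER", "MYSQL_PASSWORD")
    missing = [name for name in required if name not in environ]
    if missing:
        # Fail early with a clear message instead of connecting with blanks
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": environ.get("MYSQL_HOST", "localhost"),
        "database": environ.get("MYSQL_DATABASE", "sales_db"),
        "user": environ["MYSQL_USER"],
        "password": environ["MYSQL_PASSWORD"],
    }

# Usage (requires the variables to be set and a running MySQL server):
# import mysql.connector
# connection = mysql.connector.connect(**mysql_config_from_env())
```

Passing a plain dictionary as `environ` makes the helper easy to check without touching the real environment.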