# Reading and Writing Data with Python and Pandas (Jupyter Notebook Ready)

This notebook provides a concise guide on how to read data from, and write data to, common data sources using Python's `pandas` library. We'll cover CSV, Excel, SQLite databases, and interacting with Web APIs.

In [None]:
# Essential Imports for this Notebook
import pandas as pd
import numpy as np
import requests
import sqlite3
import os # For file operations

# Optional: For better plot visualization in Jupyter
# %matplotlib inline 
# import matplotlib.pyplot as plt
# import seaborn as sns

print("Libraries imported successfully.")

## 1. CSV (Comma Separated Values) Files

CSV files are plain text files that store tabular data, with values separated by commas (or other delimiters).
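
As a minimal sketch of the "other delimiters" case, the example below parses semicolon-separated text from an in-memory buffer using the `sep` argument of `pd.read_csv()`. The sample rows and the name `df_semicolon` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: semicolon-delimited instead of comma-delimited
raw_text = "city;population\nOakland;440000\nFresno;545000\n"

# sep tells read_csv which character separates the fields
df_semicolon = pd.read_csv(io.StringIO(raw_text), sep=";")

print(df_semicolon)
```

The same `sep` argument applies when reading from a file path or URL, for example `sep="\t"` for tab-separated files.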

### Reading from CSV

We'll use a public CSV file for demonstration; `pd.read_csv()` accepts a URL as well as a local path. For a local file, make sure it is in the same directory as your notebook, or provide the full path.

In [None]:
# Data Source URL (Example: California Housing Dataset)
csv_url = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv"

# Read the CSV file into a DataFrame
df_csv = pd.read_csv(csv_url)

print("--- DataFrame from CSV (head) ---")
print(df_csv.head())

print("\n--- DataFrame Info ---")
df_csv.info()

### Writing to CSV

We'll save a portion of our `df_csv` to a new CSV file.

In [None]:
# Take a subset of the DataFrame for writing
df_csv_subset = df_csv.head(100)

# Define the output file path
output_csv_path = 'housing_subset.csv'

# Write the DataFrame to a CSV file
# index=False prevents writing the DataFrame index as a column
df_csv_subset.to_csv(output_csv_path, index=False)

print(f"DataFrame saved to {output_csv_path}")

# Verify by reading the new CSV file
df_verify_csv = pd.read_csv(output_csv_path)
print("\n--- Verified CSV (head) ---")
print(df_verify_csv.head())
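
To illustrate why the cell above passes `index=False`, the following sketch round-trips a small frame through CSV text with and without that argument. The frame `df_demo` and its values are hypothetical and exist only for this comparison.

```python
import io

import pandas as pd

# Hypothetical two-row frame, used only for this illustration
df_demo = pd.DataFrame({"rooms": [3, 5], "price": [250000, 410000]})

# Default behaviour: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df_demo.to_csv()))

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(df_demo.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Reading the default output back yields an extra `Unnamed: 0` column, which is usually not wanted when the index is just a row counter.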

## 2. Excel Files (.xlsx, .xls)

Excel files can contain multiple sheets and more complex formatting. Pandas simplifies reading data from specific sheets.

**Note for Excel Data:** The dataset (`https://archive.ics.uci.edu/dataset/352/online+retail`) is distributed as a `.zip` archive. `pd.read_excel()` can read an `.xlsx` file directly from a URL, but it cannot open a workbook that is packed inside an archive, so you need to download the archive, extract it, and place `Online Retail.xlsx` in the same directory as your Jupyter notebook, or provide its full path. Reading `.xlsx` files also requires the `openpyxl` package.

**Instructions:**
1.  Download the file from: `https://archive.ics.uci.edu/static/public/352/online+retail.zip`
2.  Extract the `Online Retail.xlsx` file.
3.  Place `Online Retail.xlsx` in the same directory as this notebook.
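
If you would rather script the extraction step than do it by hand, Python's standard `zipfile` module can pull a single member out of an archive. The sketch below builds a stand-in archive in a temporary directory so that it runs without a download; the member contains placeholder bytes, not a real workbook, and the names `work_dir` and `zip_path` are assumptions made for this example.

```python
import os
import tempfile
import zipfile

# Build a stand-in archive so the sketch runs without a download.
# The member holds placeholder bytes, not a real workbook.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "online+retail.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("Online Retail.xlsx", b"placeholder bytes")

# Extract one named member from the archive into work_dir
with zipfile.ZipFile(zip_path) as archive:
    print(archive.namelist())
    extracted_path = archive.extract("Online Retail.xlsx", path=work_dir)

print(os.path.exists(extracted_path))
```

With the real download, you would point `zip_path` at the saved archive and skip the part that creates the stand-in.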

### Reading from Excel

In [None]:
# Local file path for the extracted Excel file
excel_file_path = 'Online Retail.xlsx'

if os.path.exists(excel_file_path):
    # Read the first 100 rows from the first sheet (default sheet_name=0)
    df_excel = pd.read_excel(excel_file_path, nrows=100)
    
    print("--- DataFrame from Excel (head) ---")
    print(df_excel.head())
    
    print("\n--- DataFrame Info ---")
    df_excel.info()
else:
    print(f"Error: {excel_file_path} not found. Please download and extract the Excel file as instructed above.")
    df_excel = pd.DataFrame() # Create an empty DataFrame to avoid errors later

### Writing to Excel

We'll save a processed version of `df_excel` to a new Excel file, on a sheet named `ProcessedData`.

In [None]:
if not df_excel.empty:
    # Add a dummy 'TotalCost' column for demonstration
    df_excel['TotalCost'] = df_excel['Quantity'] * df_excel['UnitPrice']
    
    # Define the output file path
    output_excel_path = 'processed_online_retail.xlsx'
    
    # Write the DataFrame to an Excel file on a specific sheet
    df_excel.to_excel(output_excel_path, index=False, sheet_name='ProcessedData')
    
    print(f"DataFrame saved to {output_excel_path} on sheet 'ProcessedData'")
    
    # Verify by reading the new Excel file
    df_verify_excel = pd.read_excel(output_excel_path, sheet_name='ProcessedData')
    print("\n--- Verified Excel (head) ---")
    print(df_verify_excel.head())
else:
    print("Skipping Excel write operation as df_excel is empty.")

## 3. SQLite Databases

SQLite is a lightweight, file-based SQL database. Python's `sqlite3` module provides native support, and Pandas integrates seamlessly.

### Connection Setup

We'll use an in-memory database (`:memory:`) for simplicity, but you can replace this with a file path like `'my_data.db'`.

In [None]:
# Connect to an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create a dummy table and insert some data
cursor.execute('''
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE
);
''')

sample_users = [
    (1, 'Alice', 'alice@example.com'),
    (2, 'Bob', 'bob@example.com'),
    (3, 'Charlie', 'charlie@example.com')
]
cursor.executemany("INSERT OR IGNORE INTO users (id, name, email) VALUES (?, ?, ?);", sample_users)
conn.commit()

print("SQLite database setup complete (in-memory).")

### Reading from SQLite

Use `pd.read_sql_query()` to execute SQL and get a DataFrame.

In [None]:
# Read all data from the 'users' table
df_sql_read = pd.read_sql_query("SELECT * FROM users;", conn)

print("--- DataFrame from SQLite (head) ---")
print(df_sql_read.head())
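
Queries that filter on a value can pass it through the `params` argument instead of formatting it into the SQL string. The sketch below is self-contained and uses its own in-memory database (`demo_conn`, a name chosen for this example) with a table shaped like the one above.

```python
import sqlite3

import pandas as pd

# Stand-alone in-memory database with a table shaped like the one above
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
demo_conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.com"), (2, "Bob", "bob@example.com")],
)
demo_conn.commit()

# The ? placeholder is filled from params, so the value is never
# spliced into the SQL string by hand
df_filtered = pd.read_sql_query(
    "SELECT id, name FROM users WHERE name = ?", demo_conn, params=("Bob",)
)
print(df_filtered)

demo_conn.close()
```

Letting the driver bind the value avoids quoting mistakes and SQL injection when the filter comes from user input.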

### Writing (Inserting) to SQLite

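Use `df.to_sql()` to insert DataFrame contents into a table. The `if_exists` parameter controls what happens when the table already exists: `'fail'` (the default) raises an error, `'replace'` drops and recreates the table, and `'append'` inserts the new rows into the existing table.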

In [None]:
# Create a new DataFrame with new user data
new_users_df = pd.DataFrame({
    'id': [4, 5],
    'name': ['Diana', 'Eve'],
    'email': ['diana@example.com', 'eve@example.com']
})

# Append new users to the 'users' table
new_users_df.to_sql('users', conn, if_exists='append', index=False)

print("New users added to SQLite via df.to_sql().")

# Verify insertion
print("\n--- Users after insertion ---")
print(pd.read_sql_query("SELECT * FROM users;", conn))
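
The effect of `if_exists` is easiest to see side by side. This sketch uses a separate in-memory database and a hypothetical `people` table to show the default refusing to write to an existing table, followed by a successful append.

```python
import sqlite3

import pandas as pd

demo_conn = sqlite3.connect(":memory:")
first_batch = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
second_batch = pd.DataFrame({"id": [3], "name": ["Charlie"]})

# First write: the table does not exist yet, so it is created
first_batch.to_sql("people", demo_conn, index=False)

# Default if_exists='fail': writing to an existing table raises ValueError
try:
    second_batch.to_sql("people", demo_conn, index=False)
    failed = False
except ValueError as exc:
    failed = True
    print(f"Refused to write: {exc}")

# if_exists='append': the new row is added to the existing ones
second_batch.to_sql("people", demo_conn, if_exists="append", index=False)
row_count = pd.read_sql_query("SELECT COUNT(*) AS n FROM people", demo_conn)["n"][0]
print(row_count)

demo_conn.close()
```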

### Updating Data in SQLite (Direct SQL)

`to_sql(if_exists='replace')` drops and recreates the entire table, so it is a poor fit for changing a few rows. For targeted updates, direct SQL `UPDATE` statements are more efficient and safer.

In [None]:
# Update Bob's email using a direct SQL UPDATE statement
cursor.execute("UPDATE users SET email = ? WHERE name = ?;", ('bob.new@example.com', 'Bob'))
conn.commit()

print("Bob's email updated in SQLite.")

# Verify update
print("\n--- Users after update ---")
print(pd.read_sql_query("SELECT * FROM users;", conn))
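
One reason `if_exists='replace'` is the riskier option is that the recreated table is built from the DataFrame alone, so constraints declared in the original `CREATE TABLE` are not carried over. The sketch below checks this by reading the stored table definition from `sqlite_master` before and after a replace; `table_definition` is a helper written for this example, not part of pandas or `sqlite3`.

```python
import sqlite3

import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE)"
)


def table_definition(connection, table_name):
    """Return the CREATE TABLE statement SQLite stores for a table."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row[0]


schema_before = table_definition(demo_conn, "users")

# Replacing the table drops it and lets pandas create a fresh one
replacement = pd.DataFrame(
    {"id": [1], "name": ["Alice"], "email": ["alice@example.com"]}
)
replacement.to_sql("users", demo_conn, if_exists="replace", index=False)

schema_after = table_definition(demo_conn, "users")

print("UNIQUE" in schema_before)
print("UNIQUE" in schema_after)

demo_conn.close()
```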

### Deleting Data from SQLite (Direct SQL)

Similar to updates, direct SQL `DELETE` statements are used for removing specific records.

In [None]:
# Delete user 'Charlie' using a direct SQL DELETE statement
cursor.execute("DELETE FROM users WHERE name = ?;", ('Charlie',))
conn.commit()

print("Charlie deleted from SQLite.")

# Verify deletion
print("\n--- Users after deletion ---")
print(pd.read_sql_query("SELECT * FROM users;", conn))

### Close SQLite Connection

Always close your database connection when you are done. For an in-memory database, closing the connection also discards its contents.

In [None]:
conn.close()
print("SQLite connection closed.")
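
If you want the connection closed even when a cell raises an exception, one option is `contextlib.closing` from the standard library. This is a sketch of that pattern on a throwaway in-memory database, with a hypothetical `numbers` table; it is an alternative to the explicit `conn.close()` above, not a replacement the notebook requires.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls demo_conn.close() when the block exits, even on an exception.
# Note that `with sqlite3.connect(...) as conn:` alone only manages the
# transaction (commit or rollback); it does not close the connection.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("CREATE TABLE numbers (value INTEGER)")
    demo_conn.executemany("INSERT INTO numbers (value) VALUES (?)", [(1,), (2,), (3,)])
    demo_conn.commit()
    df_numbers = pd.read_sql_query("SELECT value FROM numbers", demo_conn)

print(df_numbers["value"].sum())

# The connection is closed here; using it again raises ProgrammingError
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print(still_open)
```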

## 4. APIs (Application Programming Interfaces)

APIs are interfaces for web services to exchange data. We use the `requests` library to make HTTP calls and then convert JSON responses into DataFrames. Pandas does *not* have a direct `to_api()` method; instead, you build payloads and use `requests` for writing.

### Reading from APIs (GET Requests)

We'll use `jsonplaceholder.typicode.com`, which provides dummy REST API data.

In [None]:
api_get_url = "https://jsonplaceholder.typicode.com/todos" # Endpoint for todos

try:
    response = requests.get(api_get_url, timeout=10) # Without a timeout, the call can wait indefinitely
    response.raise_for_status() # Raises an HTTPError for bad responses (4xx or 5xx)
    
    data = response.json() # Parse the JSON response into a Python list/dict
    df_api_read = pd.DataFrame(data)
    
    print("--- DataFrame from API (head) ---")
    print(df_api_read.head())

    print("\n--- DataFrame Info ---")
    df_api_read.info()
    
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from API: {e}")
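
The `/todos` response is a flat list of objects, so `pd.DataFrame(data)` is enough. When a response contains nested objects, `pd.json_normalize()` can expand them into separate columns. The sketch below uses a hand-written sample rather than a live request; the records and field names are hypothetical.

```python
import pandas as pd

# Hypothetical response body with nested objects (not fetched from a real API)
sample_records = [
    {"id": 1, "name": "Alice", "address": {"city": "Springfield", "zipcode": "12345"}},
    {"id": 2, "name": "Bob", "address": {"city": "Shelbyville", "zipcode": "67890"}},
]

# pd.DataFrame would keep each address as a dict in a single column;
# json_normalize expands nested keys into dotted column names
df_flat = pd.json_normalize(sample_records)

print(list(df_flat.columns))
print(df_flat)
```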

### Writing (Sending) Data to APIs (POST/PUT Requests)

For writing to APIs, you construct the data payload (often as a Python dictionary, converted to JSON) and send it using `requests.post()` (for creating new resources) or `requests.put()`/`requests.patch()` (for updating existing resources). Pandas itself doesn't directly manage this, but it can help prepare the data to be sent.

In [None]:
import json # Needed to convert Python dict to JSON string

api_post_url = "https://jsonplaceholder.typicode.com/posts" # Endpoint for creating posts

# Example: Create a new post from a dictionary
new_post_payload = {
    'title': 'My New Blog Post',
    'body': 'This is the content of my exciting new blog post.',
    'userId': 101 # A fictional user ID
}

# Convert the Python dictionary to a JSON string
json_data_to_send = json.dumps(new_post_payload)

# Set the Content-Type header to indicate we are sending JSON
headers = {'Content-Type': 'application/json'}

print("Attempting to send new post to API...")

try:
    response_post = requests.post(api_post_url, data=json_data_to_send, headers=headers, timeout=10)
    response_post.raise_for_status() # Check for HTTP errors
    
    print("New post created successfully via POST!")
    print("API Response:", response_post.json()) # The API typically returns the created resource with its new ID
    
except requests.exceptions.RequestException as e:
    print(f"Error sending data to API: {e}")


# --- Example for PUT (Updating an existing resource) ---
api_put_url = "https://jsonplaceholder.typicode.com/posts/1" # Updating post with ID 1

updated_post_payload = {
    'id': 1, # Often, the ID is included in the payload for PUT requests
    'title': 'Updated Title for Post 1',
    'body': 'This content has been revised and updated.',
    'userId': 1
}
json_data_to_update = json.dumps(updated_post_payload)

print("\nAttempting to update post ID 1 via PUT...")

try:
    response_put = requests.put(api_put_url, data=json_data_to_update, headers=headers, timeout=10)
    response_put.raise_for_status()
    
    print("Post ID 1 updated successfully via PUT!")
    print("API Response:", response_put.json())
    
except requests.exceptions.RequestException as e:
    print(f"Error updating data via API: {e}")
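
When the data to send already lives in a DataFrame, pandas can prepare the payloads for you. The following sketch converts each row into a plain dictionary that is ready for JSON encoding; it makes no network calls, and the frame `df_posts` and its contents are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical posts waiting to be sent, one row per request
df_posts = pd.DataFrame({
    "title": ["First post", "Second post"],
    "body": ["Hello", "World"],
    "userId": [101, 102],
})

# to_json handles NumPy integer types; json.loads turns the result
# back into plain Python dicts, one per row
payloads = json.loads(df_posts.to_json(orient="records"))

for payload in payloads:
    # Each dict could be passed to requests.post(..., json=payload)
    print(json.dumps(payload))
```

Passing a dictionary through the `json=` argument of `requests.post()` serializes it and sets the `Content-Type` header, which would replace the manual `json.dumps()` and `headers` steps used above.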

## Conclusion

This notebook covers the essential methods for data ingress (reading) and egress (writing/updating/deleting) with Pandas for common data formats and sources. Mastering these operations is foundational for any data analysis or engineering task in Python.