# Example Data Analysis Notebook

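This notebook walks through a small end-to-end example: building a sample DataFrame, computing summary statistics, plotting the results, and running the data through the project's ETL `transform` module.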

## Setup

Import required libraries and modules.

In [None]:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Add parent directory to path for importing ETL modules
sys.path.append('..')

from etl import extract, transform, load
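`sys.path.append('..')` is resolved relative to the current working directory, and re-running the cell appends the same entry again. A minimal sketch of a more defensive variant follows, assuming (as the setup cell implies) that the notebook is run from a directory one level below the project root. The name `project_root` is introduced here for illustration.

```python
import sys
from pathlib import Path

# Assumption: the notebook's working directory sits one level below the project root
project_root = Path.cwd().parent.resolve()

# Add the absolute path only once, so re-running the cell does not create duplicates
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print("Project root on sys.path:", project_root)
```

Using an absolute path means the import keeps working if a later cell changes the working directory.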

## Create Sample Data

Build a small five-row DataFrame of users with their order counts and total spend.

In [None]:
# Create sample dataset
data = {
    'user_id': [1, 2, 3, 4, 5],
    'username': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com', 'david@example.com', 'eve@example.com'],
    'orders': [5, 3, 7, 2, 4],
    'total_spent': [150.50, 89.99, 220.00, 45.50, 110.25]
}

df = pd.DataFrame(data)
print("Sample Data:")
df.head()

## Data Analysis

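Compute summary statistics, then the total number of orders, the total revenue, and the average order value (total revenue divided by total orders).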

In [None]:
# Basic statistics
print("Basic Statistics:")
print(df.describe())

# Format monetary values to two decimal places to avoid floating-point noise
print(f"\nTotal Orders: {df['orders'].sum()}")
print(f"Total Revenue: ${df['total_spent'].sum():.2f}")
print(f"Average Order Value: ${df['total_spent'].sum() / df['orders'].sum():.2f}")
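The aggregate average order value above can hide variation between users. As a minimal sketch, the same ratio can be computed per user. The column name `avg_order_value` is introduced here for illustration, and the sample data is repeated so the block runs on its own.

```python
import pandas as pd

# Same sample data as above, repeated so this block is self-contained
df = pd.DataFrame({
    'username': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'orders': [5, 3, 7, 2, 4],
    'total_spent': [150.50, 89.99, 220.00, 45.50, 110.25],
})

# Overall average order value: total revenue divided by total orders
overall_aov = df['total_spent'].sum() / df['orders'].sum()

# Per-user average order value (hypothetical column name, not in the original data)
per_user = df.assign(avg_order_value=df['total_spent'] / df['orders'])

print(f"Overall average order value: {overall_aov:.2f}")
print(per_user[['username', 'avg_order_value']].round(2))
```

Note that the overall figure is weighted by order count, so it generally differs from the plain mean of the per-user values.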

## Visualization

Plot each user's total spend as a bar chart.

In [None]:
# Create bar chart
plt.figure(figsize=(10, 6))
plt.bar(df['username'], df['total_spent'], color='skyblue')
plt.xlabel('User')
plt.ylabel('Total Spent ($)')
plt.title('Total Spending by User')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## ETL Pipeline Example

Run the sample DataFrame through the `transform` module: clean it first, then normalize the `username` and `email` columns.

In [None]:
# Transform data
df_cleaned = transform.clean_data(df)
df_normalized = transform.normalize_columns(df_cleaned, ['username', 'email'])

print("Transformed Data:")
df_normalized.head()
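The `etl.transform` module itself is not shown in this notebook. Purely as an illustration, functions with these names might look like the sketch below. The behavior here (dropping duplicate and incomplete rows, then trimming and lowercasing string columns) is an assumption, not the module's actual implementation.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: drop exact duplicate rows and rows with missing values."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


def normalize_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in: trim whitespace and lowercase the given string columns."""
    result = df.copy()  # leave the caller's DataFrame untouched
    for column in columns:
        result[column] = result[column].str.strip().str.lower()
    return result


# Small messy sample: stray whitespace, mixed case, a duplicate row, a missing name
sample = pd.DataFrame({
    'username': [' Alice ', 'Bob', 'Bob', None],
    'email': ['Alice@Example.com', 'bob@example.com', 'bob@example.com', 'x@example.com'],
})

cleaned = clean_data(sample)
normalized = normalize_columns(cleaned, ['username', 'email'])
print(normalized)
```

Both functions return a new DataFrame rather than modifying their input, which matches how the cell above assigns each result to a new variable.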

## Conclusion

This notebook covered basic data analysis and a simple transformation step using the ETL modules. Possible extensions include:
- More complex transformations
- Additional visualizations
- Integration with external data sources
- Machine learning models