# ETL Pipeline Development Notebook

This notebook shows how to develop and run the ETL pipeline interactively from Jupyter inside a Docker container. We'll set up the Python environment and then run each pipeline component in turn.

In [None]:
# Import required libraries
import os
import sys

import pandas as pd
import requests
from sqlalchemy import create_engine

# Add the project root (the parent directory) to the Python path
# so that the etl_scripts package can be imported
sys.path.append('..')
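
Note that `'..'` is resolved against the kernel's current working directory, so the import path depends on where Jupyter was started. A minimal sketch of a more defensive variant is shown here. `add_to_sys_path` is a hypothetical helper, not part of the project, and the sketch assumes the notebook's folder sits one level below the project root.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Append the absolute form of `directory` to sys.path once; return it as a string."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Assumption: the kernel's working directory is the notebook's folder,
# so its parent is the project root.
project_root = add_to_sys_path(Path.cwd().parent)
print(f"Project root on sys.path: {project_root}")
```

Because the path is made absolute and added only once, re-running the cell does not pile up duplicate entries.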

## Check Project Dependencies

Let's print the current project dependencies so we can confirm that Jupyter is listed:

In [None]:
with open('../requirements.txt', 'r') as f:
    requirements = f.read()
print("Current requirements.txt contents:")
print(requirements)
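
Reading the printed list by eye is easy to get wrong, so the same check can be done in code. The following is only a sketch under stated assumptions: `requirement_names` is a hypothetical helper, the sample text is made up and stands in for the real `requirements.txt`, and Jupyter is assumed to appear under one of the package names `jupyter`, `jupyterlab`, or `notebook`.

```python
import re


def requirement_names(text):
    """Return the lower-cased package names found in requirements-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop trailing comments
        if not line or line.startswith('-'):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = re.match(r'[A-Za-z0-9][A-Za-z0-9._-]*', line)
        if match:
            names.add(match.group(0).lower())
    return names


# Made-up sample; in the notebook, pass the `requirements` string read above.
sample = """\
pandas==2.1.0
requests>=2.31  # HTTP client
-r dev-requirements.txt
jupyter
"""

JUPYTER_PACKAGES = {"jupyter", "jupyterlab", "notebook"}  # assumed package names
found = requirement_names(sample) & JUPYTER_PACKAGES
print("Jupyter packages listed:", sorted(found) if found else "none")
```

The parser is deliberately simple: it ignores pip options and does not handle URL-style requirements, which is enough for a plain list of pinned packages.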

## Pipeline Components

Let's import each component of our ETL pipeline to confirm that the modules can be loaded:

In [None]:
# Import pipeline components
from etl_scripts.extract import extract_data
from etl_scripts.transform import clean_data
from etl_scripts.load import load_to_db
from etl_scripts.analytics import run_analytics

print("Pipeline components imported successfully!")
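
If one of these imports fails, the cell stops at the first error and the remaining modules are never tried. As a rough sketch, the modules can instead be probed one by one with `importlib.import_module` from the standard library. `check_imports` is a hypothetical helper; the module names are the ones used in the cell above.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None on success or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # also covers ModuleNotFoundError
            results[name] = str(exc)
    return results


# Module names taken from the import cell above.
modules = [
    "etl_scripts.extract",
    "etl_scripts.transform",
    "etl_scripts.load",
    "etl_scripts.analytics",
]
for name, error in check_imports(modules).items():
    status = "ok" if error is None else f"FAILED ({error})"
    print(f"{name}: {status}")
```

Only `ImportError` is caught here, so other problems raised while a module is loading (a syntax error, for example) still surface as normal tracebacks.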

## Run Pipeline Components

Now let's execute each component of the pipeline sequentially:

In [None]:
# 1. Extract data
print("Starting extraction...")
extract_data()
print("Extraction complete!")

In [None]:
# 2. Transform data
print("Starting transformation...")
clean_data()
print("Transformation complete!")

In [None]:
# 3. Load data
print("Starting load...")
load_to_db()
print("Load complete!")

In [None]:
# 4. Run analytics
print("Starting analytics...")
run_analytics()
print("Analytics complete!")
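
If the four steps are meant to run strictly in order, the cells above can also be driven from a single loop that times each step and stops at the first failure. This is a sketch, not the project's own runner: `run_steps` and the `fake_*` functions are hypothetical stand-ins so the example runs on its own.

```python
import time


def run_steps(steps):
    """Run (name, callable) pairs in order, stopping at the first failure.

    Returns a list of (name, seconds, error) tuples; error is None on success.
    """
    results = []
    for name, func in steps:
        start = time.perf_counter()
        try:
            func()
            error = None
        except Exception as exc:  # record the failure instead of raising
            error = exc
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, error))
        if error is not None:
            print(f"{name} failed after {elapsed:.2f}s: {error!r}")
            break
        print(f"{name} finished in {elapsed:.2f}s")
    return results


# Hypothetical stand-ins; in the notebook, pass extract_data, clean_data,
# load_to_db and run_analytics instead.
def fake_extract():
    pass


def fake_transform():
    raise ValueError("example failure")


def fake_load():
    pass


results = run_steps([
    ("extract", fake_extract),
    ("transform", fake_transform),
    ("load", fake_load),
])
```

In this demo the transform stand-in raises on purpose, so the load step is never called and `results` holds two entries.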