# Tutorial 3: Error Handling and External Data

In this tutorial, we'll learn how DaggerML captures errors raised inside functions and how to load external data from JSON and CSV files.

## Prerequisites

Complete Tutorials 1 and 2 first. We'll build on those concepts while adding error handling and external data capabilities.

In [1]:
# Import required modules
import json
import os
from pathlib import Path

from daggerml import Dml, Error

from dml_util import funkify

# Create a DaggerML instance
dml = Dml(repo="tutorial", branch="main")
# Placeholder S3 settings; as the values suggest, they are not used in this tutorial
os.environ.update({"DML_S3_BUCKET": "does-not-matter", "DML_S3_PREFIX": "does-not-matter"})
print("DaggerML ready for error handling and external data tutorial!")

DaggerML ready for error handling and external data tutorial!


## Understanding Errors in DaggerML

DaggerML captures and stores errors as part of your computation graph. When a function raises, the failure surfaces in your code as a `daggerml.Error` that you can catch, so one failed step does not have to stop the rest of your workflow.
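
To make the idea of "an error stored as a result" concrete, here is a minimal plain-Python sketch. It is not how DaggerML is implemented; `Outcome` and `run_step` are hypothetical names used only for illustration.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Outcome:
    """Hypothetical record of one step: either a value or a captured error."""
    value: Any = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def run_step(fn: Callable[..., Any], *args: Any) -> Outcome:
    """Run fn(*args) and keep any exception as data instead of re-raising it."""
    try:
        return Outcome(value=fn(*args))
    except Exception:
        # Store the formatted traceback so the failure can be inspected later.
        return Outcome(error=traceback.format_exc())


def divide(a, b):
    return a / b


good = run_step(divide, 10, 2)
bad = run_step(divide, 10, 0)

print(good.ok, good.value)
# Show only the last line of the stored traceback
print(bad.ok, bad.error.strip().splitlines()[-1])
```

Because the failure is ordinary data, the caller decides when and how to react to it.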

In [2]:
@funkify
def divide_numbers(dag):
    """Divide two numbers; dividing by zero raises ZeroDivisionError."""
    return dag.argv[1].value() / dag.argv[2].value()

# Create a DAG for error handling examples
dag = dml.new("error_handling", "Learning error handling in DaggerML")

# Add our function
dag.divide_fn = divide_numbers

## Handling Errors Gracefully

Let's see how DaggerML handles functions that raise errors.

In [3]:
# Normal division (should work fine)
print("=== Normal Division ===")
dag.normal_result = dag.divide_fn(10, 2)
print(f"10 ÷ 2 = {dag.normal_result.value()}")

# Division by zero (will cause an error)
print("\n=== Division by Zero (Error Handling) ===")
try:
    dag.error_result = dag.divide_fn(10, 0)
    print(f"10 ÷ 0 = {dag.error_result.value()}")
except Error as e:
    print(f"Caught DaggerML Error: {e}")
    print("The error is stored in the DAG and can be handled appropriately")

=== Normal Division ===


scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 started
scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 running
scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 running
scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 running
scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 running
scriptrunner [944c0ecee26bed3fc10d4785c0a2d066] :: pid = 31193 finished


10 ÷ 2 = 5.0

=== Division by Zero (Error Handling) ===


scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 started
scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 running
scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 running
scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 running
scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 running
scriptrunner [81bbf738b81d228378ff22a9e28ea569] :: pid = 31936 finished


Caught DaggerML Error: Traceback (most recent call last) from python:
  File "/Users/amn/code/daggerml/dml-util/src/dml_util/funk.py", line 250, in aws_fndag
    yield dag
  File "/var/folders/s9/25vs18pn1kj680k76kc6zvlm0000gn/T/dml.03supcvi/script.py", line 10, in <module>
    res = divide_numbers(dag)
  File "/var/folders/s9/25vs18pn1kj680k76kc6zvlm0000gn/T/dml.03supcvi/script.py", line 6, in divide_numbers
    return dag.argv[1].value() / dag.argv[2].value()
ZeroDivisionError: division by zero
The error is stored in the DAG and can be handled appropriately
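
The same try/except idea extends to a batch of inputs where some fail. The sketch below is plain Python rather than DaggerML code, and `divide_or_default` and the sample pairs are hypothetical; it only illustrates the "safe fallback value" pattern of continuing past a bad input.

```python
from typing import List, Optional, Tuple


def divide_or_default(a: float, b: float, default: Optional[float] = None) -> Optional[float]:
    """Return a / b, or `default` when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return default


pairs: List[Tuple[float, float]] = [(10, 2), (10, 0), (9, 3)]

# One bad pair does not prevent the remaining pairs from being processed.
results = [divide_or_default(a, b) for a, b in pairs]
failed = [pair for pair, result in zip(pairs, results) if result is None]

print(results)
print(failed)
```

In a DaggerML workflow, the equivalent step would presumably be to wrap each function call in `try`/`except Error`, as in the cell above, and record a fallback for the calls that fail.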


## Working with External Files

Let's create some sample data files and learn how to work with external data in DaggerML.

In [None]:
# Create some sample data files for our tutorial
sample_data_dir = Path("tutorial_data")
sample_data_dir.mkdir(exist_ok=True)

# Create a JSON file
json_data = {
    "products": [
        {"id": 1, "name": "Laptop", "price": 999.99, "category": "Electronics"},
        {"id": 2, "name": "Book", "price": 15.99, "category": "Education"},
        {"id": 3, "name": "Coffee", "price": 4.50, "category": "Food"},
        {"id": 4, "name": "Headphones", "price": 79.99, "category": "Electronics"}
    ],
    "metadata": {
        "version": "1.0",
        "last_updated": "2024-01-15"
    }
}

json_file = sample_data_dir / "products.json"
with open(json_file, 'w') as f:
    json.dump(json_data, f, indent=2)

# Create a CSV-like text file
csv_data = """name,age,city,score
Alice,25,New York,85
Bob,30,San Francisco,92
Charlie,35,Chicago,78
Diana,28,Boston,88
Eve,32,Seattle,95"""

csv_file = sample_data_dir / "people.csv"
with open(csv_file, 'w') as f:
    f.write(csv_data)

print(f"Created sample data files in {sample_data_dir}:")
print(f"  - {json_file}")
print(f"  - {csv_file}")

In [None]:
@funkify
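def load_json_file(dag):
    """Load and validate JSON data from a file."""
    # Imported inside the function in case the body runs as a standalone
    # script (as the traceback paths above suggest), where the notebook's
    # imports would not be available.
    import json

    file_path = dag.argv[1].value()

    dag.file_path = file_path

    try:
        # Work with plain Python values first; values read back from `dag`
        # need `.value()`, so they cannot be passed to json.loads directly.
        with open(file_path, 'r') as f:
            raw_content = f.read()
        data = json.loads(raw_content)

        # Basic validation
        has_products = "products" in data
        product_count = len(data.get("products", []))

        # Record the results on the DAG only after every step has succeeded
        dag.raw_content = raw_content
        dag.data = data
        dag.success = True
        dag.error = None
        dag.has_products = has_products
        dag.product_count = product_count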

    except FileNotFoundError:
        dag.success = False
        dag.error = f"File not found: {file_path}"
        dag.data = None
    except json.JSONDecodeError as e:
        dag.success = False
        dag.error = f"Invalid JSON: {e}"
        dag.data = None
    except Exception as e:
        dag.success = False
        dag.error = f"Unexpected error: {e}"
        dag.data = None

    return dag.data

@funkify
def parse_csv_simple(dag):
    """Parse simple CSV data."""
    file_path = dag.argv[1].value()

    dag.file_path = file_path

    try:
        with open(file_path, 'r') as f:
            lines = f.read().strip().split('\n')

        # Build plain Python lists first; they are stored on the DAG at the end
        headers = lines[0].split(',')
        rows = []

        for line in lines[1:]:
            values = line.split(',')
            row = {}
            for i, header in enumerate(headers):
                if i < len(values):
                    # Try to convert to number if possible
                    value = values[i]
                    try:
                        if '.' in value:
                            value = float(value)
                        else:
                            value = int(value)
                    except ValueError:
                        pass  # Keep as string
                    row[header] = value
            rows.append(row)

        dag.headers = headers
        dag.rows = rows
        dag.success = True
        dag.error = None

    except Exception as e:
        dag.success = False
        dag.error = str(e)
        dag.rows = []
        dag.headers = []

    return dag.rows

# Add file processing functions to DAG
dag.load_json_fn = load_json_file
dag.parse_csv_fn = parse_csv_simple

# Load the files
print("=== Loading External Data ===")
dag.json_data = dag.load_json_fn(str(json_file))
dag.csv_data = dag.parse_csv_fn(str(csv_file))

print(f"JSON loaded successfully: {dag.json_data.load().success.value()}")
if dag.json_data.load().success.value():
    print(f"  Products found: {dag.json_data.load().product_count.value()}")

print(f"CSV loaded successfully: {dag.csv_data.load().success.value()}")
if dag.csv_data.load().success.value():
    print(f"  Rows parsed: {len(dag.csv_data.value())}")
    print(f"  Headers: {dag.csv_data.load().headers.value()}")
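
Splitting each line on commas works for the sample file, but it breaks as soon as a field contains a quoted comma. As a hedged alternative, the sketch below uses Python's standard `csv` module; `convert` and `parse_csv_text` are hypothetical helper names, and the sample text is made up for illustration.

```python
import csv
import io
from typing import Dict, List, Union

Cell = Union[int, float, str]


def convert(value: str) -> Cell:
    """Convert a field to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (ValueError, TypeError):
            continue
    return value


def parse_csv_text(text: str) -> List[Dict[str, Cell]]:
    """Parse CSV text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: convert(val) for key, val in row.items()} for row in reader]


# The quoted city contains a comma, which a plain split(',') would mis-parse.
sample = 'name,age,city,score\nAlice,25,"New York, NY",85\nBob,30,Boston,92.5\n'
rows = parse_csv_text(sample)

print(rows[0])
print(rows[1]["score"])
```

The same parsing logic could be placed inside a funkified function, with the imports moved into the function body.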

## Cleanup and Summary

Let's clean up our temporary files and summarize what we've built.
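
As an aside, Python's standard `tempfile` module can handle this cleanup automatically. The sketch below is an alternative to creating and deleting `tutorial_data` by hand, not what this tutorial does; the file name and contents are placeholders.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    json_file = data_dir / "products.json"
    json_file.write_text(json.dumps({"products": []}))

    # The file is available for as long as we stay inside the with block.
    loaded = json.loads(json_file.read_text())
    existed_inside = json_file.exists()

# On leaving the block, the directory and its contents are removed.
exists_after = data_dir.exists()
print(loaded, existed_inside, exists_after)
```

Any step that reads the files has to finish inside the `with` block, since the paths stop existing afterwards.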

In [None]:
# Clean up sample files
import shutil

if sample_data_dir.exists():
    shutil.rmtree(sample_data_dir)
    print("Cleaned up sample data files")

# Create final summary
dag.tutorial_summary = {
    "error_handling_patterns": [
        "Try/except within functions",
        "Safe fallback values",
        "Graceful degradation"
    ],
    "external_data_sources": [
        "JSON files",
        "CSV files",
        "With validation and error recovery"
    ],
    "functions_created": len([k for k in dag.keys() if k.endswith('_fn')]),
    "total_nodes": len(dag.keys())
}

print("=== Tutorial 3 Complete! ===")
print("You've learned:")
print("✅ Error handling in DaggerML functions")
print("✅ Working with external JSON and CSV files")
print("✅ Data validation and parsing")
print("✅ Robust data loading patterns")

summary = dag.tutorial_summary.value()
print(f"\nTotal functions created: {summary['functions_created']}")
print(f"Total DAG nodes: {summary['total_nodes']}")

## What We've Mastered

In this tutorial, you've learned advanced DaggerML patterns:

### 1. Error Handling
- ✅ **Exception Management**: How DaggerML captures and stores errors
- ✅ **Safe Functions**: Building functions that handle errors gracefully
- ✅ **Error Recovery**: Continuing workflows despite individual failures

### 2. External Data Integration
- ✅ **File Loading**: Reading JSON and CSV files safely
- ✅ **Data Validation**: Checking and validating external data
- ✅ **Parsing Strategies**: Converting text data to structured formats

### 3. Best Practices
- ✅ **Always validate external data** before processing
- ✅ **Provide meaningful error messages** for debugging
- ✅ **Store intermediate results** to help with debugging
- ✅ **Design functions to be resilient** to unexpected inputs
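
The first two practices can be sketched as a small validator that returns readable messages instead of raising. This is an illustration only: `validate_products` and `REQUIRED_FIELDS` are hypothetical, and the expected fields are assumed from the sample `products.json` used in this tutorial.

```python
from typing import Any, List

# Assumed schema, taken from the tutorial's sample products
REQUIRED_FIELDS = {"id": int, "name": str, "price": (int, float), "category": str}


def validate_products(data: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(data, dict):
        return [f"expected a JSON object at the top level, got {type(data).__name__}"]
    products = data.get("products")
    if not isinstance(products, list):
        return ["missing or non-list 'products' key"]

    problems = []
    for index, product in enumerate(products):
        if not isinstance(product, dict):
            problems.append(f"products[{index}]: expected an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in product:
                problems.append(f"products[{index}]: missing '{field}'")
            elif not isinstance(product[field], expected):
                problems.append(f"products[{index}]: '{field}' has wrong type")
    return problems


valid_doc = {"products": [{"id": 1, "name": "Laptop", "price": 999.99, "category": "Electronics"}]}
invalid_doc = {"products": [{"id": "2", "name": "Book"}]}

print(validate_products(valid_doc))
print(validate_products(invalid_doc))
```

Returning every problem at once, each tagged with the index of the offending product, tends to make bad input files quicker to fix than stopping at the first failure.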

## Next Tutorial Preview

In Tutorial 4, we'll explore:
- Advanced execution environments (local, cloud, containers)
- Storage systems for artifact management
- Scaling computations and production deployment

You're building serious DaggerML expertise! 🔧💪