# Tutorial 1: Getting Started with DaggerML

Welcome to DaggerML! This tutorial will introduce you to the basic concepts of creating and working with DAGs (Directed Acyclic Graphs) using DaggerML.

## What is DaggerML?

DaggerML is a framework for building reproducible computational workflows. It helps you:
- Create data pipelines as DAGs
- Cache intermediate results automatically
- Execute code in various environments (local, cloud, containers)
- Track dependencies and data lineage

## Key Concepts

1. **DAG**: A directed acyclic graph representing your computation
2. **Nodes**: Individual data values or computation results in the DAG
3. **Functions**: Reusable computational units decorated with `@funkify`
4. **Caching**: Automatic memoization of results based on inputs
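
To make "memoization of results based on inputs" concrete, here is a minimal plain-Python sketch of the general idea. It is not DaggerML's implementation: the `cache_key` and `memoized_call` helpers and the in-memory `_cache` dictionary are hypothetical names chosen for illustration, and the sketch assumes the inputs are JSON-serializable.

```python
import hashlib
import json

# Hypothetical in-memory cache; a real system would persist results somewhere.
_cache = {}
call_count = 0


def cache_key(fn_name, args):
    """Derive a stable key from the function name and its JSON-serializable inputs."""
    payload = json.dumps([fn_name, args], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def memoized_call(fn, *args):
    """Run fn(*args) once per distinct input and reuse the stored result afterwards."""
    key = cache_key(fn.__name__, list(args))
    if key not in _cache:
        _cache[key] = fn(*args)
    return _cache[key]


def slow_sum(numbers):
    global call_count
    call_count += 1  # count how often the body really runs
    return sum(numbers)


first = memoized_call(slow_sum, [1, 2, 3, 4, 5])
second = memoized_call(slow_sum, [1, 2, 3, 4, 5])  # same inputs: served from the cache
third = memoized_call(slow_sum, [1, 2, 3])         # new inputs: computed again
print(first, second, third, call_count)            # 15 15 6 2
```

The function body runs only twice for three calls, because the second call repeats inputs that were already seen.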

Let's start with the basics!

In [None]:
# First, let's import the necessary modules
import os

from daggerml import Dml

from dml_util import funkify

# Create a DaggerML instance
# This manages our computational workspace
dml = Dml(repo="tutorial", branch="main")
dml("repo", "create", "tutorial")
# Placeholder S3 settings; the values themselves do not matter for this tutorial
os.environ.update({"DML_S3_BUCKET": "does-not-matter", "DML_S3_PREFIX": "does-not-matter"})
print("DaggerML instance created successfully!")

## Creating Your First DAG

A DAG in DaggerML is created using the `dml.new()` method. Let's create a simple DAG that stores some basic data.

In [None]:
# Create a new DAG
dag = dml.new("01-my-first-dag", "Learning the basics of DaggerML")

# Add some literal values to the DAG
dag.number = 42
dag.text = "Hello DaggerML!"
dag.my_list = [1, 2, 3, 4, 5]
dag.my_dict = {"name": "Alice", "age": 30, "city": "San Francisco"}

print("DAG created with literal values")
print(f"Number: {dag.number.value()}")
print(f"Text: {dag.text.value()}")
print(f"List: {dag.my_list.value()}")
print(f"Dict: {dag.my_dict.value()}")

## Understanding Nodes vs Values

In DaggerML, when you assign data to a DAG, it becomes a "Node". Nodes are references to data, while `.value()` gives you the actual data.

In [None]:
# dag.number is a Node object
print(f"dag.number type: {type(dag.number)}")
print(f"dag.number: {dag.number}")

# dag.number.value() gives us the actual value
print(f"dag.number.value() type: {type(dag.number.value())}")
print(f"dag.number.value(): {dag.number.value()}")

# You can also access nested data
print(f"Type of name node: {type(dag.my_dict['name'])}")
print(f"Name from dict: {dag.my_dict['name'].value()}")
print(f"First list item: {dag.my_list[0].value()}")
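
The reference-versus-value distinction can be pictured with a small stand-alone sketch. The `Ref` class below is hypothetical and is not DaggerML's `Node` type; it only shows one way a handle could keep returning handles on indexing while `value()` unwraps the data.

```python
class Ref:
    """Hypothetical stand-in for a node: a handle that wraps some data."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # Indexing or slicing returns another handle, not raw data.
        return Ref(self._data[key])

    def value(self):
        # Only value() unwraps the handle into plain Python data.
        return self._data

    def __repr__(self):
        return f"Ref({type(self._data).__name__})"


person = Ref({"name": "Alice", "age": 30})
name_ref = person["name"]
print(type(name_ref).__name__)  # Ref
print(name_ref.value())         # Alice

numbers = Ref([1, 2, 3, 4, 5])
first_three = numbers[:3].value()
print(first_three)              # [1, 2, 3]
```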

## Working with Collections

DaggerML automatically handles collections (lists, dictionaries) and makes their elements accessible as nodes.

In [None]:
# Working with lists
first_three = dag.my_list[:3]  # This creates a new node with the first 3 elements
print(f"First three elements: {first_three.value()}")

# Working with dictionaries
person_name = dag.my_dict["name"]
print(f"Person's name: {person_name.value()}")

# You can create new collections from existing nodes
dag.processed_list = [x for x in dag.my_list]  # This creates nodes for each element
print(f"Processed list: {dag.processed_list.value()}")

# Combining data from different nodes
dag.summary = {
    "count": len(dag.my_list),
    "first_item": dag.my_list[0],
    "person": dag.my_dict["name"]
}
print(f"Summary: {dag.summary.value()}")

## Inspecting Your DAG

You can explore what's in your DAG using the `.keys()` method.

In [None]:
# See all the named nodes in your DAG
print("All named nodes in the DAG:")
for key in sorted(dag.keys()):
    print(f"  {key}")

# You can also access nodes by name
print(f"\nAccessing by name - dag['text']: {dag['text'].value()}")

## Your First Function with @funkify

The `@funkify` decorator turns regular Python functions into DAG-aware functions. These functions receive a `dag` parameter that contains their inputs and can store outputs.

The key takeaway is that in DaggerML it's DAGs all the way down. A function call is just a node in the calling DAG, but the call itself is implemented as a DAG, so everything you can do with a DAG is also available inside your function code.

In [None]:
@funkify
def standardize(dag):
    """Calculate the standard score (z-score) from a list of numbers."""
    # Get the input list from dag.argv[1] (first argument after the function itself)
    numbers = dag.argv[1].value()

    # Store intermediate results in the DAG
    sum_ = sum(numbers)
    cnt_ = len(numbers)
    dag.count = cnt_
    dag.sum = sum_
    dag.mean = mean = sum_ / cnt_ if cnt_ > 0 else 0
    dag.std_dev = std = (sum((x - mean) ** 2 for x in numbers) / cnt_) ** 0.5 if cnt_ > 0 else 0
    dag.z_scores = [(x - mean) / std for x in numbers] if std > 0 else [0] * cnt_
    return dag.z_scores

# Add the function to our DAG
dag.standardize = standardize

# Call the function with our list
dag.z_scores = dag.standardize(dag.my_list)

print("Z-scores calculated!")
print(f"Result: {dag.z_scores.value()}")
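
As a sanity check on the arithmetic inside `standardize`, the same z-scores can be worked out in plain Python with no DaggerML involved. This sketch assumes the input `[1, 2, 3, 4, 5]` used in this tutorial and, like the function above, uses the population standard deviation (dividing by the count, not the count minus one).

```python
import math
import statistics

numbers = [1, 2, 3, 4, 5]

mean = sum(numbers) / len(numbers)                               # 3.0
variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)  # 2.0 (population variance)
std = math.sqrt(variance)                                        # about 1.4142
z_scores = [(x - mean) / std for x in numbers]

# Cross-check against the standard library's population standard deviation.
assert math.isclose(std, statistics.pstdev(numbers))

print([round(z, 4) for z in z_scores])
# [-1.4142, -0.7071, 0.0, 0.7071, 1.4142]
```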

## Understanding Function Arguments

When you call a funkified function, the arguments are available in `dag.argv`:
- `dag.argv[0]` is the function itself
- `dag.argv[1]` is the first argument you passed
- `dag.argv[2]` is the second argument, etc.
- `dag.argv[1:]` gives you all arguments except the function
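
The layout in the list above resembles the familiar `sys.argv` convention, where slot 0 names the thing being run. The following plain-Python sketch mimics that layout with an ordinary list; `build_argv` and `describe` are hypothetical helpers for illustration, not part of DaggerML.

```python
def build_argv(fn, *args):
    """Hypothetical helper: pack the callable first, then its positional arguments."""
    return [fn, *args]


def describe(argv):
    """Read arguments the way the list above describes: skip slot 0."""
    all_args = argv[1:]
    return f"Received {len(all_args)} arguments: {all_args}"


argv = build_argv(describe, "hello", 42, [1, 2, 3])
print(argv[0] is describe)  # True: slot 0 holds the function itself
print(argv[1])              # hello
print(describe(argv))       # Received 3 arguments: ['hello', 42, [1, 2, 3]]
```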

In [None]:
@funkify
def demonstrate_arguments(dag):
    """Show how arguments work in funkified functions."""
    # You can get all arguments except the function
    all_args = dag.argv[1:].value()
    dag.n_args = len(all_args)
    dag.msg = f"Received {dag.n_args.value()} arguments: {all_args}"
    return dag.msg

# Add to DAG and test
dag.demo_function = demonstrate_arguments
dag.demo_result = dag.demo_function("hello", 42, [1, 2, 3])

print(f"Demo result: {dag.demo_result.value()}")

## Saving and Loading DAGs

One of DaggerML's powerful features is the ability to save and load entire DAGs for later use.

In [None]:
# Commit the DAG with a final result
dag.result = {
    "original_data": {
        "number": dag.number,
        "text": dag.text,
        "list": dag.my_list,
        "dict": dag.my_dict
    },
    "z_scores": dag.z_scores,
    "demo_output": dag.demo_result
}

print("DAG is ready! Final result structure:")
for key, value in dag.result.value().items():
    print(f"  {key}: {type(value).__name__}")

# Assigning dag.result above committed the DAG, which saves all our work
print("\nDAG committed successfully!")
print("This DAG can now be loaded and reused in other workflows.")

## Loading a Committed DAG

Now that we've committed our DAG, let's demonstrate how to load it back and access its data. This is one of DaggerML's powerful features - you can save your computational results and reload them later.

In [None]:
# Create a new DAG to demonstrate loading
new_dag = dml.new("01-loading-example", "Demonstrating how to load committed DAGs")

# Load the result from our previously committed DAG
# We load it by the name it was created with
old_dag = dml.load("01-my-first-dag")

print("Successfully loaded the committed DAG!")
print(f"Loaded result type: {type(old_dag.result)}")
print(f"Loaded result: {old_dag.result.value()}")

# Store the loaded result in our new DAG
new_dag.loaded_data = old_dag.result

## Accessing Specific Named Nodes from Loaded DAGs

You can also access specific named nodes from a committed DAG, rather than only its final result. This is useful when you need just certain parts of a large computational workflow. You access them the same way you would get nodes from a running DAG, because a loaded DAG is still a DAG. The only restriction is that you can't add nodes to a completed DAG.

In [None]:
# List each named node in the loaded DAG along with the type of its value
for key in sorted(old_dag.keys()):
    print(f"  {key}: {type(old_dag[key].value())}")
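
The "read freely, but no new nodes after completion" rule can be pictured with a tiny stand-alone container. `SketchDag` below is a hypothetical illustration and says nothing about how DaggerML enforces the rule or which error it raises.

```python
class SketchDag:
    """Hypothetical container: writable until committed, read-only afterwards."""

    def __init__(self):
        self._nodes = {}
        self._result = None
        self._committed = False

    def put(self, name, value):
        if self._committed:
            raise RuntimeError(f"cannot add {name!r}: DAG is already committed")
        self._nodes[name] = value

    def commit(self, result):
        self._result = result
        self._committed = True

    def get(self, name):
        return self._nodes[name]

    def keys(self):
        return list(self._nodes)


sketch = SketchDag()
sketch.put("number", 42)
sketch.put("text", "Hello")
sketch.commit({"number": 42})

print(sorted(sketch.keys()))  # ['number', 'text']
print(sketch.get("number"))   # 42: reading still works after the commit

try:
    sketch.put("late", 1)
    rejected = False
except RuntimeError as err:
    rejected = True
    print(err)
```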

## Practical Use Cases for Loading DAGs

Loading committed DAGs is particularly useful for:

1. **Resuming work**: Continue from where you left off in a previous session
2. **Sharing results**: Load computational results created by teammates
3. **Building pipelines**: Use outputs from one DAG as inputs to another
4. **Debugging**: Load intermediate results to investigate issues
5. **Reproducibility**: Ensure exact results can be recreated later
6. **Sharing code**: Share functions together with the infrastructure they run on, so others can call them without learning how to set up that infrastructure.

Let's demonstrate building on loaded data:

In [None]:
# Build new computations using loaded data
@funkify
def analyze_loaded_data(dag):
    """Analyze the loaded z-scores and create a summary report."""
    z_scores = dag.argv[1].value()

    dag.min_z = min(z_scores)
    dag.max_z = max(z_scores)
    dag.range_z = dag.max_z.value() - dag.min_z.value()

    # Count how many are above/below average
    dag.above_avg = sum(1 for z in z_scores if z > 0)
    dag.below_avg = sum(1 for z in z_scores if z < 0)

    dag.analysis_summary = {
        "min_z_score": dag.min_z,
        "max_z_score": dag.max_z,
        "z_score_range": dag.range_z,
        "above_average_count": dag.above_avg,
        "below_average_count": dag.below_avg
    }

    return dag.analysis_summary

# Add the function to our new DAG and use it with loaded data
new_dag.original_z_scores = old_dag.z_scores
new_dag.analyze_function = analyze_loaded_data
new_dag.analysis_result = new_dag.analyze_function(new_dag.original_z_scores)

print("Analysis of loaded z-scores:")
analysis = new_dag.analysis_result.value()
for key, value in analysis.items():
    if hasattr(value, 'value'):
        print(f"  {key}: {value.value()}")
    else:
        print(f"  {key}: {value}")

### Loading functions

You can load more than just data: you can load functions too! For example, below we import the `standardize` function from our first DAG and run it in this DAG without knowing anything about its implementation. This dependency is tracked under the hood.

In [None]:
# Reuse the standardize function stored in the loaded DAG
new_dag.standardize = old_dag.standardize
new_dag.standardized_z_scores = new_dag.standardize(new_dag.original_z_scores[:-2])
print(f"{new_dag.standardized_z_scores.value() = }")
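
To see what values to expect from the cell above, the same arithmetic can be reproduced in plain Python. The sketch assumes the loaded function behaves like the `standardize` defined earlier in this tutorial; the local `standardize` here is an ordinary function copied from that logic, not the DaggerML node.

```python
import math


def standardize(numbers):
    """Plain-Python copy of the z-score arithmetic used earlier (population std)."""
    cnt = len(numbers)
    mean = sum(numbers) / cnt if cnt > 0 else 0
    std = (sum((x - mean) ** 2 for x in numbers) / cnt) ** 0.5 if cnt > 0 else 0
    return [(x - mean) / std for x in numbers] if std > 0 else [0] * cnt


z_scores = standardize([1, 2, 3, 4, 5])
restandardized = standardize(z_scores[:-2])  # drop the last two, as in the cell above
direct = standardize([1, 2, 3])

# Z-scores are unchanged by shifting and positive scaling of the inputs,
# so both routes agree up to floating-point rounding.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(restandardized, direct))
print(same)
print(restandardized)
```

Because z-scores ignore shifts and positive rescaling, standardizing the first three z-scores gives the same answer as standardizing `[1, 2, 3]` directly: roughly -1.2247, 0 and 1.2247.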

In [None]:
# Commit this new analysis DAG too
new_dag.result = new_dag.analysis_result
print("Now we have two related DAGs - the original and the analysis!")

## What We've Learned

In this tutorial, you've learned:

1. ✅ How to create a DaggerML instance and DAG
2. ✅ How to add literal values (numbers, strings, lists, dicts) to a DAG
3. ✅ The difference between Nodes and values (using `.value()`)
4. ✅ How to work with collections and access nested data
5. ✅ How to inspect your DAG with `.keys()`
6. ✅ How to create functions with `@funkify`
7. ✅ How function arguments work in DAGs (`dag.argv`)
8. ✅ How to store intermediate and final results
9. ✅ **How to commit DAGs to save your computational work**
10. ✅ **How to create new DAGs and load results from committed DAGs**
11. ✅ **How to load specific named nodes from committed DAGs**
12. ✅ **How to build new computations using loaded data**

## Key Takeaways

**DAG Persistence**: DaggerML automatically saves all your computational work when you commit a DAG. This includes:
- All named nodes and their values
- Intermediate computation results
- Function outputs and internal state

**Flexible Loading**: You can load either:
- The final result of a DAG
- Any specific named node from a committed DAG

This enables powerful workflow composition and reuse.

**Computational Lineage**: Every commit creates a permanent record of your computational workflow, making your work reproducible and shareable.

## Next Steps

In the next tutorial, we'll learn about:
- Function composition and chaining
- Advanced caching strategies and how they improve performance
- Working with different data types and external data sources
- Error handling in DAGs and recovery strategies

Great job completing your first DaggerML tutorial! 🎉

You now understand the fundamental concepts of DAGs, nodes, functions, and persistence - the building blocks for powerful computational workflows.