# Quick Start

Get up and running with herrkunft in 5 minutes!

## Basic Workflow

The typical herrkunft workflow involves three steps:

1. **Load** YAML configuration files with provenance tracking
2. **Access** values normally while preserving provenance information
3. **Save** configurations with provenance metadata as comments

## Your First herrkunft Program

Let's create a simple example with real, executable code.

### Step 1: Load a Configuration File

We'll use a sample configuration file from the test fixtures:

::::{tab-set}

:::{tab-item} Input YAML
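Simplified content of `defaults.yaml` (the actual test fixture contains more keys, including an `application` section, so the line numbers in the outputs below do not match this listing):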
```yaml
database:
  host: localhost
  port: 5432
  name: myapp_db

server:
  host: 0.0.0.0
  port: 8080
  workers: 4
```
:::

:::{tab-item} JSON Equivalent
```json
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "name": "myapp_db"
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 4
  }
}
```
:::

::::
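The cells below read `defaults.yaml` from `../test_fixtures`, a path that exists only when the notebook runs inside the repository's documentation folder. If you are following along elsewhere, a minimal sketch like the following writes the simplified listing above to a temporary directory. The `write_sample_config` helper is hypothetical, not part of herrkunft. If you use it, skip the `fixtures_dir = Path("../test_fixtures")` line in the next cell. Line and column numbers reported for this file will differ from the outputs shown on this page, and later examples also read `production.yaml`, which this sketch does not create.

```python
import tempfile
from pathlib import Path

# Simplified defaults.yaml, matching the listing above
DEFAULTS_YAML = """\
database:
  host: localhost
  port: 5432
  name: myapp_db

server:
  host: 0.0.0.0
  port: 8080
  workers: 4
"""


def write_sample_config(directory: Path) -> Path:
    """Hypothetical helper: write the sample defaults.yaml into `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "defaults.yaml"
    path.write_text(DEFAULTS_YAML)
    return path


# Point fixtures_dir at a fresh temporary directory instead of ../test_fixtures
fixtures_dir = Path(tempfile.mkdtemp())
config_file = write_sample_config(fixtures_dir)
print(config_file.read_text())
```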

### Step 2: Load and Use the Configuration

```python
from provenance import load_yaml, dump_yaml
from pathlib import Path

# Path to the test fixtures, relative to this notebook (docs/getting-started/)
fixtures_dir = Path("../test_fixtures")
config_file = fixtures_dir / "defaults.yaml"

# Load YAML with provenance tracking
config = load_yaml(str(config_file), category="defaults")

# Access values normally
print(f"Database host: {config['database']['host']}")
print(f"Server port: {config['server']['port']}")
```

```text
Database host: localhost
Server port: 8080
```


### Step 3: Access Provenance Information

```python
# Access provenance information
db_host = config['database']['host']
provenance_info = db_host.provenance.current

print("\nProvenance Information:")
print(f"Value came from: {Path(provenance_info.yaml_file).name}")
print(f"Line: {provenance_info.line}, Column: {provenance_info.col}")
print(f"Category: {provenance_info.category}")
```

```text
Provenance Information:
Value came from: defaults.yaml
Line: 8, Column: 9
Category: defaults
```
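Each provenance step exposes the fields printed above; the JSON produced by `extract_provenance_tree` later on this page shows the same fields plus `from_choose`. As a mental model only, such a record can be pictured as a small dataclass. The `ProvenanceStep` class and its `location` method below are hypothetical and are not herrkunft's actual types.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ProvenanceStep:
    """Hypothetical record with the fields shown on this page (not herrkunft's class)."""

    category: Optional[str]
    yaml_file: Optional[str]
    line: Optional[int]
    col: Optional[int]
    from_choose: list = field(default_factory=list)

    def location(self) -> str:
        """Return a compact 'file:line:col' string, using 'runtime' if no file is set."""
        name = Path(self.yaml_file).name if self.yaml_file else "runtime"
        return f"{name}:{self.line}:{self.col}"


step = ProvenanceStep(category="defaults", yaml_file="/tmp/defaults.yaml", line=8, col=9)
print(step.location())  # defaults.yaml:8:9
```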


### Step 4: Modify and Save

Let's modify some values and save the configuration with provenance comments:

```python
import tempfile

# Modify configuration
config['database']['host'] = 'production.db.example.com'
config['database']['pool_size'] = 20

# Create the output file securely (tempfile.mktemp is deprecated and race-prone)
with tempfile.NamedTemporaryFile(suffix=".yaml", delete=False) as tmp:
    output_file = Path(tmp.name)

# Save with provenance as comments
dump_yaml(config, str(output_file), include_provenance=True)

print(f"Saved to: {output_file}")
print("\nContents with provenance comments:")
print(output_file.read_text())
```

```text
Saved to: /var/folders/73/2rxq11j53s16d1p3ryf6t95mwcb3pw/T/tmpdgcz_1m7.yaml

Contents with provenance comments:
application:
#  from: defaults.yaml | line: 3 | col: 9 | category: defaults
  name: MyApp
#  from: defaults.yaml | line: 4 | col: 12 | category: defaults
  version: 1.0.0
#  from: defaults.yaml | line: 5 | col: 10 | category: defaults
  debug: false
database:
#  from: defaults.yaml | line: 8 | col: 9 | category: defaults
  host: production.db.example.com
#  from: defaults.yaml | line: 9 | col: 9 | category: defaults
  port: 5432
#  from: defaults.yaml | line: 10 | col: 9 | category: defaults
  name: myapp_db
#  from: defaults.yaml | line: 11 | col: 14 | category: defaults
  pool_size: 20
#  from: defaults.yaml | line: 12 | col: 12 | category: defaults
  timeout: 30
server:
#  from: defaults.yaml | line: 15 | col: 9 | category: defaults
  host: 0.0.0.0
#  from: defaults.yaml | line: 16 | col: 9 | category: defaults
  port: 8080
#  from: defaults.yaml | line: 17 | col: 12 | cat
```

::::{tab-set}

:::{tab-item} Output YAML with Provenance
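A schematic view of the output for the simplified `defaults.yaml` above; the exact comment layout, line and column values, and annotations for runtime changes may differ from the real cell output shown earlier: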
```yaml
database:
  # modified at runtime
  host: production.db.example.com
  # from: defaults.yaml | line: 8 | col: 8 | category: defaults
  port: 5432
  # from: defaults.yaml | line: 9 | col: 8 | category: defaults
  name: myapp_db
  # added at runtime
  pool_size: 20
server:
  # from: defaults.yaml | line: 13 | col: 8 | category: defaults
  host: 0.0.0.0
  # from: defaults.yaml | line: 14 | col: 8 | category: defaults
  port: 8080
  # from: defaults.yaml | line: 15 | col: 11 | category: defaults
  workers: 4
```
:::

:::{tab-item} Clean YAML
Without provenance comments:
```yaml
database:
  host: production.db.example.com
  port: 5432
  name: myapp_db
  pool_size: 20
server:
  host: 0.0.0.0
  port: 8080
  workers: 4
```
:::

:::{tab-item} JSON Format
```json
{
  "database": {
    "host": "production.db.example.com",
    "port": 5432,
    "name": "myapp_db",
    "pool_size": 20
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 4
  }
}
```
:::

::::
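To make the comment format concrete, here is a minimal, stdlib-only sketch that renders a two-level mapping of `(value, provenance)` pairs in the style of the cell output above (`#  from: <file> | line: <n> | col: <n> | category: <name>`). The `render_with_comments` function and its input shape are assumptions for illustration; `dump_yaml` does the real work and also handles YAML quoting, deeper nesting, and lists, which this sketch does not.

```python
def render_with_comments(config: dict) -> str:
    """Render {section: {key: (value, prov_dict)}} as YAML-like text with provenance comments.

    Hypothetical helper: values are written with str(), without YAML quoting or formatting.
    """
    lines = []
    for section, entries in config.items():
        lines.append(f"{section}:")
        for key, (value, prov) in entries.items():
            # Comment line in the same layout as the dump_yaml output above
            lines.append(
                f"#  from: {prov['file']} | line: {prov['line']} "
                f"| col: {prov['col']} | category: {prov['category']}"
            )
            lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"


example = {
    "server": {
        "port": (8080, {"file": "defaults.yaml", "line": 16, "col": 9, "category": "defaults"}),
    }
}
print(render_with_comments(example))
```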

## Core Concepts

### 1. Provenance Tracking

Every value loaded from YAML automatically tracks its origin:

```python
value = config['database']['port']

# The value behaves like an int
print(f"Value: {value}")
print(f"Type: {type(value)}")
print(f"Math works: {value} + 10 = {value + 10}")

# But also has provenance information
prov = value.provenance
print(f"\nProvenance: {Path(prov.current.yaml_file).name}:{prov.current.line}")
```

```text
Value: 5432
Type: <class 'provenance.types.wrappers.IntWithProvenance'>
Math works: 5432 + 10 = 5442

Provenance: defaults.yaml:9
```
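The output shows that the loaded value is an `IntWithProvenance` that still supports ordinary arithmetic. One common way to build such a wrapper in Python is to subclass `int` and attach an extra attribute. The `TrackedInt` class below is a hypothetical sketch, not herrkunft's implementation. In this sketch, arithmetic returns a plain `int`, so the result of `port + 10` no longer carries provenance; whether herrkunft's wrappers behave the same way is not covered here.

```python
class TrackedInt(int):
    """Hypothetical int subclass that carries a provenance record."""

    def __new__(cls, value, provenance=None):
        # int is immutable, so the value must be set in __new__, not __init__
        obj = super().__new__(cls, value)
        obj.provenance = provenance
        return obj


port = TrackedInt(5432, provenance={"yaml_file": "defaults.yaml", "line": 9})
print(isinstance(port, int))     # True
print(port + 10)                 # 5442
print(type(port + 10).__name__)  # int: int arithmetic drops the wrapper
print(port.provenance["line"])   # 9
```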


### 2. Categories and Hierarchy

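Categories label each configuration layer (here `defaults` and `production`), so after a merge you can tell which layer supplied each value. Load order decides which layer wins:

```python
from provenance import ProvenanceLoader

loader = ProvenanceLoader()

# Load multiple configs, each tagged with its own category
defaults = loader.load(str(fixtures_dir / "defaults.yaml"), category="defaults")
production = loader.load(str(fixtures_dir / "production.yaml"), category="production")

# Merge: values loaded later override earlier ones.
# Note: dict.update is shallow, so production's whole 'database' section
# replaces the one from defaults (keys only in defaults are dropped).
final_config = defaults.copy()
final_config.update(production)

# Each surviving value remembers which file it came from
print("Final configuration sources:")
for key, value in final_config['database'].items():
    prov = value.provenance.current
    print(f"  {key}: from {prov.category} ({Path(prov.yaml_file).name})")
```

```text
Final configuration sources:
  host: from production (production.yaml)
  pool_size: from production (production.yaml)
  timeout: from production (production.yaml)
```

Because `dict.update` is shallow, `port` is missing from this output even though `defaults.yaml` defines it: the entire `database` section came from `production.yaml`.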

::::{tab-set}

:::{tab-item} Defaults YAML
```yaml
database:
  host: localhost
  port: 5432
  pool_size: 10
```
:::

:::{tab-item} Production YAML
```yaml
database:
  host: prod.db.example.com
  pool_size: 50
```
:::

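:::{tab-item} Merged Result
Result of a key-by-key (deep) merge, in which keys absent from `production.yaml` keep their `defaults.yaml` values:
```yaml
database:
  host: prod.db.example.com   # from production
  port: 5432                  # from defaults
  pool_size: 50               # from production (overrides)
```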
:::

::::
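A key-by-key merge like the one in the *Merged Result* tab can be written as a short recursive function. This is a minimal sketch on plain dictionaries: `deep_merge` is a hypothetical helper, not part of herrkunft, and whether herrkunft's mapping types keep each value's provenance when merged this way is an assumption to verify against the library.

```python
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a new dict in which `override` wins, recursing into nested mappings.

    The copy is shallow at each level: nested sections that `override` does not
    touch are shared with `base`, not duplicated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), Mapping) and isinstance(value, Mapping):
            # Both sides are mappings: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override replaces the base value outright
            merged[key] = value
    return merged


defaults_layer = {"database": {"host": "localhost", "port": 5432, "pool_size": 10}}
production_layer = {"database": {"host": "prod.db.example.com", "pool_size": 50}}
print(deep_merge(defaults_layer, production_layer))
# {'database': {'host': 'prod.db.example.com', 'port': 5432, 'pool_size': 50}}
```

Unlike `dict.update`, the inner `database` mappings are merged rather than replaced, so `port` survives from the defaults layer.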

### 3. Modification History

Each assignment adds a step to the value's provenance history:

```python
# Start fresh with defaults
config = load_yaml(str(fixtures_dir / "defaults.yaml"), category="defaults")

# Make multiple modifications
config['database']['host'] = 'staging.db.example.com'
config['database']['host'] = 'production.db.example.com'

# Access full history
host_value = config['database']['host']
print(f"Current value: {host_value}")
print(f"History has {len(host_value.provenance)} steps:\n")

for i, step in enumerate(host_value.provenance):
    source = Path(step.yaml_file).name if step.yaml_file else "runtime"
    print(f"  Step {i}: {source} ({step.category})")
```

```text
Current value: production.db.example.com
History has 3 steps:

  Step 0: defaults.yaml (defaults)
  Step 1: defaults.yaml (defaults)
  Step 2: defaults.yaml (defaults)
```

The two runtime assignments appear as extra steps, but in this run each recorded step still reports the original file and category, so the `"runtime"` fallback in the loop is never used.

## Common Use Cases

### Scientific Computing

Store the provenance of simulation parameters next to the results, so a run can be traced back to the configuration values it used:

```python
from provenance import extract_provenance_tree
import json

# Load simulation config
sim_config = load_yaml(str(fixtures_dir / "defaults.yaml"), category="experiment")

# Placeholder standing in for a real simulation
def run_simulation(config):
    return {"status": "completed", "results": [1, 2, 3]}

results = run_simulation(sim_config)

# Save provenance with results for reproducibility
metadata = {
    "results": results,
    "provenance": extract_provenance_tree(sim_config)
}

print("Saved metadata with provenance:")
print(json.dumps(metadata, indent=2, default=str)[:500] + "...")
```

```text
Saved metadata with provenance:
{
  "results": {
    "status": "completed",
    "results": [
      1,
      2,
      3
    ]
  },
  "provenance": {
    "application": {
      "name": {
        "category": "experiment",
        "yaml_file": "/Users/pgierz/work/Code/sandbox/provenance-as-library/docs/getting-started/../test_fixtures/defaults.yaml",
        "line": 3,
        "col": 9,
        "from_choose": []
      },
      "version": {
        "category": "experiment",
        "yaml_file": "/Users/pgierz/work/Code/sandbox/prov...
```

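The JSON produced by `extract_provenance_tree` nests one record per leaf value, with keys such as `yaml_file`, `line`, `col`, and `category`. For a quick audit of a saved run, a small hypothetical helper can flatten that tree into dotted keys. `flatten_provenance` is not part of herrkunft; it assumes the shape seen in the output above (a leaf is a dict containing `yaml_file`), and the sample tree is hand-written for illustration.

```python
import json
from pathlib import Path


def flatten_provenance(tree, prefix=""):
    """Yield (dotted_key, 'file:line:col [category]') for each leaf record.

    Assumes leaves are dicts with a 'yaml_file' key, as in the output above;
    nodes that are neither leaves nor dicts are skipped.
    """
    for key, node in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(node, dict) and "yaml_file" in node:
            name = Path(node["yaml_file"]).name if node["yaml_file"] else "runtime"
            yield path, f"{name}:{node['line']}:{node['col']} [{node['category']}]"
        elif isinstance(node, dict):
            yield from flatten_provenance(node, path)


# Hand-written sample in the same shape as extract_provenance_tree's JSON output
sample = json.loads("""
{"application": {"name": {"category": "experiment",
                          "yaml_file": "test_fixtures/defaults.yaml",
                          "line": 3, "col": 9, "from_choose": []}}}
""")
for key, where in flatten_provenance(sample):
    print(f"{key}: {where}")
```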

### Multi-Environment Configuration

Manage dev/staging/production configs:

```python
import os

loader = ProvenanceLoader()

# Always load defaults first
config = loader.load(str(fixtures_dir / "defaults.yaml"), category="defaults")

# Override with environment-specific settings
env = "production"  # In real code: os.getenv("ENVIRONMENT", "development")
env_config = loader.load(str(fixtures_dir / f"{env}.yaml"), category="environment", subcategory=env)
# dict.update is shallow: each top-level section in env_config replaces the whole section from defaults
config.update(env_config)

# Each value knows its source
print(f"Database host from: {config['database']['host'].provenance.current.category}")
print(f"Logging level from: {config['logging']['level'].provenance.current.category}")
```

```text
Database host from: environment
Logging level from: environment
```
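The cell above hard-codes `env = "production"`, and its comment hints at reading the environment name from `ENVIRONMENT` instead. A minimal sketch of that selection step, returning `None` when no file exists for the chosen environment, might look like this. The `environment_config_path` helper and its fallback policy are assumptions for illustration, and the file written in the demo holds placeholder content.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def environment_config_path(
    config_dir: Union[str, Path], env: Optional[str] = None, default: str = "development"
) -> Optional[Path]:
    """Hypothetical helper: return <config_dir>/<env>.yaml if it exists, else None."""
    # Resolve the environment name: explicit argument, then $ENVIRONMENT, then the default
    if env is None:
        env = os.getenv("ENVIRONMENT", default)
    candidate = Path(config_dir) / f"{env}.yaml"
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder environment file for the demo
    Path(tmp, "production.yaml").write_text("logging:\n  level: WARNING\n")
    found = environment_config_path(tmp, env="production")
    missing = environment_config_path(tmp, env="staging")
    print(found.name, missing)  # production.yaml None
```

You would then pass the returned path to `loader.load(..., category="environment", subcategory=path.stem)` only when it is not `None`, so a missing environment file falls back to the defaults alone.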


## Next Steps

Now that you understand the basics:

1. Explore [Complete Examples](examples.md)
2. Work through [Tutorials](../tutorials/basic-usage.ipynb)
3. Read the [User Guide](../user-guide/loading-yaml.md)
4. Check the [API Reference](../api/core.md)

## Quick Reference

| Task | Code |
|------|------|
| Load YAML | `config = load_yaml("file.yaml", category="defaults")` |
| Get value | `value = config["key"]["subkey"]` |
| Get provenance | `prov = value.provenance.current` |
| Modify value | `config["key"] = new_value` |
| Save YAML | `dump_yaml(config, "out.yaml", include_provenance=True)` |
| Clean provenance | `clean_config = clean_provenance(config)` |
| Extract tree | `tree = extract_provenance_tree(config)` |

```python
# Cleanup temporary file
if output_file.exists():
    output_file.unlink()
```