# Basic Usage Tutorial

This tutorial covers the fundamental features of herrkunft (imported as `provenance`) through interactive examples.

## Setup

First, let's import the necessary modules and set up paths to our test fixtures:

```python
from provenance import load_yaml, dump_yaml, extract_provenance_tree, clean_provenance
from pathlib import Path
import tempfile

# Path to test fixtures
fixtures_dir = Path("../test_fixtures")
config_file = fixtures_dir / "defaults.yaml"

print(f"Using test fixtures from: {fixtures_dir.absolute()}")
```

```text
Using test fixtures from: /Users/pgierz/work/Code/sandbox/provenance-as-library/docs/tutorials/../test_fixtures
```


## Part 1: Loading YAML with Provenance

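Let's examine the default configuration file. The excerpt below is abridged: the actual fixture also contains keys such as `database.name` and `database.timeout`, so the line numbers reported later do not match this excerpt exactly.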

::::{tab-set}

:::{tab-item} YAML Format
```yaml
application:
  name: MyApp
  version: 1.0.0
  debug: false

database:
  host: localhost
  port: 5432
  pool_size: 10

server:
  host: 0.0.0.0
  port: 8080
  workers: 4
```
:::

:::{tab-item} JSON Format
```json
{
  "application": {
    "name": "MyApp",
    "version": "1.0.0",
    "debug": false
  },
  "database": {
    "host": "localhost",
    "port": 5432,
    "pool_size": 10
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 4
  }
}
```
:::

::::

Now load it with provenance tracking. The `category` argument labels every value loaded from this file (here `defaults`):

```python
# Load YAML with provenance
config = load_yaml(str(config_file), category="defaults")

print("Loaded configuration:")
print(f"  Application: {config['application']['name']}")
print(f"  Database: {config['database']['host']}:{config['database']['port']}")
print(f"  Server: {config['server']['host']}:{config['server']['port']}")
```

```text
Loaded configuration:
  Application: MyApp
  Database: localhost:5432
  Server: 0.0.0.0:8080
```


## Part 2: Accessing Provenance Information

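Every value loaded from YAML carries its provenance. The `provenance` attribute holds the value's history, and `provenance.current` is its most recent entry: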

```python
# Get a value
db_host = config["database"]["host"]

# Access its provenance
prov = db_host.provenance.current

print("Provenance for database.host:")
print(f"  Value: {db_host}")
print(f"  Source file: {Path(prov.yaml_file).name}")
print(f"  Line number: {prov.line}")
print(f"  Column: {prov.col}")
print(f"  Category: {prov.category}")
```

```text
Provenance for database.host:
  Value: localhost
  Source file: defaults.yaml
  Line number: 8
  Column: 9
  Category: defaults
```
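
To make the idea concrete, here is a minimal sketch of how a provenance-carrying string *could* be built in plain Python. The names `Step`, `History`, and `TrackedStr` are hypothetical and are not herrkunft's implementation; the sketch only assumes what the output above shows, namely that a wrapped value behaves like its base type and exposes `provenance` (a sequence of steps) with a `current` entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One provenance record (hypothetical; fields mirror those printed above)."""
    yaml_file: Optional[str]
    line: Optional[int]
    col: Optional[int]
    category: Optional[str]


class History(list):
    """A list of Steps whose last entry is exposed as ``current``."""

    @property
    def current(self) -> Step:
        return self[-1]


class TrackedStr(str):
    """A str subclass that also carries a provenance history (hypothetical)."""

    def __new__(cls, value: str, steps: List[Step]):
        obj = super().__new__(cls, value)
        obj.provenance = History(steps)
        return obj


host = TrackedStr("localhost", [Step("defaults.yaml", 8, 9, "defaults")])
print(host == "localhost")           # True: compares like a normal string
print(host.provenance.current.line)  # 8
```

One consequence of subclassing `str` in this sketch: string methods such as `host.upper()` return a plain `str`, so derived values lose their provenance unless the wrapper re-wraps them.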


## Part 3: Modifying Values

Each time you assign a new value to a key, the value's history grows by one step:

```python
# Modify the host
print(f"Original host: {config['database']['host']}")
print(f"  History length: {len(config['database']['host'].provenance)}")

config["database"]["host"] = "staging.db.example.com"
print(f"\nAfter 1st modification: {config['database']['host']}")
print(f"  History length: {len(config['database']['host'].provenance)}")

config["database"]["host"] = "production.db.example.com"
print(f"\nAfter 2nd modification: {config['database']['host']}")
print(f"  History length: {len(config['database']['host'].provenance)}")

# View full history
print("\nComplete history:")
for i, step in enumerate(config["database"]["host"].provenance):
    source = Path(step.yaml_file).name if step.yaml_file else "runtime modification"
    print(f"  [{i}] {source} ({step.category})")
```

```text
Original host: localhost
  History length: 1

After 1st modification: staging.db.example.com
  History length: 2

After 2nd modification: production.db.example.com
  History length: 3

Complete history:
  [0] defaults.yaml (defaults)
  [1] defaults.yaml (defaults)
  [2] defaults.yaml (defaults)
```
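
In this run every history step reports `defaults.yaml` and the `defaults` category, so the `"runtime modification"` fallback in the loop is never reached; the two runtime assignments show up only as the growing history length.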

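Below is a hypothetical sketch of one way a mapping could keep a value's history across reassignments: override `__setitem__` so that a new plain value starts from the old value's history plus one new step. `HistoryDict` and `Tracked` are illustrative names, not herrkunft classes, and the `"runtime"` step this sketch appends is its own labeling choice; the output above shows that herrkunft labels its steps differently.

```python
from typing import Any, Dict, List


class Tracked(str):
    """str subclass carrying a list of provenance steps (hypothetical)."""

    def __new__(cls, value: str, history: List[Dict[str, Any]]):
        obj = super().__new__(cls, value)
        obj.provenance = list(history)
        return obj


class HistoryDict(dict):
    """dict that extends a value's history when the key is reassigned.

    Only ``d[key] = value`` goes through ``__setitem__``; in CPython,
    ``dict.update()`` and the ``dict(...)`` constructor bypass it.
    """

    def __setitem__(self, key, value):
        # Values that already carry provenance are stored unchanged
        if isinstance(value, str) and not hasattr(value, "provenance"):
            previous = getattr(self.get(key), "provenance", [])
            value = Tracked(value, previous + [{"yaml_file": None, "category": "runtime"}])
        super().__setitem__(key, value)


db = HistoryDict()
db["host"] = Tracked("localhost", [{"yaml_file": "defaults.yaml", "category": "defaults"}])
db["host"] = "staging.db.example.com"
db["host"] = "production.db.example.com"
print(len(db["host"].provenance))  # 3
```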
## Part 4: Adding New Values

Assigning a new section at runtime stores it as-is: `config` itself remains a `DictWithProvenance`, but the new nested dict and its values are plain Python objects without provenance wrappers:

```python
# Assigning a plain dict adds the section, but its contents are not wrapped
config["cache"] = {
    "enabled": True,
    "ttl": 3600,
    "backend": "redis"
}

print("New cache section added:")
print(f"  cache.enabled = {config['cache']['enabled']}")
print(f"  Type of config: {type(config)}")
print(f"  Type of cache dict: {type(config['cache'])}")
print("  Note: New values added at runtime don't automatically get provenance wrappers")
```

```text
New cache section added:
  cache.enabled = True
  Type of config: <class 'provenance.types.mappings.DictWithProvenance'>
  Type of cache dict: <class 'dict'>
  Note: New values added at runtime don't automatically get provenance wrappers
```
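
To see which parts of a configuration lack provenance, you can walk it and flag leaves without a `provenance` attribute. `find_untracked` is a hypothetical helper, not part of herrkunft; it assumes, as Part 2 shows, that wrapped values expose `.provenance` while plain Python values do not.

```python
from typing import Any, List


def find_untracked(node: Any, path: str = "") -> List[str]:
    """Return dotted paths of leaf values that have no ``provenance`` attribute."""
    if isinstance(node, dict):
        missing = []
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            missing.extend(find_untracked(value, child))
        return missing
    if isinstance(node, list):
        missing = []
        for index, item in enumerate(node):
            missing.extend(find_untracked(item, f"{path}[{index}]"))
        return missing
    # Leaf: untracked unless it carries provenance
    return [] if hasattr(node, "provenance") else [path]


runtime_section = {"cache": {"enabled": True, "ttl": 3600, "backend": "redis"}}
print(find_untracked(runtime_section))
# ['cache.enabled', 'cache.ttl', 'cache.backend']
```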


## Part 5: Saving with Provenance

Pass `include_provenance=True` to `dump_yaml()` to write the configuration with each value's provenance recorded as a YAML comment above it:

```python
# Save with provenance comments to a fresh temporary file
# (NamedTemporaryFile avoids the race condition of the deprecated tempfile.mktemp)
with tempfile.NamedTemporaryFile(suffix="_output.yaml", delete=False) as tmp:
    output_file = Path(tmp.name)
dump_yaml(config, str(output_file), include_provenance=True)

print(f"Saved to: {output_file}")
print("\nContents with provenance comments:")
print(output_file.read_text())
```

```text
Saved to: /var/folders/73/2rxq11j53s16d1p3ryf6t95mwcb3pw/T/tmp_o7v7krx_output.yaml

Contents with provenance comments:
application:
#  from: defaults.yaml | line: 3 | col: 9 | category: defaults
  name: MyApp
#  from: defaults.yaml | line: 4 | col: 12 | category: defaults
  version: 1.0.0
#  from: defaults.yaml | line: 5 | col: 10 | category: defaults
  debug: false
database:
#  from: defaults.yaml | line: 8 | col: 9 | category: defaults
  host: production.db.example.com
#  from: defaults.yaml | line: 9 | col: 9 | category: defaults
  port: 5432
#  from: defaults.yaml | line: 10 | col: 9 | category: defaults
  name: myapp_db
#  from: defaults.yaml | line: 11 | col: 14 | category: defaults
  pool_size: 10
#  from: defaults.yaml | line: 12 | col: 12 | category: defaults
  timeout: 30
server:
#  from: defaults.yaml | line: 15 | col: 9 | category: defaults
  host: 0.0.0.0
#  from: defaults.yaml | line: 16 | col: 9 | category: defaults
  port: 8080
#  from: defaults.yaml | line: 17 | col: 1
```

::::{tab-set}

:::{tab-item} YAML with Provenance
Each value is preceded by a comment recording its source (excerpt of the output above):
```yaml
application:
#  from: defaults.yaml | line: 3 | col: 9 | category: defaults
  name: MyApp
#  from: defaults.yaml | line: 4 | col: 12 | category: defaults
  version: 1.0.0
#  from: defaults.yaml | line: 5 | col: 10 | category: defaults
  debug: false
database:
#  from: defaults.yaml | line: 8 | col: 9 | category: defaults
  host: production.db.example.com
#  from: defaults.yaml | line: 9 | col: 9 | category: defaults
  port: 5432
```
:::

:::{tab-item} Clean YAML
Written with `include_provenance=False`, the same data has no comments:
```yaml
application:
  name: MyApp
  version: 1.0.0
  debug: false
database:
  host: production.db.example.com
  port: 5432
```
:::

::::
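
Because the provenance ends up in ordinary comments, a dumped file can be audited without herrkunft. The sketch below is a hypothetical reader, `read_provenance_comments`, that assumes the comment format shown above (`from: <file> | line: <n> | col: <n> | category: <name>`, with `category` sometimes absent) and that each comment sits on the line directly before the key it describes. It does not track nesting, so the same key in two sections appears twice.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches comments such as "#  from: defaults.yaml | line: 8 | col: 9 | category: defaults"
COMMENT = re.compile(
    r"#\s*from:\s*(?P<file>[^|]+?)\s*\|\s*line:\s*(?P<line>\d+)"
    r"\s*\|\s*col:\s*(?P<col>\d+)(?:\s*\|\s*category:\s*(?P<category>\S+))?"
)


def read_provenance_comments(text: str) -> List[Tuple[str, Dict[str, Optional[str]]]]:
    """Pair each provenance comment with the key on the following line."""
    entries = []
    pending = None
    for raw in text.splitlines():
        match = COMMENT.match(raw.strip())
        if match:
            pending = match.groupdict()
        elif pending is not None:
            key = raw.strip().split(":", 1)[0]
            entries.append((key, pending))
            pending = None
    return entries


sample = """database:
#  from: defaults.yaml | line: 8 | col: 9 | category: defaults
  host: production.db.example.com
#  from: defaults.yaml | line: 9 | col: 9 | category: defaults
  port: 5432
"""
for key, info in read_provenance_comments(sample):
    print(key, info["file"], info["line"], info["category"])
# host defaults.yaml 8 defaults
# port defaults.yaml 9 defaults
```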

## Part 6: Extracting Provenance Tree

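`extract_provenance_tree()` returns a nested dict that mirrors the configuration's structure, with a provenance record (fields such as `yaml_file` and `line`) in place of each value: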

```python
# Extract full provenance tree
prov_tree = extract_provenance_tree(config)

print("Provenance tree for database section:")
for key, prov_info in prov_tree["database"].items():
    yaml_file = prov_info.get("yaml_file")
    line = prov_info.get("line", "N/A")
    # Fall back to "runtime" when a record has no source file (missing or None)
    source = Path(yaml_file).name if yaml_file else "runtime"
    print(f"  {key}: {source}:{line}")
```

```text
Provenance tree for database section:
  host: defaults.yaml:8
  port: defaults.yaml:9
  name: defaults.yaml:10
  pool_size: defaults.yaml:11
  timeout: defaults.yaml:12
```

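The loop above visits only one section. A small recursive helper can flatten the whole tree into dotted paths such as `database.host`. `flatten_tree` is a hypothetical sketch that assumes, as the loop above does, that each leaf record is a dict containing keys like `yaml_file` and `line`; a configuration key literally named `line` or `yaml_file` would confuse its leaf test.

```python
from typing import Any, Dict

LEAF_KEYS = {"yaml_file", "line"}  # assumed fields of a provenance record


def flatten_tree(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested provenance tree into {"dotted.path": record}."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and not (value.keys() & LEAF_KEYS):
            flat.update(flatten_tree(value, path))  # a section: recurse into it
        else:
            flat[path] = value  # a provenance record (or any non-dict value)
    return flat


example_tree = {
    "database": {
        "host": {"yaml_file": "defaults.yaml", "line": 8},
        "port": {"yaml_file": "defaults.yaml", "line": 9},
    }
}
for path, record in flatten_tree(example_tree).items():
    print(f"{path}: {record['yaml_file']}:{record['line']}")
# database.host: defaults.yaml:8
# database.port: defaults.yaml:9
```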

## Part 7: Hierarchical Configuration

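Load a defaults file and a production file, then overlay production on top of the defaults. The tabs show simplified versions of both fixtures and the intended result: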

::::{tab-set}

:::{tab-item} Defaults YAML
```yaml
database:
  host: localhost
  pool_size: 10

server:
  workers: 4
```
:::

:::{tab-item} Production YAML
```yaml
database:
  host: prod.db.example.com
  pool_size: 50

server:
  workers: 16
```
:::

:::{tab-item} Merged Result
```yaml
database:
  host: prod.db.example.com  # from production
  pool_size: 50              # from production

server:
  workers: 16                # from production
```
:::

::::

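```python
# Load both configurations
base_config = load_yaml(str(fixtures_dir / "defaults.yaml"), category="defaults")
prod_config = load_yaml(str(fixtures_dir / "production.yaml"), category="production")

# Shallow merge (production wins): dict.copy() returns a plain dict, and update()
# replaces each top-level section from defaults with the one from production
final_config = base_config.copy()
final_config.update(prod_config)

# Check which category each value came from
print("Final configuration sources:")
for section in ["database", "server"]:
    print(f"\n{section}:")
    for key, value in final_config[section].items():
        prov = value.provenance.current
        source = Path(prov.yaml_file).name
        print(f"  {key}: {prov.category} ({source})")
```

```text
Final configuration sources:

database:
  host: production (production.yaml)
  pool_size: production (production.yaml)
  timeout: production (production.yaml)

server:
  host: production (production.yaml)
  port: production (production.yaml)
  workers: production (production.yaml)
  max_connections: production (production.yaml)
```

`final_config.update(prod_config)` merges only the top level: each section present in `production.yaml` replaces the matching section from `defaults.yaml` wholesale. That is why every `database` and `server` key above reports `production`, why keys found only in the defaults' `database` section (such as `port`) are gone, and why sections found only in the defaults (such as `application`) survive.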

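If you want the defaults to fill in keys that production leaves out, such as `database.port`, you need a recursive (deep) merge instead of `update()`. `deep_merge` below is a hypothetical plain-Python sketch, not a herrkunft function. It copies leaf values by reference, so provenance-wrapped values should keep their provenance, as the `update()` output above suggests; the containers it builds are plain `dict`s.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where nested sections merge key by key and override wins."""
    merged = dict(base)  # shallow copy so `base` is not modified
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # merge sections recursively
        else:
            merged[key] = value  # leaf (or section only in override): override wins
    return merged


defaults = {"database": {"host": "localhost", "port": 5432, "pool_size": 10}}
production = {"database": {"host": "prod.db.example.com", "pool_size": 50}}
print(deep_merge(defaults, production))
# {'database': {'host': 'prod.db.example.com', 'port': 5432, 'pool_size': 50}}
```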

## Part 8: Cleaning Provenance

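`clean_provenance()` strips the wrappers and returns plain Python types, which is what external code and libraries expecting built-in `dict` and `str` values need: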

```python
# Clean provenance
clean_config = clean_provenance(final_config)

print("Type comparison:")
print(f"  Original: {type(final_config)}")
print(f"  Cleaned: {type(clean_config)}")
print(f"  Original host: {type(final_config['database']['host'])}")
print(f"  Cleaned host: {type(clean_config['database']['host'])}")

# Save clean version to a fresh temporary file
with tempfile.NamedTemporaryFile(suffix="_clean.yaml", delete=False) as tmp:
    clean_file = Path(tmp.name)
dump_yaml(clean_config, str(clean_file), include_provenance=False, clean=True)

print("\nClean YAML (no provenance):")
print(clean_file.read_text())
```

```text
Type comparison:
  Original: <class 'dict'>
  Cleaned: <class 'dict'>
  Original host: <class 'provenance.types.wrappers.StrWithProvenance'>
  Cleaned host: <class 'str'>

Clean YAML (no provenance):
application:
  name: MyApp
  version: 1.0.0
  debug: false
database:
  host: prod.db.example.com
  pool_size: 50
  timeout: 60
server:
  host: prod.example.com
  port: 443
  workers: 16
  max_connections: 5000
logging:
  output: file
  file_path: /var/log/myapp/app.log
```

`final_config` shows up as a plain `dict` because `base_config.copy()` in Part 7 returned a plain `dict`; the values inside it still carry provenance, as the `StrWithProvenance` type of `host` shows.

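Conceptually, cleaning means turning every wrapped value back into its built-in base type. `strip_wrappers` below is a hypothetical stand-in for illustration, not how `clean_provenance()` is implemented; it assumes that wrappers subclass the built-in `str`, `int`, or `float` types (the output above only confirms that `StrWithProvenance` becomes `str`).

```python
from typing import Any


def strip_wrappers(node: Any) -> Any:
    """Recursively rebuild a config from built-in types only."""
    if isinstance(node, dict):
        return {strip_wrappers(k): strip_wrappers(v) for k, v in node.items()}
    if isinstance(node, list):
        return [strip_wrappers(item) for item in node]
    if isinstance(node, bool):  # check bool before int: bool is an int subclass
        return bool(node)
    for base in (str, int, float):
        if isinstance(node, base):
            return base(node)  # str(), int(), float() return the exact built-in type
    return node


class FakeWrappedStr(str):
    """Stand-in for a provenance-wrapped string (hypothetical)."""


wrapped_config = {"database": {"host": FakeWrappedStr("prod.db.example.com"), "port": 5432}}
plain_config = strip_wrappers(wrapped_config)
print(type(wrapped_config["database"]["host"]).__name__, "->",
      type(plain_config["database"]["host"]).__name__)
# FakeWrappedStr -> str
```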


## Cleanup

```python
# Clean up temporary files
for file in [output_file, clean_file]:
    if file.exists():
        file.unlink()
print("Cleaned up temporary files")
```

```text
Cleaned up temporary files
```

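Deleting temporary files by hand works, but `tempfile.TemporaryDirectory` can do the cleanup for you. A minimal sketch, with `write_text()` standing in for the `dump_yaml(...)` call; the helper name `roundtrip_in_tempdir` is hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Tuple


def roundtrip_in_tempdir(text: str) -> Tuple[str, Path]:
    """Write text to a file in a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_file = Path(tmp_dir) / "output.yaml"
        out_file.write_text(text)  # in the tutorial: dump_yaml(config, str(out_file), ...)
        contents = out_file.read_text()
    # The directory and everything in it are deleted when the block exits
    return contents, out_file


contents, path = roundtrip_in_tempdir("database:\n  port: 5432\n")
print(path.exists())  # False
```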

## Summary

In this tutorial, you learned:

1. How to load YAML files with provenance tracking
2. How to access provenance information for any value
3. How modification history is tracked automatically
4. How values added at runtime are stored (as plain Python objects, without provenance wrappers)
5. How to save configurations with provenance comments
6. How to extract complete provenance trees
7. How to layer one configuration over another with a top-level merge
8. How to clean provenance for external use

## Next Steps

- Try the [Scientific Workflows Tutorial](scientific-workflows.md)
- Explore [Multi-Environment Configuration](multi-environment.md)
- Read the [User Guide](../user-guide/loading-yaml.md)