# Advanced Features

This tutorial covers advanced features of `projio` including producer tracking, dry-run mode, gitignore integration, directory trees, and context managers.

## Producer Tracking

Track which scripts or notebooks produced which files. This is useful for reproducibility and debugging.

In [1]:
import tempfile
from pathlib import Path
from project_io import ProjectIO

tmp = tempfile.mkdtemp()
io = ProjectIO(root=tmp, use_datestamp=False)

# Track a producer relationship
output_file = io.path_for('outputs', 'results', ext='.csv')
io.track_producer(
    target=output_file,
    producer=Path('analysis.py'),
    kind='data'
)

# Track another
model_file = io.path_for('outputs', 'model', ext='.pt')
io.track_producer(
    target=model_file,
    producer=Path('train.py'),
    kind='model'
)

print(f"Tracked {len(io.producers)} producer relationships")

Tracked 2 producer relationships


### Query Producers

In [2]:
# Find all producers of a specific file
producers = io.producers_of(output_file)
print(f"Producers of results.csv: {producers}")

# Find all outputs from a specific script
outputs = io.outputs_of(Path('train.py'))
print(f"Outputs of train.py: {outputs}")

Producers of results.csv: [ProducerRecord(target=PosixPath('/private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/results.csv'), producer=PosixPath('/Users/sm1901/Projects/ProjectIO/tutorials/analysis.py'), kind='data')]
Outputs of train.py: [ProducerRecord(target=PosixPath('/private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/model.pt'), producer=PosixPath('/Users/sm1901/Projects/ProjectIO/tutorials/train.py'), kind='model')]


### ProducerRecord Structure

Each `ProducerRecord` contains:
- `target`: Path to the produced file
- `producer`: Path to the script/notebook that produced it
- `kind`: Optional type/category of output

In [3]:
# Access the raw producer records
for record in io.producers:
    print(f"Target: {record.target}")
    print(f"Producer: {record.producer}")
    print(f"Kind: {record.kind}")
    print()

Target: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/results.csv
Producer: /Users/sm1901/Projects/ProjectIO/tutorials/analysis.py
Kind: data

Target: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/model.pt
Producer: /Users/sm1901/Projects/ProjectIO/tutorials/train.py
Kind: model
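
The record structure and the two queries above can be illustrated with a small, self-contained sketch. The `Record` and `Registry` names below are hypothetical stand-ins written for this tutorial; they are not `projio`'s implementation, and unlike the output shown above they do not resolve paths to absolute form.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for ProducerRecord: one target/producer link."""
    target: Path
    producer: Path
    kind: Optional[str] = None


class Registry:
    """Hypothetical in-memory registry (an assumption, not the library's code)."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def track(self, target: Path, producer: Path, kind: Optional[str] = None) -> Record:
        record = Record(Path(target), Path(producer), kind)
        # Skip exact duplicates so repeated calls do not grow the list.
        if record not in self.records:
            self.records.append(record)
        return record

    def producers_of(self, target: Path) -> List[Record]:
        # All records whose target matches the given file.
        return [r for r in self.records if r.target == Path(target)]

    def outputs_of(self, producer: Path) -> List[Record]:
        # All records created by the given script or notebook.
        return [r for r in self.records if r.producer == Path(producer)]


registry = Registry()
registry.track(Path("results.csv"), Path("analysis.py"), kind="data")
registry.track(Path("model.pt"), Path("train.py"), kind="model")
print([r.target.name for r in registry.outputs_of(Path("train.py"))])
```

Both queries are linear scans over the list, which is likely adequate for the handful of outputs a typical script produces.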



## Dry-Run Mode

Preview which paths would be generated without touching the filesystem: in dry-run mode, no directories are created.

In [4]:
# Create a dry-run instance
io_dry = ProjectIO(root=tmp, dry_run=True)

# Get paths - directories won't be created
path = io_dry.path_for('outputs', 'test', ext='.txt')
print(f"Path: {path}")
print(f"Parent exists: {path.parent.exists()}")

# Checkpoint path
ckpt = io_dry.checkpoint_path('model', run='exp1')
print(f"Checkpoint: {ckpt}")
print(f"Parent exists: {ckpt.parent.exists()}")

Path: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/2025_12_23/test.txt
Parent exists: False
Checkpoint: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/lightning/checkpoints/2025_12_23/exp1/model.ckpt
Parent exists: False
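
The idea behind dry-run mode can be sketched as a guard around the one side effect (directory creation), while the path computation stays identical. The `planned_path` helper and its `outputs` subdirectory are assumptions made for illustration, not part of `projio`.

```python
import tempfile
from pathlib import Path


def planned_path(root: Path, name: str, ext: str, dry_run: bool) -> Path:
    """Hypothetical helper: build a path, creating its parent only outside dry-run."""
    path = Path(root) / "outputs" / f"{name}{ext}"
    if not dry_run:
        # The only filesystem side effect; skipped entirely in dry-run mode.
        path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp_root:
    preview = planned_path(Path(tmp_root), "test", ".txt", dry_run=True)
    preview_parent_exists = preview.parent.exists()

    real = planned_path(Path(tmp_root), "test", ".txt", dry_run=False)
    real_parent_exists = real.parent.exists()

print(f"Same path either way: {preview == real}")
print(f"Parent exists after dry run: {preview_parent_exists}")
print(f"Parent exists after real run: {real_parent_exists}")
```

Keeping the path logic independent of the flag means a dry run previews exactly the path a real run would use.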


### Dry-Run with Gitignore

In dry-run mode, gitignore operations are also skipped:

In [5]:
# Create a new temp dir to test dry-run gitignore
import tempfile
dry_tmp = tempfile.mkdtemp()
io_dry = ProjectIO(root=dry_tmp, dry_run=True, gitignore=Path(dry_tmp) / '.gitignore')

# This won't write to .gitignore in dry-run mode
io_dry.append_gitignore(['outputs/'])

gitignore_path = io_dry.root / '.gitignore'
print(f".gitignore exists: {gitignore_path.exists()}")

.gitignore exists: False


## Gitignore Integration

Automatically manage `.gitignore` entries for generated directories.

In [6]:
from pathlib import Path

# Create IO with gitignore in the temp directory
io = ProjectIO(root=tmp, use_datestamp=False, gitignore=Path(tmp) / '.gitignore')

# Add entries to .gitignore
io.append_gitignore(['outputs/', 'cache/', '*.log'])

# Check the file
gitignore = io.root / '.gitignore'
if gitignore.exists():
    print(gitignore.read_text())

outputs/
cache/
*.log



### Ensure Common Kinds are Ignored

In [7]:
# ensure_gitignored converts kinds to paths relative to cwp (the current working path).
# For cleaner output, this cell calls append_gitignore with relative paths directly.
io.append_gitignore(['models/', 'experiments/'])

print("Updated .gitignore:")
print((io.root / '.gitignore').read_text())

Updated .gitignore:
outputs/
cache/
*.log
models/
experiments/



### Idempotent Operations

Gitignore operations are idempotent; calling them multiple times does not add duplicate entries:

In [8]:
# Add the same entry multiple times
io.append_gitignore('outputs/')
io.append_gitignore('outputs/')
io.append_gitignore('outputs/')

# Still only one entry
content = (io.root / '.gitignore').read_text()
count = content.count('outputs/')
print(f"Number of 'outputs/' entries: {count}")

Number of 'outputs/' entries: 1
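
One way to get this idempotent behavior is to compare new entries against the existing file line by line before writing. The `append_unique` function below is a hypothetical sketch under that assumption, not the library's implementation.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Union


def append_unique(gitignore: Path, entries: Union[str, Iterable[str]]) -> int:
    """Append entries not already present as whole lines; return how many were added."""
    if isinstance(entries, str):
        entries = [entries]
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    seen = {line.strip() for line in existing}
    new = []
    for entry in entries:
        entry = entry.strip()
        # Skip blanks and anything already in the file or earlier in this call.
        if entry and entry not in seen:
            seen.add(entry)
            new.append(entry)
    if new:
        gitignore.write_text("\n".join(existing + new) + "\n")
    return len(new)


with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = Path(demo_dir) / ".gitignore"
    first = append_unique(demo_file, ["outputs/", "old_outputs/"])
    second = append_unique(demo_file, "outputs/")
    demo_lines = demo_file.read_text().splitlines()

print(f"Added on first call: {first}, on second call: {second}")
print(demo_lines)
```

Matching whole lines matters when checking for duplicates: a substring count such as `content.count('outputs/')` would also match an entry like `old_outputs/`, whereas `demo_lines.count('outputs/')` counts only exact entries.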


## Directory Tree Rendering

Visualize directory structures as text trees drawn with box-drawing characters.

In [9]:
# Create some structure first
io = ProjectIO(root=tmp, use_datestamp=False, auto_create=True)

# Access paths to create directories
_ = io.outputs
_ = io.cache
_ = io.logs
_ = io.checkpoints
_ = io.tensorboard

# Create some subdirectories
_ = io.path_for('outputs', 'exp1', subdir='run_1', ext='.txt')
_ = io.path_for('outputs', 'exp1', subdir='run_2', ext='.txt')

# Render the tree
tree = io.tree(io.root)
print(tree)

tmpgpvq4q1a
├── cache
├── lightning
│   ├── checkpoints
│   └── tensorboard
├── logs
├── run_1
└── run_2


### Control Tree Depth

In [10]:
# Limit depth
tree = io.tree(io.root, max_depth=1)
print("Max depth 1:")
print(tree)

Max depth 1:
tmpgpvq4q1a
├── cache
├── lightning
├── logs
├── run_1
└── run_2


### Include Files in Tree

In [11]:
# Create a test file
test_file = io.outputs / 'test.txt'
test_file.write_text('hello')

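# Include files in tree (parameter is 'files', not 'include_files').
# With use_datestamp=False, io.outputs is the project root itself in this setup
# (see the output below), so the tree is rooted at the temp directory.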
tree = io.tree(io.outputs, files=True)
print("With files:")
print(tree)

With files:
tmpgpvq4q1a
├── .gitignore
├── cache
├── lightning
│   ├── checkpoints
│   └── tensorboard
├── logs
├── run_1
├── run_2
└── test.txt
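
For readers curious how such a tree can be produced, here is a minimal recursive sketch with the same two controls, depth and file inclusion. The `render_tree` function is hypothetical; its parameter names mirror `max_depth` and `files` from the calls above, but the implementation is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def render_tree(root: Path, max_depth: Optional[int] = None, files: bool = False) -> str:
    """Render a directory as a text tree; directories only unless files=True."""
    lines: List[str] = [root.name]

    def walk(directory: Path, prefix: str, depth: int) -> None:
        # Stop descending once the requested depth is exceeded.
        if max_depth is not None and depth > max_depth:
            return
        children = sorted(p for p in directory.iterdir() if files or p.is_dir())
        for index, child in enumerate(children):
            last = index == len(children) - 1
            connector = "└── " if last else "├── "
            lines.append(f"{prefix}{connector}{child.name}")
            if child.is_dir():
                # Continue the vertical bar only while siblings remain below.
                walk(child, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 1)
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as demo_dir:
    base = Path(demo_dir) / "project"
    (base / "lightning" / "checkpoints").mkdir(parents=True)
    (base / "logs").mkdir()
    (base / "notes.txt").write_text("hello")

    full = render_tree(base)
    shallow = render_tree(base, max_depth=1)
    with_files = render_tree(base, files=True)

print(full)
print(shallow)
print(with_files)
```

The prefix passed down the recursion carries the vertical bars for every ancestor that still has siblings to draw, which is what keeps the columns aligned.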


## Context Manager for Temporary Overrides

Use `using()` to temporarily change settings:

In [12]:
io = ProjectIO(root=tmp, use_datestamp=True, dry_run=False)

print(f"Before: datestamp={io.use_datestamp}, dry_run={io.dry_run}")

with io.using(use_datestamp=False, dry_run=True):
    print(f"Inside: datestamp={io.use_datestamp}, dry_run={io.dry_run}")
    
    # Paths generated here won't have datestamps
    # and won't create directories
    path = io.path_for('outputs', 'temp', ext='.txt')
    print(f"Path: {path}")

print(f"After: datestamp={io.use_datestamp}, dry_run={io.dry_run}")

Before: datestamp=True, dry_run=False
Inside: datestamp=False, dry_run=True
Path: /private/var/folders/f7/7pcpvrhn0p9gw509gyzmh8fxrwyskv/T/tmpgpvq4q1a/temp.txt
After: datestamp=True, dry_run=False


### Nested Context Managers

In [13]:
io = ProjectIO(root=tmp, use_datestamp=True, auto_create=True)

with io.using(use_datestamp=False):
    print(f"Level 1: datestamp={io.use_datestamp}")
    
    with io.using(auto_create=False):
        print(f"Level 2: datestamp={io.use_datestamp}, auto_create={io.auto_create}")
    
    print(f"Back to level 1: auto_create={io.auto_create}")

print(f"Outside: datestamp={io.use_datestamp}, auto_create={io.auto_create}")

Level 1: datestamp=False
Level 2: datestamp=False, auto_create=False
Back to level 1: auto_create=True
Outside: datestamp=True, auto_create=True
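
The save-and-restore behavior shown above, including correct nesting, can be sketched with `contextlib.contextmanager`. The `Settings` class is a hypothetical stand-in; only the `using(**overrides)` call shape is taken from the examples above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class Settings:
    """Hypothetical settings holder with a temporary-override context manager."""

    def __init__(self, use_datestamp: bool = True, dry_run: bool = False) -> None:
        self.use_datestamp = use_datestamp
        self.dry_run = dry_run

    @contextmanager
    def using(self, **overrides: Any) -> Iterator["Settings"]:
        unknown = [name for name in overrides if not hasattr(self, name)]
        if unknown:
            raise AttributeError(f"unknown settings: {unknown}")
        # Remember only the values this block is about to change.
        saved = {name: getattr(self, name) for name in overrides}
        for name, value in overrides.items():
            setattr(self, name, value)
        try:
            yield self
        finally:
            # Restore even if the body raised.
            for name, value in saved.items():
                setattr(self, name, value)


settings = Settings()
try:
    with settings.using(dry_run=True):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(f"dry_run after a failed block: {settings.dry_run}")
```

Because each block restores only what it changed, inner blocks unwind before outer ones and nesting works without extra bookkeeping. The `try`/`finally` is the main advantage over restoring values by hand.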


## Resource Paths

Access package resources (e.g., bundled data files):

In [None]:
# Resources are relative to cwp (current working path), not root
# For this demo, we'll use must_exist=False and create=True
config_path = io.resource_path('demo_config.yaml', must_exist=False, create=True)
config_path.write_text('key: value')

print(f"Config path: {config_path}")
print(f"Exists: {config_path.exists()}")

# Create a second resource inside a subdirectory
sample_path = io.resource_path('demo_data', 'sample.json', must_exist=False, create=True)
sample_path.write_text('{}')
print(f"Sample path: {sample_path}")

### Must-Exist Resources

In [None]:
# Require that the resource exists (will raise if not found)
try:
    missing = io.resource_path('nonexistent.txt', must_exist=True)
except FileNotFoundError as e:
    print(f"Expected error: {type(e).__name__}")

## Configuration Description

Get a complete view of the current configuration:

In [None]:
io = ProjectIO(
    root=tmp,
    use_datestamp=True,
    datestamp_in='dirs',
    auto_create=True,
    dry_run=False
)

config = io.describe()

print("Configuration:")
for key, value in config.items():
    print(f"  {key}: {value}")

## PIO Singleton Pattern

For global access throughout your project:

In [None]:
from project_io import PIO, ProjectIO

# Set up the global instance once (e.g., in your main script)
PIO.default = ProjectIO(
    root=tmp,
    use_datestamp=False,
    auto_create=True
)

# Now use PIO anywhere in your code
print(f"Root: {PIO.root}")
print(f"Outputs: {PIO.outputs}")

# Access methods too
path = PIO.path_for('cache', 'preprocessed', ext='.pkl')
print(f"Cache path: {path}")
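
One way a class can forward attribute access to a replaceable default instance is a metaclass `__getattr__`. The sketch below is an assumption about how such a pattern could be built; the names `Shared`, `_DelegatingMeta`, and `Config` are hypothetical, and `PIO` may be implemented differently.

```python
from typing import Any, Optional


class _DelegatingMeta(type):
    """Metaclass that forwards unknown class attributes to `default`."""

    def __getattr__(cls, name: str) -> Any:
        # Invoked only when normal class attribute lookup fails.
        target = cls.default
        if target is None:
            raise AttributeError(f"no default instance set; cannot resolve {name!r}")
        return getattr(target, name)


class Config:
    """Hypothetical object holding project settings."""

    def __init__(self, root: str) -> None:
        self.root = root

    def path_for(self, name: str) -> str:
        return f"{self.root}/{name}"


class Shared(metaclass=_DelegatingMeta):
    """Hypothetical global access point; assign `default` once at startup."""

    default: Optional[Config] = None


Shared.default = Config("/tmp/project")
print(Shared.root)
print(Shared.path_for("cache.pkl"))
```

With this design, modules can import the access point at load time and still see whichever instance is assigned later, since the lookup happens on each attribute access.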

## Extension Handling

Extensions are handled consistently across all path methods:

In [None]:
io = ProjectIO(root=tmp, use_datestamp=False, auto_create=False)

# With dot
p1 = io.path_for('outputs', 'file', ext='.csv')
print(f"With dot: {p1.name}")

# Without dot (added automatically)
p2 = io.path_for('outputs', 'file', ext='csv')
print(f"Without dot: {p2.name}")

# No double extension
p3 = io.path_for('outputs', 'file.csv', ext='.csv')
print(f"No double: {p3.name}")
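
The three cases above (leading dot, missing dot, extension already present) amount to a small normalization rule. The `with_extension` helper below is a hypothetical sketch of that rule, not the library's code.

```python
def with_extension(name: str, ext: str) -> str:
    """Return name with ext applied exactly once; a leading dot on ext is optional."""
    if not ext:
        return name
    if not ext.startswith("."):
        ext = "." + ext
    # Avoid a double extension when the name already ends with it.
    if name.endswith(ext):
        return name
    return name + ext


print(with_extension("file", ".csv"))
print(with_extension("file", "csv"))
print(with_extension("file.csv", ".csv"))
```

All three calls return `file.csv`. The suffix comparison here is case-sensitive, so `file.CSV` with `ext='.csv'` would gain a second extension; whether `projio` treats that case differently is not covered by this tutorial.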

## Best Practices Summary

1. **Use producer tracking** for important outputs to maintain reproducibility

2. **Use dry-run mode** when developing or debugging path logic

3. **Use gitignore integration** to keep generated files out of version control

4. **Use context managers** for temporary setting changes instead of modifying and restoring manually

5. **Use PIO singleton** for consistent paths across your entire codebase

6. **Use describe()** to log configuration at the start of scripts for debugging