# Walkthrough 6: CLI Tools and Story Management

This walkthrough demonstrates how to use the **Odibi CLI** to manage your data pipelines, including:

1. **Validating** configuration files
2. **Diagnosing** configuration issues (Doctor)
3. **Simulating** execution (Dry Run)
4. **Running** pipelines from the command line
5. **Generating** documentation stories
6. **Comparing** pipeline runs (Story Diff)
7. **Listing** generated stories

Odibi provides a rich set of command-line tools to help you integrate pipelines into your CI/CD workflows or manage them locally.
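In a CI/CD job, these commands are usually called from a script that fails the build as soon as one step fails. The sketch below shows that pattern with the standard library's `subprocess.run`. It assumes the Odibi CLI signals failure through a non-zero exit code, which this walkthrough does not demonstrate. `run_cli_step` and `run_ci_checks` are hypothetical helper names, not part of Odibi.

```python
import subprocess
import sys


def run_cli_step(args: list[str]) -> subprocess.CompletedProcess:
    """Run one CLI command, print a one-line status, and return the result.

    Assumption: the tool reports failure via a non-zero exit code.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    status = "OK" if result.returncode == 0 else f"FAILED (exit {result.returncode})"
    print(f"{' '.join(args)} -> {status}")
    return result


def run_ci_checks(steps: list[list[str]]) -> bool:
    """Run the steps in order and stop at the first failing one."""
    for args in steps:
        if run_cli_step(args).returncode != 0:
            return False
    return True


# Example usage, with the same invocation style as the cells below:
# cli = [sys.executable, "-m", "odibi.cli.main"]
# ok = run_ci_checks([
#     cli + ["validate", "cli_demo.yaml"],
#     cli + ["run", "cli_demo.yaml", "--dry-run"],
# ])
```

Stopping at the first failure keeps the CI log short. For example, a config that fails validation never reaches the dry run.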

In [12]:
# Install the package in editable mode (only if it is not already installed) so the CLI commands work
import sys
import subprocess

# Check if odibi is installed
try:
    import odibi

    print("✅ Odibi is already installed")
except ImportError:
    print("⏳ Installing Odibi in editable mode...")
    # We assume the notebook is running in 'walkthroughs/' so '..' is the root
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", ".."])
    print("✅ Odibi installed successfully")

✅ Odibi is already installed


In [13]:
import os
import sys

# Make the repository root importable. Like the install cell above, this assumes
# the notebook runs from 'walkthroughs/', so '..' is the repo root.
sys.path.insert(0, os.path.abspath(".."))

# Verify it worked
import odibi

print(f"✅ ODIBI loaded from: {odibi.__file__}")

✅ ODIBI loaded from: C:\Users\hodibi\OneDrive - Ingredion\Desktop\Repos\Odibi\odibi\__init__.py


## 1. Setup: Create a Sample Project

First, let's create a simple pipeline configuration and some sample data to work with.

In [14]:
import os
import yaml
import pandas as pd

# Create data directory
os.makedirs("data_cli", exist_ok=True)

# Create sample data
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "score": [85, 92, 78, 95, 88],
        "dept": ["Sales", "IT", "Sales", "IT", "HR"],
    }
)
df.to_csv("data_cli/employees.csv", index=False)
print("Created data_cli/employees.csv")

# Create pipeline configuration
config = {
    "project": "CLI Demo Project",
    "version": "1.0.0",
    "engine": "pandas",
    "connections": {"local": {"type": "local", "base_path": "./data_cli"}},
    "pipelines": [
        {
            "pipeline": "employee_stats",
            "description": "Calculate statistics by department",
            "nodes": [
                {
                    "name": "load_data",
                    "description": "Load employee data",
                    "read": {"connection": "local", "path": "employees.csv", "format": "csv"},
                },
                {
                    "name": "avg_score_by_dept",
                    "description": "Calculate average score per department",
                    "depends_on": ["load_data"],
                    "transform": {
                        "steps": [
                            {
                                "operation": "pivot",
                                "params": {
                                    "pivot_column": "dept",
                                    "value_column": "score",
                                    "agg_func": "mean",
                                },
                            }
                        ]
                    },
                },
            ],
        }
    ],
    "story": {
        "connection": "local",
        "path": "stories",
        "max_sample_rows": 10,
        "auto_generate": True,
    },
}

with open("cli_demo.yaml", "w") as f:
    yaml.dump(config, f)
print("Created cli_demo.yaml")

Created data_cli/employees.csv
Created cli_demo.yaml


## 2. Validate Configuration

Use `odibi validate` to check if your configuration file is syntactically correct and adheres to the schema. This is useful in CI/CD pipelines to catch errors early.
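Here is one way to make "adheres to the schema" concrete. The sketch below applies two cross-reference checks to a config dictionary shaped like the one built above: every `read` must name a defined connection, and every `depends_on` entry must name a node in the same pipeline. It only illustrates the kind of checks a validator can run. It is not Odibi's validation logic, and `find_config_errors` is a hypothetical name.

```python
def find_config_errors(config: dict) -> list[str]:
    """Return human-readable problems found in a pipeline config dict."""
    errors = []
    connections = set(config.get("connections", {}))
    for pipeline in config.get("pipelines", []):
        pname = pipeline.get("pipeline", "<unnamed>")
        node_names = [node.get("name") for node in pipeline.get("nodes", [])]
        # Duplicate node names would make depends_on references ambiguous.
        for name in sorted({n for n in node_names if node_names.count(n) > 1}):
            errors.append(f"{pname}: duplicate node name '{name}'")
        for node in pipeline.get("nodes", []):
            read = node.get("read")
            if read and read.get("connection") not in connections:
                errors.append(
                    f"{pname}.{node.get('name')}: unknown connection '{read.get('connection')}'"
                )
            for dep in node.get("depends_on", []):
                if dep not in node_names:
                    errors.append(f"{pname}.{node.get('name')}: unknown dependency '{dep}'")
    return errors


example = {
    "connections": {"local": {"type": "local", "base_path": "./data_cli"}},
    "pipelines": [
        {
            "pipeline": "employee_stats",
            "nodes": [
                {"name": "load_data", "read": {"connection": "local", "path": "employees.csv"}},
                {"name": "avg_score_by_dept", "depends_on": ["load_data"]},
            ],
        }
    ],
}
print(find_config_errors(example))  # -> []
```

Returning a list instead of raising on the first problem lets a CI job report every mistake in one pass.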

In [15]:
!python -m odibi.cli.main validate cli_demo.yaml

Config is valid




## 3. Diagnose Issues (Odibi Doctor)

The `odibi doctor` command performs a comprehensive health check of your project configuration and environment. It verifies:
- Configuration file existence
- YAML syntax and schema validity
- Required engine dependencies (e.g., PySpark)
- Connection settings and connectivity
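The dependency and connection checks listed above can be approximated with a few lines of standard-library Python. This sketch makes two assumptions. First, it maps engine names to modules (`pandas` → `pandas`, `spark` → `pyspark`). Second, it treats a local connection as healthy if its `base_path` directory exists. It is not how `odibi doctor` is implemented, and `ENGINE_MODULES` and `check_environment` are hypothetical names.

```python
import importlib.util
import os

# Hypothetical mapping from engine name to the module that engine needs.
ENGINE_MODULES = {"pandas": "pandas", "spark": "pyspark"}


def check_environment(engine: str, connections: dict) -> dict[str, bool]:
    """Return a mapping of check name -> passed."""
    results = {}
    module = ENGINE_MODULES.get(engine, engine)
    # find_spec returns None when the module cannot be found on sys.path.
    results[f"engine:{engine}"] = importlib.util.find_spec(module) is not None
    for name, conn in connections.items():
        # Only local connections are checked here; other types are skipped.
        if conn.get("type") == "local":
            results[f"connection:{name}"] = os.path.isdir(conn.get("base_path", ""))
    return results


for check, ok in check_environment("pandas", {"local": {"type": "local", "base_path": "."}}).items():
    print(("OK   " if ok else "FAIL ") + check)
```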

In [16]:
!python -m odibi.cli.main doctor cli_demo.yaml

🩺 Running Odibi Doctor...

✅ Config file found: cli_demo.yaml
✅ YAML schema is valid
ℹ️  Engine: EngineType.PANDAS

Testing Connections:
  ✅ local (LocalConnection): OK

✨ All systems go! Configuration looks good.




## 4. Dry Run Simulation

Before executing a pipeline against real data, you can use the `--dry-run` flag to simulate execution. A dry run verifies the execution plan without reading, transforming, or writing any pipeline data. As the output below shows, it still reports node counts and writes a story file.
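One way to picture what a dry run checks is the execution plan: the nodes ordered so that each one runs after everything in its `depends_on` list. The sketch below builds that order for the two nodes from `cli_demo.yaml` using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It illustrates dependency ordering in general, not Odibi's scheduler, and `execution_plan` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter


def execution_plan(nodes: list[dict]) -> list[str]:
    """Order node names so dependencies come first; raises CycleError on cycles."""
    graph = {node["name"]: set(node.get("depends_on", [])) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


nodes = [
    {"name": "avg_score_by_dept", "depends_on": ["load_data"]},
    {"name": "load_data"},
]
for step, name in enumerate(execution_plan(nodes), start=1):
    print(f"{step}. {name}  (skipped: dry run)")

# A circular dependency cannot be planned at all.
try:
    execution_plan([{"name": "a", "depends_on": ["b"]}, {"name": "b", "depends_on": ["a"]}])
except CycleError as exc:
    print("Invalid plan:", exc.args[0])
```

The plan depends only on the config, so it can be checked without touching data. This is what makes a dry run cheap.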

In [17]:
!python -m odibi.cli.main run cli_demo.yaml --dry-run


Running pipeline: employee_stats
Mode: DRY RUN (Simulation)


✅ SUCCESS - employee_stats
  Completed: 2 nodes
  Failed: 0 nodes
  Duration: 0.00s
  Story: c:\Users\hodibi\OneDrive - Ingredion\Desktop\Repos\Odibi\walkthroughs\data_cli\stories\employee_stats_20251119_174642.md

✅ Pipeline completed successfully




## 5. Run Pipeline

Use `odibi run` to execute the pipeline defined in your configuration file.
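The `avg_score_by_dept` node pivots `score` by `dept` with `agg_func: mean`. This walkthrough does not print the resulting table. As a hand-check, the sketch below computes the same per-department averages on the sample rows using only the standard library. The exact column layout produced by Odibi's `pivot` operation is not shown here and may differ.

```python
from collections import defaultdict
from statistics import mean

# Same rows as data_cli/employees.csv: (name, score, dept)
employees = [
    ("Alice", 85, "Sales"),
    ("Bob", 92, "IT"),
    ("Charlie", 78, "Sales"),
    ("David", 95, "IT"),
    ("Eve", 88, "HR"),
]

# Group scores by department, then average each group.
scores_by_dept = defaultdict(list)
for _name, score, dept in employees:
    scores_by_dept[dept].append(score)

avg_by_dept = {dept: mean(scores) for dept, scores in scores_by_dept.items()}
print(avg_by_dept)
```

Sales averages (85 + 78) / 2 = 81.5 and IT averages (92 + 95) / 2 = 93.5. HR has a single employee, so its average is 88.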

In [18]:
!python -m odibi.cli.main run cli_demo.yaml


Running pipeline: employee_stats


✅ SUCCESS - employee_stats
  Completed: 2 nodes
  Failed: 0 nodes
  Duration: 0.01s
  Story: c:\Users\hodibi\OneDrive - Ingredion\Desktop\Repos\Odibi\walkthroughs\data_cli\stories\employee_stats_20251119_174643.md

✅ Pipeline completed successfully




## 6. Generate Documentation Stories

The `odibi story generate` command creates documentation for your pipelines. You can generate stories in various formats like HTML, Markdown, or JSON.

### Generate Markdown Story

In [19]:
!python -m odibi.cli.main story generate cli_demo.yaml --format markdown --output docs/employee_story.md

📖 Loading configuration from cli_demo.yaml...
📝 Generating documentation story...
✅ Documentation generated: docs\employee_story.md
📄 Format: MARKDOWN




Let's inspect the generated markdown file:

In [20]:
from IPython.display import Markdown, display

with open("docs/employee_story.md", "r", encoding="utf-8") as f:
    display(Markdown(f.read()))

# CLI Demo Project - Pipeline Documentation: employee_stats

## Overview

**Pipeline:** employee_stats
**Description:** Calculate statistics by department
**Total Operations:** 2
**Project:** CLI Demo Project

---

## Pipeline Flow

```
Pipeline Flow:

1. [load_data]
   Operation: read

2. [avg_score_by_dept]
   Operation: transform
   Depends on: load_data

```

## Operations

### load_data

**Operation:** `read`

Load employee data

---

### avg_score_by_dept

**Operation:** `transform`

Calculate average score per department

---

## Expected Outputs

This pipeline produces 1 final output(s)

- **avg_score_by_dept**

---

*Generated by Odibi v1.3.0-alpha.5*
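The "Pipeline Flow" section above comes entirely from the node definitions in the YAML. The sketch below renders a similar section from those definitions. `render_flow` is a hypothetical helper, and its output only approximates what `odibi story generate` writes. It assumes a node's operation is whichever of `read`, `transform`, or `write` the node defines.

```python
def render_flow(nodes: list[dict]) -> str:
    """Render numbered flow entries: node name, operation, dependencies."""
    lines = ["Pipeline Flow:", ""]
    for i, node in enumerate(nodes, start=1):
        # Assumption: the operation is the first of these keys present on the node.
        op = next((k for k in ("read", "transform", "write") if k in node), "unknown")
        lines.append(f"{i}. [{node['name']}]")
        lines.append(f"   Operation: {op}")
        if node.get("depends_on"):
            lines.append(f"   Depends on: {', '.join(node['depends_on'])}")
        lines.append("")
    return "\n".join(lines)


print(render_flow([
    {"name": "load_data", "read": {}},
    {"name": "avg_score_by_dept", "transform": {}, "depends_on": ["load_data"]},
]))
```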

### Generate JSON Story (for programmatic use)

JSON stories are useful for automated analysis or diffing.

In [21]:
!python -m odibi.cli.main story generate cli_demo.yaml --format json --output docs/run_v1.json

📖 Loading configuration from cli_demo.yaml...
📝 Generating documentation story...
✅ Documentation generated: docs\run_v1.json
📄 Format: JSON




## 7. Story Diff (Comparing Runs)

Odibi can compare two story files to show what changed between them. This is helpful for tracking regressions or validating changes.
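Conceptually, a story diff reports the same metrics from two JSON documents side by side. The sketch below assumes flat numeric fields named `duration` and `total_rows_processed` (the keys edited in the next cell) and treats missing or null values as 0. The real story schema and the logic behind `odibi story diff` may differ, and `diff_metrics` and `diff_story_files` are hypothetical names.

```python
import json


def diff_metrics(story1: dict, story2: dict,
                 keys=("duration", "total_rows_processed")) -> dict:
    """Return {key: (value1, value2, value2 - value1)}; missing or null values count as 0."""
    result = {}
    for key in keys:
        v1 = story1.get(key, 0) or 0
        v2 = story2.get(key, 0) or 0
        result[key] = (v1, v2, v2 - v1)
    return result


def diff_story_files(path1: str, path2: str) -> dict:
    """Load two JSON story files and compare their metrics."""
    with open(path1, encoding="utf-8") as f1, open(path2, encoding="utf-8") as f2:
        return diff_metrics(json.load(f1), json.load(f2))


for key, (v1, v2, delta) in diff_metrics(
    {"duration": 1.2}, {"duration": 0.5, "total_rows_processed": 150}
).items():
    print(f"{key}: {v1} -> {v2} ({delta:+})")
```

For `duration`, a positive delta means the second run was slower. For row counts, it means more rows were processed.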

Let's simulate a change (for example, a different runtime and row count) by editing a copy of the JSON story, which gives us a second story to compare.

In [22]:
# Create a 'v2' story file (simulating a second run)
# In a real scenario, you would run the pipeline again after changes
import shutil
import json

# Copy v1 to v2
shutil.copy("docs/run_v1.json", "docs/run_v2.json")

# Modify v2 to look different
with open("docs/run_v2.json", "r") as f:
    data = json.load(f)

# Simulate a change in execution time (v1 reports 0.00s, so the diff will show this as slower)
data["duration"] = 0.5

# Simulate row count change
data["total_rows_processed"] = 150  # more rows

# Save v2
with open("docs/run_v2.json", "w") as f:
    json.dump(data, f)

print("Created simulated docs/run_v2.json")

Created simulated docs/run_v2.json


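Now, let's compare the two runs using `odibi story diff`. Both files came from `story generate` (one of them hand-edited) rather than from an actual run. Run-level fields such as the pipeline name and success rate therefore appear to be unpopulated, which is why the output reports `Pipeline: Unknown` and 0.0%: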

In [23]:
!python -m odibi.cli.main story diff docs/run_v1.json docs/run_v2.json

📊 Comparing stories...
  Story 1: docs/run_v1.json
  Story 2: docs/run_v2.json

📈 Comparison Results:

Pipeline: Unknown

⏱️  Execution Time:
  Story 1: 0.00s
  Story 2: 0.50s
  Difference: +0.50s (slower)

✅ Success Rate:
  Story 1: 0.0%
  Story 2: 0.0%

📊 Rows Processed:
  Story 1: 0
  Story 2: 150
  Difference: +150 rows





## 8. List Stories

You can list all generated stories in a directory to keep track of your pipeline history.
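A comparable listing can be built with `pathlib`. The sketch below collects `.md`, `.json`, and `.html` files in a directory and sorts them newest first, similar to the output below. The extension filter and the names `list_stories` and `print_stories` are assumptions for illustration, not Odibi's implementation.

```python
from datetime import datetime
from pathlib import Path

STORY_SUFFIXES = {".md", ".json", ".html"}  # assumed story file formats


def list_stories(directory: str) -> list[Path]:
    """Return story files in `directory`, most recently modified first."""
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in STORY_SUFFIXES
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def print_stories(directory: str) -> None:
    """Print name, modification time, and size for each story file."""
    for path in list_stories(directory):
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
        print(f"{path.name}\n   Modified: {modified}\n   Size: {info.st_size}B")
```

For example, calling `print_stories("docs")` after the cells above would list the three files generated in this walkthrough.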

In [24]:
!python -m odibi.cli.main story list --directory docs


📚 Stories in docs:

  📄 run_v2.json
     Modified: 2025-11-19 17:46:46
     Size: 864B
     Path: docs\run_v2.json

  📄 run_v1.json
     Modified: 2025-11-19 17:46:46
     Size: 995B
     Path: docs\run_v1.json

  📄 employee_story.md
     Modified: 2025-11-19 17:46:45
     Size: 719B
     Path: docs\employee_story.md





## Cleanup

Remove temporary files created during this walkthrough.

In [25]:
import shutil
import os

shutil.rmtree("data_cli", ignore_errors=True)
shutil.rmtree("docs", ignore_errors=True)
if os.path.exists("cli_demo.yaml"):
    os.remove("cli_demo.yaml")

print("Cleanup complete")

Cleanup complete
