feat(config): add optional metadata dict to workflow definition#107

Open
PolyphonyRequiem wants to merge 8 commits into microsoft:main from PolyphonyRequiem:feat/workflow-metadata

Member

@PolyphonyRequiem PolyphonyRequiem commented Apr 21, 2026

Summary

Adds an optional metadata dict to workflow configuration with two binding paths:

  1. Static — declared in the workflow YAML
  2. Dynamic — injected at runtime via --metadata / -m CLI flags

Both are merged (CLI wins on conflicts) and included verbatim in the workflow_started event, enabling downstream consumers to adapt behavior without parsing YAML source.

Closes #106

Changes

  • src/conductor/config/schema.py: Add metadata: dict[str, Any] field to WorkflowDef (empty dict default)
  • src/conductor/engine/workflow.py: Include metadata in the workflow_started event data
  • src/conductor/cli/app.py: Add --metadata / -m CLI option, parsed separately from --input
  • src/conductor/cli/run.py: Accept metadata param, merge CLI metadata on top of YAML metadata after config load
  • src/conductor/cli/bg_runner.py: Forward --metadata flags to background child process
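
As a rough illustration of the schema change, the sketch below shows a metadata field that defaults to an empty dict. It uses a standard-library dataclass purely as a stand-in; the real WorkflowDef in schema.py may be built differently, and the other fields shown here are only the ones that appear in the YAML example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WorkflowDef:
    """Stand-in for the real WorkflowDef; only the metadata default matters here."""

    name: str
    entry_point: str
    # default_factory gives every instance its own empty dict,
    # so workflows that omit metadata still see {} and never share state.
    metadata: dict[str, Any] = field(default_factory=dict)


plain = WorkflowDef(name="twig-sdlc", entry_point="intake")
tagged = WorkflowDef(name="twig-sdlc", entry_point="intake", metadata={"tracker": "ado"})
```

A mutable default written as `metadata: dict = {}` would be rejected by dataclasses, which is why the sketch goes through `default_factory`.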

Example

YAML (static)

workflow:
  name: twig-sdlc
  entry_point: intake
  metadata:
    tracker: ado
    project_url: https://dev.azure.com/org/Project

CLI (dynamic, merged on top)

conductor run twig-sdlc.yaml --metadata work_item_id=1814

Result in event log

{
  "type": "workflow_started",
  "data": {
    "name": "twig-sdlc",
    "metadata": {
      "tracker": "ado",
      "project_url": "https://dev.azure.com/org/Project",
      "work_item_id": "1814"
    }
  }
}
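
The merge that produces the event above can be sketched in a few lines. This is an illustration of the "CLI wins on conflicts" rule only; `merge_metadata` is a hypothetical helper name, not necessarily what run.py uses.

```python
from typing import Any


def merge_metadata(yaml_metadata: dict[str, Any], cli_metadata: dict[str, str]) -> dict[str, Any]:
    """Return a new dict with CLI entries layered over YAML entries (hypothetical helper)."""
    # In a dict display, later keys override earlier ones, so CLI values win on conflicts.
    return {**yaml_metadata, **cli_metadata}


yaml_metadata = {"tracker": "ado", "project_url": "https://dev.azure.com/org/Project"}
cli_metadata = {"work_item_id": "1814"}

event = {
    "type": "workflow_started",
    "data": {
        "name": "twig-sdlc",
        "metadata": merge_metadata(yaml_metadata, cli_metadata),
    },
}
```

Because the helper builds a new dict, the YAML-declared metadata on the loaded config is left untouched.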

Backward Compatibility

  • metadata defaults to {} — existing workflows and CLI invocations are completely unaffected
  • --metadata is optional — omitting it changes nothing
  • No changes to event format beyond the additive metadata key
  • All existing schema tests pass (100/100)

Daniel Green and others added 4 commits April 21, 2026 12:10
Add a metadata field to WorkflowDef that allows workflow authors to
attach arbitrary key-value pairs for external tooling. The metadata
is included verbatim in the workflow_started event, enabling
downstream consumers (dashboards, trackers, enrichers) to adapt
behavior without parsing the YAML source.

Example usage in workflow YAML:
  workflow:
    name: twig-sdlc
    metadata:
      tracker: ado
      project_url: https://dev.azure.com/org/Project
      work_item_id_agent: intake
      work_item_id_field: epic_id

The field defaults to an empty dict, so existing workflows are
unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add --metadata / -m flag to 'conductor run' that accepts key=value
pairs, merged on top of YAML-declared metadata. This enables callers
to inject dynamic values at invocation time:

    conductor run twig-sdlc.yaml --metadata work_item_id=1814

CLI metadata is:
- Parsed separately from --input (different binding path)
- Merged on top of YAML metadata (CLI wins on conflicts)
- Forwarded through --web-bg background process spawning
- Included in the workflow_started event alongside YAML metadata

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

7 new tests verifying:
- Schema: metadata defaults to empty dict, accepts arbitrary keys,
  independent from input/context fields
- Loader: metadata round-trips through YAML, omission gives empty
  dict, nested values preserved, metadata and input are separate
  namespaces

All 140 config tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov-commenter commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 62.85714% with 26 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@873c72b). Learn more about missing BASE report.

Files with missing lines            Patch %   Lines
src/conductor/cli/run.py            20.00%    12 Missing ⚠️
src/conductor/web/server.py         33.33%    8 Missing ⚠️
src/conductor/cli/bg_runner.py      0.00%     3 Missing ⚠️
src/conductor/engine/workflow.py    93.10%    2 Missing ⚠️
src/conductor/cli/app.py            66.66%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #107   +/-   ##
=======================================
  Coverage        ?   84.67%           
=======================================
  Files           ?       53           
  Lines           ?     7232           
  Branches        ?        0           
=======================================
  Hits            ?     6124           
  Misses          ?     1108           
  Partials        ?        0           

Daniel Green and others added 2 commits April 22, 2026 09:56
Propagate the event log's random hex suffix as a run_id across all
systems:

- EventLogSubscriber: expose run_id property (was already generated)
- WorkflowEngine: accept run_id + log_file params, include in
  workflow_started event
- PID files: include run_id + log_file fields
- Web dashboard: add /api/info endpoint returning run_id, log_file,
  workflow_name, started_at, metadata

This enables the central dashboard to match per-run dashboards to
event logs by exact run_id instead of fragile name/timestamp heuristics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Auto-inject runtime diagnostics (PID, platform, Python version, cwd,
conductor version, started_at, run_id, log_file, bg_mode) into the
workflow_started event. Dashboard port/URL included when --web is active;
parent_pid included in --web-bg mode.

System metadata flows through:
- JSONL event log (via EventLogSubscriber)
- Web dashboard /api/info endpoint
- Checkpoint files (for resume context)

PID files are intentionally left unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Collaborator

@jrob5756 jrob5756 left a comment

Really cool. Thanks for contributing. I have a few comments. Please take a look and we can merge!

Comment thread src/conductor/engine/workflow.py Outdated
"platform": sys.platform,
"python_version": _platform.python_version(),
"conductor_version": self._conductor_version(),
"cwd": os.getcwd(),

Collaborator

Critical: os.getcwd() can raise unhandled OSError

If the working directory is deleted between process start and this call (CI runners, containers, temp-dir cleanup), this raises FileNotFoundError/OSError. Unlike _conductor_version() which is wrapped in try/except, this method has no protection — and it's called at the top of _execute_loop(), so an unhandled exception here crashes the entire workflow before any agent runs with a confusing error.

try:
    cwd = os.getcwd()
except OSError:
    cwd = "<unavailable>"

Or wrap the entire _build_system_metadata() body in try/except to match _conductor_version()'s pattern.

Member Author

Good catch! Wrapped os.getcwd() in try/except — falls back to '' on OSError. Matches the defensive pattern used by _conductor_version().

Comment thread src/conductor/web/server.py Outdated
"workflow_name": data.get("name", ""),
"started_at": event.get("timestamp", 0),
"metadata": data.get("metadata", {}),
"system": data.get("system", {}),

Collaborator

Critical: Unauthenticated endpoint exposes sensitive system info

The system dict contains PID, filesystem paths (cwd, log_file), platform details, and parent PID — all served via an unauthenticated HTTP GET to any network client that can reach the dashboard port.

Consider:

  • Omitting system from /api/info entirely (it's still in the event log for diagnostics)
  • Or limiting to non-sensitive fields only (e.g., conductor_version, started_at)

Member Author

Agreed — stripped the system dict from /api/info entirely. The endpoint now only returns non-sensitive fields: run_id, workflow_name, started_at, metadata, and conductor_version. Full diagnostics remain in the event log for debugging.
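
One way to keep the endpoint from leaking diagnostics is to build the response from an explicit allowlist rather than passing event data through. The sketch below assumes the `workflow_started` event shape shown in the snippet under review; `build_info_payload` is a hypothetical name, and reading `conductor_version` out of the `system` dict is an assumption about where that value lives.

```python
from typing import Any


def build_info_payload(event: dict[str, Any], run_id: str) -> dict[str, Any]:
    """Build an /api/info style response from a workflow_started event (hypothetical helper)."""
    data = event.get("data", {})
    system = data.get("system", {})  # assumed location of conductor_version
    # Only the fields named here are ever returned; PID, cwd, log_file and
    # the rest of the system dict stay in the event log.
    return {
        "run_id": run_id,
        "workflow_name": data.get("name", ""),
        "started_at": event.get("timestamp", 0),
        "metadata": data.get("metadata", {}),
        "conductor_version": system.get("conductor_version", ""),
    }


sample_event = {
    "timestamp": 1745000000.0,
    "data": {
        "name": "twig-sdlc",
        "metadata": {"tracker": "ado"},
        "system": {"pid": 4242, "cwd": "/tmp/run", "conductor_version": "0.0.0"},
    },
}
payload = build_info_payload(sample_event, run_id="a1b2c3")
```

An allowlist fails closed: a field added to `system` later is not served until someone deliberately adds it to the payload.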

Comment thread src/conductor/cli/app.py Outdated
# Parse --metadata key=value flags (separate from inputs)
cli_metadata: dict[str, str] = {}
if raw_metadata:
    cli_metadata.update(parse_input_flags(raw_metadata))

Collaborator

Important: parse_input_flags silently coerces metadata values

parse_input_flags calls coerce_value(), which converts "42" → int, "true" → bool, "null" → None. So --metadata work_item_id=42 silently becomes {"work_item_id": 42} (int, not string). The type annotation says dict[str, str] but actual values will be int | bool | None | list | dict.

Metadata values should stay as strings since they're opaque key-value pairs. Consider a dedicated parse_metadata_flags() that splits on first = without coercion, or add a coerce=False parameter to parse_input_flags.

Member Author

Great point — added a dedicated parse_metadata_flags() that splits on first = without any coercion. Metadata values stay as raw strings, keeping the dict[str, str] annotation honest.
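
A minimal sketch of what a coercion-free parser might look like, splitting on the first = only. The function name comes from the comment above; the body and the error handling for malformed flags are assumptions, not the merged implementation.

```python
def parse_metadata_flags(raw_metadata: list[str]) -> dict[str, str]:
    """Parse key=value flags into a dict, keeping every value as a raw string."""
    metadata: dict[str, str] = {}
    for item in raw_metadata:
        # partition splits on the first "=" only, so values may contain "=".
        key, sep, value = item.partition("=")
        if not sep or not key:
            # Assumed behavior: reject flags with no "=" or an empty key.
            raise ValueError(f"Expected key=value, got {item!r}")
        metadata[key] = value
    return metadata


parsed = parse_metadata_flags(
    ["work_item_id=42", "dry_run=true", "query=https://example.com/?a=b"]
)
```

Here "42" and "true" stay strings, and the URL keeps its embedded = because only the first separator is consumed.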

Comment thread src/conductor/cli/bg_runner.py Outdated
# Forward metadata
if metadata:
    for key, value in metadata.items():
        cmd.extend(["--metadata", f"{key}={value}"])

Collaborator

Important: Missing _serialize_value() — nested metadata breaks in background mode

Inputs (line 110) use _serialize_value(value) to handle non-string types (dicts, lists → JSON). Metadata uses bare f"{key}={value}". If YAML metadata contains nested dicts:

metadata:
  config:
    base_url: https://example.com

…this produces config={'base_url': 'https://example.com'} (Python repr, not JSON), which fails to parse on the child side.

Suggested change
- cmd.extend(["--metadata", f"{key}={value}"])
+ cmd.extend(["--metadata", f"{key}={_serialize_value(value)}"])

Member Author

Nice find — switched to _serialize_value(value) so nested dicts/lists get proper JSON serialization instead of Python repr. Matches the pattern already used for inputs on line 110.
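
To make the repr-versus-JSON difference concrete, here is a small sketch. The body of `_serialize_value` is an assumption (strings pass through, everything else goes through `json.dumps`); the real helper in bg_runner.py may differ. `metadata_args` is a hypothetical wrapper around the loop under review.

```python
import json
from typing import Any


def _serialize_value(value: Any) -> str:
    """Assumed behavior: leave strings alone, JSON-encode everything else."""
    if isinstance(value, str):
        return value
    return json.dumps(value)


def metadata_args(metadata: dict[str, Any]) -> list[str]:
    """Build the --metadata arguments forwarded to the child process (hypothetical helper)."""
    args: list[str] = []
    for key, value in metadata.items():
        args.extend(["--metadata", f"{key}={_serialize_value(value)}"])
    return args


nested = {"config": {"base_url": "https://example.com"}}
bare = f"config={nested['config']}"   # Python repr with single quotes
forwarded = metadata_args(nested)     # JSON with double quotes
```

The bare f-string yields `config={'base_url': 'https://example.com'}`, which is not valid JSON, while the serialized form yields `config={"base_url": "https://example.com"}`.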

Comment thread src/conductor/cli/run.py Outdated
web_dashboard=dashboard,
run_id=event_log_subscriber.run_id if event_log_subscriber else "",
log_file=str(event_log_subscriber.path) if event_log_subscriber else "",
dashboard_port=(dashboard._actual_port if dashboard is not None else None),

Collaborator

Important: Accessing private dashboard._actual_port

_actual_port is a private attribute. Consider adding a public @property port on WebDashboard that returns self._actual_port or self._port, then use dashboard.port here.

Member Author

Added a public port property on WebDashboard that returns _actual_port or _port. Updated run.py to use dashboard.port instead of reaching into the private attribute.
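
For reference, a read-only property along these lines might look like the sketch below. The class is a stripped-down stand-in for WebDashboard; the constructor signature and the way `_actual_port` gets set after binding are assumptions.

```python
from __future__ import annotations


class WebDashboard:
    """Stand-in for the real dashboard; only the port bookkeeping is modeled."""

    def __init__(self, port: int = 0) -> None:
        self._port = port                      # requested port (0 = let the OS choose)
        self._actual_port: int | None = None   # assumed to be filled in once the server binds

    @property
    def port(self) -> int:
        """Port the dashboard is bound to, falling back to the requested port."""
        return self._actual_port or self._port


dashboard = WebDashboard(port=8080)
requested = dashboard.port           # nothing bound yet, so the requested port
dashboard._actual_port = 49152       # simulate the server binding to a port
bound = dashboard.port
```

Callers such as run.py can then pass `dashboard.port if dashboard is not None else None` without touching private state.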

Comment thread src/conductor/engine/workflow.py Outdated
run_id: str = "",
log_file: str = "",
dashboard_port: int | None = None,
bg_mode: bool = False,

Collaborator

Suggestion: Consider grouping informational params into a dataclass

These 4 new params (run_id, log_file, dashboard_port, bg_mode) are purely informational — not used for orchestration, only passed through to event data. The constructor already has 10 params; this brings it to 14.

Consider grouping into a dataclass:

from dataclasses import dataclass


@dataclass
class RunContext:
    run_id: str = ""
    log_file: str = ""
    dashboard_port: int | None = None
    bg_mode: bool = False

Then pass a single run_context: RunContext | None = None parameter. This keeps the constructor clean and makes future additions trivial.

Member Author

Went ahead and did this in this pass — added a RunContext dataclass grouping run_id, log_file, dashboard_port, and bg_mode. WorkflowEngine constructor now takes a single run_context param instead of four. Should make future additions painless.
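
A sketch of how the grouped parameter could be consumed. RunContext mirrors the fields listed above; the WorkflowEngine here is a heavily reduced stand-in, and `started_event_data` is a hypothetical method used only to show the pass-through into event data.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class RunContext:
    run_id: str = ""
    log_file: str = ""
    dashboard_port: int | None = None
    bg_mode: bool = False


class WorkflowEngine:
    """Reduced stand-in: the real constructor takes many more parameters."""

    def __init__(self, name: str, run_context: RunContext | None = None) -> None:
        self.name = name
        # Fall back to an all-defaults context so callers can omit it.
        self.run_context = run_context or RunContext()

    def started_event_data(self) -> dict[str, Any]:
        """Hypothetical: informational fields are copied into event data as-is."""
        return {"name": self.name, **asdict(self.run_context)}


engine = WorkflowEngine("twig-sdlc", RunContext(run_id="a1b2c3", bg_mode=True))
data = engine.started_event_data()
```

Adding another informational field later means touching only the dataclass, not every call site of the constructor.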

Daniel Green and others added 2 commits April 27, 2026 14:57
- Guard os.getcwd() with try/except OSError in _build_system_metadata()
- Strip sensitive system info (PID, cwd, log_file) from /api/info endpoint
- Add parse_metadata_flags() to keep metadata values as raw strings (no coercion)
- Use _serialize_value() for metadata in bg_runner to handle nested dicts
- Add public WebDashboard.port property, stop accessing _actual_port externally
- Group informational params (run_id, log_file, dashboard_port, bg_mode) into RunContext dataclass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3 participants