Skip to content

fix: support LoRA archive URL outputs#29

Merged
timzsu merged 11 commits into
mainfrom
fix/rfc28-lora-artifact-url
May 11, 2026
Merged

fix: support LoRA archive URL outputs#29
timzsu merged 11 commits into
mainfrom
fix/rfc28-lora-artifact-url

Conversation

@timzsu
Copy link
Copy Markdown
Collaborator

@timzsu timzsu commented May 9, 2026

Purpose

The first PR in RFC #28. This PR removes dead template fields, supports LoRA adapter archive references in downstream stages, and cleans up checked-in workflows, so they exercise canonical artifact refs without hardcoded local callback URLs.

Changes

  • src/shared/tasks/components/model.py — support typed model.adapters[] fields, including url, apply, id, archive_format, headers, and task_id.
  • src/worker/executors/vllm_lora_executor.py — read typed adapter config directly, support url, and use canonical task_id.
  • src/server/dispatcher/base.py — normalize uploaded result payloads before rendering downstream stage references.
  • templates/lora_then_inference.yaml — use an HTTP artifact url reference for the emitted final_lora_archive.
  • templates/sft_llama_two_stage.yaml and templates/lora_sft_llama_two_stage.yaml — use server-provided HTTP callback URLs and multipart-safe headers.
  • templates/lora_sft_llama_two_stage.yaml — resume from final_lora_archive, which is the archive emitted by LoRA SFT.
  • templates/api_two_stage.yaml and templates/data_retrieval_then_inference.yaml — remove immediate regression blockers from example workflows.
  • templates/agent_paper_collector.yaml — document the required agent LLM/search keys for the paper collector example.
  • Tests and docs cover checked-in template parsing, artifact rendering, typed adapter config, and downstream archive references.

Side Notes

  • Bumped lumid-data-sdk to the latest lumid.data SHA in pyproject.toml, uv.lock, and worker requirements.

Design

Executors continue to emit artifact refs such as {"path": "final_lora.tar.gz"}. The dispatcher resolves those refs using the canonical executor result, including uploaded HTTP result payloads that are stored in the server wrapper shape. Local artifact consumers use path; HTTP artifact consumers use url, which exercises the executor download-and-unpack path.

LoRA adapter options are now first-class schema fields under model.adapters[] instead of a free-form kwargs wrapper, so unsupported adapter options fail validation, and the executor can use typed values directly.

Test Plan

  • Focused pytest coverage for dispatcher artifact rendering, checked-in template parsing, and typed LoRA adapter config.
  • Ruff on touched dispatcher/test/executor files.
  • Requirements sync after the SDK pin update.
  • Live workflow regression against the parser and running FlowMesh stack.

Test Result

E2E validated YAMLs: templates/sft_llama_two_stage.yaml, templates/lora_sft_llama_two_stage.yaml, templates/conditional_echo_test.yaml, templates/diffusion_to_vlm.yaml, templates/echo_three_node_graph.yaml, templates/vlm_llava.yaml, and templates/lora_then_inference.yaml.


Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that uv run pytest tests/ passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
  • If I changed the SDK or CLI, I have verified the affected packages work (uv sync --all-extras --frozen).
  • If this is a breaking change, I have prefixed the PR title with [BREAKING] and described migration steps above.
  • I have updated documentation or config examples if user-facing behavior changed.

@timzsu timzsu force-pushed the fix/rfc28-lora-artifact-url branch 2 times, most recently from 470fbc7 to 3cbf517 Compare May 10, 2026 08:45
@timzsu timzsu marked this pull request as ready for review May 10, 2026 11:58
@timzsu timzsu force-pushed the fix/rfc28-lora-artifact-url branch from 7ea6682 to 8b1cf89 Compare May 10, 2026 12:07
@timzsu timzsu requested a review from kaiitunnz May 10, 2026 12:07
@timzsu timzsu mentioned this pull request May 10, 2026
1 task
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall with two minor considerations.

Comment thread src/server/dispatcher/base.py Outdated
Comment thread src/shared/tasks/components/model.py Outdated
@timzsu timzsu force-pushed the fix/rfc28-lora-artifact-url branch from dd99e64 to 5e6fa08 Compare May 11, 2026 03:55
@timzsu timzsu requested a review from kaiitunnz May 11, 2026 03:59
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional refactoring concerns.

Comment thread sdk/src/flowmesh/resources/results.py Outdated
from ._base import AsyncResource, SyncResource


class _ResultEnvelope(BaseModel):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to sdk/src/flowmesh/models/results.py.

Also, you can rename it to ResultPayload to match the server's model.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread sdk/src/flowmesh/resources/results.py Outdated
ctx = result["_artifacts"]
ctx["base_dir"] = (output_dir / task_id).resolve().as_posix()
ctx.pop("base_url", None)
json_path.write_text(json.dumps(result, indent=2))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you need to write the entire payload, not just the result.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/shared/schemas/result.py Outdated
timzsu added 9 commits May 11, 2026 08:02
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
- Result envelope is the canonical on-disk format. ResultPayload moves
  to src/shared so both worker and server can use it. The runner and
  every executor write through write_executor_result, which stamps the
  _artifacts context and persists via ResultPayload.model_dump_json.
- Dispatcher's _load_stage_result round-trips through model_validate.
- User boundary unwraps: GET /results/{id} returns the flat result
  (HTTP 500 on JSON or schema validation failure); SDK materialize
  unwraps the bundled envelope into the on-disk results.json and
  rewrites _artifacts.base_dir to the local extracted dir.
- AdapterConfig field validators take typing.Any inputs (matches the
  Pydantic field_validator examples).

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
- Rename ResultPayload to ResultEnvelope across shared, server, SDK,
  and tests.
- SDK: move the envelope model to sdk/src/flowmesh/models/results.py;
  resources/results.py imports it.
- SDK _finalize_materialize persists the full envelope (not just
  .result) so the on-disk results.json round-trips through
  ResultEnvelope.model_validate_json.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu force-pushed the fix/rfc28-lora-artifact-url branch from cf17a28 to 5929269 Compare May 11, 2026 08:04
@timzsu timzsu requested a review from kaiitunnz May 11, 2026 08:09
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor refactoring comments. Also, please add a new entry for ResultEnvelope in tests/sdk/test_schema_compat.py.

Comment thread sdk/src/flowmesh/resources/results.py
Comment thread src/shared/schemas/result.py Outdated
timzsu added 2 commits May 11, 2026 08:42
The wrap-then-write semantics are clearer than "write the envelope"
(which sounds like the caller already has one).

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Export ResultEnvelope from flowmesh.models and add the SDK<->shared
pair to the schema-compat parametrization so field drift trips a test.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@timzsu timzsu merged commit ab7d982 into main May 11, 2026
11 checks passed
@timzsu timzsu deleted the fix/rfc28-lora-artifact-url branch May 11, 2026 08:55
@timzsu timzsu linked an issue May 12, 2026 that may be closed by this pull request
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC]: Cleanup workflow templates.

2 participants