Add AnacondaProject spec for anaconda-project.yml #96

Merged
martindurant merged 3 commits into fsspec:main from mcg1969:add-anaconda-project-spec
May 11, 2026

Conversation

Contributor

@mcg1969 mcg1969 commented May 9, 2026

Summary

Adds a new AnacondaProject ProjectSpec for the legacy anaconda-project.yml format (as used by Anaconda Enterprise / Workbench and the anaconda-project CLI at https://github.com/anaconda/anaconda-project). Complements the existing CondaProject (conda-incubator's conda-project.yml), which is a different manifest despite the similar name.

Motivation

anaconda-project itself is a legacy tool — it remains in use chiefly because Anaconda Enterprise / Workbench requires it, and that platform is on its way out. The immediate value of projspec being able to read the format is to support the migration of projects away from platforms that require it and onto more actively maintained ecosystems (pixi, uv, etc.).

Looking forward, projspec is a natural home for format-conversion tooling: once it can parse every format that matters, converting between them becomes a small layer on top of the existing content / artifact model. This PR only adds the reader; future work could add an emit/convert side. At that point projspec could serve as the essential middleware in a migration pipeline — a single library that understands the source format, normalises the project model, and emits a target format — rather than every migration tool reimplementing the same readers.

What's modelled

| Manifest field | projspec model |
| --- | --- |
| env_specs.<n>.packages / channels (+ top-level propagation) | Environment(stack=CONDA, precision=SPEC) |
| packages: [..., {pip: [...]}] | additional Environment(stack=PIP) at <env>.pip |
| env_specs.<n>.inherit_from (string or list) | additive flatten of packages / channels / platforms |
| anaconda-project-lock.yml present | Environment(precision=LOCK), merging all / unix / <platform> buckets |
| commands.*.unix / bash | Command(cmd=str) + Process artifact |
| commands.*.notebook | Command(cmd="jupyter notebook <path>") |
| commands.*.bokeh_app | Command(cmd="bokeh serve <path>") |
| commands.*.windows (only) | fallback Command |
| variables (list or dict with default) | EnvironmentVariables |
| implicit default env when env_specs: absent | default env spec materialised |
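
To make the inherit_from row concrete, here is a minimal sketch of an additive flatten over plain dicts. It is not the code in this PR: the function names (`flatten_env_spec`, `_collect`), the ordering (top-level first, then parents, then the env spec's own entries) and the de-duplication step are all assumptions made for illustration.

```python
FIELDS = ("packages", "channels", "platforms")


def _as_list(value):
    # inherit_from may be a single string or a list of strings.
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def _dedupe(items):
    # Order-preserving; uses `in` so unhashable entries such as
    # {"pip": [...]} are handled too.
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out


def _collect(name, env_specs, seen):
    if name in seen:
        raise ValueError(f"inherit_from cycle involving {name!r}")
    spec = env_specs.get(name) or {}  # unknown parents contribute nothing
    out = {field: [] for field in FIELDS}
    for parent in _as_list(spec.get("inherit_from")):
        inherited = _collect(parent, env_specs, seen | {name})
        for field in FIELDS:
            out[field].extend(inherited[field])
    for field in FIELDS:
        out[field].extend(spec.get(field) or [])
    return out


def flatten_env_spec(name, manifest):
    # Assumption: a manifest without env_specs behaves as if it declared
    # a single empty "default" env spec.
    env_specs = manifest.get("env_specs") or {"default": {}}
    collected = _collect(name, env_specs, frozenset())
    return {
        field: _dedupe(list(manifest.get(field) or []) + collected[field])
        for field in FIELDS
    }


manifest = {
    "packages": ["python=3.11"],
    "channels": ["defaults"],
    "env_specs": {
        "base": {"packages": ["numpy"], "channels": ["conda-forge"]},
        "viz": {"inherit_from": "base", "packages": ["bokeh"]},
        "full": {"inherit_from": ["viz", "base"], "packages": ["pandas"]},
    },
}
viz = flatten_env_spec("viz", manifest)
full = flatten_env_spec("full", manifest)
```

Under these assumptions, `viz` picks up the top-level `python=3.11` and `defaults` entries as well as everything declared on `base`.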

Artifacts (CondaEnv / LockFile / Process) are wired to anaconda-project prepare|lock|run.
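
For the lock-file row above, one possible reading of "merging all / unix / <platform> buckets" is sketched below. The lock-file layout (a `packages` mapping keyed by bucket name), the rule that `unix` applies to `linux-*` and `osx-*` platforms, and the helper name `merge_locked_packages` are assumptions; the PR may instead take a plain union of every bucket.

```python
def merge_locked_packages(locked_env, platform):
    """Merge the package buckets that apply to one platform.

    `locked_env` is assumed to look like
    {"packages": {"all": [...], "unix": [...], "linux-64": [...]}}.
    """
    buckets = locked_env.get("packages") or {}
    wanted = ["all"]
    # Assumption: the "unix" bucket covers Linux and macOS platforms.
    if platform.startswith(("linux-", "osx-")):
        wanted.append("unix")
    wanted.append(platform)

    merged = []
    for key in wanted:
        for pkg in buckets.get(key) or []:
            if pkg not in merged:  # keep first occurrence, preserve order
                merged.append(pkg)
    return merged


locked = {
    "packages": {
        "all": ["python=3.11.4=h1"],
        "unix": ["readline=8.2=h2"],
        "linux-64": ["libgcc-ng=13=h3"],
        "win-64": ["vc=14=h4"],
    }
}
linux_pkgs = merge_locked_packages(locked, "linux-64")
win_pkgs = merge_locked_packages(locked, "win-64")
```

The package strings and build hashes in `locked` are made up for the example.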

What's carried on DescriptiveMetadata.meta

Following the pattern used by dataworkflows.py, infra.py, backstage.py, etc., fields that don't yet have typed representations in projspec are preserved verbatim so nothing is lost:

  • name, description, icon, categories
  • downloads, services
  • top-level platforms, skip_imports
  • command_kinds — original notebook / bokeh_app paths before shell-string translation
  • command_env_specs — per-command env_spec binding
  • command_supports_http_options
  • locking_enabled
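
As a rough illustration of the carry-through listed above, the sketch below gathers the untyped fields into a plain dict. The helper name `collect_meta`, the exact shape of each `command_*` entry, and reading `locking_enabled` from the lock file are assumptions, not the PR's implementation.

```python
CARRIED_KEYS = (
    "name", "description", "icon", "categories",
    "downloads", "services", "platforms", "skip_imports",
)


def collect_meta(manifest, lock=None):
    # Top-level fields are copied verbatim when present.
    meta = {key: manifest[key] for key in CARRIED_KEYS if key in manifest}

    kinds, env_specs, http = {}, {}, {}
    for name, body in (manifest.get("commands") or {}).items():
        if not isinstance(body, dict):
            continue  # bare-string commands carry no extra fields
        for kind in ("notebook", "bokeh_app"):
            if kind in body:
                # Keep the original path before shell-string translation.
                kinds[name] = {kind: body[kind]}
        if "env_spec" in body:
            env_specs[name] = body["env_spec"]
        if "supports_http_options" in body:
            http[name] = body["supports_http_options"]

    if kinds:
        meta["command_kinds"] = kinds
    if env_specs:
        meta["command_env_specs"] = env_specs
    if http:
        meta["command_supports_http_options"] = http
    # Assumption: locking_enabled is a top-level key of the lock file.
    if lock and "locking_enabled" in lock:
        meta["locking_enabled"] = lock["locking_enabled"]
    return meta


manifest = {
    "name": "demo",
    "description": "example project",
    "packages": ["python"],
    "commands": {
        "nb": {"notebook": "analysis.ipynb", "env_spec": "default"},
        "app": {"unix": "python app.py", "supports_http_options": True},
    },
}
meta = collect_meta(manifest, lock={"locking_enabled": True})
```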

Out of scope (future upstream work)

  • Prompted-variable metadata (default / description / encrypted)
  • First-class Download / Service content types
  • A kind field or subclass on Command (notebook, http-service)
  • platforms as a first-class Environment field

Test plan

  • 26 tests in tests/test_anaconda_project.py covering match, basic parse, inheritance, pip split, command kinds (notebook/bokeh_app/unix/string), lock file reading, downloads/services, variable forms — all inline fixtures, matching the style of tests/test_new_specs.py
  • pytest tests/test_anaconda_project.py — 26 passed
  • Full suite: pytest tests/ — same 5 pre-existing "tool not installed" failures as on main, no new failures
  • Sanity-checked against 12 real manifests from anaconda/workbench-sample-projects (attractors, databases, hadoop_spark, hello_anaconda_enterprise, holoviz, hvplot_notebook, image_classifier_flask, nyc_taxi, panel_gapminders, plot_notebook, streaming_ohlc_data, tensorboard_mnist) — all parse cleanly

Marked as draft pending any upstream shaping discussion.

Reads the legacy Anaconda Project format used by Anaconda Enterprise /
Workbench. Modelled content:

- env_specs with inherit_from flattening, top-level
  packages/channels/platforms propagation, and an implicit ``default`` env
  spec when none is declared
- pip sub-packages split into a separate PIP-stack Environment
- anaconda-project-lock.yml read for LOCK-precision Environments, merging
  the ``all``/``unix``/<platform> package buckets
- commands: unix/bash pass through; notebook/bokeh_app get the conventional
  shell-string translation (``jupyter notebook <path>`` / ``bokeh serve
  <path>``), with the original kind preserved
- variables: list form and dict form (``default`` → value)
- CondaEnv, LockFile, and Process artifacts wired to
  ``anaconda-project prepare|lock|run``

Fields not yet typed in projspec (name/description/icon/categories,
downloads, services, top-level platforms, command_kinds/env_specs/http,
locking_enabled) are carried verbatim on DescriptiveMetadata.meta so no
information is lost.
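
The command and variable handling described in the commit message above can be sketched with two small helpers. Both names (`command_to_shell`, `normalise_variables`) are hypothetical, the key precedence is a guess, and quoting the path with `shlex.quote` is an addition of this sketch rather than something the PR states.

```python
import shlex


def command_to_shell(body):
    """Return (shell_string, original_kind) for one commands.* entry."""
    if isinstance(body, str):
        return body, "string"
    # unix/bash strings pass through unchanged.
    for key in ("unix", "bash"):
        if key in body:
            return body[key], key
    # notebook/bokeh_app get the conventional shell-string translation.
    if "notebook" in body:
        return f"jupyter notebook {shlex.quote(body['notebook'])}", "notebook"
    if "bokeh_app" in body:
        return f"bokeh serve {shlex.quote(body['bokeh_app'])}", "bokeh_app"
    # A windows-only command is used as a fallback.
    if "windows" in body:
        return body["windows"], "windows"
    raise ValueError(f"unrecognised command body: {body!r}")


def normalise_variables(variables):
    """Map both variable forms to {name: default_or_None}."""
    if not variables:
        return {}
    if isinstance(variables, list):
        # List form: names only, no default values.
        return {name: None for name in variables}
    out = {}
    for name, spec in variables.items():
        # Dict form: either {"default": value, ...} or a bare value.
        out[name] = spec.get("default") if isinstance(spec, dict) else spec
    return out


notebook_cmd = command_to_shell({"notebook": "analysis.ipynb"})
windows_cmd = command_to_shell({"windows": "run.bat"})
variables = normalise_variables(
    {"PORT": {"default": "8080", "description": "listen port"}, "MODE": "dev"}
)
```

Returning the original kind alongside the shell string mirrors the "original kind preserved" note, so a caller could record it in command_kinds.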
@mcg1969 mcg1969 marked this pull request as ready for review May 9, 2026 15:58
Member

@martindurant martindurant left a comment


Please add to the api.rst file in the docs too

Comment thread src/projspec/proj/anaconda_project.py Outdated
Comment thread src/projspec/proj/anaconda_project.py Outdated
precision=Precision.SPEC,
)
if pip_packages:
envs[f"{env_name}.pip"] = Environment(
Member


This is an extension of the conda env rather than an independent one? Maybe it's OK like this, or maybe we can come up with something better. I was musing in another channel about a "FROM" (or based-on) field for environments anyway.

Contributor Author


Agreed that the current <env>.pip sibling misrepresents the semantics — pip packages in anaconda-project.yml are installed into the conda env, not alongside it. A based-on/FROM field on Environment would be the right long-term fix, and the same problem applies to pixi ([pypi-dependencies]) and uv.

For this PR, I see three options and I'd like your preference:

  1. Keep the sibling <env>.pip Environment, but clarify in the class docstring that the relationship is layered rather than alternate. Status quo, honest about the limitation.
  2. Fold pip packages into the conda env's packages list as tagged strings (e.g. pip::requests>=2.28). Loses discoverability by stack.
  3. Drop the pip emission entirely in v1 and stash pip packages in DescriptiveMetadata.meta["pip_packages"]. Data preserved, no misleading model, clean migration path when the based-on field lands.

The conda-first / pip-second convention is common enough that (1) is probably fine as an interim. Happy to do whichever you'd like.
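
To compare the three options side by side, here is a small sketch of the shapes each would produce. It uses plain dicts in place of projspec's Environment and DescriptiveMetadata classes, and the function names and the per-env layout of `pip_packages` are hypothetical.

```python
def split_pip(packages):
    """Separate conda specs from the nested {pip: [...]} entries."""
    conda, pip = [], []
    for entry in packages:
        if isinstance(entry, dict) and "pip" in entry:
            pip.extend(entry["pip"] or [])
        else:
            conda.append(entry)
    return conda, pip


def option_1(env_name, packages):
    # Sibling <env>.pip environment (the status quo in this PR).
    conda, pip = split_pip(packages)
    envs = {env_name: {"stack": "conda", "packages": conda}}
    if pip:
        envs[f"{env_name}.pip"] = {"stack": "pip", "packages": pip}
    return envs


def option_2(packages):
    # Fold pip packages into the conda list as tagged strings.
    conda, pip = split_pip(packages)
    return conda + [f"pip::{spec}" for spec in pip]


def option_3(env_name, packages):
    # Emit only the conda env; stash pip packages in metadata.
    conda, pip = split_pip(packages)
    envs = {env_name: {"stack": "conda", "packages": conda}}
    meta = {"pip_packages": {env_name: pip}} if pip else {}
    return envs, meta


pkgs = ["python=3.11", "numpy", {"pip": ["requests>=2.28"]}]
sibling = option_1("default", pkgs)
tagged = option_2(pkgs)
stashed = option_3("default", pkgs)
```

Only option 1 keeps the pip packages discoverable by stack, which is the trade-off noted for option 2.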

Member


Yes I think I agree - I should create an issue to track this and request feedback.

mcg1969 added 2 commits May 11, 2026 14:33
- Move the format description into the class docstring so it surfaces
  in info() and the UIs; keep the module docstring to one line.
- Parse anaconda-project.yml and anaconda-project-lock.yml with
  yaml.safe_load directly. The manifest is not a jinja-templated file,
  and the mustache-style HTTP option tokens ({{port}}, {{address}}, ...)
  round-trip through safe_load as plain strings, which is the correct
  behaviour for downstream consumers.
- Register AnacondaProject in docs/source/api.rst.
@martindurant martindurant merged commit 0310f2a into fsspec:main May 11, 2026
2 checks passed