Skip to content

Fix/issue 30 docs and template#32

Merged
Cuzyoung merged 2 commits into
microsoft:mainfrom
Yif-Yang:fix/issue-30-docs-and-template
Jun 2, 2026
Merged

Fix/issue 30 docs and template#32
Cuzyoung merged 2 commits into
microsoft:mainfrom
Yif-Yang:fix/issue-30-docs-and-template

Conversation

@Yif-Yang
Copy link
Copy Markdown
Contributor

@Yif-Yang Yif-Yang commented Jun 1, 2026

Summary

Issue #30 reported that docs/reference/api.md, docs/guide/new-benchmark.md,
and skillopt/envs/_template/ all described a fictional EnvAdapter API
(execute / evaluate / build_prompt + DataItem / TaskResult +
BENCHMARK_REGISTRY + skillopt.data.base) that does not exist in the
codebase. Following the guide ended in ImportError, TypeError: Can't instantiate abstract class, or ValueError: Unknown environment.

This PR rewrites the docs + template to match the real ABC in
skillopt/envs/base.py and skillopt/datasets/base.py. No production
code is touched.

Changes

  • docs/reference/api.md — replaces the fictional API with the real 5
    abstract methods (build_train_env / build_eval_env / rollout / reflect / get_task_types), correct BatchSpec shape, real
    SplitDataLoader contract, and a note that registration lives in
    _ENV_REGISTRY (in scripts/train.py), not a BENCHMARK_REGISTRY.
  • docs/guide/new-benchmark.md — full worked docfaithful example
    including loader, rollout helper (using skillopt.model.chat_target),
    adapter, registration block, and a _base_: string-form config.
  • skillopt/envs/_template/ — template now subclasses the real ABCs
    and is instantiable out of the box.

Verified

  • pytest tests/ → 60/60 passed
  • python -c "from skillopt.envs._template.env_template import TemplateBenchmarkEnv; TemplateBenchmarkEnv()" succeeds
  • python scripts/train.py --help unchanged
  • Cross-reviewed twice with codex exec --sandbox read-only (second pass: APPROVE)

Closes #30.

Yif-Yang and others added 2 commits June 1, 2026 20:14
…ter ABC

docs/reference/api.md previously documented a fictional EnvAdapter API
(execute / evaluate / build_prompt + DataItem / TaskResult) and a
BENCHMARK_REGISTRY that never existed in code. Anyone following the
documented contract would hit ImportError or TypeError on the first
instantiation.

Replace both pages with the real shape from skillopt/envs/base.py and
skillopt/datasets/base.py:

- EnvAdapter: build_train_env, build_eval_env, rollout, reflect,
  get_task_types (the 5 actual abstract methods).
- Rollout dicts: id / hard / soft required; everything else preserved
  into RolloutResult.extras.
- Reflect dicts: {patch, source_type} schema as consumed by
  run_minibatch_reflect.
- BatchSpec: slotted-but-mutable dataclass matching the actual
  definition (payload defaults to None, metadata to dict()).
- SplitDataLoader.load_split_items as the one mandatory loader method.
- Registry: _ENV_REGISTRY in scripts/train.py (lazy try/except
  ImportError block), not a non-existent BENCHMARK_REGISTRY in
  skillopt/envs/__init__.py.
- _base_: documented as a string path, since the current YAML loader
  only accepts strings.

The new-benchmark.md guide now walks through a docfaithful worked
example with a real rollout helper (chat_target + scorer) instead of
hand-waving over the rollout step. Refs microsoft#30.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The shipped env_template.py and loader_template.py described the same
fictional async execute / evaluate / build_prompt API documented in
docs/reference/api.md. As a result TemplateBenchmarkEnv(cfg) raised
'TypeError: Can't instantiate abstract class' for every copy-and-paste
user who followed the in-tree scaffold.

Rewrite the template so it's a working starting point:

- env_template.py: TemplateBenchmarkEnv(EnvAdapter) now implements all
  five real abstract methods (build_train_env, build_eval_env, rollout,
  reflect, get_task_types) with no-op defaults documented as TODO.
  Instantiable today; pytest 60/60 still passes.
- loader_template.py: TemplateBenchmarkLoader(SplitDataLoader)
  implements load_split_items for .json / .jsonl input and explains the
  optional load_raw_items override for split_mode="ratio".
- README.md: usage steps now point at scripts/train.py's _ENV_REGISTRY
  (the real registry) instead of a non-existent BENCHMARK_REGISTRY in
  skillopt/envs/__init__.py, and link to the rewritten new-benchmark
  guide.
- config_template.yaml: _base_ is a string path (not a list, which the
  loader rejects); skill_init is commented out with a note so the
  template config doesn't reference a file the user hasn't created.

Verified locally: 'from skillopt.envs._template.env_template import
TemplateBenchmarkEnv; TemplateBenchmarkEnv()' succeeds. Refs
microsoft#30.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
@Cuzyoung Cuzyoung merged commit 25da7cb into microsoft:main Jun 2, 2026
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docs and env template describe an API that does not exist in code

2 participants