Skip to content

[ci/doc] Guard against annotated subpackages that escape the API walk#64514

Draft
dstrodtman wants to merge 6 commits into
ray-project:masterfrom
dstrodtman:doc-1313-apiref-coverage-guard
Draft

[ci/doc] Guard against annotated subpackages that escape the API walk#64514
dstrodtman wants to merge 6 commits into
ray-project:masterfrom
dstrodtman:doc-1313-apiref-coverage-guard

Conversation

@dstrodtman

Copy link
Copy Markdown
Contributor

Why

The API doc consistency check (ci/ray_ci/doc/cmd_check_api_discrepancy.py)
only sees APIs that its module walk reaches, and the walk only follows
submodules a parent __init__ actually imports. A public,
@PublicAPI-annotated subpackage that its parent does not import is invisible
to the check — its APIs can silently rot undocumented.

ray.serve.llm and ray.data.llm are both such surfaces today: documented and
annotated, but reached by no walk.

What

  • Onboard ray.serve.llm by adding it to the serve head_modules. It is
    import-safe in the docbuild image (transformers is a soft try_import, and
    vLLM is only imported lazily), so it can be its own walk root.
  • Add a cross-team coverage guard that enumerates one level of subpackages
    under every configured head module and fails when an annotated subpackage is
    neither reached by any walk nor knowingly excluded. A reviewed allowlist
    records the intentional exclusions and the reason for each.
  • Allowlist ray.data.llm. Unlike ray.serve.llm, it eagerly imports the
    vLLM/SGLang engine processor configs, which do a hard import transformers
    at module load. transformers is absent from the docbuild image
    (python/deplocks/docs/docbuild_depset_py3.11.lock), so ray.data.llm
    cannot be imported by this check. Its surface stays documented in
    doc/source/data/api/llm.rst.
  • Track reachable module names on Module so the guard distinguishes "actually
    walked" from "merely name-prefixed" (ray.data.llm is prefixed by ray.data
    yet never imported by ray/data/__init__.py).
  • Unit-test the guard's decision core and helpers against the mock module.

Stacking

This PR is stacked on top of the API-ref consistency-policy work and should be
reviewed/merged after it (its diff transitively shows those commits until they
land):

Kept as a draft until the base PRs merge; it will be rebased onto master
afterward.

Checks

The decisive check is doc: check API doc consistency — it exercises the new
guard, ray.serve.llm in head_modules, and the ray.data.llm allowlist
against the real docbuild image.

🤖 Generated with Claude Code

dstrodtman and others added 5 commits June 29, 2026 15:23
The API-doc consistency checker enforced only one direction of the
contract: every annotated public API must be documented. Extend it with
two deterministic, Sphinx-free policies over the same parsed doc surface.

Policy 02 (docs subset of code): every documented name must resolve to a
live object, and a name that resolves to a deprecated or private object
fails. Resolution is strict -- a miss is reported instead of being
swallowed into the raw name string the way get_canonical_name does -- and
each documented name's annotation is read live from the resolved object
rather than the placeholder PUBLIC_API stamped by the autosummary and
autoclass parsers. This catches stale doc entries (renamed, deleted, or
misspelled symbols) that previously only the full Sphinx render flagged.
Documented but un-annotated methods are accepted, since they are public by
virtue of their annotated class.

Policy 04 (no duplicate documentation): a canonical name may not appear in
more than one autosummary or autoclass block in a team's walked doc
surface. An intentional-duplicate allowlist holds
ray.actor.ActorMethod.bind (documented in both the Core and Compiled Graph
APIs), mirroring conf.py's DuplicateObjectFilter.

Both policies run per team in _check_team and fail the team on violation.
Adds unit tests for resolution, live re-introspection, and both policy
splits, plus _check_team-level tests over the mock module and a new
py_test target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Found by running the new policies end-to-end for the core team against a
matching ray build:

- API.resolve() walked names with getattr first, so a re-exported function
  that shadows a same-named submodule (e.g. ray.util.placement_group, the
  function, shadows the ray.util.placement_group submodule) caused the
  remaining tokens to fail to resolve -- a false "documented API does not
  resolve". Import the dotted path first and fall back to getattr, matching
  how Sphinx autosummary resolves the name.

- ray.remote (canonical ray._private.worker.remote) is intentionally
  cross-listed under both Tasks and Actors in ray-core/api/core.rst, since
  @ray.remote defines both. Add it to core's intentional-duplicate allowlist
  next to ActorMethod.bind.

With these, the core team passes all three policies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Address review feedback: a malformed doc entry (an empty name, or one with
a leading/double dot that splits to an empty first token) would make
importlib.import_module("") raise ValueError and crash the checker. Guard
the empty-first-token case and catch ValueError alongside ImportError so a
malformed entry is reported as unresolved instead. Adds regression cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Policy 02 derived an entry's identity from get_canonical_name() (a
getattr-only walk) while reading its annotation from resolve() (an
import-first walk). When a dotted segment is shadowed -- e.g. a re-exported
function sharing a name with a submodule -- those two walks can land on
different objects, so the annotation could be read off one object while the
reported / white-list-checked name came from another.

Resolve each documented name once and derive the canonical name from that
object (new API.canonical_name_of), so identity and annotation share one
walk. Policy 04 uses the same path, which also collapses two spellings of a
shadowed object to one canonical key. get_canonical_name() is unchanged and
still backs Policy 01.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
The API-doc consistency check reads autosummary stub .rst files (the per-class
method tables the hand-written API pages `.. include::`). Those stubs were
produced only as a side effect of `make -C doc/ html`, so every Python-touching
premerge PR paid for a full Sphinx render just to generate them.

Extract the stub generation into doc/source/api_autogen.py, shared by the
render and a standalone entry point:

- conf.py imports the autogen file list, the filename map, and the custom
  autosummary Jinja filters from the new module (importing it registers the
  filters), keeping the render and the standalone path in lockstep.
- generate_api_stubs() raises when generation produces no files, and
  _autogen_apis no longer swallows failures in a try/except + warning. A broken
  autogen step now fails the build instead of emitting an empty fixture the
  consistency check then reads as "nothing to do".
- lint.sh's api_policy_check runs `python doc/source/api_autogen.py` instead of
  `make -C doc/ html`, removing the full render from the premerge check while
  preserving the contract guarantee. Stubs are generated after installing Ray so
  they reflect the checkout's source.

Verified end-to-end: the standalone script generates the same stubs the render
produced (82 files for the data API), and the consistency check reads them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the API documentation consistency check by generating only the necessary autosummary stub files via a new standalone script (api_autogen.py), eliminating the need to build the entire HTML documentation. It also introduces Policy 02 (ensuring documented APIs resolve to public code) and Policy 04 (preventing duplicate documentation), along with a coverage guard to catch unwalked annotated subpackages. The feedback suggests adding exception handling around dynamic attribute inspection and module iteration in cmd_check_api_discrepancy.py to prevent crashes from lazy imports or invalid paths.

Comment on lines +185 to +191
for attr in dir(mod):
obj = getattr(mod, attr, None)
origin = getattr(obj, "__module__", None)
if not origin or (origin != module and not origin.startswith(f"{module}.")):
continue
if hasattr(obj, "_annotated"):
return (True, True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Accessing attributes dynamically via getattr on a module can trigger lazy imports or properties that raise exceptions (such as ModuleNotFoundError for missing optional dependencies). Wrapping the attribute inspection in a try-except block ensures that optional dependency failures or buggy attributes do not crash the entire API consistency check.

Suggested change
for attr in dir(mod):
obj = getattr(mod, attr, None)
origin = getattr(obj, "__module__", None)
if not origin or (origin != module and not origin.startswith(f"{module}.")):
continue
if hasattr(obj, "_annotated"):
return (True, True)
for attr in dir(mod):
try:
obj = getattr(mod, attr, None)
origin = getattr(obj, "__module__", None)
if not origin or (origin != module and not origin.startswith(f"{module}.")):
continue
if hasattr(obj, "_annotated"):
return (True, True)
except Exception:
continue

Comment on lines +166 to +170
children = []
for info in pkgutil.iter_modules(pkg.__path__, prefix=f"{package}."):
if info.name.rsplit(".", 1)[-1].startswith("_"):
continue
children.append(info.name)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling pkgutil.iter_modules on pkg.__path__ can raise exceptions if any of the paths in __path__ are invalid, non-existent, or inaccessible (e.g., due to permission issues or virtual environment configurations). Wrapping this in a try-except block makes the subpackage discovery more robust.

Suggested change
children = []
for info in pkgutil.iter_modules(pkg.__path__, prefix=f"{package}."):
if info.name.rsplit(".", 1)[-1].startswith("_"):
continue
children.append(info.name)
children = []
try:
for info in pkgutil.iter_modules(pkg.__path__, prefix=f"{package}."):
if info.name.rsplit(".", 1)[-1].startswith("_"):
continue
children.append(info.name)
except Exception:
pass

## Why

The code<->docs API consistency check only sees APIs that its module walk
reaches, and the walk only follows submodules a parent `__init__` actually
imports. A public, `@PublicAPI`-annotated subpackage that its parent does not
import is invisible to the check, so its APIs can silently rot undocumented.
`ray.serve.llm` and `ray.data.llm` are both such surfaces today: they are
documented and annotated, but no walk reaches them.

## What

- Add `ray.serve.llm` to the serve `head_modules` so its public surface is
  walked. It is import-safe in the docbuild image (transformers is a soft
  `try_import`, vLLM is only imported lazily).
- Add a cross-team guard that enumerates one level of subpackages under every
  configured head module and fails when an annotated subpackage is neither
  reached by any walk nor knowingly excluded. A reviewed allowlist records the
  intentional exclusions and why.
- Allowlist `ray.data.llm`: it eagerly imports the vLLM/SGLang engine
  processor configs, which do a hard `import transformers` at module load, and
  transformers is absent from the docbuild image, so it cannot be imported by
  this check. Its surface stays documented in `doc/source/data/api/llm.rst`.
- Track reachable module names on `Module` so the guard can tell "walked" from
  "merely name-prefixed" (`ray.data.llm` is prefixed by `ray.data` yet never
  imported).
- Unit-test the guard's decision core and helpers against the mock module.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
@dstrodtman dstrodtman force-pushed the doc-1313-apiref-coverage-guard branch from 17028a4 to 02f54be Compare July 2, 2026 18:50
@dstrodtman

Copy link
Copy Markdown
Contributor Author

Thanks for the review. Two notes:

Scope: this PR only adds three things — ray.serve.llm to the serve head_modules, an UNWALKED_ANNOTATED_ALLOWLIST entry for ray.data.llm, and the cross-team coverage guard (_check_unwalked_annotated_subpackages) plus its tests. The api_autogen.py script and Policies 02/04 mentioned in the summary belong to the PRs this one is stacked on (#64423#64420); they show up here only because GitHub can't base a fork PR on another fork branch, so the diff is against master until those land.

Both inline suggestions adopted in the latest push:

  • _import_status: wrapped the per-attribute getattr/inspection in try/except … continue. This is exactly the right call for this guard — a module with a PEP 562 __getattr__ (e.g. the batch processor package) can trigger a heavy optional-dep import on attribute access, which would otherwise crash the check for every team. Added a regression test with an "exploding" lazy attribute.
  • _immediate_child_modules: wrapped pkgutil.iter_modules(pkg.__path__, …) so a bad/inaccessible __path__ entry yields no children instead of raising.

Both are consistent with the guard's own principle: an optional-dependency failure should downgrade coverage of one module, never crash the whole consistency check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant