chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold#27794
…t scaffold

Switch from implicit `recommended` mode to `typeCheckingMode = "standard"`, matching the production default of every mature OSS pyright/basedpyright config surveyed (Pydantic, Litestar, FastAPI, OpenAI/Anthropic SDKs, Polars, Strawberry, Airflow, AnyIO). `recommended` enables the `reportUnknown*` family, which is catastrophic on a 75-connector codebase with partially-typed third-party deps (snowflake, pyhive, databricks, etc.): 30K+ baseline entries are noise from those library boundaries, not real type debt in our code.

The config holds real-bug rules at `error` explicitly so a future config refactor can't silently drop them:
- reportPossiblyUnboundVariable (real UnboundLocalError)
- reportOptionalMemberAccess (NoneType crashes)
- reportAttributeAccessIssue (typos / refactor stragglers)
- reportCallIssue / reportArgumentType / reportReturnType
- reportAssignmentType / reportIncompatibleMethodOverride
- reportInvalidTypeArguments

Plus three cheap promotions at `warning` for real-bug rules that fire rarely:
- reportMatchNotExhaustive
- reportUnreachable
- reportInvalidCast

`allowedUntypedLibraries` and an `executionEnvironments` block are scaffolded (empty / commented) for the per-subtree ratchet plan: as well-typed subtrees (`data_quality/`, `utils/`, `ometa/`) drop their local baselines, they get promoted to a stricter rule subset independent of the connector tail.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The remaining 18,916 are real-bug-class entries that bound the cleanup work going forward; `reportUnknown*` noise is no longer measured.
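To make the "real-bug" label concrete, here is a minimal sketch of the runtime failures that two of the rules held at `error` point to. The `Connection` class and all function names are hypothetical and do not come from the ingestion codebase.

```python
from typing import Optional


class Connection:
    """Hypothetical stand-in for a connector client."""

    def __init__(self, name: str) -> None:
        self.name = name


def pick_label(rows: list[str]) -> str:
    # reportPossiblyUnboundVariable: `label` is only assigned inside the loop,
    # so an empty `rows` raises UnboundLocalError at the return statement.
    for row in rows:
        label = row.upper()
    return label


def connection_name(conn: Optional[Connection]) -> str:
    # reportOptionalMemberAccess: `conn` may be None, in which case `.name`
    # raises AttributeError on NoneType.
    return conn.name


def safe_connection_name(conn: Optional[Connection]) -> str:
    # Narrowing on None before the attribute access satisfies the checker.
    if conn is None:
        return "<no connection>"
    return conn.name
```

Both flagged functions work on the happy path (`pick_label(["a"])`, `connection_name(Connection("x"))`) and only fail on the empty or `None` input, which is why these diagnostics are worth failing CI on.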
Pull request overview
Adjusts the ingestion package’s basedpyright configuration to explicitly use typeCheckingMode = "standard" and document rule severities, making CI type-check intent explicit and reducing noise from unknown-type diagnostics on partially-typed third-party dependencies.
Changes:
- Switch basedpyright type-checking mode from implicit `recommended` to explicit `standard`.
- Add an explicit per-rule severity layer (error/warning/none), including “real-bug” rules held at `error`.
- Add scaffolding for future ratcheting (`allowedUntypedLibraries` and commented `executionEnvironments` examples).
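The three changes above could look roughly like the fragment embedded below, paired with a small guard that reports any real-bug rule no longer pinned at `error`. This is a sketch under assumptions: the TOML text is illustrative rather than the PR's actual `pyproject.toml`, the `executionEnvironments` root is a placeholder path, and `missing_error_rules` is a hypothetical helper. It uses `tomllib`, which is in the standard library from Python 3.11; on 3.10 it assumes the `tomli` backport is installed.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10: assumes the `tomli` backport is installed
    import tomli as tomllib

# Illustrative fragment only; not copied from the PR.
CONFIG_SKETCH = '''
[tool.basedpyright]
typeCheckingMode = "standard"
pythonVersion = "3.10"

# Real-bug rules pinned at "error" so a later mode change cannot drop them.
reportPossiblyUnboundVariable = "error"
reportOptionalMemberAccess = "error"
reportAttributeAccessIssue = "error"
reportCallIssue = "error"
reportArgumentType = "error"
reportReturnType = "error"
reportAssignmentType = "error"
reportIncompatibleMethodOverride = "error"
reportInvalidTypeArguments = "error"

# Cheap promotions for rules that fire rarely.
reportMatchNotExhaustive = "warning"
reportUnreachable = "warning"
reportInvalidCast = "warning"

# Ratchet scaffold: empty for now.
allowedUntypedLibraries = []

# Hypothetical per-subtree environment, left commented out:
# [[tool.basedpyright.executionEnvironments]]
# root = "path/to/well_typed_subtree"
'''

REAL_BUG_RULES = (
    "reportPossiblyUnboundVariable",
    "reportOptionalMemberAccess",
    "reportAttributeAccessIssue",
    "reportCallIssue",
    "reportArgumentType",
    "reportReturnType",
    "reportAssignmentType",
    "reportIncompatibleMethodOverride",
    "reportInvalidTypeArguments",
)


def missing_error_rules(toml_text: str) -> list[str]:
    """Return the real-bug rules that are not explicitly set to "error"."""
    settings = tomllib.loads(toml_text)["tool"]["basedpyright"]
    return [rule for rule in REAL_BUG_RULES if settings.get(rule) != "error"]
```

Calling `missing_error_rules(CONFIG_SKETCH)` returns an empty list; downgrading or deleting any of the nine rules makes it show up in the result, which is the property the explicit severity layer is meant to protect.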
| reportExplicitAny = false # same | ||
| # @override was only added in python 3.12: https://docs.python.org/3/library/typing.html#typing.override | ||
| reportImplicitOverride = false | ||
| reportImplicitOverride = false # we're at Python 3.10 |
The inline rationale for reportMissingTypeStubs = false says it's "covered via allowedUntypedLibraries", but allowedUntypedLibraries is currently an empty list and this rule being disabled doesn't rely on it. Consider rewording the comment to reflect current behavior (e.g., that allowedUntypedLibraries is a future scaffold) to avoid confusing future maintainers.
Code Review ✅ Approved
Refreshes the basedpyright configuration to implement standard mode and prepare the ratchet scaffold. No issues found.
🟡 Playwright Results: all passed (16 flaky)
✅ 3959 passed · ❌ 0 failed · 🟡 16 flaky (passed on retry) · ⏭️ 86 skipped
How to debug locally: download the playwright-test-results-<shard> artifact, unzip it, and run `npx playwright show-trace path/to/trace.zip` to view the trace.
Static-checks CI was failing on Python 3.10 after main merged the basedpyright ratchet (PR #27794) and migrated formatting to ruff.

Real type-correctness fixes (4 errors caught by the type checker):
- _build_downstream_flow_edge: guard against fqn.build returning None before passing to get_by_name (which requires str).
- _get_pipeline_entity: same guard.
- _resolve_table_entity: search_in_any_service can return List[Table]; pick the first hit when given a list, otherwise return the entity or None. Also pass empty strings for None database/schema_name to match the str type of build_es_fqn_search_string.

Linter / formatter conformance:
- Modernized typing across all connector + test files (Optional[X] → X | None, Dict → dict, etc.) via `ruff check --fix`.
- Dropped local annotations on .right / pipeline_status (TC001: type-only imports forbidden at runtime).
- Marked relative `from ._fixtures import ...` with `# noqa: TID252` to keep the conftest-vs-test-module split working in CI.
- Replaced unused tuple unpacks with `_` (RUF059).

basedpyright baseline:
- Refreshed `.basedpyright/baseline.json` so the Pydantic-noise errors (EntityReference / Task / TaskStatus / PipelineStatus "missing optional kwargs", TopologyContext dynamic attrs, cached_property iterability, BaseSpec arg shape) match the same pattern other connectors use.

103 tests pass; basedpyright with --baselinemode=discard reports 0 errors against the new baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
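The guards described in that commit follow a common narrowing pattern, sketched here with stand-ins. Everything in the block is hypothetical: `build_fqn`, `get_by_name`, `Table`, and `search_string` only mimic the shapes the commit message mentions (a builder that may return None, a lookup that requires `str`, a search that may return one entity or a list) and are not the OpenMetadata APIs. Returning None for an empty list is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Table:
    """Hypothetical stand-in for a table entity."""

    name: str


def build_fqn(service: Optional[str], name: Optional[str]) -> Optional[str]:
    """Stand-in for a builder that may return None."""
    if not service or not name:
        return None
    return f"{service}.{name}"


def get_by_name(fqn: str) -> Table:
    """Stand-in for a lookup whose parameter is typed as plain str."""
    return Table(name=fqn)


def lookup(service: Optional[str], name: Optional[str]) -> Optional[Table]:
    fqn = build_fqn(service, name)
    # Guard: without this check, passing Optional[str] where str is required
    # is the kind of mismatch reportArgumentType flags.
    if fqn is None:
        return None
    return get_by_name(fqn)


def first_or_none(result: Union[Table, list[Table], None]) -> Optional[Table]:
    # A search may return one entity, a list of hits, or None.
    if isinstance(result, list):
        # Assumption: an empty list is treated the same as "not found".
        return result[0] if result else None
    return result


def search_string(database: Optional[str], schema_name: Optional[str], table: str) -> str:
    # Coerce None to "" so the parts match a str-typed search-string builder.
    return ".".join([database or "", schema_name or "", table])
```

For example, `lookup(None, "orders")` returns None instead of forwarding a None FQN, and `first_or_none([Table("a"), Table("b")])` returns the first hit.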
…t scaffold (open-metadata#27794)
* chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold
* Update pyproject.toml



Summary
Switch basedpyright from implicit `recommended` mode to `typeCheckingMode = "standard"` and add an explicit per-rule severity layer so the team's intent is documented in the config rather than relying on mode defaults.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The shrink is mechanical: it is driven entirely by the mode change silencing the `reportUnknown*` family on third-party-typed deps. No code touched, no behavior change beyond which errors fail CI.

Why standard mode
Surveyed the actual `pyproject.toml` / `pyrightconfig.json` of mature OSS Python projects in 2025-2026.

No mature OSS project surveyed runs basedpyright's `recommended` mode in CI. The `reportUnknown*` family is uniformly off on connector/integration codebases: too noisy on partially-typed third-party deps (snowflake-connector-python, pyhive, databricks-sqlalchemy, qlik, looker SDK, etc.).

What changes (semantic, not just textual)
- `typeCheckingMode`: `recommended` → `standard`
- `reportUnknownMemberType`
- `reportUnknownArgumentType`
- `reportUnknownVariableType`
- `reportUnknownParameterType`
- `reportMissingParameterType`
- `reportUnannotatedClassAttribute`
- `reportImplicitStringConcatenation`
- `reportUnusedCallResult`
- `reportMatchNotExhaustive`
- `reportUnreachable`
- `reportInvalidCast` (`error`)
- `allowedUntypedLibraries`
- `executionEnvironments`

Real-bug rules held to `error` (these are mode defaults; explicit declarations document intent and prevent a future `typeCheckingMode = "basic"` from silently dropping them):
- `reportPossiblyUnboundVariable`: real `UnboundLocalError`
- `reportOptionalMemberAccess`: `NoneType` crashes
- `reportAttributeAccessIssue`: typos and refactor stragglers
- `reportCallIssue`: signature mismatches
- `reportArgumentType`: wrong argument types
- `reportReturnType`: wrong return types
- `reportAssignmentType`: wrong assignment types
- `reportIncompatibleMethodOverride`: Liskov violations
- `reportInvalidTypeArguments`: wrong generic args

What stays unchanged
- Existing rule settings (`reportDeprecated`, `reportImplicitOverride`, `reportUnnecessaryTypeIgnoreComment`, etc.) preserved with their existing rationales.
- `pythonVersion = "3.10"` pinned analysis target unchanged.
- `--baselinemode=discard` CI behavior unchanged.

Verification
- `make py_format_check` → All checks passed
- `nox --no-venv -s static-checks` → 0 errors, 0 warnings, 0 notes
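The headline baseline figure can be re-derived with a few lines, and the same script can tally baseline entries per rule to confirm which rule families remain. This is a rough sketch: `shrink_percent` and `count_by_rule` are hypothetical helpers, and the baseline shape used here (a top-level `files` mapping from path to a list of entries carrying a `code` field) is an assumption about `.basedpyright/baseline.json`, not something the PR states.

```python
from collections import Counter


def shrink_percent(before: int, after: int) -> float:
    """Percentage by which the entry count dropped."""
    return 100.0 * (before - after) / before


def count_by_rule(baseline: dict) -> Counter:
    """Tally baseline entries per diagnostic rule.

    ASSUMPTION: the baseline looks like
    {"files": {"<path>": [{"code": "<rule name>", ...}, ...]}}.
    """
    counts: Counter = Counter()
    for entries in baseline.get("files", {}).values():
        for entry in entries:
            counts[entry.get("code", "<unknown>")] += 1
    return counts


# Figures quoted in the PR description: 56,151 entries before, 18,916 after.
print(f"{shrink_percent(56_151, 18_916):.1f}% smaller")  # prints "66.3% smaller"
```

In practice the dict would come from `json.load` on the baseline file; a tally like this makes it easy to check that no `reportUnknown*` entries remain after the mode change.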