Skip to content

chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold#27794

Merged
pmbrull merged 2 commits intomainfrom
chore/basedpyright-config-refresh
Apr 28, 2026
Merged

chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold#27794
pmbrull merged 2 commits intomainfrom
chore/basedpyright-config-refresh

Conversation

@IceS2
Copy link
Copy Markdown
Contributor

@IceS2 IceS2 commented Apr 28, 2026

Summary

Switch basedpyright from implicit recommended mode to typeCheckingMode = \"standard\" and add an explicit per-rule severity layer so the team's intent is documented in the config rather than relying on mode defaults.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The shrink is mechanical — driven entirely by the mode change silencing the reportUnknown* family on third-party-typed deps. No code touched, no behavior change beyond which errors fail CI.

Why standard mode

Surveyed the actual pyproject.toml / pyrightconfig.json of mature OSS Python projects in 2025-2026:

No mature OSS project surveyed runs basedpyright's recommended mode in CI. The reportUnknown* family is uniformly off on connector/integration codebases — too noisy on partially-typed third-party deps (snowflake-connector-python, pyhive, databricks-sqlalchemy, qlik, looker SDK, etc.).

What changes (semantic, not just textual)

Setting Before After
typeCheckingMode implicit recommended standard
reportUnknownMemberType warning none
reportUnknownArgumentType warning none
reportUnknownVariableType warning none
reportUnknownParameterType warning none
reportMissingParameterType warning warning (kept)
reportUnannotatedClassAttribute warning none
reportImplicitStringConcatenation warning none
reportUnusedCallResult warning none
reportMatchNotExhaustive none warning (cheap promotion)
reportUnreachable hint warning (cheap promotion)
reportInvalidCast none warning (cheap promotion)
9 real-bug rules at error implicit (mode default) explicit
allowedUntypedLibraries absent empty scaffold
executionEnvironments absent commented scaffold

Real-bug rules held to error (these are mode defaults; explicit declarations document intent and prevent a future typeCheckingMode = \"basic\" from silently dropping them):

  • reportPossiblyUnboundVariable — real UnboundLocalError
  • reportOptionalMemberAccessNoneType crashes
  • reportAttributeAccessIssue — typos and refactor stragglers
  • reportCallIssue — signature mismatches
  • reportArgumentType — wrong argument types
  • reportReturnType — wrong return types
  • reportAssignmentType — wrong assignment types
  • reportIncompatibleMethodOverride — Liskov violations
  • reportInvalidTypeArguments — wrong generic args

What stays unchanged

  • All existing project-specific waivers (reportDeprecated, reportImplicitOverride, reportUnnecessaryTypeIgnoreComment, etc.) preserved with their existing rationales.
  • pythonVersion = \"3.10\" pinned analysis target unchanged.
  • --baselinemode=discard CI behavior unchanged.

Verification

  • make py_format_check → All checks passed
  • nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes
  • Baseline regenerated locally (passes 1 + 2 stable at 18,916 entries)

…t scaffold

Switch from implicit `recommended` mode to `typeCheckingMode = "standard"`,
matching the production-default of every mature OSS pyright/basedpyright
config surveyed (Pydantic, Litestar, FastAPI, OpenAI/Anthropic SDKs,
Polars, Strawberry, Airflow, AnyIO). `recommended` enables the
`reportUnknown*` family which is catastrophic on a 75-connector codebase
with partially-typed third-party deps (snowflake, pyhive, databricks,
etc.) — 30K+ baseline entries are noise from those library boundaries,
not real type debt in our code.

The config holds real-bug rules at `error` explicitly so a future
config refactor can't silently drop them:
  - reportPossiblyUnboundVariable      (real UnboundLocalError)
  - reportOptionalMemberAccess          (NoneType crashes)
  - reportAttributeAccessIssue          (typos / refactor stragglers)
  - reportCallIssue / reportArgumentType / reportReturnType
  - reportAssignmentType / reportIncompatibleMethodOverride
  - reportInvalidTypeArguments

Plus three cheap promotions at `warning` for real-bug rules that fire
rarely:
  - reportMatchNotExhaustive
  - reportUnreachable
  - reportInvalidCast

`allowedUntypedLibraries` and an `executionEnvironments` block are
scaffolded (empty / commented) for the per-subtree ratchet plan: as
well-typed subtrees (`data_quality/`, `utils/`, `ometa/`) drop their
local baselines, they get promoted to a stricter rule subset
independent of the connector tail.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The remaining
18,916 are real-bug-class entries that bound the cleanup work going
forward; `reportUnknown*` noise is no longer measured.
Copilot AI review requested due to automatic review settings April 28, 2026 10:23
@IceS2 IceS2 requested a review from a team as a code owner April 28, 2026 10:23
@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts the ingestion package’s basedpyright configuration to explicitly use typeCheckingMode = "standard" and document rule severities, making CI type-check intent explicit and reducing noise from unknown-type diagnostics on partially-typed third-party dependencies.

Changes:

  • Switch basedpyright type-checking mode from implicit recommended to explicit standard.
  • Add an explicit per-rule severity layer (error/warning/none) including “real-bug” rules held at error.
  • Add scaffolding for future ratcheting (allowedUntypedLibraries and commented executionEnvironments examples).

Comment thread ingestion/pyproject.toml
Comment on lines +323 to +325
reportExplicitAny = false # same
# @override was only added in python 3.12: https://docs.python.org/3/library/typing.html#typing.override
reportImplicitOverride = false
reportImplicitOverride = false # we're at Python 3.10
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline rationale for reportMissingTypeStubs = false says it's "covered via allowedUntypedLibraries", but allowedUntypedLibraries is currently an empty list and this rule being disabled doesn't rely on it. Consider rewording the comment to reflect current behavior (e.g., that allowedUntypedLibraries is a future scaffold) to avoid confusing future maintainers.

Copilot uses AI. Check for mistakes.
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 28, 2026

Code Review ✅ Approved

Refreshes the basedpyright configuration to implement standard mode and prepare the ratchet scaffold. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@pmbrull pmbrull merged commit 0d44cfb into main Apr 28, 2026
47 of 59 checks passed
@pmbrull pmbrull deleted the chore/basedpyright-config-refresh branch April 28, 2026 10:58
@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (16 flaky)

✅ 3959 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 738 0 7 8
🟡 Shard 3 746 0 2 7
🟡 Shard 4 756 0 3 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 734 0 3 8
🟡 16 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Multi-nested domain hierarchy: filters should scope correctly at every level (shard 2, 1 retry)
  • Features/ExploreQuickFilters.spec.ts › should show correct count for tier filter options from aggregation (shard 2, 1 retry)
  • Features/ExploreQuickFilters.spec.ts › tier with assigned asset appears in dropdown, tier without asset does not (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on databaseSchema (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Api Collection (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

harshach added a commit that referenced this pull request Apr 28, 2026
Static-checks CI was failing on Python 3.10 after main merged the
basedpyright ratchet (PR #27794) and migrated formatting to ruff.

Real type-correctness fixes (4 errors caught by the type checker):
- _build_downstream_flow_edge: guard against fqn.build returning None
  before passing to get_by_name (which requires str).
- _get_pipeline_entity: same guard.
- _resolve_table_entity: search_in_any_service can return List[Table];
  pick the first hit when given a list, otherwise return the entity
  or None. Also pass empty strings for None database/schema_name to
  match the str type of build_es_fqn_search_string.

Linter / formatter conformance:
- Modernized typing across all connector + test files (Optional[X] →
  X | None, Dict → dict, etc.) via `ruff check --fix`.
- Dropped local annotations on .right / pipeline_status (TC001:
  type-only imports forbidden at runtime).
- Marked relative `from ._fixtures import ...` with `# noqa: TID252`
  to keep the conftest-vs-test-module split working in CI.
- Replaced unused tuple unpacks with `_` (RUF059).

basedpyright baseline:
- Refreshed `.basedpyright/baseline.json` so the Pydantic-noise
  errors (EntityReference / Task / TaskStatus / PipelineStatus
  "missing optional kwargs", TopologyContext dynamic attrs,
  cached_property iterability, BaseSpec arg shape) match the same
  pattern other connectors use.

103 tests pass; basedpyright with --baselinemode=discard reports
0 errors against the new baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jatinmasaram pushed a commit to jatinmasaram/OpenMetadata that referenced this pull request May 2, 2026
…t scaffold (open-metadata#27794)

* chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold

Switch from implicit `recommended` mode to `typeCheckingMode = "standard"`,
matching the production-default of every mature OSS pyright/basedpyright
config surveyed (Pydantic, Litestar, FastAPI, OpenAI/Anthropic SDKs,
Polars, Strawberry, Airflow, AnyIO). `recommended` enables the
`reportUnknown*` family which is catastrophic on a 75-connector codebase
with partially-typed third-party deps (snowflake, pyhive, databricks,
etc.) — 30K+ baseline entries are noise from those library boundaries,
not real type debt in our code.

The config holds real-bug rules at `error` explicitly so a future
config refactor can't silently drop them:
  - reportPossiblyUnboundVariable      (real UnboundLocalError)
  - reportOptionalMemberAccess          (NoneType crashes)
  - reportAttributeAccessIssue          (typos / refactor stragglers)
  - reportCallIssue / reportArgumentType / reportReturnType
  - reportAssignmentType / reportIncompatibleMethodOverride
  - reportInvalidTypeArguments

Plus three cheap promotions at `warning` for real-bug rules that fire
rarely:
  - reportMatchNotExhaustive
  - reportUnreachable
  - reportInvalidCast

`allowedUntypedLibraries` and an `executionEnvironments` block are
scaffolded (empty / commented) for the per-subtree ratchet plan: as
well-typed subtrees (`data_quality/`, `utils/`, `ometa/`) drop their
local baselines, they get promoted to a stricter rule subset
independent of the connector tail.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The remaining
18,916 are real-bug-class entries that bound the cleanup work going
forward; `reportUnknown*` noise is no longer measured.

* Update pyproject.toml
jaya6400 pushed a commit to jaya6400/OpenMetadata that referenced this pull request May 4, 2026
…t scaffold (open-metadata#27794)

* chore(ingestion): refresh basedpyright config; standard mode + ratchet scaffold

Switch from implicit `recommended` mode to `typeCheckingMode = "standard"`,
matching the production-default of every mature OSS pyright/basedpyright
config surveyed (Pydantic, Litestar, FastAPI, OpenAI/Anthropic SDKs,
Polars, Strawberry, Airflow, AnyIO). `recommended` enables the
`reportUnknown*` family which is catastrophic on a 75-connector codebase
with partially-typed third-party deps (snowflake, pyhive, databricks,
etc.) — 30K+ baseline entries are noise from those library boundaries,
not real type debt in our code.

The config holds real-bug rules at `error` explicitly so a future
config refactor can't silently drop them:
  - reportPossiblyUnboundVariable      (real UnboundLocalError)
  - reportOptionalMemberAccess          (NoneType crashes)
  - reportAttributeAccessIssue          (typos / refactor stragglers)
  - reportCallIssue / reportArgumentType / reportReturnType
  - reportAssignmentType / reportIncompatibleMethodOverride
  - reportInvalidTypeArguments

Plus three cheap promotions at `warning` for real-bug rules that fire
rarely:
  - reportMatchNotExhaustive
  - reportUnreachable
  - reportInvalidCast

`allowedUntypedLibraries` and an `executionEnvironments` block are
scaffolded (empty / commented) for the per-subtree ratchet plan: as
well-typed subtrees (`data_quality/`, `utils/`, `ometa/`) drop their
local baselines, they get promoted to a stricter rule subset
independent of the connector tail.

Baseline shrinks 56,151 → 18,916 entries (66% smaller). The remaining
18,916 are real-bug-class entries that bound the cleanup work going
forward; `reportUnknown*` noise is no longer measured.

* Update pyproject.toml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants