Skip to content

chore(ingestion): migrate to ruff for format + isort + unused-import#27739

Merged
IceS2 merged 6 commits intomainfrom
modernize-ingestion-pyproject
Apr 27, 2026
Merged

chore(ingestion): migrate to ruff for format + isort + unused-import#27739
IceS2 merged 6 commits intomainfrom
modernize-ingestion-pyproject

Conversation

@IceS2
Copy link
Copy Markdown
Contributor

@IceS2 IceS2 commented Apr 26, 2026

Summary

Migrates the Python tooling for `ingestion/` and `openmetadata-airflow-apis/` from black + isort + pycln to ruff (one tool, one config block, ~100x faster). Strictly a tooling swap — no logic changes.

What changed

Tooling config (commit 1: 5 files)

  • `pyproject.toml` × 2: drop `[tool.black]`, `[tool.isort]`, `[tool.pycln]`, `[tool.mypy]` (dead — basedpyright is the active type checker); add `[tool.ruff]` block (line-length 120, target py3.10) with `format` + `lint` (`I` isort + `F` pyflakes); namespaced `known-first-party` to cover `metadata`, `ingestion`, `_openmetadata_testutils`, and `airflow_provider_openmetadata`
  • `setup.py` × 2: dev-deps swap black/isort/pycln → `ruff~=0.15.12`
  • `Makefile`: `py_format` and `py_format_check` rewired to ruff
  • `.pre-commit-config.yaml`: pre-commit-hooks v2.3 → v5.0; replace black, isort, pycln hooks with `astral-sh/ruff-pre-commit` v0.15.12 (ruff-check + ruff-format)
  • `requires-python` bumped to `>=3.10` (matches `noxfile.py` and CLAUDE.md; Python 3.9 is already documented as broken on Mac in noxfile)

JSON files excluded from check-json

The pre-commit-hooks v5.0 `check-json` is stricter than v2.3 and catches four pre-existing JSON issues. Excluded with an inline TODO for separate cleanup:

  • `openmetadata-spec/src/main/resources/rdf/contexts/dataAsset.jsonld` — duplicate key `columns`
  • `ingestion/examples/sample_data/pipelines/tasks.json` — duplicate key `sourceUrl`
  • `openmetadata-service/src/main/resources/dataInsights/opensearch/indexSettingsTemplate.json` — empty/template content
  • `openmetadata-ui/src/main/resources/ui/playwright/test-data/odcs-examples/invalid-malformed.json` — intentionally malformed test fixture

Code changes (commit 2: 1568 files)

  • 253 `# noqa: ` markers added via `ruff check --add-noqa` across 128 files. Existing violations grandfathered so the migration is behavior-preserving. Per-rule cleanup tracked in the TODO comment at the top of `[tool.ruff.lint]` in `ingestion/pyproject.toml`:
    • F541 (52) — `f"hello"` → `"hello"`
    • F402 (2) — loop-var-shadows-import
    • F401 (13) — unused imports
    • I001 (27) — import sorting
    • F811 (16) — redefined-while-unused
    • F841 (91) — unused-variable
    • F821 (50) — undefined-name (likely real bugs to investigate)
  • Bulk `ruff format` reformat from black 22.3 baseline. Cosmetic only.

What is not in scope (deliberate, follow-up PRs)

  • Pylint → ruff migration (Stage 2): pylint's custom plugins (`print_checker`, `import_checker`) and 894 `# pylint: disable=…` markers need a separate plan.
  • basedpyright `--writebaseline` + CI gate.
  • Per-rule lint cleanup (remove noqa markers, fix underlying issues).
  • Coverage path remap fix (`ingestion/src/*` omit + sed in Makefile is a workaround for a missing entry in `[tool.coverage.paths]`).
  • Loosening `setuptools~=70.3.0` build-system pin.
  • Cleanup of the four excluded JSON files.

Sanity check

The reformat is large (1573 files, -24K net lines from line-length 120 unwrapping multi-line constructs) but structurally balanced:

Check Deleted Added Delta
Imports (`from`/`import`) 32 32 0
Structural keywords (`def`/`class`/`return`/`raise`) 2221 2221 0

Every import and structural keyword removed has a corresponding addition. No logic loss.


Summary by Gitar

  • Dependency cleanup:
    • Removed legacy conditional python_version pins in setup.py for mysql-connector-python, testcontainers, and locust.
  • Coverage configuration:
    • Added [tool.coverage.paths] to openmetadata-airflow-apis/pyproject.toml to ensure consistent path resolution in CI.
  • Pylint compatibility:
    • Added cross-version pylint suppression for arguments-differ, signature-differs, and unused-argument in OpenMetadataValidationAction1xx.

This will update automatically on new commits.

IceS2 added 2 commits April 26, 2026 15:43
- Swap formatter + import-sorter + unused-import tooling for ruff
  (line-length 120, target py3.10) in ingestion + openmetadata-airflow-apis
- Drop dead [tool.mypy] config; basedpyright is the active type checker
- Bump requires-python to >=3.10 to match noxfile and CLAUDE.md (3.9 is
  documented as broken on Mac in noxfile.py)
- Bump pre-commit-hooks v2.3 -> v5.0; the new check-json catches four
  pre-existing JSON issues now excluded with an inline TODO
- Update Makefile py_format / py_format_check targets to call ruff
- 253 noqa markers added via 'ruff check --add-noqa' across 128 files,
  freezing existing violations so this PR is a tooling-only swap. Per-rule
  cleanup tracked in the TODO comment in ingestion/pyproject.toml.
- Bulk reformat from black 22.3 -> ruff format @ line-length 120.
  Cosmetic only: imports balanced (-32/+32), structural keywords balanced
  (-2221/+2221), no logic changes.
- Star-import rules (F403/F405) globally ignored; refactoring wildcard
  imports across connectors is a separate effort.
Copilot AI review requested due to automatic review settings April 26, 2026 13:46
@IceS2 IceS2 requested a review from a team as a code owner April 26, 2026 13:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Apr 26, 2026
service_type = get_service_type(source_type)
connection_class = get_connection_class(source_type, service_type)

if source_type in HAS_INNER_CONNECTION:
logger.error(
f"Secret value [{secret_id}] not present in the configured secrets manager: {exc}"
)
logger.error(f"Secret value [{secret_id}] not present in the configured secrets manager: {exc}")
logger.info(
f"this is user: {user}, password: {password}, text: {text_response}"
)
logger.info(f"this is user: {user}, password: {password}, text: {text_response}")
logger.error(
f"Could not get the secret value of {secret_id} due to [{exc}]"
)
logger.error(f"Could not get the secret value of {secret_id} due to [{exc}]")
raise exc

def load_credentials(self) -> Optional["AzureCredentials"]:
def load_credentials(self) -> Optional["AzureCredentials"]: # noqa: F821
logger.error(
f"Could not get the secret value of {secret_id} due to [{exc}]"
)
logger.error(f"Could not get the secret value of {secret_id} due to [{exc}]")
logger.error(
f"Could not get the secret value of {secret_id} due to [{exc}]"
)
logger.error(f"Could not get the secret value of {secret_id} due to [{exc}]")
logger.error(
f"Could not get the secret value of {secret_id} due to [{exc}]"
)
logger.error(f"Could not get the secret value of {secret_id} due to [{exc}]")
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

- filters.py: drop redundant parens around re.match(...) in `if`
  (C0325 superfluous-parens) — exposed when ruff format unwrapped them
- nosql_adaptor.py: move `# pylint: disable=unused-argument` from the
  `column:` line to the `def` line so it covers `table` too (W0613) —
  scope was line-based, lost when ruff split params onto multiple lines
- action1xx.py: replace `arguments-differ` with `signature-differs` in
  the disable directive (was always wrong code) and drop the now-useless
  `unused-argument` suppression (I0021)
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Conflicts resolved by taking main's content (8 files) and re-applying
ruff format + ruff check --add-noqa to keep the modernization PR
behavior-preserving.
Copilot AI review requested due to automatic review settings April 26, 2026 14:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

Comment on lines 493 to 498
def patch_column_tags(
self,
table: ClassifiableEntityType,
column_tags: List[ColumnTag],
operation: Union[
PatchOperation.ADD, PatchOperation.REMOVE
] = PatchOperation.ADD,
operation: Union[PatchOperation.ADD, PatchOperation.REMOVE] = PatchOperation.ADD,
) -> Optional[T]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: patch_column_tags parameter still named table for Container entities

After generalizing patch_column_tags to accept ClassifiableEntityType (Table or Container), the parameter is still named table (line 495) and documented as such. Downstream callers like metadata_rest.py:823 pass table=entity where entity is a Container. This is confusing and will make the code harder to maintain. Consider renaming to entity to match the actual semantics.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…ions

CI's `make py_format_check` runs from the repo root and passes both
`ingestion/` and `./openmetadata-airflow-apis/` to ruff in a single
invocation. With multiple root paths, ruff's parallel file discovery
races on extend-exclude matching against the project root, so files
under `ingestion/src/metadata/generated/` were intermittently scanned
and produced ~830 I001 violations.

20-run repro: 10/20 fail without the fix, 20/20 pass with the fix.

Each excluded directory now appears twice in extend-exclude:
- the project-root-relative pattern (cwd = ingestion/)
- the prefixed pattern (cwd = repo root, multi-root invocation)
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…isable

- openmetadata-airflow-apis/pyproject.toml: switch coverage to module-name
  source + [tool.coverage.paths] glob remap (matches the ingestion pattern).
  Drops the hardcoded `env/lib/python3.9/site-packages/...` source path,
  which broke after the requires-python bump to 3.10. (Finding 1)
- ingestion/setup.py: remove dead python_version<'3.9' / >='3.9' guards on
  mysql-connector-python and testcontainers; promote locust to a regular
  test dep (was conditionally added under sys.version_info >= (3, 9)). Also
  remove the now-unused `import sys`. (Finding 3)
- ingestion/src/metadata/great_expectations/action1xx.py: cover both
  arguments-differ (great_expectations 0.18.x parent) and signature-differs
  (great_expectations 1.x parent) in the pylint disable comment, since
  CI installs 0.18.x and local often has 1.x. unused-argument covers the
  unused action_context. The opposite rule fires as I0021 useless-suppression
  on each environment, which is informational and does not affect pylint's
  exit code.
Copilot AI review requested due to automatic review settings April 26, 2026 15:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

Comment thread openmetadata-airflow-apis/pyproject.toml
Comment thread ingestion/setup.py
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 26, 2026

Code Review ⚠️ Changes requested 2 resolved / 4 findings

Migration to Ruff correctly cleans up redundant imports and legacy Python 3.9 guards. The PR requires updates to ensure _prepare_container_destination handles null columns and to rename the misleading table parameter in patch_column_tags.

⚠️ Edge Case: _prepare_container_destination can pass None columns to update_column_tags

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:463-475 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:90

_prepare_container_destination checks that instance.dataModel is not None (line 467), but does not check that instance.dataModel.columns is not None. If the API returns a Container whose dataModel exists but has columns=None, line 471 sets container.dataModel.columns = None, and line 474 calls update_column_tags(destination.dataModel.columns, ...). Since update_column_tags iterates for col in columns: (line 90), this will raise TypeError: 'NoneType' is not iterable.

The table path doesn't have this issue because Table.columns is a required field, but ContainerDataModel.columns can be None or absent.

Suggested fix
def _prepare_container_destination(
    self,
    container: Container,
    instance: Container,
    column_tags: List[ColumnTag],
    operation: PatchOperation,
) -> Optional[Container]:
    if container.dataModel is None or instance.dataModel is None or not instance.dataModel.columns:
        logger.warning(f"Container {container.fullyQualifiedName.root} has no dataModel or columns, skipping column tag patch")
        return None
    ...
💡 Quality: patch_column_tags parameter still named table for Container entities

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:493-498 📄 ingestion/src/metadata/ingestion/sink/metadata_rest.py:823

After generalizing patch_column_tags to accept ClassifiableEntityType (Table or Container), the parameter is still named table (line 495) and documented as such. Downstream callers like metadata_rest.py:823 pass table=entity where entity is a Container. This is confusing and will make the code harder to maintain. Consider renaming to entity to match the actual semantics.

✅ 2 resolved
Bug: Coverage source path hardcodes python3.9 after >=3.10 bump

📄 openmetadata-airflow-apis/pyproject.toml:51-52
The requires-python was bumped from >=3.9 to >=3.10 in this PR, but openmetadata-airflow-apis/pyproject.toml still hardcodes python3.9 in the coverage source path:

[tool.coverage.run]
source = ["env/lib/python3.9/site-packages/openmetadata_managed_apis"]

With no Python 3.9 environments expected anymore, coverage run will never find source files under this path, silently producing empty coverage reports for the airflow-apis package. This was already fragile before, but the requires-python bump in this PR makes it definitively broken.

Quality: Dead python_version<3.9 guards in setup.py after >=3.10 bump

📄 ingestion/setup.py:177-178 📄 ingestion/setup.py:465-466 📄 ingestion/setup.py:498
Several dependency conditions in ingestion/setup.py guard on python_version<'3.9' or python_version>='3.9', and one runtime check uses sys.version_info >= (3, 9). Since requires-python is now >=3.10, the <3.9 branches are dead code and the >=3.9 guards are always true. While these are pre-existing and harmless, they should be simplified in a follow-up to avoid confusion now that the minimum is 3.10.

🤖 Prompt for agents
Code Review: Migration to Ruff correctly cleans up redundant imports and legacy Python 3.9 guards. The PR requires updates to ensure `_prepare_container_destination` handles null columns and to rename the misleading `table` parameter in `patch_column_tags`.

1. ⚠️ Edge Case: `_prepare_container_destination` can pass None columns to `update_column_tags`
   Files: ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:463-475, ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:90

   `_prepare_container_destination` checks that `instance.dataModel` is not None (line 467), but does not check that `instance.dataModel.columns` is not None. If the API returns a Container whose `dataModel` exists but has `columns=None`, line 471 sets `container.dataModel.columns = None`, and line 474 calls `update_column_tags(destination.dataModel.columns, ...)`. Since `update_column_tags` iterates `for col in columns:` (line 90), this will raise `TypeError: 'NoneType' is not iterable`.
   
   The table path doesn't have this issue because `Table.columns` is a required field, but `ContainerDataModel.columns` can be None or absent.

   Suggested fix:
   def _prepare_container_destination(
       self,
       container: Container,
       instance: Container,
       column_tags: List[ColumnTag],
       operation: PatchOperation,
   ) -> Optional[Container]:
       if container.dataModel is None or instance.dataModel is None or not instance.dataModel.columns:
           logger.warning(f"Container {container.fullyQualifiedName.root} has no dataModel or columns, skipping column tag patch")
           return None
       ...

2. 💡 Quality: `patch_column_tags` parameter still named `table` for Container entities
   Files: ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:493-498, ingestion/src/metadata/ingestion/sink/metadata_rest.py:823

   After generalizing `patch_column_tags` to accept `ClassifiableEntityType` (Table or Container), the parameter is still named `table` (line 495) and documented as such. Downstream callers like `metadata_rest.py:823` pass `table=entity` where `entity` is a Container. This is confusing and will make the code harder to maintain. Consider renaming to `entity` to match the actual semantics.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (18 flaky)

✅ 3955 passed · ❌ 0 failed · 🟡 18 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 752 0 7 8
🟡 Shard 3 727 0 5 7
🟡 Shard 4 758 0 1 18
🟡 Shard 5 686 0 1 41
🟡 Shard 6 734 0 3 8
🟡 18 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/CuratedAssets.spec.ts › Test API Collections with display name filter (shard 2, 1 retry)
  • Features/CustomMetric.spec.ts › Column custom metric (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Assets from selected domain should be visible in explore page (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 2 retries)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should not show online status for inactive users (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Flow/SchemaTable.spec.ts › schema table test (shard 3, 1 retry)
  • Pages/ClassificationConditionalRendering.spec.ts › Should show loader then render classification content on initial page load (shard 3, 1 retry)
  • Pages/DescriptionVisibility.spec.ts › Data Product truncates long description and end of text is not visible before expand (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 5, 1 retry)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@IceS2 IceS2 merged commit 5009059 into main Apr 27, 2026
50 of 52 checks passed
@IceS2 IceS2 deleted the modernize-ingestion-pyproject branch April 27, 2026 08:05
IceS2 added a commit that referenced this pull request Apr 29, 2026
Auto-fixes from `pre-commit run --all-files` after merging main into
slice 1 (the main commits include the ruff migration #27739/#27774).

Changes:
- Drop string-quoted forward references where `from __future__ import
  annotations` is in effect (auto-fix).
- Move type-only imports under `TYPE_CHECKING` blocks (auto-fix).
- Convert `Union[A, B, ...]` to `A | B | ...` in `pipelines.py`.
- Annotate class-level mutable defaults with `ClassVar` in
  `entity_assert.py` and `table_assert.py`.
- Add `check=False` to `subprocess.run` in `cli_runner.py`.
- Mark `StructuralMismatch` and `SourceBaselineDrift` with
  `# noqa: N818` — public exception names, intentional API surface.

`pyproject.toml` per-file-ignores extended for `tests/cli_e2e_v2/**`:
- `TID252` (relative imports) — by design (connector-centric layout).
- `T201` (print) — top-level conftest's session posture banner.

Both paths listed twice (relative + ingestion-prefixed) per the
existing dual-cwd pattern in this file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants