
fix: enable subprocess coverage tracking for CLI E2E tests#27329

Open
ulixius9 wants to merge 6 commits into main from fix/cli-e2e-subprocess-coverage

Conversation

@ulixius9
Member

Summary

  • CLI E2E tests run connectors via `subprocess.Popen("metadata ingest")`, but subprocess coverage data was silently lost, meaning SonarCloud never saw the E2E execution coverage
  • Added `parallel = true` to the coverage config so parent and child processes write to separate `.coverage.<pid>` files instead of colliding on a single `.coverage` file
  • Changed the `COVERAGE_PROCESS_START` path from relative to absolute using `GITHUB_WORKSPACE` to ensure the subprocess finds the config regardless of working directory
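The mechanism the bullets describe can be sketched in a few lines. This is an illustrative stand-in, not the PR's actual workflow code: the helper name is hypothetical, and only the `ingestion/pyproject.toml` layout and the environment variables come from the description above.

```python
import os

def coverage_env(workspace=None):
    """Build a child-process environment whose COVERAGE_PROCESS_START is an
    absolute path, so the subprocess finds the coverage config regardless of
    its working directory. `workspace` defaults to GITHUB_WORKSPACE in CI."""
    env = os.environ.copy()
    root = workspace or env.get("GITHUB_WORKSPACE") or os.getcwd()
    env["COVERAGE_PROCESS_START"] = os.path.join(root, "ingestion", "pyproject.toml")
    return env

# The E2E harness would then pass this env to the child, e.g.:
#   subprocess.Popen(["metadata", "ingest", "-c", cfg], env=coverage_env())
```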

Evidence: Metabase (zero unit tests, only E2E) shows 53.6% on SonarCloud with client.py at 17.2%. Inspection of .coverage.metabase confirms only import-time + in-process setUpClass lines are present — method bodies executed in the subprocess (API calls, response parsing) have zero coverage data.

Expected impact: All connectors in the E2E matrix should see significant SonarCloud coverage increases once subprocess execution data is captured (e.g., Metabase client.py from 17% to ~70%+).

Test plan

  • Trigger workflow manually for a single connector (e.g., metabase) with debug: false
  • Check coverage report output in CI logs — connector files should show higher coverage than before
  • Verify .coverage.* glob picks up multiple parallel data files during coverage combine
  • After SonarCloud upload, confirm coverage increase for the tested connector

🤖 Generated with Claude Code

CLI E2E tests run connectors via `subprocess.Popen("metadata ingest")`
but the subprocess coverage data was silently lost. Two issues:

1. Missing `parallel = true` in coverage config — parent pytest process
   and child subprocess both wrote to the same `.coverage` file, causing
   data collision. With parallel mode, each process writes to its own
   `.coverage.<pid>` file that `coverage combine` can merge.

2. `COVERAGE_PROCESS_START` used a relative path (`ingestion/pyproject.toml`)
   in sitecustomize.py. Resolved to absolute using `GITHUB_WORKSPACE`.
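A minimal sketch of such a sitecustomize.py hook, under the layout described above (hypothetical: the real file is injected by the workflow YAML; `coverage.process_startup()` is Coverage.py's documented entry point for measuring subprocesses):

```python
# Hypothetical sitecustomize.py: Python imports this module automatically at
# interpreter startup, so it also runs inside every spawned subprocess.
import os

workspace = os.environ.get("GITHUB_WORKSPACE", os.getcwd())
config_path = os.path.join(workspace, "ingestion", "pyproject.toml")

# Only enable measurement when the config actually exists, so unrelated
# Python processes on the runner are unaffected.
if os.path.exists(config_path):
    os.environ.setdefault("COVERAGE_PROCESS_START", config_path)

try:
    import coverage
    coverage.process_startup()  # no-op when COVERAGE_PROCESS_START is unset
except ImportError:
    pass  # coverage not installed; run without measurement
```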

Evidence: Metabase (zero unit tests, only E2E) shows 53.6% on SonarCloud
with client.py at 17.2% — inspection of .coverage.metabase confirms only
import-time + in-process setup lines are present, with zero method body
coverage from the subprocess execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 13, 2026 17:51
@ulixius9 ulixius9 requested review from a team, akash-jain-10, harshach and tutte as code owners April 13, 2026 17:51
@github-actions github-actions bot added the "Ingestion" and "safe to test" labels Apr 13, 2026
harshach previously approved these changes Apr 13, 2026
Contributor

Copilot AI left a comment


Pull request overview

Enables Python coverage collection for CLI E2E tests that execute connector code in subprocesses, so SonarCloud reflects coverage from those subprocess runs.

Changes:

  • Enabled Coverage.py parallel mode so multiple processes write separate .coverage.* data files instead of colliding.
  • Made COVERAGE_PROCESS_START point to an absolute pyproject.toml path (via GITHUB_WORKSPACE) so subprocesses can reliably find the coverage config.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • ingestion/pyproject.toml: Turns on Coverage.py parallel mode to prevent multi-process data-file collisions.
  • .github/workflows/py-cli-e2e-tests.yml: Updates the sitecustomize.py injection so subprocess coverage startup uses an absolute config path.

source = ["metadata"]
relative_files = true
branch = true
parallel = true

Copilot AI Apr 13, 2026


Setting parallel = true in the global [tool.coverage.run] config changes Coverage.py’s default output from a single .coverage file to per-process .coverage.* files. This workflow (and other CI/local targets) currently assumes .coverage exists (e.g., mv ingestion/.coverage ... in the python-unittests/python-integration matrix entries, and local make run_python_tests doesn’t run coverage combine). Consider either (a) scoping parallel mode to the CLI E2E job via a dedicated coverage config used only for COVERAGE_PROCESS_START, or (b) updating the unit/integration flows to coverage combine (or move/rename the .coverage.* set) before referencing ingestion/.coverage.

Suggested change
parallel = true

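Option (b) from the review comment can be sketched with a stdlib-only helper that collects the data files before any step expects a single `ingestion/.coverage`. This is illustrative, not the PR's CI code; the function name and directory arguments are placeholders:

```python
import glob
import os
import shutil

def gather_coverage_data(src_dir, dest_dir):
    """Move every .coverage* data file (the single file, or the parallel
    .coverage.<suffix> set) into dest_dir, returning the new paths so a
    later `coverage combine` step can be pointed at them."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for path in glob.glob(os.path.join(src_dir, ".coverage*")):
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return sorted(moved)
```

After gathering, a single `coverage combine <dest_dir>` merges the parallel files into one report input.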
* MINOR: Fix snowflake e2e

* fix pyformat
Copilot AI review requested due to automatic review settings April 14, 2026 07:19
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

@github-actions
Contributor

The Python checkstyle failed.

Please run `make py_format` and `py_format_check` in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with `make install_test precommit_install`.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines 34 to 35
class SnowflakeCliTest(CliCommonDB.TestSuite, SQACommonMethods):
"""

Copilot AI Apr 14, 2026


The PR description/title are focused on subprocess coverage collection, but this file also introduces substantial Snowflake E2E behavior changes (unskipping the suite, relaxing assertions, adding multiple new xfail’ed tests/features). To keep the coverage fix reviewable and reduce CI risk, consider splitting the Snowflake test refactor/feature additions into a separate PR (or explicitly call out why they’re required for the coverage change).

Comment on lines +418 to +438
def build_config_file_for_usage(self) -> None:
"""Build config file for usage ingestion"""
import yaml

self.build_config_file(E2EType.INGEST)

with open(self.test_file_path, "r", encoding="utf-8") as file:
config = yaml.safe_load(file)

config["source"]["type"] = "snowflake-usage"
config["source"]["sourceConfig"] = {
"config": {
"type": "DatabaseUsage",
"queryLogDuration": 1,
"resultLimit": 10000,
}
}

with open(self.test_file_path, "w", encoding="utf-8") as file:
yaml.dump(config, file, default_flow_style=False)


Copilot AI Apr 14, 2026


build_config_file_for_usage is added but not referenced anywhere in this test module (or the CLI E2E suite), which increases maintenance surface without exercising the behavior. Either remove it until it’s needed, or add a test that uses it so the configuration path is actually validated in CI.

Suggested change
def build_config_file_for_usage(self) -> None:
"""Build config file for usage ingestion"""
import yaml
self.build_config_file(E2EType.INGEST)
with open(self.test_file_path, "r", encoding="utf-8") as file:
config = yaml.safe_load(file)
config["source"]["type"] = "snowflake-usage"
config["source"]["sourceConfig"] = {
"config": {
"type": "DatabaseUsage",
"queryLogDuration": 1,
"resultLimit": 10000,
}
}
with open(self.test_file_path, "w", encoding="utf-8") as file:
yaml.dump(config, file, default_flow_style=False)

"""Test stored procedures, tags, dynamic tables, streams, constraints,
and clustering in a single ingestion workflow."""
# -- 1. Create all Snowflake objects --
# Stored procedure (requires raw connection for USE DATABASE)

Copilot AI Apr 14, 2026


The comment says the stored procedure creation “requires raw connection for USE DATABASE”, but the code doesn’t execute USE DATABASE (and uses fully-qualified names). Please update the comment to match the actual reason for using raw_connection() (or switch to the SQLAlchemy connection if raw access isn’t required).

Suggested change
# Stored procedure (requires raw connection for USE DATABASE)
# Stored procedure
# Use a raw DBAPI connection for the procedure creation statement.

Copilot uses AI. Check for mistakes.
@sonarqubecloud

Comment on lines +707 to +714
countries_table = self.retrieve_table("e2e_snowflake.E2E_DB.E2E_TEST.COUNTRIES")
self.assertIsNotNone(countries_table)
regions_table = self.retrieve_table("e2e_snowflake.E2E_DB.E2E_TEST.REGIONS")
self.assertIsNotNone(regions_table)
pk_columns = [
col for col in regions_table.columns if col.name.root == "REGION_ID"
]
self.assertGreater(len(pk_columns), 0, "REGION_ID column should exist")


💡 Quality: FK constraint assertion only checks column existence, not the FK

In test_snowflake_features_ingestion, the foreign key constraint validation (lines 707-714) retrieves the countries and regions tables but only asserts that the REGION_ID column exists on the regions table. It never actually checks that the FK relationship was ingested — e.g., verifying countries_table has a constraint referencing regions_table, or checking col.constraint on the REGION_ID column in countries. This means the FK creation SQL runs but the test would pass even if FK ingestion is completely broken.

Suggested fix:

# After retrieving countries_table, verify FK is present:
fk_columns = [
    col for col in countries_table.columns
    if col.name.root == "REGION_ID" and col.constraint
]
self.assertGreater(
    len(fk_columns), 0,
    "REGION_ID in countries should have an FK constraint",
)


@gitar-bot

gitar-bot bot commented Apr 14, 2026

Code Review 👍 Approved with suggestions (0 resolved / 1 finding)

Enables subprocess coverage tracking for CLI E2E tests, ensuring more comprehensive execution metrics. Update the FK constraint assertion in test_snowflake_features_ingestion to verify the foreign key relationship rather than just column existence.

💡 Quality: FK constraint assertion only checks column existence, not the FK

📄 ingestion/tests/cli_e2e/test_cli_snowflake.py:707-714

In test_snowflake_features_ingestion, the foreign key constraint validation (lines 707-714) retrieves the countries and regions tables but only asserts that the REGION_ID column exists on the regions table. It never actually checks that the FK relationship was ingested — e.g., verifying countries_table has a constraint referencing regions_table, or checking col.constraint on the REGION_ID column in countries. This means the FK creation SQL runs but the test would pass even if FK ingestion is completely broken.

Suggested fix
# After retrieving countries_table, verify FK is present:
fk_columns = [
    col for col in countries_table.columns
    if col.name.root == "REGION_ID" and col.constraint
]
self.assertGreater(
    len(fk_columns), 0,
    "REGION_ID in countries should have an FK constraint",
)


