fix: enable subprocess coverage tracking for CLI E2E tests #27329
CLI E2E tests run connectors via `subprocess.Popen("metadata ingest")`
but the subprocess coverage data was silently lost. Two issues:
1. Missing `parallel = true` in coverage config — parent pytest process
and child subprocess both wrote to the same `.coverage` file, causing
data collision. With parallel mode, each process writes to its own
`.coverage.<pid>` file that `coverage combine` can merge.
2. `COVERAGE_PROCESS_START` used a relative path (`ingestion/pyproject.toml`)
in sitecustomize.py. Resolved to absolute using `GITHUB_WORKSPACE`.
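The injected hook can be sketched as follows. This is a minimal sketch, not the workflow's exact script: the path construction below is an assumption (the real workflow injects the absolute path built from `GITHUB_WORKSPACE` into `sitecustomize.py` at job setup), and the broad `except` keeps the hook a no-op where coverage or its config is unavailable.

```python
# sitecustomize.py -- sketch of the injected subprocess-coverage hook.
# Path construction is an assumption; the workflow injects the real
# absolute path derived from GITHUB_WORKSPACE.
import os

os.environ["COVERAGE_PROCESS_START"] = os.path.join(
    os.environ.get("GITHUB_WORKSPACE", os.getcwd()),
    "ingestion",
    "pyproject.toml",
)

try:
    import coverage

    coverage.process_startup()  # starts measurement in every Python subprocess
except Exception:
    # coverage not installed, or config file not found outside CI: do nothing
    pass
```

Because Python imports `sitecustomize` automatically at interpreter startup, every `metadata ingest` subprocess runs this hook and begins recording to its own `.coverage.<pid>` file.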
Evidence: Metabase (zero unit tests, only E2E) shows 53.6% on SonarCloud
with client.py at 17.2% — inspection of .coverage.metabase confirms only
import-time + in-process setup lines are present, with zero method body
coverage from the subprocess execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Enables Python coverage collection for CLI E2E tests that execute connector code in subprocesses, so SonarCloud reflects coverage from those subprocess runs.
Changes:
- Enabled Coverage.py parallel mode so multiple processes write separate `.coverage.*` data files instead of colliding.
- Made `COVERAGE_PROCESS_START` point to an absolute `pyproject.toml` path (via `GITHUB_WORKSPACE`) so subprocesses can reliably find the coverage config.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `ingestion/pyproject.toml` | Turns on Coverage.py parallel mode to prevent multi-process data-file collisions. |
| `.github/workflows/py-cli-e2e-tests.yml` | Updates the `sitecustomize.py` injection so subprocess coverage startup uses an absolute config path. |
    source = ["metadata"]
    relative_files = true
    branch = true
    parallel = true
Setting `parallel = true` in the global `[tool.coverage.run]` config changes Coverage.py's default output from a single `.coverage` file to per-process `.coverage.*` files. This workflow (and other CI/local targets) currently assumes `.coverage` exists (e.g., `mv ingestion/.coverage ...` in the python-unittests/python-integration matrix entries, and local `make run_python_tests` doesn't run `coverage combine`). Consider either (a) scoping parallel mode to the CLI E2E job via a dedicated coverage config used only for `COVERAGE_PROCESS_START`, or (b) updating the unit/integration flows to run `coverage combine` (or move/rename the `.coverage.*` set) before referencing `ingestion/.coverage`.
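If parallel mode stays global, option (b) amounts to inserting a merge step before anything references `ingestion/.coverage`. A sketch of such a CI step follows; the step name and placement are assumptions, not the repo's actual workflow:

```yaml
# Hypothetical workflow step: merge per-process data files first,
# so later steps that expect a single ingestion/.coverage keep working.
- name: Combine coverage data
  run: |
    cd ingestion
    coverage combine   # merges .coverage.* into a single .coverage
    coverage report
```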
* MINOR: Fix snowflake e2e
* fix pyformat

The Python checkstyle failed. Please run … You can install the pre-commit hooks with …
    class SnowflakeCliTest(CliCommonDB.TestSuite, SQACommonMethods):
        """
The PR description/title are focused on subprocess coverage collection, but this file also introduces substantial Snowflake E2E behavior changes (unskipping the suite, relaxing assertions, adding multiple new xfail’ed tests/features). To keep the coverage fix reviewable and reduce CI risk, consider splitting the Snowflake test refactor/feature additions into a separate PR (or explicitly call out why they’re required for the coverage change).
    def build_config_file_for_usage(self) -> None:
        """Build config file for usage ingestion"""
        import yaml

        self.build_config_file(E2EType.INGEST)

        with open(self.test_file_path, "r", encoding="utf-8") as file:
            config = yaml.safe_load(file)

        config["source"]["type"] = "snowflake-usage"
        config["source"]["sourceConfig"] = {
            "config": {
                "type": "DatabaseUsage",
                "queryLogDuration": 1,
                "resultLimit": 10000,
            }
        }

        with open(self.test_file_path, "w", encoding="utf-8") as file:
            yaml.dump(config, file, default_flow_style=False)
build_config_file_for_usage is added but not referenced anywhere in this test module (or the CLI E2E suite), which increases maintenance surface without exercising the behavior. Either remove it until it’s needed, or add a test that uses it so the configuration path is actually validated in CI.
    """Test stored procedures, tags, dynamic tables, streams, constraints,
    and clustering in a single ingestion workflow."""
    # -- 1. Create all Snowflake objects --
    # Stored procedure (requires raw connection for USE DATABASE)
The comment says the stored procedure creation “requires raw connection for USE DATABASE”, but the code doesn’t execute USE DATABASE (and uses fully-qualified names). Please update the comment to match the actual reason for using raw_connection() (or switch to the SQLAlchemy connection if raw access isn’t required).
Suggested change:

    - # Stored procedure (requires raw connection for USE DATABASE)
    + # Stored procedure
    + # Use a raw DBAPI connection for the procedure creation statement.
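For context, the raw-DBAPI pattern under discussion looks like this. A self-contained sketch only: SQLite stands in for the Snowflake engine, and the DDL statement is hypothetical; the point is that `engine.raw_connection()` bypasses the SQLAlchemy `Connection` API and hands you the driver's own cursor.

```python
# Sketch: run DDL through the raw DBAPI connection rather than the
# SQLAlchemy Connection API. SQLite is a stand-in for Snowflake here,
# and the CREATE TABLE is a stand-in for the stored-procedure DDL.
from sqlalchemy import create_engine

engine = create_engine("sqlite://")

raw = engine.raw_connection()
try:
    cursor = raw.cursor()
    cursor.execute("CREATE TABLE demo (id INTEGER)")  # hypothetical DDL
    raw.commit()
finally:
    raw.close()  # returns the connection to the pool
```

Raw access is useful when a statement (such as a procedure body) should reach the driver verbatim, without SQLAlchemy's statement processing.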
    countries_table = self.retrieve_table("e2e_snowflake.E2E_DB.E2E_TEST.COUNTRIES")
    self.assertIsNotNone(countries_table)
    regions_table = self.retrieve_table("e2e_snowflake.E2E_DB.E2E_TEST.REGIONS")
    self.assertIsNotNone(regions_table)
    pk_columns = [
        col for col in regions_table.columns if col.name.root == "REGION_ID"
    ]
    self.assertGreater(len(pk_columns), 0, "REGION_ID column should exist")
💡 Quality: FK constraint assertion only checks column existence, not the FK
In test_snowflake_features_ingestion, the foreign key constraint validation (lines 707-714) retrieves the countries and regions tables but only asserts that the REGION_ID column exists on the regions table. It never actually checks that the FK relationship was ingested — e.g., verifying countries_table has a constraint referencing regions_table, or checking col.constraint on the REGION_ID column in countries. This means the FK creation SQL runs but the test would pass even if FK ingestion is completely broken.
Suggested fix:
    # After retrieving countries_table, verify FK is present:
    fk_columns = [
        col for col in countries_table.columns
        if col.name.root == "REGION_ID" and col.constraint
    ]
    self.assertGreater(
        len(fk_columns), 0,
        "REGION_ID in countries should have an FK constraint",
    )
Code Review: 👍 Approved with suggestions (0 resolved / 1 finding). Enables subprocess coverage tracking for CLI E2E tests, ensuring more comprehensive execution metrics. Suggested: update the FK constraint assertion in `ingestion/tests/cli_e2e/test_cli_snowflake.py:707-714` so it verifies the FK relationship itself, not only column existence.



Summary

- CLI E2E tests run connectors via `subprocess.Popen("metadata ingest")`, but subprocess coverage data was silently lost, meaning SonarCloud never saw the E2E execution coverage
- Added `parallel = true` to the coverage config so parent and child processes write to separate `.coverage.<pid>` files instead of colliding on a single `.coverage` file
- Changed the `COVERAGE_PROCESS_START` path from relative to absolute using `GITHUB_WORKSPACE` to ensure the subprocess finds the config regardless of working directory

Evidence: Metabase (zero unit tests, only E2E) shows 53.6% on SonarCloud with `client.py` at 17.2%. Inspection of `.coverage.metabase` confirms only import-time + in-process `setUpClass` lines are present; method bodies executed in the subprocess (API calls, response parsing) have zero coverage data.

Expected impact: All connectors in the E2E matrix should see significant SonarCloud coverage increases once subprocess execution data is captured (e.g., Metabase `client.py` from 17% to ~70%+).

Test plan

- Run with `debug: false`
- Check the `coverage report` output in CI logs; connector files should show higher coverage than before
- Verify the `.coverage.*` glob picks up multiple parallel data files during `coverage combine`

🤖 Generated with Claude Code