Fix flaky tracing test in cudf-polars#22012
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 9f9cbe2 |
|
/ok to test 74e2bab |
|
/ok to test 944e962 |
This test was failing under some conditions. At a minimum, this previously hit an error ``` pytest --executor in-memory \ python/cudf_polars/tests/test_tracing.py::test_import_without_structlog \ python/cudf_polars/tests/test_scan.py::test_scan[csv-no_row_index-all_rows-all-no_mask-no_slice] ``` consistent with the error we occasionaly saw in CI. By avoiding monkeypatching, we seem to avoid the issue with our singledispatch implementation not being found.
944e962 to
4b66b6f
Compare
|
I dug into this issue a bit further and made sense of what's happening. tl;dr I recommended the subprocess solution and I stand by it now, I think it's the only really safe choice here; monkeypatching The best way to see the problem is by applying this diff to the problematic test: ❯ git diff
diff --git a/python/cudf_polars/tests/test_tracing.py b/python/cudf_polars/tests/test_tracing.py
index dff1c61e94..09237ee6b5 100644
--- a/python/cudf_polars/tests/test_tracing.py
+++ b/python/cudf_polars/tests/test_tracing.py
@@ -64,6 +64,8 @@ def test_trace_basic(
def test_import_without_structlog(monkeypatch: pytest.MonkeyPatch) -> None:
modules = list(sys.modules)
+ original_keys = set(sys.modules.keys())
+ print("The original len is ", len(modules))
for module in modules:
if module.startswith("cudf_polars"):
monkeypatch.delitem(sys.modules, module)
@@ -77,6 +79,11 @@ def test_import_without_structlog(monkeypatch: pytest.MonkeyPatch) -> None:
q = pl.DataFrame({"a": [1, 2, 3]}).lazy().select(pl.col("a").sum())
q.collect(engine="gpu")
+ print("The length before undo is ", len(sys.modules))
+ monkeypatch.undo()
+ print("The length after undo is ", len(sys.modules))
+ print("The new modules are ", set(sys.modules.keys()) - original_keys)
+When this runs, you see something like python/cudf_polars/tests/test_tracing.py .The original len is 1085
The length before undo is 1101
The length after undo is 1103
The new modules are {'cudf_polars.experimental.parallel', 'cudf_polars.experimental.utils', 'cudf_polars.experimental.scheduler', 'cudf_polars.experimental.groupby', 'cudf_polars.experimental.dispatch', 'cudf_polars.experimental.shuffle', 'cudf_polars.experimental.repartition', '_statistics', 'cudf_polars.experimental.base', 'cudf_polars.experimental', 'cudf_polars.experimental.expressions', 'cudf_polars.experimental.join', 'cudf_polars.experimental.statistics', 'statistics', 'cudf_polars.experimental.distinct', 'cudf_polars.experimental.sort', 'cudf_polars.experimental.io', 'cudf_polars.experimental.select'}Here's what's happening in this example. The After the test completes, we therefore wind up in a mixed state. We have all of the original non-experimental # Before the test
original_ir = sys.modules['cudf_polars.dsl.ir']
...
# In the middle of the structlog test after we delete modules and reimport
ir_in_structlog_test = sys.modules['cudf_polars.dsl.ir']
evaluate_streaming_in_structlog_test = sys.modules['cudf_polars.experimental.parallel'].evaluate_streaming
lower_ir_graph_in_structlog_test = sys.modules['cudf_polars.experimental.dispatch'].lower_ir_node
...
# After monkeypatch cleans up
ir_after_cleanup = sys.modules['cudf_polars.dsl.ir'] # Note: `ir_after_cleanup is original_ir`
evaluate_streaming_after_cleanup = sys.modules['cudf_polars.experimental.parallel'].evaluate_streaming
lower_ir_graph_after_cleanup = sys.modules['cudf_polars.experimental.dispatch'].lower_ir_nodeThen, when a subsequent test calls return evaluate_streaming( # This is `evaluate_streaming_in_structlog_test`, not `evaluate_streaming_after_cleanup`
ir, # BUT this is `ir_after_cleanup`, not `ir_in_structlog_test`
config_options
)Therefore, the |
| def test_import_without_structlog() -> None: | ||
| code = textwrap.dedent("""\ | ||
| import sys | ||
| sys.modules["structlog"] = None |
There was a problem hiding this comment.
Is this necessary? Will structlog get imported by default somehow if it is installed?
There was a problem hiding this comment.
We have to fake that structlog is not available so that cudf_polars.dsl.tracing._HAS_STRUCTLOG will return false, I suspect.
A more principled way to do this would be to install a module finder hook to raise import error when trying to find structlog, but that seems like too much effort.
|
/merge |
This test was failing under some conditions. At a minimum, this previously hit an error ``` pytest --executor in-memory \ python/cudf_polars/tests/test_tracing.py::test_import_without_structlog \ python/cudf_polars/tests/test_scan.py::test_scan[csv-no_row_index-all_rows-all-no_mask-no_slice] ``` consistent with the error we occasionaly saw in CI. By avoiding monkeypatching, we seem to avoid the issue with our singledispatch implementation not being found. Authors: - Tom Augspurger (https://github.com/TomAugspurger) Approvers: - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#22012
This test was failing under some conditions. At a minimum, this
previously hit an error
consistent with the error we occasionaly saw in CI.
By avoiding monkeypatching, we seem to avoid the issue with our singledispatch implementation not being found.