[SYCL] Detect ze call leaking in E2E tests #19710

Open
wants to merge 2 commits into
base: sycl
from

Conversation

jiezzhang
Contributor

After #19328, UR_L0_LEAKS_DEBUG stopped throwing exceptions when leaks are detected, so LIT can't report failures. Add a leak check in format.py so that "--param ur_l0_leaks_debug=1" keeps working as before.

@jiezzhang jiezzhang requested a review from a team as a code owner August 5, 2025 05:44
@jiezzhang jiezzhang requested a review from cperkinsintel August 5, 2025 05:44
def check_leak(output):
    keyword_found = False
    for line in output.splitlines():
        if keyword_found and "LEAK" in line:
Contributor

@cperkinsintel cperkinsintel Aug 5, 2025

Just FYI, we already have a lit var %{l0_leak_check} which a bunch of the tests still use (instead of UR_L0_LEAKS_DEBUG). It gets replaced by the env var in the final invocation, but I'm not sure whether this Python script will detect it.

I think that l0_leak_check directive is no longer needed and probably could be replaced by the actual env var. It's in hundreds of tests right now.

Member

I think the tests that set the env var (either directly or by using %{l0_leak_check}) already check for absence of "LEAK" (--implicit-check-not=LEAK) so we don't have to check it again here. As far as I understand this is only needed when the user decides to run all the tests with leak checking (e.g. passing --param ur_l0_leaks_debug=1 to lit).
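For illustration, here is a minimal sketch of the kind of output scan being discussed. It is not the code from this PR: the diff shown above is truncated, so the `summary_marker` parameter and its default value are hypothetical stand-ins for whatever keyword the real script waits for before it starts looking for "LEAK".

```python
def check_leak(output, summary_marker="== hypothetical summary =="):
    """Return True if a line containing "LEAK" appears after the marker.

    The marker text is an assumption made for this sketch; the real
    keyword used in format.py is not visible in the truncated diff.
    """
    marker_seen = False
    for line in output.splitlines():
        # Only lines after the summary marker count, so a test that merely
        # prints the word "LEAK" earlier in its own output is not flagged.
        if marker_seen and "LEAK" in line:
            return True
        if summary_marker in line:
            marker_seen = True
    return False
```

A test format could call such a helper on the captured output only when the user passes --param ur_l0_leaks_debug=1, leaving tests that already use --implicit-check-not=LEAK unaffected.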

@jiezzhang
Contributor Author

Pretty sure the failed test is not caused by this PR. @cperkinsintel is it a known flaky issue?
SYCL :: Reduction/reduction_internal_nd_range_1dim.cpp

@igchor
Member

igchor commented Aug 14, 2025

Pretty sure the failed test is not caused by this PR. @cperkinsintel is it a known flaky issue? SYCL :: Reduction/reduction_internal_nd_range_1dim.cpp

I don't know if it's a known issue, but it seems unrelated to the PR.

@igchor
Member

igchor commented Aug 14, 2025

@intel/llvm-reviewers-runtime could you please take a look at the PR?

@aelovikov-intel
Contributor

After #19328, UR_L0_LEAKS_DEBUG stop throwing exceptions

Why does "unification" cause that?

@igchor
Member

igchor commented Aug 14, 2025

After #19328, UR_L0_LEAKS_DEBUG stop throwing exceptions

Why does "unification" cause that?

The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.

@aelovikov-intel
Contributor

The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.

Because it's C and not C++? Can it abort?

@igchor
Member

igchor commented Aug 14, 2025

The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.

Because it's C and not C++? Can it abort?

Mostly because it's done in the library destructor which means we don't have an entry point from which we could return an error (we had urAdapterTeardown when leak checking was done in UR).

We could abort() but leaks are not really a critical failure so I think just parsing the output in tests is a better option.

@aelovikov-intel
Contributor

The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.

Because it's C and not C++? Can it abort?

Mostly because it's done in the library destructor which means we don't have an entry point from which we could return an error (we had urAdapterTeardown when leak checking was done in UR).

We could abort() but leaks are not really a critical failure so I think just parsing the output in tests is a better option.

Can we have one more env variable control to request that abort? I think that would still be much better than parsing output (that might be redirected and not available for parsing).
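To make the suggestion concrete, the following is a rough Python model of an opt-in abort at teardown. It is only a sketch of the idea, not loader code: the real loader is native code, and the variable name `HYPOTHETICAL_LEAKS_ABORT`, the `report_leaks` function, and the printed line format are all invented for this example.

```python
import os
import sys

# Made-up name for illustration; not an existing UR or Level Zero variable.
ABORT_ENV_VAR = "HYPOTHETICAL_LEAKS_ABORT"


def report_leaks(balances, environ=None, abort=os.abort, stream=sys.stderr):
    """Print unbalanced create/destroy counters and optionally abort.

    balances maps a call name to (creates - destroys). The abort callable
    is injectable so the behaviour can be exercised without killing the
    process.
    """
    environ = os.environ if environ is None else environ
    leaks = {name: count for name, count in balances.items() if count != 0}
    for name, count in sorted(leaks.items()):
        print(f"{name}: LEAK = {count}", file=stream)
    # Leaks stay non-fatal by default; aborting is strictly opt-in.
    if leaks and environ.get(ABORT_ENV_VAR) == "1":
        abort()
    return leaks
```

With this shape, a test runner would see an abnormal exit status even when the program's output has been redirected, while default runs keep treating leaks as diagnostics only.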

@igchor
Member

igchor commented Aug 14, 2025

The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.

Because it's C and not C++? Can it abort?

Mostly because it's done in the library destructor which means we don't have an entry point from which we could return an error (we had urAdapterTeardown when leak checking was done in UR).
We could abort() but leaks are not really a critical failure so I think just parsing the output in tests is a better option.

Can we have one more env variable control to request that abort? I think that would still be much better than parsing output (that might be redirected and not available for parsing).

@nrspruit What do you think about calling abort? Do you think there is any other way to report the leaks?
