TST, BLD: Fix failing aarch64 wheel builds. #22418

Merged
charris merged 1 commit into numpy:main from fix-memory-check on Oct 11, 2022

@charris
Member

charris commented Oct 10, 2022

The aarch64 wheel build tests are failing with OOM. The new test for complex128 dot for huge vectors is responsible as the usable memory is incorrectly determined and the check for sufficient memory fails. The fix here is to set NPY_AVAILABLE_MEM="4 GB" in the environment before calling the test in cibw_test_command.sh.
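
As a rough illustration of why the override helps, the sketch below parses a size string such as "4 GB" and compares it with the 18e9 bytes requested by the test. `parse_size` and `should_skip` are hypothetical helpers written for this example, not NumPy's actual implementation, and the unit table (decimal GB, binary GiB) is an assumption.

```python
import os
import re

# Hypothetical unit table for this sketch:
# decimal for kB/MB/GB/TB, binary for KiB/MiB/GiB/TiB.
_UNITS = {
    "b": 1,
    "kb": 1000, "mb": 1000**2, "gb": 1000**3, "tb": 1000**4,
    "kib": 1024, "mib": 1024**2, "gib": 1024**3, "tib": 1024**4,
}


def parse_size(text):
    """Parse strings such as '4 GB' or '512MiB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"cannot parse size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def should_skip(required_bytes, env=None):
    """Return True when the override reports less memory than required."""
    env = os.environ if env is None else env
    override = env.get("NPY_AVAILABLE_MEM")
    if override is None:
        return False  # no override: some other detection would be needed
    return parse_size(override) < required_bytes


# With the override from this PR, the 18e9-byte requirement is not met,
# so a memory-gated test would be skipped rather than attempted.
print(should_skip(18e9, env={"NPY_AVAILABLE_MEM": "4 GB"}))
```

The point of the override is that the skip decision no longer depends on what the operating system reports as available.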

@charris
Member Author

charris commented Oct 10, 2022

Hmm, the aarch64 wheel builds are reported as passing, but they are actually failing.

@mattip
Member

mattip commented Oct 10, 2022

Which test? Does it specify a skipif(memory requirement)?

@charris
Member Author

charris commented Oct 10, 2022

> Which test?

Trying to figure that out. The one I currently suspect does use memory requirement, but I am not sure of it yet. For some reason sys.exit(not numpy.test('full')) reports success following OOM, but that is a separate problem.
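
For reference, here is a small sketch of how the `sys.exit(not result)` idiom maps a test result to a process exit status. It only shows the convention; it does not explain why an OOM ends up reported as success. `exit_code_for` is a hypothetical helper for this example.

```python
import subprocess
import sys


def exit_code_for(result):
    """Run a child interpreter that exits with sys.exit(not result)."""
    code = f"import sys; sys.exit(not {result!r})"
    return subprocess.run([sys.executable, "-c", code]).returncode


# A truthy result (tests passed) becomes sys.exit(False), i.e. status 0.
print(exit_code_for(True))
# A falsy result (tests failed) becomes sys.exit(True), i.e. status 1.
print(exit_code_for(False))
```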

@charris
Member Author

charris commented Oct 10, 2022

Looks like test_huge_vectordot. If I unconditionally skip it, the tests run to completion.

    @pytest.mark.slow
    @pytest.mark.parametrize("dtype", [np.float64, np.complex128])
    @requires_memory(free_bytes=18e9)  # complex case needs 18GiB+
    def test_huge_vectordot(self, dtype):
        # Large vector multiplications are chunked with 32bit BLAS
        # Test that the chunking does the right thing, see also gh-22262
        pytest.skip('testing')  # temporary unconditional skip while debugging
        data = np.ones(2**30+100, dtype=dtype)
        res = np.dot(data, data)
        assert res == 2**30+100

May be a problem with memory detection on aarch64.
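
A quick back-of-the-envelope sketch of how large the test vector is for each dtype. It counts only the single input array created by `np.ones`; any temporaries are not accounted for. The helper name `vector_bytes` is made up for this example.

```python
N = 2**30 + 100  # number of elements in the test vector

# Bytes per element: float64 is 8 bytes, complex128 is 16 bytes.
ITEMSIZE = {"float64": 8, "complex128": 16}


def vector_bytes(dtype_name, n=N):
    """Return the size in bytes of one vector with n elements."""
    return n * ITEMSIZE[dtype_name]


for name in ITEMSIZE:
    size = vector_bytes(name)
    print(f"{name}: {size} bytes = {size / 2**30:.2f} GiB")
```

So the complex128 vector alone is a little over 16 GiB, which is far more than a memory-limited CI job is likely to be allowed.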

@seberg
Member

seberg commented Oct 10, 2022

Hmmm, agree that it looks like I estimated it for 32bit floats and then also added 64bit or so. But strange that it requires the unconditional skip. OTOH, not sure it is worth digging deep; it also seems likely to me that the available memory detection is simply not reliable on aarch64.

@charris
Member Author

charris commented Oct 10, 2022

@seberg Your estimate was low, but even with that fixed it still goes OOM. For the 1.23.4 release I will probably skip it unconditionally, but I want to fool with it a bit on main to see what is going wrong.

charris added the `09 - Backport-Candidate` label and removed the `25 - WIP` label Oct 10, 2022
@charris
Member Author

charris commented Oct 11, 2022

OK, here is the problem:

mem is 129400066048, platform is aarch64

That's 120 GB (strictly, about 120.5 GiB). I wonder if we should try getting the allowed memory.
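
One possible way to look for the "allowed" memory, sketched under the assumption that the CI job runs inside a Linux container whose limit is enforced through cgroups (nothing in this thread confirms that). The file locations are the usual cgroup v1/v2 ones; `cgroup_memory_limit` is a hypothetical helper, not part of NumPy.

```python
from pathlib import Path

# Typical cgroup locations on Linux; which one exists depends on the host.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def cgroup_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the first numeric cgroup memory limit in bytes, else None."""
    for path in paths:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next candidate
        if raw.isdigit():  # cgroup v2 writes "max" when there is no limit
            return int(raw)
    return None


reported = 129400066048  # the value printed in the log line above
print(f"reported: {reported / 2**30:.1f} GiB")
print("cgroup limit:", cgroup_memory_limit())
```

Note that cgroup v1 reports a very large number rather than "max" when no limit is set, so a real check would also need to cap the result at the physical memory.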

@mattip
Member

mattip commented Oct 11, 2022

I could not find the machine information in the Travis documentation in the ten minutes I tried. Would it be easy to add `free -h` to the bash output somewhere?

@charris
Member Author

charris commented Oct 11, 2022

@mattip I think the machine the test is running on does have 120 GB of memory, but the memory allowed to the process is much less. I also wonder why we don't get a `MemoryError` instead of OOM. There is a `resource` module in the standard library that might help, but on my machine it simply returns `(-1, -1)` for data memory, which apparently means infinite :) Another option would be to set the `NPY_AVAILABLE_MEM` environment variable in `cibw_test_command.sh` before the test is run. That might be a better way to do this.
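
A minimal sketch of the `resource` query mentioned above, assuming a Unix-like system (the module is not available on Windows). On Linux, `-1` is typically the value of `resource.RLIM_INFINITY`, which is why `(-1, -1)` reads as "no limit". `data_limit_bytes` is a hypothetical helper name.

```python
import resource


def data_limit_bytes():
    """Return the soft RLIMIT_DATA limit in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_DATA)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft


# Raw (soft, hard) pair as returned by the standard library.
print(resource.getrlimit(resource.RLIMIT_DATA))
# None here means the process has no data-segment limit of its own.
print(data_limit_bytes())
```

Since an unlimited rlimit says nothing about limits imposed from outside the process, this check alone would not have caught the problem here.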

@charris
Member Author

charris commented Oct 11, 2022

Trying the environment variable approach.

The aarch64 wheel build tests are failing with OOM. The new test for
complex128 dot for huge vectors is responsible as the useable memory
is incorrectly determined and the check for sufficient memory fails.
The fix here is to define the `NPY_AVAILABLE_MEM="4 GB"` environment
variable before the test call in `cibw_test_command.sh`.
@charris
Member Author

charris commented Oct 11, 2022

Self merging so I can backport and do the 1.23.4 release.

charris merged commit 241c905 into numpy:main Oct 11, 2022
charris removed the `09 - Backport-Candidate` label Oct 11, 2022
charris deleted the fix-memory-check branch October 11, 2022 17:15