
Memory allocation failed error when running 'cargo test' #343

Closed
joshwilding4444 opened this issue Aug 31, 2020 · 14 comments
@joshwilding4444

joshwilding4444 commented Aug 31, 2020

On the current master branch as of Aug 31, 2020, when I run the tests using:
$ cargo test

I receive the following error:

... other successful tests ...
test stats::pairhmm::homopolypairhmm::tests::test_interleave_gaps_y ... ok
test stats::pairhmm::homopolypairhmm::tests::test_gap_x ... ok
memory allocation of 2147483640 bytes failederror: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: /path/to/rust/bio/rust-bio/target/debug/deps/bio-cf8b33efbb5080cc (signal: 6, SIGABRT: process abort signal)

The failure to allocate memory happens at either test_interleave_gaps_x or test_interleave_gaps_y within stats::pairhmm::homopolypairhmm::tests. The same error occurs whether I run all tests or just the tests in stats.

Has anyone else experienced this problem when trying to run tests for the latest build?

@vsoch

vsoch commented Aug 31, 2020

I made it much further, but the run failed at a different point (Ubuntu 16.04):

...
test stats::pairhmm::homopolypairhmm::tests::test_hompolymer_run_in_x ... ok
test stats::pairhmm::homopolypairhmm::tests::test_hompolymer_run_in_y ... ok
test stats::pairhmm::homopolypairhmm::tests::impossible_global_alignment ... ok
test stats::pairhmm::homopolypairhmm::tests::test_gap_x_2 ... ok
test stats::pairhmm::homopolypairhmm::tests::test_interleave_gaps_x ... ok
test stats::pairhmm::homopolypairhmm::tests::test_interleave_gaps_y ... ok
test stats::pairhmm::homopolypairhmm::tests::test_phmm_vs_phhmm ... ok
test stats::pairhmm::homopolypairhmm::tests::test_gap_y ... ok
test stats::pairhmm::pairhmm::tests::impossible_global_alignment ... ok
test stats::pairhmm::pairhmm::tests::test_gap_x ... ok
test stats::pairhmm::homopolypairhmm::tests::test_gap_x ... ok
test stats::pairhmm::pairhmm::tests::test_gap_y ... ok
test stats::pairhmm::pairhmm::tests::test_interleave_gaps_y ... ok
test stats::pairhmm::homopolypairhmm::tests::test_mismatch ... ok
test stats::probs::cdf::test::test_cdf ... ok
test stats::pairhmm::pairhmm::tests::test_interleave_gaps_x ... ok
test stats::pairhmm::pairhmm::tests::test_same ... ok
test stats::pairhmm::pairhmm::tests::test_mismatch ... ok
test stats::probs::tests::test_cap_numerical_overshoot ... ok
test stats::probs::tests::test_cumsum ... ok
test stats::probs::tests::test_cap_numerical_overshoot_panic ... ok
test stats::probs::tests::test_empty_sum ... ok
test stats::probs::tests::test_simpsons_integrate ... ok
test stats::probs::tests::test_sub ... ok
test stats::probs::tests::test_sum_one_zero ... ok
test stats::probs::tests::test_trapezoidal_integrate ... ok
test stats::probs::tests::test_zero ... ok
test stats::probs::tests::test_sum ... ok
test utils::interval::tests::negative_width_range ... ok
test utils::interval::tests::range_interval_conversions ... ok
test utils::tests::test_prescan ... ok
test utils::tests::test_scan ... ok
test utils::text::tests::test_print_sequence ... ok
test utils::fastexp::tests::test_fastexp ... ok
test stats::probs::tests::test_one_minus ... ok
test stats::pairhmm::homopolypairhmm::tests::test_same ... ok
test stats::pairhmm::pairhmm::tests::test_banded ... ok
test stats::pairhmm::homopolypairhmm::tests::test_banded ... ok
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/vanessa/Desktop/Code/rust-bio/target/debug/deps/bio-364865763329b4a8` (signal: 9, SIGKILL: kill)

@joshwilding4444
Author

joshwilding4444 commented Aug 31, 2020

Interesting, @vsoch . My system is running Linux Mint 20 and has 8GB RAM. Running top shows that there's some memory available. What are the specifications for your machine?

@vsoch

vsoch commented Aug 31, 2020

[screenshot of system specs]

But I have a gazillion things running and open, so probably it isn't all available!

@joshwilding4444
Author

joshwilding4444 commented Sep 1, 2020

Ok, both of our machines should have enough memory to run a few tests. I think that there is an issue with memory allocation somewhere, but I'm not sure exactly where. If you have time, @vsoch you could try running the tests without other major programs running, just to see if that makes a difference. I think that there will still be an allocation issue somewhere along the line.

I understand that memory management in Rust primarily depends on the current scope, so I don't know exactly why the tests are trying to allocate so much memory at once. I will take a look at the test suite later today to see what I can find. Does anyone else know why these tests could be running into these memory issues?

@joshwilding4444
Author

When rerunning the tests, I find that whenever they fail, the failed allocation is always the same size, 2147483640 bytes, even though the test output stops at different points. For me, the failure always comes right after either test_interleave_gaps_x or test_interleave_gaps_y. Watching top in another window shows the process climbing to about 4g of memory, then stopping.

@Daniel-Liu-c0deb0t
Contributor

Is it possible to narrow down the location of the error and check whether it still happens when test_interleave_gaps_x/y is run by itself with cargo test test_interleave_gaps_x?

@vsoch

vsoch commented Sep 3, 2020

Running by themselves:

$ cargo test test_interleave_gaps_x
    Finished test [unoptimized + debuginfo] target(s) in 0.20s
     Running target/debug/deps/bio-364865763329b4a8

running 2 tests
test stats::pairhmm::pairhmm::tests::test_interleave_gaps_x ... ok
test stats::pairhmm::homopolypairhmm::tests::test_interleave_gaps_x ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 343 filtered out

     Running target/debug/deps/mod-2eda39cc1fbd9701

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out

and

$ cargo test test_interleave_gaps_y
    Finished test [unoptimized + debuginfo] target(s) in 0.10s
     Running target/debug/deps/bio-364865763329b4a8

running 2 tests
test stats::pairhmm::pairhmm::tests::test_interleave_gaps_y ... ok
test stats::pairhmm::homopolypairhmm::tests::test_interleave_gaps_y ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 343 filtered out

     Running target/debug/deps/mod-2eda39cc1fbd9701

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out

Ok - I just tested running the tests with Chrome open (failed at the same point) and closed (all finished successfully), but that doesn't tell us much new. :)

@vsoch

vsoch commented Sep 3, 2020

Hmm, when I set the number of jobs (CPUs) to use, it started working for me, and now I can't get it to fail again. This worked for me (success with Chrome open):

$ cargo test --jobs 8

and my machine has

$ nproc
8

I tested values from 2 to 8, and now the regular test command is no longer failing either!

@joshwilding4444
Author

@vsoch I am now experiencing something similar. When I run the tests using:
$ cargo test --jobs 8
while I have other programs open, the tests fail in the same way but at a slightly later point. When I run the tests without other programs open, they pass just fine, and now when running:
$ cargo test
or
$ cargo test --jobs 1
the tests all pass. Maybe there is a problem with how the tests are optimized or compiled?

@joshwilding4444
Author

When running the tests again with several other programs open, the tests will fail again as before. This is true even when running:
$ cargo test --jobs 8

@vsoch did you happen to try testing with a bunch of other programs open? @Daniel-Liu-c0deb0t what does running the tests look like on your machine?

@vsoch

vsoch commented Sep 3, 2020

My initial failure had two Firefox windows (with many tabs open) plus Audacity; later I just had the browsers. Now I just opened Audacity again and it's still working! lol. The bug that got away...

@Daniel-Liu-c0deb0t
Contributor

Daniel-Liu-c0deb0t commented Sep 3, 2020

For me, all tests pass with just cargo test. I do have more than enough memory (16GB) on this computer, though.

I don't think compilation or optimization is the issue here. I guess if you want to check, you can use cargo clean && cargo test to see if rebuilding everything changes the results? Or maybe you could test cargo test --release, to get it to use a higher optimization level? For me, all tests pass even when I use those commands or use multiple jobs.

I think the real problem here is why the test is trying to allocate ~2GB of memory, as @joshwilding4444 mentioned. I see that the memory usage spikes up to ~5GB during the tests. I went through a few possibilities based on when the memory usage spiked, and believe the stats::pairhmm::homopolypairhmm tests are the culprit here. I ran a few of the tests individually, and they all allocate around 2GB memory each. Additionally, those tests take noticeably longer to run. Interestingly, this does not happen with the stats::pairhmm::pairhmm tests. IIRC from the code (I'm not very familiar with it), the tests for those should be very similar, so I believe there is a bug somewhere in the algorithm causing the large memory allocations. These tests most likely should not need 2GB of memory.

@tedil
Member

tedil commented Sep 4, 2020

> I think the real problem here is why the test is trying to allocate ~2GB of memory, as @joshwilding4444 mentioned. I see that the memory usage spikes up to ~5GB during the tests. I went through a few possibilities based on when the memory usage spiked, and believe the stats::pairhmm::homopolypairhmm tests are the culprit here. I ran a few of the tests individually, and they all allocate around 2GB memory each. Additionally, those tests take noticeably longer to run. Interestingly, this does not happen with the stats::pairhmm::pairhmm tests. IIRC from the code (I'm not very familiar with it), the tests for those should be very similar, so I believe there is a bug somewhere in the algorithm causing the large memory allocations. These tests most likely should not need 2GB of memory.

It's neither the tests' fault nor a bug in the algorithm. The problem is that a transition table is pre-allocated (and pre-computed) with 268435455 entries of 8 bytes each, i.e. 268435455 * 8 = 2147483640 bytes ≈ 2 GiB. So it really does need the 2 GiB of memory. (The HomopolyPairHMM has far more states and transitions than the traditional PairHMM, which is why there's such a difference in memory allocation between the two.)
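
For reference, the arithmetic behind that failed allocation (assuming, as a sketch, that the 8-byte entries are f64 probabilities):

```rust
fn main() {
    // 268435455 table entries of 8 bytes each (e.g. f64 probabilities)
    let entries: usize = 268_435_455;
    let entry_size = std::mem::size_of::<f64>(); // 8 bytes
    let bytes = entries * entry_size;
    // This matches the size of the failed allocation reported above.
    assert_eq!(bytes, 2_147_483_640);
    println!("transition table: {:.2} GiB", bytes as f64 / (1u64 << 30) as f64);
}
```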

An easy "fix": since most transitions are 0 anyway, use a sparse data structure (which supports indexing) instead. Or enumerate only the transitions that can actually occur and keep a small dense table. I guess there are a lot of options for tackling this problem; I'd welcome any suggestions ;)
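
A minimal sketch of the sparse-table idea (the state indices and type names here are hypothetical, not the actual HomopolyPairHMM API):

```rust
use std::collections::HashMap;

/// Hypothetical sparse transition table: only the transitions that actually
/// occur are stored; everything else is implicitly probability zero.
struct SparseTransitions {
    probs: HashMap<(u32, u32), f64>,
}

impl SparseTransitions {
    fn new() -> Self {
        SparseTransitions { probs: HashMap::new() }
    }

    fn set(&mut self, from: u32, to: u32, p: f64) {
        self.probs.insert((from, to), p);
    }

    /// Missing entries are zero-probability transitions.
    fn get(&self, from: u32, to: u32) -> f64 {
        self.probs.get(&(from, to)).copied().unwrap_or(0.0)
    }
}

fn main() {
    let mut table = SparseTransitions::new();
    // With only ~80 real transitions, this stores ~80 entries instead of
    // the ~268 million a dense table over all index pairs would need.
    table.set(0, 1, 0.9);
    table.set(1, 1, 0.1);
    assert_eq!(table.get(0, 1), 0.9);
    assert_eq!(table.get(5, 7), 0.0); // unset transition is zero
}
```

The trade-off is exactly the memory/CPU balance mentioned above: a HashMap lookup is slower than a direct array index, but the table shrinks from gigabytes to kilobytes.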

Ultimately, it's a problem of striking a balance between memory consumption and CPU time, I guess.

Edit: Only ~80 or so transitions are actually used, so, whoops, 268435455 entries is a bit overkill 😆

tedil added a commit that referenced this issue Sep 4, 2020
@Daniel-Liu-c0deb0t
Contributor

Ah, I see. Glad to see this fixed.
