
Out of memory when mapping new accounts append vec #5432

Closed
sakridge opened this issue Aug 6, 2019 · 6 comments

@sakridge (Member) commented Aug 6, 2019

Problem

[2019-08-06T15:51:48.892717417Z INFO  solana::replay_stage] new fork:6494 parent:6427
thread 'solana-replay-stage' panicked at 'failed to map the data file: Os { code: 12, kind: Other, message: "Cannot allocate memory" }', src/libcore/result.rs:999:5

Overall memory usage is low:

              total        used        free      shared  buff/cache   available
Mem:           187G        1.9G        182G         41M        3.0G        184G

Proposed Solution

I believe the vm.max_map_count limit is being hit.

One workaround is to increase the map count limit, which can mitigate the issue:
sysctl -w vm.max_map_count=262144

Otherwise append_vecs may need to be unmapped or cleaned more aggressively when they are not used.
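
For context on why this limit bites: each append vec is backed by its own memory-mapped file, so every live AppendVec consumes one entry against vm.max_map_count no matter how little RAM is actually in use. Below is a minimal sketch of that mapping step, assuming the memmap crate; map_append_vec is a hypothetical helper rather than the actual AppendVec::new.

use std::fs::OpenOptions;
use std::io;
use std::path::Path;

use memmap::MmapMut;

// Sketch: create and map one append vec's backing file. Each call adds one
// memory mapping to the process, counted against vm.max_map_count.
fn map_append_vec(path: &Path, capacity: u64) -> io::Result<MmapMut> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    // Size the backing file up front; the mapping covers all of it.
    file.set_len(capacity)?;
    // This mmap is what fails with ENOMEM ("Cannot allocate memory") once
    // the process has exhausted its map entries, even with plenty of free RAM.
    unsafe { MmapMut::map_mut(&file) }
}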

@mvines mvines added this to To do in TdS Stage 0 via automation Aug 6, 2019

@mvines mvines added this to the Mavericks v0.18.0 milestone Aug 6, 2019

@mvines (Member) commented Aug 6, 2019

thread 'solana-replay-stage' panicked at 'failed to map the data file: Os { code: 12, kind: Other, message: "Cannot allocate memory" }', src/libcore/result.rs:999:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}
   5: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:478
   6: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   7: rust_begin_unwind
             at src/libstd/panicking.rs:308
   8: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
   9: core::result::unwrap_failed
  10: solana_runtime::append_vec::AppendVec::new
  11: solana_runtime::accounts_db::AccountStorageEntry::new
  12: solana_runtime::accounts_db::AccountsDB::create_store
  13: solana_runtime::accounts_db::AccountsDB::create_and_insert_store
  14: solana_runtime::accounts_db::AccountsDB::store
  15: solana_runtime::accounts::Accounts::store_slow
  16: solana_runtime::bank::Bank::store_account
  17: solana_runtime::bank::Bank::update_clock
  18: solana_runtime::bank::Bank::init_from_parent
  19: solana_runtime::bank::Bank::new_from_parent
  20: solana::replay_stage::ReplayStage::generate_new_bank_forks

@sakridge (Member, Author) commented Aug 7, 2019

Just letting a validator run shows a relatively constant number of append vec mappings:

[2019-08-07T01:30:28.800041368Z INFO solana_runtime::accounts] total: 640 min_fork: 0 max_fork: 1360

This matches what pmap says:

sakridge@sagan:~$ pmap -x 9970 | grep 4096 | wc
    656    3936   37862
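
The same check can be made programmatically and compared against the kernel limit. A minimal sketch, assuming a Linux /proc filesystem; mapping_count and max_map_count are hypothetical helpers, not validator code.

use std::fs;
use std::io;

// One line per mapping in /proc/<pid>/maps; this counts every mapping the
// process holds, a superset of the append-vec-sized ones isolated above
// with `grep 4096`.
fn mapping_count(pid: u32) -> io::Result<usize> {
    Ok(fs::read_to_string(format!("/proc/{}/maps", pid))?.lines().count())
}

// The kernel limit that the mapping count is bumping into.
fn max_map_count() -> io::Result<usize> {
    Ok(fs::read_to_string("/proc/sys/vm/max_map_count")?
        .trim()
        .parse()
        .unwrap_or(0))
}

fn main() -> io::Result<()> {
    let pid = std::process::id();
    println!(
        "{} mappings in use, vm.max_map_count = {}",
        mapping_count(pid)?,
        max_map_count()?
    );
    Ok(())
}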

@sakridge (Member, Author) commented Aug 7, 2019

Another case where I saw this was running test_local_cluster_solana on a 96-thread GCE machine. I may have to try that again.

@mvines (Member) commented Aug 7, 2019

I wonder if it's related to a validator that stops making roots

@sakridge (Member, Author) commented Aug 7, 2019

@mvines Could be. Do you know of any full logs that have this crash along with all the bank forking info?

@mvines (Member) commented Aug 7, 2019

The bootstrap leader from the TdS internal dry run actually reproduced this:
bs.log.gz
