Testnet crash under load #3692

Closed
2 of 28 tasks
damip opened this issue Mar 19, 2023 · 17 comments

Comments

@damip
Member

damip commented Mar 19, 2023

  • All nodes crashed simultaneously with OOM
  • Testnet 1: was in debug mode, but for some reason we lost all logs after 2023-03-13T00:00:01.249244Z (including all system logs, dmesg, etc.), probably because of a logrotate problem. We can't tell when the crash happened, and there is no information about deadlock detection
  • Testnet 2: last entry at 2023-03-19T19:46:53.928940Z (log level info, deadlock detection on) => OOM, no deadlock
  • Testnet 3: last entry at 2023-03-19T19:41:52.370952Z (log level info, deadlock detection on) => OOM, no deadlock
  • Testnet 4: last entry at 2023-03-19T19:33:38.922144Z (log level info, deadlock detection on) => OOM, no deadlock

Job-log

Findings so far

  • Memory did not increase linearly but exploded suddenly, like the other time (TODO: refer to "other time" here)
  • Deadlock detection was running, but...
    • No deadlocks were detected in the logs

Open questions

  • If there was a deadlock in the logging system, would that have been detected AND logged?
  • All nodes crashed, not only those that bootstrapped from others or were bootstrapping others => it's probably not a bootstrap problem?
  • All nodes crashed, not only those that received API calls => it's probably not an API problem?
  • @modship: on testnet01 the logs stop at 00h00, almost certainly a logrotate problem
    • Can we fix it and make 100% sure the fix works?
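On the first open question: a detector can only be trusted if its report does not go through the component that might be stuck. The node's actual deadlock detector is not shown in this thread, so the following is only a hypothetical Python sketch of a swapped-in technique, a heartbeat-based stall watchdog (not a lock-graph deadlock detector). `StallWatchdog`, `heartbeat()` and `timeout` are illustrative names. The monitored code calls `heartbeat()` regularly; if it stops doing so, the watchdog writes to a raw file descriptor with `os.write`, bypassing the logging module entirely.

```python
import os
import threading
import time


class StallWatchdog:
    """Hypothetical stall detector. If heartbeat() is not called for more
    than `timeout` seconds, a report is written straight to the raw file
    descriptor `fd` (stderr by default), so a blocked logging system
    cannot swallow it."""

    def __init__(self, timeout: float, fd: int = 2) -> None:
        self.timeout = timeout
        self.fd = fd
        self.reports = 0
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.heartbeat()
        self._thread.start()

    def heartbeat(self) -> None:
        # Called by the monitored code to signal it is still making progress.
        with self._lock:
            self._last = time.monotonic()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Poll at a quarter of the timeout so stalls are noticed promptly.
        while not self._stop.wait(self.timeout / 4):
            with self._lock:
                stalled_for = time.monotonic() - self._last
            if stalled_for > self.timeout:
                msg = f"watchdog: no heartbeat for {stalled_for:.2f}s\n"
                os.write(self.fd, msg.encode())  # no logging handlers or locks involved
                self.reports += 1
                self.heartbeat()  # report at most once per timeout period
```

The trade-off is that a stall watchdog cannot say which locks are involved, only that progress stopped; its value here is that it would still fire if the logger itself were the thing that hung.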

Answered questions:

  • Was the testnet compiled with block/operation API streaming enabled?
  • @aoudiamoncef: "Nope, and there is no production of messages if not enabled. My node always has streaming enabled" link

PRs included in testnet 20

Less-likelies?

Data

Logs of servers 1,2,3,4

OOM message from testnet3:

[Sun Mar 19 19:48:12 2023] sshd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[Sun Mar 19 19:48:12 2023] CPU: 6 PID: 1947069 Comm: sshd Not tainted 5.4.0-89-generic #100-Ubuntu
[Sun Mar 19 19:48:12 2023] Hardware name: ASUS All Series/H97M-PLUS, BIOS 3602 04/08/2018
[Sun Mar 19 19:48:12 2023] Call Trace:
[Sun Mar 19 19:48:12 2023]  dump_stack+0x6d/0x8b
[Sun Mar 19 19:48:12 2023]  dump_header+0x4f/0x1eb
[Sun Mar 19 19:48:12 2023]  oom_kill_process.cold+0xb/0x10
[Sun Mar 19 19:48:12 2023]  out_of_memory.part.0+0x1df/0x3d0
[Sun Mar 19 19:48:12 2023]  out_of_memory+0x6d/0xd0
[Sun Mar 19 19:48:12 2023]  __alloc_pages_slowpath+0xd5e/0xe50
[Sun Mar 19 19:48:12 2023]  __alloc_pages_nodemask+0x2d0/0x320
[Sun Mar 19 19:48:12 2023]  alloc_pages_current+0x87/0xe0
[Sun Mar 19 19:48:12 2023]  __page_cache_alloc+0x72/0x90
[Sun Mar 19 19:48:12 2023]  pagecache_get_page+0xbf/0x300
[Sun Mar 19 19:48:12 2023]  filemap_fault+0x6b2/0xa50
[Sun Mar 19 19:48:12 2023]  ? unlock_page_memcg+0x12/0x20
[Sun Mar 19 19:48:12 2023]  ? page_add_file_rmap+0xff/0x1a0
[Sun Mar 19 19:48:12 2023]  ? filemap_map_pages+0x24c/0x380
[Sun Mar 19 19:48:12 2023]  ext4_filemap_fault+0x32/0x50
[Sun Mar 19 19:48:12 2023]  __do_fault+0x3c/0x130
[Sun Mar 19 19:48:12 2023]  do_fault+0x24b/0x640
[Sun Mar 19 19:48:12 2023]  ? generic_file_read_iter+0xdc/0x140
[Sun Mar 19 19:48:12 2023]  __handle_mm_fault+0x4c5/0x7a0
[Sun Mar 19 19:48:12 2023]  handle_mm_fault+0xca/0x200
[Sun Mar 19 19:48:12 2023]  do_user_addr_fault+0x1f9/0x450
[Sun Mar 19 19:48:12 2023]  __do_page_fault+0x58/0x90
[Sun Mar 19 19:48:12 2023]  do_page_fault+0x2c/0xe0
[Sun Mar 19 19:48:12 2023]  page_fault+0x34/0x40
[Sun Mar 19 19:48:12 2023] RIP: 0033:0x564fc9e96f60
[Sun Mar 19 19:48:12 2023] Code: Bad RIP value.
[Sun Mar 19 19:48:12 2023] RSP: 002b:00007ffe5d3afc18 EFLAGS: 00010246
[Sun Mar 19 19:48:12 2023] RAX: 0000000000000001 RBX: 00007ffe5d3afdd0 RCX: 0000564fcb071012
[Sun Mar 19 19:48:12 2023] RDX: 0000000000000004 RSI: 0000564fcb0965c0 RDI: 0000564fcb098660
[Sun Mar 19 19:48:12 2023] RBP: 0000564fcb098660 R08: 0000564fcb098d50 R09: 0000000000000004
[Sun Mar 19 19:48:12 2023] R10: 0000000000000001 R11: ac89634381d7eeff R12: 0000000000000000
[Sun Mar 19 19:48:12 2023] R13: 0000564fcb094290 R14: 0000564fcb0954b0 R15: 0000000000000000
[Sun Mar 19 19:48:12 2023] Mem-Info:
[Sun Mar 19 19:48:12 2023] active_anon:3588266 inactive_anon:336364 isolated_anon:0
                            active_file:524 inactive_file:414 isolated_file:0
                            unevictable:4634 dirty:0 writeback:0 unstable:0
                            slab_reclaimable:23595 slab_unreclaimable:51814
                            mapped:3152 shmem:23 pagetables:9216 bounce:0
                            free:33061 free_pcp:2310 free_cma:0
[Sun Mar 19 19:48:12 2023] Node 0 active_anon:14353064kB inactive_anon:1345456kB active_file:2096kB inactive_file:1656kB unevictable:18536kB isolated(anon):0kB isolated(file):0kB mapped:12608kB dirty:0kB writeback:0kB shmem:92kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Sun Mar 19 19:48:12 2023] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sun Mar 19 19:48:12 2023] lowmem_reserve[]: 0 3358 15841 15841 15841
[Sun Mar 19 19:48:12 2023] Node 0 DMA32 free:63204kB min:14312kB low:17888kB high:21464kB active_anon:3318688kB inactive_anon:2020kB active_file:108kB inactive_file:748kB unevictable:7816kB writepending:0kB present:3531356kB managed:3465820kB mlocked:7816kB kernel_stack:256kB pagetables:6424kB bounce:0kB free_pcp:976kB local_pcp:128kB free_cma:0kB
[Sun Mar 19 19:48:12 2023] lowmem_reserve[]: 0 0 12483 12483 12483
[Sun Mar 19 19:48:12 2023] Node 0 Normal free:53140kB min:53204kB low:66504kB high:79804kB active_anon:11034720kB inactive_anon:1343176kB active_file:2740kB inactive_file:2788kB unevictable:10720kB writepending:0kB present:13105152kB managed:12791180kB mlocked:10720kB kernel_stack:5552kB pagetables:30440kB bounce:0kB free_pcp:8264kB local_pcp:1336kB free_cma:0kB
[Sun Mar 19 19:48:12 2023] lowmem_reserve[]: 0 0 0 0 0
[Sun Mar 19 19:48:12 2023] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[Sun Mar 19 19:48:12 2023] Node 0 DMA32: 564*4kB (UMEH) 466*8kB (UMEH) 621*16kB (UMEH) 604*32kB (UMEH) 238*64kB (UMEH) 72*128kB (UMEH) 12*256kB (UMH) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 63792kB
[Sun Mar 19 19:48:12 2023] Node 0 Normal: 411*4kB (UME) 2028*8kB (UMEH) 1125*16kB (UMEH) 445*32kB (UMEH) 47*64kB (UMH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 53116kB
[Sun Mar 19 19:48:12 2023] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Sun Mar 19 19:48:12 2023] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sun Mar 19 19:48:12 2023] 3097 total pagecache pages
[Sun Mar 19 19:48:12 2023] 162 pages in swap cache
[Sun Mar 19 19:48:12 2023] Swap cache stats: add 2060852, delete 2061262, find 92689463/92885052
[Sun Mar 19 19:48:12 2023] Free swap  = 0kB
[Sun Mar 19 19:48:12 2023] Total swap = 524284kB
[Sun Mar 19 19:48:12 2023] 4163123 pages RAM
[Sun Mar 19 19:48:12 2023] 0 pages HighMem/MovableOnly
[Sun Mar 19 19:48:12 2023] 94898 pages reserved
[Sun Mar 19 19:48:12 2023] 0 pages cma reserved
[Sun Mar 19 19:48:12 2023] 0 pages hwpoisoned
[Sun Mar 19 19:48:12 2023] Tasks state (memory values in pages):
[Sun Mar 19 19:48:12 2023] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Sun Mar 19 19:48:12 2023] [    760]     0   760     2134       16    53248       53             0 cron
[Sun Mar 19 19:48:12 2023] [    762]   103   762     2008      112    49152      323          -900 dbus-daemon
[Sun Mar 19 19:48:12 2023] [    770]     0   770    20473       72    61440       64             0 irqbalance
[Sun Mar 19 19:48:12 2023] [    801]     0   801     4719        8    73728      754             0 systemd-logind
[Sun Mar 19 19:48:12 2023] [    823]     0   823      948        2    45056       42             0 atd
[Sun Mar 19 19:48:12 2023] [    837]     0   837     3044      150    61440      190         -1000 sshd
[Sun Mar 19 19:48:12 2023] [    954]     0   954     1838        0    53248       34             0 agetty
[Sun Mar 19 19:48:12 2023] [    968]     0   968     1457        0    49152       31             0 agetty
[Sun Mar 19 19:48:12 2023] [   1016]     0  1016    27028       66   110592     1946             0 unattended-upgr
[Sun Mar 19 19:48:12 2023] [  34504]  1001 34504     4718      123    81920      486             0 systemd
[Sun Mar 19 19:48:12 2023] [  34507]  1001 34507    42529        0   110592     1197             0 (sd-pam)
[Sun Mar 19 19:48:12 2023] [ 614357]     0 614357    60502        0   102400      572             0 accounts-daemon
[Sun Mar 19 19:48:12 2023] [ 755185]     0 755185     4683      100    69632      481             0 systemd
[Sun Mar 19 19:48:12 2023] [ 755187]     0 755187    42573        0   110592     1219             0 (sd-pam)
[Sun Mar 19 19:48:12 2023] [3592714]   107 3592714     2437        0    61440       54             0 uuidd
[Sun Mar 19 19:48:12 2023] [ 290370]     0 290370    59105        0    90112      320             0 polkitd
[Sun Mar 19 19:48:12 2023] [3748474]     0 3748474     7445        0    94208     2023             0 networkd-dispat
[Sun Mar 19 19:48:12 2023] [1232477]  1001 1232477     2422      388    49152      310             0 screen
[Sun Mar 19 19:48:12 2023] [1232478]  1001 1232478     2540      399    57344      399             0 bash
[Sun Mar 19 19:48:12 2023] [2006217]  1001 2006217     1813      387    53248      136             0 dbus-daemon
[Sun Mar 19 19:48:12 2023] [2531789]   104 2531789    56086      401    81920      469             0 rsyslogd
[Sun Mar 19 19:48:12 2023] [2020274]  1001 2020274     2253      377    49152      198             0 screen
[Sun Mar 19 19:48:12 2023] [2020275]  1001 2020275     2539      482    53248      455             0 bash
[Sun Mar 19 19:48:12 2023] [2020331]  1001 2020331   138604      408   135168      775             0 massa-client
[Sun Mar 19 19:48:12 2023] [2805618]     0 2805618     2154      409    61440       35             0 stats_net.sh
[Sun Mar 19 19:48:12 2023] [3650815]     0 3650815    70050     4500    94208        0         -1000 multipathd
[Sun Mar 19 19:48:12 2023] [ 453264]     0 453264      622      127    36864       15             0 none
[Sun Mar 19 19:48:12 2023] [3280698]     0 3280698   403567     2670   352256     2227          -900 snapd
[Sun Mar 19 19:48:12 2023] [ 668782]   100 668782     6645      558    77824      222             0 systemd-network
[Sun Mar 19 19:48:12 2023] [ 668797]   101 668797     5933      545    90112      954             0 systemd-resolve
[Sun Mar 19 19:48:12 2023] [ 668804]     0 668804    37446      609   311296      218          -250 systemd-journal
[Sun Mar 19 19:48:12 2023] [ 668899]   102 668899    22549      745    77824      193             0 systemd-timesyn
[Sun Mar 19 19:48:12 2023] [ 669907]     0 669907     4708      412    53248      263         -1000 systemd-udevd
[Sun Mar 19 19:48:12 2023] [ 901968]  1001 901968  4751671  3902221 33140736   112670             0 massa-node
[Sun Mar 19 19:48:12 2023] [ 941890]  1001 941890    53032     9799   176128     1137             0 python3
[Sun Mar 19 19:48:12 2023] [1946701]     0 1946701    58227     1050    77824        2             0 iftop
[Sun Mar 19 19:48:12 2023] [1946702]     0 1946702     1816      216    53248        6             0 tail
[Sun Mar 19 19:48:12 2023] [1946994]     0 1946994     3045      682    65536        3             0 sshd
[Sun Mar 19 19:48:12 2023] [1947000]     0 1947000     3045      739    65536        4             0 sshd
[Sun Mar 19 19:48:12 2023] [1947006]     0 1947006     3045      720    57344        6             0 sshd
[Sun Mar 19 19:48:12 2023] [1947013]     0 1947013     3045      712    69632        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947025]     0 1947025     3045      809    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947056]     0 1947056     3045      699    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947057]     0 1947057     3045      834    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947058]     0 1947058     3045      751    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947066]     0 1947066     3045      782    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947067]     0 1947067     3045      821    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947068]     0 1947068     3045      652    73728        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947069]     0 1947069     3045      883    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947070]     0 1947070     3045      809    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947071]     0 1947071     3045      810    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947072]     0 1947072     3045      783    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947073]     0 1947073     3045      818    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947074]     0 1947074     3045      717    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947075]     0 1947075     3045      713    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947076]     0 1947076     3045      520    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947077]     0 1947077     3045      794    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947078]     0 1947078     3045      794    69632        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947079]     0 1947079     3045      749    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947080]     0 1947080     3045      816    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947081]     0 1947081     3045      839    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947083]     0 1947083     3045      754    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947084]     0 1947084     3045      728    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947085]     0 1947085     3045      804    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947086]     0 1947086     3045      739    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947104]     0 1947104     3045      753    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947125]   109 1947125     3045      498    65536        4             0 sshd
[Sun Mar 19 19:48:12 2023] [1947126]   109 1947126     3045      516    65536        3             0 sshd
[Sun Mar 19 19:48:12 2023] [1947153]   109 1947153     3045      490    57344        3             0 sshd
[Sun Mar 19 19:48:12 2023] [1947212]   109 1947212     3045      533    69632        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947213]   109 1947213     3045      608    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947214]   109 1947214     3045      227    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947215]   109 1947215     3045      227    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947216]   109 1947216     3045      226    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947217]   109 1947217     3045      228    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947219]   109 1947219     3045      227    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947220]   109 1947220     3045      227    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947221]   109 1947221     3045      227    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947222]   109 1947222     3045      505    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947223]   109 1947223     3045      494    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947224]   109 1947224     3045      519    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947225]   109 1947225     3045      511    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947226]   109 1947226     3045      488    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947227]   109 1947227     3045      531    69632        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947228]   109 1947228     3045      522    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947229]   109 1947229     3045      490    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947230]   109 1947230     3045      529    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947231]   109 1947231     3045      498    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947232]   109 1947232     3045      515    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947233]   109 1947233     3045      508    65536        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947234]   109 1947234     3045      545    61440        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947235]   109 1947235     3045      507    57344        0             0 sshd
[Sun Mar 19 19:48:12 2023] [1947236]     0 1947236     3045      356    73728        0             0 sshd
[Sun Mar 19 19:48:12 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1001.slice/session-86297.scope,task=massa-node,pid=901968,uid=1001
[Sun Mar 19 19:48:12 2023] Out of memory: Killed process 901968 (massa-node) total-vm:19006684kB, anon-rss:15608884kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:32364kB oom_score_adj:0
[Sun Mar 19 19:48:12 2023] oom_reaper: reaped process 901968 (massa-node), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
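For reading the OOM report above: the "Tasks state" table gives memory in pages, while the kill line gives kB. Assuming 4 KiB pages (the x86_64 default), massa-node's rss of 3902221 pages × 4 = 15608884 kB, exactly the anon-rss in the kill line, i.e. roughly 14.9 GiB out of roughly 15.9 GiB of RAM (4163123 pages), with swap fully used (Free swap = 0kB). A minimal Python sketch to extract these numbers from dmesg output; `parse_oom_kill` and `pages_to_gib` are hypothetical helpers, and the regex only covers the kill-line format shown here.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, the x86_64 default

# Matches the "Out of memory: Killed process ..." line printed by the kernel.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line: str) -> dict:
    """Extract pid, process name, total-vm and anon-rss (in kB) from a kill line."""
    m = KILL_RE.search(line)
    if m is None:
        raise ValueError("not an OOM kill line")
    d = m.groupdict()
    return {
        "pid": int(d["pid"]),
        "name": d["name"],
        "total_vm_kb": int(d["total_vm"]),
        "anon_rss_kb": int(d["anon_rss"]),
    }


def pages_to_gib(pages: int) -> float:
    """Convert a page count from the tasks table to GiB."""
    return pages * PAGE_KB / (1024 * 1024)
```

Cross-checking the two views this way confirms that the killed process alone accounts for nearly all of the machine's memory, rather than the kernel or other processes.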

@dr-chain
Contributor

I will be stating something obvious here, but it seems we have a memory leak issue. 🤔

@damip
Member Author

damip commented Mar 20, 2023

Memory did not increase linearly but exploded suddenly, like the other time

@damip
Member Author

damip commented Mar 20, 2023

Logs of servers 1,2,3,4: https://we.tl/t-hFT8JtAM7I

@damip
Member Author

damip commented Mar 20, 2023

Good news: deadlock detection was running on all servers at log level Warn
Bad news: none have detected a deadlock in their logs

@damip changed the title from "General testnet crash" to "Testnet crash under load" on Mar 20, 2023
@damip
Member Author

damip commented Mar 20, 2023

Notes:

  • all nodes crashed, not only nodes that bootstrapped from others or were bootstrapping others => it's probably not a problem with bootstrap
  • all nodes crashed, not only nodes that had API calls => it's probably not an API problem

@dr-chain
Contributor

dr-chain commented Mar 20, 2023

Would it be wrong to think about it this way?

  • Since memory usage spiked, some condition that was newly satisfied was never reset and kept being satisfied.
  • Since this does not happen right after the testnet starts, it is a condition that checks something that only occurs later in the chain's lifetime.
  • The condition could even be a trigger coming from an error-correction subroutine (if there is one).

As I am not well versed in the codebase yet, maybe someone can help me validate this hypothesis?

@damip
Member Author

damip commented Mar 21, 2023

Figure_1

@damip
Member Author

damip commented Mar 21, 2023

Zoom on the crash:

Figure_1_zoom

@aoudiamoncef
Contributor

[2 screenshots]

@aoudiamoncef
Contributor

[3 screenshots]

@aoudiamoncef
Contributor

aoudiamoncef commented Mar 21, 2023

[3 screenshots]

@aoudiamoncef
Contributor

aoudiamoncef commented Mar 21, 2023

[4 screenshots]

@dr-chain
Contributor

From the graph, it seems that something abnormal happens on the Network Traffic and 5 minutes later the node crashes.

@aoudiamoncef
Copy link
Contributor

> From the graph, it seems that something abnormal happens on the Network Traffic and 5 minutes later the node crashes.

It seems that an abnormal amount of data was received and propagated through the network. This data filled all available RAM and caused a spike in CPU usage.

This is a basic scenario that should be handled, so did some protection not work as expected?
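One kind of protection this question could refer to is a hard cap on how much unprocessed inbound data a node buffers per peer. The node's actual protections are not described in this thread; the following is a hypothetical Python sketch (`BoundedInbox` and `max_bytes` are illustrative names) of a byte-bounded queue that rejects messages instead of growing without limit.

```python
from __future__ import annotations

from collections import deque


class BoundedInbox:
    """Hypothetical per-peer inbound buffer capped by total bytes.
    A message that would push the buffer over `max_bytes` is rejected
    rather than queued, so a flood cannot grow memory without bound."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.dropped = 0
        self.queue: deque[bytes] = deque()

    def push(self, msg: bytes) -> bool:
        if self.used + len(msg) > self.max_bytes:
            self.dropped += 1
            return False  # the caller could also penalize or disconnect the peer
        self.queue.append(msg)
        self.used += len(msg)
        return True

    def pop(self) -> bytes | None:
        if not self.queue:
            return None
        msg = self.queue.popleft()
        self.used -= len(msg)  # free budget as messages are processed
        return msg
```

Rejecting instead of blocking keeps memory bounded even when the sender is misbehaving; whether dropped messages should also count against the peer's reputation is a separate policy choice.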

@damip
Member Author

damip commented Mar 21, 2023

> Zoom on the crash:
>
> Figure_1_zoom

@dr-chain @aoudiamoncef In this case, CPU usage goes up before any abnormal network activity.

@dr-chain
Contributor

dr-chain commented Mar 21, 2023

@damip Interesting. Is this observation consistent across all nodes, i.e. do we see the same results on every node?
If it only shows up on one node, that would mean the problem originated on that node and then propagated across the whole network, causing the crash (potentially because the network module does not monitor traffic for "bad" data).

If the node in the screenshot created one of the last blocks, we could narrow down the investigation.

@AurelienFT
Member

We have not reproduced this behavior since TEST.21. Memory is now stable, except for #3803.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants