ClusterSlots uses too much memory #17789

Closed
sakridge opened this issue Jun 7, 2021 · 8 comments · Fixed by #17899
sakridge (Member) commented Jun 7, 2021

Problem

const CLUSTER_SLOTS_TRIM_SIZE: usize = 524_288; // 512K
pub type SlotPubkeys = HashMap<Pubkey, u64>;

#[derive(Default)]
pub struct ClusterSlots {
    cluster_slots: RwLock<BTreeMap<Slot, Arc<RwLock<SlotPubkeys>>>>,
    // ... (remaining fields elided)
}

Heap profile (Rust symbols demangled) showing the allocations from ClusterSlots::update:

5.89GB consumed over 1807948 calls from:
    0x55b0a435cb61
      in /home/stephen_solana_com/solana/target/release/solana-validator
    hashbrown::raw::RawTableInner<A>::prepare_resize
      in /home/stephen_solana_com/solana/target/release/solana-validator
    <hashbrown::map::HashMap<K, V, S, A> as core::iter::traits::collect::Extend<(K, V)>>::extend
      in /home/stephen_solana_com/solana/target/release/solana-validator
    solana_core::cluster_slots::ClusterSlots::update
      in /home/stephen_solana_com/solana/target/release/solana-validator
    solana_core::cluster_slots_service::ClusterSlotsService::run
      in /home/stephen_solana_com/solana/target/release/solana-validator
    std::sys_common::backtrace::__rust_begin_short_backtrace
      in /home/stephen_solana_com/solana/target/release/solana-validator
    core::ops::function::FnOnce::call_once{{vtable.shim}}
      in /home/stephen_solana_com/solana/target/release/solana-validator
    <alloc::boxed::Box<F, A> as core::ops::function::FnOnce<Args>>::call_once
      at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
      in /home/stephen_solana_com/solana/target/release/solana-validator
    <alloc::boxed::Box<F, A> as core::ops::function::FnOnce<Args>>::call_once
      at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
    std::sys::unix::thread::Thread::new::thread_start
      at library/std/src/sys/unix/thread.rs:71
    start_thread
      in /lib/x86_64-linux-gnu/libpthread.so.0
    __clone
      in /lib/x86_64-linux-gnu/libc.so.6

Proposed Solution

Can we reduce CLUSTER_SLOTS_TRIM_SIZE?

Or find a more efficient way to store this information, e.g. a map of pubkey -> BitVec or a compressed BitVec? (See the sketch below.)

Page out to disk in files or blockstore?

Consumers:
replay stage -> propagation stats
repair service -> pruning slots
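
Regarding the pubkey -> BitVec idea above, a minimal sketch of what that layout could look like (standalone, hypothetical types; Pubkey and Slot stand in for the solana_sdk types, and this is not the actual ClusterSlots API):

use std::collections::HashMap;

// Hypothetical stand-in types for illustration only.
type Pubkey = [u8; 32];
type Slot = u64;

// One bit per slot, relative to the current root, instead of one
// hash-map entry per (slot, pubkey) pair.
#[derive(Default)]
struct SlotBitmap {
    root: Slot,
    words: Vec<u64>, // bit i set => slot `root + i` observed for this node
}

impl SlotBitmap {
    fn insert(&mut self, slot: Slot) {
        if slot < self.root {
            return; // below the root; nothing to record
        }
        let offset = (slot - self.root) as usize;
        let (word, bit) = (offset / 64, offset % 64);
        if self.words.len() <= word {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << bit;
    }

    fn contains(&self, slot: Slot) -> bool {
        if slot < self.root {
            return false;
        }
        let offset = (slot - self.root) as usize;
        self.words
            .get(offset / 64)
            .map_or(false, |&w| w & (1u64 << (offset % 64)) != 0)
    }
}

// The alternative layout suggested above: pubkey -> bitmap of slots,
// rather than slot -> HashMap<Pubkey, stake>.
type ClusterSlotsBitmaps = HashMap<Pubkey, SlotBitmap>;

This costs roughly one u64 word per 64 slots per node, versus one hash-map entry per (slot, node) pair in the current layout.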

sakridge (Member, Author) commented Jun 8, 2021

@carllin @behzadnouri What do you guys think?

behzadnouri (Contributor) commented:
Yeah, let me look into this, since it was brought up in #14366 (comment) as well.

CLUSTER_SLOTS_TRIM_SIZE doesn't do much, since it only limits the size of the outer map; the inner maps can still get pretty large. It is only intended to mitigate the case where a node receives bogus epoch slots.

But the bit-vec idea is promising. I will give it a shot.

behzadnouri self-assigned this Jun 8, 2021
carllin (Contributor) commented Jun 8, 2021

Yeah, I was also thinking you could probably page out:

  1. Much smaller slots
  2. Validators with low or zero stake, since they probably aren't used for propagation stats and are only rarely selected for repair. (For instance, all the zero-staked validators could be represented by one representative from that set, with the representative shuffled occasionally; see the sketch below.)
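
For the zero-staked representative idea in item 2, a rough sketch (the function name and the epoch-based rotation are assumptions for illustration, not an actual implementation):

type Pubkey = [u8; 32];

// Pick one zero-staked node to stand in for the whole zero-staked set,
// rotating the choice once per epoch so no single node is pinned forever.
fn zero_stake_representative(epoch: u64, zero_staked: &[Pubkey]) -> Option<&Pubkey> {
    if zero_staked.is_empty() {
        return None;
    }
    let index = (epoch as usize) % zero_staked.len();
    zero_staked.get(index)
}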

behzadnouri (Contributor) commented:
Testing on TDS, it seems that dropping unstaked nodes does not help much. ClusterSlots still has ~520k inner hashmaps, so the slots are spread over a range of size ~520k, which sounds very suspicious to me.
A snapshot of the slots with the highest number of staked nodes looks like:

[(80435455, 1755), (80435451, 1752), (80435450, 1752), (80435467, 1747), (80435452, 1747), (80435454, 1744), (80435475, 1743), (80435463, 1743), (80435471, 1739), (80435470, 1739), ....]

and the slots with the lowest number of staked nodes (from the same snapshot):

[(80556216, 1), (80556212, 1), (80556211, 1), (80556210, 1), (80556209, 1), (80556208, 1), (80556207, 1), (80556206, 1), (80556205, 1), (80556204, 1), ...]

behzadnouri (Contributor) commented:
This seems to be because crds values from nodes with a different shred_version are creeping in. Excluding them from get_epoch_slots will fix this. I will send out a patch soon.

behzadnouri added a commit to behzadnouri/solana that referenced this issue Jun 11, 2021

Inspecting the TDS gossip table shows that crds values of nodes with
different shred-versions are creeping in. Their epoch-slots are
accumulated in ClusterSlots, causing bogus slots very far from the current
root which are not purged, and so ClusterSlots keeps consuming more
memory:
solana-labs#17789
solana-labs#14366 (comment)
solana-labs#14366 (comment)

This commit updates ClusterInfo::get_epoch_slots and discards entries
from nodes with an unknown or different shred-version.

Follow-up commits will patch gossip not to waste bandwidth and memory
over crds values of nodes with a different shred-version.
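
As a rough illustration of the filter described in this commit message (hypothetical stand-in types and names; the real change lives in ClusterInfo::get_epoch_slots, #17899):

use std::collections::HashMap;

// Hypothetical stand-ins for the gossip types, for illustration only.
type Pubkey = [u8; 32];

struct EpochSlots {
    from: Pubkey,
    // slot ranges elided
}

// Keep only epoch-slots coming from nodes whose shred-version is known
// (i.e. a contact-info for them exists) and matches our own shred-version.
fn filter_epoch_slots(
    my_shred_version: u16,
    shred_versions: &HashMap<Pubkey, u16>,
    epoch_slots: Vec<EpochSlots>,
) -> Vec<EpochSlots> {
    epoch_slots
        .into_iter()
        .filter(|es| shred_versions.get(&es.from) == Some(&my_shred_version))
        .collect()
}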

behzadnouri added a commit that referenced this issue Jun 13, 2021: excludes epoch-slots from nodes with unknown or different shred version (#17899), with the same commit message as above.

mergify bot pushed a commit that referenced this issue Jun 13, 2021: the same change, cherry picked from commit 985280e.

mergify bot added a commit that referenced this issue Jun 13, 2021: the same change (#17899) (#17916), cherry picked from commit 985280e. Co-authored-by: behzad nouri <behzadnouri@gmail.com>

behzadnouri (Contributor) commented:
Reopening this since cluster_slots is still consuming 7GB on mainnet:
https://discord.com/channels/428295358100013066/439194979856809985/865513541820743720

behzadnouri reopened this Jul 16, 2021
behzadnouri (Contributor) commented:
Checking on testnet, all the erroneous epoch-slots are from 77VrLXNcVnpPHcMCcfkNU8SpuPzoCbGfp3Kqc2iV1dgu,
which is using the same identity for both mainnet and testnet, and so causing cross-contamination between the clusters.

Probably the node does not halt because NodeInstances are not propagated due to different shred-versions:
https://github.com/solana-labs/solana/blob/faf99f476/gossip/src/cluster_info.rs#L2321-L2326

However, EpochSlots values are still cross-contaminating, which is puzzling. Maybe the ContactInfo is updated after the shred-version of the EpochSlots is checked, so they bypass the shred-version check; but then it is not clear why the same does not happen with NodeInstance.

On testnet:

Identity                                      Vote Account                            Commission  Last Vote       Root Slot    Skip Rate  Credits  Version  Active Stake
77VrLXNcVnpPHcMCcfkNU8SpuPzoCbGfp3Kqc2iV1dgu  213LE1Jpb4KwfAeqnWY9dzr3TpYoadGk2oQmHXML2AdZ  100%  88898340 ( -2)  88898293 ( -1) 100.00%   124619    1.7.9  5000.98401984 SOL (0.01%)

IP Address        |Age(ms)| Node identifier                              | Version |Gossip| TPU  |TPUfwd| TVU  |TVUfwd|Repair|ServeR|ShredVer
------------------+-------+----------------------------------------------+---------+------+------+------+------+------+------+------+--------
157.90.210.250    |   338 | 77VrLXNcVnpPHcMCcfkNU8SpuPzoCbGfp3Kqc2iV1dgu |  1.7.9  | 8000 | 8003 | 8004 | 8001 | 8002 | 8006 | 8007 | 18122

On mainnet:

Identity                                      Vote Account                            Commission  Last Vote       Root Slot    Skip Rate  Credits  Version  Active Stake
77VrLXNcVnpPHcMCcfkNU8SpuPzoCbGfp3Kqc2iV1dgu  8RCqWaeUgVvPEuHxa3SchprM1jm2dyJTbEYGmfMBmiyQ   10%  91087005 ( -2)  91086940 ( -4)      -    269272   1.6.20  200.908629969 SOL (0.00%)

IP Address        |Age(ms)| Node identifier                              | Version |Gossip| TPU  |TPUfwd| TVU  |TVUfwd|Repair|ServeR|ShredVer
------------------+-------+----------------------------------------------+---------+------+------+------+------+------+------+------+--------
65.21.94.24       |   260 | 77VrLXNcVnpPHcMCcfkNU8SpuPzoCbGfp3Kqc2iV1dgu | 1.6.20  | 8000 | 8003 | 8004 | 8001 | 8002 | 8006 | 8007 | 13490

behzadnouri (Contributor) commented:
#19190 should mitigate the above situation where a node is using the same identity key for both the testnet and mainnet clusters. It was also announced on Discord for the affected nodes to change their identity keys:
https://discord.com/channels/428295358100013066/594138785558691840/874801862853939200

behzadnouri added a commit to behzadnouri/solana that referenced this issue Aug 16, 2021

Cross-cluster gossip contamination is causing the cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
solana-labs#17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by the shred-version check,
because one of the variants of the contact-info will have a matching
shred-version:
solana-labs#17789 (comment)

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots that are epochs ahead of the
root.
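
A sketch of the bound described here (the constant name and the "one epoch ahead" cutoff are assumptions for illustration; the actual patch may bound differently):

type Slot = u64;

const DEFAULT_SLOTS_PER_EPOCH: Slot = 432_000;

// Slots at or below the root are already trimmed; slots an epoch or more
// ahead of the root are almost certainly bogus (e.g. cross-cluster
// contamination), so they are discarded instead of being stored.
fn accept_slot(slot: Slot, root: Slot) -> bool {
    slot > root && slot < root.saturating_add(DEFAULT_SLOTS_PER_EPOCH)
}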

behzadnouri added a commit that referenced this issue Aug 17, 2021, with the same commit message as above.

mergify bot pushed a commit that referenced this issue Aug 19, 2021: the same change, cherry picked from commit 563aec0.

mergify bot added a commit that referenced this issue Aug 19, 2021: the same change, cherry picked from commit 563aec0. Co-authored-by: behzad nouri <behzadnouri@gmail.com>

mergify bot pushed a commit that referenced this issue Sep 1, 2021: excludes epoch-slots from nodes with unknown or different shred version (#17899), cherry picked from commit 985280e, with conflicts in core/src/cluster_info.rs.

mergify bot added a commit that referenced this issue Sep 1, 2021: excludes epoch-slots from nodes with unknown or different shred version (backport #17899) (#19551), cherry picked from commit 985280e with the backport merge conflicts removed. Co-authored-by: behzad nouri <behzadnouri@gmail.com>

github-actions bot added the stale label Dec 20, 2022
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 28, 2022
behzadnouri removed the stale label Jan 3, 2023
behzadnouri reopened this Jan 3, 2023
github-actions bot added the stale label Jan 4, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 11, 2024