
Gossip service panics when a message exceeds BLOB_SIZE #5430

Closed
sagar-solana opened this issue Aug 6, 2019 · 2 comments

@sagar-solana (Contributor) commented Aug 6, 2019

Problem

assertion failed: len <= BLOB_SIZE

Proposed Solution

Decouple gossip from the size restriction on data_blobs. This might involve introducing a new blob type.
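For context, here is a minimal sketch of the failing pattern and one way to decouple gossip from it: return an error instead of asserting, so an oversized message can be dropped or split. The function names (`to_blob`, `try_to_blob`) and the `BLOB_SIZE` value below are illustrative, not the actual core/src/packet.rs code:

```rust
use bincode::serialize; // bincode 1.x
use serde::Serialize;

// Illustrative value; the real constant lives in core/src/packet.rs.
const BLOB_SIZE: usize = 65_536;

// Hypothetical mirror of the failing path: serialize a gossip message
// into a blob-sized buffer. The `assert!` is what currently panics.
fn to_blob<T: Serialize>(msg: &T) -> Vec<u8> {
    let bytes = serialize(msg).expect("serialize");
    assert!(bytes.len() <= BLOB_SIZE); // "assertion failed: len <= BLOB_SIZE"
    bytes
}

// One way to decouple gossip from the limit: surface the error so the
// gossip service can drop or split an oversized message rather than
// bringing the validator down.
fn try_to_blob<T: Serialize>(msg: &T) -> Result<Vec<u8>, String> {
    let bytes = serialize(msg).map_err(|e| e.to_string())?;
    if bytes.len() > BLOB_SIZE {
        return Err(format!(
            "gossip message too large: {} > {}",
            bytes.len(),
            BLOB_SIZE
        ));
    }
    Ok(bytes)
}
```

Surfacing the error keeps a single oversized pull response from taking the whole validator down, independent of whatever new blob type ends up raising the limit.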

@sagar-solana sagar-solana added this to the Waimea v0.17.2 milestone Aug 6, 2019

@sagar-solana sagar-solana self-assigned this Aug 6, 2019

@mvines mvines added this to To do in TdS Stage 0 via automation Aug 6, 2019

@mvines (Member) commented Aug 6, 2019

2019-08-06 08:30:23	123vij84ecQEKUvQ7gYMKxKwKF6PbYSzCzzURYA4xULY	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:29:36	H4Dgb3KyCuYWKT8yKtp8qbY49cvaqZcisa2GDnroFsv6	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:29:33	DadnDZbFH5BHHRHD7TaobaSQ7QATXgvWegHUcZ7ZGzmW	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:27:02	F7FgS6rrWckgC5X4cP5WtRRp3U1u12nnuTRXbWYaKn1u	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:26:37	2yDwZer11v2TTj86WeHzRDpE4HJVbyJ3fJ8H4AkUtWTc	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:24:58	DJvMQcb3ZtXC49LsaMvAo4x1rzCxjNfBfZtvkUeR4mAx	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:19:50	Hhk3n7X1pq4iCuh1XpdDm81pYNhqTUcphNXSq2qTRiCk	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:19:29	8diJdQj3y4QbkjfnXr95SXoktiJ1ad965ZkeFsmutfyz	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:18:31	9mbQ9mrRjBGiUVnz9Tdf7PuAkWomw3GLZ6Sup3R93Gw8	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:16:05	47UuTGPAQZX2HnVcfxKk8b1BtA4rRTduVaHnvxzQe6AJ	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:15:49	9J8WcnXxo3ArgEwktfk9tsrf4Rp8h5uPUgnQbQHLvtkd	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:15:33	5NH47Zk9NAzfbtqNpUtn8CQgNZeZE88aa2NRpfe7DyTD	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:15:32	4vgoKb76Z2vj9V9z7hoQpZkkwJrkL1z35LWNd9EXSi2o	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:15:19	55nmQ8gdWpNW5tLPoBPsqDkLm1W24cmY5DbMMXZKSP8U	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:15:07	2X5JSTLN9m2wm3ejCxfWRNMieuC2VMtaMWSoqLPbC4Pq	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
2019-08-06 08:14:55	8FaFEcUFgvJns6RAU4dso3aTm2qfzZMt2xXtSgCh3kn9	validator	solana-gossip	panicked at 'assertion failed: len <= BLOB_SIZE', core/src/packet.rs:338:5
@sagar-solana (Contributor, Author) commented Aug 7, 2019

@aeyakovenko the current theory on this is in fact the bloom filter, but there's a reason it only happened on some nodes.

Here's what I think is happening.

When a bunch of nodes are catching up, they see tons of slots in a very short amount of time. That causes them to update their EpochSlots (for the repairman protocol) and Vote on every new slot they see.

These new CrdsValues are pushed to gossip at such a high rate that they keep replacing the older ones. The older ones now reside in purged_values and also get added to the bloom filter. So only during this catch-up phase do we end up with a ton of purged values, which makes the bloom filter blow up past BLOB_SIZE.

So it looks like we either need a limit on the bloom filter size or need to introduce a new blob type that bumps the limit back up. Thoughts?
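One possible shape of the "limit the bloom size" option: bound purged_values so the pull-request filter built from it stays small enough to serialize under BLOB_SIZE. The types and the MAX_PURGED cap below are an illustrative sketch, not the actual crds_gossip API:

```rust
use std::collections::VecDeque;

// Stand-in for the real CrdsValue hash type.
type Hash = [u8; 32];

// Illustrative cap, not a value from the codebase: bound how many purged
// values can feed the pull-request bloom filter.
const MAX_PURGED: usize = 4096;

#[derive(Default)]
struct GossipState {
    // (value hash, wallclock of when it was purged)
    purged_values: VecDeque<(Hash, u64)>,
}

impl GossipState {
    fn record_purged(&mut self, hash: Hash, now: u64) {
        self.purged_values.push_back((hash, now));
        // Drop the oldest entries once the cap is hit; if they are still
        // live elsewhere in the cluster they will simply be pulled again.
        while self.purged_values.len() > MAX_PURGED {
            self.purged_values.pop_front();
        }
    }
}
```

Bounding purged_values bounds the number of items hashed into the bloom filter, which in turn bounds the serialized filter size; the trade-off is a slightly higher chance of re-receiving values that were only recently replaced.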

@mvines mvines moved this from To do to Blocking Dry Run 3 in TdS Stage 0 Aug 8, 2019

TdS Stage 0 automation moved this from Blocking Dry Run 3 to Done Aug 12, 2019
