Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seenMessages TTL should be based on last observed time, not first #502

Closed
stbrody opened this issue Oct 26, 2022 · 7 comments · Fixed by #513
Closed

seenMessages TTL should be based on last observed time, not first #502

stbrody opened this issue Oct 26, 2022 · 7 comments · Fixed by #513

Comments

@stbrody
Copy link

stbrody commented Oct 26, 2022

The seenMessage cache prevents us from processing and re-broadcasting a pubsub message that we have already seen within the seenMsgTTL (default 2 minutes). Accessing an existing message, however, does not update the timestamp associated with the entry in the cache.

This means that if a single pubsub message is being received frequently and consistently, we'll re-broadcast it every seenMsgTTL, rather than never re-broadcasting it as is the desired behavior. If every time we received a duplicate pubsub message we pushed back the expiration time in the cache, that would give us the desired behavior of never re-broadcasting a message that we have already seen within the last seenMsgTTL seconds.

@stbrody
Copy link
Author

stbrody commented Oct 26, 2022

Please note that we have experienced a severe issue in the past with seeing a flood of pubsub messages suddenly come in all at once. On closer observation we notice that many of the messages are duplicate messages that we have seen before. We reported it a while ago against the JS implementation here, but we have seen it more recently on go-ipfs v12 as well. In the most recent occurrence, we never saw the same message duplicated within a 2 minute window, but would often see the same messages coming into our node every 2-3 minutes.

So I think it's possible that this relatively small change could have a significant benefit for us to avoid these quite disastrous pubsub floods that we have been experiencing sporadically.

@stbrody
Copy link
Author

stbrody commented Oct 26, 2022

We also discussed the more recent occurrence of the pubsub flood issue in this slack thread with some PL folks, for reference: https://filecoinproject.slack.com/archives/C025ZN5LNV8/p1661459082059149

@vyzo
Copy link
Collaborator

vyzo commented Oct 26, 2022

This makes sense, care for a patch?

@vyzo
Copy link
Collaborator

vyzo commented Oct 26, 2022

Probably as an option, it can be turned on by default.

@stbrody
Copy link
Author

stbrody commented Oct 26, 2022

@vyzo Looks like I'd have to either modify or fork the timecache library as it doesn't provide any way to update the expiration time for an existing entry (it even errors if you try to add the same key twice). Do you know who the maintainer is? Any idea if I'd be better off making a PR to the existing library vs forking it or making a new TTL cache implementation?

@vyzo
Copy link
Collaborator

vyzo commented Oct 26, 2022

It's not really maintained, we should probably move it in tree.

@smrz2001
Copy link
Contributor

smrz2001 commented Jan 6, 2023

Hey @vyzo, I just put up a PR for this issue. Please take a look when you get a chance 🙏🏼

@vyzo vyzo closed this as completed in #513 Jan 24, 2023
lukasz-zimnoch added a commit to keep-network/keep-core that referenced this issue Feb 7, 2024
By default, the libp2p seen messages cache uses the `FirstSeen` strategy
which expires an entry once TTL elapses from the time it was added.
This means that if a single message is being received frequently and
consistently, pubsub will re-broadcast it every TTL, rather than never
re-broadcasting it.

In the context of the Keep client which additionally uses app-level
retransmissions, that often leads to a strong message amplification in the
broadcast channel which cause a significant increase in the network load.

As the problem is quite common
(see libp2p/go-libp2p-pubsub#502), the libp2p team
added a new `LastSeen` strategy which behaves differently. This strategy
expires an entry once TTL elapses from the last time the message was touched
by a cache write (`Add`) or read (`Has`) operation. That gives the desired
behavior of never re-broadcasting a message that was already seen within the
last TTL period. This reduces the risk of unintended over-amplification.
tomaszslabon added a commit to keep-network/keep-core that referenced this issue Feb 8, 2024
Refs: #3770
Depends on: #3771

Recent libp2p versions (we started to use them in
#3771) introduced a way to
set the seen messages cache TTL and strategy. Here we leverage those
settings to reduce the excessive message flooding effect that sometimes
occurs on mainnet. This pull request consists of two steps

### Use longer TTL for pubsub seen messages cache

Once a message is received and validated, pubsub re-broadcasts it to
other peers and puts it into the seen messages cache. This way,
subsequent arrivals of the same message are not re-broadcasted
unnecessarily. This mechanism is important for the network to avoid
excessive message flooding. The default value used by libp2p is 2
minutes. However, Keep client messaging sessions are quite
time-consuming so, we use a longer TTL of 5 minutes to reduce flooding
risk even further. Worth noting that this time cannot be too long as the
cache may grow excessively and impact memory consumption.

### Use `LastSeen` as seen messages cache strategy

By default, the libp2p seen messages cache uses the `FirstSeen` strategy
which expires an entry once TTL elapses from when it was added. This
means that if a single message is being received frequently and
consistently, pubsub will re-broadcast it every TTL, rather than never
re-broadcasting it.

In the context of the Keep client which additionally uses app-level
retransmissions, that often leads to a strong message amplification in
the broadcast channel which causes a significant increase in the network
load.

As the problem is quite common (see
libp2p/go-libp2p-pubsub#502), the libp2p team
added a new `LastSeen` strategy which behaves differently. This strategy
expires an entry once TTL elapses from the last time the message was
touched by a cache write (`Add`) or read (`Has`) operation. That gives
the desired behavior of never re-broadcasting a message that was already
seen within the last TTL period. This reduces the risk of unintended
over-amplification.
lukasz-zimnoch added a commit to keep-network/keep-core that referenced this issue Feb 12, 2024
By default, the libp2p seen messages cache uses the `FirstSeen` strategy
which expires an entry once TTL elapses from the time it was added.
This means that if a single message is being received frequently and
consistently, pubsub will re-broadcast it every TTL, rather than never
re-broadcasting it.

In the context of the Keep client which additionally uses app-level
retransmissions, that often leads to a strong message amplification in the
broadcast channel which cause a significant increase in the network load.

As the problem is quite common
(see libp2p/go-libp2p-pubsub#502), the libp2p team
added a new `LastSeen` strategy which behaves differently. This strategy
expires an entry once TTL elapses from the last time the message was touched
by a cache write (`Add`) or read (`Has`) operation. That gives the desired
behavior of never re-broadcasting a message that was already seen within the
last TTL period. This reduces the risk of unintended over-amplification.

(cherry picked from commit 89a7b77)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants