p2p/discover: fix update logic in handleAddNode #29836

lightclient · 2024-05-24T21:27:49Z

It seems the semantic differences between addFoundNode and addInboundNode were lost in
#29572. My understanding is that addFoundNode is for a node you have not contacted directly
(and whose availability is unknown), whereas addInboundNode is for nodes that have
contacted the local node, so we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating its IP address) even if the node was not inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.
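
To make the rule above concrete, here is a minimal sketch of the accept/reject decision. It is not the code from this PR: `record`, `entry`, and `applyUpdate` are hypothetical stand-ins, and only the `isValidatedLive` flag name is borrowed from the commit titles. The sketch follows the wording of the description and says nothing about how non-inbound endpoint changes are treated for revalidation.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a signed node record:
// a sequence number plus an endpoint.
type record struct {
	seq      uint64
	endpoint string
}

// entry is a hypothetical bucket entry holding the latest known record.
type entry struct {
	rec             record
	isValidatedLive bool
}

// applyUpdate applies newRec to e. It reports whether the record was
// replaced and whether the node should be revalidated on the fast schedule.
func applyUpdate(e *entry, newRec record, isInbound bool) (updated, revalidate bool) {
	// Non-inbound updates are rejected unless the sequence number increases.
	if !isInbound && newRec.seq <= e.rec.seq {
		return false, false
	}
	endpointChanged := newRec.endpoint != e.rec.endpoint
	e.rec = newRec
	// Assumption: only an inbound endpoint change resets liveness here,
	// mirroring the sentence in the description.
	if isInbound && endpointChanged {
		e.isValidatedLive = false
		return true, true
	}
	return true, false
}

func main() {
	e := &entry{rec: record{seq: 5, endpoint: "10.0.0.1:30303"}, isValidatedLive: true}

	// Found (non-inbound) node with the same sequence number: rejected.
	fmt.Println(applyUpdate(e, record{seq: 5, endpoint: "10.0.0.9:30303"}, false))

	// Found node with a newer sequence number: accepted.
	fmt.Println(applyUpdate(e, record{seq: 6, endpoint: "10.0.0.2:30303"}, false))

	// Inbound node with any sequence number: accepted, endpoint change
	// triggers expedited revalidation.
	fmt.Println(applyUpdate(e, record{seq: 1, endpoint: "10.0.0.3:30303"}, true))
}
```

Returning the two booleans separately keeps the update decision apart from the scheduling decision, so a caller can decide how to requeue the node.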

fjl · 2024-05-24T22:30:20Z

It's better to store *enode.Node because that object is immutable. We just need to protect the access to the field node.Node.

fjl · 2024-05-24T22:33:59Z

Actually, I think this problem will go away with the refactoring where node is only used in table. The race happens because we return *nodesByDistance from Table.findnodeByID, which contains *node. When we access the inner node later, there is a race with the table updating it.

fjl · 2024-05-24T23:26:00Z

> handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in the bucket (updating it's IP addr) even if the node was not an inbound.

I don't think it's a problem. bumpInBucket is safe to invoke even for non-inbound nodes.

lightclient · 2024-05-25T06:57:45Z

Is it correct to update the IP on a non-inbound add? This causes the test TestTable_addSeenNode to fail.

fjl · 2024-05-25T11:24:48Z

I've thought about it a bunch more, and it used to be correct to not update the endpoint for found nodes, especially since the endpoint information is not authenticated in discv4. However, it's different in discv5 with ENRs. If we find a newer record, we want to put it into the table. I think we should resolve this by adding a check for the sequence number. When we encounter a newer record, update it, regardless of req.isInbound. As a special case we can also update the endpoint if req.isInbound && node.Seq() == 0 for discv4.

fjl · 2024-05-25T11:25:40Z

Pretty happy you looked into this because I didn't see these issues.

Mazzika1 · 2024-05-25T11:26:10Z

Thank you

lightclient · 2024-05-25T13:06:28Z

Sure! I can update it.

fjl · 2024-05-26T10:17:52Z

Node refactoring is here #29844. It should fix the race, and this PR can just be about the handleAddNode thing.

lightclient · 2024-05-27T12:39:13Z

Updated.

lightclient · 2024-05-28T11:57:47Z

p2p/discover/table.go

+	// it is allowed to update its own entry.
+	n = b.entries[i]
+	isUpdate := newRecord.Seq() > n.Seq()
+	isDiscv4Update := n.Seq() == 0 && newRecord.Seq() == 0 && isInbound


For discv5 the message has already been authenticated by now, correct? So that means only the actual node could double sign sequence 0?

Is it an issue that now in discv5 inbound contacts can update their endpoint by replaying sequence 0?

I don't think it's an issue. We could always be even stricter and check if it's an unsigned record. However, that seems excessive. We previously accepted all inbound updates, even ones with a lower sequence number, and in fact this is something I am still considering adding back. Endpoint information provided by the node itself can probably always be considered valid.

One thing I'm still exploring here is the interaction with revalidation. If a node changes endpoint via this update path, we should also move it to the fast reval queue again. This requires rethinking the queue management a bit.
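
As a sketch of what "move it to the fast reval queue again" could look like, assume revalidation keeps two lists of node IDs, one checked frequently and one checked rarely. The `revalidation` struct, its `fast` and `slow` fields, and `endpointChanged` are hypothetical names for illustration, not the actual queue implementation.

```go
package main

import "fmt"

// revalidation is a hypothetical two-list scheduler: nodes on the fast list
// are checked often, nodes on the slow list rarely.
type revalidation struct {
	fast []string
	slow []string
}

// remove returns list without the first occurrence of id.
func remove(list []string, id string) []string {
	for i, v := range list {
		if v == id {
			return append(list[:i], list[i+1:]...)
		}
	}
	return list
}

// contains reports whether id is in list.
func contains(list []string, id string) bool {
	for _, v := range list {
		if v == id {
			return true
		}
	}
	return false
}

// endpointChanged puts id back on the fast list so the new endpoint is
// checked soon. It is a no-op for the fast list if id is already there.
func (r *revalidation) endpointChanged(id string) {
	r.slow = remove(r.slow, id)
	if !contains(r.fast, id) {
		r.fast = append(r.fast, id)
	}
}

func main() {
	r := &revalidation{fast: []string{"n1"}, slow: []string{"n2", "n3"}}
	r.endpointChanged("n3")
	fmt.Println("fast:", r.fast)
	fmt.Println("slow:", r.slow)
}
```

Keeping a node on exactly one list at a time is the invariant the queue management has to preserve when an update arrives between revalidation rounds.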

It seems the semantic differences between addFoundNode and addInboundNode were lost in ethereum#29572. My understanding is that addFoundNode is for a node you have not contacted directly (and whose availability is unknown), whereas addInboundNode is for nodes that have contacted the local node, so we can verify they are active. handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in the bucket (updating its IP address) even if the node was not inbound. This PR fixes this. It wasn't originally caught in tests like TestTable_addSeenNode because the manipulation of the node object actually modified the node value used by the test. New logic is added to reject non-inbound updates unless the sequence number of the (signed) ENR increases. Inbound updates, which are published by the updated node itself, are always accepted. If an inbound update changes the endpoint, the node will be revalidated on an expedited schedule. Co-authored-by: Felix Lange <fjl@twurst.com>

lightclient requested review from fjl and zsfelfoldi as code owners May 24, 2024 21:27

fjl changed the title ~~p2p/discover: copy enode in node wrapper to avoid race~~ p2p/discover: fix update logic in handleAddNode May 26, 2024

lightclient force-pushed the discover-race-fix branch from ae7d078 to 8e3b58f Compare May 27, 2024 12:37

lightclient and others added 6 commits May 28, 2024 18:22

p2p/discover: don't update endpoint for unauthenticated nodes in discv4

7afcb39

p2p/discover: be more strict about sequence updates

413bdb0

p2p/discover: reset isValidatedLive on inbound endpoint update

41141b5

p2p/discover: add more node update tests

9d0301d

p2p/discover: allow inbound updates with any seq

ddcabb1

p2p/discover: move node to fast revalidation list on inbound endpoint…

3148497

… update

fjl force-pushed the discover-race-fix branch from a7101a1 to 3148497 Compare May 28, 2024 16:54

fjl added 6 commits May 28, 2024 19:02

p2p/discover: update comment

6cab7dd

p2p/discover: fix test

8570ef0

p2p/discover: update comment

0d8ef1f

p2p/discover: gofmt

6699805

p2p/discover: update comment in test

8d3df60

p2p/discover: add test for revalidation endpoint change

2c06042

fjl merged commit cc22e0c into ethereum:master May 28, 2024
2 of 3 checks passed

fjl added this to the 1.14.4 milestone May 28, 2024

fjl mentioned this pull request May 28, 2024

p2p/discover: refactor node and endpoint representation #29844

Merged

This was referenced Jun 5, 2024

ethereum 1.14.4 Homebrew/homebrew-core#173743

Merged

ethereum 1.14.5 Homebrew/homebrew-core#173874

Merged

pratikspatil024 mentioned this pull request Jun 13, 2024

p2p: cherry-pick commits from geth for peering issues maticnetwork/bor#1267

Merged

18 tasks
