Skip to content
This repository has been archived by the owner on Jun 12, 2023. It is now read-only.

Prevent relayed validators from heartbeating #1078

Merged
merged 1 commit into from May 16, 2022

Conversation

PaulVMo
Copy link
Contributor

@PaulVMo PaulVMo commented Sep 23, 2021

Relayed validators hurt the performance of the consensus group because they are not directly reachable by other CG members. In order to prevent them from being elected to consensus group, this change adds a check in the heartbeat logic for a public listening address for the validator in the peerbook and skips the heartbeat if there is not one.

@abhay
Copy link
Contributor

abhay commented Sep 23, 2021

I don't disagree with the spirit of this but you still can't trust a validator not to pretend and submit heartbeats anyway on a separate process or by forking the codebase. It's certainly more work but possible.

I wonder if there's an alternate approach (maybe a bigger change) that the current CG doesn't accept the transaction unless the submitter is dialable? Or some kind of reverse heartbeat for external validation?

@evanmcc @Vagabond ?

@PaulVMo
Copy link
Contributor Author

PaulVMo commented Sep 23, 2021

Thanks Abhay. I totally agree that this needs a more robust approach for the long-term - perhaps what you are suggesting regarding the CG not accepting the heartbeat.

In the near term, I believe this will be helpful as the relayed validators are likely unknowingly misconfigured, not nefariously so. Sure they could fork the miner and heartbeat anyway, but wouldn't it just be easier to fix your network config at that point?

@Vagabond
Copy link
Member

I'd suggest a better approach would be to make relcast close connections from peers who only have a relay address in their peerbook. I think this will help accidentally misconfigured hosts but is not adversarially resistant.

@abhay
Copy link
Contributor

abhay commented Sep 23, 2021

I'd suggest a better approach would be to make relcast close connections from peers who only have a relay address in their peerbook. I think this will help accidentally misconfigured hosts but is not adversarially resistant.

I like this but it doesn't solve for the misconfigured accounts that use a (technically public) IP space for their private networks. IIRC the 16.0.0.0/8 range was used by a large wallet for a long period of time until they fixed it.

I think @PaulVMo liked to call them Mr. 305 or something? Or maybe that's just a rapper.

@PaulVMo
Copy link
Contributor Author

PaulVMo commented Sep 23, 2021

I'd suggest a better approach would be to make relcast close connections from peers who only have a relay address in their peerbook. I think this will help accidentally misconfigured hosts but is not adversarially resistant.

Does this approach prevent them from getting elected in the first place or does it cause them to fail the DKG? I do not really like approaches that rely on them failing DKG because it hurts validator earnings.

Thankfully, Mr 328 has fixed their 16.x and 12.x validators. Neither approach would have stopped that poor configuration. A more complex check like a reverse heartbeat would.

@Vagabond
Copy link
Member

Perhaps the heartbeat should include the IP they must be dialed on and we'd bypass the peerbook when trying to dial them? I feel like this stuff gets pretty complicated fast because it's tough to verify.

@abhay
Copy link
Contributor

abhay commented Sep 23, 2021

I do not really like approaches that rely on them failing DKG because it hurts validator earnings.

Said slightly differently to make it less about rewards (but I understand the sentiment), one validator's misconfiguration shouldn't penalize other unwitting and correctly configured validator.

@Vagabond
Copy link
Member

Overall my opinion here is to build out better signals of connectivity (light hotspots using the node as a validator, # of valid transactions the node proposes, etc) than trying to solve this problem at this level. Verifying 'connectivity' is a very subjective and tricky thing to do fairly and reliably.

@PaulVMo
Copy link
Contributor Author

PaulVMo commented Sep 23, 2021

Overall my opinion here is to build out better signals of connectivity (light hotspots using the node as a validator, # of valid transactions the node proposes, etc) than trying to solve this problem at this level. Verifying 'connectivity' is a very subjective and tricky thing to do fairly and reliably.

I agree with this. In the mean time however (since this sound like a lot of work), what's the harm in having validators self-police this by not heartbeating if they are relayed? Sure this can be cheated. But, what validator who is too lazy to setup their networking correcting is going to take the time to fork the miner and bypass this check?

@Vagabond
Copy link
Member

I don't object to this PR and I think you're right that it will help. I do object to some of the more complicated things being proposed like 'reverse heartbeating' or 'verified connectivity'.

@Vagabond
Copy link
Member

One wrinkle with this PR is we need an escape hatch when running tests.

@PaulVMo
Copy link
Contributor Author

PaulVMo commented Sep 23, 2021

One wrinkle with this PR is we need an escape hatch when running tests.

Could I do something like -ifdef(TEST). set some flag to true -else. set it false -endif. And then check that value in addition to the IP address? Or is there another way to test for TEST directly?

@Vagabond
Copy link
Member

Probably the best way is to define two versions of the function, depending on the value of TEST. Erlang macro ifdefs cannot appear in the middle of a function.

@PaulVMo
Copy link
Contributor Author

PaulVMo commented Sep 23, 2021

Please let me know what you think of my fix for bypassing the check when TEST is set. I don't love that it still does the work to get the listen addresses even when TEST is true; however, I thought this was more readable than have two versions of the entire function.

@OrmEmbaar
Copy link

Thanks for this PR, Paul. I agree that it is a reasonable measure that will encourage fundamentally honest yet misconfigured validators to rectify their network config. We have routinely seen how poorly configured validators can spread their malaise to other members of the CG and anything to address that is welcome.

I'm not familiar with Erlang or the Helium codebase, but I am reassured that this is a small PR. The risk/reward seems worth it.

@ericjohncarlson
Copy link

I agree with this PR as well. As stated above, it's a band-aid and there can be more elegant solutions downstream, but for now it will at least create a giant "OFFLINE" tag in explorer and hopefully prompt questions in Discord or Github to help identify and assist validator operators who are relayed.

@deasydoesit
Copy link

I also agree with this PR.

Currently, there are 36 validators on the network with only a p2p-circuit address. This collection of validators has a median performance+dkg/tenure ratio of 0.29, while the network wide median is 1.31.

At best, the relationship between having a p2p-circuit address and significantly better-than-average penalty performance is a coincidence. At worst, however, the relationship may suggest that a penalty benefit can be achieved by running undialable validators. Such a vector could allow an exploitative operator the opportunity to increase earnings at the expense of the health of the network.

But even assuming it is indeed a coincidence, undialable validators operate at the determinant to other peers in CG by refusing necessary connections for performing CG functions, such as completion of RBC, and thus degrade network performance.

While the remedy proposed here is a temporary fix, it is a certainly better than nothing, and will help good faith operators running misconfigured validators to change their behavior, while also potentially elucidating bad faith actors.

@PaulVMo
Copy link
Contributor Author

PaulVMo commented May 3, 2022

Rebased to include light hotspot changes

@PaulVMo
Copy link
Contributor Author

PaulVMo commented May 9, 2022

Friendly reminder on this @evanmcc and @Vagabond. I think this would be a great addition to avoid lazy/misconfigured validations with the light hotspot activation. Relayed validators are useless to hotspots. This will be a signal to fix their configuration.

@madninja madninja merged commit 1644659 into helium:master May 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants