Prevent relayed validators from heartbeating #1078
I don't disagree with the spirit of this, but you still can't trust a validator not to pretend and submit heartbeats anyway, either from a separate process or by forking the codebase. It's certainly more work, but it's possible. I wonder if there's an alternate approach (maybe a bigger change) in which the current CG doesn't accept the transaction unless the submitter is dialable? Or some kind of reverse heartbeat for external validation?
Thanks Abhay. I totally agree that this needs a more robust approach for the long-term - perhaps what you are suggesting regarding the CG not accepting the heartbeat. In the near term, I believe this will be helpful as the relayed validators are likely unknowingly misconfigured, not nefariously so. Sure they could fork the miner and heartbeat anyway, but wouldn't it just be easier to fix your network config at that point?
I'd suggest a better approach would be to make relcast close connections from peers who only have a relay address in their peerbook. I think this will help accidentally misconfigured hosts but is not adversarially resistant.
I like this but it doesn't solve for the misconfigured accounts that use a (technically public) IP space for their private networks. IIRC the 16.0.0.0/8 range was used by a large wallet for a long period of time until they fixed it. I think @PaulVMo liked to call them Mr. 305 or something? Or maybe that's just a rapper.
Does this approach prevent them from getting elected in the first place or does it cause them to fail the DKG? I do not really like approaches that rely on them failing DKG because it hurts validator earnings. Thankfully, Mr 328 has fixed their 16.x and 12.x validators. Neither approach would have stopped that poor configuration. A more complex check like a reverse heartbeat would.
Perhaps the heartbeat should include the IP they must be dialed on and we'd bypass the peerbook when trying to dial them? I feel like this stuff gets pretty complicated fast because it's tough to verify.
Said slightly differently to make it less about rewards (but I understand the sentiment), one validator's misconfiguration shouldn't penalize other unwitting and correctly configured validators.
Overall my opinion here is to build out better signals of connectivity (light hotspots using the node as a validator, # of valid transactions the node proposes, etc.) rather than trying to solve this problem at this level. Verifying 'connectivity' is a very subjective and tricky thing to do fairly and reliably.
I agree with this. In the meantime, however (since this sounds like a lot of work), what's the harm in having validators self-police this by not heartbeating if they are relayed? Sure, this can be cheated. But what validator who is too lazy to set up their networking correctly is going to take the time to fork the miner and bypass this check?
I don't object to this PR and I think you're right that it will help. I do object to some of the more complicated things being proposed like 'reverse heartbeating' or 'verified connectivity'.
One wrinkle with this PR is we need an escape hatch when running tests.
Could I do something like -ifdef(TEST). set some flag to true -else. set it false -endif. And then check that value in addition to the IP address? Or is there another way to test for TEST directly?
Probably the best way is to define two versions of the function, depending on the value of TEST. Erlang macro ifdefs cannot appear in the middle of a function.
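For illustration, the "two versions of the function" suggestion could look roughly like the sketch below. This is not the code from the PR: the module name `heartbeat_guard`, the function `should_heartbeat/1`, and the assumption that listen addresses arrive as multiaddr strings are all hypothetical.

```erlang
-module(heartbeat_guard).
-export([should_heartbeat/1]).

%% Hypothetical sketch: ListenAddrs is assumed to be a list of multiaddr
%% strings, e.g. "/ip4/203.0.113.7/tcp/2154" for a direct address or
%% "/p2p/<relay>/p2p-circuit/p2p/<self>" for a relayed one.

-ifdef(TEST).
%% Test builds: always heartbeat, so test nodes that only have
%% loopback or relay addresses are not blocked by the check.
should_heartbeat(_ListenAddrs) ->
    true.
-else.
%% Normal builds: heartbeat only if at least one direct address exists.
should_heartbeat(ListenAddrs) ->
    lists:any(fun is_direct_addr/1, ListenAddrs).

%% Treat any "/ip4/..." multiaddr as direct; everything else as relayed.
is_direct_addr("/ip4/" ++ _) -> true;
is_direct_addr(_) -> false.
-endif.
```

Because the `-ifdef`/`-else`/`-endif` directives sit between complete function definitions, the preprocessor accepts them. The helper lives inside the `-else` branch so that test builds do not emit an unused-function warning.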
Please let me know what you think of my fix for bypassing the check when TEST is set. I don't love that it still does the work to get the listen addresses even when TEST is true; however, I thought this was more readable than having two versions of the entire function.
Thanks for this PR, Paul. I agree that it is a reasonable measure that will encourage fundamentally honest yet misconfigured validators to rectify their network config. We have routinely seen how poorly configured validators can spread their malaise to other members of the CG and anything to address that is welcome. I'm not familiar with Erlang or the Helium codebase, but I am reassured that this is a small PR. The risk/reward seems worth it.
I agree with this PR as well. As stated above, it's a band-aid and there can be more elegant solutions downstream, but for now it will at least create a giant "OFFLINE" tag in explorer and hopefully prompt questions in Discord or GitHub to help identify and assist validator operators who are relayed.
I also agree with this PR. Currently, there are [...] At best, the relationship between having a [...] But even assuming it is indeed a coincidence, undialable validators operate to the detriment of other peers in the CG by refusing connections that are necessary for performing CG functions, such as completion of RBC, and thus degrade network performance. While the remedy proposed here is a temporary fix, it is certainly better than nothing, and will help good-faith operators running misconfigured validators to change their behavior, while also potentially exposing bad-faith actors.
Rebased to include light hotspot changes
Relayed validators hurt the performance of the consensus group because they are not directly reachable by other CG members. To prevent them from being elected to the consensus group, this change adds a check to the heartbeat logic: it looks for a public listening address for the validator in the peerbook and skips the heartbeat if there is none.
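A minimal sketch of what a "public listening address" check might look like follows. It is an illustration under stated assumptions, not the implementation in this PR: the module `public_addr_sketch`, its function names, and the multiaddr string format are hypothetical, and the list of non-public ranges (RFC 1918, loopback, link-local, CGNAT) is one reasonable choice rather than the one the PR uses.

```erlang
-module(public_addr_sketch).
-export([has_public_addr/1, is_public_ip/1]).

%% Hypothetical sketch: ListenAddrs is assumed to be a list of multiaddr
%% strings such as "/ip4/203.0.113.7/tcp/2154".

%% True if at least one listen address is a direct, non-private IPv4 address.
has_public_addr(ListenAddrs) ->
    lists:any(fun is_public_multiaddr/1, ListenAddrs).

%% Split the multiaddr on "/" and inspect the IPv4 component, if any.
is_public_multiaddr(Addr) ->
    case string:lexemes(Addr, "/") of
        ["ip4", IPStr | _] ->
            case inet:parse_ipv4strict_address(IPStr) of
                {ok, IP} -> is_public_ip(IP);
                {error, _} -> false
            end;
        _ ->
            %% Relay (p2p-circuit) and other address types are not direct.
            false
    end.

%% Reject well-known non-routable ranges; accept everything else.
is_public_ip({0, _, _, _}) -> false;                               %% "this network"
is_public_ip({10, _, _, _}) -> false;                              %% RFC 1918
is_public_ip({127, _, _, _}) -> false;                             %% loopback
is_public_ip({169, 254, _, _}) -> false;                           %% link-local
is_public_ip({172, B, _, _}) when B >= 16, B =< 31 -> false;       %% RFC 1918
is_public_ip({192, 168, _, _}) -> false;                           %% RFC 1918
is_public_ip({100, B, _, _}) when B >= 64, B =< 127 -> false;      %% CGNAT
is_public_ip({_, _, _, _}) -> true.
```

A check of this shape only looks at the address range, so `has_public_addr(["/ip4/16.0.0.5/tcp/2154"])` is accepted even if the operator is using 16.0.0.0/8 as a private network. That is the limitation raised in the discussion: it catches accidentally relayed validators but not public-looking private addressing.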