Conversation
Treat disconnected state the same as full-sync DHT
fn is_autonomous_node(&mut self) -> bool { | ||
self.is_full_sync_DHT || !self.connection_ready() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about race conditions.
- connection_ready() could flip in between when it is checked and when the message is actually handled
- there is a delay between send_wire_message putting a message on the queue and when it actually gets sent
if these are valid concerns, i wonder if there is a unified way to deal with the online and the offline case that doesn't require a condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maackle if valid concerns, would they also apply to the existing code in the boundary between full sync and sharded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, currently, if you were connected and then you loose connection, it will just stop working.
With these changes, a query that is send just when loosing connection might not work, but if the user continues to use a hApp UI which then makes the DNA issue more queries, those would then get routed back to the local instance which can then respond.
If the node then reconnects to sim2h again, the server will request authoring and gossip lists and make sure that new data gets shared in the DHT. So, it should be all fine.. But we also should add tests for all of this. That would have prevented the regression for this is a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i approve this provisionally (if we can get some tests that would be great)
i agree that what @maackle was pointing to re: actions being safe under various network states and even transitions between network states happening concurrently to some action, is really important, seems like a followup though
the current state of things is:
0 peers (offline) = shard
1-sharding peers = full sync
sharding+ peers = shard
this doesn't make sense to me at a really high level, it completely breaks hApps when they lose connection to the network :(
it is compounded by the lack of wss
support on sim2h currently, if anyone wants to test code that is supposed to run on a private network they can't use ws
and they can't run the happ offline, so users doing testing need to spin up their own sim2h locally which is clunky :(
i think the intent of the "full sync below this number of peers" logic must include zero peers also, given that full sync already works offline (even if technically it doesn't work during the moment we transition to/from offline) :)
PR summary
Treat disconnected state the same as full-sync DHT
testing/benchmarking notes
( if any manual testing or benchmarking was/should be done, add notes and/or screenshots here )
followups
( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )
changelog
documentation