Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Light Client refactoring #237

Merged
merged 102 commits into from
May 27, 2020
Merged

Light Client refactoring #237

merged 102 commits into from
May 27, 2020

Conversation

romac
Copy link
Member

@romac romac commented Apr 24, 2020

Closes: #230
Closes: #229
Closes: #174

Here's take 2 of the light client spike.

This version follows pretty much the same decomposition as in the ADR (minus the Demuxer for lack of time), and I find It much more readable and understandable than the former one. It also drops the predicate library in favor of plain functions returning Result<(), Error> packed in a trait.

The big downside currently is that the components are now coupled (ie. the Scheduler has a reference to the Verifier and the Rpc components).

I haven’t found a way to decouple them without having to extract some of the logic into what would be the Demuxer router/event-loop.

I believe that the only way to get both decoupling + simple control-flow would be to go with async/await and have components communicate via channels with the Demuxer, and wait for specific responses. I will try to sketch something about that this weekend so that we can discuss both versions on Monday.


  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGES.md

@brapse
Copy link
Contributor

brapse commented Apr 27, 2020

Just to ensure that I understand what we are trying to solve here. The previous light-spike defined the light-client process as an iterator where each iteration would handle a single event processed by a single component and multiple iterations were necessary to complete a “flow”. This granular process made it very testable but hard to read as your brain had to keep track of multiple iteration to see the whole flow.

This is definitely a valid criticism. We defined it inductively which can be a challenge to read. If it takes three iterations through three different components to get anything meaningful done then we constantly have three things in our head.

In this version we get rid of the iterative nature and just call a function which passes control through the different components. It seems similar although clearer than the current version on master. As you mentioned, we keep the components coupled here which I would say is a significant downside.

I don’t know if async would help here or if we need more complex channel communication. Instead I think we just need to solve the granularity problem directly. With a demuxer (as outlined in ADR-006 which handles state as well routing control between components. This way we can have clear separation of concerns and execute entire flows as a single synchronous function calls.

image

I hope this approach might be the best of both worlds in terms of testability and readability while using minimal language features.

@romac
Copy link
Member Author

romac commented Apr 27, 2020

Here's take 3: https://github.com/informalsystems/tendermint-rs/tree/romac/light-spike-chan/light-spike

This version uses async channels to decouple components, but can still be run synchronously on a single thread.

Doing this gave me an idea for decoupling components without relying on async-await, so expect yet another take on this soon.

@romac
Copy link
Member Author

romac commented Apr 27, 2020

Take 4: https://github.com/informalsystems/tendermint-rs/tree/romac/light-spike-sync-decouple/light-spike

This version removes all async/await from take 3, and replaces the BiChan with a Router trait which can then be passed to the various components for them to call when they need to query other components. This trait is implemented for the Demuxer, but could be implemented for a standalone struct for the purpose of mocking/testing/etc.

@romac
Copy link
Member Author

romac commented Apr 28, 2020

Just updated this PR to take 4.

@romac romac force-pushed the romac/light-spike-bis branch 2 times, most recently from e233d8d to 0e58ab3 Compare May 5, 2020 16:21
@romac romac requested a review from brapse May 25, 2020 14:00
@romac romac mentioned this pull request May 25, 2020
19 tasks

pub struct ProdIo {
rpc_clients: HashMap<PeerId, rpc::Client>,
peer_map: HashMap<PeerId, tendermint::net::Address>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have the abstraction of peerIDs here from the onset 👍

#[pre(light_store.latest(VerifiedStatus::Verified).is_some())]
#[post(valid_schedule(ret, target_height, next_height, light_store))]
pub fn schedule(
light_store: &dyn LightStore,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass the whole store if all we need is the latest trusted height?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to match the spec, as @josef-widder suggested the scheduler could perform some optimizations as to what height to verify next based on what headers of nearby heights are already available in the store, but this is not implemented yet.

// -----------------------------------------------------------------------------
// Everything below is a temporary workaround for the lack of `provider` field
// in the light blocks serialized in the JSON fixtures.
// -----------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @greg-szabo @Shivani912 this is relevant to testing / serialization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be implementation specific to me. Go code has a different approach to identify a node as a primary/secondary provider and does not deal with addresses. But also, this approach looks cleaner and reduces a lot of data from test files. Will need some more work around testing of Go code though. Need more thoughts here! @greg-szabo @liamsi @melekes

prost-amino = "0.5.0"
contracts = "0.4.0"
sled = "0.31.0"
serde_cbor = "0.11.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another binary encoding 😱 😄 I assume this is the most reasonable choice to use in combination with sled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I didn't give it too much thought. I initially considered just serializing keys and values to JSON and then to bytes, but that seemed a bit wasteful, so I figured using a dedicated binary encoding was better. In the end, aside from the extra dependencies, the choice of encoding does not matter since it is internal to the light store and specific to the choice of sled as a database. But I'd be happy to discuss alternatives, either for the binary encoding, or for sled :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but couldnt prost work here for proto3?

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks amazing ❤️

Copy link
Contributor

@brapse brapse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great. Let's do the rest in the follow ups 👍

@brapse brapse merged commit 1ba8a2c into master May 27, 2020
@brapse brapse deleted the romac/light-spike-bis branch May 27, 2020 13:10
@romac romac mentioned this pull request Jun 9, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants