Stream standards #31

Closed
djrtwo opened this issue Apr 20, 2020 · 3 comments · Fixed by #73

@djrtwo
Contributor

djrtwo commented Apr 20, 2020

It is desirable in many contexts to support streaming as a first-class citizen in this API. Streaming/event-style functionality has already proven useful in eth1 and is currently used by consumers of both the prysm API and lighthouse websockets.

[Actual protocol aside] Two methods have been discussed for supporting this in the API: dedicated stream endpoints, or a single event stream.

  • stream endpoints
    • designates /stream versions of a resource endpoint -- e.g. /resources/stream
    • pro: allows for explicit, granular control in the API to designate which resources are valuable to stream
    • con: adds overhead in maintaining as there are now poll and stream versions of (potentially) many endpoints
  • event stream
    • an alternative brought up by @mcdee is to have one stream endpoint that sends events to the client. The client can then decide if/how to query info related to the events (e.g. the client receives EVENT new_block 0xffffff and then calls /beacon/chain/block/0xffffff to retrieve the block it was alerted about); see the sketch after this list
    • pro: isolates stream functionality to one very specific place. arguably easier to maintain for client than several separate stream endpoints
    • pro: keeps core API simpler by not having duplicate endpoints for stream vs not
    • con: must define/maintain a separate list of events to be supported
    • open question: does the stream allow subscription to subsets of events? If so, the server must maintain a routing table.
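
A minimal sketch of the event-stream pattern referenced in the list above, assuming the stream is delivered as Server-Sent Events (the issue deliberately leaves the actual protocol open) and using the hypothetical /stream path and new_block event name from the example:

```ts
// Sketch only: the SSE transport and the /stream path are assumptions,
// not anything specified by this issue.
const events = new EventSource("/stream");

events.addEventListener("new_block", async (e) => {
  // The event carries only an identifier (e.g. "0xffffff"); the client
  // decides whether to fetch the full object via the ordinary endpoint.
  const root = (e as MessageEvent).data;
  const block = await fetch(`/beacon/chain/block/${root}`).then((r) => r.json());
  console.log("new block", block);
});
```
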
@mkalinin

Clients will likely have to support filtering/routing if they follow either of these two approaches. For instance, eth_newFilter provides that ability, and the new API should preserve it, I guess. Streaming more or less turns into a pub/sub pattern in our case.

There are basically only a few types of objects that users would be able to subscribe to:

  • Beacon blocks
  • Beacon operations
  • Shard blocks
  • User transactions

IMO, handling them all through a single endpoint, as suggested by @mcdee, is more convenient. With separate streams there could be an issue with muxing the data coming from different streams in parallel in the user's code; having a single stream would get rid of this potential issue.
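
As a sketch of the muxing point: with one stream the client simply dispatches on event type rather than coordinating several concurrent connections. The event names are illustrative and SSE is again an assumed transport:

```ts
// Single connection; no client-side muxing of parallel streams needed.
const stream = new EventSource("/stream"); // hypothetical single endpoint

// Placeholder handlers for the object types listed above.
const handleBlock = (b: unknown) => console.log("beacon block", b);
const handleOperation = (o: unknown) => console.log("beacon operation", o);

stream.addEventListener("beacon_block", (e) =>
  handleBlock(JSON.parse((e as MessageEvent).data)));
stream.addEventListener("beacon_operation", (e) =>
  handleOperation(JSON.parse((e as MessageEvent).data)));
```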

@rolfyone
Collaborator

I would have thought it'd be a lot easier on the server side to have clients subscribe to specific services; that way you can gather metrics etc. on what people are and aren't using, and send events on specific topics only to those that need them, rather than broadcasting to every client.
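
One way this could look (purely illustrative; neither the query parameter nor the topic names are specified anywhere in this issue) is topic selection at subscription time, so the server knows exactly which events each client wants:

```ts
// Client asks only for the topics it cares about; the server can count
// subscriptions per topic and skip clients that never asked for an event.
const topics = ["new_block", "attestation"]; // hypothetical topic names
const events = new EventSource(`/stream?topics=${topics.join(",")}`);
events.onmessage = (e) => console.log(e.data);
```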

@mcdee
Contributor

mcdee commented Apr 24, 2020

Streaming of data could be considered easier for clients, but not so much for servers. The problem with a specific stream for each item is that there would be a lot of them, and each requires some logic. Take an example where 10 deposits have been made and the user wants to be notified when they are added to state; there would need to be:

  • a streaming endpoint for validator information (all information? Just an identifier?)
  • with some sort of filtering (to avoid obtaining updates for all validators when we only care about 10)
  • and the ability to dynamically update the filter (in case the user added another deposit, a deposit proved to be invalid, etc.)

All possible, of course, but a fair amount of effort, and duplicated nearly-but-not-quite-the-same code (logic for a validator filter isn't the same as a block filter, for example).

If instead we had a streaming endpoint for server events, the user would need to listen to the endpoint for "head updated" messages, check to see if the head is the start of a new epoch, and pull its validator information from the REST endpoint (either individually or en masse, depending on where we end up with the validator info endpoint); a sketch follows below. This requires less code for the server (only one streaming endpoint) and potentially less sophistication from the client (a single stream to listen to, less state needed in case of a restart, no need to update filters on the fly, etc.)
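
A sketch of that flow, with several assumptions not fixed by this issue: SSE transport, a "head" event carrying a slot number, SLOTS_PER_EPOCH = 32, and an illustrative /beacon/validators/{id} endpoint (the real validator endpoint shape is still under discussion):

```ts
const SLOTS_PER_EPOCH = 32; // mainnet value; an assumption here

// Identifiers for the validators created by our 10 deposits (placeholders).
const myValidators: string[] = [/* validator pubkeys or indices */];

const events = new EventSource("/stream");
events.addEventListener("head", async (e) => {
  const { slot } = JSON.parse((e as MessageEvent).data);
  if (slot % SLOTS_PER_EPOCH !== 0) return; // only check at epoch starts
  // Pull validator state through the plain REST endpoint instead of a
  // dedicated, filterable validator stream.
  for (const id of myValidators) {
    const info = await fetch(`/beacon/validators/${id}`).then((r) => r.json());
    console.log(id, info);
  }
});
```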

Streaming full data also stops us from using some features of the HTTP infrastructure that would otherwise be very useful, such as caching. If 1,000 clients were all streaming the latest block, the server would need to send 1,000 blocks each time a new block was received. With the event-stream method, all that would be sent is the block root, reducing load. If the server had a cache in front of it, it would serve the first client request for the block and the other 999 would be served by the cache.

I started off thinking that streaming everything as a first-class citizen was important, but I'm more inclined now to believe that a model that streams only the minimum required (indicators such as "block received", transactions such as "voluntary exit" and "attestation", along with relevant data), with a single filter to only receive the events that the client wants to hear about, will provide both better functionality for the client and less work for the server (and devs).
