Provide an event reader API optimized for reading events for a single routing key #4087

Closed
claudiofahey opened this issue Aug 9, 2019 · 1 comment
@claudiofahey
Contributor

Problem description
Consider an example where hundreds of cameras are writing video to a single Pravega stream with dozens of segments. The routing key would be the camera ID. One use case is to perform analytics on the entire stream (all cameras) using Flink. This is covered perfectly by the current event reader API. However, another use case on the same stream is to read the video of just a single camera and display it on the screen or perform ad hoc analysis on it. The current event reader API will not work here because it would require reading all segments, something that a simple single-threaded app cannot do.

I propose a new event reader API that accepts as input the stream name and the routing key. It will then return only those events that are in the segments that contain that routing key. This will limit the quantity of events that a reader would need to read to those of a single segment.
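To make the proposal concrete, here is a minimal sketch of the lookup it implies: hash the routing key to a point in [0, 1), find the segment whose key range covers that point, and read only that segment. Everything in it is an assumption for illustration: the SHA-256 stand-in hash, the `upper_bounds` representation of segment key ranges, and the function names are hypothetical and are not the Pravega client API.

```python
import hashlib
from bisect import bisect_right


def key_position(routing_key: str) -> float:
    """Map a routing key to a point in [0, 1) (stand-in hash, not Pravega's)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def segment_for_key(routing_key: str, upper_bounds: list[float]) -> int:
    """Return the index of the segment whose key range contains the key.

    upper_bounds holds each segment's exclusive upper bound, sorted, ending in 1.0.
    """
    return bisect_right(upper_bounds, key_position(routing_key))


def events_for_key(events, routing_key, upper_bounds):
    """Yield only events stored in the segment that owns routing_key.

    events is an iterable of (segment_index, key, payload) tuples.
    """
    wanted = segment_for_key(routing_key, upper_bounds)
    for segment, key, payload in events:
        if segment == wanted:
            yield key, payload
```

Note that the reader still receives events for other routing keys that share the segment, so the caller would filter on the key itself. The sketch also ignores scaling, where a key moves to a successor segment over time.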

An alternative solution would be to use multiple streams, each with a small number of segments (e.g. cameras1-4, cameras5-8, cameras9-12). This is not ideal, though, because apps that do need to read all cameras would have to determine the names of all of the streams. Adding and removing cameras, as well as changing data rates, could be difficult. These are problems that Pravega solves very well when a single large stream is used.

Problem location
Pravega client event reader

Suggestions for an improvement
See above.
Bonus: Instead of limiting reads to a single routing key, sometimes a reader will want to read the segments of 2 or 3 routing keys. When the number of routing keys to read is smaller than the number of segments, this will still be better than reading the entire stream.
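As a rough way to size the benefit of the multi-key case, the sketch below estimates how many distinct segments a reader touches for a given number of keys. It assumes keys hash uniformly and all segments cover equal key ranges, which real streams may not satisfy; the function name is hypothetical.

```python
def expected_segments(num_segments: int, num_keys: int) -> float:
    """Expected number of distinct segments hit by num_keys routing keys.

    Assumes uniform hashing over equal-width segments. Each segment is
    missed by one key with probability (1 - 1/n), so it is hit by at
    least one of k keys with probability 1 - (1 - 1/n)**k.
    """
    miss_all = (1 - 1 / num_segments) ** num_keys
    return num_segments * (1 - miss_all)
```

Under these assumptions, reading 3 keys from a 24-segment stream touches about 2.9 segments on average, compared with all 24 for the current reader.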

@tkaitchuck
Member

One way to do this would be to have a reader-side model where readers are 1:1 with segments, so every time there is a scaling event the number of readers changes. This would require a different API to manage this sort of dynamic creation of readers.
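A minimal sketch of what managing readers 1:1 with segments might look like: a pool that is told the current set of active segments after each scaling event and creates or closes readers to match. The class names, the `on_segments_changed` callback, and the stub reader are hypothetical and not part of any existing Pravega API. The sketch also closes readers immediately, whereas a real implementation would presumably drain a sealed segment before moving to its successors.

```python
class StubReader:
    """Placeholder for a per-segment reader; only tracks open/closed state."""

    def __init__(self, segment):
        self.segment = segment
        self.closed = False

    def close(self):
        self.closed = True


class SegmentReaderPool:
    """Keeps exactly one reader per active segment."""

    def __init__(self, make_reader=StubReader):
        self._make_reader = make_reader
        self._readers = {}

    def on_segments_changed(self, active_segments):
        """Reconcile readers with the segment set after a scaling event."""
        active = set(active_segments)
        # Close readers whose segments are no longer active.
        for segment in list(self._readers):
            if segment not in active:
                self._readers.pop(segment).close()
        # Create readers for newly active segments.
        for segment in active:
            if segment not in self._readers:
                self._readers[segment] = self._make_reader(segment)
        return sorted(self._readers)
```

For example, if segment 1 splits into segments 2 and 3, calling `on_segments_changed([0, 2, 3])` closes the reader for segment 1 and opens two new ones, while the reader for segment 0 is left untouched.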

On the writer side it is possible to create a model where keys map directly to segments. (This would obviously be bad if there are a very large number of keys.)

Taken together it would essentially provide a dynamic group of streams that are all related.
