Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(storage): expose ability to peek on stream readers #14901

Merged
merged 1 commit into from
Sep 4, 2019

Conversation

GeorgeMac
Copy link
Contributor

@GeorgeMac GeorgeMac commented Sep 2, 2019

Expose ability to Peek on reads.ResultSetStreamReader and reads.GroupResultSetStreamReader. The purpose being to allows creators of this type to pre-emptively peek at a frame before creating a cursor.

When creating multiple stream readers like these for multiple partitions and joining them into a single sequential or merged result set, a peek on each partition could be issued concurrently by the caller. This would allow us to quickly discard empty results, potentially reducing latency on reads.

  • CHANGELOG.md updated with a link to the PR (not the Issue)
  • Well-formatted commit messages
  • Rebased/mergeable
  • Tests pass
  • http/swagger.yml updated (if modified Go structs or API)
  • Documentation updated or issue created (provide link to issue/pr)
  • Signed CLA (if not already signed)

Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

My understanding is the effects of latency will be reduced as we are priming the upstream requests concurrently. This can be summarized as follows:

  1. Client makes 1:N upstream requests and creates 1:N ResultSetStreamReader instances
  2. Client maintains a pool of M goroutines to concurrently prime each ResultSetStreamReader, which occurs by calling Peek.
  3. Client waits until all readers have been primed and then creates a merged ResultSet or GroupResultSet

The existing behavior of the merged ResultSet and GroupResultSet types will take care of any streams that are empty. Specifically, when Next is called and returns a null, EOF is assumed and the stream is discarded. These Next calls should return immediately, given they were primed in the earlier process.

@stuartcarnie
Copy link
Contributor

The next step would be to merge this and update the IDPE Store type to parallelize the calls to Peek. Store is used by queryd to make the requests to the storage nodes.

@GeorgeMac
Copy link
Contributor Author

@stuartcarnie my thoughts exactly 💯

@GeorgeMac GeorgeMac marked this pull request as ready for review September 4, 2019 09:14
@GeorgeMac GeorgeMac merged commit 8109d16 into master Sep 4, 2019
@GeorgeMac GeorgeMac deleted the gm/reads-peek branch September 4, 2019 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants