Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jsonl streaming support #14898

Closed
wants to merge 3 commits into from
Closed

Conversation

nemanja-tosic
Copy link
Contributor

Issue link

https://issues.hibernatingrhinos.com/issue/RDBC-622

Additional description

Support for JSON streaming, more specifically line-delimited JSON, for query streaming. The intent is to reduce the amount of work done by the clients when parsing JSON stream. So far while testing locally, it is at least 2x faster to process a JSONL response vs a JSON response in a stream.

Type of change

  • New feature

How risky is the change?

  • Low

Backward compatibility

  • Ensured. Opt-in functionality.

Is it platform specific issue?

  • No

Documentation update

  • No documentation update is needed, or at least i didn't see any documentation for stream response formatting

Testing

  • It has been verified by manual testing
  • Would implement unit test but need help to set one up due to the dependencies AsyncBlittableJsonTextWriter needs.

Is there any existing behavior change of other features due to this change?

  • No

UI work

  • No UI work is needed

@github-actions github-actions bot added the v5.4 label Sep 9, 2022
Copy link
Member

@ayende ayende left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if we make the format like this:

  • line oriented jsonl
  • each line is an object that has either {"Item": {}} for data, {"Error": {}} for error or {"Stats": {}} for query statistics.

That match the format we have as well as make it easy to distinguish between different types of values we want to return.


public void WriteQueryStatistics(long resultEtag, bool isStale, string indexName, long totalResults, DateTime timestamp)
{
throw new NotSupportedException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For streaming, we sent this at the start of the operation. That can be done via the first item (that the client will know to ignore? Not sure that I like that, to be honest. Maybe we can send that as headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about sending stats at the start of the operation, also was not sure what to make of it. I think the headers approach might be better now that you mention it. I guess the choice is down to what is easier for the clients to implement?

To add a second use-case as a potential tie-breaker, i was hoping to be able to pipe results directly into an another stream - say we have filtering/projections on raven and we want to respond directly back to a client over a RESTful API.

Another thought that comes to mind is that if we have to introduce a wrapper around individual items, then introducing headers as a second way to provide data might not be a good choice needed, as we need to process items anyway.


public ValueTask WriteErrorAsync(string error)
{
throw new NotSupportedException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that errors should be in done in a way that would be breaking for the client.
What we want to ensure is that there is no way that they will accept that as a valid value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a mix of errors and results, or if there is an error then the only thing in the response is the error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error, then we report that and abort the rest of the operation

@nemanja-tosic
Copy link
Contributor Author

How about if we make the format like this:

  • line oriented jsonl
  • each line is an object that has either {"Item": {}} for data, {"Error": {}} for error or {"Stats": {}} for query statistics.

That match the format we have as well as make it easy to distinguish between different types of values we want to return.

Do we need a wrapper for included results as well?

@ayende
Copy link
Member

ayende commented Sep 9, 2022

For the included results, how would you tell if the value has an Error property or Statistics property or if it is one of ours?

@nemanja-tosic
Copy link
Contributor Author

For the included results, how would you tell if the value has an Error property or Statistics property or if it is one of ours?

Not following the question, although I've remembered that include is not supported for streaming so i guess nothing needs to be done there?

Otherwise, if we are in agreement, i would do what you suggested here

How about if we make the format like this:

line oriented jsonl

  • each line is an object that has either {"Item": {}} for data, {"Error": {}} for error or {"Stats": {}} for query statistics.
  • That match the format we have as well as make it easy to distinguish between different types of values we want to return.

as i believe it would give us everything we need, which would be:

  • the response is in JSONL format, which helps parse results
  • we are able to differentiate between Items, Errors and Stats

Copy link
Member

@ayende ayende left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ravendb-bot
Copy link
Contributor

Hi!

Thank you for contributing into RavenDB project. Unfortunetely we detected that not all committers signed our CLA.

Following contributors are not in our database:

Please visit http://ravendb.net/contributors/cla/sign to sign CLA.

@nemanja-tosic
Copy link
Contributor Author

Signed the CLA, used a different email the first time around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants