Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Streamed Json Encoding #3891

Merged

Conversation

ChristopherDavenport
Copy link
Member

Fixes #3890

Questions: Is this a good chunk size? should we make it configurable in other versions? how does this relate to the benches for cutoff?

This should see a reduction in buffer flushes of 2000x

@diesalbla
Copy link
Contributor

diesalbla commented Nov 19, 2020

Questions: Is this a good chunk size? should we make it configurable in other versions? how does this relate to the benches for cutoff?

Could or should the chunk size, turned into a parameter of the functions, even at a later PR?

@rossabaker rossabaker added this to the 0.21.10 milestone Nov 20, 2020
@rossabaker
Copy link
Member

Part of the untyped contract of encoders is that it flushes per chunk. It's the only way the user can control when partial results become available.

Instead of an arbitrary limit, could we intersperse each Chunk[Json] with a comma, and then intersperse those chunks as we do now? It would still be slow if each Json were its own chunk, but reasonable, and easy to control, if the Jsons came in larger chunks.

@ChristopherDavenport
Copy link
Member Author

That's what the most recent commit is. Each incoming chunk gets written, interpolated with the commas, and then rechunked.

The first chunk gets special treatment because it has different logic than the comma interpolation. Sorry, original PR description is now wrong with what the code implements.

As currently written outputs chunks
Chunk1 [ and first json
RestOfChunk1 , and then json in the same as the input chunk
RestofChunks The same as RestOfChunk1 except for the very last one which does not get the ,
ChunkContaining ], I really don't like this one, but don't have a good way to work it into the last chunk.

Copy link
Member

@rossabaker rossabaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, this looks different than what I looked at. This is good, and should obviate the need for configuration (which was a good idea by @diesalbla in the old impl).

}
Chunk
.vector(bldr.result())
.flatMap(identity) // I know there must be a more efficient weay to do this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Chunk.concatBytes(bldr.result())?

@rossabaker rossabaker merged commit 2d56d12 into http4s:series/0.21 Nov 20, 2020
rossabaker added a commit that referenced this pull request Nov 20, 2020
case None => Pull.done
case Some((hd, tl)) =>
Pull.output(
Chunk.concatBytes(Vector(CirceInstances.openBrace, fromJsonToChunk(printer)(hd)))
) >> // Output First Json As Chunk with leading `[`
tl.repeatPull {
_.uncons.flatMap {
case None => Pull.pure(None)
case Some((hd, tl)) =>
val interspersed = {
val bldr = Vector.newBuilder[Chunk[Byte]]
bldr.sizeHint(hd.size * 2)
hd.foreach { o =>
bldr += CirceInstances.comma
bldr += fromJsonToChunk(printer)(o)
}
Chunk.concatBytes(bldr.result())
}
Pull.output(interspersed) >> Pull.pure(Some(tl))
}
}.pull
.echo
}.stream ++ Stream.chunk(closeBrace)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopherDavenport @rossabaker In the case of encoding an empty list using streams, I notice the closeBrace is printed always at then, but the openBrace is only printed if it has elements. If not, it will use Pull.done and the encoding will only be: ], right?

.echo
}.stream ++ Stream.chunk(closeBrace)

private final val openBrace: Chunk[Byte] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what purpose does 'final' serve since the val lives in an object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember right, a final val in an object can be inlined only if it doesn't have a type annotation. I don't know how it matters here, if at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants