Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove String conversion in decoders. #671

Merged
merged 1 commit into from Nov 21, 2016

Conversation

@rpless
Copy link
Contributor

@rpless rpless commented Nov 19, 2016

Proposed fix for #511. The only thing I was unsure about was whether to use Buf.ByteArray.Owned or Buf.ByteArray.Shared for the libraries that can convert Array[Byte]. I wound up using Shared, but I'm not sure this is necessary.

@codecov-io
Copy link

@codecov-io codecov-io commented Nov 19, 2016

Current coverage is 74.34% (diff: 76.00%)

Merging #671 into master will decrease coverage by 0.32%

@@             master       #671   diff @@
==========================================
  Files            30         30          
  Lines           592        608    +16   
  Methods         566        586    +20   
  Messages          0          0          
  Branches         26         22     -4   
==========================================
+ Hits            442        452    +10   
- Misses          150        156     +6   
  Partials          0          0          

Powered by Codecov. Last update 2d479db...a5335d1

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

@rpless Thanks a lot for looking into this!

Some suggestions:

  • Instead of Buf.ByteArray.Shared (which always copy), let's use this trick that allows retrieving a byte array from Netty's channel buffer. This heavily depends on the underlying implementation, but I'm fine with that for now.
  • Let run some benchmarks (both JMH and wrk) to see how this affects throughput (I'm excited to see the numbers!)

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Trying to be more specific here.

If you need ByteBuffer, the cheapest way (at least for Netty 3) to get it is:

val byteBuffer = ChannelBufferBuf.Owned.extract(buf).toByteBuffer()

If you need Array[Byte], the cheapest way to get it is:

val channelBuffer = ChannelBufferBuf.Owned.extract(buf)
val (byteArray, offset, length) = 
  // assert channelBuffer.hasArray()
  (channelBuffer.array(), channelBuffer.readerIndex(), channelBuffer.readableBytes()) 

@rpless rpless force-pushed the remove-json-string-conversions branch from 7f464cf to 5194731 Nov 19, 2016
@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

I ran the wrk benchmarks for Finch + circe. The results are here.
As for jmh benchmarks, unless I missed something, it doesn't look like we have ones that exercise the the json decoders anymore.

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Nice!

@rpless Do you mind keep the server running and run wrk for 3 times. Usually, it takes some time for JVM to warmup so using the results from 3rd run seems like the closest we can get to the steady state.

Copy link
Member

@vkostyukov vkostyukov left a comment

Just a few nits from me. Really nice job!

@@ -1,6 +1,8 @@
package io.finch.argonaut

import argonaut.{CursorHistory, DecodeJson, Json}
import com.twitter.finagle.netty3.ChannelBufferBuf
import com.twitter.io.Charsets
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Please, use StandardCharsets from JDK. c.t.i.Charsets are deprecated in the most recent Finagle release.

)
implicit def decodeCirce[A: Decoder]: Decode.Json[A] = Decode.json({ (b, cs) =>
val attemptJson = cs match {
case Charsets.Utf8 => parseByteBuffer(ChannelBufferBuf.Owned.extract(b).toByteBuffer()).right.flatMap(_.as[A])
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Is scalacheck happy with that line length?

Copy link
Contributor Author

@rpless rpless Nov 19, 2016

Yes, its under the line limit by 3 character. Happy to move it down a line if you think that's more readable.

Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Nah, it's fine. Thanks!

@rpless rpless force-pushed the remove-json-string-conversions branch from 5194731 to b0bc72a Nov 19, 2016
@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

Updated benckmarks, results after 3 runs:

Without String Conversion (this PR)

$ wrk -t4 -c24 -d30s -s examples/src/main/scala/io/finch/wrk/wrk.lua http://localhost:8081/
Running 30s test @ http://localhost:8081/
  4 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.68ms    3.67ms  52.43ms   90.18%
    Req/Sec    12.20k     2.26k   23.58k    67.25%
  1458406 requests in 30.09s, 137.69MB read
Requests/sec:  48471.67
Transfer/sec:      4.58MB

With String Conversion (Current Finch)

$ wrk -t4 -c24 -d30s -s examples/src/main/scala/io/finch/wrk/wrk.lua http://localhost:8081/
Running 30s test @ http://localhost:8081/
  4 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.62ms    3.22ms  62.64ms   89.91%
    Req/Sec    10.84k     2.44k   22.20k    68.33%
  1296147 requests in 30.10s, 122.37MB read
Requests/sec:  43067.06
Transfer/sec:      4.07MB

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Looks like 12.5% improvement in the throughput! Sweet!

val buf = ChannelBufferBuf.Owned.extract(b)
if (buf.hasArray) Try(Json.parse(new String(buf.array(), 0, buf.readableBytes(), cs)).as[A])
else Try(Json.parse(buf.toString(cs)).as[A])
})
Copy link
Contributor

@clhodapp clhodapp Nov 19, 2016

It looks like you might be parsing twice here by accident?

Copy link
Contributor Author

@rpless rpless Nov 19, 2016

No accident. Its based on this trick that @vkostyukov mentioned in earlier. There may be a better to express this (i.e. only having on call to Json.parse once and having the if/ else only return a Try[String]).

Copy link
Contributor Author

@rpless rpless Nov 19, 2016

@clhodapp I played around with it a little. Do you think this version makes the intent more clear?

val buf = ChannelBufferBuf.Owned.extract(b)
val bufAsString = Try(
  if (buf.hasArray) new String(buf.array(), 0, buf.readableBytes(), cs)
  else buf.toString(cs)
)
bufAsString.map(Json.parse(_).as[A])

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

I think Chris is right. We call Json.parse two times and drop the first
result on the floor.

On Sat, Nov 19, 2016 at 11:14 AM Ryan Plessner notifications@github.com
wrote:

@rpless commented on this pull request.

In playjson/src/main/scala/io/finch/playjson/package.scala
#671:

@@ -11,10 +13,11 @@ package object playjson {
* @tparam A the type of the data to decode into
*/
implicit def decodePlayJson[A](implicit reads: Reads[A]): Decode.Json[A] =

  • // TODO: Eliminate toString conversion
  • // See #511
  • // PlayJson can parse from byte[] automatically detecting the charset.
  • Decode.json((b, cs) => Try(Json.parse(BufText.extract(b, cs)).as[A]))
  • Decode.json({ (b, cs) => Try(Json.parse(Buf.ByteArray.Shared.extract(b)).as[A])
  •  val buf = ChannelBufferBuf.Owned.extract(b)
    
  •  if (buf.hasArray) Try(Json.parse(new String(buf.array(), 0, buf.readableBytes(), cs)).as[A])
    
  •  else Try(Json.parse(buf.toString(cs)).as[A])
    
  • })

@clhodapp https://github.com/clhodapp I played around with it a little.
Do you think this version makes the intent more clear?

val buf = ChannelBufferBuf.Owned.extract(b)val bufAsString = Try(
if (buf.hasArray) new String(buf.array(), 0, buf.readableBytes(), cs)
else buf.toString(cs)
)
bufAsString.map(Json.parse(_).as[A])


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#671, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABDDKwEmt9Qm3tSF-W2xoHHcsjNTIvoyks5q_0p5gaJpZM4K3Isa
.

@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

Ah I see it now. The line above is the old code. I'll remove it.

@rpless rpless force-pushed the remove-json-string-conversions branch from b0bc72a to c7288c6 Nov 19, 2016
Decode.json((b, cs) => Try(Json.parse(BufText.extract(b, cs)).as[A]))
Decode.json({ (b, cs) =>
val buf = ChannelBufferBuf.Owned.extract(b)
if (buf.hasArray) Try(Json.parse(new String(buf.array(), 0, buf.readableBytes(), cs)).as[A])
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

I think, Json.parse can parse byte[] directly. No need for new String.

Copy link
Contributor Author

@rpless rpless Nov 19, 2016

Json.Parse can take a byte[], but when I convert that line to Try(Json.parse(buf.array()).as[A]) it gives me the following in the

Throw(spray.json.JsonParser$ParsingException: Unexpected character '\u0000'

I think I'm missing something about how the ChannelBufferBuf holds onto the underlying array.

cs match {
case StandardCharsets.UTF_8 =>
val buf = ChannelBufferBuf.Owned.extract(b)
if (buf.hasArray) Try(JsonParser(new String(buf.array(), 0, buf.readableBytes(), cs)).convertTo[A])
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

No need for new String.

Copy link
Contributor Author

@rpless rpless Nov 19, 2016

A similar issue exists here. The tests raise this error.

Throw(com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens

value => Return(value)
)
)
implicit def decodeCirce[A: Decoder]: Decode.Json[A] = Decode.json({ (b, cs) =>
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Let's remove wrapping (). Just Decode.json { ... } is fine.

// Jackson can parse from byte[] automatically detecting the encoding.
Try(mapper.readValue(BufText.extract(b, cs), ct.runtimeClass.asInstanceOf[Class[A]]))
)
): Decode.Json[A] = Decode.json({ (b, cs) =>
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Please, remove (): Decode.json { ... }.

// See https://github.com/finagle/finch/issues/511
// PlayJson can parse from byte[] automatically detecting the charset.
Decode.json((b, cs) => Try(Json.parse(BufText.extract(b, cs)).as[A]))
Decode.json({ (b, cs) =>
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Please, remove wrapping (): Decode.json { ... }.

// See https://github.com/finagle/finch/issues/511
// SprayJson can parse from byte[] if it represents a UTF-8 string.
Decode.json((b, cs) => Try(BufText.extract(b, cs).parseJson.convertTo[A]))
Decode.json({ (b, cs) =>
Copy link
Member

@vkostyukov vkostyukov Nov 19, 2016

Please, remove wrapping (): Decode.json { ... }.

@rpless rpless force-pushed the remove-json-string-conversions branch from c7288c6 to 75768b7 Nov 19, 2016
@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

Ok, it looks like using slice like this works: buf.array().slice(0, buf.readableBytes()). Does that make sense?

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Slice works but it allocates a new array. You should be able to bypass
readableBytes to the parse function.

On Sat, Nov 19, 2016 at 12:48 PM Ryan Plessner notifications@github.com
wrote:

Ok, it looks like using slice like this works: buf.array().slice(0,
buf.readableBytes()). Does that make sense?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#671 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABDDK-0w2n0qAoIwWMcWBZ8h16aUD1ooks5q_2CrgaJpZM4K3Isa
.

@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Play takes InputStream though. Can we wrap our byte array with
ByteArrayInputStream and specify the length there?

For Spray, let's do System.arraycopy it's still better than 'new String'.

On Sat, Nov 19, 2016 at 1:10 PM Ryan Plessner notifications@github.com
wrote:

Yeah, I'm not a fan of the new array allocation, but it seems like neither
Spray Json nor Play Json expose a parsing function that takes the number of
bytes to read:

https://www.playframework.com/documentation/2.5.x/api/scala/index.html#play.api.libs.json.Json$

https://github.com/spray/spray-json/blob/master/src/main/scala/spray/json/JsonParser.scala#L28
(and also
https://github.com/spray/spray-json/blob/master/src/main/scala/spray/json/JsonParser.scala#L237
)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#671 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABDDK_tX58WsqNgUMHNRIP01LhdK2MDxks5q_2WagaJpZM4K3Isa
.

@rpless
Copy link
Contributor Author

@rpless rpless commented Nov 19, 2016

Can do for Spray, but it looks like I gave you the wrong docs for Play Json. We're currently on 2.3.x which does not provide a parse with InputStream. So we can either bump the Play version or System.arraycopy.

@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 19, 2016

Arraycopy works for me. Let's update it later.

On Sat, Nov 19, 2016 at 1:50 PM Ryan Plessner notifications@github.com
wrote:

Can do for Spray, but it looks like I gave you the wrong docs for Play
Json. We're currently on 2.3.x
https://www.playframework.com/documentation/2.3.x/api/scala/index.html#play.api.libs.json.Json%24
which does not provide a parse with InputStream. So we can either bump the
Play version or System.arraycopy.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#671 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABDDK22ipWPepiBj6q_IqnkB5M6wpEX7ks5q_28_gaJpZM4K3Isa
.

@rpless rpless force-pushed the remove-json-string-conversions branch from 75768b7 to 582b59a Nov 19, 2016
Decode.json((b, cs) => Try(Json.parse(BufText.extract(b, cs)).as[A]))
Decode.json { (b, cs) =>
val buf = ChannelBufferBuf.Owned.extract(b)
if (buf.hasArray) {
Copy link
Member

@vkostyukov vkostyukov Nov 20, 2016

Sorry to be a burden, but can we extract that into a function so we can reuse it?

Also, let's check that if array.length equals readableBytes we can skip copying.

Copy link
Contributor Author

@rpless rpless Nov 20, 2016

Its not a burden. Any particular place we should extract it to? I was thinking io.finch.Decode.

Copy link
Contributor Author

@rpless rpless Nov 20, 2016

Or maybe the internal package object?

Copy link
Member

@vkostyukov vkostyukov Nov 20, 2016

Yeah, let's make it part of internal (whatever format you prefer).

@rpless rpless force-pushed the remove-json-string-conversions branch from 582b59a to a5335d1 Nov 21, 2016
@vkostyukov
Copy link
Member

@vkostyukov vkostyukov commented Nov 21, 2016

Thanks again @rpless! Merging this.

@vkostyukov vkostyukov merged commit 80f0523 into finagle:master Nov 21, 2016
3 checks passed
@rpless rpless deleted the remove-json-string-conversions branch Nov 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants