Performance problem or benchmarking issue? #56

Closed · iustin opened this issue Dec 21, 2011 · 18 comments
@iustin commented Dec 21, 2011

Hi,

I'm trying to compare the performance of json (Text.JSON) and aeson, and I get surprising numbers. I apologise in advance if the problem is in fact with my benchmark setup rather than an actual performance issue.

I have the following benchmark program:

import Criterion.Main
import Control.DeepSeq
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as BL
import qualified Text.JSON as J
import qualified Data.Aeson as A

instance (NFData v) => NFData (J.JSObject v) where
  rnf o = rnf (J.fromJSObject o)

instance NFData J.JSValue where
  rnf J.JSNull = ()
  rnf (J.JSBool b) = rnf b
  rnf (J.JSRational a b) = rnf a `seq` rnf b `seq` ()
  rnf (J.JSString s) = rnf (J.fromJSString s)
  rnf (J.JSArray lst) = rnf lst
  rnf (J.JSObject o) = rnf o

-- Take the length of the encoded output to force it (see discussion below).
encodeJ :: J.JSObject J.JSValue -> Int
encodeJ = length . J.encode

encodeA :: A.Value -> Int64
encodeA = BL.length . A.encode

decodeJ :: String -> J.JSObject J.JSValue
decodeJ s =
  case J.decodeStrict s of
    J.Ok v -> v
    J.Error e -> error ("failed to parse via json: " ++ e)

decodeA :: BL.ByteString -> A.Value
decodeA s = case A.decode' s of
              Nothing -> error "failed to parse via aeson"
              Just v -> v

main :: IO ()
main = do
  js <- readFile "config.data"
  as <- BL.readFile "config.data"
  let jdata = decodeJ js
      adata = decodeA as
  defaultMain [
        bgroup "decode" [ bench "json"  $ nf decodeJ js
                        , bench "aeson" $ nf decodeA as
                        ],
        bgroup "encode" [ bench "json"  $ nf encodeJ jdata
                        , bench "aeson" $ nf encodeA adata
                        ]
       ]

Run on a roughly 1.1 MB JSON input file, with GHC 6.12.1, aeson 0.4.0, and json 0.4.3, it gives the following:

decode/json  mean: 142.8555 ms, lb 142.5682 ms, ub 143.7004 ms, ci 0.950
decode/aeson mean: 144.2851 ms, lb 142.2344 ms, ub 146.7814 ms, ci 0.950
encode/json  mean: 45.12455 ms, lb 44.05486 ms, ub 46.67985 ms, ci 0.950
encode/aeson mean: 56.76156 ms, lb 56.71679 ms, ub 56.81222 ms, ci 0.950

I would expect aeson to be faster, but the numbers are very close, so I'm not sure what I'm doing wrong here. Any hints? It almost looks like I'm not testing the actual encoding/decoding.

Unfortunately I can't easily provide the actual JSON file, but I can try to produce a sanitised one if that's needed to help debug the issue.

basvandijk added a commit to basvandijk/aeson that referenced this issue Dec 21, 2011
@basvandijk (Member)

My patch dcc6b73 adds your benchmark to aeson. I do get some different numbers when running it on benchmarks/json-data/jp100.json with GHC-7.4.1-rc1.

@basvandijk (Member)

@iustin note that the sizes of the encoded json values are different:

print $ encodeJ jdata   -- prints 59138
print $ encodeA adata   -- prints 69430

That probably explains why aeson is slower. Now we need to figure out why they differ in size.

EDIT: I think the difference can be explained by the fact that encodeJ jdata returns the number of characters in the encoded JSON value, while encodeA adata returns the number of bytes. However, I'm not sure this also explains the difference in performance.
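
To illustrate the character-versus-byte distinction (a standalone check, not code from the thread; any non-ASCII input shows the divergence):

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String length counts Chars, while an encoded ByteString counts
-- bytes, so the two diverge as soon as the input is non-ASCII.
main :: IO ()
main = do
  let s = "日本語"
  print (length s)                              -- 3 characters
  print (B.length (TE.encodeUtf8 (T.pack s)))   -- 9 bytes in UTF-8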

@iustin (Author) commented Dec 22, 2011

Bas, thanks for taking a look at this.

encodeJ indeed returns the length of the String, whereas encodeA returns the length of the ByteString. I added the length/BL.length calls just to "force" the result, since I didn't want to add an NFData instance for ByteString (a sketch of such an instance follows below) - maybe a proper test would do that instead of the length hack.
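
For reference, a minimal sketch of the orphan instance being avoided here, assuming the old deepseq/bytestring versions from this thread (newer releases ship this instance, so defining it there would clash):

import Control.DeepSeq (NFData (..))
import qualified Data.ByteString.Lazy as BL

-- Computing the length walks the entire chunk spine; each chunk is a
-- strict ByteString, so this fully evaluates the lazy ByteString.
instance NFData BL.ByteString where
  rnf bs = BL.length bs `seq` ()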

What worries me more is that on your benchmark (#57) you get approximately 4x faster decoding with aeson, whereas for my data file it's the same. That probably means there are some JSON constructs that aeson does not handle well on decode.

I started these benchmarks mostly because I was expecting, I don't know, a 5x or greater speed difference, even just from the use of ByteString versus String. The fact that we get roughly the same numbers makes me sad… Well, for now I can stay with json and not think about migration yet, that is all.

thanks again!

@basvandijk (Member)

What worries me more is that on your benchmark (#57) you get approximately 4x faster decoding with aeson, whereas for my data file it's the same.

Do note that I also used GHC-7.4.1-rc1 and you used ghc-6.12.1. That could also make a big difference.

@iustin (Author) commented Dec 22, 2011

Yeah, but I thought it would affect the json library itself more or less equally, too.

With my current compiler, against your data file, I get the following:

decode/json  mean: 14.69207 ms, lb 14.61777 ms, ub 14.78086 ms, ci 0.950
decode/aeson mean: 6.007586 ms, lb 5.868969 ms, ub 6.193423 ms, ci 0.950
encode/json  mean: 1.004071 ms, lb 997.8509 us, ub 1.012111 ms, ci 0.950
encode/aeson mean: 1.587319 ms, lb 1.562910 ms, ub 1.621568 ms, ci 0.950

So yes, it seems that file is somehow "biased" toward aeson on decoding performance (and toward json on encoding). I'll try to sanitise my own JSON file, the one showing near-identical performance; maybe it can help uncover some corner-case behaviour in aeson.

@bos (Collaborator) commented Dec 22, 2011

I've looked into this a little, and cleaned up the benchmark some. You can see the improved benchmark here.

What I see (using GHC 7.2) is that decoding is far faster with aeson (4x to 6x), as I expected.

Encoding is only a little faster, though, which is definitely strange. I can't find any comparative performance numbers from older versions, so I don't yet know whether this is a regression or something else.

Regardless, now that I know about this, and have a benchmark, I should hopefully be able to do something about it. Thanks for bringing this up!

@bos (Collaborator) commented Dec 23, 2011

I managed to improve encoding performance by about 15% in 9169e42. Also, in 833c8fd I noticed that the json encoder wasn't generating a lazy UTF-8 bytestring like the aeson encoder, so I fixed that to make the comparison fairer.
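
A hedged sketch of the fairer comparison described here (encodeJBytes is a made-up name, not from the patch): push the json package's String output through UTF-8 into a lazy ByteString, so both libraries are measured producing the same output type as aeson's encode.

import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Text.JSON as J

-- Encode with json, then UTF-8 encode the resulting String into a
-- lazy ByteString, matching aeson's output type.
encodeJBytes :: J.JSObject J.JSValue -> BL.ByteString
encodeJBytes = TLE.encodeUtf8 . TL.pack . J.encode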

There's still a performance gap: aeson's encoding is about 3% slower than json's.

@bos (Collaborator) commented Dec 23, 2011

I just released new versions of text and aeson: used together, they improve encoding performance by 20% compared to the previous releases of those packages.

I can't currently see a way to make encoding any faster, but we're now faster than the json package (if only by a little).

I'm going to leave this open for a little while, to remind me to look at encoding performance again.

@iustin (Author) commented Dec 23, 2011

Thanks a lot, sounds good. This makes it feasible to move to aeson without a regression in speed, which is excellent.

@hvr (Member) commented Dec 23, 2011

JFYI, with GHC-7.4.1RC1 json is faster than aeson...

e.g. with GHC-7.2.2, the encode/en/{aeson,json} benchmarks measure 1.57 ms and 1.79 ms respectively, whereas with GHC-7.4.1RC1 the measurements are 2.07 ms and 1.51 ms respectively... effectively, json gets absolutely faster with GHC-7.4.1RC1, whereas aeson gets absolutely slower...

I've tested with aeson-0.4.0.1 and text-0.11.1.11

(see git clone git://gist.github.com/1513833.git for my measurements)

@bos (Collaborator) commented Dec 23, 2011

I just pushed new versions of aeson and text again. The new version of aeson has 33% better encoding performance than 0.4 (so about another 10% on top of yesterday's release).

Handily enough, improving aeson involved making some UTF-8 encoding and string breaking improvements to text that will benefit everyone.

@bos (Collaborator) commented Dec 23, 2011

@hvr, do you think you could try with the new releases? I'll take a look when I get a chance, but it would be nice to have some help :-)

@hvr (Member) commented Dec 24, 2011

@bos I've added benchmark measurements for the text-0.11.1.12 + aeson-0.5.0.0 combination to the aforementioned gist...

It looks a bit better now, although there still seems to be a tendency for GHC-7.4 to optimize the json package slightly better than aeson, whereas GHC-7.2 optimizes aeson better than json...

@meiersi (Contributor) commented Feb 6, 2012

I've had a go at this issue using my branch of the bytestring library, which contains the new bytestring builder. The results are encouraging. Using GHC 7.2.1 on an i7 on 64-bit Linux, I get a

  • 19% speedup for encoding English JSON values,
  • 20% speedup for encoding Japanese JSON values, and
  • 5x speedup for encoding Integers.

Moreover, I also get a

  • 20% speedup for UTF-8 encoding Japanese Text values and
  • 15% speedup for UTF-8 encoding English Text values.

I attribute these speedups to the following three improvements:

  1. We use encodeUtf8Escaped :: B.BoundedEncoding Word8 -> Text -> B.Builder, a hand-coded routine for fused UTF-8 encoding and escaping of ASCII characters to a Builder. This function couples the index variables of the input Text value and the output BufferRange of the Builder. This saves one buffer check per character, which I see as the primary reason for the speedup in UTF-8 encoding.
    BTW: Here are preliminary haddocs for BoundedEncoding.
  2. We no longer build an intermediate Text value, but fuse UTF-8 encoding with JSON escaping (see the sketch after this list). This, together with the efficient traversal of Text values, allows us to realize the 20% speedup for encoding JSON values.
  3. The 5x speedup for encoding Int values is due to the C-based implementation for decimal encoding of Ints and Integers that I implemented for the new Builder.
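
As a rough illustration of points 1 and 2, here is a minimal sketch of the fusion idea, not meiersi's actual encodeUtf8Escaped (escapeToUtf8 and its escape set are made up for illustration): walk the Text once, emitting escaped ASCII or UTF-8 bytes directly into a Builder, with no intermediate escaped Text value.

import Data.ByteString.Builder (Builder, charUtf8, string7)
import qualified Data.Text as T

-- One pass over the input: escape the special characters and UTF-8
-- encode everything else straight into the Builder.
escapeToUtf8 :: T.Text -> Builder
escapeToUtf8 = T.foldr step mempty
  where
    step '"'  b = string7 "\\\"" <> b
    step '\\' b = string7 "\\\\" <> b
    step '\n' b = string7 "\\n"  <> b
    step c    b = charUtf8 c <> b

The real BoundedEncoding-based routine additionally writes into the output buffer directly, coupling the input index with the output BufferRange to save a bounds check per character; this sketch only shows the single-pass fusion.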

Here are the results of the benchmarks and links to the corresponding branch of the text repository and the corresponding branch of the aeson repository.

Note that json is still 2x faster for encoding Double values. For the new bytestring builder this is no surprise, as it does not have special support for encoding IEEE floating-point values: it just shows them and then encodes the resulting String. This is only slightly slower than the functions in blaze-textual, but means a lot less maintenance overhead. I expect production code to use the double-conversion package.
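
A small sketch of the two Double paths being contrasted, under the assumption of the double-conversion package (viaShow and viaDoubleConversion are illustrative names, not APIs from the thread):

import Data.ByteString.Builder (Builder, byteString, string7)
import qualified Data.Double.Conversion.ByteString as DC

-- The simple path the new builder takes: show the Double, then
-- encode the resulting String.
viaShow :: Double -> Builder
viaShow = string7 . show

-- The suggested production path: double-conversion's bindings to the
-- C++ Grisu-based routines produce the shortest round-tripping form.
viaDoubleConversion :: Double -> Builder
viaDoubleConversion = byteString . DC.toShortest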

Based on these results, I suggest letting this issue wait some more, until my patch to the bytestring library has found its way upstream. Actually, I still have to send it to Duncan first; this will happen in the next two weeks.

@hvr (Member) commented Feb 7, 2012

@meiersi sweet... I really hope the new bytestring library gets released soon, as I can really use those improvements... :-)

@bos (Collaborator) commented Jan 2, 2014

I've merged @meiersi's changes to the text library into the main repo; see for instance b8c8f11923d46b5423ce13a98343865c209e53df.

The new functionality will only be available from the text package if the environment contains a version of bytestring >= 0.10.4.0, which for now limits it to the perpetually upcoming GHC 7.8. I had to add the conditionality myself.

Merging the corresponding changes to the aeson library looks much messier—I'll definitely need a clean stack of rebased commits to review, and they'll have to come with backwards compatibility baked in.

@meiersi (Contributor) commented Jan 3, 2014

That's good news. Thanks for the merge. I've prepared a pull request for text (haskell/text#63), which polishes the builder integration a bit. I've also prepared a pull request for aeson (#172) that realizes the 35% to 40% speed improvement in a backwards compatible way.

tolysz pushed a commit to tolysz/aeson that referenced this issue May 18, 2015
@bos (Collaborator) commented Jul 10, 2015

With the new toEncoding support in HEAD, it's possible to encode straight from a regular Haskell value to a ByteString (okay, via a Builder) with no intermediate Value constructed.
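
A minimal sketch of what that looks like from the user's side (the Person type is made up for illustration; genericToEncoding and defaultOptions are the aeson helpers):

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (ToJSON (..), defaultOptions, encode, genericToEncoding)
import GHC.Generics (Generic)

data Person = Person { name :: String, age :: Int }
  deriving (Generic)

instance ToJSON Person where
  -- Serialise via an Encoding (a wrapped Builder), skipping the
  -- intermediate Value entirely.
  toEncoding = genericToEncoding defaultOptions

main :: IO ()
main = print (encode (Person "Ada" 36))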

The overall status is that we're now about as fast as we can be without hand-rolling something similar to the buffer-builder package.

bos closed this as completed Jul 10, 2015