
Conversation

bgamari
Collaborator

@bgamari bgamari commented Mar 23, 2016

This is a continuation of the effort to move binary to bytestring's Builder, started in #65. This branch is a rebase of the original branch. The performance story isn't much different from last year's,

GHC 7.10.3
  Benchmark                                                     without patch               with patch              delta
  ---------------------------------------------                 --------------------        -------------------     -------
  bounds/[Word8]                                         :       145.39 ±  39.81 us          54.67 ±   2.85 us      -62.4%
  "Host endian/1MB of Word32 in chunks of 16"            :       240.59 ±   5.28 us         287.62 ±   6.03 us      +19.5%
  "Host endian/1MB of Word8 in chunks of 16"             :      2389.07 ±  44.81 us         733.90 ±  23.64 us      -69.3%
  "small ByteString"                                     :         0.23 ±   0.00 us           0.20 ±   0.01 us      -11.9%
  [Word8]                                                :        79.18 ±  21.08 us          47.96 ±   3.02 us      -39.4%
  "length-prefixed ByteString"                           :         7.18 ±   0.16 us           2.49 ±   0.05 us      -65.3%
  "Host endian/1MB of Word16 in chunks of 16"            :       338.87 ±   7.87 us         426.12 ±   9.44 us      +25.7%
  "large ByteString"                                     :         0.23 ±   0.00 us           0.19 ±   0.00 us      -14.5%
  "Host endian/1MB of Word64 in chunks of 16"            :       144.38 ±   3.18 us         163.12 ±   4.00 us      +13.0%

GHC 8.0.1-rc3
  Benchmark                                                     without patch               with patch              delta
  ---------------------------------------------                 --------------------        -------------------     -------
  bounds/[Word8]                                         :       129.69 ±   8.02 us          53.35 ±   1.26 us      -58.9%
  "Host endian/1MB of Word32 in chunks of 16"            :       229.63 ±   8.93 us         310.01 ±  18.97 us      +35.0%
  "Host endian/1MB of Word8 in chunks of 16"             :      1966.49 ± 137.72 us         852.61 ±  26.18 us      -56.6%
  "small ByteString"                                     :         0.23 ±   0.03 us           0.19 ±   0.01 us      -18.8%
  [Word8]                                                :        65.18 ±   1.54 us          46.18 ±   0.31 us      -29.2%
  "length-prefixed ByteString"                           :         7.75 ±   0.53 us           2.48 ±   0.07 us      -68.1%
  "Host endian/1MB of Word16 in chunks of 16"            :       319.70 ±  12.23 us         465.37 ±  16.16 us      +45.6%
  "large ByteString"                                     :         0.23 ±   0.02 us           0.18 ±   0.01 us      -18.3%
  "Host endian/1MB of Word64 in chunks of 16"            :       145.81 ±   6.18 us         167.79 ±   6.97 us      +15.1%

There are a few places where performance could certainly be improved, but overall this appears to be a net win. As expected, the regression in "Host endian/1MB of Word8 in chunks of 16" persists, due to GHC #10012. It's still quite unclear what should be done about this.

Regardless, I have observed at least one instance in the wild where the intelligence that bytestring's Builder applies when deciding whether to insert a chunk or copy it would help considerably (binary's Builder ends up generating lazy bytestrings with many unreasonably small chunks).

Given that the benchmarks show a net improvement, I think we should consider merging this and then examine how to improve performance further (which will likely require compiler work).
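For illustration, here's a small standalone experiment (not binary's actual code) showing the copy-vs-insert heuristic at work. It uses `byteStringInsert` from `Data.ByteString.Builder.Extra` to mimic a builder that always inserts chunks directly, versus the default `byteString`, which copies small inputs into the current buffer:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Builder.Extra as BE
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
  let small = S.replicate 16 0x61
      -- Always inserting each 16-byte input as its own chunk yields a
      -- lazy ByteString with many tiny chunks:
      naive = B.toLazyByteString (mconcat (replicate 1000 (BE.byteStringInsert small)))
      -- The default 'byteString' copies small inputs into the current
      -- buffer, producing far fewer (larger) chunks:
      smart = B.toLazyByteString (mconcat (replicate 1000 (B.byteString small)))
  print (length (L.toChunks naive), length (L.toChunks smart))
```

The first count is roughly 1000 while the second is a handful, which is exactly the pathology described above.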

@kolmodin
Member

The performance numbers are presented a little differently than last year. The numbers presented are from benchmarking with the patch applied (I assume), but there's no comparison to without the patch?

Other than that, just like last year: since the patch is mostly an improvement, I agree that it's probably a good idea to merge it.

-----------------------------------------------------------------------------
-- |
-- Module : Data.Binary.Builder
-- Module : Data.Binary.Builder.Base
Member


The GitHub UI is not helpful here. Was this file moved?

Collaborator Author


Nope, just a rebase error it seems. Thanks for catching this.

@bgamari
Collaborator Author

bgamari commented Mar 26, 2016

The performance numbers are presented in a little different way than last year. The numbers present are from benching with the patch applied (I assume), but no comparison to without the patch?

That is correct; I believe the only difference between the table I presented last time and this one is the addition of standard deviations. The methodology is similar, however.

@bgamari bgamari force-pushed the bytestring-builder branch from 93a49d0 to e61024b on March 26, 2016 09:30
------------------------------------------------------------------------

--
-- We rely on the fromIntegral to do the right masking for us.
Member


This comment could be moved further down, to where the fromIntegrals are.

Does the fromIntegral do any masking here? It looks like it's just converting between Int and Word.
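As a standalone illustration of `fromIntegral`'s semantics (not from the patch itself): between types of the same bit width the conversion just reinterprets the bits, while a narrowing conversion does truncate (mask) to the target width:

```haskell
import Data.Word (Word, Word8)

main :: IO ()
main = do
  -- Int -> Word at the same bit width reinterprets the bits
  -- (no masking needed):
  print (fromIntegral (-1 :: Int) :: Word)      -- maxBound :: Word
  -- A narrowing conversion truncates to the target width:
  print (fromIntegral (0x1FF :: Int) :: Word8)  -- 0x1FF mod 256 = 255
```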

@kolmodin
Member

That is correct; I believe the only difference between the table I presented last time and this one is the addition of standard deviations. The methodology is similar, however.

Previously there was a comparison with/without the patch, so the performance difference could be seen.
In this table there are only numbers with the patch (plus standard deviation), but not without.

@bgamari
Collaborator Author

bgamari commented Mar 27, 2016

Previously there was a comparison with/without the patch so the performance difference could be seen.
In this table there are only numbers with the patch (+standard deviation), but not without.

The table in this patch is similar: the first column is the benchmark name, the second has the benchmark results without the patch, the third with the patch, and the last column is the percent change.

I've added column headings to the table to make this clearer.

@bgamari bgamari force-pushed the bytestring-builder branch from e61024b to 3ad2041 on March 27, 2016 09:22
@bgamari bgamari force-pushed the bytestring-builder branch from 3ad2041 to 922592e on March 27, 2016 09:24
@kolmodin
Member

Oh. There was no indication that I could scroll the text sideways (Chrome on Mac), sorry. Now I see it all.

@bgamari
Collaborator Author

bgamari commented Mar 30, 2016

@kolmodin do you expect to merge this? If so it would be great if it could happen soon so I can bump the version shipped with GHC 8.0.1-rc3.

@kolmodin
Member

I ran the GenericBenchmark, which showed a 10% increase in time spent in the "encode" benchmark.

The generic benchmark encodes/decodes many Cabal PackageDescriptions, which contain a lot of Strings. I suspected slow string handling to be the issue, so I moved instance Binary Char over to use the putCharUtf8 function which you exported (but otherwise didn't use). This improved performance; it's now 1-2% faster than the initial code without this patch.

Except for the builder benchmarks, I don't have any indication that other code has been slowed down.
I'm a little curious about data types other than Char in a more realistic setting than "serialize 1MB of Word32 in chunks", which nobody in their right mind would do. In general, your patch moves things in the right direction. Unfortunately, it doesn't look like this patch improves generic instances very much, at least not stringy ones.
I guess it's still good enough to merge. Would you be comfortable immediately cutting a GHC RC with this code?

Another idea would be to further improve strings.
Strings (being [Char]) are still encoded by calling putCharUtf8 once per character. This is worse than letting bytestring's Builder encode the whole UTF-8 string in one go. Unfortunately, we can't do that without overlapping instances or a change to the Binary class.

We could try the old trick:

class Binary a where
  -- ...

  putList :: [a] -> Put
  putList = defaultPutList

defaultPutList :: Binary a => [a] -> Put
defaultPutList = ...

instance Binary Char where
  put = ...
  putList = {- use bytestring's Builder to write the whole string in one go -}

instance Binary a => Binary [a] where
  -- ...
  put = putList

The bytestring Builder documentation suggests that should be more efficient.
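As a rough, self-contained sketch of the idea (with a hypothetical Encode class standing in for Binary, and Builder standing in for Put, so it runs outside the library):

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical stand-in for the Binary class; 'B.Builder' plays
-- the role of 'Put' so the sketch is self-contained.
class Encode a where
  put :: a -> B.Builder

  -- The trick: a class method for encoding lists, with a default
  -- that encodes element by element.
  putList :: [a] -> B.Builder
  putList = foldMap put

instance Encode Char where
  put = B.charUtf8
  -- Override: encode the whole string in one go.
  putList = B.stringUtf8

instance Encode a => Encode [a] where
  -- Lists dispatch to the element type's putList, so String
  -- ([Char]) picks up the fast stringUtf8 path without
  -- overlapping instances.
  put = putList

main :: IO ()
main = print (L.length (B.toLazyByteString (put "héllo")))  -- prints 6
```

Because the list instance delegates to the element type's `putList`, each type gets to choose its own whole-list encoding while plain `Binary a => Binary [a]` code keeps working unchanged.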

@kolmodin
Member

I'll work on merging this and will try to release it unless I run into something unexpected.

@bgamari
Collaborator Author

bgamari commented Mar 31, 2016

Lennart Kolmodin notifications@github.com writes:

I'll work on merging this and will try to release it unless I run into something unexpected.

Thanks Lennart!

@kolmodin
Member

kolmodin commented Apr 1, 2016

Why did you use the functions from Data.ByteString.Builder.Prim when the equivalent is in Data.ByteString.Builder?

@bgamari
Collaborator Author

bgamari commented Apr 1, 2016

Lennart Kolmodin notifications@github.com writes:

Why did you use the functions from Data.ByteString.Builder.Prim when
the equivalent is in Data.ByteString.Builder?

Because in at least some cases (e.g. bytestring 0.10.0.0)
Data.ByteString.Builder does not expose host-endian variants.

However, if you want to tighten up the bounds a bit I suspect you can
just use Data.ByteString.Builder.

@kolmodin
Member

kolmodin commented Apr 2, 2016

Because in at least some cases (e.g. bytestring 0.10.0.0)
Data.ByteString.Builder does not expose host-endian variants.

However, if you want to tighten up the bounds a bit I suspect you can
just use Data.ByteString.Builder.

Aha, ok. No, I think we can leave it then.

When using charUtf8 from Data.ByteString.Builder.Prim, the generics-bench performs about the same as before this patch.

I also tried the putList trick mentioned above to be able to define how lists of each individual type are encoded (does this trick have a name?). Then I could change things so that String is encoded with Data.ByteString.Builder.stringUtf8 instead of as a list of Char.
That had a huge impact:

./analyze-criterion.py generics-bench-master.csv generics-bench-bb-char-putList.csv
  Benchmark                                              :     without patch    with patch       delta
  encode                                                 :     46391.59 us      13193.91 us      -71.6%
  decode                                                 :     17764.30 us      17531.85 us      -1.3%
  "decode null"                                          :     10798.59 us      10663.63 us      -1.2%
  embarrassment/show                                     :     41642.51 us      43140.25 us      +3.6%
  embarrassment/read                                     :    659998.14 us     663419.86 us      +0.5%

I think that motivates changing the Binary class to include putList. Naturally, the same trick could be applied to other types, where we would expect the same or even bigger speedups for types whose encoding length is statically known.

I'll wrap this up in a series of patches and merge it together with your work.

On a side note, while working on this I ran into something quite interesting. Try this:

  1. Add putList :: [a] -> Put to Binary with a default implementation.
  2. Don't override the default implementation of putList in any instances, and don't call it.
  3. encode in generics-bench is now 15% slower!

That is very unintuitive to me. What's going on?

@kolmodin
Member

kolmodin commented Apr 2, 2016

I'll try to merge the changes this Sunday. Until then, you can have a look at this new branch at https://github.com/kolmodin/binary/tree/pr/bytestring-builder

@kolmodin kolmodin merged commit 922592e into haskell:master Apr 3, 2016
@kolmodin
Member

kolmodin commented Apr 3, 2016

Merged, but still need some time to wrap things up for the release.
