
Remove the offset parameter completely #175

Merged (5 commits) on Aug 22, 2020
Conversation

@ekmett (Member) commented on Jun 5, 2019

This patch removes the offset parameter from the ByteString constructor, saving one word per ByteString and eliminating a great deal of offset arithmetic.

On GHC < 8.0 this does introduce a change in the representation offered by Data.ByteString.Internal.

On GHC 8.0+ it provides a compatibility shim in the form of a pattern synonym that offers the existing behavior, so that even consumers of the .Internal API can continue to work unaffected.
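The idea behind the shim can be sketched as follows. This is a simplified stand-in, not the exact patch: the `ByteString` mock and field layout here are illustrative, showing how a pattern synonym can keep the three-argument `PS` interface on top of a two-field constructor, with the offset folded into the payload pointer on construction and reported as 0 on destructuring.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)
import GHC.ForeignPtr (plusForeignPtr)
import Data.Word (Word8)

-- Hypothetical stand-in for the new two-field representation:
-- the payload pointer already incorporates any offset.
data ByteString = BS !(ForeignPtr Word8) !Int  -- payload, length

-- Compat shim: old-style 'PS fp off len' still constructs and
-- matches; the offset always comes back as 0 when destructuring.
pattern PS :: ForeignPtr Word8 -> Int -> Int -> ByteString
pattern PS fp zero len <- BS fp ((,) 0 -> (zero, len))
  where
    PS fp off len = BS (plusForeignPtr fp off) len
```

On the construction side the offset is absorbed via `plusForeignPtr`, which is why downstream users of the `.Internal` API can keep writing `PS fp off len` unchanged.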

It has been tested back to GHC 7.0.

This is a breaking change to a .Internal module: users might expect that if they construct PS fp ofs len, then fp and ofs will come back unmodified, or that the PS constructor is available even on old GHCs.

Feel free to bikeshed whether or not this is a major version bump.

The API for Data.ByteString.Internal is extended with fromForeignPtr0 and toForeignPtr0, mimicking their counterparts in Data.Vector, and it also currently (re-)exports plusForeignPtr to make it available on older GHCs.
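The zero-offset conversions might look roughly like this sketch. The function names come from the PR; the two-field `BS` type here is an illustrative mock of the offset-free representation, not the actual definition:

```haskell
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)
import Data.Word (Word8)

-- Hypothetical stand-in for the offset-free representation.
data ByteString = BS !(ForeignPtr Word8) !Int  -- payload, length

-- Zero-offset conversions, analogous to the *0 functions in
-- Data.Vector.Storable: no offset argument, none in the result.
fromForeignPtr0 :: ForeignPtr Word8 -> Int -> ByteString
fromForeignPtr0 = BS

toForeignPtr0 :: ByteString -> (ForeignPtr Word8, Int)
toForeignPtr0 (BS fp len) = (fp, len)
```

With only two fields, both directions are trivial projections, which is where the saved arithmetic comes from.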

It may be worth bikeshedding whether Data.ByteString.Internal should be exporting plusForeignPtr, or if it belongs in an internal module.
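For context, a fallback definition of plusForeignPtr for GHCs whose base does not export it could look roughly like the sketch below. `plusForeignPtr'` is a hypothetical name chosen here only to avoid clashing with the real export; the point is that advancing the address while sharing the same finalizer guts is what makes dropping the stored offset safe.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.ForeignPtr (ForeignPtr (..), plusForeignPtr)
import GHC.Exts (Int (..), plusAddr#)
import Foreign.ForeignPtr (mallocForeignPtrBytes)
import Data.Word (Word8)

-- Compat fallback sketch: advance the raw address, keep the same
-- ForeignPtrContents so the original allocation stays alive.
plusForeignPtr' :: ForeignPtr a -> Int -> ForeignPtr b
plusForeignPtr' (ForeignPtr addr guts) (I# d) =
  ForeignPtr (plusAddr# addr d) guts
```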

The base version bounds are currently unchanged, but this has not been tested on GHC 6.12.1.

I have not been able to figure out how to run the bytestring benchmarks to see what improvements, if any, the change provides.

@sjakobi (Member) left a comment


Some haddocks for PS and plusForeignPtr would be good – the Internal modules are public now as far as Haddock is concerned.

Feel free to bikeshed whether or not this is a major version bump.

Better safe than sorry, so +1 for a major bump.

(Two review threads on Data/ByteString/Internal.hs, resolved.)
@hsyl20 (Contributor) commented on Dec 18, 2019

@ekmett Could you add {-# COMPLETE PS #-}?
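For readers unfamiliar with the pragma: {-# COMPLETE PS #-} tells GHC's coverage checker that matching on the synonym alone is exhaustive, suppressing spurious incomplete-pattern warnings for downstream code that matches on PS. A minimal standalone illustration, using a hypothetical type `T`:

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Without the COMPLETE pragma, GHC warns that 'unwrap' has
-- non-exhaustive patterns, because it cannot see that the
-- synonym 'P' covers every value of 'T'.
data T = MkT Int

pattern P :: Int -> T
pattern P n <- MkT n

{-# COMPLETE P #-}

unwrap :: T -> Int
unwrap (P n) = n
```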

@hsyl20 (Contributor) commented on Dec 18, 2019

I have just tested this patch with a GHC build: no regressions, and some test metrics (e.g. haddock.Cabal) decreased by 11% (see https://gitlab.haskell.org/hsyl20/ghc/pipelines/13837).

@ekmett (Member, Author) commented on Dec 19, 2019

@hsyl20 No objection here.

@hvr (Member) commented on Dec 19, 2019

@hsyl20 this looks very promising!

I'd still also want to see a run of the bytestring microbenchmarks.

@sjakobi (Member) commented on Dec 19, 2019

I'd still want to also see a run of the bytestring microbenchmarks

I agree that it would be desirable to get results from bench-bytestring, but apparently that has been broken since at least 2015: See #52.

Personally I don't feel that this PR should be blocked on such a longstanding issue.

@hsyl20 Could you maybe reproduce the (most interesting) results from GHC in this thread? It's not obvious where to find them in the link that you shared.

@hsyl20 (Contributor) commented on Dec 19, 2019

Metrics decrease for ARMv7 build:

=====> haddock.Cabal(normal) 3592 of 7320 [0, 0, 0]
runtime/bytes allocated decreased from armv7-linux-deb9 baseline @ HEAD~2:
    Expected    haddock.Cabal (normal) runtime/bytes allocated: 599138530.6666666 +/-5%
    Lower bound haddock.Cabal (normal) runtime/bytes allocated:         569181603 
    Upper bound haddock.Cabal (normal) runtime/bytes allocated:         629095457 
    Actual      haddock.Cabal (normal) runtime/bytes allocated:         533313116 
    Deviation   haddock.Cabal (normal) runtime/bytes allocated:             -11.0 %

Metrics decrease for i386 build:

=====> haddock.Cabal(normal) 3592 of 7320 [0, 0, 0]
runtime/bytes allocated decreased from i386-linux-deb9 baseline @ HEAD~2:
    Expected    haddock.Cabal (normal) runtime/bytes allocated: 598979616.0 +/-5%
    Lower bound haddock.Cabal (normal) runtime/bytes allocated:   569030635 
    Upper bound haddock.Cabal (normal) runtime/bytes allocated:   628928597 
    Actual      haddock.Cabal (normal) runtime/bytes allocated:   533388900 
    Deviation   haddock.Cabal (normal) runtime/bytes allocated:       -11.0 %

Other tests don't exceed the thresholds.

@Bodigrim (Contributor) commented on May 11, 2020

Benchmarks have been fixed in master, so I gave them a go.

Benchmark                                   Before      After
--------------------------------------------------------------
/Small payload/mempty                      2.024 ns   1.912 ns
/Small payload/ensureFree 8                2.072 ns   1.902 ns
/Small payload/intHost 1                   121.7 ns   103.4 ns
/Encoding wrappers/foldMap word8 (10000)   101.0 μs   85.58 μs

/Encoding wrappers/primMapByteStringFixed word8 (10000)                              6.309 μs   5.825 μs
/Encoding wrappers/primMapLazyByteStringFixed word8 (10000)                          6.276 μs   5.854 μs
/ByteString insertion/foldMap byteStringInsert byteStringChunks4Data (10000)         311.1 μs   277.3 μs
/ByteString insertion/foldMap byteString byteStringChunks4Data (10000)               79.60 μs   105.4 μs
/ByteString insertion/foldMap byteStringCopy byteStringChunks4Data (10000)           113.9 μs   104.7 μs
/ByteString insertion/foldMap Blaze.insertByteString byteStringChunks4Data (10000)   337.3 μs   287.8 μs
/ByteString insertion/foldMap Blaze.fromByteString byteStringChunks4Data (10000)     116.8 μs   99.14 μs

/Non-bounded encodings/byteStringHex (10000)                7.805 μs   6.440 μs
/Non-bounded encodings/lazyByteStringHex (10000)            7.393 μs   6.417 μs
/Non-bounded encodings/foldMap floatDec (10000)             15.07 ms   12.13 ms
/Non-bounded encodings/foldMap doubleDec (10000)            29.33 ms   20.59 ms
/Non-bounded encodings/foldMap integerDec (small) (10000)   566.8 μs   514.2 μs
/Non-bounded encodings/foldMap integerDec (large) (10000)   6.752 ms   6.257 ms

.Prim/char7 (10000)      7.984 μs   8.442 μs
.Prim/char8 (10000)      7.828 μs   8.689 μs
.Prim/charUtf8 (10000)   18.69 μs   21.13 μs
.Prim/int8 (10000)       7.107 μs   7.614 μs
.Prim/word8 (10000)      7.092 μs   7.773 μs
.Prim/int16BE (10000)    9.797 μs   10.49 μs
.Prim/int32BE (10000)    13.03 μs   14.09 μs
.Prim/int64BE (10000)    19.73 μs   20.81 μs
.Prim/word16BE (10000)   9.800 μs   10.32 μs
.Prim/word32BE (10000)   13.08 μs   14.00 μs
.Prim/word64BE (10000)   22.68 μs   20.98 μs
.Prim/floatBE (10000)    15.67 μs   16.47 μs
.Prim/doubleBE (10000)   23.37 μs   25.34 μs
.Prim/int16LE (10000)    7.732 μs   7.580 μs
.Prim/int32LE (10000)    7.845 μs   7.726 μs
.Prim/int64LE (10000)    8.016 μs   7.948 μs
.Prim/word16LE (10000)   7.963 μs   7.597 μs
.Prim/word32LE (10000)   7.857 μs   7.650 μs
.Prim/word64LE (10000)   8.045 μs   7.911 μs
.Prim/floatLE (10000)    10.62 μs   11.08 μs
.Prim/doubleLE (10000)   13.38 μs   14.14 μs
.Prim/int16Host (10000)  7.174 μs   7.530 μs
.Prim/int32Host (10000)  7.277 μs   7.641 μs
.Prim/int64Host (10000)  7.469 μs   7.970 μs
.Prim/intHost (10000)    7.452 μs   7.885 μs
.Prim/word16Host (10000) 8.133 μs   7.768 μs
.Prim/word32Host (10000) 8.456 μs   7.687 μs
.Prim/word64Host (10000) 8.697 μs   8.654 μs
.Prim/wordHost (10000)   7.786 μs   8.004 μs
.Prim/floatHost (10000)  9.860 μs   10.05 μs
.Prim/doubleHost (10000) 13.13 μs   13.18 μs

The results look a bit ambiguous: some benchmarks are up to 15-20% faster, while others are slower for an unknown reason. Could someone else please try to reproduce these results?

While synthetic benchmarks are ambiguous, I think it is fair to assign more weight to GHC metrics, provided by @hsyl20, as a measure of practical impact. Also the PR simplifies a lot of stuff and is aesthetically pleasing as well :)

+1 for the major version bump.

@Bodigrim Bodigrim added the blocked: next-major-version-bump blocked until a major version bump is happening label May 12, 2020
@sjakobi sjakobi linked an issue Jun 25, 2020 that may be closed by this pull request
@sjakobi (Member) commented on Jul 2, 2020

In #234 and #241 there have been a few comments that connect this PR with dropping support for GHC < 7.10 or GHC < 8.0. I don't really see why this PR would make that step necessary though.

The PS compatibility shim introduced here is only available with GHC >= 8.0, and it is exposed only via Data.ByteString.Internal. AFAICT this PR does not introduce an incompatibility across GHC versions in the public API.

IMHO it's quite acceptable that users of the bytestring internals have to work a bit harder to achieve compatibility across GHC versions. So in my mind it would be fine to merge this PR while continuing to support GHC versions < 8.0.

@Bodigrim (Contributor) commented on Jul 2, 2020

I think the point is that if we drop support for obsolete GHCs, this PR can be merged with only a minor version bump.

I'm fine with either option or their combination :)

@hsyl20 hsyl20 mentioned this pull request Jul 31, 2020
@dcoutts (Contributor) commented on Aug 1, 2020

In case anyone was wondering, I'm totally ok with changing the internal representation if other people are ok with it.

Indeed, I've always wanted to go further and change the representation to use ByteArray rather than ForeignPtr, since that would allow most bytestrings to be unpinned, pinning only the ones we actually need pinned. So if we can change the representation once, perhaps we can do it again :-)

@Bodigrim (Contributor) left a comment

LGTM

@Bodigrim Bodigrim added this to the 0.11.0.0 milestone Aug 21, 2020
@Bodigrim Bodigrim merged commit 7ec0f76 into haskell:master Aug 22, 2020
Linked issue that may be closed by this pull request: Starting offset not necessary