
Remove the offset parameter completely #175

Merged (5 commits) on Aug 22, 2020
Conversation

@ekmett (Member) commented on Jun 5, 2019

This patch removes the offset parameter from the ByteString constructor, saving one word per ByteString and eliminating a great deal of offset arithmetic.

On GHC < 8.0 this does introduce a change in the representation offered by Data.ByteString.Internal.

On GHC 8.0+ it provides a compatibility shim in the form of a pattern synonym that offers the existing behavior, so that even consumers of the .Internal API can continue to work unaffected.
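The idea behind the shim can be sketched as follows. This is a simplified stand-in, not the exact patch: the `ByteString` mock and field layout here are illustrative, showing how a pattern synonym can keep the three-argument `PS` interface on top of a two-field constructor, with the offset folded into the payload pointer on construction and reported as 0 on destructuring.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)
import GHC.ForeignPtr (plusForeignPtr)
import Data.Word (Word8)

-- Hypothetical stand-in for the new two-field representation:
-- the payload pointer already incorporates any offset.
data ByteString = BS !(ForeignPtr Word8) !Int  -- payload, length

-- Compat shim: old-style 'PS fp off len' still constructs and
-- matches; the offset always comes back as 0 when destructuring.
pattern PS :: ForeignPtr Word8 -> Int -> Int -> ByteString
pattern PS fp zero len <- BS fp ((,) 0 -> (zero, len))
  where
    PS fp off len = BS (plusForeignPtr fp off) len
```

On the construction side the offset is absorbed via `plusForeignPtr`, which is why downstream users of the `.Internal` API can keep writing `PS fp off len` unchanged.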

It has been tested back to GHC 7.0.

This is a breaking change to a .Internal module: users might expect that if they construct PS fp ofs len, then fp and ofs will come back unmodified, or that the PS constructor is available even on old GHCs.

Feel free to bikeshed whether or not this is a major version bump.

The API for Data.ByteString.Internal is extended with fromForeignPtr0 and toForeignPtr0, mimicking their counterparts in Data.Vector, and it also currently (re-)exports plusForeignPtr to make it available on older GHCs.
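The zero-offset conversions might look roughly like this sketch. The function names come from the PR; the two-field `BS` type here is an illustrative mock of the offset-free representation, not the actual definition:

```haskell
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)
import Data.Word (Word8)

-- Hypothetical stand-in for the offset-free representation.
data ByteString = BS !(ForeignPtr Word8) !Int  -- payload, length

-- Zero-offset conversions, analogous to the *0 functions in
-- Data.Vector.Storable: no offset argument, none in the result.
fromForeignPtr0 :: ForeignPtr Word8 -> Int -> ByteString
fromForeignPtr0 = BS

toForeignPtr0 :: ByteString -> (ForeignPtr Word8, Int)
toForeignPtr0 (BS fp len) = (fp, len)
```

With only two fields, both directions are trivial projections, which is where the saved arithmetic comes from.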

It may be worth bikeshedding whether Data.ByteString.Internal should be exporting plusForeignPtr, or if it belongs in an internal module.
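For context, a fallback definition of plusForeignPtr for GHCs whose base does not export it could look roughly like the sketch below. `plusForeignPtr'` is a hypothetical name chosen here only to avoid clashing with the real export; the point is that advancing the address while sharing the same finalizer guts is what makes dropping the stored offset safe.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.ForeignPtr (ForeignPtr (..), plusForeignPtr)
import GHC.Exts (Int (..), plusAddr#)
import Foreign.ForeignPtr (mallocForeignPtrBytes)
import Data.Word (Word8)

-- Compat fallback sketch: advance the raw address, keep the same
-- ForeignPtrContents so the original allocation stays alive.
plusForeignPtr' :: ForeignPtr a -> Int -> ForeignPtr b
plusForeignPtr' (ForeignPtr addr guts) (I# d) =
  ForeignPtr (plusAddr# addr d) guts
```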

The base version bounds are currently unchanged, but this has not been tested on GHC 6.12.1.

I have not been able to figure out how to run the bytestring benchmarks to see what improvements, if any, the change provides.

@sjakobi (Member) left a comment


Some haddocks for PS and plusForeignPtr would be good – the Internal modules are public now as far as Haddock is concerned.

Feel free to bikeshed whether or not this is a major version bump.

Better safe than sorry, so +1 for a major bump.

(Two review threads on Data/ByteString/Internal.hs, resolved.)
@hsyl20 (Contributor) commented on Dec 18, 2019

@ekmett Could you add {-# COMPLETE PS #-}?
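For readers unfamiliar with the pragma: {-# COMPLETE PS #-} tells GHC's coverage checker that matching on the synonym alone is exhaustive, suppressing spurious incomplete-pattern warnings for downstream code that matches on PS. A minimal standalone illustration, using a hypothetical type `T`:

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Without the COMPLETE pragma, GHC warns that 'unwrap' has
-- non-exhaustive patterns, because it cannot see that the
-- synonym 'P' covers every value of 'T'.
data T = MkT Int

pattern P :: Int -> T
pattern P n <- MkT n

{-# COMPLETE P #-}

unwrap :: T -> Int
unwrap (P n) = n
```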

@hsyl20 (Contributor) commented on Dec 18, 2019

I have just tested this patch with a GHC build: no regressions, and some test metrics (e.g. haddock.Cabal) decreased by 11% (see https://gitlab.haskell.org/hsyl20/ghc/pipelines/13837).

@ekmett (Member, Author) commented on Dec 19, 2019

@hsyl20 No objection here.

@hvr (Member) commented on Dec 19, 2019

@hsyl20 this looks very promising!

I'd still also want to see a run of the bytestring microbenchmarks.

@sjakobi (Member) commented on Dec 19, 2019

I'd still want to also see a run of the bytestring microbenchmarks

I agree that it would be desirable to get results from bench-bytestring, but apparently that has been broken since at least 2015: See #52.

Personally I don't feel that this PR should be blocked on such a longstanding issue.

@hsyl20 Could you maybe reproduce the (most interesting) results from GHC in this thread? It's not obvious where to find them in the link that you shared.

@hsyl20 (Contributor) commented on Dec 19, 2019

Metrics decrease for ARMv7 build:

=====> haddock.Cabal(normal) 3592 of 7320 [0, 0, 0]
runtime/bytes allocated decreased from armv7-linux-deb9 baseline @ HEAD~2:
    Expected    haddock.Cabal (normal) runtime/bytes allocated: 599138530.6666666 +/-5%
    Lower bound haddock.Cabal (normal) runtime/bytes allocated:         569181603 
    Upper bound haddock.Cabal (normal) runtime/bytes allocated:         629095457 
    Actual      haddock.Cabal (normal) runtime/bytes allocated:         533313116 
    Deviation   haddock.Cabal (normal) runtime/bytes allocated:             -11.0 %

Metrics decrease for i386 build:

=====> haddock.Cabal(normal) 3592 of 7320 [0, 0, 0]
runtime/bytes allocated decreased from i386-linux-deb9 baseline @ HEAD~2:
    Expected    haddock.Cabal (normal) runtime/bytes allocated: 598979616.0 +/-5%
    Lower bound haddock.Cabal (normal) runtime/bytes allocated:   569030635 
    Upper bound haddock.Cabal (normal) runtime/bytes allocated:   628928597 
    Actual      haddock.Cabal (normal) runtime/bytes allocated:   533388900 
    Deviation   haddock.Cabal (normal) runtime/bytes allocated:       -11.0 %

Other tests don't exceed the thresholds.

@Bodigrim (Contributor) commented on May 11, 2020

Benchmarks have been fixed in master, so I gave them a go.

Benchmark                                   Before      After
--------------------------------------------------------------
/Small payload/mempty                      2.024 ns   1.912 ns
/Small payload/ensureFree 8                2.072 ns   1.902 ns
/Small payload/intHost 1                   121.7 ns   103.4 ns
/Encoding wrappers/foldMap word8 (10000)   101.0 μs   85.58 μs

/Encoding wrappers/primMapByteStringFixed word8 (10000)                              6.309 μs   5.825 μs
/Encoding wrappers/primMapLazyByteStringFixed word8 (10000)                          6.276 μs   5.854 μs
/ByteString insertion/foldMap byteStringInsert byteStringChunks4Data (10000)         311.1 μs   277.3 μs
/ByteString insertion/foldMap byteString byteStringChunks4Data (10000)               79.60 μs   105.4 μs
/ByteString insertion/foldMap byteStringCopy byteStringChunks4Data (10000)           113.9 μs   104.7 μs
/ByteString insertion/foldMap Blaze.insertByteString byteStringChunks4Data (10000)   337.3 μs   287.8 μs
/ByteString insertion/foldMap Blaze.fromByteString byteStringChunks4Data (10000)     116.8 μs   99.14 μs

/Non-bounded encodings/byteStringHex (10000)                7.805 μs   6.440 μs
/Non-bounded encodings/lazyByteStringHex (10000)            7.393 μs   6.417 μs
/Non-bounded encodings/foldMap floatDec (10000)             15.07 ms   12.13 ms
/Non-bounded encodings/foldMap doubleDec (10000)            29.33 ms   20.59 ms
/Non-bounded encodings/foldMap integerDec (small) (10000)   566.8 μs   514.2 μs
/Non-bounded encodings/foldMap integerDec (large) (10000)   6.752 ms   6.257 ms

.Prim/char7 (10000)      7.984 μs   8.442 μs
.Prim/char8 (10000)      7.828 μs   8.689 μs
.Prim/charUtf8 (10000)   18.69 μs   21.13 μs
.Prim/int8 (10000)       7.107 μs   7.614 μs
.Prim/word8 (10000)      7.092 μs   7.773 μs
.Prim/int16BE (10000)    9.797 μs   10.49 μs
.Prim/int32BE (10000)    13.03 μs   14.09 μs
.Prim/int64BE (10000)    19.73 μs   20.81 μs
.Prim/word16BE (10000)   9.800 μs   10.32 μs
.Prim/word32BE (10000)   13.08 μs   14.00 μs
.Prim/word64BE (10000)   22.68 μs   20.98 μs
.Prim/floatBE (10000)    15.67 μs   16.47 μs
.Prim/doubleBE (10000)   23.37 μs   25.34 μs
.Prim/int16LE (10000)    7.732 μs   7.580 μs
.Prim/int32LE (10000)    7.845 μs   7.726 μs
.Prim/int64LE (10000)    8.016 μs   7.948 μs
.Prim/word16LE (10000)   7.963 μs   7.597 μs
.Prim/word32LE (10000)   7.857 μs   7.650 μs
.Prim/word64LE (10000)   8.045 μs   7.911 μs
.Prim/floatLE (10000)    10.62 μs   11.08 μs
.Prim/doubleLE (10000)   13.38 μs   14.14 μs
.Prim/int16Host (10000)  7.174 μs   7.530 μs
.Prim/int32Host (10000)  7.277 μs   7.641 μs
.Prim/int64Host (10000)  7.469 μs   7.970 μs
.Prim/intHost (10000)    7.452 μs   7.885 μs
.Prim/word16Host (10000) 8.133 μs   7.768 μs
.Prim/word32Host (10000) 8.456 μs   7.687 μs
.Prim/word64Host (10000) 8.697 μs   8.654 μs
.Prim/wordHost (10000)   7.786 μs   8.004 μs
.Prim/floatHost (10000)  9.860 μs   10.05 μs
.Prim/doubleHost (10000) 13.13 μs   13.18 μs

The results look a bit ambiguous: some benchmarks are up to 15-20% faster, while others are slower for an unknown reason. Could someone else please try to reproduce these results?

While synthetic benchmarks are ambiguous, I think it is fair to assign more weight to GHC metrics, provided by @hsyl20, as a measure of practical impact. Also the PR simplifies a lot of stuff and is aesthetically pleasing as well :)

+1 for the major version bump.

@Bodigrim Bodigrim added the blocked: next-major-version-bump blocked until a major version bump is happening label May 12, 2020
@sjakobi sjakobi linked an issue Jun 25, 2020 that may be closed by this pull request
@sjakobi (Member) commented on Jul 2, 2020

In #234 and #241 there have been a few comments that connect this PR with dropping support for GHC < 7.10 or GHC < 8.0. I don't really see why this PR would make that step necessary though.

The PS compatibility shim introduced here is only available with GHC >= 8.0, and it is exposed only via Data.ByteString.Internal. AFAICT this PR does not introduce an incompatibility across GHC versions in the public API.

IMHO it's quite acceptable that users of the bytestring internals have to work a bit harder to achieve compatibility across GHC versions. So in my mind it would be fine to merge this PR while continuing to support GHC versions < 8.0.

@Bodigrim (Contributor) commented on Jul 2, 2020

I think the point is that if we drop support for obsolete GHCs, this PR can be merged with only a minor version bump.

I'm fine with either option or their combination :)

@hsyl20 hsyl20 mentioned this pull request Jul 31, 2020
@dcoutts (Contributor) commented on Aug 1, 2020

In case anyone was wondering, I'm totally ok with changing the internal representation if other people are ok with it.

Indeed, I've always wanted to go further and change the representation to use ByteArray rather than ForeignPtr, since that would allow most bytestrings to be unpinned, pinning only the ones we actually need pinned. So if we can change the representation once, perhaps we can do it again :-)

@Bodigrim (Contributor) left a comment

LGTM

@Bodigrim Bodigrim added this to the 0.11.0.0 milestone Aug 21, 2020
@Bodigrim Bodigrim merged commit 7ec0f76 into haskell:master Aug 22, 2020
Linked issue that may be closed by this pull request: Starting offset not necessary