Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge shortbytestring package back into bytestring wrt #444 #471

Merged
merged 73 commits into from
Feb 15, 2022

Conversation

hasufell
Copy link
Member

@Bodigrim
Copy link
Contributor

Awesome! @hasufell could you please make CI happy?

@hasufell
Copy link
Member Author

hasufell commented Jan 21, 2022

@Bodigrim I'm actually not sure how... opt seems to barf on ARM: https://cloud.drone.io/haskell/bytestring/284/2/2

[ 8 of 29] Compiling Data.ByteString.Builder.RealFloat.Internal ( Data/ByteString/Builder/RealFloat/Internal.hs, /drone/src/dist-newstyle/build/aarch64-linux/ghc-8.6.5/bytestring-0.11.3.0/build/Data/ByteString/Builder/RealFloat/Internal.o )
[ 9 of 29] Compiling Data.ByteString.Lazy.Internal ( Data/ByteString/Lazy/Internal.hs, /drone/src/dist-newstyle/build/aarch64-linux/ghc-8.6.5/bytestring-0.11.3.0/build/Data/ByteString/Lazy/Internal.o )
[10 of 29] Compiling Data.ByteString.Short.Internal ( Data/ByteString/Short/Internal.hs, /drone/src/dist-newstyle/build/aarch64-linux/ghc-8.6.5/bytestring-0.11.3.0/build/Data/ByteString/Short/Internal.o )
opt-6.0: /tmp/ghc725_0/ghc_123.ll:42130:42: error: use of undefined value '@memcmp$def'
  %ln1EwJ = bitcast i32 (i8*, i8*, i64)* @memcmp$def to i64 (i8*, i8*, i64)*

@hasufell
Copy link
Member Author

hasufell commented Jan 21, 2022

This appears to be a GHC bug (or similar to that): https://gitlab.haskell.org/ghc/ghc/-/issues/18857

@Bodigrim
Copy link
Contributor

@hasufell use this instead of raw memcmp:

compareByteArrays :: BA -> BA -> Int -> Int
#if MIN_VERSION_base(4,11,0)
compareByteArrays (BA# ba1#) (BA# ba2#) (I# len#) =
I# (compareByteArrays# ba1# 0# ba2# 0# len#)
#else
compareByteArrays (BA# ba1#) (BA# ba2#) len =
fromIntegral $ accursedUnutterablePerformIO $
c_memcmp_ByteArray ba1# ba2# (fromIntegral len)
foreign import ccall unsafe "string.h memcmp"
c_memcmp_ByteArray :: ByteArray# -> ByteArray# -> CSize -> IO CInt
#endif

@hasufell
Copy link
Member Author

@Bodigrim

Couldn't match expected type ‘BA’ with actual type ‘Ptr b1’

@hasufell hasufell force-pushed the hasufell/PR/merge-shortbytestring branch from cbb1651 to 0528711 Compare January 21, 2022 21:58
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
@hasufell
Copy link
Member Author

I'll try to run the test suite with this patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7133

@hasufell
Copy link
Member Author

I'll try to run the test suite with this patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7133

The following tests cause out of bounds errors:

  --, testProperty "read . show" $
  --  \x -> (x :: ShortByteString) === read (show x)

 --, testProperty "replicate" $
 --  \n (toElem -> c) -> B.unpack (B.replicate (fromIntegral n) c) === replicate n c

  --, testProperty "unfoldrN replicate" $
  --  \n (toElem -> c) -> fst (B.unfoldrN n (\t -> Just (t, t)) c) === B.replicate n c

So:

  1. there's a bug in replicate after switching to setByteArray
  2. there seems to be a problem with read/show... I've no idea how that happened, because I didn't even touch that code

@hasufell
Copy link
Member Author

  1. there seems to be a problem with read/show... I've no idea how that happened, because I didn't even touch that code

I was able to fix this with the following patch:

--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -442,7 +444,7 @@ packLenChars len cs0 =
     go :: MBA s -> Int -> [Char] -> ST s ()
     go !_   !_ []     = return ()
     go !mba !i (c:cs) = do
-      writeCharArray mba i c
+      writeWord8Array mba i (BS.c2w c)
       go mba (i+1) cs

Suggesting there's a problem with writeCharArray.

@hasufell
Copy link
Member Author

hasufell commented Jan 22, 2022

Another point: I was a little lax about the complexity documentation. E.g. I'm not sure if tail is considered O(n) (since copyByteArray is a primops... it should probably be O(1), since we're documenting time not space complexity?).

@Bodigrim
Copy link
Contributor

copyByteArray is still O(n), even if it is n/512.

@hasufell
Copy link
Member Author

Any idea how this would be out-of-bounds?

replicate :: Int -> Word8 -> ShortByteString
replicate w c
    | w <= 0    = empty
    | otherwise = create w (\mba -> setByteArray mba 0 w (fromIntegral c))

setByteArray :: MBA s -> Int -> Int -> Int -> ST s ()
setByteArray (MBA# dst#) (I# off#) (I# len#) (I# c#) =
    ST $ \s -> case setByteArray# dst# off# len# c# s of
                 s -> (# s, () #)

@hasufell hasufell force-pushed the hasufell/PR/merge-shortbytestring branch from c3f4bbb to 9bce444 Compare January 22, 2022 21:06
@Bodigrim
Copy link
Contributor

@hasufell see https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7133#note_403462

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress.

Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
@hasufell hasufell force-pushed the hasufell/PR/merge-shortbytestring branch from 3834ed8 to b647760 Compare February 7, 2022 18:32
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comments, nearly done.

bytestring.cabal Show resolved Hide resolved
Data/ByteString/Short/Internal.hs Outdated Show resolved Hide resolved
Comment on lines 1374 to 1376
partition f = \sbs -> if
| null sbs -> (sbs, sbs)
| otherwise -> bimap pack pack . List.partition f . unpack $ sbs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to do this now, please record this task on the issue tracker.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor perf puzzle left. Cheers! :)

-> Int -- bytes written to b1
-> Int -- bytes written to b2
-> ST s (Int, Int)
go' !br !bw1 !bw2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of these counters can be computed from the other two, e.g. bw2 = br - bw1. How does this affect the benchmarks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't do benchmarking reliably on my thinkpad. It's full of CPU throttling and whatnot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I have added this task to #350.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orig:

        mostlyTrueFast:  OK (0.35s)
          78.3 μs ± 5.0 μs
        mostlyFalseFast: OK (0.28s)
          63.8 μs ± 4.9 μs
        balancedFast:    OK (0.25s)
          119  μs ± 6.8 μs
        mostlyTrueSlow:  OK (0.28s)
          2.18 ms ± 104 μs
        mostlyFalseSlow: OK (0.14s)
          2.14 ms ± 193 μs
        balancedSlow:    OK (0.14s)
          2.21 ms ± 183 μs

With optimization:

        mostlyTrueFast:  OK (0.36s)
          73.7 μs ± 5.4 μs
        mostlyFalseFast: OK (0.26s)
          61.6 μs ± 2.9 μs
        balancedFast:    OK (0.23s)
          110  μs ± 8.7 μs
        mostlyTrueSlow:  OK (0.14s)
          2.16 ms ± 181 μs
        mostlyFalseSlow: OK (0.14s)
          2.15 ms ± 192 μs
        balancedSlow:    OK (0.14s)
          2.20 ms ± 206 μs

I think the only one that's noticable is balancedFast and mostlyTrueFast and that seems to be somewhat consistent across many reruns.

@hasufell
Copy link
Member Author

Can this be merged?

@sjakobi sjakobi merged commit 731caea into haskell:master Feb 15, 2022
@sjakobi
Copy link
Member

sjakobi commented Feb 15, 2022

Thanks, @hasufell!

@Bodigrim Bodigrim added this to the 0.11.3.0 milestone Feb 15, 2022
@Bodigrim Bodigrim linked an issue Feb 15, 2022 that may be closed by this pull request
sjakobi pushed a commit that referenced this pull request Feb 15, 2022
* Merge `shortbytestring` package back into `bytestring` wrt #444

* Fix build on ARM

Reusing compareByteArrays and avoiding
excessive pointer arithmetic.

* Speed up reverse by using byteSwap64 tricks

* Remove phase control from inlines

* Improve performance of elemIndex

* Use setByteArray in replicate

* Implement intercalate manually

* Annotate partial functions with HasCallStack

* Fix build on base < 4.12.0.0

* Add uncons/unsnoc

* Correct complexities

* Exclude reverse optimization path from ARM

It seems to cause segfaults on armv7, suggesting
there are issues with 'indexWord8ArrayAsWord64#'.

All other platforms are fine and tests pass.

* Add benchmarks for ShortByteString

* Improve inlining

* Adjust haddock identifiers

* Get rid of writeCharArray#

* Haddock fixes

* Clean up tests

* Use -fexpose-all-unfoldings

* Improve reverse

* Cleanup 'reverse'

* Fix possible GC race with foreign imports

For more information, see
  #471 (comment)

* Disable asserts in shortbytestring.c

* Remove redundant import

* Add documentation about partial functions

* Fold ShortByteString prop tests into ByteString

* Restore previous INLINEs

* Improve naming of bindings

* Consolidate error handling functions

* Remove trailing whitespace

* Fix uncons in documentation

* Rename indexWord64Array to indexWord8ArrayAsWord64

* Improve error message

* Clean up incorrect documentation

* Use div/mod instead of quot/rem

* Simplify branching in reverse

* Move asserts to Haskell

* Prefix C functions

* Fix return type of c_elem_index

* Fix documentation in unfoldrN

* Make unfoldrN more efficient

* Fix maintainer field

* Fix formatting

* Implement takeEnd, dropeEnd and splitAt manually

* Fix some haddock identifiers

* Fix unfoldrN doc

* Add a primops bounds-checking job to CI

* Document and clean up createAndTrim

* Rename errorEmptyList to errorEmptySBS

* Improve documentation for findFromEndUntil

* Improve documentation and naming

* Optimize out quotRem

* Document compareByteArraysOff

* Simplify findIndexOrLength and findFromEndUntil

* Use c_count for count

* Simplify elemIndex

* Remove use of 'mempty'

* Make sure breakSubstring is inlined into isInfixOf

* Simplify stripSuffix and stripPrefix

* Fix redundant import warnings

* Improve 'take'

* Use existing bounnds check in 'drop'

* Avoid 'create' when bytestring is empty

* Optimize filter

* Remove redundant INLINABLE

* Use shorter 'createAndTrim' in 'filter'

* Simplify 'take'

* Simplify 'drop'

* Better formatting

* Add comment to explain DNDEBUG

* Refactor elemIndex

* Optimize 'partition'

* Optimize hot loop in 'partition'

(cherry picked from commit 731caea)
Bodigrim pushed a commit that referenced this pull request Feb 15, 2022
* Merge `shortbytestring` package back into `bytestring` wrt #444

* Fix build on ARM

Reusing compareByteArrays and avoiding
excessive pointer arithmetic.

* Speed up reverse by using byteSwap64 tricks

* Remove phase control from inlines

* Improve performance of elemIndex

* Use setByteArray in replicate

* Implement intercalate manually

* Annotate partial functions with HasCallStack

* Fix build on base < 4.12.0.0

* Add uncons/unsnoc

* Correct complexities

* Exclude reverse optimization path from ARM

It seems to cause segfaults on armv7, suggesting
there are issues with 'indexWord8ArrayAsWord64#'.

All other platforms are fine and tests pass.

* Add benchmarks for ShortByteString

* Improve inlining

* Adjust haddock identifiers

* Get rid of writeCharArray#

* Haddock fixes

* Clean up tests

* Use -fexpose-all-unfoldings

* Improve reverse

* Cleanup 'reverse'

* Fix possible GC race with foreign imports

For more information, see
  #471 (comment)

* Disable asserts in shortbytestring.c

* Remove redundant import

* Add documentation about partial functions

* Fold ShortByteString prop tests into ByteString

* Restore previous INLINEs

* Improve naming of bindings

* Consolidate error handling functions

* Remove trailing whitespace

* Fix uncons in documentation

* Rename indexWord64Array to indexWord8ArrayAsWord64

* Improve error message

* Clean up incorrect documentation

* Use div/mod instead of quot/rem

* Simplify branching in reverse

* Move asserts to Haskell

* Prefix C functions

* Fix return type of c_elem_index

* Fix documentation in unfoldrN

* Make unfoldrN more efficient

* Fix maintainer field

* Fix formatting

* Implement takeEnd, dropeEnd and splitAt manually

* Fix some haddock identifiers

* Fix unfoldrN doc

* Add a primops bounds-checking job to CI

* Document and clean up createAndTrim

* Rename errorEmptyList to errorEmptySBS

* Improve documentation for findFromEndUntil

* Improve documentation and naming

* Optimize out quotRem

* Document compareByteArraysOff

* Simplify findIndexOrLength and findFromEndUntil

* Use c_count for count

* Simplify elemIndex

* Remove use of 'mempty'

* Make sure breakSubstring is inlined into isInfixOf

* Simplify stripSuffix and stripPrefix

* Fix redundant import warnings

* Improve 'take'

* Use existing bounnds check in 'drop'

* Avoid 'create' when bytestring is empty

* Optimize filter

* Remove redundant INLINABLE

* Use shorter 'createAndTrim' in 'filter'

* Simplify 'take'

* Simplify 'drop'

* Better formatting

* Add comment to explain DNDEBUG

* Refactor elemIndex

* Optimize 'partition'

* Optimize hot loop in 'partition'

(cherry picked from commit 731caea)
@Bodigrim Bodigrim linked an issue Feb 15, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: consider splitting out shortbytestring Expand ShortByteStrings API?
4 participants