Merge shortbytestring package back into bytestring wrt #444 #471
Awesome! @hasufell could you please make CI happy?
@Bodigrim I'm actually not sure how...
This appears to be a GHC bug (or similar to that): https://gitlab.haskell.org/ghc/ghc/-/issues/18857
@hasufell use this instead of raw bytestring/Data/ByteString/Short/Internal.hs Lines 529 to 540 in 6ff6ed4
I'll try to run the test suite with this patch https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7133
Reusing compareByteArrays and avoiding excessive pointer arithmetic.
The following tests cause out of bounds errors:
So:
I was able to fix this with the following patch:

--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -442,7 +444,7 @@ packLenChars len cs0 =
     go :: MBA s -> Int -> [Char] -> ST s ()
     go !_ !_ [] = return ()
     go !mba !i (c:cs) = do
-      writeCharArray mba i c
+      writeWord8Array mba i (BS.c2w c)
       go mba (i+1) cs

Suggesting there's a problem with
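For readers unfamiliar with the conversion in the patch, here is a minimal sketch of the same idea using only the public `Data.ByteString.Short` API: each `Char` is narrowed to a `Word8` before it is stored, so every list element occupies exactly one byte. The names `charToWord8` and `packChars8` are hypothetical helpers for illustration, not part of the PR; `charToWord8` is assumed to behave like `BS.c2w` (keep the low 8 bits of the code point).

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import qualified Data.ByteString.Short as SBS

-- Hypothetical stand-in for BS.c2w: keep only the low 8 bits of the
-- code point, silently truncating anything above '\255'.
charToWord8 :: Char -> Word8
charToWord8 = fromIntegral . ord

-- Hypothetical list-based equivalent of packing chars one byte at a time.
-- Each input Char contributes exactly one byte to the result.
packChars8 :: [Char] -> SBS.ShortByteString
packChars8 = SBS.pack . map charToWord8
```

With this model, `SBS.length (packChars8 cs)` always equals `length cs`, which is the property the write loop relies on when it advances the index by one per character.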
Another point: I was a little lax about the complexity documentation. E.g. I'm not sure if
Any idea how this would be out-of-bounds?

replicate :: Int -> Word8 -> ShortByteString
replicate w c
    | w <= 0    = empty
    | otherwise = create w (\mba -> setByteArray mba 0 w (fromIntegral c))

setByteArray :: MBA s -> Int -> Int -> Int -> ST s ()
setByteArray (MBA# dst#) (I# off#) (I# len#) (I# c#) =
    ST $ \s -> case setByteArray# dst# off# len# c# s of
                 s -> (# s, () #)
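One way to reason about the question is to compare against a list-based model: the buffer has `w` bytes and the fill is meant to cover offsets `0` to `w - 1`, so the result should have length `max 0 w` and consist only of the fill byte. The sketch below is such a model, assuming only `pack`, `empty`, `length` and `unpack` from `Data.ByteString.Short`; `replicateModel` and `agreesWithModel` are hypothetical names, not functions from the PR.

```haskell
import Data.Word (Word8)
import qualified Data.ByteString.Short as SBS
import qualified Data.List as List

-- Hypothetical reference model of 'replicate': same guard on w <= 0,
-- but built from a list instead of a raw byte-array fill.
replicateModel :: Int -> Word8 -> SBS.ShortByteString
replicateModel w c
  | w <= 0    = SBS.empty
  | otherwise = SBS.pack (List.replicate w c)

-- Hypothetical check: a candidate result agrees with the model when it
-- has exactly max 0 w bytes and every byte equals the fill value.
agreesWithModel :: Int -> Word8 -> SBS.ShortByteString -> Bool
agreesWithModel w c sbs =
  SBS.length sbs == max 0 w && all (== c) (SBS.unpack sbs)
```

A property test could feed the output of the primop-based `replicate` to `agreesWithModel`; if that passes while the bounds-checking build still reports an error, the checker rather than the fill range would be the more likely suspect.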
Good progress.
Last comments, nearly done.
Data/ByteString/Short/Internal.hs
Outdated
partition f = \sbs -> if
    | null sbs  -> (sbs, sbs)
    | otherwise -> bimap pack pack . List.partition f . unpack $ sbs
If you don't want to do this now, please record this task on the issue tracker.
One minor perf puzzle left. Cheers! :)
Data/ByteString/Short/Internal.hs
Outdated
    -> Int -- bytes written to b1
    -> Int -- bytes written to b2
    -> ST s (Int, Int)
go' !br !bw1 !bw2
One of these counters can be computed from the other two, e.g. bw2 = br - bw1. How does this affect the benchmarks?
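To make the invariant concrete, here is a minimal sketch over a plain list rather than the byte arrays used in the PR. It carries only the bytes-read counter `br` and the first write counter `bw1` through the loop and derives `bw2` at the end. `partitionCounts` is a hypothetical name; the real loop also writes into two buffers, which this sketch leaves out.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word8)

-- Hypothetical simplification of the hot loop: track bytes read (br) and
-- bytes written to the first output (bw1) only. Every byte read goes to
-- exactly one output, so bw2 = br - bw1 holds at every step.
-- Returns (bw1, bw2).
partitionCounts :: (Word8 -> Bool) -> [Word8] -> (Int, Int)
partitionCounts p = go 0 0
  where
    go !br !bw1 []     = (bw1, br - bw1)
    go !br !bw1 (w:ws)
      | p w       = go (br + 1) (bw1 + 1) ws
      | otherwise = go (br + 1) bw1 ws
```

Dropping the third accumulator removes one strict argument from the loop; whether that is measurable depends on what GHC already does with the loop, which is why the reviewer asks for benchmarks.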
I can't do benchmarking reliably on my thinkpad. It's full of CPU throttling and whatnot.
Alright. I have added this task to #350.
Orig:

mostlyTrueFast:  OK (0.35s)
  78.3 μs ± 5.0 μs
mostlyFalseFast: OK (0.28s)
  63.8 μs ± 4.9 μs
balancedFast:    OK (0.25s)
  119 μs ± 6.8 μs
mostlyTrueSlow:  OK (0.28s)
  2.18 ms ± 104 μs
mostlyFalseSlow: OK (0.14s)
  2.14 ms ± 193 μs
balancedSlow:    OK (0.14s)
  2.21 ms ± 183 μs

With optimization:

mostlyTrueFast:  OK (0.36s)
  73.7 μs ± 5.4 μs
mostlyFalseFast: OK (0.26s)
  61.6 μs ± 2.9 μs
balancedFast:    OK (0.23s)
  110 μs ± 8.7 μs
mostlyTrueSlow:  OK (0.14s)
  2.16 ms ± 181 μs
mostlyFalseSlow: OK (0.14s)
  2.15 ms ± 192 μs
balancedSlow:    OK (0.14s)
  2.20 ms ± 206 μs
I think the only ones that are noticeable are balancedFast and mostlyTrueFast, and that seems to be somewhat consistent across many reruns.
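As a rough aid for reading the two runs, the relative change per benchmark can be computed from the reported means. This is only a sketch: the numbers are copied from the output in this comment (converted to microseconds), the names `results`, `relativeChange` and `report` are made up for illustration, and the reported ± intervals are of similar size to several of the differences, so the percentages are indicative at best.

```haskell
-- Benchmark means in microseconds: (name, original, with optimization).
-- Values are copied from the two runs quoted in this comment.
results :: [(String, Double, Double)]
results =
  [ ("mostlyTrueFast",  78.3,   73.7)
  , ("mostlyFalseFast", 63.8,   61.6)
  , ("balancedFast",    119.0,  110.0)
  , ("mostlyTrueSlow",  2180.0, 2160.0)
  , ("mostlyFalseSlow", 2140.0, 2150.0)
  , ("balancedSlow",    2210.0, 2200.0)
  ]

-- Relative change in percent; negative means the optimized run was faster.
relativeChange :: Double -> Double -> Double
relativeChange before after = (after - before) / before * 100

-- One (name, percent change) pair per benchmark.
report :: [(String, Double)]
report = [ (name, relativeChange b a) | (name, b, a) <- results ]
```

Evaluating `report` in GHCi shows the largest relative improvements on balancedFast and mostlyTrueFast, while the Slow variants move by less than one percent in either direction.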
Can this be merged?
Thanks, @hasufell!
* Merge `shortbytestring` package back into `bytestring` wrt #444
* Fix build on ARM
  Reusing compareByteArrays and avoiding excessive pointer arithmetic.
* Speed up reverse by using byteSwap64 tricks
* Remove phase control from inlines
* Improve performance of elemIndex
* Use setByteArray in replicate
* Implement intercalate manually
* Annotate partial functions with HasCallStack
* Fix build on base < 4.12.0.0
* Add uncons/unsnoc
* Correct complexities
* Exclude reverse optimization path from ARM
  It seems to cause segfaults on armv7, suggesting there are issues with 'indexWord8ArrayAsWord64#'. All other platforms are fine and tests pass.
* Add benchmarks for ShortByteString
* Improve inlining
* Adjust haddock identifiers
* Get rid of writeCharArray#
* Haddock fixes
* Clean up tests
* Use -fexpose-all-unfoldings
* Improve reverse
* Cleanup 'reverse'
* Fix possible GC race with foreign imports
  For more information, see #471 (comment)
* Disable asserts in shortbytestring.c
* Remove redundant import
* Add documentation about partial functions
* Fold ShortByteString prop tests into ByteString
* Restore previous INLINEs
* Improve naming of bindings
* Consolidate error handling functions
* Remove trailing whitespace
* Fix uncons in documentation
* Rename indexWord64Array to indexWord8ArrayAsWord64
* Improve error message
* Clean up incorrect documentation
* Use div/mod instead of quot/rem
* Simplify branching in reverse
* Move asserts to Haskell
* Prefix C functions
* Fix return type of c_elem_index
* Fix documentation in unfoldrN
* Make unfoldrN more efficient
* Fix maintainer field
* Fix formatting
* Implement takeEnd, dropEnd and splitAt manually
* Fix some haddock identifiers
* Fix unfoldrN doc
* Add a primops bounds-checking job to CI
* Document and clean up createAndTrim
* Rename errorEmptyList to errorEmptySBS
* Improve documentation for findFromEndUntil
* Improve documentation and naming
* Optimize out quotRem
* Document compareByteArraysOff
* Simplify findIndexOrLength and findFromEndUntil
* Use c_count for count
* Simplify elemIndex
* Remove use of 'mempty'
* Make sure breakSubstring is inlined into isInfixOf
* Simplify stripSuffix and stripPrefix
* Fix redundant import warnings
* Improve 'take'
* Use existing bounds check in 'drop'
* Avoid 'create' when bytestring is empty
* Optimize filter
* Remove redundant INLINABLE
* Use shorter 'createAndTrim' in 'filter'
* Simplify 'take'
* Simplify 'drop'
* Better formatting
* Add comment to explain DNDEBUG
* Refactor elemIndex
* Optimize 'partition'
* Optimize hot loop in 'partition'

(cherry picked from commit 731caea)
@Bodigrim @sjakobi