Use qsort to sort short ByteString #267

Merged
merged 8 commits into from
Aug 25, 2020
Changes from 4 commits
7 changes: 6 additions & 1 deletion Data/ByteString.hs
@@ -1514,7 +1514,12 @@ tails p | null p = [empty]

 -- | /O(n)/ Sort a ByteString efficiently, using counting sort.
 sort :: ByteString -> ByteString
-sort (BS input l) = unsafeCreate l $ \p -> allocaArray 256 $ \arr -> do
+sort (BS input l)
+  -- qsort outperforms counting sort for small arrays
+  | l <= 20 = unsafeCreate l $ \ptr -> withForeignPtr input $ \inp -> do
+    memcpy ptr inp (fromIntegral l)
+    c_sort ptr (fromIntegral l)
+  | otherwise = unsafeCreate l $ \p -> allocaArray 256 $ \arr -> do

     _ <- memset (castPtr arr) 0 (256 * fromIntegral (sizeOf (undefined :: CSize)))
     withForeignPtr input (\x -> countOccurrences arr x l)
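The dispatch in this hunk can be sketched in plain C. This is a minimal illustration and not the library's code: the names `hybrid_sort_bytes`, `counting_sort_bytes`, `compare_bytes` and `SMALL_SORT_CUTOFF` are hypothetical, and the cutoff of 20 is simply copied from the `l <= 20` guard above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff, copied from the `l <= 20` guard in the diff. */
#define SMALL_SORT_CUTOFF 20

/* qsort comparator: orders bytes by their unsigned value (0..255). */
static int compare_bytes(const void *a, const void *b) {
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Counting sort: tally each byte value, then write the runs back in order. */
static void counting_sort_bytes(unsigned char *p, size_t len) {
    size_t counts[256] = {0};
    for (size_t i = 0; i < len; i++) {
        counts[p[i]]++;
    }
    size_t out = 0;
    for (int v = 0; v < 256; v++) {
        memset(p + out, v, counts[v]);
        out += counts[v];
    }
}

/* Hybrid: qsort for short inputs, counting sort for everything else. */
void hybrid_sort_bytes(unsigned char *p, size_t len) {
    assert(p != NULL || len == 0);
    if (len < 2) {
        return; /* nothing to reorder */
    }
    if (len <= SMALL_SORT_CUTOFF) {
        qsort(p, len, 1, compare_bytes);
    } else {
        counting_sort_bytes(p, len);
    }
}
```

The counting-sort branch always pays for clearing and scanning a 256-entry table, which is presumably why it loses to qsort on very short inputs. The exact crossover point depends on the platform's qsort, so the cutoff is best treated as something to measure.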
4 changes: 4 additions & 0 deletions Data/ByteString/Internal.hs
@@ -80,6 +80,7 @@ module Data.ByteString.Internal (
         c_maximum,              -- :: Ptr Word8 -> CInt -> IO Word8
         c_minimum,              -- :: Ptr Word8 -> CInt -> IO Word8
         c_count,                -- :: Ptr Word8 -> CInt -> Word8 -> IO CInt
+        c_sort,                 -- :: Ptr Word8 -> CULong -> IO ()

         -- * Chars
         w2c, c2w, isSpaceWord8, isSpaceChar8,
@@ -758,3 +759,6 @@ foreign import ccall unsafe "static fpstring.h fps_minimum" c_minimum

 foreign import ccall unsafe "static fpstring.h fps_count" c_count
     :: Ptr Word8 -> CULong -> Word8 -> IO CULong
+
+foreign import ccall unsafe "static fpstring.h fps_sort" c_sort
+    :: Ptr Word8 -> CULong -> IO ()
5 changes: 5 additions & 0 deletions bench/BenchAll.hs
@@ -18,6 +18,7 @@ import Gauge
 import Prelude hiding (words)

 import qualified Data.ByteString as S
+import qualified Data.ByteString.Char8 as S8
 import qualified Data.ByteString.Lazy as L

 import Data.ByteString.Builder
@@ -225,6 +226,9 @@ sanityCheckInfo =
     ]
   ]

+sortInputs :: [S.ByteString]
+sortInputs = map (`S.take` S.pack [122, 121 .. 32]) [10..25]
+
 main :: IO ()
 main = do
   mapM_ putStrLn sanityCheckInfo
@@ -387,4 +391,5 @@ main = do
       , bench "balancedSlow" $ partitionLazy (\x -> hashWord8 x < w 128)
       ]
     ]
+  , bgroup "sort" $ map (\s -> bench (S8.unpack s) $ nf S.sort s) sortInputs
   ]
8 changes: 8 additions & 0 deletions cbits/fpstring.c
@@ -88,3 +88,11 @@ void * fps_memcpy_offsets(void *dst, unsigned long dst_off,
                            const void *src, unsigned long src_off, size_t n) {
     return memcpy(dst + dst_off, src + src_off, n);
 }
+
+int fps_compare(const void *a, const void *b) {
+    return (int)*(const unsigned char*)a - (int)*(const unsigned char*)b;
+}
+
+void fps_sort(unsigned char *p, unsigned long len) {
+    qsort(p, len, 1, fps_compare);
+}
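The comparator reads each element through `unsigned char` before subtracting, so both operands lie in 0..255 and the difference cannot overflow an `int`. The sketch below contrasts that with a signed read, which is not part of the PR. The names `compare_unsigned_bytes`, `compare_signed_bytes`, `sort_bytes_unsigned` and `sort_bytes_signed` are hypothetical, and the signed variant assumes a two's complement platform.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as fps_compare: compare bytes by their unsigned value. */
static int compare_unsigned_bytes(const void *a, const void *b) {
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Hypothetical contrast: reading as signed char makes 0x80..0xFF negative
   (assuming two's complement), so those bytes would sort before 0x00. */
static int compare_signed_bytes(const void *a, const void *b) {
    return (int)*(const signed char *)a - (int)*(const signed char *)b;
}

/* Sort len bytes at p in place, ascending by unsigned value. */
void sort_bytes_unsigned(unsigned char *p, size_t len) {
    assert(p != NULL || len == 0);
    if (len > 1) {
        qsort(p, len, 1, compare_unsigned_bytes);
    }
}

/* Sort len bytes at p in place, ascending by signed value (for contrast). */
void sort_bytes_signed(unsigned char *p, size_t len) {
    assert(p != NULL || len == 0);
    if (len > 1) {
        qsort(p, len, 1, compare_signed_bytes);
    }
}
```

Only the unsigned ordering agrees with byte-wise comparison of `Word8` values on the Haskell side, which is the order `sort` is expected to produce.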
2 changes: 2 additions & 0 deletions include/fpstring.h
@@ -1,9 +1,11 @@

 #include <string.h>
+#include <stdlib.h>

 void fps_reverse(unsigned char *dest, unsigned char *from, unsigned long len);
 void fps_intersperse(unsigned char *dest, unsigned char *from, unsigned long len, unsigned char c);
 unsigned char fps_maximum(unsigned char *p, unsigned long len);
 unsigned char fps_minimum(unsigned char *p, unsigned long len);
 unsigned long fps_count(unsigned char *p, unsigned long len, unsigned char w);
+void fps_sort(unsigned char *p, unsigned long len);