Use qsort to sort short ByteString #267

Bodigrim · 2020-08-24T22:27:48Z

This is similar in spirit to #123, but includes benchmarks to reproduce results. I also decided to use qsort from stdlib instead of implementing insertion sort manually, just to minimise changes.
(I have no idea about C, suggestions would be highly appreciated, CC @vdukhovni)

countsort

sort/zyxwvutsrq                          mean 299.7 ns  ( +- 19.84 ns  )
sort/zyxwvutsrqp                         mean 320.1 ns  ( +- 23.39 ns  )
sort/zyxwvutsrqpo                        mean 302.7 ns  ( +- 19.03 ns  )
sort/zyxwvutsrqpon                       mean 323.6 ns  ( +- 27.07 ns  )
sort/zyxwvutsrqponm                      mean 336.2 ns  ( +- 39.73 ns  )
sort/zyxwvutsrqponml                     mean 324.9 ns  ( +- 17.24 ns  )
sort/zyxwvutsrqponmlk                    mean 400.2 ns  ( +- 47.64 ns  )
sort/zyxwvutsrqponmlkj                   mean 377.8 ns  ( +- 49.46 ns  )
sort/zyxwvutsrqponmlkji                  mean 390.3 ns  ( +- 49.52 ns  )
sort/zyxwvutsrqponmlkjih                 mean 431.4 ns  ( +- 88.87 ns  )
sort/zyxwvutsrqponmlkjihg                mean 461.8 ns  ( +- 84.46 ns  )
sort/zyxwvutsrqponmlkjihgf               mean 382.9 ns  ( +- 59.26 ns  )
sort/zyxwvutsrqponmlkjihgfe              mean 363.3 ns  ( +- 30.18 ns  )
sort/zyxwvutsrqponmlkjihgfed             mean 370.6 ns  ( +- 48.68 ns  )
sort/zyxwvutsrqponmlkjihgfedc            mean 356.5 ns  ( +- 19.49 ns  )
sort/zyxwvutsrqponmlkjihgfedcb           mean 365.8 ns  ( +- 23.33 ns  )

qsort

sort/zyxwvutsrq                          mean 114.1 ns  ( +- 4.661 ns  )
sort/zyxwvutsrqp                         mean 122.7 ns  ( +- 4.730 ns  )
sort/zyxwvutsrqpo                        mean 136.0 ns  ( +- 5.738 ns  )
sort/zyxwvutsrqpon                       mean 147.4 ns  ( +- 10.86 ns  )
sort/zyxwvutsrqponm                      mean 158.3 ns  ( +- 8.222 ns  )
sort/zyxwvutsrqponml                     mean 168.8 ns  ( +- 13.26 ns  )
sort/zyxwvutsrqponmlk                    mean 216.3 ns  ( +- 11.70 ns  )
sort/zyxwvutsrqponmlkj                   mean 256.4 ns  ( +- 15.16 ns  )
sort/zyxwvutsrqponmlkji                  mean 276.9 ns  ( +- 20.38 ns  )
sort/zyxwvutsrqponmlkjih                 mean 285.3 ns  ( +- 12.04 ns  )
sort/zyxwvutsrqponmlkjihg                mean 342.3 ns  ( +- 17.22 ns  )
sort/zyxwvutsrqponmlkjihgf               mean 405.4 ns  ( +- 25.62 ns  )
sort/zyxwvutsrqponmlkjihgfe              mean 419.0 ns  ( +- 30.76 ns  )
sort/zyxwvutsrqponmlkjihgfed             mean 428.7 ns  ( +- 19.61 ns  )
sort/zyxwvutsrqponmlkjihgfedc            mean 491.1 ns  ( +- 22.98 ns  )
sort/zyxwvutsrqponmlkjihgfedcb           mean 552.5 ns  ( +- 27.65 ns  )

cbits/fpstring.c

Data/ByteString/Internal.hs

cbits/fpstring.c

include/fpstring.h

Data/ByteString/Internal.hs

vdukhovni · 2020-08-25T02:10:06Z

I am curious why the performance of this function is worthy of attention? What is the use-case for sorting the characters of short bytestrings?

Co-authored-by: Viktor Dukhovni <viktor1ghub@dukhovni.org>

Bodigrim · 2020-08-25T18:18:21Z

I actually struggle to find an example of sorting long strings, but for short ones a common use case is checking that two words are anagrams of each other:

areAnagrams xs ys = sort xs == sort ys

Words are usually short, and for the range of 5-10 symbols qsort is at least 3x faster than countsort.

vdukhovni · 2020-08-25T18:52:54Z

I actually struggle to find an example of sorting long strings, but for short ones a common use case is checking that two words are anagrams of each other:
areAnagrams xs ys = sort xs == sort ys
Words are usually short, and for the range of 5-10 symbols qsort is at least 3x faster than countsort.

Sure, when do we care about the performance of such checks?

sjakobi

👍 for the performance improvements. #123 indicated that at least one person cared about this.

I can't really comment on the implementation – it has been a long time since I last used C.

vdukhovni · 2020-08-25T19:37:56Z

👍 for the performance improvements. #123 indicated that at least one person cared about this.

I can't really comment on the implementation – it has been a long time since I last used C.

There's no meaningful C code here, just a call to qsort(). All my nitpicking comments are technicalities, the code is correct as proposed, but in theory it should be using slightly different C types in a few places, though in practice no platform I know of has actual issues with the code as-is.

vdukhovni · 2020-08-25T21:04:18Z

Data/ByteString/Internal.hs


 foreign import ccall unsafe "static fpstring.h fps_count" c_count
-    :: Ptr Word8 -> CULong -> Word8 -> IO CULong
+    :: Ptr Word8 -> CSize -> Word8 -> IO CULong


This should also return a CSize

Suggested change

:: Ptr Word8 -> CSize -> Word8 -> IO CULong

:: Ptr Word8 -> CSize -> Word8 -> IO CSize

vdukhovni · 2020-08-25T21:05:01Z

cbits/fpstring.c

@@ -73,7 +73,7 @@ unsigned char fps_minimum(unsigned char *p, unsigned long  len) {
 }

 /* count the number of occurences of a char in a string */
-unsigned long fps_count(unsigned char *p, unsigned long len, unsigned char w) {
+unsigned long fps_count(unsigned char *p, size_t len, unsigned char w) {


Suggested change

unsigned long fps_count(unsigned char *p, size_t len, unsigned char w) {

size_t fps_count(unsigned char *p, size_t len, unsigned char w) {

And also in the declartion of c below.

vdukhovni · 2020-08-25T21:06:18Z

include/fpstring.h

+void fps_intersperse(unsigned char *dest, unsigned char *from, size_t len, unsigned char c);
+unsigned char fps_maximum(unsigned char *p, size_t len);
+unsigned char fps_minimum(unsigned char *p, size_t len);
+unsigned long fps_count(unsigned char *p, size_t len, unsigned char w);


Suggested change

unsigned long fps_count(unsigned char *p, size_t len, unsigned char w);

size_t fps_count(unsigned char *p, size_t len, unsigned char w);

vdukhovni · 2020-08-25T21:09:04Z

Data/ByteString/Internal.hs

+        c_intersperse,          -- :: Ptr Word8 -> Ptr Word8 -> CSize -> Word8 -> IO ()
+        c_maximum,              -- :: Ptr Word8 -> CSize -> IO Word8
+        c_minimum,              -- :: Ptr Word8 -> CSize -> IO Word8
+        c_count,                -- :: Ptr Word8 -> CSize -> Word8 -> IO CInt


Suggested change

c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CInt

c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CSize

This could affect all sites that don't use fromIntegral, but that's probably already handled, because CInt isn't a priori any particular one of the normal Haskell integral types, so the fromIntegral should already be there.

Bodigrim · 2020-08-25T21:20:10Z

@vdukhovni thanks, very much appreciated. It's nice to have someone knowledgeable in C :)

Sure, when do we care about the performance of such checks?

I do not have any production application in mind. Honestly, I just do not like #123 lingering around any more.

vdukhovni

But for "bonus points" you could also update:

diff --git a/Data/ByteString/Internal.hs b/Data/ByteString/Internal.hs
index 3687603..dc98976 100644
--- a/Data/ByteString/Internal.hs
+++ b/Data/ByteString/Internal.hs
@@ -102,7 +101,7 @@ import Foreign.Storable         (Storable(..))
 
 #if MIN_VERSION_base(4,5,0) || __GLASGOW_HASKELL__ >= 703
-import Foreign.C.Types          (CInt(..), CSize(..), CULong(..))
+import Foreign.C.Types          (CInt(..), CSize(..))
 #else
-import Foreign.C.Types          (CInt, CSize, CULong)
+import Foreign.C.Types          (CInt, CSize)
 #endif
 

diff --git a/Data/ByteString/Short/Internal.hs b/Data/ByteString/Short/Internal.hs
index 3c945c7..79f1533 100644
--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -588,5 +588,5 @@ copyAddrToByteArray0 src dst len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_AddrToByteArray :: MutableByteArray# s -> CLong -> Addr# -> CLong -> CSize -> IO ()
+  memcpy_AddrToByteArray :: MutableByteArray# s -> CSize -> Addr# -> CSize -> CSize -> IO ()
 
 foreign import ccall unsafe "string.h memcpy"
@@ -609,5 +609,5 @@ copyByteArrayToAddr0 src dst len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_ByteArrayToAddr :: Addr# -> CLong -> ByteArray# -> CLong -> CSize -> IO ()
+  memcpy_ByteArrayToAddr :: Addr# -> CSize -> ByteArray# -> CSize -> CSize -> IO ()
 
 foreign import ccall unsafe "string.h memcpy"
@@ -636,6 +636,6 @@ copyByteArray# src src_off dst dst_off len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_ByteArray :: MutableByteArray# s -> CLong
-                   -> ByteArray# -> CLong -> CSize -> IO ()
+  memcpy_ByteArray :: MutableByteArray# s -> CSize
+                   -> ByteArray# -> CSize -> CSize -> IO ()
 #endif

diff --git a/cbits/fpstring.c b/cbits/fpstring.c
index 71dc750..165d296 100644
--- a/cbits/fpstring.c
+++ b/cbits/fpstring.c
@@ -85,6 +85,6 @@ size_t fps_count(unsigned char *p, size_t len, unsigned char w) {
    We cannot construct a pointer to the interior of an unpinned ByteArray#,
    except by doing an unsafe ffi call, and adjusting the pointer C-side. */
-void * fps_memcpy_offsets(void       *dst, unsigned long dst_off,
-                          const void *src, unsigned long src_off, size_t n) {
+void * fps_memcpy_offsets(void       *dst, size_t dst_off,
+                          const void *src, size_t src_off, size_t n) {
     return memcpy(dst + dst_off, src + src_off, n);
 }

vdukhovni · 2020-08-25T21:41:30Z

Mind you, some of that code goes away with my "drop 7.x" patch, which also drops #ifdef conditionals for 6.x and even 4.x in some cases. I see you're reluctant to apply the chop now, but perhaps you'll consider it at some point before the merge gets too difficult...

vdukhovni

LGTM. But the builds are now failing. There's some sort of clong function that produced explicit CLong for some of these, rather than just use fromIntegral.

Data/ByteString/Short/Internal.hs:576:38:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the second argument of `memcpy_AddrToByteArray', namely
      `(clong dst_off)'
    In the first argument of `unIO_', namely
      `(memcpy_AddrToByteArray dst (clong dst_off) src 0 (csize len))'
Data/ByteString/Short/Internal.hs:597:44:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the fourth argument of `memcpy_ByteArrayToAddr', namely
      `(clong src_off)'
    In the first argument of `unIO_', namely
      `(memcpy_ByteArrayToAddr dst 0 src (clong src_off) (csize len))'
Data/ByteString/Short/Internal.hs:632:30:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the second argument of `memcpy_ByteArray', namely
      `(clong dst_off)'
    In the first argument of `unsafeIOToST', namely
      `(memcpy_ByteArray
          dst (clong dst_off) src (clong src_off) (csize len))'

vdukhovni · 2020-08-25T22:18:19Z

So also need:

diff --git a/Data/ByteString/Short/Internal.hs b/Data/ByteString/Short/Internal.hs
index 54f77d9..47525fd 100644
--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -575,3 +575,3 @@ copyByteArrayToAddr# = GHC.Exts.copyByteArrayToAddr#
 copyAddrToByteArray# src dst dst_off len s =
-  unIO_ (memcpy_AddrToByteArray dst (clong dst_off) src 0 (csize len)) s
+  unIO_ (memcpy_AddrToByteArray dst (csize dst_off) src 0 (csize len)) s
 
@@ -596,3 +596,3 @@ foreign import ccall unsafe "string.h memcpy"
 copyByteArrayToAddr# src src_off dst len s =
-  unIO_ (memcpy_ByteArrayToAddr dst 0 src (clong src_off) (csize len)) s
+  unIO_ (memcpy_ByteArrayToAddr dst 0 src (csize src_off) (csize len)) s
 
@@ -619,4 +619,4 @@ unIO_ io s = case unIO io s of (# s, _ #) -> s
 
-clong :: Int# -> CLong
-clong i# = fromIntegral (I# i#)
+csize :: Int# -> CSize
+csize i# = fromIntegral (I# i#)
 
@@ -631,3 +631,3 @@ copyByteArray# src src_off dst dst_off len s =
     unST_ (unsafeIOToST
-      (memcpy_ByteArray dst (clong dst_off) src (clong src_off) (csize len))) s
+      (memcpy_ByteArray dst (csize dst_off) src (csize src_off) (csize len))) s
   where

vdukhovni

Looks good. By the way, this assumes that the 'source_offset' used with the bytearray copy functions is always non-negative. I think that's a correct assumption, and the function I think is always in fact used with an offset of 0. If we really wanted negative offsets, we'd need an ssize_t and Haskell equivalent, but I do not think this is needed. Instead we could perhaps "assert >=0" in the input to "csize", and run an unoptimized build to test.

Bodigrim · 2020-08-25T23:31:49Z

@vdukhovni thanks for thorough review!

Given that csize is going to be removed in #265, I think we can leave it in peace for now.

vdukhovni · 2020-08-27T07:33:30Z

P.S. We seem to have neglected one of the changes I suggested, but failed to notice did not get included:

diff --git a/cbits/fpstring.c b/cbits/fpstring.c
index 3c739cd..f82bc00 100644
--- a/cbits/fpstring.c
+++ b/cbits/fpstring.c
@@ -74,7 +74,7 @@ unsigned char fps_minimum(unsigned char *p, size_t len) {
 
 /* count the number of occurences of a char in a string */
 size_t fps_count(unsigned char *p, size_t len, unsigned char w) {
-    unsigned long c;
+    size_t c;
     for (c = 0; len-- != 0; ++p)
         if (*p == w)
             ++c;

Not critical, but worth doing, just to be consistent all around.

Bodigrim added 3 commits August 24, 2020 23:09

Add benchmarks for sorting

4b42384

Write c_sort, which is a binding to qsort

b333cf1

Use qsort to sort tiny bytestrings

c5c5648

Bodigrim requested a review from sjakobi August 24, 2020 22:27

Bodigrim added the performance label Aug 24, 2020

vdukhovni reviewed Aug 25, 2020

View reviewed changes

cbits/fpstring.c Outdated Show resolved Hide resolved

Data/ByteString/Internal.hs Outdated Show resolved Hide resolved

cbits/fpstring.c Outdated Show resolved Hide resolved

include/fpstring.h Outdated Show resolved Hide resolved

Data/ByteString/Internal.hs Outdated Show resolved Hide resolved

Update cbits/fpstring.c

9ade2b0

Co-authored-by: Viktor Dukhovni <viktor1ghub@dukhovni.org>

sjakobi approved these changes Aug 25, 2020

View reviewed changes

Change unsigned long to size_t

ab14f68

vdukhovni reviewed Aug 25, 2020

View reviewed changes

Change unsigned long to size_t in return type

316f1a3

vdukhovni approved these changes Aug 25, 2020

View reviewed changes

Bonus points

f3e4d44

vdukhovni approved these changes Aug 25, 2020

View reviewed changes

Remove clong

81c94b2

vdukhovni approved these changes Aug 25, 2020

View reviewed changes

Bodigrim merged commit fc9409e into haskell:master Aug 25, 2020

Bodigrim deleted the sort branch August 25, 2020 23:52

Bodigrim mentioned this pull request Aug 25, 2020

Use insertion sort to sort short ByteString #123

Closed

Bodigrim added this to the 0.11.0.0 milestone Aug 25, 2020

Bodigrim mentioned this pull request Aug 27, 2020

Replace unsigned long c with size_t c in fps_count #271

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use qsort to sort short ByteString #267

Use qsort to sort short ByteString #267

Bodigrim commented Aug 24, 2020

vdukhovni commented Aug 25, 2020

Bodigrim commented Aug 25, 2020

vdukhovni commented Aug 25, 2020

sjakobi left a comment

vdukhovni commented Aug 25, 2020

vdukhovni Aug 25, 2020

vdukhovni Aug 25, 2020

vdukhovni Aug 25, 2020

vdukhovni Aug 25, 2020

Bodigrim commented Aug 25, 2020

vdukhovni left a comment

vdukhovni commented Aug 25, 2020

vdukhovni left a comment •

edited

Loading

vdukhovni commented Aug 25, 2020

vdukhovni left a comment

Bodigrim commented Aug 25, 2020

vdukhovni commented Aug 27, 2020

	:: Ptr Word8 -> CSize -> Word8 -> IO CULong
	:: Ptr Word8 -> CSize -> Word8 -> IO CSize

	unsigned long fps_count(unsigned char *p, size_t len, unsigned char w) {
	size_t fps_count(unsigned char *p, size_t len, unsigned char w) {

	c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CInt
	c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CSize

Use qsort to sort short ByteString #267

Use qsort to sort short ByteString #267

Conversation

Bodigrim commented Aug 24, 2020

vdukhovni commented Aug 25, 2020

Bodigrim commented Aug 25, 2020

vdukhovni commented Aug 25, 2020

sjakobi left a comment

Choose a reason for hiding this comment

vdukhovni commented Aug 25, 2020

vdukhovni Aug 25, 2020

Choose a reason for hiding this comment

vdukhovni Aug 25, 2020

Choose a reason for hiding this comment

vdukhovni Aug 25, 2020

Choose a reason for hiding this comment

vdukhovni Aug 25, 2020

Choose a reason for hiding this comment

Bodigrim commented Aug 25, 2020

vdukhovni left a comment

Choose a reason for hiding this comment

vdukhovni commented Aug 25, 2020

vdukhovni left a comment • edited Loading

Choose a reason for hiding this comment

vdukhovni commented Aug 25, 2020

vdukhovni left a comment

Choose a reason for hiding this comment

Bodigrim commented Aug 25, 2020

vdukhovni commented Aug 27, 2020

vdukhovni left a comment •

edited

Loading