Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use qsort to sort short ByteString #267

Merged
merged 8 commits into from
Aug 25, 2020
Merged

Use qsort to sort short ByteString #267

merged 8 commits into from
Aug 25, 2020

Conversation

Bodigrim
Copy link
Contributor

This is similar in spirit to #123, but includes benchmarks to reproduce results. I also decided to use qsort from stdlib instead of implementing insertion sort manually, just to minimise changes.
(I have no idea about C, suggestions would be highly appreciated, CC @vdukhovni)

countsort

sort/zyxwvutsrq                          mean 299.7 ns  ( +- 19.84 ns  )
sort/zyxwvutsrqp                         mean 320.1 ns  ( +- 23.39 ns  )
sort/zyxwvutsrqpo                        mean 302.7 ns  ( +- 19.03 ns  )
sort/zyxwvutsrqpon                       mean 323.6 ns  ( +- 27.07 ns  )
sort/zyxwvutsrqponm                      mean 336.2 ns  ( +- 39.73 ns  )
sort/zyxwvutsrqponml                     mean 324.9 ns  ( +- 17.24 ns  )
sort/zyxwvutsrqponmlk                    mean 400.2 ns  ( +- 47.64 ns  )
sort/zyxwvutsrqponmlkj                   mean 377.8 ns  ( +- 49.46 ns  )
sort/zyxwvutsrqponmlkji                  mean 390.3 ns  ( +- 49.52 ns  )
sort/zyxwvutsrqponmlkjih                 mean 431.4 ns  ( +- 88.87 ns  )
sort/zyxwvutsrqponmlkjihg                mean 461.8 ns  ( +- 84.46 ns  )
sort/zyxwvutsrqponmlkjihgf               mean 382.9 ns  ( +- 59.26 ns  )
sort/zyxwvutsrqponmlkjihgfe              mean 363.3 ns  ( +- 30.18 ns  )
sort/zyxwvutsrqponmlkjihgfed             mean 370.6 ns  ( +- 48.68 ns  )
sort/zyxwvutsrqponmlkjihgfedc            mean 356.5 ns  ( +- 19.49 ns  )
sort/zyxwvutsrqponmlkjihgfedcb           mean 365.8 ns  ( +- 23.33 ns  )

qsort

sort/zyxwvutsrq                          mean 114.1 ns  ( +- 4.661 ns  )
sort/zyxwvutsrqp                         mean 122.7 ns  ( +- 4.730 ns  )
sort/zyxwvutsrqpo                        mean 136.0 ns  ( +- 5.738 ns  )
sort/zyxwvutsrqpon                       mean 147.4 ns  ( +- 10.86 ns  )
sort/zyxwvutsrqponm                      mean 158.3 ns  ( +- 8.222 ns  )
sort/zyxwvutsrqponml                     mean 168.8 ns  ( +- 13.26 ns  )
sort/zyxwvutsrqponmlk                    mean 216.3 ns  ( +- 11.70 ns  )
sort/zyxwvutsrqponmlkj                   mean 256.4 ns  ( +- 15.16 ns  )
sort/zyxwvutsrqponmlkji                  mean 276.9 ns  ( +- 20.38 ns  )
sort/zyxwvutsrqponmlkjih                 mean 285.3 ns  ( +- 12.04 ns  )
sort/zyxwvutsrqponmlkjihg                mean 342.3 ns  ( +- 17.22 ns  )
sort/zyxwvutsrqponmlkjihgf               mean 405.4 ns  ( +- 25.62 ns  )
sort/zyxwvutsrqponmlkjihgfe              mean 419.0 ns  ( +- 30.76 ns  )
sort/zyxwvutsrqponmlkjihgfed             mean 428.7 ns  ( +- 19.61 ns  )
sort/zyxwvutsrqponmlkjihgfedc            mean 491.1 ns  ( +- 22.98 ns  )
sort/zyxwvutsrqponmlkjihgfedcb           mean 552.5 ns  ( +- 27.65 ns  )

cbits/fpstring.c Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
cbits/fpstring.c Outdated Show resolved Hide resolved
include/fpstring.h Outdated Show resolved Hide resolved
Data/ByteString/Internal.hs Outdated Show resolved Hide resolved
@vdukhovni
Copy link
Contributor

I am curious why the performance of this function is worthy of attention? What is the use-case for sorting the characters of short bytestrings?

Co-authored-by: Viktor Dukhovni <viktor1ghub@dukhovni.org>
@Bodigrim
Copy link
Contributor Author

I actually struggle to find an example of sorting long strings, but for short ones a common use case is checking that two words are anagrams of each other:

areAnagrams xs ys = sort xs == sort ys

Words are usually short, and for the range of 5-10 symbols qsort is at least 3x faster than countsort.

@vdukhovni
Copy link
Contributor

I actually struggle to find an example of sorting long strings, but for short ones a common use case is checking that two words are anagrams of each other:

areAnagrams xs ys = sort xs == sort ys

Words are usually short, and for the range of 5-10 symbols qsort is at least 3x faster than countsort.

Sure, when do we care about the performance of such checks?

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the performance improvements. #123 indicated that at least one person cared about this.

I can't really comment on the implementation – it has been a long time since I last used C.

@vdukhovni
Copy link
Contributor

👍 for the performance improvements. #123 indicated that at least one person cared about this.

I can't really comment on the implementation – it has been a long time since I last used C.

There's no meaningful C code here, just a call to qsort(). All my nitpicking comments are technicalities, the code is correct as proposed, but in theory it should be using slightly different C types in a few places, though in practice no platform I know of has actual issues with the code as-is.


foreign import ccall unsafe "static fpstring.h fps_count" c_count
:: Ptr Word8 -> CULong -> Word8 -> IO CULong
:: Ptr Word8 -> CSize -> Word8 -> IO CULong
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also return a CSize

Suggested change
:: Ptr Word8 -> CSize -> Word8 -> IO CULong
:: Ptr Word8 -> CSize -> Word8 -> IO CSize

cbits/fpstring.c Outdated
@@ -73,7 +73,7 @@ unsigned char fps_minimum(unsigned char *p, unsigned long len) {
}

/* count the number of occurences of a char in a string */
unsigned long fps_count(unsigned char *p, unsigned long len, unsigned char w) {
unsigned long fps_count(unsigned char *p, size_t len, unsigned char w) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned long fps_count(unsigned char *p, size_t len, unsigned char w) {
size_t fps_count(unsigned char *p, size_t len, unsigned char w) {

And also in the declartion of c below.

void fps_intersperse(unsigned char *dest, unsigned char *from, size_t len, unsigned char c);
unsigned char fps_maximum(unsigned char *p, size_t len);
unsigned char fps_minimum(unsigned char *p, size_t len);
unsigned long fps_count(unsigned char *p, size_t len, unsigned char w);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned long fps_count(unsigned char *p, size_t len, unsigned char w);
size_t fps_count(unsigned char *p, size_t len, unsigned char w);

c_intersperse, -- :: Ptr Word8 -> Ptr Word8 -> CSize -> Word8 -> IO ()
c_maximum, -- :: Ptr Word8 -> CSize -> IO Word8
c_minimum, -- :: Ptr Word8 -> CSize -> IO Word8
c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CInt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CInt
c_count, -- :: Ptr Word8 -> CSize -> Word8 -> IO CSize

This could affect all sites that don't use fromIntegral, but that's probably already handled, because CInt isn't a priori any particular one of the normal Haskell integral types, so the fromIntegral should already be there.

@Bodigrim
Copy link
Contributor Author

@vdukhovni thanks, very much appreciated. It's nice to have someone knowledgeable in C :)

Sure, when do we care about the performance of such checks?

I do not have any production application in mind. Honestly, I just do not like #123 lingering around any more.

Copy link
Contributor

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for "bonus points" you could also update:

diff --git a/Data/ByteString/Internal.hs b/Data/ByteString/Internal.hs
index 3687603..dc98976 100644
--- a/Data/ByteString/Internal.hs
+++ b/Data/ByteString/Internal.hs
@@ -102,7 +101,7 @@ import Foreign.Storable         (Storable(..))
 
 #if MIN_VERSION_base(4,5,0) || __GLASGOW_HASKELL__ >= 703
-import Foreign.C.Types          (CInt(..), CSize(..), CULong(..))
+import Foreign.C.Types          (CInt(..), CSize(..))
 #else
-import Foreign.C.Types          (CInt, CSize, CULong)
+import Foreign.C.Types          (CInt, CSize)
 #endif
 

diff --git a/Data/ByteString/Short/Internal.hs b/Data/ByteString/Short/Internal.hs
index 3c945c7..79f1533 100644
--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -588,5 +588,5 @@ copyAddrToByteArray0 src dst len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_AddrToByteArray :: MutableByteArray# s -> CLong -> Addr# -> CLong -> CSize -> IO ()
+  memcpy_AddrToByteArray :: MutableByteArray# s -> CSize -> Addr# -> CSize -> CSize -> IO ()
 
 foreign import ccall unsafe "string.h memcpy"
@@ -609,5 +609,5 @@ copyByteArrayToAddr0 src dst len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_ByteArrayToAddr :: Addr# -> CLong -> ByteArray# -> CLong -> CSize -> IO ()
+  memcpy_ByteArrayToAddr :: Addr# -> CSize -> ByteArray# -> CSize -> CSize -> IO ()
 
 foreign import ccall unsafe "string.h memcpy"
@@ -636,6 +636,6 @@ copyByteArray# src src_off dst dst_off len s =
 
 foreign import ccall unsafe "fpstring.h fps_memcpy_offsets"
-  memcpy_ByteArray :: MutableByteArray# s -> CLong
-                   -> ByteArray# -> CLong -> CSize -> IO ()
+  memcpy_ByteArray :: MutableByteArray# s -> CSize
+                   -> ByteArray# -> CSize -> CSize -> IO ()
 #endif
 
diff --git a/cbits/fpstring.c b/cbits/fpstring.c
index 71dc750..165d296 100644
--- a/cbits/fpstring.c
+++ b/cbits/fpstring.c
@@ -85,6 +85,6 @@ size_t fps_count(unsigned char *p, size_t len, unsigned char w) {
    We cannot construct a pointer to the interior of an unpinned ByteArray#,
    except by doing an unsafe ffi call, and adjusting the pointer C-side. */
-void * fps_memcpy_offsets(void       *dst, unsigned long dst_off,
-                          const void *src, unsigned long src_off, size_t n) {
+void * fps_memcpy_offsets(void       *dst, size_t dst_off,
+                          const void *src, size_t src_off, size_t n) {
     return memcpy(dst + dst_off, src + src_off, n);
 }

@vdukhovni
Copy link
Contributor

Mind you, some of that code goes away with my "drop 7.x" patch, which also drops #ifdef conditionals for 6.x and even 4.x in some cases. I see you're reluctant to apply the chop now, but perhaps you'll consider it at some point before the merge gets too difficult...

Copy link
Contributor

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But the builds are now failing. There's some sort of clong function that produced explicit CLong for some of these, rather than just use fromIntegral.

Data/ByteString/Short/Internal.hs:576:38:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the second argument of `memcpy_AddrToByteArray', namely
      `(clong dst_off)'
    In the first argument of `unIO_', namely
      `(memcpy_AddrToByteArray dst (clong dst_off) src 0 (csize len))'
Data/ByteString/Short/Internal.hs:597:44:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the fourth argument of `memcpy_ByteArrayToAddr', namely
      `(clong src_off)'
    In the first argument of `unIO_', namely
      `(memcpy_ByteArrayToAddr dst 0 src (clong src_off) (csize len))'
Data/ByteString/Short/Internal.hs:632:30:
    Couldn't match expected type `CSize' with actual type `CLong'
    In the return type of a call of `clong'
    In the second argument of `memcpy_ByteArray', namely
      `(clong dst_off)'
    In the first argument of `unsafeIOToST', namely
      `(memcpy_ByteArray
          dst (clong dst_off) src (clong src_off) (csize len))'

@vdukhovni
Copy link
Contributor

So also need:

diff --git a/Data/ByteString/Short/Internal.hs b/Data/ByteString/Short/Internal.hs
index 54f77d9..47525fd 100644
--- a/Data/ByteString/Short/Internal.hs
+++ b/Data/ByteString/Short/Internal.hs
@@ -575,3 +575,3 @@ copyByteArrayToAddr# = GHC.Exts.copyByteArrayToAddr#
 copyAddrToByteArray# src dst dst_off len s =
-  unIO_ (memcpy_AddrToByteArray dst (clong dst_off) src 0 (csize len)) s
+  unIO_ (memcpy_AddrToByteArray dst (csize dst_off) src 0 (csize len)) s
 
@@ -596,3 +596,3 @@ foreign import ccall unsafe "string.h memcpy"
 copyByteArrayToAddr# src src_off dst len s =
-  unIO_ (memcpy_ByteArrayToAddr dst 0 src (clong src_off) (csize len)) s
+  unIO_ (memcpy_ByteArrayToAddr dst 0 src (csize src_off) (csize len)) s
 
@@ -619,4 +619,4 @@ unIO_ io s = case unIO io s of (# s, _ #) -> s
 
-clong :: Int# -> CLong
-clong i# = fromIntegral (I# i#)
+csize :: Int# -> CSize
+csize i# = fromIntegral (I# i#)
 
@@ -631,3 +631,3 @@ copyByteArray# src src_off dst dst_off len s =
     unST_ (unsafeIOToST
-      (memcpy_ByteArray dst (clong dst_off) src (clong src_off) (csize len))) s
+      (memcpy_ByteArray dst (csize dst_off) src (csize src_off) (csize len))) s
   where

Copy link
Contributor

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. By the way, this assumes that the 'source_offset' used with the bytearray copy functions is always non-negative. I think that's a correct assumption, and the function I think is always in fact used with an offset of 0. If we really wanted negative offsets, we'd need an ssize_t and Haskell equivalent, but I do not think this is needed. Instead we could perhaps "assert >=0" in the input to "csize", and run an unoptimized build to test.

@Bodigrim
Copy link
Contributor Author

@vdukhovni thanks for thorough review!

Given that csize is going to be removed in #265, I think we can leave it in peace for now.

@Bodigrim Bodigrim merged commit fc9409e into haskell:master Aug 25, 2020
@Bodigrim Bodigrim deleted the sort branch August 25, 2020 23:52
@Bodigrim Bodigrim added this to the 0.11.0.0 milestone Aug 25, 2020
@vdukhovni
Copy link
Contributor

P.S. We seem to have neglected one of the changes I suggested, but failed to notice did not get included:

diff --git a/cbits/fpstring.c b/cbits/fpstring.c
index 3c739cd..f82bc00 100644
--- a/cbits/fpstring.c
+++ b/cbits/fpstring.c
@@ -74,7 +74,7 @@ unsigned char fps_minimum(unsigned char *p, size_t len) {
 
 /* count the number of occurences of a char in a string */
 size_t fps_count(unsigned char *p, size_t len, unsigned char w) {
-    unsigned long c;
+    size_t c;
     for (c = 0; len-- != 0; ++p)
         if (*p == w)
             ++c;

Not critical, but worth doing, just to be consistent all around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants