Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide unconsN/unsnocN? #524

Open
hasufell opened this issue Jul 3, 2022 · 7 comments · May be fixed by #525
Open

Provide unconsN/unsnocN? #524

hasufell opened this issue Jul 3, 2022 · 7 comments · May be fixed by #525

Comments

@hasufell
Copy link
Member

hasufell commented Jul 3, 2022

In an attempt to improve performance of filepath functions using ShortByteString I figured that unpack slowed down a couple of functions. Moving to several calls of uncons seemed to improve performance. In particular:

 readDriveUNC :: FILEPATH -> Maybe (FILEPATH, FILEPATH)
-readDriveUNC bs = case unpack bs of
-  (s1:s2:q:s3:xs)
-    | q == _question && L.all isPathSeparator [s1,s2,s3] ->
-      case L.map toUpper xs of
-          (u:n:c:s4:_)
-            | u == _U && n == _N && c == _C && isPathSeparator s4 ->
-              let (a,b) = readDriveShareName (pack (L.drop 4 xs))
-              in Just (pack (s1:s2:_question:s3:L.take 4 xs) <> a, b)
-          _ -> case readDriveLetter (pack xs) of
-                   -- Extended-length path.
-                   Just (a,b) -> Just (pack [s1,s2,_question,s3] <> a, b)
-                   Nothing -> Nothing
-  _ -> Nothing
+readDriveUNC bs
+  | Just (s1, r1) <- uncons bs
+  , Just (s2, r2) <- uncons r1
+  , Just (q,  r3) <- uncons r2
+  , Just (s3, xs) <- uncons r3
+  , q == _question
+  , L.all isPathSeparator [s1,s2,s3] =
+      if | Just (toUpper -> u, k1) <- uncons xs
+         , Just (toUpper -> n, k2) <- uncons k1
+         , Just (toUpper -> c, k3) <- uncons k2
+         , Just (s4,           rr) <- uncons k3
+         , u == _U
+         , n == _N
+         , c == _C
+         , isPathSeparator s4 ->
+              let (a,b) = readDriveShareName rr
+              in Just (pack [s1,s2,_question,s3,u,n,c,s4] <> a, b)
+         | otherwise -> case readDriveLetter xs of
+                          -- Extended-length path.
+                          Just (a,b) -> Just (pack [s1,s2,_question,s3] <> a, b)
+                          Nothing -> Nothing
+  | otherwise = Nothing
 

https://gitlab.haskell.org/haskell/filepath/-/merge_requests/116/diffs

The 3 consecutive calls to uncons are not only awkward, but also incur 3 copies for the tail.

So I'm wondering if a function like this might be useful (at least for ShortByteString):

unconsN :: Int -> ShortByteString -> Maybe ([Word8], ShortByteString)

The obvious disadvantage here is that you'll get partial pattern matching on the Word list, because we don't have dependent types.

Providing uncons2, uncons3 and using a tuple instead might be an alternative, but less general.

The other way would be to figure out why unpack is so slow. Afaiu it's only semi lazy, e.g. unpacks the first 100 bytes strictly.

@hasufell
Copy link
Member Author

hasufell commented Jul 3, 2022

I'll see if I can provide a minimal benchmark for this.

The filepath function using those, went from:

    splitDrive (windows): 
      8.27 μs ± 299 ns

to:

    splitDrive (windows): 
      867  ns ±  21 ns

@hasufell
Copy link
Member Author

hasufell commented Jul 3, 2022

hasufell@ddf9180

All
  ShortByteString
    ShortByteString unpack/uncons comparison
      unpack and look at first 5 elements: OK (2.21s)
        15.8 ms ± 665 μs
      uncons consecutively 5 times:        OK (0.93s)
        393  μs ±  38 μs
      unconsN 5:                           OK (0.16s)
        73.2 ns ± 6.6 ns

All 3 tests passed (3.30s)

So:

  1. unpack is the slowest
  2. uncons consecutively is at least twice as fast for n = 5
  3. unconsN is the fastest

Implementation of unconsN is in the link.

hasufell added a commit to hasufell/bytestring that referenced this issue Jul 3, 2022
@hasufell hasufell linked a pull request Jul 3, 2022 that will close this issue
@sjakobi
Copy link
Member

sjakobi commented Jul 3, 2022

I think it would be good to figure out the performance problem with unpack first.

@sjakobi
Copy link
Member

sjakobi commented Jul 3, 2022

In particular I wonder whether the actual performance problem might lie with your use of pack. Instead, I think it might be better to use explicitly use ByteString.Short.drop on the original ShortByteString.

@hasufell
Copy link
Member Author

hasufell commented Jul 3, 2022

In particular I wonder whether the actual performance problem might lie with your use of pack.

See the PR. It's not about pack: https://github.com/haskell/bytestring/pull/525/files#diff-c29f395d853c89b91b13dca506d85e777afb0a0343d817021f642f10798fb1a8R234

@hasufell
Copy link
Member Author

hasufell commented Jul 3, 2022

Instead, I think it might be better to use explicitly use ByteString.Short.drop on the original ShortByteString.

I don't think so, because drop gives you no guarantees about the length of the input bytestring. So you end up re-inventing unconsN with a combination of length, drop and unpack, which is rather fragile.

@hasufell
Copy link
Member Author

hasufell commented Jul 3, 2022

I think it would be good to figure out the performance problem with unpack first.

The current implementation is:

unpackBytes :: ShortByteString -> [Word8]
unpackBytes sbs = unpackAppendBytesLazy sbs []

unpackAppendBytesLazy :: ShortByteString -> [Word8] -> [Word8]
unpackAppendBytesLazy sbs = go 0 (length sbs)
  where
    sz = 100

    go off len ws
      | len <= sz = unpackAppendBytesStrict sbs off len ws
      | otherwise = unpackAppendBytesStrict sbs off sz  remainder
                      where remainder = go (off+sz) (len-sz) ws

unpackAppendBytesStrict :: ShortByteString -> Int -> Int -> [Word8] -> [Word8]
unpackAppendBytesStrict !sbs off len = go (off-1) (off-1 + len)
  where
    go !sentinal !i !acc
      | i == sentinal = acc
      | otherwise     = let !w = indexWord8Array (asBA sbs) i
                         in go sentinal (i-1) (w:acc)

which I don't understand at all... in fact.

I changed it to

unpackBytes :: ShortByteString -> [Word8]
unpackBytes sbs = let ix = length sbs - 1
                  in List.map (unsafeIndex sbs) [0..ix]

and that seemed to speed it up considerably:

All
  ShortByteString
    ShortByteString unpack/uncons comparison
      unpack and look at first 5 elements: OK (0.29s)
        62.2 ns ± 3.5 ns
      uncons consecutively 5 times:        OK (0.96s)
        412  μs ±  35 μs
      unconsN 5:                           OK (0.17s)
        77.1 ns ± 6.0 ns

but I'm not sure if there's different memory behavior and if that does something to inlining and list fusion.

hasufell added a commit to hasufell/bytestring that referenced this issue Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants