-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize infix matching for partial application #137
Conversation
When partially applied, findSubstring, findSubstrings, and isInfixOf should only do the breakSubstring preliminary work once.
Data/ByteString.hs
Outdated
where | ||
(a, b) = breakSubstring pat (unsafeDrop n src) | ||
lp = length pat | ||
ls = length src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally make bindings like these strict to ensure that the compiler doesn't attempt to inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with the principle.
I'd prefer to adjust the style somewhat to make the memoising clearer (imho) and add comments, but I'm not going to insist on the style change, though I would like the comment (though changing the style would reduce the need for calling it out in a comment since it'd already be more obvious).
Data/ByteString.hs
Outdated
| null pat && null src = Just 0 | ||
| null b = Nothing | ||
| otherwise = Just (length a) | ||
where (a, b) = breakPat src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not desperately fond of this f x = go
pattern when the go
body is not recursive, it's just the normal body.
I think I'd prefer let
and lambda.
The more important thing here is to comment that the breakPat = breakSubstring pat
bit is there to memoise the work that can be done on the pattern independent of the src.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can use let
and lambda, but without MultiWayIf
, I can't see how to avoid ending up with another name somewhere. Should I add MultiWayIf
to the language pragmas?
Data/ByteString.hs
Outdated
| (n > ls - lp) || null b = [] | ||
| otherwise = let k = n + length a | ||
in k : search (k + lp) | ||
breakPat = breakSubstring pat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we should comment here why we do it like this, since it's important and not immediately obvious from this definition (though it is of course if you look at the docs for breakSubstring
).
Data/ByteString.hs
Outdated
breakPat = breakSubstring pat | ||
go src | ||
| null pat = [0 .. ls] | ||
| otherwise = search 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think I'd advocate a style more like
findSubstrings pat =
let breakPat = breakSubstring pat
in \src -> ...
to 1. avoid having to give names like go
to what are just non-recusrive function bodies, and 2. to make it blindly obvious that we're memoising something outside of a lambda. In the nested where style one has to look closely at the scoping to make sense of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, that's fine, but it requires MultiWayIf
Hey sorry, I totally forgot about this, but I'll make the change now. See my response |
I copied the note on |
95b8ec2
to
7f9bf16
Compare
Apparently MultiWayIf is an "unsupported extension"? |
@dcoutts Hi just checking in again. Any advice? |
@davidspies yes, using MultiWayIf would exclude older GHCs which don't support it; since MultiWayIf is just syntactic sugar I'd suggest avoiding it. |
@@ -1,6 +1,7 @@ | |||
{-# LANGUAGE CPP #-} | |||
{-# LANGUAGE MagicHash, UnboxedTuples, | |||
NamedFieldPuns, BangPatterns #-} | |||
NamedFieldPuns, BangPatterns, | |||
MultiWayIf #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid the non-criticial use of MultiWayIf
you can use the Haskell2010 compliant pattern below if you need more than 2 branches
case () of
_ | cond1 -> ...
| cond2 -> ...
| otherwise -> ...
Closing, because |
When partially applied, findSubstring, findSubstrings, and isInfixOf should only
do the breakSubstring preliminary work once.