Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize infix matching for partial application #137

Closed
wants to merge 3 commits into from

Conversation

davidspies
Copy link

When partially applied, findSubstring, findSubstrings, and isInfixOf should only
do the breakSubstring preliminary work once.

When partially applied, findSubstring, findSubstrings, and isInfixOf should only
do the breakSubstring preliminary work once.
where
(a, b) = breakSubstring pat (unsafeDrop n src)
lp = length pat
ls = length src
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally make bindings like these strict to ensure that the compiler doesn't attempt to inline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with the principle.

I'd prefer to adjust the style somewhat to make the memoising clearer (imho) and add comments, but I'm not going to insist on the style change, though I would like the comment (though changing the style would reduce the need for calling it out in a comment since it'd already be more obvious).

| null pat && null src = Just 0
| null b = Nothing
| otherwise = Just (length a)
where (a, b) = breakPat src
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not desperately fond of this f x = go pattern when the go body is not recursive, it's just the normal body.

I think I'd prefer let and lambda.

The more important thing here is to comment that the breakPat = breakSubstring pat bit is there to memoise the work that can be done on the pattern independent of the src.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use let and lambda, but without MultiWayIf, I can't see how to avoid ending up with another name somewhere. Should I add MultiWayIf to the language pragmas?

| (n > ls - lp) || null b = []
| otherwise = let k = n + length a
in k : search (k + lp)
breakPat = breakSubstring pat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we should comment here why we do it like this, since it's important and not immediately obvious from this definition (though it is of course if you look at the docs for breakSubstring).

breakPat = breakSubstring pat
go src
| null pat = [0 .. ls]
| otherwise = search 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think I'd advocate a style more like

findSubstrings pat =
    let breakPat = breakSubstring pat
     in \src -> ...

to 1. avoid having to give names like go to what are just non-recusrive function bodies, and 2. to make it blindly obvious that we're memoising something outside of a lambda. In the nested where style one has to look closely at the scoping to make sense of it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, that's fine, but it requires MultiWayIf

@davidspies
Copy link
Author

Hey sorry, I totally forgot about this, but I'll make the change now. See my response

@davidspies
Copy link
Author

I copied the note on breakSubstring

@davidspies
Copy link
Author

Apparently MultiWayIf is an "unsupported extension"?

@davidspies
Copy link
Author

@dcoutts Hi just checking in again. Any advice?

@davidspies davidspies closed this Aug 17, 2018
@davidspies davidspies reopened this Aug 17, 2018
@hvr
Copy link
Member

hvr commented Aug 18, 2018

@davidspies yes, using MultiWayIf would exclude older GHCs which don't support it; since MultiWayIf is just syntactic sugar I'd suggest avoiding it.

@@ -1,6 +1,7 @@
{-# LANGUAGE CPP #-}
{-# LANGUAGE MagicHash, UnboxedTuples,
NamedFieldPuns, BangPatterns #-}
NamedFieldPuns, BangPatterns,
MultiWayIf #-}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid the non-criticial use of MultiWayIf

you can use the Haskell2010 compliant pattern below if you need more than 2 branches

case () of
   _ | cond1 -> ...
      | cond2 -> ...
      | otherwise -> ...

@Bodigrim Bodigrim added the blocked: patch-needed somebody needs to write a patch or contribute code label May 11, 2020
@Bodigrim
Copy link
Contributor

Closing, because findSubstring and findSubstrings were removed in #181. @davidspies thanks for your efforts anyways!

@Bodigrim Bodigrim closed this Aug 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked: patch-needed somebody needs to write a patch or contribute code performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants