Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Depth-First Search not really Depth-First ? #59

Closed
Raveline opened this issue Apr 18, 2017 · 3 comments
Closed

Depth-First Search not really Depth-First ? #59

Raveline opened this issue Apr 18, 2017 · 3 comments

Comments

@Raveline
Copy link
Contributor

Thanks for this great, easy-to-use, smartly thought out, library.

According to comments, the selectNodes function is a DFS implementation. I'm not completely sure (manual recursion is not my forte), but I think it's not.

Example case:

$ stack ghci
> :set -XOverloadedString
> let xml = "<html><div><p>p1</p><p>p2</p><blockquote><p>p3</p></blockquote><p>p4</p></div></html>"
> scrapeStringLike xml (texts $ "div" // "p")
Just ["p1","p2","p4","p3"]

But this XML:

<html>
  <div>
    <p>p1</p>
    <p>p2</p>
    <blockquote><p>p3</p></blockquote>
    <p>p4</p>
  </div>
</html>

Should, in DFS, be processed through in the order given by the letter in parenthesis in the example below:

Span 0 17 (A)
|
`- Span 1 16 (B)
   |
   +- Span 2 4 (C)
   |
   +- Span 5 7 (D)
   |
   +- Span 8 12 (E)
   |  |
   |  `- Span 9 11 (F)
   |
   `- Span 13 15 (G)

So the selector should yield Just ["p1","p2","p3","p4"].

Unless the previous result was the intended behaviour ? In which case I think comments and README should warn about this.

Suggested fix

In this pattern (https://github.com/fimad/scalpel/blob/master/scalpel-core/src/Text/HTML/Scalpel/Internal/Select.hs#L222), we should not have:

| otherwise = selectNodes [n] (tags, fs, ctx) $ selectNodes [n] (tags, Tree.subForest f, ctx) acc

Or we won't get depth-first; we should rather have:

| otherwise  = selectNodes [n] (tags, Tree.subForest f, ctx) $ selectNodes [n] (tags, fs, ctx) acc

This pass tests and gives the proper result with the former ghci example. I have a PR ready to fire if needed.

@fimad
Copy link
Owner

fimad commented Apr 23, 2017

Yep it is intended to be DFS, I'd be happy to take a PR that updates the behavior accordingly.

@Raveline
Copy link
Contributor Author

There you go : #60

I also added a simple test case, so if you have a better solution than mine, you can just pick the test to check yours against the bug.

@fimad
Copy link
Owner

fimad commented Apr 25, 2017

Just pushed v0.5.1 with this change. Thanks again for the fix!

@fimad fimad closed this as completed Apr 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants