Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional parsing for scripts #1093

Merged
merged 7 commits into from Dec 3, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
32 changes: 32 additions & 0 deletions src/Html/HtmlParser.fs
Expand Up @@ -423,8 +423,40 @@ module internal HtmlParser =
and script state =
match state.Peek() with
| TextParser.EndOfFile _ -> data state
| ''' -> state.Cons(); scriptSingleQuoteString state
| '"' -> state.Cons(); scriptDoubleQuoteString state
| '/' -> state.Cons(); scriptSlash state
| '<' -> state.Pop(); scriptLessThanSign state
| _ -> state.Cons(); script state
and scriptSingleQuoteString state =
match state.Peek() with
| ''' -> state.Cons(); script state
| _ -> state.Cons(); scriptSingleQuoteString state
and scriptDoubleQuoteString state =
match state.Peek() with
| '"' -> state.Cons(); script state
| _ -> state.Cons(); scriptDoubleQuoteString state
and scriptSlash state =
match state.Peek() with
| '/' -> state.Cons(); scriptSinleLineComment state
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: sinle instead of single

| '*' -> state.Cons(); scriptMultiLineComment state
| _ -> state.Cons(); scriptRegex state
and scriptMultiLineComment state =
match state.Peek() with
| '*' -> state.Cons(); scriptMultiLineCommentStar state
| _ -> state.Cons(); scriptMultiLineComment state
and scriptMultiLineCommentStar state =
match state.Peek() with
| '/' -> state.Cons(); script state
| _ -> scriptMultiLineComment state
and scriptSinleLineComment state =
match state.Peek() with
| '\n' -> state.Cons(); script state
| _ -> state.Cons(); scriptSinleLineComment state
and scriptRegex state =
match state.Peek() with
| '/' -> state.Cons(); script state
| _ -> state.Cons(); scriptRegex state
and scriptLessThanSign state =
match state.Peek() with
| '/' -> state.Pop(); scriptEndTagOpen state
Expand Down
12 changes: 12 additions & 0 deletions tests/FSharp.Data.Tests/HtmlParser.fs
Expand Up @@ -96,6 +96,18 @@ let ``Can handle attributes with no value``() =
]
node.Attributes() |> should equal expected

[<TestCase("var r =\"</script>\"")>]
[<TestCase("var r ='</script>'")>]
[<TestCase("var r =/</g")>]
Copy link

@ovatsus ovatsus Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case with "var r =/\/</g"

[<TestCase("//</script>\n")>]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the opposite case, not having the \n at the end to make sure it fails

[<TestCase("/*</script>*/")>]
[<TestCase("/*</script>**/")>]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test cases that use /* */ and span multiple lines?

let ``Can handle special characters in scripts`` content =
let html = sprintf "<script>%s</script>" content
let node = HtmlNode.Parse html |> List.head
let expected = HtmlNode.NewElement("script", [ HtmlNode.NewText content ])
node |> should equal expected

[<Test>]
let ``Can parse tables from a simple html``() =
let html = """<html>
Expand Down