
[client] cannot properly parse <head> when using CRLF #1067

@LorenzoLeonardini

Description


Expected Behavior

parse(
  `<html>\r\n<head>\r\n<title>Test</title>\r\n</head>\r\n<body>\r\n</body>\r\n</html>`
)

should produce:

- element (html)
    - element (head)
        - text (\r\n)
        - element (title)
            - text (Test)
        - text (\r\n)
    - element (body)
        - text (\r\n)
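It can help to render a parse result in the same outline format used above, so the expected and actual trees can be compared as text. The sketch below uses a hypothetical helper, `formatTree`, which is not part of html-dom-parser. It assumes domhandler-style nodes with `type`, `name`, `data` and `children` fields. To stay self-contained (no html-dom-parser, no browser), it builds the expected tree by hand instead of calling `parse`.

```javascript
// Hypothetical helper (not part of html-dom-parser): renders domhandler-style
// nodes ({ type, name, data, children }) as the indented outline used above.
function formatTree(nodes, depth = 0) {
  const indent = '    '.repeat(depth);
  const lines = [];
  for (const node of nodes) {
    if (node.type === 'text') {
      // Escape CR and LF so line endings are visible, e.g. "\r\n".
      const shown = node.data.replace(/\r/g, '\\r').replace(/\n/g, '\\n');
      lines.push(`${indent}- text (${shown})`);
    } else if (['tag', 'script', 'style'].includes(node.type)) {
      lines.push(`${indent}- element (${node.name})`);
      lines.push(...formatTree(node.children || [], depth + 1));
    }
    // Other node types (comments, directives, ...) are skipped.
  }
  return lines;
}

// Hand-built stand-in for the expected parse result, so the sketch runs
// without html-dom-parser or a browser.
const text = (data) => ({ type: 'text', data });
const el = (name, children = []) => ({ type: 'tag', name, children });

const expected = [
  el('html', [
    el('head', [text('\r\n'), el('title', [text('Test')]), text('\r\n')]),
    el('body', [text('\r\n')]),
  ]),
];

console.log(formatTree(expected).join('\n'));
```

In a real reproduction, the hand-built `expected` array would be replaced by the array returned from `parse(...)`.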

Actual Behavior

parse(
  `<html>\r\n<head>\r\n<title>Test</title>\r\n</head>\r\n<body>\r\n</body>\r\n</html>`
)

instead produces:

- element (html)
    - element (head)
    - element (body)
        - text (\r\n)
        - element (title)
            - text (Test)
        - text (\r\n\r\n)

Steps to Reproduce

parse(
  `<html>\r\n<head>\r\n<title>Test</title>\r\n</head>\r\n<body>\r\n</body>\r\n</html>`
)

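I believe the cause is that, once every \r is replaced with __HTML_DOM_PARSER_CARRIAGE_RETURN_PLACEHOLDER_timestamp__, the CRLF text before and inside <head> is no longer whitespace-only. The HTML parser only keeps whitespace text in the head; any other character makes it close <head> and open an implicit <body>. As a result, domParser.parseFromString leaves the head empty and moves <title> and everything after it into the body.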
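The following is a simplified model of the check described above, not the real tree-construction algorithm, which decides per character token. It shows why a plain CRLF can stay in the head while the substituted text cannot. The placeholder value here is hypothetical; the library's real placeholder ends with a timestamp.

```javascript
// Characters the HTML tree builder treats as whitespace in the "in head"
// insertion mode: TAB, LF, FF, CR and SPACE.
const HEAD_WHITESPACE = /^[\t\n\f\r ]*$/;

// Simplified model: a text run stays inside <head> only if every character
// is whitespace; any other character makes the parser close <head> and
// open an implicit <body>.
function staysInHead(text) {
  return HEAD_WHITESPACE.test(text);
}

// Hypothetical placeholder; the library's real one ends with a timestamp.
const PLACEHOLDER = '__HTML_DOM_PARSER_CARRIAGE_RETURN_PLACEHOLDER_0__';

const crlf = '\r\n';
const substituted = crlf.replace(/\r/g, PLACEHOLDER);

console.log(staysInHead(crlf));        // true
console.log(staysInHead(substituted)); // false
console.log(staysInHead('\n'));        // true
```

The parser's input preprocessing turns CRLF into LF anyway, so unmodified input reaches the tree builder as plain whitespace and stays in the head. The placeholder's letters and underscores are what trigger the implicit <body>.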

The issue first appeared in version 5.0.11.
