
Respect CDATA[[ sections when parsing HTML #298

Merged May 13, 2020 (1 commit)


@Corion Corion commented May 8, 2020

This changes the HTML parser behaviour to properly respect CDATA
sections (<![CDATA[ ... ]]>) and to ignore link tags in JavaScript code.

The old behaviour can be restored by passing undef as the "marked_sections"
option when creating the WWW::Mechanize object:

my $mech = WWW::Mechanize->new(
    marked_sections => undef,
);
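
To illustrate what the option controls, here is a minimal sketch that uses HTML::Parser directly rather than WWW::Mechanize. It assumes (this thread does not spell it out) that the Mechanize option is passed through to HTML::Parser's own marked_sections setting. The helper name link_hrefs and the sample markup are made up for this example and are not part of the patch.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect href values of <a> start tags,
# with HTML::Parser's marked_sections option set as requested.
sub link_hrefs {
    my ($html, $marked_sections) = @_;
    my @hrefs;
    my $parser = HTML::Parser->new(
        api_version     => 3,
        marked_sections => $marked_sections,
        start_h         => [
            sub {
                my ($tagname, $attr) = @_;
                push @hrefs, $attr->{href}
                    if $tagname eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @hrefs;
}

# Made-up sample: one anchor inside a CDATA section, one outside.
my $sample = join '',
    '<p><![CDATA[ <a href="/inside-cdata">not a link</a> ]]></p>',
    '<a href="/real">a real link</a>';

# With marked sections honoured, the CDATA content is reported as text,
# so the anchor inside it is not seen as a start tag.
print "marked_sections on:  @{[ link_hrefs($sample, 1) ]}\n";

# With the option off, the result depends on how the parser treats
# the section markers as ordinary markup.
print "marked_sections off: @{[ link_hrefs($sample, undef) ]}\n";
```

Comparing the two printed lists on real pages is a quick way to check whether a site depends on the old behaviour before setting marked_sections => undef.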

The patch also includes a (nasty) test file to check the old and
new behaviour.

See also the discussion at https://perlmonks.org/?node_id=11116478 and https://gist.github.com/haukex/fd76efa16f0b07ce6a7441d9b2265b2a for more context.


coveralls commented May 8, 2020

Pull Request Test Coverage Report for Build 337

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • 41 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 94.279%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| lib/WWW/Mechanize.pm | 41 | 93.95% |

| Totals | |
|---|---|
| Change from base Build 336 | 0.04% |
| Covered Lines | 758 |
| Relevant Lines | 804 |

💛 - Coveralls

Member

oalders commented May 12, 2020

@Corion would you be able to rebase this? I merged a much older PR just now that touches the same code.

Author

Corion commented May 13, 2020

No problem. The rebased version passes tests locally; let's see if it also passes on CI.

Member

oalders commented May 13, 2020

Thanks @Corion!

@oalders oalders merged commit 46d0c20 into libwww-perl:master May 13, 2020