Clean HTML prior to being loaded into DOMDocument/Crawler #106

Open
derklempner opened this issue Oct 22, 2019 · 0 comments
Labels
enhancement New feature or request

Comments

@derklempner
Contributor

Description
Malformed HTML can cause PHP's DOMDocument (libxml) to choke, or to build a DOM that differs from the structure you expect from the markup. When that happens, selectors can fail to match.

Proposed solution
Optionally clean the HTML before it is processed, using something like HTML Tidy or a similar tool.
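
A rough sketch of what such an optional cleaning step might look like, assuming PHP's tidy extension is used when it happens to be installed. The function names `cleanHtml` and `loadDocument` and the `$clean` flag are hypothetical and are not part of this project's API. The tidy options shown are just one plausible configuration.

```php
<?php
// Hypothetical helper (not part of the project): repair markup with the
// tidy extension when it is available, otherwise return the input as-is.
function cleanHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html;
    }

    // Assumed configuration: emit HTML and disable line wrapping.
    $config = [
        'output-html' => true,
        'wrap' => 0,
    ];

    $cleaned = tidy_repair_string($html, $config, 'utf8');

    // Fall back to the original markup if tidy returned nothing usable.
    return is_string($cleaned) && $cleaned !== '' ? $cleaned : $html;
}

// Hypothetical loader: cleaning is opt-in via the $clean flag.
function loadDocument(string $html, bool $clean = true): DOMDocument
{
    if ($clean) {
        $html = cleanHtml($html);
    }

    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Usage: load a fragment with an unclosed <p> and query it with XPath.
$doc = loadDocument('<div><p>Unclosed paragraph<p>Second</div>');
$xpath = new DOMXPath($doc);
$paragraphCount = $xpath->query('//p')->length;
```

Keeping the cleaning step behind a flag means existing behaviour would stay unchanged for markup that already parses as expected, and the fallback avoids making the tidy extension a hard dependency.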

@steveworley steveworley added the enhancement New feature or request label Feb 4, 2020
@steveworley steveworley added this to the 0.5.0 milestone Feb 4, 2020
@stooit stooit removed this from the 0.5.0 milestone Apr 27, 2020
3 participants