Consider using BeautifulSoup instead of built-in HTML parser? #26

Closed
astrofrog opened this issue Feb 5, 2017 · 5 comments
Comments

@astrofrog
Contributor

I wonder whether it would be worth considering using an existing HTML parser such as BeautifulSoup to avoid having to include C code in the linkchecker package? This might lower the maintenance burden in the long term (since keeping C extensions working across platforms is not trivial).
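As a rough illustration of what the proposal could look like, here is a minimal sketch of link extraction with BeautifulSoup. It assumes the third-party `bs4` package is installed and uses its pure-Python `"html.parser"` backend, so no C extension is needed. The function name `extract_links` and the tag/attribute mapping are hypothetical choices for this example, not linkchecker's actual code.

```python
from typing import List

from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Hypothetical mapping of tags to the attribute holding a URL (an assumption
# for this sketch; a real link checker would cover many more cases).
URL_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


def extract_links(html: str) -> List[str]:
    """Return URL attribute values in document order (hypothetical helper)."""
    # "html.parser" is bs4's backend built on Python's stdlib parser,
    # so this path needs no compiled extension.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all(list(URL_ATTRS)):
        value = tag.get(URL_ATTRS[tag.name])
        if value:  # skip tags that lack the URL attribute
            links.append(value)
    return links
```

For example, `extract_links('<a href=foo.html>Foo</a><img src="bar.png"><a>x</a>')` returns `["foo.html", "bar.png"]`: the unquoted attribute is handled by the parser, and the anchor without an `href` is skipped.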

@anarcat
Contributor

anarcat commented Feb 6, 2017

I absolutely agree. We also have tons of duct tape for parsing that HTML, and it gives us weird results; see #23 for example.

@ghost

ghost commented Feb 7, 2017

Definitely. I need an immediate fix for #23 just to be able to use linkchecker, but what we've added now is an ugly patch, and I can see us needing to add more of these with the current homegrown parser.

@PetrDlouhy
Contributor

Much of my work in #40 is related to the HTML parser, and two problems with it remain that cause test failures on Python 3. I am unable to solve them right now.

There are likely tons of special cases that a more widely used parser already handles properly but the built-in parser might not.

So I think it would be a huge benefit if this were implemented.

@PetrDlouhy
Contributor

I have worked on this; see #119. It would require extensive testing and possibly some improvements, though.

@cjmayo
Contributor

cjmayo commented Aug 6, 2020

Good idea! Thanks Petr for showing it was possible.
Done.

@cjmayo cjmayo closed this as completed Aug 6, 2020