Updates Python soup_adapter to use BeautifulSoup 4 #368

Open
wants to merge 2 commits into
base: master
from

romanvm commented Jul 19, 2016

Also fixes the indentation according to PEP-8

Roman Miroshnychenko
Updates soup_adapter to use BeautifulSoup 4
Also fixes the indentation according to PEP-8

googlebot commented Jul 19, 2016

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

romanvm commented Jul 19, 2016

I signed it!


googlebot commented Jul 19, 2016

CLAs look good, thanks!

Roman Miroshnychenko

rofl0r commented Apr 11, 2017

Nice work! However, I personally would have preferred the gratuitous whitespace changes in a separate commit, since mixing them in makes it hard to see what you actually changed.


romanvm commented Apr 13, 2017

Sorry, but all my code editors and tools are set to 4 spaces for Python. The functional changes are not that big: lines 29, 60, 108-112, and 117 in the changed file. The default parser, html.parser, is set only to suppress BS4 warnings; it does not actually parse anything.
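To illustrate the point about html.parser, here is a minimal sketch (not the actual adapter code) of creating an empty BS4 tree with an explicitly named parser and then filling it by hand. The helper name `make_empty_soup` and the sample `<p>` tag are hypothetical; only `BeautifulSoup`, `new_tag`, and `append` come from BS4 itself.

```python
import warnings

from bs4 import BeautifulSoup


def make_empty_soup():
    """Create an empty BS4 tree with an explicitly named parser (hypothetical helper)."""
    # Naming the parser avoids BS4's "no parser was explicitly specified" warning.
    # The markup is an empty string, so html.parser has nothing to parse.
    return BeautifulSoup("", "html.parser")


# Record any warnings raised while the tree is created.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soup = make_empty_soup()

parser_warnings = [w for w in caught if "parser" in str(w.message).lower()]
print(len(parser_warnings))

# Nodes can then be attached manually, without going through the parser.
tag = soup.new_tag("p")
tag.string = "hi"
soup.append(tag)
print(soup)
```

An adapter that gets its nodes from another parser can populate the tree this way, so the choice of BS4 parser has no effect on the result.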


DemiMarie commented Jun 3, 2017

Ping?


samreflexive commented Apr 27, 2018

Bump!


wumpus commented Jul 26, 2018

beautifulsoup4 was released in 2014, this PR is from 2016, the last accepted PR in this project is from 2016, and it's now 2018.


romanvm commented Jul 26, 2018

If you really need BS4 support, I'd recommend the Sigil fork: https://github.com/Sigil-Ebook/sigil-gumbo

However, according to my benchmarks, gumbo + BS4 does not offer any speed benefit over BS4 + html5lib, probably because of the overhead of the ctypes bindings. I even tried writing native Python bindings with the pybind11 library, but the speed gain was not significant. In any case, the library seems abandoned, and html5lib is a safe bet if you need a compliant HTML5 parser for Python.
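For anyone who wants to run a comparison of their own, a rough timing harness might look like the sketch below. It is not the benchmark described above: the sample markup, the helper name `time_parser`, and the repeat count are assumptions, and it times only BS4's own parsers (html.parser and, if installed, html5lib). The gumbo adapter would have to be timed separately in the same way.

```python
import timeit

from bs4 import BeautifulSoup, FeatureNotFound

# Synthetic sample document; real pages will behave differently.
SAMPLE_HTML = "<html><body>" + "<p>Hello <b>world</b></p>" * 200 + "</body></html>"


def time_parser(parser_name, markup=SAMPLE_HTML, repeat=5):
    """Return the best time in seconds for one parse, or None if the parser is unavailable."""
    try:
        # BS4 raises FeatureNotFound when the requested parser is not installed.
        BeautifulSoup("", parser_name)
    except FeatureNotFound:
        return None
    timings = timeit.repeat(
        lambda: BeautifulSoup(markup, parser_name), number=1, repeat=repeat
    )
    return min(timings)


results = {name: time_parser(name) for name in ("html.parser", "html5lib")}
for name, seconds in results.items():
    print(name, "not installed" if seconds is None else f"{seconds:.4f} s")
```

Taking the minimum over several runs reduces noise from other processes, but results still depend heavily on the input document and the machine.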
