Allow passing bytes input into the lxml.html.* processing functions again. This was lost in 5.1.0 by globally replacing 'basestring' with 'str'.
scoder committed Jan 10, 2024
1 parent 6133c0e commit 6619dfd
Showing 2 changed files with 6 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/lxml/html/__init__.py
@@ -632,7 +632,7 @@ def __init__(self, name, copy=False, source_class=HtmlMixin):
         self.__doc__ = getattr(source_class, self.name).__doc__
     def __call__(self, doc, *args, **kw):
         result_type = type(doc)
-        if isinstance(doc, str):
+        if isinstance(doc, (str, bytes)):
             if 'copy' in kw:
                 raise TypeError(
                     "The keyword 'copy' can only be used with element inputs to %s, not a string input" % self.name)
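The effect of widening the type check can be illustrated without lxml. The following is a minimal sketch, not the library's code: `ToyDoc`, `toy_fromstring`, `MethodFunc`, and its `accepted` parameter are hypothetical names invented here, and the wrapper only assumes that raw markup (`str` or `bytes`) must be parsed before a method is looked up on it.

```python
class ToyDoc:
    """Hypothetical stand-in for a parsed document (not an lxml class)."""

    def __init__(self, text):
        self.text = text

    def shout(self):
        # Mutates the document in place and returns None.
        self.text = self.text.upper()


def toy_fromstring(markup):
    """Hypothetical parser that accepts both str and bytes markup."""
    if isinstance(markup, bytes):
        markup = markup.decode("utf-8")
    return ToyDoc(markup)


class MethodFunc:
    """Toy wrapper that exposes a ToyDoc method as a plain function."""

    def __init__(self, name, accepted=(str, bytes)):
        self.name = name
        # Types treated as unparsed markup (assumption of this sketch).
        self.accepted = accepted

    def __call__(self, doc, *args, **kw):
        result_type = type(doc)
        if isinstance(doc, self.accepted):
            doc = toy_fromstring(doc)
        # If doc is still raw markup here, this attribute lookup fails.
        result = getattr(doc, self.name)(*args, **kw)
        if result is not None:
            return result
        # In-place methods hand the document back in the caller's type.
        if result_type is bytes:
            return doc.text.encode("utf-8")
        if result_type is str:
            return doc.text
        return doc


shout = MethodFunc("shout")                            # check against (str, bytes)
shout_str_only = MethodFunc("shout", accepted=(str,))  # check against str only
```

In this toy, `shout(b"<p>hi</p>")` is parsed and processed, while `shout_str_only(b"<p>hi</p>")` skips parsing and fails on the method lookup, because `bytes` is not a subclass of `str` in Python 3.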
5 changes: 5 additions & 0 deletions src/lxml/html/tests/test_rewritelinks.txt
@@ -138,6 +138,11 @@ link)``, which is awkward to test here, so we'll make a printer::
     img src="/logo.gif"
     td style="/quoted.png"@23
 
+This also works directly on bytes input::
+
+    >>> print_iter(iterlinks(b'<html><body><a href="https://lxml.de/">lxml</a></body></html>'))
+    a href="https://lxml.de/"
+
 An application of ``iterlinks()`` is ``make_links_absolute()``::
 
     >>> from lxml.html import make_links_absolute
