Remove Regex HTML parsing and use beautiful soup instead. #156
Builds failed because BeautifulSoup4 4.6.3 seems to remove the closing slash from void elements. Downgrading to 4.6.0 for the time being.
This is fine: it's consistent with everything else to avoid the XML style of void elements. Feel free to fix those tests and bump the Soup version to the latest.
cms/html.py
            attrs["height"] = '"%s"' % thumbnail.height
        else:
            assert False
    return str(soup.decode(formatter=None))
Should six.text_type(soup) be fine here? (Or str(soup) whenever we remove Python 2 support.)
From the docs:
If you pass in formatter=None, Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML[.]
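To make the quoted behaviour concrete, here is a minimal sketch comparing three of Beautiful Soup's output formatters. The sample markup is hypothetical and is not taken from cms/html.py; it assumes the bs4 package and the built-in html.parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical input: one named entity and one escaped ampersand.
markup = "<p>&copy; A &amp; B</p>"
soup = BeautifulSoup(markup, "html.parser")

# Default formatter: escapes only what is needed to keep the markup valid.
minimal = soup.decode(formatter="minimal")

# formatter=None: strings are written out untouched, so the bare "&"
# is emitted as-is and the result may no longer be valid HTML.
unformatted = soup.decode(formatter=None)

# formatter="html": converts Unicode characters to named entities where possible.
as_entities = soup.decode(formatter="html")

for label, text in [("minimal", minimal), ("None", unformatted), ("html", as_entities)]:
    print(label, "->", text)
```

The trade-off under discussion is visible in the second result: skipping the formatter leaves strings exactly as stored in the tree, at the cost of unescaped ampersands in the output.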
It's formatter=None at the moment, as otherwise soup turns the © into ©