Fix unpaired close tags and self-closing tags #360
google#251 assumed that all tags are closed properly. This assumption doesn't hold for cases like:
1. Self-closing tags such as `<img>` don't have corresponding close tags.
2. Unpaired close tags are still valid HTML.
This patch supports these cases by assuming that all open tags that don't nest correctly or that are never closed are closed automatically. This isn't the full HTML "adoption agency algorithm", but it should be good enough for the needs of BudouX.
Fixes google#355
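The auto-closing behavior described above can be illustrated with a minimal sketch. This is not the code from this patch: `OpenTagTracker`, `VOID_ELEMENTS`, and the `seen` attribute are hypothetical names chosen for illustration, and the only library used is Python's standard `html.parser`. The sketch assumes that void elements are never pushed onto the open-tag stack, that a close tag with no matching open tag is ignored, and that a close tag implicitly closes everything opened after its matching open tag.

```python
from html.parser import HTMLParser

# Void elements never have a close tag (list taken from the HTML Standard).
VOID_ELEMENTS = frozenset({
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'link', 'meta', 'source', 'track', 'wbr',
})


class OpenTagTracker(HTMLParser):
    """Hypothetical helper for illustration; not BudouX's actual class."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.seen = []   # (text, open elements at that point) per text node

    def handle_starttag(self, tag, attrs):
        # Void elements such as <img> are never pushed, so they cannot
        # be left dangling on the stack.
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # unpaired close tag: ignore it
        # Pop up to and including the matching open tag; everything
        # opened after it is treated as automatically closed.
        while self.stack.pop() != tag:
            pass

    def handle_data(self, data):
        self.seen.append((data, tuple(self.stack)))


tracker = OpenTagTracker()
tracker.feed('<b>a<img>b</i>c<i>d</b>e')
tracker.close()  # flush any buffered trailing text
for text, open_tags in tracker.seen:
    print(repr(text), open_tags)
```

In this input, `<img>` leaves the stack untouched, the stray `</i>` is ignored, and `</b>` closes the still-open `<i>` along with `<b>`, so the final text node sees no open elements.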
LGTM. Thank you!
tests/test_html_processor.py
resolver = html_processor.HTMLChunkResolver(['abxyabc', 'def'], '<wbr>')
resolver.feed(input)
self.assertEqual(resolver.output, expected,
                 'WBR tags should not be inserted if NOBR.')
Could you elaborate on this test message by mentioning the IMG tag?
Good catch, thanks, done.
@kojiishi I left a small comment actually. PTAL.
A nit request about the test message