Fix issue #400: Fall back to the original title if there are too many words before the colon #409
Hi,
This fixes issue #400.
In the example provided, the `<title>` tag says "Surgical Strikes A Message To Pakistan, More If Necessary: Army Chief General Bipin Rawat" and the `<h1>` says "Surgical Strikes A Message To Pakistan, More If Necessary: Army Chief". Since Readability considers the colon as a hierarchical separator (and no H node matches exactly the title), the resulting title is "Army Chief General Bipin Rawat".
This patch checks the length of the discarded text before the colon. If there are more than 5 words, it assumes something weird is happening with the title and falls back to the original title that comes from the `<title>` tag.
I also added a test case for this.
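For reviewers, the check can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in the patch: `wordCount`, `titleAfterColon`, and the `maxWordsBeforeColon` parameter are hypothetical names, and the H-node matching step is left out.

```javascript
// Hypothetical helper: counts whitespace-separated words in a string.
function wordCount(str) {
  return str.trim().split(/\s+/).filter(Boolean).length;
}

// Hypothetical, simplified stand-in for the colon handling described above.
// The threshold of 5 words comes from the description; the rest is assumed.
function titleAfterColon(origTitle, maxWordsBeforeColon = 5) {
  const colonIndex = origTitle.lastIndexOf(":");
  if (colonIndex === -1) {
    return origTitle.trim(); // no colon, nothing to split
  }

  const discarded = origTitle.substring(0, colonIndex);
  if (wordCount(discarded) > maxWordsBeforeColon) {
    // Too many words before the colon: it is probably part of the
    // headline rather than a site-name separator, so keep everything.
    return origTitle.trim();
  }

  return origTitle.substring(colonIndex + 1).trim();
}
```

With the title from issue #400, the text before the colon has nine words, so the sketch returns the full `<title>` text instead of only "Army Chief General Bipin Rawat".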
Let me know if you have any questions.