Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #400: Fall back to the original title if there are too many words before the colon #409

Merged
merged 2 commits into from Nov 27, 2017
Merged

Conversation

andreskrey
Copy link
Contributor

Hi,

This fixes issue #400.

In the example provided, the <title> tag says "Surgical Strikes A Message To Pakistan, More If Necessary: Army Chief General Bipin Rawat" and the <h1> says "Surgical Strikes A Message To Pakistan, More If Necessary: Army Chief". Since Readability considers the colon as a hierarchical separator (and no H node matches exactly the title), the resulting title is "Army Chief General Bipin Rawat"

This patch checks the length of the discarded text before the colon. If there are more than 5 words, it assumes something weird is happening with the title and falls back to the original title that comes from the <title> tag.

I also added a test case for this.

Let me know if you have any questions.

@gijsk
Copy link
Contributor

gijsk commented Nov 27, 2017

Apologies for the delay. This looks fine to me, though it also feels like maybe this section of code needs some refactoring TLC... Oh well, this fixes the problem at hand!

@gijsk gijsk merged commit eb895b9 into mozilla:master Nov 27, 2017
@andreskrey andreskrey deleted the fix-issue-400 branch November 27, 2017 16:03
@gijsk gijsk mentioned this pull request Dec 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants