Fixes #150 - Keep article intermediary headings. #189

Merged 1 commit on Apr 27, 2015

n1k0 commented Apr 27, 2015

We were aggressively stripping intermediary headings out of the extracted article content whenever their link density exceeded one third.

In the real world, I believe these headings can contain any number of links (none, a single one pointing to another part or page, a few…), so I'm removing the link density rule entirely here while keeping the class weight check.

This fixes #150.
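
As a rough illustration of the change described above, here is a minimal, language-agnostic sketch in Python. It is not the project's code: the `Heading` type, the hint lists, the weight values, and the function names are all hypothetical, and removing headings with a negative class weight is an assumption about what "class weight check" means. Only the one-third link density threshold comes from the description.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a heading element; a real extractor works on DOM nodes.
@dataclass
class Heading:
    text: str             # full text of the heading
    link_text: str = ""   # text that sits inside links within the heading
    class_name: str = ""  # value of the class attribute

# Assumed hint lists and weights, for illustration only.
NEGATIVE_HINTS = ("sidebar", "footer", "comment", "promo")
POSITIVE_HINTS = ("article", "content", "entry")

def class_weight(heading: Heading) -> int:
    """Score a heading from its class name: negative looks like page chrome."""
    name = heading.class_name.lower()
    weight = 0
    if any(hint in name for hint in NEGATIVE_HINTS):
        weight -= 25
    if any(hint in name for hint in POSITIVE_HINTS):
        weight += 25
    return weight

def link_density(heading: Heading) -> float:
    """Fraction of the heading's text that is link text."""
    if not heading.text:
        return 0.0
    return len(heading.link_text) / len(heading.text)

def clean_headings_before(headings):
    """Old rule: drop on negative class weight OR link density above one third."""
    return [h for h in headings
            if class_weight(h) >= 0 and link_density(h) <= 1 / 3]

def clean_headings_after(headings):
    """New rule: only the class weight check remains."""
    return [h for h in headings if class_weight(h) >= 0]

headings = [
    Heading("Kernel development", link_text="Kernel development"),  # fully linked
    Heading("Security"),                                            # no links
    Heading("Related stories", class_name="sidebar-promo"),         # page chrome
]
print([h.text for h in clean_headings_before(headings)])
print([h.text for h in clean_headings_after(headings)])
```

Under these assumptions, the fully linked heading is dropped by the old rule and kept by the new one, while the heading with a chrome-like class name is dropped in both cases.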

gijsk commented Apr 27, 2015

Yeah, I don't see the point of including some headers and removing others just depending on whether they have links. Having just half the headers is unlikely to be useful, I'd imagine.

gijsk added a commit that referenced this pull request Apr 27, 2015
Fixes #150 - Keep article intermediary headings.
gijsk merged commit 79aa2fc into master Apr 27, 2015
gijsk deleted the dont-remove-headings branch April 27, 2015 22:36
Linked issue: LWN.net missing article headers