feat: ma.ttias.be extractor #551

jbrayton · 2020-04-23T21:49:22Z

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and unordered lists that were part of the content. It also demoted h1 elements to h2, which made the original h2 elements appear to sit at the same level of the document hierarchy. This PR resolves these issues as follows:

* Remove id attributes from h1 and h2 elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes h1 elements to h2, demote h2 elements to h3.
* Add a class="entry-content-asset" attribute to ul elements to avoid them being removed.
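
For readers unfamiliar with custom extractors, the three changes listed above could look roughly like the sketch below. It is not the code from this PR: the object name `MaTtiasBeExtractorSketch` and the `.content` selector are assumptions made for illustration. It also assumes the documented custom-extractor behavior that a transform function returning a string asks the parser to rename the node to that tag, and that cheerio's `attr(name, null)` removes an attribute.

```javascript
// Hypothetical sketch of a Mercury Parser custom extractor for ma.ttias.be.
// The name and the content selector are assumptions, not taken from the PR.
const MaTtiasBeExtractorSketch = {
  domain: 'ma.ttias.be',

  content: {
    // Assumed selector for the article body.
    selectors: ['.content'],

    transforms: {
      // Remove the id so the heading is not given a low weight.
      h1: node => {
        node.attr('id', null);
      },

      // Remove the id, then return 'h3' so the h2 is renamed and stays
      // one level below the h1 elements that the parser demotes to h2.
      h2: node => {
        node.attr('id', null);
        return 'h3';
      },

      // Mark lists with this class so they are not removed.
      ul: node => {
        node.attr('class', 'entry-content-asset');
      },
    },
  },
};
```

Each transform receives the matched node, so the attribute changes happen in place; only the h2 transform returns a value, because it is the only one that needs a different tag name.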

The site does not have deks or lead images, so those are not in the extractor.

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and unordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.

Identical to postlight#551

jbrayton added 2 commits April 23, 2020 17:34

removed redundant comment.

921d9d4

jbrayton changed the title ~~Feat ma ttias be extractor~~ feat: ma.ttias.be extractor Apr 24, 2020

jbrayton mentioned this pull request Apr 27, 2020

Feat ma ttias be extractor jbrayton/mercury-parser#1

Merged

jbrayton added a commit to jbrayton/mercury-parser that referenced this pull request Apr 27, 2020

Merge pull request #1 from jbrayton/feat-ma-ttias-be-extractor

677b61f

Identical to postlight#551

Merge branch 'master' into feat-ma-ttias-be-extractor

8d9a888

johnholdun approved these changes May 9, 2022

johnholdun merged commit e217648 into postlight:master May 9, 2022
