@jbrayton
Contributor

No description provided.

jbrayton added 14 commits April 23, 2020 17:34
When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.
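The three steps above can be sketched roughly as follows. This is an illustration only: the node shape (`{ tag, attrs, children }`) and the function name `adjustForScoring` are hypothetical and are not Mercury Parser's or cheerio's API, and the actual change in this pull request may be implemented differently.

```javascript
// Hypothetical, simplified node shape: { tag, attrs, children }.
// NOT Mercury Parser's or cheerio's API; it only illustrates the three steps.
function adjustForScoring(node) {
  const attrs = { ...(node.attrs || {}) };
  let tag = node.tag;

  // Step 1: drop "id" from h1/h2 so the attribute cannot lower the element's weight.
  if (tag === 'h1' || tag === 'h2') {
    delete attrs.id;
  }

  // Step 2: h1 is left alone (the parser demotes it to h2 later),
  // so h2 moves down to h3 to preserve the heading hierarchy.
  if (tag === 'h2') {
    tag = 'h3';
  }

  // Step 3: mark ul elements with the class that keeps them from being removed.
  if (tag === 'ul') {
    const classes = attrs.class ? attrs.class.split(/\s+/).filter(Boolean) : [];
    if (!classes.includes('entry-content-asset')) {
      classes.push('entry-content-asset');
    }
    attrs.class = classes.join(' ');
  }

  // Recurse into children; each node is visited once, so nothing is demoted twice.
  const children = (node.children || []).map(adjustForScoring);
  return { tag, attrs, children };
}

// Usage with a made-up fragment of an issue page.
const sample = {
  tag: 'div',
  attrs: {},
  children: [
    { tag: 'h1', attrs: { id: 'title' }, children: [] },
    { tag: 'h2', attrs: { id: 'news' }, children: [] },
    { tag: 'ul', attrs: {}, children: [] },
  ],
};
const adjusted = adjustForScoring(sample);
console.log(JSON.stringify(adjusted, null, 2));
```

The function returns a new tree instead of mutating its input, which makes it easy to compare the markup before and after the adjustment.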
…e would send contentOnly: true on subsequent pages (page 2).

removed failover: true from preview.
jbrayton added a commit to jbrayton/mercury-parser that referenced this pull request Apr 27, 2020
html,
$,
metaCache,
contentOnly: true,
Contributor

What is the logic behind removing this value?

Contributor Author

Sorry, I do not recall at this point. Obviously, if I thought removing that value was a good change, I should have included comments around it. But I did not, and I did this over a year ago now.

Contributor

FYI, it looks like there's some context for this change in #553.

@johnholdun johnholdun merged commit 9a961aa into postlight:master Aug 10, 2022
3 participants