feat: Add a custom extractor for www.ndtv.com. #554

jbrayton · 2020-04-27T12:29:50Z

No description provided.

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.
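The three steps above could be expressed as `transforms` in a custom extractor, roughly as sketched below. This is an illustration under assumptions, not the code from the commit: the object name `MaTtiasBeExtractorSketch` and the `.content` selector are hypothetical. It assumes Mercury Parser's documented convention that a transform function receives a cheerio-style node and that returning a string renames the matched tag.

```javascript
// Hypothetical sketch of a custom extractor; names and selector are assumptions.
const MaTtiasBeExtractorSketch = {
  domain: 'ma.ttias.be',
  content: {
    // Assumed selector; the real page structure may differ.
    selectors: ['.content'],
    transforms: {
      // Strip the id so the heading is not given a low weight.
      // The tag is left alone because Mercury Parser demotes h1 to h2 itself.
      h1: $node => {
        $node.removeAttr('id');
      },
      // Strip the id, then return 'h3' so the h2 is renamed and stays
      // one level below the demoted h1.
      h2: $node => {
        $node.removeAttr('id');
        return 'h3';
      },
      // Mark lists as content assets so they are not removed.
      ul: $node => {
        $node.attr('class', 'entry-content-asset');
      },
    },
  },
};
```

Returning nothing from a transform leaves the tag name unchanged, which is why only the `h2` handler returns a value.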

nitinthewiz · 2021-07-26T17:51:59Z

src/extractors/collect-all-pages.js

      html,
      $,
      metaCache,
-      contentOnly: true,


What is the logic behind removing this value?

jbrayton · Jul 26, 2021

Sorry, I do not recall at this point. Obviously if I thought removing that value a good change I should have included comments around it. But I did not, and I did this over a year ago now.

johnholdun · Aug 10, 2022

FYI, it looks like there's some context for this change in #553

jbrayton added 14 commits April 23, 2020 17:34

removed redundant comment.

921d9d4

feat: Add a custom extractor for engadget.com.

0d63d8e

feat: Add a custom extractor for www.ndtv.com.

032ac6c

Works, but I need to figure how to make pagination work correctly.

d6966bd

fixed pagination - would only retrieve first or second page because we would send contentOnly: true on subsequent pages (page 2). removed failover: true from preview.

58612cd

rolled back { fallback: false } option removal

3df6604

Clarified comments.

9c93f9e

rolling back yarn.lock changes

d0acfb3

Merge pull request #1 from jbrayton/feat-ma-ttias-be-extractor

677b61f

Identical to postlight#551

Merge branch 'master' into feat-engadget-parser

2cfa36b

Merge pull request #2 from jbrayton/feat-engadget-parser

3efb2a9

Identical to postlight#552

Merge branch 'master' into feature-arstechnica-extractor

3768f2e

Merge pull request #3 from jbrayton/feature-arstechnica-extractor

dc8816f

Feature arstechnica extractor

jbrayton mentioned this pull request Apr 27, 2020

Feat ndtv extractor jbrayton/mercury-parser#4

Merged

Merge branch 'master' into feat-ndtv-extractor

795d274

jbrayton added a commit to jbrayton/mercury-parser that referenced this pull request Apr 27, 2020

Merge pull request #4 from jbrayton/feat-ndtv-extractor

0ba3cce

Identical to postlight#554

nitinthewiz reviewed Jul 26, 2021

Merge remote-tracking branch 'origin/master' into feat-ndtv-extractor

106853f

johnholdun approved these changes Aug 10, 2022

johnholdun merged commit 9a961aa into postlight:master Aug 10, 2022
