gizmodo/lifehacker don't work (they store text in JSON-LD now) #898

vshabanov · 2021-08-09T13:48:37Z

Hi, both Gizmodo and Lifehacker now store article text in JSON-LD:

<script type="application/ld+json">...</script>

Could you please add support for JSON-LD documents?

fivefilters · 2021-08-09T15:25:30Z

Hi Vladimir, I think the pages still contain the content, but for some reason the HTML5 parser does not seem to be able to parse it properly.

In my tests, switching to the libxml parser does appear to work. Could you try the updated site config for gizmodo and see if you have any luck? https://github.com/fivefilters/ftr-site-config/blob/master/gizmodo.com.txt

If that doesn't work, could you please give us a URL so we can test with it and see if there's another solution?

When I looked, the content in JSON-LD did not contain any HTML markup, and at the moment the Full-Text RSS code uses JSON-LD only for other metadata, not for content.
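To illustrate the point about JSON-LD carrying plain text only, here is a minimal Python sketch (not the Full-Text RSS code, which is written in PHP) that pulls the text out of `application/ld+json` blocks. It assumes the article text sits in a schema.org `articleBody` property; the sample HTML and the helper names `JsonLdCollector` and `extract_article_bodies` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_article_bodies(html):
    """Return every string-valued articleBody found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    bodies = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            # Assumption: the text lives in schema.org's articleBody property.
            if isinstance(item, dict) and isinstance(item.get("articleBody"), str):
                bodies.append(item["articleBody"])
    return bodies


# Hypothetical sample page, not taken from Gizmodo or Lifehacker.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "Example headline",
 "articleBody": "First paragraph. Second paragraph."}
</script>
</head><body></body></html>"""

print(extract_article_bodies(SAMPLE_HTML))
```

The result is a flat string with no paragraph, link, or image markup, which is why extracting from the HTML body is preferable when a parser can read it.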

vshabanov · 2021-08-10T12:14:29Z

Oh, that's great. I tried searching for some of the text, but they insert comments at random points in the HTML, so I couldn't find it.

Both Gizmodo and Lifehacker are working now. Thank you!
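For anyone hitting the same search problem with comments inserted into the HTML, a rough Python sketch of one workaround is to strip comments before searching. The sample string and the helper names `strip_html_comments` and `contains_phrase` are hypothetical, and a regular expression is only an approximation here: it does not understand scripts or other places where comment-like text can appear.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets a
# comment span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html):
    """Remove HTML comments from the source text."""
    return COMMENT_RE.sub("", html)


def contains_phrase(html, phrase):
    """Check for a phrase after comments have been stripped."""
    return phrase in strip_html_comments(html)


# Made-up sample: comments split a phrase that reads normally in the browser.
SAMPLE = "<p>The quick <!-- -->brown fox<!-- x --> jumps.</p>"

print("quick brown fox" in SAMPLE)                 # plain search on raw source
print(contains_phrase(SAMPLE, "quick brown fox"))  # search with comments removed
```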

fivefilters · 2021-08-11T19:03:44Z

It's pretty strange. The HTML they serve is a mess. If you disable JavaScript in the browser, nothing loads (just a blank screen), and Firefox's developer tools don't seem to be able to parse the HTML without JavaScript for some reason, so Firefox's parsing gives you a very minimal DOM tree compared to the actual source HTML returned by the server. I think that's also what happens when Full-Text RSS tries to use the HTML5 parser (HTML5PHP), although I haven't tested extensively. But the HTML does contain the content, and libxml can parse it, so the main change in the site config file that makes this work again is: parser: libxml. Very odd.

vshabanov closed this as completed Aug 10, 2021
