gizmodo/lifehacker don't work (they store text in JSON-LD now) #898
Comments
Hi Vladimir, I think the pages still contain the content, but for some reason the HTML5 parser does not seem to be able to parse it properly. In my tests, switching to the libxml parser does appear to work. Are you able to try the updated site config for gizmodo and see if you have any luck? https://github.com/fivefilters/ftr-site-config/blob/master/gizmodo.com.txt If you don't, can you please give us a URL so we can test with it and see if there's another solution? When I looked, the content in JSON-LD did not contain any HTML markup, and at the moment the Full-Text RSS code only uses JSON-LD for other metadata, not content.
Oh, that's great. I've tried to search for some text but they randomly insert
Both Gizmodo and Lifehacker are working now. Thank you!
It's pretty strange. The HTML they serve is a mess. If you disable JavaScript in the browser, nothing loads (just a blank screen), and Firefox's developer tools don't seem to be able to parse the HTML without JavaScript for some reason, so Firefox's parsing gives you a very minimal DOM tree compared to the actual source HTML returned by the server. I think that's what's happening when Full-Text RSS tries to use the HTML5 parser (HTML5PHP) too, although I haven't tested extensively. But the HTML does contain the content, and libxml can parse it, so that's the main change in the site config file that makes this work again:
Hi, both gizmodo and lifehacker now store article text in JSON-LD:
Could you please add support for JSON-LD documents?
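
For anyone who wants to check what a page's JSON-LD actually holds, here is a minimal Python sketch. It is not the Full-Text RSS code (which is PHP and, as noted in this thread, only uses JSON-LD for metadata). It assumes the article text sits in the schema.org `articleBody` property of a `<script type="application/ld+json">` block; the names `JsonLdCollector` and `extract_article_body` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_article_body(html):
    """Return the first articleBody string found in JSON-LD, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        # A JSON-LD block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and isinstance(item.get("articleBody"), str):
                return item["articleBody"]
    return None
```

If the returned value is plain text without markup, as described in the comments, it would still need paragraph detection before it could replace HTML-based extraction.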