Loading hackernews #340

Closed
Steveaxelrod007 opened this issue Jul 7, 2017 · 9 comments

@Steveaxelrod007

When I try to load "https://news.ycombinator.com/rss" I get

com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 6: The element type "hr" must be terminated by the matching end-tag "</hr>".
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236)
at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)
at com.axee.safetyNet.SearchTopNews.getFeed(SearchTopNews.java:556)
at com.axee.safetyNet.SearchTopNews.lambda$goodUrls$8(SearchTopNews.java:121)
at com.axee.safetyNet.SearchTopNews$$Lambda$14/1774033198.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.jdom2.input.JDOMParseException: Error on line 6: The element type "hr" must be terminated by the matching end-tag "</hr>".
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233)

I checked the XML and it looks fine and the XML validator says it is ok.

Thank you.

@mishako mishako mentioned this issue Jul 7, 2017
@mishako
Member

mishako commented Jul 7, 2017

From #341:

When I try to load "http://www.reddit.com/.rss" I get

com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: Premature end of file.
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236)
at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)
at com.axee.safetyNet.SearchTopNews.getFeed(SearchTopNews.java:557)
at com.axee.safetyNet.SearchTopNews.lambda$goodUrls$5(SearchTopNews.java:90)
at com.axee.safetyNet.SearchTopNews$$Lambda$14/2041039871.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)

I checked the XML and it looks fine and the XML validator says it is ok.

Thank you.

@mishako
Member

mishako commented Jul 7, 2017

Hi Steve! It looks like you are getting HTML pages instead of XML. Can you show us the code that does the HTTP request?
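One quick way to confirm this is to look at the start of the response body before handing it to the parser. Below is a minimal sketch, assuming Java 11 or later; the class `FeedSniffer` and the method `looksLikeHtml` are hypothetical names for illustration and are not part of ROME.

```java
import java.util.Locale;

// Hypothetical helper, not part of ROME: a rough check on a response body.
final class FeedSniffer {

    /**
     * Returns true when the body starts like an HTML page rather than an XML feed.
     * This is a heuristic only: it looks at the first non-whitespace characters.
     */
    static boolean looksLikeHtml(String body) {
        String head = body.stripLeading().toLowerCase(Locale.ROOT);
        return head.startsWith("<!doctype html") || head.startsWith("<html");
    }
}
```

If this returns true for the bytes you pass to `SyndFeedInput`, the server sent a web page (for example an error or redirect page) and the XML parser error is a symptom, not the cause.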

@Steveaxelrod007
Author

My bad on the reddit one: I used http, but they want https. Chrome fixed it for me when I was testing, so I did not realize. :(

@mishako
Member

mishako commented Jul 9, 2017

Cool! What about hackernews? Can you show us the code that does the http request?

@Steveaxelrod007
Author

Steveaxelrod007 commented Jul 9, 2017 via email

@Steveaxelrod007
Author

Steveaxelrod007 commented Jul 9, 2017 via email

@mishako
Member

mishako commented Jul 11, 2017

@Steveaxelrod007 The problem is that the URL class you're using is very basic and doesn't handle many common use cases in the world of HTTP. For example, it doesn't follow redirects, so you end up trying to parse the intermediate redirect page instead of the final destination.

The solution is to use any other HTTP library.
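As an illustration of the advice above, here is a minimal sketch that fetches a feed with a client configured to follow redirects. It assumes Java 11 or later and uses the JDK's `java.net.http.HttpClient`, which postdates this thread and is a substitute for the third-party libraries mentioned here. The class `FeedFetcher`, its method names, the timeouts, and the `User-Agent` value are all hypothetical choices for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of ROME: downloads a feed body as a String.
final class FeedFetcher {

    /** Builds a client that follows redirects, except from HTTPS down to HTTP. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10)) // assumed value
                .build();
    }

    /** Builds a GET request; the User-Agent string is an arbitrary example. */
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "feed-reader-example/1.0")
                .timeout(Duration.ofSeconds(20)) // assumed value
                .GET()
                .build();
    }

    /** Fetches the final (post-redirect) body, failing on any non-200 status. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                newClient().send(newRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode()
                    + " for " + response.uri());
        }
        return response.body();
    }
}
```

The returned string can then be parsed with ROME, for example `new SyndFeedInput().build(new StringReader(body))`. Reading the body as a string relies on the client's charset handling; passing the raw stream to ROME's `XmlReader` is an alternative when encoding detection matters.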

@mishako mishako closed this as completed Jul 11, 2017
@Steveaxelrod007
Author

Steveaxelrod007 commented Jul 12, 2017 via email

@mishako
Member

mishako commented Jul 28, 2017

@Steveaxelrod007
Sorry, I somehow missed your question. I thought I answered it, but it looks like I didn't.

I would suggest using the Jersey HTTP client, but there is also the Apache HTTP client, which I think is more popular. We have an example for the HTTP client here: #276
