Resolve Relative URIs #104
Comments
Hi @jangernert, that is an interesting use case. One of the principles I wanted to follow with feed-rs is to perform standard transformations and processing on behalf of users, e.g. all formats should be represented by the same data model. It also extends to implementing aspects of specifications that everyone would otherwise have to implement to be "correct", e.g. propagating top-level MediaRSS properties down through the tree. So this functionality definitely falls within feed-rs scope. Certainly the
Unrelated to this specific issue, I'm slowly chipping away at iTunes support, which does change the way feed-rs will populate its model, so I'll release that as 0.6. It would be good to include something like this in 0.6 as well, given it's a change to the way URLs will be constructed.
The provided feed is just a minimal example, so I'm not sure how useful it is for you: https://gitlab.com/news-flash/news_flash_gtk/-/issues/206
At least this one should not be handled by
as it would require another HTTP request. Edit: it could make sense to pass the Content-Location as an option, if that is what you meant.
In addition to enclosures This seems a bit more tricky. I don't see a way around parsing the HTML and then parsing all the links looking for
That is unfortunate. I'm comfortable switching all the URLs in the RSS/Atom content to absolute by applying
Alternatively you could provide the base URL as part of the Regarding HTML parsing in Rust: I was indirectly using html5ever. But at least for my use case (html2text) it was quite slow compared to good old libxml2.
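To make the "parse the HTML and then look at all the links" idea above a bit more concrete, here is a rough sketch in Python (the language of the feedparser reference linked in the issue), using only the standard library. It is not how feed-rs does or would do it. The attribute set (href and src), the LinkCollector class, and the resolve_html_links helper are all assumptions made up for illustration, and the sketch only collects resolved links rather than rewriting the markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only these attributes carry URIs worth resolving.
URL_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (tag, attribute, absolute URL) triples."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.resolved.append((tag, name, urljoin(self.base, value)))


def resolve_html_links(html, base):
    """Return every href/src found in html, resolved against base."""
    collector = LinkCollector(base)
    collector.feed(html)
    collector.close()
    return collector.resolved


content = '<p><a href="post/1">one</a> <img src="/img/a.png"></p>'
links = resolve_html_links(content, "http://example.com/blog/")
```

Rewriting the content in place would additionally need a serializer, which is where the choice of HTML parser (and its speed) starts to matter.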
Awesome! Could you provide a short summary of what went into #107? The description speaks of I'm assuming content-location header as base is still up to the user of
Hah, you beat me to it. There is a new method, e.g. if you had a feed retrieved from http://example.com/feed/rss.xml you'd pass this to There are a couple of test cases that might make it clearer in
Oh, and the test cases you provided in the original NewsFlash issue are in the RSS2 test suite as well.
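For anyone wondering what using the feed's own location as the base means in practice, here is a small illustration of standard relative reference resolution (RFC 3986), written in Python with urllib.parse.urljoin. It does not call feed-rs. The feed URI is the one from the comment above, while the relative paths and the resolve helper are made up for the example.

```python
from urllib.parse import urljoin

# Feed location taken from the example in the comment above.
FEED_URI = "http://example.com/feed/rss.xml"


def resolve(base, href):
    """Resolve href against base; absolute hrefs are returned unchanged."""
    return urljoin(base, href)


# Hypothetical relative references as they might appear in a feed.
same_dir = resolve(FEED_URI, "images/cover.png")
from_root = resolve(FEED_URI, "/audio/ep1.mp3")
parent_dir = resolve(FEED_URI, "../about.html")
already_absolute = resolve(FEED_URI, "https://cdn.example.org/a.mp3")
```

Note that the last path segment of the base (rss.xml) is dropped before the relative path is appended, so images/cover.png ends up under /feed/.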
I didn't worry about it as the additional dependency on the
Yes, correct - you'd pass that as the
Thanks! I'll make use of the new version as soon as possible :)
Some feeds provide relative URIs for things like enclosures. I recently implemented some basic code to resolve these URIs in my feed reader based on feed-rs. The bug report I got was refreshingly detailed and helpful. It linked a step-by-step description of how a big Python feed parser handles this problem: https://pythonhosted.org/feedparser/resolving-relative-links.html#how-relative-uris-are-resolved
But since the first few steps rely on knowledge of the feed's XML, they probably should be implemented in feed-rs. Later steps, which use fields of the HTTP header, can't be implemented in feed-rs and need to be handled in the library/program that makes use of it.
What is your opinion on the issue?
If URIs get resolved but some of them fail because there is not enough information in the XML itself, should the resulting Link struct indicate that it is a partial URL? Or should the calling library just watch for url::ParseError::RelativeUrlWithoutBase?
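To illustrate the split between XML-level and HTTP-level information, here is a minimal sketch in Python (standard library only). It assumes the precedence I understand the linked feedparser page to describe: xml:base values first, then the Content-Location header, then the URL the feed was fetched from. The function names, parameters, and the (url, is_absolute) return shape are hypothetical and not part of feed-rs; the boolean is just one way a caller could be told that a link is still partial.

```python
from urllib.parse import urljoin, urlsplit


def pick_base(xml_bases=(), content_location=None, feed_url=None):
    """Return the effective base URI, or None if nothing is known.

    xml_bases lists xml:base values from the outermost element inwards.
    """
    # HTTP-level knowledge: Content-Location takes precedence over the
    # request URL, and may itself be relative to it.
    base = feed_url
    if content_location:
        base = urljoin(feed_url or "", content_location)
    # XML-level knowledge: each xml:base refines whatever base we have so far.
    for xml_base in xml_bases:
        base = urljoin(base or "", xml_base)
    return base or None


def resolve_link(href, xml_bases=(), content_location=None, feed_url=None):
    """Return (url, is_absolute); is_absolute is False for partial URLs."""
    base = pick_base(xml_bases, content_location, feed_url)
    url = urljoin(base, href) if base else href
    # Treat any URL with a scheme as absolute.
    return url, bool(urlsplit(url).scheme)


FEED_URL = "http://example.com/feed/rss.xml"

from_feed_url = resolve_link("ep1.mp3", feed_url=FEED_URL)
from_xml_base = resolve_link(
    "ep1.mp3", xml_bases=["http://media.example.com/podcast/"], feed_url=FEED_URL
)
from_header = resolve_link(
    "ep1.mp3",
    content_location="http://mirror.example.net/feeds/rss.xml",
    feed_url=FEED_URL,
)
unresolved = resolve_link("ep1.mp3")
```

In this sketch the parser would only ever fill xml_bases, while content_location and feed_url would have to be supplied by whatever performed the HTTP request.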