Images in Page Bundles Missing from RSS Feed #384

Closed
2 of 3 tasks
slopp opened this issue May 16, 2019 · 5 comments

Comments

@slopp

slopp commented May 16, 2019

Version 0.1 changed the default behavior to use post bundles:

You can create a new post as the index file of a Hugo page bundle via blogdown::new_post() or the RStudio addin "New Post" if you set options(blogdown.new_bundle = TRUE). One benefit of using a page bundle instead of a normal page is that you can put resource files associated with the post (such as images) under the same directory of the post itself. This means you no longer have to put them under the static/ directory, which has been quite confusing to Hugo beginners (thanks, @DavisVaughan @romainfrancois @apreshill, #351).

Images added to the site as resource files in this way do not show up in RSS feeds. For example, refer to: https://github.com/rstudio/blog/pull/173

The first image is referenced from the root static directory, and is correctly qualified:

<figure>
    <img src="https://blog.rstudio.com/images/rsc-174-schedules.png"
         alt="View Scheduled Content"/> <figcaption>
            <p>View Scheduled Content</p>
        </figcaption>
</figure>

Whereas the second image is referenced in a page bundle, and is incorrect:

<figure>
    <img src="assets-ci.png"
         alt="CI/CD Toolchains"/> <figcaption>
            <p>Integrate Connect into CI/CD Toolchains</p>
        </figcaption>
</figure>

For certain sites, such as the rstudio blog, this issue also prevents the relative image from being correctly rendered as a thumbnail for a post.


By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.name/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('blogdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/blogdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.
@yihui
Member

yihui commented May 24, 2019

I think this issue belongs to Hugo; figure is a built-in shortcode in Hugo. My guess is that they may need to explicitly make relative URLs absolute here: https://github.com/gohugoio/hugo/blob/2278b0eb02ccdd3c2d4358d39074767d33fecb71/tpl/tplimpl/embedded/templates/shortcodes/figure.html#L5

Anyway, if the actual trouble is the thumbnail, one workaround is to use absolute URLs, e.g. https://blog.rstudio.com/2019/05/14/introducing-saml-in-rstudio-connect/assets-ci.png. Alternatively, upload images to GitHub issues/PRs and use those URLs, as I mentioned in the README: https://github.com/rstudio/blog#manage-images. This is probably the easiest way to go.

Images added to the site as resource files in this way do not show up in RSS feeds.

That is up to the RSS parser/reader. For example, Feedly can correctly parse and show images with relative URLs, and I know R-bloggers.com has had problems with relative URLs in RSS feeds for a long time.

@yihui
Member

yihui commented Dec 23, 2020

The thumbnail display issue was resolved last year via https://github.com/rstudio/blog/pull/199. For the remaining issues (the figure shortcode and the image URLs in posts), I'm afraid I can't do much about them since they are beyond my control. As I said, the easiest solution is probably to use absolute URLs yourself.

@yihui yihui closed this as completed Dec 23, 2020
@apreshill
Contributor

apreshill commented Dec 24, 2020

I'm dealing with this on the theme-side, which is where I think it (currently) needs to be addressed.

Unfortunately, the current behavior punishes folks who use bundled posts, because you insert images using relative links. This is easier for users, but Hugo doesn't have an easy way to turn these relative image links inside content pages into absolute urls. Here are 2 things theme developers can do:

  1. Use a delicate regex to essentially find all HTML image links and convert the relative url to absolute url (https://jdheyburn.co.uk/blog/who-goes-blogging-6-three-steps-to-improve-hugos-rss-feeds/#fix-image-rendering)

  2. If you are lucky enough to have users reliably use markdown syntax to generate image links (instead of HTML) AND use the goldmark renderer, you can add a theme-level render hook: https://discourse.gohugo.io/t/how-does-render-image-rss-xml-work/29935

@yihui
Member

yihui commented Dec 30, 2020

Both relative and absolute URLs have their pros and cons. I prefer using relative URLs myself whenever possible. As I said in my first reply, if relative URLs don't work in an RSS reader (including R-bloggers), it's not necessarily a problem with the RSS feed. It could be fixed by the RSS feed reader, and it is up to the feed reader whether they are willing to deal with this problem. Feedly is one example that reads relative URLs of images and turns them into absolute URLs correctly.

From a technical point of view, I think this problem is easier to address from the RSS reader's side than the feed author's side, since the RSS reader has to read XML anyway (then it can do the post-processing). It's definitely fixable from the site author's side, but (1) the solution would be hackish, and (2) every site author would have to do this on every site. If it has to be done anyway, I'd go with the first approach you mentioned. That approach is not a complete solution, though. It needs to exclude the cases of absolute URLs, i.e., if a URL is already absolute, we shouldn't prepend the permanent URL to image URLs.
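To illustrate the point about excluding URLs that are already absolute, here is a minimal Python sketch of the regex approach. It is not the regex from the linked post, and the function name `absolutize_img_src` and the pattern are assumptions made for illustration. It relies on `urllib.parse.urljoin`, which returns a URL unchanged if it is already absolute, so no separate check is needed.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: captures the src attribute of an <img> tag.
# group(1) = everything up to and including `src=`, group(2) = the quote
# character, group(3) = the URL itself.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def absolutize_img_src(html: str, page_url: str) -> str:
    """Resolve relative <img src> values in `html` against `page_url`.

    Assumes `page_url` is the post's permanent URL and ends with "/",
    so that bundle resources resolve inside the post's directory.
    """
    def repl(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        # urljoin leaves absolute URLs (https://...) untouched.
        return f"{prefix}{quote}{urljoin(page_url, src)}{quote}"

    return IMG_SRC.sub(repl, html)


if __name__ == "__main__":
    post = "https://blog.rstudio.com/2019/05/14/introducing-saml-in-rstudio-connect/"
    print(absolutize_img_src('<img src="assets-ci.png" alt="CI/CD Toolchains"/>', post))
```

Like any regex over HTML, this is fragile: it only handles quoted `src` attributes on `<img>` tags and ignores other attributes such as `srcset`.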

I added a shortcode blogdown/postref in 11fec4c. Authors may also use this shortcode, e.g., ![]({{< blogdown/postref >}}foo.png). I don't like this solution. Currently, blogdown uses this shortcode to generate absolute URLs for R plots from code chunks. For other images, you have to apply the shortcode manually.

Yet another solution is to post-process the RSS files using XML tools. This is not very different from the approach in https://jdheyburn.co.uk/blog/who-goes-blogging-6-three-steps-to-improve-hugos-rss-feeds/#fix-image-rendering, but it can provide a much more general tool to deal with any RSS feed (not limited to those generated by Hugo). This shouldn't be hard to implement. The only problem is that, again, I feel it would be much more efficient for the RSS reader to address this issue, so that no authors would need to deal with this annoying problem.
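As a rough sketch of what such post-processing could look like, the Python below parses an RSS 2.0 document with the standard library and rewrites image URLs in each item's description, using the item's own link as the base URL. The function name `absolutize_feed` is hypothetical, and the sketch assumes that each `<item>` has a `<link>` ending in "/" and carries its HTML in `<description>`. Feeds that use other elements would need adjusting.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Keep the conventional "atom" prefix when the feed is serialized again.
ET.register_namespace("atom", "http://www.w3.org/2005/Atom")

# Hypothetical pattern for the src attribute of <img> tags.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def absolutize_feed(rss_xml: str) -> str:
    """Return the feed with relative <img src> URLs made absolute."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        desc = item.find("description")
        if not link or desc is None or not desc.text:
            continue  # nothing to resolve against, or nothing to rewrite
        base = link.strip()
        # The parser has already unescaped the HTML inside <description>;
        # urljoin leaves URLs that are already absolute unchanged.
        desc.text = IMG_SRC.sub(
            lambda m: m.group(1) + m.group(2)
            + urljoin(base, m.group(3)) + m.group(2),
            desc.text,
        )
    # Serialization re-escapes the description text.
    return ET.tostring(root, encoding="unicode")
```

Because it only depends on the feed's XML structure, the same function could be run on feeds that were not generated by Hugo, subject to the assumptions above.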

@yihui
Member

yihui commented Jan 6, 2021

I asked a friend, and he wrote some Python code that processes any RSS feed to convert relative URLs to absolute URLs: https://github.com/elisong/rssxml-linksub

The code is currently deployed as a service at https://rssxml-linksub.herokuapp.com. You can pass any RSS feed to it and it will return the processed feed, e.g., https://rssxml-linksub.herokuapp.com/api?rss=https://blog.rstudio.com/index.xml. You can see, for example, that the image table-hexes.png has an absolute URL (compare with the original feed https://blog.rstudio.com/index.xml).

Technically, it is pretty much one regular expression: https://github.com/elisong/rssxml-linksub/blob/05aefb6/api/api.py#L38-L41, which is why I said it should be relatively easy for the RSS reader to address this issue.

That said, I might spend some time on improving this approach in the future: https://jdheyburn.co.uk/blog/who-goes-blogging-6-three-steps-to-improve-hugos-rss-feeds/#fix-image-rendering. His regex didn't consider the case of URLs that are already absolute.

3 participants