Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display repository README on package landing page #560

Closed
jeramyRR opened this issue Aug 28, 2017 · 18 comments
Closed

Display repository README on package landing page #560

jeramyRR opened this issue Aug 28, 2017 · 18 comments

Comments

@jeramyRR
Copy link
Contributor

Rust's cargo.io recently announced some new features that were a nice addition to the package managers site. One such feature is the addition of displaying the a packages github (or whatever repo) README file.

This is different in how hex.pm displays information related to a project. Hex currently provides a link to the packages repo, but doesn't display the README directly.

I think this feature would be greatly appreciated on hex.pm.

@jeramyRR
Copy link
Contributor Author

I apologize if this isn't the correct way to suggest features for hex, but the README does not discuss what route to take for this action.

@ericmj
Copy link
Member

ericmj commented Aug 28, 2017

This is something I would like to have but it has been rejected in the past because of the associated security issues. The readme is user generated content so we need to be careful with how we render and display it, for example hexdocs.pm is on a separate domain to prevent attacks on hex.pm where you are signed in.

Readmes can be in many different formats, but the majority is probably in markdown so we can make it easier by only supporting markdown. We need to filter the generated HTML to prevent XSS attacks [1]. There is a library for this https://github.com/rrrene/html_sanitize_ex, but I don't know enough about HTML sanitization to evaluate it properly, it also depends on an entire web server that hasn't had a release on Hex in over a year so I feel hesitant.

Hex.pm is a critical infrastructure for the Elixir/Erlang communities so we need be very careful that we are not introducing unnecessary security risks. /cc @josevalim @GriffinMB

Can you link to cargo's announcement?

[1] https://github.com/showdownjs/showdown/wiki/Markdown's-XSS-Vulnerability-(and-how-to-mitigate-it)

@josevalim
Copy link
Member

josevalim commented Aug 28, 2017

@ericmj I wonder if we can solve this issue by having hexreadme.pm and embedding the markdown documents as an iframe. Although I am not sure if vulnerabilities can be exposed like that. I suppose iframes are mostly safe since people use it to embed third-party contents frequently?

@josevalim
Copy link
Member

Another idea is to do the sanitization on the javascript side, since we can rely on the browser tools for parsing and manipulating the tree. Here is an example: https://www.quaxio.com/html_white_listed_sanitizer/

@josevalim
Copy link
Member

/cc @hrefhref

@hrefhref
Copy link

I've had a look at crates.io' implementation, and it works by rendering the README, sanitizing the output with Ammonia, uploaded to S3 and loaded onto crates.io by an XHR request.

Adding a restrictive CSP policy adds another level of security, as it will safe-guard from the sanitizer possible flaws.

With a sanitizer & CSP, I think the iframe is unnecessary.

Another problem to consider is the images. They will be loaded from an external source: the hex.pm users IPs will be leaked to the source, and the CSP policy will need to allow all hosts for images. If the source is over HTTP, depending on the server configuration/CSP it may not load, and if it loads that'll display a warning in the browser URL bar (mixed content).

The solution for the images issue is to proxy all images, like GitHub's camo. This adds infrastructure requirements.

@GriffinMB
Copy link

I agree with @hrefhref that sanitization plus a good CSP should be sufficient. Additionally, readme content doesn't typically need to be particularly "rich", so I also think you would be able to be extra conservative in what tags and attributes you allow. IE, don't allow the target attribute for links, or user-specified alt-name for images, etc. I think the key will be deciding upon as minimal an HTML tag-set as possible.

The most difficult part of this may be finding a trusted sanitizer -- the html_sanitize_ex linked above looks like it's heading in the right direction, but I haven't really gotten a solid look at it. The good news is, if there isn't already a well-vetted and robust HTML sanitizer that the community uses, this process could lead to it 😄

@jeramyRR
Copy link
Contributor Author

Here is the link to the crates.io announcement which is really just a pull request. They officially announced it at RustConf.

rust-lang/crates.io#869

@ericmj
Copy link
Member

ericmj commented Aug 28, 2017

Thanks for you input, sanitizing and setting a restrictive CSP seems like the way to go. Would loading the readme inside an iframe instead of loading directly or over XHR add any safety?

I think we are going to go with a more established sanitizer such as the one used by github https://github.com/github/markup or cargo.io https://github.com/notriddle/ammonia and call it over a port, then upload the html and assets to static file hosting.

@GriffinMB
Copy link

There's the sandbox attribute for iframes, which should prevent JavaScript execution. It would only work for modern browsers, but could also be beneficial.

@hrefhref
Copy link

Would loading the readme inside an iframe instead of loading directly or over XHR add any safety?

I don't think so. With the sanitizer and CSP policy everything is covered.

@GriffinMB I think this attribute is only useful when you don't control what's going to be in the iframe

@ericmj
Copy link
Member

ericmj commented Aug 30, 2017

@hrefhref Sanitizers can have bugs so if we can add sandboxing that would be beneficial.

@hrefhref
Copy link

A restrictive CSP is enough:

content-security-policy: default-src 'none'; style-src 'self'; script-src 'self'; manifest-src 'self'; font-src 'self'; media-src 'self'; img-src 'self';

will only allow to load stylesheets, scripts and fonts from self (the site hostname) and nothing else. No inline scripts, styles, objects, iframes, ….

@GriffinMB
Copy link

GriffinMB commented Aug 30, 2017

Chiming in to reaffirm that an appropriately restrictive CSP + sanitizer would definitely be enough. Really, a restrictive enough HTML sanitizer should be enough on its own. The sandbox attribute is just another/different option for preventing JavaScript execution, with the benefit of additionally supporting older browsers like IE10.

Edit: I sound more pro-sandbox than I intend to. It's just an additional option w/ pros and cons.

@jeramyRR
Copy link
Contributor Author

So, I think its been decided that this is doable, but with a bit of defensive work. What if we changed the requirements a little bit and switched from displaying github readmes to hexdocs readmes? An example would be the readme.html from watchdogs hexdocs.

https://hexdocs.pm/watchdog/readme.html

@ericmj
Copy link
Member

ericmj commented Sep 22, 2017

@jeramyRR Unless I misunderstand you I don't think that changes anything because both are rendered from the same markdown readme file.

@josecfreittas
Copy link

josecfreittas commented Apr 24, 2020

Hi, this is open for a while... Only leaving my +1 here.
Comparing to npm, crates, pub.dev, pip, packagist, etc... hex.pm fells a little empty and less convenient without this :/

@ericmj
Copy link
Member

ericmj commented Jan 29, 2021

This is being added to the website redesign.

@ericmj ericmj closed this as completed Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants