New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show Contents of Readme on Crate Pages #81

Closed
killercup opened this Issue Nov 24, 2014 · 15 comments

Comments

Projects
None yet
9 participants
@killercup
Copy link
Member

killercup commented Nov 24, 2014

I consider the inline Readme file to be one of the most useful things on npm.

The Readme data is already available (and searchable, cf. #72), but I couldn't find any mention of a plan to show it on crate pages. As this can add a lot of content to a crate page (with possibly complicated markup), a new layout for the page might also be necessary.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Nov 29, 2014

This'd be great.

@Luthaf

This comment has been minimized.

Copy link
Contributor

Luthaf commented Feb 18, 2016

Any new on this? How hard would it be to do?

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Feb 18, 2016

None yet. It shouldn't be that hard, though it would require paying close attention to escapes.

NPM does this with markymark, I would suggest blatantly copying whatever they do.

@Luthaf

This comment has been minimized.

Copy link
Contributor

Luthaf commented Feb 19, 2016

NPM does this with markymark, I would suggest blatantly copying whatever they do.

Wouldn't that need a server-side node process? The current server is written in Rust, if I understand everything correctly.

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Feb 19, 2016

I didn't mean literally running their code; I meant investigating how they do so. What HTML do they escape? Do they show the whole README or a summary? Etc.

On Feb 19, 2016, 04:17 -0500, Luthafnotifications@github.com, wrote:

NPM does this with markymark, I would suggest blatantly copying whatever they do.

Wouldn't that need a server-side node process? The current server is written in Rust, if I understand everything correctly.


Reply to this email directly orview it on GitHub(#81 (comment)).

@Luthaf

This comment has been minimized.

Copy link
Contributor

Luthaf commented Mar 2, 2016

After some investigation, I managed to get a local version running, but when I try to upload a simple example crate it fails because it can not connect to S3. How can I do to load example data in a local website instance?

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Mar 2, 2016

It might be worth using the --proxy option and use the data from the actual
backend instead.

On Wed, Mar 2, 2016 at 4:45 PM, Luthaf notifications@github.com wrote:

After some investigation, I managed to get a local version running, but
when I try to upload a simple example crate it fails because it can not
connect to S3. How can I do to load example data in a local website
instance?


Reply to this email directly or view it on GitHub
#81 (comment).

@Luthaf

This comment has been minimized.

Copy link
Contributor

Luthaf commented Mar 2, 2016

Yes, but I do not see how I can change the behaviour of the backend, to add an API point to get the README content as HTML in that case.

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Mar 2, 2016

Ah, sorry, I forgot about that part. I was only thinking about the frontend rendering aspect :(

@krzkaczor

This comment has been minimized.

Copy link

krzkaczor commented Jul 9, 2016

I have some time and willingness to work on this issue. @Luthaf can u help me with backend changes? Frontend changes seems to be trivial when we have nice endpoint returning readme as html :D

I couldn't find the project which is responsible actually for adding new packages into the repository. crates.io seems to be frontend and backend but it's only querying data - not adding them. I wanted to research textsearchable_index_col field in database because it seems to hold README contents and be used for full text search.

@Luthaf

This comment has been minimized.

Copy link
Contributor

Luthaf commented Jul 9, 2016

@Luthaf can u help me with backend changes?

Sorry, I have no knowledge at all of crates.io backend. I tried to do this, but gave up as I did not managed to get a local version of the backend working with some data.

@shssoichiro

This comment has been minimized.

Copy link

shssoichiro commented Oct 17, 2016

I'd like to step up and take a shot at implementing this.

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Oct 25, 2016

I mentioned this bug to @ashleygwilliams yesterday, and she mentioned that for @npm, they actually made this a separate service. One issue is that right now, crates.io displays stuff in the metadata, displaying READMEs would mean moving README stuff into the metadata, but that means that it blows up in size....

(By separate service I mean "implemented as a microservice", not a totally different website)

@ashleygwilliams

This comment has been minimized.

Copy link
Member

ashleygwilliams commented Oct 26, 2016

👋 hey all! i'm currently writing a blog post about the interesting history around dealing with READMEs for npm. right now, READMEs are part of the package metadata, and due to the potential for a DDoS, as well as the performance hit large READMEs cause, we've had to place an arbitrary restriction of 64kb on the READMEs. yes, yuck, arbitrary restrictions.

our goal with the READMEs in npm registry v3 will be to remove them from the package metadata completely and run them as a separate service that pre-renders the markdown and serves the generated html statically. this will remove any threat large READMEs would cause to performance or security and it will also have the added benefit of allowing us to render READMEs per version. it will also speed up our website since they will be pre-rendered and served statically.

i would greatly encourage ya'll to learn from our struggles and opt to keep the README out of the package metadata. when ya'll become as popular as npm you'll thank me :)

if this is still open when the blogpost comes out later this week i'll be sure to post it here.

@jdm

This comment has been minimized.

Copy link

jdm commented Nov 15, 2016

@carols10cents carols10cents added the E-big label Feb 22, 2017

@kureuil kureuil referenced this issue Jul 9, 2017

Merged

Show Contents of Readme on Crate Pages #869

5 of 5 tasks complete

bors-voyager bot added a commit that referenced this issue Aug 17, 2017

Merge #869
869: Show Contents of Readme on Crate Pages r=carols10cents

Hey everyone, I started working on README rendering earlier and I thought that I should get some feedback now that it is working correctly on my local instance.

I edited the `/crates/new` endpoint so that when a crate is uploaded to the registry, its README (markdown only) is rendered to HTML and a `readmes/{crate}/{crate}-{version}.html` file is created on S3 when the crate file is uploaded there.
Then, when visiting a crate page, the Ember application makes a request to `{s3}/readmes/{crate}/{crate}-{version}.html`: if the file is found, its content is rendered in place, if not nothing happens.
This way, the renderd README is cached and versioned, following @ashleygwilliams [recommendations](#81 (comment)).

On the `crate/version` route, the README contents is displayed in place of the package description.

The generated HTML is sanitized using [Ammonia](https://crates.io/crates/ammonia). This crate removes all tags and attributes that are not whitelisted, making the resulting markup as safe as possible for the end user. Bonus point: is uses the html5ever crate (created by the Servo project) to parse HTML.

~~~Codeblocks syntax highlighting is done server-side using [syntect](https://crates.io/crates/syntect). This crate uses Sublime syntax files to highlight code samples, which means that we automatically have access to a lot of pre-existing grammars and we can easily add/create missing ones.~~~ Syntax highlighting is now done client-side using PrismJS due to syntect's dependency onigumura conflicting with git2.

Finally, I added a new script in `src/bin` named `render-readmes`. Its goal is to render the README of every uploaded crate version and to upload it on S3. This means that if modify the renderer behavior and we want to retroactively apply it, we're just one command (and a lil' bit of time) away ! (Though I might have left too much `unwrap()`/`expect()` in it)

TODO:

* [x] Security checks in `render::markdown_to_html` (script tag, javascript callbacks, more ?)
* [x] Tests
* [x] Documentation
* [x] ~Edit delete-crate & delete-version scripts to delete README files~ It would seem that the scripts only delete database entries, not stored files
* [x] Script to render published crates README (?)

Fixes #81

Preview:

![screenshot from 2017-07-10 00-22-45](https://user-images.githubusercontent.com/1275463/27998080-f83bad7a-6505-11e7-884e-95ee4f7c2a47.png)

@bors-voyager bors-voyager bot closed this in #869 Aug 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment