
Output crate pages so that the latest version is easily scrapable #238

Closed
brson opened this issue on Dec 16, 2015 · 7 comments · 4 participants
brson (Contributor) commented Dec 16, 2015

Debian uses automated tools that scrape web pages to detect new upstream releases of libraries, and for some reason these tools don't easily understand crates.io. Find out what kind of page structure they expect and implement it.

sanxiyn (Member) commented Dec 16, 2015

Re: "for some reason". crates.io is a JavaScript webapp. Static HTML would be more amenable to scraping.

anguslees commented Dec 16, 2015

Right, crates.io is a JavaScript app that hits the crates.io API (JSON over HTTP). The good stuff is actually buried in a URL like https://crates.io/api/v1/crates/$crate/versions
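
For illustration, here is a minimal Python sketch of reading that endpoint. It assumes the response is a JSON object with a `versions` array whose entries carry `num`, `dl_path` and `yanked` fields; those field names are an assumption about the API, not something stated in this thread, and `fetch_versions`/`list_versions` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint from the comment above; {name} stands in for $crate.
API = "https://crates.io/api/v1/crates/{name}/versions"

def fetch_versions(name: str) -> dict:
    """Fetch the raw JSON payload for one crate (needs network access)."""
    # Send an identifying User-Agent; crates.io asks API clients to do so.
    req = urllib.request.Request(
        API.format(name=name), headers={"User-Agent": "version-check-sketch"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def list_versions(payload: dict) -> list[tuple[str, str]]:
    """Return (version, download URL) pairs for every non-yanked version.

    Assumes each entry has "num", "dl_path" and "yanked" keys.
    """
    return [
        (v["num"], "https://crates.io" + v["dl_path"])
        for v in payload.get("versions", [])
        if not v.get("yanked", False)
    ]
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved payload.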

From our (Debian's) point of view, we need an HTML page somewhere that our tools can scrape looking for <a href=...> links to discover/download new versions. This limitation is completely a consequence of the Debian tools and isn't something the Rust community needs to solve, except that I suspect the solution might be reusable for other distros/tools too.
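
To show why a JavaScript app defeats this kind of tool, here is a rough Python sketch of link-based version discovery: collect every `<a href>` on a page and keep the ones matching a version-capturing regex. It only approximates the behaviour described above; it is not the code of any Debian tool, and `find_versions` and the example pattern are hypothetical.

```python
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)

def find_versions(page: str, pattern: str) -> list[str]:
    """Return the version captured by group 1 of `pattern` for each matching link."""
    collector = LinkCollector()
    collector.feed(page)
    versions = []
    for href in collector.hrefs:
        match = re.search(pattern, href)
        if match:
            versions.append(match.group(1))
    return versions
```

A page whose links are only created by JavaScript at runtime gives such a scraper nothing to match, because the static HTML contains no `<a href>` tags for the releases.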

Option 1:
We add something to the crates.io website to return a suitable HTML page somewhere (perhaps as a new API endpoint that returns HTML rather than JSON?)

Option 2:
We do something similar within Debian infrastructure by adding a crates.io-specific piece of logic to http://anonscm.debian.org/viewvc/qa/trunk/cgi-bin/fakeupstream.cgi?view=markup - this would be a proxy that does the JSON API query and reformats it as an HTML webpage of download links.
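
The proxy idea in Option 2 can be sketched in a few lines of Python: take the versions JSON and render a plain HTML list of download links. This is not the actual fakeupstream.cgi logic, which isn't shown here. It assumes the same `versions`/`num`/`dl_path`/`yanked` payload shape, and `render_download_page` is a hypothetical name.

```python
import html

def render_download_page(crate: str, payload: dict) -> str:
    """Render crates.io versions JSON as a static HTML page of download links.

    Assumes payload["versions"] entries carry "num", "dl_path" and "yanked".
    """
    items = []
    for v in payload.get("versions", []):
        if v.get("yanked", False):
            continue  # skip yanked releases so scrapers never pick them up
        url = "https://crates.io" + v["dl_path"]
        label = f"{crate}-{v['num']}.crate"
        items.append(f'<li><a href="{html.escape(url)}">{html.escape(label)}</a></li>')
    body = "\n".join(items)
    title = html.escape(crate)
    return f"<html><head><title>{title}</title></head><body><ul>\n{body}\n</ul></body></html>"
```

Every release becomes an ordinary `<a href>` link, which is exactly the shape the scraping tools look for.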

I don't want this to be a big thing, and I'm ok with either approach.

alexcrichton (Member) commented Dec 17, 2015

I suspect that if y'all already have a tool to turn JSON into HTML, that'll be the easiest way forward; unfortunately it isn't trivial for us to add HTML endpoints :(

alexcrichton (Member) commented Dec 17, 2015

That being said, we could also just add an endpoint that returns a plain string containing the latest version.
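
No such endpoint is specified in this thread, so the following is only a hypothetical Python sketch of how its "latest version" string could be computed from the versions JSON (same assumed payload shape as above). For simplicity it compares plain `MAJOR.MINOR.PATCH` versions numerically and ignores pre-releases and build metadata, which is simpler than full SemVer precedence.

```python
from typing import Optional, Tuple

def parse_release(num: str) -> Optional[Tuple[int, ...]]:
    """Parse a plain MAJOR.MINOR.PATCH version; return None for anything else."""
    if "-" in num or "+" in num:
        return None  # pre-release or build metadata: skipped in this sketch
    parts = num.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)

def latest_version(payload: dict) -> Optional[str]:
    """Return the highest non-yanked release version string, or None."""
    best, best_key = None, None
    for v in payload.get("versions", []):
        if v.get("yanked", False):
            continue
        key = parse_release(v["num"])
        if key is None:
            continue
        # Compare integer tuples so that 0.2.10 sorts above 0.2.9.
        if best_key is None or key > best_key:
            best, best_key = v["num"], key
    return best
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.2.9" would sort above "0.2.10".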

anguslees commented Dec 23, 2015

Bug (with patch) adding crates.io support to fakeupstream.cgi: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808790

anguslees commented Dec 23, 2015

My fakeupstream.cgi patch is now live, e.g. https://qa.debian.org/cgi-bin/fakeupstream.cgi?upstream=crates.io/libc

I think we can close this issue now (or reduce its urgency).

alexcrichton (Member) commented Jan 11, 2016

Thanks @anguslees!
