
Crates.io returns 404 when not specifying Accept: text/html #788

Open
kud1ing opened this issue Jun 20, 2017 · 13 comments
Labels
C-bug 🐞 Category: unintended, undesired behavior

Comments

@kud1ing

kud1ing commented Jun 20, 2017

  • awesome-rust uses awesome_bot to check links. Unfortunately links to crates.io return 404.
  • curl https://crates.io/keywords/cassandra receives a 404, too.

Current summary

  • This is confirmed as a bug that still exists as long as this issue is open. We do not need any further confirmations of the issue at this time.
  • What we do need is a PR fixing the problem; those are quite welcome.
  • The workaround for this issue is to pass an Accept header of text/html.
@carols10cents
Member

I think we are doing something nonstandard, but I'm not sure what. Until we figure this out, a workaround is specifying HTML as the content type you want:

curl -v 'https://crates.io/crates/assert_approx_eq' -H 'Accept: text/html'

returns a 200 for me. Hope that helps!
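For a link checker written in Rust, the same workaround means sending the header explicitly. This is only a minimal sketch: it formats the request head with std, the helper name `build_html_request` is hypothetical, and actually sending the request to https://crates.io would also need a TLS-capable client, which is left out here.

```rust
/// Hypothetical helper: builds the head of an HTTP/1.1 GET request that
/// explicitly asks for HTML, mirroring `curl -H 'Accept: text/html'`.
/// Sending it to https://crates.io would additionally require TLS.
fn build_html_request(host: &str, path: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept: text/html\r\nConnection: close\r\n\r\n"
    )
}

fn main() {
    // Same URL as the curl example above.
    let req = build_html_request("crates.io", "/crates/assert_approx_eq");
    print!("{req}");
}
```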

@kud1ing
Author

kud1ing commented Jun 20, 2017

crates.io was whitelisted in rust-unofficial/awesome-rust#310
I'd like to revert this in the end so that we can verify links to crates, categories and keywords.

@Nemo157
Member

Nemo157 commented Jul 2, 2017

Yeah, this is caused by crates.io/src/dist.rs, lines 38 to 52 in a20eb96:

// Second, if we're requesting html, then we've only got one page so
// serve up that page. Otherwise proxy on to the rest of the app.
let wants_html = req.headers()
    .find("Accept")
    .map(|accept| accept.iter().any(|s| s.contains("html")))
    .unwrap_or(false);
if wants_html {
    self.dist.call(&mut RequestProxy {
        other: req,
        path: Some("/index.html"),
        method: None,
    })
} else {
    self.handler.as_ref().unwrap().call(req)
}

Interestingly, requesting any URL with Accept: text/html will return 200, since the server doesn't know which routes are valid (e.g. curl -H 'Accept: text/html' -I https://crates.io/foo/bar).

I don't think there's any way to have a bot validate URLs on crates.io without executing the JavaScript and checking whether it loads the page successfully, at least not while crates.io does client-side-only rendering. If server-side rendering of at least the initial page is ever added, the server should then know whether the requested URL is valid.
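To make the dispatch above concrete, here is a simplified, std-only model of the `wants_html` check from src/dist.rs. It is a sketch, not the crates.io code: the `dispatch` and `api_handler` names, the single known API path, and modeling a missing Accept header as an empty slice are all assumptions. It shows why curl's default `Accept: */*` (which contains no "html") falls through to the API handler and gets a 404, while any path at all gets a 200 once "html" appears in the header.

```rust
/// Simplified model of the check in src/dist.rs: the request "wants HTML"
/// if any Accept header value contains the substring "html". A missing
/// header is modeled as an empty slice, which yields `false` just like
/// `.unwrap_or(false)` in the original.
fn wants_html(accept: &[&str]) -> bool {
    accept.iter().any(|s| s.contains("html"))
}

/// Hypothetical stand-in for "the rest of the app": only one assumed API
/// path is known; every other path is a 404.
fn api_handler(path: &str) -> u16 {
    if path == "/api/v1/crates" { 200 } else { 404 }
}

/// Status code the dispatch in dist.rs would produce for this request.
fn dispatch(path: &str, accept: &[&str]) -> u16 {
    if wants_html(accept) {
        // /index.html is served for *any* path; only the Ember app later
        // decides whether the route exists, so the server answers 200.
        200
    } else {
        api_handler(path)
    }
}

fn main() {
    // curl's default `Accept: */*` contains no "html", so this is a 404.
    println!("{}", dispatch("/keywords/cassandra", &["*/*"]));
    // With the workaround header, even a nonsense path gets a 200.
    println!("{}", dispatch("/foo/bar", &["text/html"]));
}
```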

@locks
Contributor

locks commented Jul 2, 2017

FYI, you can track server-side rendering efforts at #819.

@AaronFriel

This is still not fixed.


@carols10cents
Member

@AaronFriel Yes, that's why this issue is still open.

@AaronFriel

Would a commit altering the wants_html test to include wildcards be accepted? I'm just not sure why this has languished and I'm guessing there is something I'm missing.

@carols10cents
Member

Yes. I'm not sure of the scope of the solution needed or the effects of various solutions on the frontend and backend, so please give it a try and let me know if you have questions!
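One possible reading of the wildcard suggestion, sketched as a hypothetical replacement for the substring check (this is not code that was merged): split the header into media ranges, drop parameters such as `;q=0.8`, and also accept `*/*` and `text/*`. q-values are ignored here, so a real version would still need to honor `q=0`. Note the trade-off implied by the dispatch in dist.rs: a plain curl request would then be served /index.html like a browser, and so would get a 200 even for paths that do not exist.

```rust
/// Hypothetical wildcard-aware replacement for the substring check:
/// a request is treated as wanting HTML if any media range mentions
/// "html" or is one of the wildcards `*/*` or `text/*`.
/// Parameters such as `;q=0.8` are stripped; q-values are NOT honored.
fn wants_html_or_wildcard(accept: &[&str]) -> bool {
    accept
        .iter()
        // One header value may list several comma-separated media ranges.
        .flat_map(|value| value.split(','))
        // Keep only the type/subtype part of each range.
        .map(|range| range.split(';').next().unwrap_or("").trim())
        .any(|range| range.contains("html") || range == "*/*" || range == "text/*")
}

fn main() {
    // curl's default header would now be treated like a browser's.
    println!("{}", wants_html_or_wildcard(&["*/*"]));
    // A client asking only for JSON would still reach the API handler.
    println!("{}", wants_html_or_wildcard(&["application/json"]));
}
```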

@mark-i-m
Member

Any progress on this?

@locks
Copy link
Contributor

locks commented Sep 17, 2018

@mark-i-m this is indirectly being tracked at #204

@luciusmagn

Any updates on this?

kud1ing changed the title from "Crates.io returns mostly 404" to "Crates.io returns 404 when not specifying media-range" on Jul 20, 2019
jtgeibel added a commit to jtgeibel/crates.io that referenced this issue Jul 25, 2019
The backend no longer checks for an "html" in the `Accept` header.
With the exception of 3 session related routes, all paths not starting
with "/api" will be redirected to the static Ember bootstrap page.

As a result of this change, all non-api requests that don't contain
"html" in the `Accept` header will now unconditionally return `200`,
rather than `404`.  In a sense, this expands the scope of rust-lang#556 to all
requests, not just those that set the header.  It also inverts the
problem described in rust-lang#788, effectively turning it into a duplicate
of rust-lang#556.

Fixes: rust-lang#163
@jtgeibel
Member

I've recently opened PR #1788, which would impact this bug. It makes the behavior consistent regardless of whether an Accept: text/html header was sent. All such requests would then return a status 200 with the Ember index HTML (as is currently done for browsers and other clients that set the header).

This will in effect invert the issue described here. Instead of always returning a 404 with a JSON response, the site will always respond with a status 200 and static HTML. Instead of false negatives for crates that do exist, there would be false positives for crates that do not exist.
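The behavior proposed in PR #1788 can be modeled in the same simplified way. In this hedged sketch the Accept header is ignored entirely, paths starting with "/api" go to a hypothetical backend lookup, and everything else gets the static Ember page with a 200. The `/api/v1/crates/<name>` path, the `known` set, and the function names are assumptions for illustration, and the three session-related routes exempted in the commit message are not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical backend lookup: `/api/v1/crates/<name>` returns 200 only
/// for names in `known`; every other API path is a 404 in this sketch.
fn api_status(path: &str, known: &HashSet<&str>) -> u16 {
    match path.strip_prefix("/api/v1/crates/") {
        Some(name) if known.contains(name) => 200,
        _ => 404,
    }
}

/// Status after the change sketched in PR #1788: the Accept header plays
/// no role; only whether the path starts with "/api" matters.
fn status_after_pr(path: &str, known: &HashSet<&str>) -> u16 {
    if path.starts_with("/api") {
        api_status(path, known)
    } else {
        // Static Ember bootstrap page, whether or not the crate exists.
        200
    }
}

fn main() {
    let known: HashSet<&str> = HashSet::from(["assert_approx_eq"]);
    // False positive: the HTML route answers 200 for a missing crate.
    println!("{}", status_after_pr("/crates/no-such-crate", &known));
    // The API route still distinguishes missing crates.
    println!("{}", status_after_pr("/api/v1/crates/no-such-crate", &known));
}
```

In this model, `/crates/no-such-crate` returns 200, which is the false positive described above; only an /api path still separates existing crates from missing ones.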

jtgeibel changed the title from "Crates.io returns 404 when not specifying media-range" to "Crates.io returns 404 when not specifying Accept: text/html" on Jul 25, 2019
lopopolo added a commit to artichoke/artichoke that referenced this issue Mar 22, 2020
Workaround for rust-lang/crates.io#788 returning 404 for non-browser
HTTP requests.
FlyingRatBull added a commit to FlyingRatBull/bevy that referenced this issue May 1, 2021
* Fix [404 on crates.io](rust-lang/crates.io#788)
* Only block external checks on github.com
termoshtt added a commit to ricosjp/ruststep that referenced this issue Sep 27, 2021
rust-lang locked this issue as off-topic and limited conversation to collaborators on Oct 4, 2022

9 participants