Crates.io returns mostly 404 #788

Open
kud1ing opened this Issue Jun 20, 2017 · 10 comments


kud1ing commented Jun 20, 2017

Maybe this is a duplicate of other 404 issues; I am not sure.

awesome-rust uses awesome_bot to check links.

Unfortunately, many (all?) links to crates.io return 404.
curl https://crates.io/keywords/cassandra also returns a 404.

Is there something that needs to be configured on the server or client?

carols10cents (Member) commented Jun 20, 2017

I think we are doing something nonstandard, but I'm not sure what. Until we figure this out, a workaround is to explicitly request HTML via the Accept header:

curl -v 'https://crates.io/crates/assert_approx_eq' -H 'Accept: text/html'

returns a 200 for me. Hope that helps!

kud1ing (Author) commented Jun 20, 2017

crates.io was whitelisted in rust-unofficial/awesome-rust#310.
I'd like to eventually revert that so that we can verify links to crates, categories, and keywords.
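
For context, the whitelisting amounts to something like the following awesome_bot invocation (a sketch; it assumes awesome_bot's --white-list option, which keeps matching URLs from being reported as broken):

awesome_bot README.md --white-list crates.io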

Nemo157 (Contributor) commented Jul 2, 2017

Yeah, this is caused by this code in crates.io's src/dist.rs (lines 38 to 52 at a20eb96):

// Second, if we're requesting html, then we've only got one page so
// serve up that page. Otherwise proxy on to the rest of the app.
let wants_html = req.headers()
    .find("Accept")
    .map(|accept| accept.iter().any(|s| s.contains("html")))
    .unwrap_or(false);
if wants_html {
    self.dist.call(&mut RequestProxy {
        other: req,
        path: Some("/index.html"),
        method: None,
    })
} else {
    self.handler.as_ref().unwrap().call(req)
}

Interestingly, requesting any URL with Accept: text/html will return 200, since the server doesn't know what the valid routes are (e.g. curl -H 'Accept: text/html' -I https://crates.io/foo/bar).

I don't think there's any way for a bot to validate URLs on crates.io without executing the JavaScript and checking whether the page loads successfully, at least while crates.io renders only on the client. If server-side rendering of at least the initial page is ever added, the server should then know whether the requested URL is valid or not.

locks (Contributor) commented Jul 2, 2017

FYI, you can track server-side rendering efforts at #819.

AaronFriel commented Dec 15, 2017

This is still not fixed.

[image]

carols10cents (Member) commented Dec 15, 2017

@AaronFriel Yes, that's why this issue is still open.

AaronFriel commented Dec 16, 2017

Would a commit altering the wants_html test to include wildcards be accepted? I'm just not sure why this has languished and I'm guessing there is something I'm missing.
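
For reference, a minimal sketch of such a change against the wants_html check quoted above, treating a wildcard Accept header (*/*, which curl and most link checkers send by default) the same as an explicit text/html:

// Sketch only, not the actual fix: also accept a wildcard Accept header,
// so clients sending "Accept: */*" get the index page instead of a 404.
let wants_html = req.headers()
    .find("Accept")
    .map(|accept| accept.iter().any(|s| s.contains("html") || s.contains("*/*")))
    .unwrap_or(false);

Like the existing check, this would still return 200 for routes that don't actually exist, so it helps link checkers but does not make the 404s accurate.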

carols10cents (Member) commented Dec 17, 2017

Yes. I am not sure of the scope of the solution needed, or of the effects of various solutions on the frontend and backend; please give it a try and let me know if you have questions!

mark-i-m (Contributor) commented Sep 15, 2018

Any progress on this?

locks (Contributor) commented Sep 17, 2018

@mark-i-m this is indirectly being tracked at #204.
