Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

man.dvc.org/import-url redirect is broken #768

Closed
shcheklein opened this issue Nov 3, 2019 · 23 comments · Fixed by #769 or #770
Closed

man.dvc.org/import-url redirect is broken #768

shcheklein opened this issue Nov 3, 2019 · 23 comments · Fixed by #769 or #770
Labels
A: docs Area: user documentation (gatsby-theme-iterative) p0-critical Affects users in a bad way at the moment 🐛 type: bug Something isn't working. website: eng-doc DEPRECATED JS engine for /doc

Comments

@shcheklein
Copy link
Member

It supposed to behave in the same way as man.dvc.org/get-url

The fix should be applies to the server.js file.

@shcheklein shcheklein added 🐛 type: bug Something isn't working. A: docs Area: user documentation (gatsby-theme-iterative) website: eng-doc DEPRECATED JS engine for /doc good first issue Good for newcomers hacktoberfest p0-critical Affects users in a bad way at the moment labels Nov 3, 2019
@jorgeorpinel
Copy link
Contributor

That is funny. Probably a bug in the redirect code...

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 3, 2019

This is still broken after attempting a hotfix. In fact now man.dvc.org/get-url is also broken! Which is weird because I tested the logic in server.js and it works well locally. Here's the latest prod deploy diff: 37aab6a...a07bd0c#diff-78c12f5adc1848d13b1c6f07055d996eL37-R37 for this code:

dvc.org/server.js

Lines 34 to 42 in a07bd0c

} else if (req.headers.host === 'man.dvc.org') {
// man.dvc.org/{cmd} -> dvc.org/doc/command-reference/{cmd}
let normalized_pathname =
['/get-url', '/import-url'].indexOf(pathname) < 0
? pathname.replace('-', '/')
: pathname
const doc_pathname = '/doc/command-reference' + normalized_pathname
res.writeHead(301, { Location: 'https://dvc.org' + doc_pathname })
res.end()

['/get-url', '/import-url'].indexOf(pathname) < 0 basically checks that pathname is not one of those 2 exceptions, before the code replaces - for / in any other path. Tested locally with node:

> ['/get-url', '/import-url'].indexOf('/get-url')
0
> ['/get-url', '/import-url'].indexOf('/import-url')
1
> ['/get-url', '/import-url'].indexOf('/cache-dir')
-1

However on prod everything gets replaced, even when indexOf results in >= 0

shcheklein added a commit that referenced this issue Nov 3, 2019
hotfix: rewrite man.dvc.org redirect logic without ternary op to see if it fixes #768
@jorgeorpinel jorgeorpinel reopened this Nov 3, 2019
@jorgeorpinel

This comment has been minimized.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 4, 2019

There's 2 things going on here ^ BTW:

  1. Replace - for / in man.dvc.org/ paths (except /get-url and /import-url).
  2. Redirect man.dvc.org -> dvc.org/doc/command-reference .

We could separate each one for even further clarity, but that would result in an extra redirect.

@jorgeorpinel
Copy link
Contributor

Related: #757

@shcheklein
Copy link
Member Author

I think the reason it does not work for me is caching of redirects:

Request URL: https://man.dvc.org/import-url
Request Method: GET
Status Code: 301 Moved Permanently (from disk cache)
Remote Address: 35.170.171.200:443
Referrer Policy: no-referrer-when-downgrade

If I run it a private windows everything works fine. So, we should check if there is a way to manage cache lifetime for redirects.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 4, 2019

I guess this is the issue we bumped into! 301 Redirects: The Horror That Cannot Be Uncached

Basically you can only do a 301 redirect once...

@jorgeorpinel

This comment has been minimized.

@jorgeorpinel jorgeorpinel removed the status: research Writing concrete steps for the issue label Nov 4, 2019
@shcheklein

This comment has been minimized.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 4, 2019

Right, that's an example of a permanent one, from man.dvc.org to dvc.org/doc/command-reference

But it used to turn - into / and now we don't really want that, so that part was not so permanent. 😋

  • We should keep the 301 redirect from the man. subdomain to the cmd ref path, but make a separate 302 that turns - to / in some paths only (instead of excluding just import-url and get-url) for backward compatibility.
  • And review all the other redirects to see if any one should be 302.

What do you think?

Let's not worry about stuff that is already cached somewhere outside.

I guess in this case we can indeed ignore it. I doubt many people have used man.dvc.org/import-url or /get-url and definitely no search engines since man.dvc.org URLs are only printed in our command line. 🙂 (I think)

@shcheklein
Copy link
Member Author

I think

there are probably some links outside, not many indeed

We should keep the 301 redirect from the man. subdomain to the cmd ref path, but make a separate 302 that turns - to / in some paths only (instead of excluding just import-url and get-url) for backward compatibility.

sounds too complicated to be honest. Let's first see what we can with no-cache policy.

Almost all our redirects are permanent in nature, I actually don't see a good example of 302.

@jorgeorpinel
Copy link
Contributor

Almost all our redirects are permanent in nature, I actually don't see a good example of 302.

Well, let's see our current redirects in server.js:

  • Enforce https protocol and remove www from host = Permanent (SEO)
  • man.dvc.org/{cmd} -> dvc.org/doc/command-reference/{cmd} = Permanent (route design)
    • replace - for / in {cmd} except for /get-url, /import-url = Temporary (backward compat)
  • {code/data/remote}.dvc.org -> corresponding S3 bucket -> Permanent (route design)
  • path /(deb|rpm) -> corresponding S3 bucket = Permanent (route design)
  • path /(help|chat) -> Discord chat invite = Permanent (route design)
  • path /doc/commands-reference... -> /doc/command-reference... = Permanent (backward compat)
  • path /doc/tutorial/... -> /doc/tutorials/deep/... = Permanent (backward compat)
  • path /doc/tutorial -> /doc/tutorials = Permanent (backward compat)
  • path /doc/use-cases/data-and-model-files-versioning -> /doc/use-cases/versioning-data-and-model-files = Permanent (backward compat)
  • path /doc*/... -> /doc/... = Permanent (backward compat)

Seems like you're right! Only the - for / replacement under man.dvc.org redirects seems to me like something we should address with 302s or 303s or 307s or 308s. (See #757 (comment))

Let's first see what we can (do) with no-cache policy.

But OK! Let's try this first...

@shcheklein
Copy link
Member Author

replace - for / in {cmd} except for /get-url, /import-url = Temporary (backward compat)

it's also permanent, I think. We don't plan to change it back right? It's temporary in a sense that we will remove it completely soon and start returning 404. Tbh, would not worry that much about redirects like man.dvc.org/remote-add at all because of this.

302s or 303s or 307s or 308s.

need to be careful with this - I don't know anything about 30x, x>2. :)

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 4, 2019

Let's first see what we can (do) with no-cache policy.

OK I added Cache-Control: no-cache to the 301 header. Will have to merge #772 to check the effect (on new client only).

I don't know anything about 30x

302 (Found (Previously "Moved temporarily")) has been superseded by 303 and 307.
303 See Other: "The response to the request can be found under another URI using the GET method."
307 Temporary Redirect (since HTTP/1.1): "The request should be repeated with another URI; however, future requests should still use the original URI... the request method is not allowed to be changed..."

Seems to me 307 is exactly what we want during a period of backward compatibility with certain man.dvc.org/cmd-subcmd URL paths that used to need converting - to /.

From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection

@shcheklein
Copy link
Member Author

yep, I just don't know how well all these codes are supported that's why a bit cautious

@jorgeorpinel
Copy link
Contributor

OK so now that #772 is merged, I opened an incognito window in Chrome and went to https://man.dvc.org/import-url, then tried again. Both times I see this same Network response:

General

Request URL: https://man.dvc.org/cache-dir
Request Method: GET
Status Code: 301 Moved Permanently
Remote Address: 52.72.245.79:443
Referrer Policy: no-referrer-when-downgrade

Response Headers

HTTP/1.1 301 Moved Permanently
Server: Cowboy
Connection: keep-alive
Cache-Control: no-cache
Location: https://dvc.org/doc/command-reference/import-url
Date: Tue, 05 Nov 2019 00:08:45 GMT
Transfer-Encoding: chunked
Via: 1.1 vegur

No more (from disk cache). I guess this means the issue is fixed? Not sure how else to confirm.

Can also text with curl -Lv man.dvc.org/some-thing > /dev/null.

@jorgeorpinel
Copy link
Contributor

Alright, seems like this is fixed now even for previous clients.

Having a no-cache 301 "Moved Permanently" seems counterintuitive, but I guess that's the easiest solution here, and allowed by the HTTP 1.1 standard: https://tools.ietf.org/html/rfc2616#section-14.9

I still think we should use 307 for some or all the backward compatibility redirects (see #768 (comment)) but OK, will address this later in #757.

@shcheklein
Copy link
Member Author

@jorgeorpinel I don't think it's fixed unfortunately. I think Firefox at least does not respect the flag.

@shcheklein shcheklein reopened this Nov 5, 2019
@jorgeorpinel
Copy link
Contributor

Boo FF. Happily it has like 5% of the market. https://en.wikipedia.org/wiki/Usage_share_of_web_browsers#Summary_tables

Safari is more relevant, but I can't tell whether it respects the cache-directive.

@shcheklein
Copy link
Member Author

It's 11% for dvc.org, the same as Safari :)

@jorgeorpinel
Copy link
Contributor

So should we separate the 301 redirect from man.dvc.org to /doc/cmd-ref from a new 302 or 307 redirect that replaces - for / in certain paths only? And if so, do you know which exact commands with dash we are trying to provide backward compatibility for in this redirect?

@shcheklein
Copy link
Member Author

And if so, do you know which exact commands

almost all of them (except import-url and get-url)

from a new 302 or 307 redirect that replaces - for / in certain paths only

let's not do this. Not that we understand that it's only caching issue, I think we can keep 301s. Most of our users will see the right result. When we get to the #757 we can think one more time about what codes we apply where - most likely most of them will be 301s anyway to keep SEO.

let's close this for now? good findings and good research for 757.

@jorgeorpinel
Copy link
Contributor

let's not do this... When we get to the #757 we can think one more time about what codes

Sounds like a plan.

most likely most of them will be 301s anyway to keep SEO

We'll see haha 🙈

let's close this for now?

Sure! You reopened it so I was just thinking out-loud what alternatives we had left. Closing! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (gatsby-theme-iterative) p0-critical Affects users in a bad way at the moment 🐛 type: bug Something isn't working. website: eng-doc DEPRECATED JS engine for /doc
Projects
None yet
2 participants