Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle HTTP redirects #754

Merged
merged 13 commits into from
Mar 13, 2023
Merged

Handle HTTP redirects #754

merged 13 commits into from
Mar 13, 2023

Conversation

hadley
Copy link
Member

@hadley hadley commented Mar 9, 2023

Fixes #674

Uploading to shinyapps.io works without redirects because the result of client$updateBundleStatus(), which needs the redirect, is not used.

@aronatkins any ideas for testing? I verified that this works with an interactive deployment to shinyapps, and the httpRequest() path is tested via validateConnectUrl(). But currently no automated tests hit the httpRequestWithBody() path. Would you mind also verifying that connect/shinyapps doesn't emit any 307 or 308 redirects that I should be respecting?

@@ -78,40 +79,22 @@ httpLibCurl <- function(protocol,
con <- file(contentFile, "rb")
on.exit(if (!is.null(con)) close(con), add = TRUE)

if (identical(method, "POST")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possibly overly aggressive, but the comment doesn't make sense to me as the 3xx redirects only apply to the result of the upload, not the upload itself. This is possibly a misdiagnosis of an older problem, because I don't think there should be a general difference between PUTs and POSTs. It's possible that this might have also resolved #544.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this code existed from day one of the libcurl implementation: #348

@hadley hadley requested a review from aronatkins March 9, 2023 22:44
else
stop(e)
}))
time <- system.time(
Copy link
Member Author

@hadley hadley Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also simplified this since the user interrupt error message has changed so that had no effect and I'm pretty sure that "transfer closed" is a legit error we do want to bubble up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that some of this logic was trying to trap auth errors during uploads. When there's an auth failure, the client may not see the entire body as sent. This code was trying to handle that situation, but never quite handled most of the cases.

In poking around a little, we see terminated uploads for rpubs -- things that are larger than the bundle upload limit. #450 (comment) for quite a bit of discussion about it, but I never really figured out the cleanest way to handle those errors.

We wanted to try to ignore the "upload" problem and extract the real HTTP error in the response, if possible.

I don't know the best way to react to this type of error, nor if we really need to maintain the old code -- mostly, this helps inform a testing strategy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll have a go with some uploading some really large files and see if I can recreate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I uploaded a 1 Gb app to shinyapps and Posit connect, and didn't see any problems. I'll follow up on the rpubs issue later since it's already individually tracked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connect users generally see these errors when there is an intermediate proxy which limits the upload payload size. For example, nginx may use a restricted client_max_body_size.

https://docs.posit.co/connect/admin/proxy/#simple-configuration

Focus on the rpubs case, since I know that's reliable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I uploaded a 33 Mb html with no problems. You can see it in all its glory at https://rpubs.com/hadley/large-file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tested updating, since that uses a PUT instead of POST.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was that a large Rmd or a large bundle? Because we gzip, it's harder to get a large bundle. I ended up using a specific document when last playing with that issue, but it's likely changed since I tried.. I think I have a copy locally somewhere. #450 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aronatkins From the progress bar 😄 Uploaded 23796573 bytes... so ~22 Mb.

Copy link
Contributor

@aronatkins aronatkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll dig into some of the possible response codes tomorrow.

else
stop(e)
}))
time <- system.time(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that some of this logic was trying to trap auth errors during uploads. When there's an auth failure, the client may not see the entire body as sent. This code was trying to handle that situation, but never quite handled most of the cases.

In poking around a little, we see terminated uploads for rpubs -- things that are larger than the bundle upload limit. #450 (comment) for quite a bit of discussion about it, but I never really figured out the cleanest way to handle those errors.

We wanted to try to ignore the "upload" problem and extract the real HTTP error in the response, if possible.

I don't know the best way to react to this type of error, nor if we really need to maintain the old code -- mostly, this helps inform a testing strategy.

R/http.R Outdated Show resolved Hide resolved
# https://www.rfc-editor.org/rfc/rfc9110.html#name-redirection-3xx
service <- parseHttpUrl(httpResponse$location)
authed_headers <- c(headers, authHeaders(authInfo, "GET", service$path))
httpResponse <- http(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This redirect handling is assuming that the original request (and its payload) do not need to be re-sent. Is that the situation you're seeing?

It feels like the redirection that's set-up for colorado might demand a replay.

curl -v http://colorado.rstudio.com/rsc/__api__/server_settings
#> HTTP/1.1 301 Moved Permanently
#> Location: https://colorado.rstudio.com/rsc/__api__/server_settings
curl -v https://colorado.rstudio.com/rsc/__api__/server_settings
#> HTTP/1.1 302 Found
#> Location: http://colorado.posit.co/rsc/__api__/server_settings

The original request never landed.

Some systems support 100-continue request/response to avoid sending the payload before receiving confirmation; I'm not sure that any of these servers support that workflow.

Most resources will indicate that you should not automatically replay the request in cases like colorado - a shift from one hostname to another - without explicit confirmation.

We probably want at least:

  1. Be transparent about the redirect.
  2. Offer an option to err rather than walk through the redirect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To precisely follow the spec we should re-POST with body only for 307 and 308. I'm happy to do that, but I suspect it won't appreciably change behaviour because most folks won't be using those status codes.

To be clear, I don't think code is a panacea for all redirect related issues, it's just an important first step. If we really want to handle a server changing host names, I think we'll need code that detects the problem and then updates the record on disk. But I'm not sure how important that problem is and I'm not sure we need to do it proactively, or should wait on someone to file an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colorado examples give us two motivating examples:

  1. HTTP => HTTPS
  2. Host rename

I'm not sure that there's a good answer, especially since the colorado example shows us that 301/302 are much more common than 307/308.

Thinking out loud:

  • Automatic replay of all redirects requires no user intervention, which is a smooth experience.
  • Server URLs are left stale, meaning every subsequent request hits the redirect penalty.
  • Server names are left stale, meaning the UI / metadata will always reflect the old name.
  • Account records reflect the stale server name.

If we ignore updating the server and account records for a moment, we could run a pre-flight check against the client URL to make sure it is still "correct" and update the URL in-memory. That would correct the following requests without hitting multiple redirects and would give us a single place to ask for confirmation, log, or reject the redirect.

In other words, run something like validateServerUrl before using the client to do meaningful work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes to self: need something like validateConnectServer() that we run in deployApp() before hitting any endpoints. If url has redirected, need to prompt the user to update the record on disk (if interactive), or otherwise warn or continue. Might want to consider automatically updating if the hostname stays the name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While making coffee this morning, I suddenly realised that we can't just update an existing server record on disk, because that will break existing deployments. We need some way to also "redirect" existing server records to a new server record. So I won't tackle that here, but consider it in a future PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; the association between the server records and the accounts and deployments records came up during our chat. The pre-flight can ask if folks should proceed through the redirect, but it's not straightforward to rewrite the records.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking this in #760.

@@ -355,6 +391,7 @@ authHeaders <- function(authInfo, method, path, file = NULL) {
}
}

# https://github.com/rstudio/connect/wiki/token-authentication#request-signing-rsconnect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi: this is an internal link.

DESCRIPTION Outdated Show resolved Hide resolved
R/http-libcurl.R Outdated Show resolved Hide resolved
@hadley hadley mentioned this pull request Mar 13, 2023
R/deployApp.R Outdated Show resolved Hide resolved
else
stop(e)
}))
time <- system.time(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was that a large Rmd or a large bundle? Because we gzip, it's harder to get a large bundle. I ended up using a specific document when last playing with that issue, but it's likely changed since I tried.. I think I have a copy locally somewhere. #450 (comment)

curl::handle_setopt(
handle,
noprogress = !progress,
upload = TRUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we lost post = TRUE, which uses a POST request? Is that redundant with the curl::handle_setopt(handle, customrequest = method)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, good catch and good question. The post is a pretty complicated option that changes a bunch of stuff behind the scenes (e.g. adds Content-Type: application/x-www-form-urlencoded, Expect: 100-continue, headers etc). I have a vague recollection that this also changes the redirect handling (because it'll follow the spec for 307/308), but that doesn't matter here since we've suppressed that.

I'm neutral on whether to preserve it or drop it.

  • In favour of dropping it: the code in this PR works and it's very clear exactly how we're modifying the request.
  • In favour of keeping: I know curl applies more special POST logic if it's set, and that may save us from grief down the road.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK dropping it.

@hadley hadley merged commit 1342535 into main Mar 13, 2023
@hadley hadley deleted the redirects branch March 13, 2023 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libCurl HTTP handler doesn't handle redirects
2 participants