Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and fix the way URL path components are handled #119

Merged
merged 3 commits into from
Nov 14, 2019

Conversation

alandipert
Copy link
Contributor

@alandipert alandipert commented Nov 1, 2019

Context

There are 3 changes in this PR. One is big and two are small.

  1. Big: URLBuilder improvements. Details below.
  2. Small: Simplify Host header creation. Previously we included the port in the Host header after the domain, if the target application had an https protocol. Apparently, while the port number here is optional, no browsers actually do this, and it can confuse proxies. Now, we just use the target domain for Host and it seems to have resolved a problem for a customer.
  3. Small: Specify proper content-length header in response. The curl package we use automatically decompresses responses, but we were blindly copying the original response's content-length header -- which reflected the gzip'd size -- instead of synthesizing a new, correct header based on the uncompressed size. Fixed it.

Details on the big change are below

Big change: URLBuilder improvements

Because we do a lot of URL manipulation in shinyloadtest, we have a little R6-based helper library, the URLBuilder class, to help us out. It's a URL builder with a fluent API in the spirit of uribuilder-tiny for Java.

Unfortunately, prior to this PR, the code in URLBuilder around appending paths was quite complex. In particular, it was complex because of the representation I chose for the path component of the URL, maintained internally as self$path. We even had a TODO in there about cleaning it up because we never felt great about it.

I wanted to protect myself from constructing URLs with too many / (slashes) in between path components when $appendPaths() was called. So, I decided to represent the path component of the URL internally as a character vector of parts without any slashes. That way, to append, one simply concatenated character vectors. The slashes were then added inside $build() when a string URL needed to be produced.

The bug with this approach was that the representation was lossy. It was lossy because if you passed it a URL like http://example.com/foo/, it would initially populate self$paths with a vector like c("foo"). Then, at $build() time, it would join the path components to produce e.g. http://example.com/foo, which isn't the same URL that was passed in — it's missing the trailing /.

This lossiness didn't matter for a long time because all of the apps we test with apparently weren't sensitive to this. However, eventually a customer reported problems recording their app, and it turned out to be this bug in the wild.

URLBuilder Fix

The fix was simple, and obvious to the present-time version of myself. We just need to make sure we don't duplicate slashes when we join path component strings. This is achieved with a simple check of whether the left and right components end and start with slashes, respectively, and then conditionally inserting a slash, removing one, or just joining the strings, depending.

  • In the PR, the new helper function joinPaths() in R/url.R does the slash-aware join.
  • In URLBuilder the path component is now represented as a length-1 character vector instead of a length-n vector, self$path (was self$paths)
  • Since xml2:url_parse() calls the path component path, we now call it path instead of paths everywhere ourselves. That meant also changing $setPaths() to $setPath() and $appendPaths() to $appendPath().
  • I removed an optional parameter, raw, from $setPath() and $appendPath() because it's something I originally thought we would need, but never have. None of the paths we append ever need to be URL encoded.
  • I added a test case ensuring the trailing slash of a path isn't lost, which fixes the customer issue that caused us to revisit this.

@alandipert alandipert requested review from wch and removed request for wch November 1, 2019 23:06
@alandipert alandipert self-assigned this Nov 13, 2019
@alandipert alandipert merged commit 0753fdc into master Nov 14, 2019
@alandipert alandipert deleted the fix-url-paths branch November 14, 2019 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants