Simplify and fix the way URL path components are handled #119

alandipert · 2019-11-01T23:06:40Z

Context

There are 3 changes in this PR. One is big and two are small.

Big: URLBuilder improvements. Details below.
Small: Simplify Host header creation. Previously we included the port in the Host header after the domain, if the target application had an https protocol. Apparently, while the port number here is optional, no browsers actually do this, and it can confuse proxies. Now, we just use the target domain for Host and it seems to have resolved a problem for a customer.
Small: Specify proper content-length header in response. The curl package we use automatically decompresses responses, but we were blindly copying the original response's content-length header -- which reflected the gzip'd size -- instead of synthesizing a new, correct header based on the uncompressed size. Fixed it.

Details on the big change are below

Big change: URLBuilder improvements

Because we do a lot of URL manipulation in shinyloadtest, we have a little R6-based helper library, the URLBuilder class, to help us out. It's a URL builder with a fluent API in the spirit of uribuilder-tiny for Java.

Unfortunately, prior to this PR, the code in URLBuilder around appending paths was quite complex. In particular, it was complex because of the representation I chose for the path component of the URL, maintained internally as self$path. We even had a TODO in there about cleaning it up because we never felt great about it.

I wanted to protect myself from constructing URLs with too many / (slashes) in between path components when $appendPaths() was called. So, I decided to represent the path component of the URL internally as a character vector of parts without any slashes. That way, to append, one simply concatenated character vectors. The slashes were then added inside $build() when a string URL needed to be produced.

The bug with this approach was that the representation was lossy. It was lossy because if you passed it a URL like http://example.com/foo/, it would initially populate self$paths with a vector like c("foo"). Then, at $build() time, it would join the path components to produce e.g. http://example.com/foo, which isn't the same URL that was passed in — it's missing the trailing /.

This lossiness didn't matter for a long time because all of the apps we test with apparently weren't sensitive to this. However, eventually a customer reported problems recording their app, and it turned out to be this bug in the wild.

URLBuilder Fix

The fix was simple, and obvious to the present-time version of myself. We just need to make sure we don't duplicate slashes when we join path component strings. This is achieved with a simple check of whether the left and right components end and start with slashes, respectively, and then conditionally inserting a slash, removing one, or just joining the strings, depending.

In the PR, the new helper function joinPaths() in R/url.R does the slash-aware join.
In URLBuilder the path component is now represented as a length-1 character vector instead of a length-n vector, self$path (was self$paths)
Since xml2:url_parse() calls the path component path, we now call it path instead of paths everywhere ourselves. That meant also changing $setPaths() to $setPath() and $appendPaths() to $appendPath().
I removed an optional parameter, raw, from $setPath() and $appendPath() because it's something I originally thought we would need, but never have. None of the paths we append ever need to be URL encoded.
I added a test case ensuring the trailing slash of a path isn't lost, which fixes the customer issue that caused us to revisit this.

Simplify and fix the way URL path components are handled

58ecdee

alandipert requested review from wch and removed request for wch November 1, 2019 23:06

Don't include port after domain in Host: field

cd6bba4

wch approved these changes Nov 13, 2019

View reviewed changes

Set content-length to content size

1788d8c

alandipert self-assigned this Nov 13, 2019

alandipert merged commit 0753fdc into master Nov 14, 2019

alandipert deleted the fix-url-paths branch November 14, 2019 22:33

This was referenced Nov 20, 2019

Misc fixes rstudio/shinycannon#47

Merged

Trailing slash removed from record_session url #112

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify and fix the way URL path components are handled #119

Simplify and fix the way URL path components are handled #119

alandipert commented Nov 1, 2019 •

edited

Loading

Simplify and fix the way URL path components are handled #119

Simplify and fix the way URL path components are handled #119

Conversation

alandipert commented Nov 1, 2019 • edited Loading

Context

Big change: URLBuilder improvements

URLBuilder Fix

alandipert commented Nov 1, 2019 •

edited

Loading