Simplify and fix the way URL path components are handled #119
Context
There are 3 changes in this PR. One is big and two are small.
- `Host` header creation. Previously we included the port in the `Host` header after the domain, if the target application had an `https` protocol. Apparently, while the port number here is optional, no browsers actually do this, and it can confuse proxies. Now, we just use the target domain for `Host` and it seems to have resolved a problem for a customer.
- `content-length` header in response. The `curl` package we use automatically decompresses responses, but we were blindly copying the original response's `content-length` header -- which reflected the gzip'd size -- instead of synthesizing a new, correct header based on the uncompressed size. Fixed it.

Details on the big change are below.
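To make the two small fixes concrete, here is a minimal base-R sketch. The helper names `hostHeader()` and `contentLength()` are hypothetical and are not taken from the shinyloadtest source; the real code may parse the URL and measure the body differently.

```r
# Hypothetical helper: derive the Host header value from a target URL,
# deliberately leaving out any explicit port.
hostHeader <- function(url) {
  # Strip the scheme, then everything from the first "/", "?" or "#" onward.
  authority <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
  authority <- sub("[/?#].*$", "", authority)
  # Drop any userinfo ("user:pass@") and a trailing ":port".
  host <- sub("^.*@", "", authority)
  sub(":[0-9]+$", "", host)
}

# Hypothetical helper: compute content-length from the body we actually
# send on (already decompressed), not from the upstream header.
contentLength <- function(body) {
  if (is.raw(body)) length(body) else nchar(body, type = "bytes")
}

hostHeader("https://example.com:8443/app/")  # "example.com"

# A compressed body is smaller than the decompressed one, so copying the
# upstream content-length would under-report the size.
body <- charToRaw(strrep("a", 1000))
gz <- memCompress(body, type = "gzip")
length(gz) < length(body)                          # TRUE
contentLength(memDecompress(gz, type = "gzip"))    # 1000
```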
Big change: URLBuilder improvements
Because we do a lot of URL manipulation in shinyloadtest, we have a little R6-based helper library, the `URLBuilder` class, to help us out. It's a URL builder with a fluent API in the spirit of uribuilder-tiny for Java.

Unfortunately, prior to this PR, the code in `URLBuilder` around appending paths was quite complex. In particular, it was complex because of the representation I chose for the `path` component of the URL, maintained internally as `self$path`. We even had a `TODO` in there about cleaning it up because we never felt great about it.

I wanted to protect myself from constructing URLs with too many `/` (slashes) in between path components when `$appendPaths()` was called. So, I decided to represent the `path` component of the URL internally as a character vector of parts without any slashes. That way, to append, one simply concatenated character vectors. The slashes were then added inside `$build()` when a string URL needed to be produced.

The bug with this approach was that the representation was lossy. It was lossy because if you passed it a URL like `http://example.com/foo/`, it would initially populate `self$paths` with a vector like `c("foo")`. Then, at `$build()` time, it would join the path components to produce e.g. `http://example.com/foo`, which isn't the same URL that was passed in: it's missing the trailing `/`.

This lossiness didn't matter for a long time because all of the apps we test with apparently weren't sensitive to this. However, eventually a customer reported problems recording their app, and it turned out to be this bug in the wild.
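The lossiness can be reproduced with a small base-R sketch of the old representation. `parsePaths()` and `buildPath()` are hypothetical stand-ins written for illustration; they are not the original `URLBuilder` code.

```r
# Hypothetical reconstruction of the old approach: store the path as a
# character vector of parts with no slashes in it.
parsePaths <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts[nzchar(parts)]  # drop the empty piece before the leading "/"
}

# Slashes are only re-added when the string is built.
buildPath <- function(paths) {
  paste0("/", paste(paths, collapse = "/"))
}

parsePaths("/foo/")             # "foo"
buildPath(parsePaths("/foo/"))  # "/foo"  (trailing slash is gone)

# "/foo" and "/foo/" collapse to the same internal state, so the
# original URL cannot be recovered.
identical(parsePaths("/foo"), parsePaths("/foo/"))  # TRUE
```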
URLBuilder Fix
The fix was simple, and obvious to the present-time version of myself. We just need to make sure we don't duplicate slashes when we join path component strings. This is achieved with a simple check of whether the left and right components end and start with slashes, respectively, and then conditionally inserting a slash, removing one, or just joining the strings, depending on which case applies.
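As a rough illustration of that three-way check, a slash-aware join could look like the sketch below. It is written from the description above and is an assumption about the shape of the logic; the actual `joinPaths()` in the PR may differ in details such as empty-string handling.

```r
# Sketch of a slash-aware join (assumed behavior, not the PR's exact code).
joinPaths <- function(left, right) {
  leftSlash <- endsWith(left, "/")
  rightSlash <- startsWith(right, "/")
  if (leftSlash && rightSlash) {
    # Both sides contribute a slash: remove one.
    paste0(left, substring(right, 2))
  } else if (!leftSlash && !rightSlash) {
    # Neither side has a slash: insert one.
    paste0(left, "/", right)
  } else {
    # Exactly one slash is already present: join as-is.
    paste0(left, right)
  }
}

joinPaths("/foo/", "/bar")  # "/foo/bar"
joinPaths("/foo", "bar")    # "/foo/bar"
joinPaths("/foo/", "bar/")  # "/foo/bar/"
```

Because the path stays a plain string, a trailing `/` on the original URL survives appends unchanged.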
- `joinPaths()` in `R/url.R` does the slash-aware join.
- `self$path` (was `self$paths`): because `xml2::url_parse()` calls the path component `path`, we now call it `path` instead of `paths` everywhere ourselves. That meant also changing `$setPaths()` to `$setPath()` and `$appendPaths()` to `$appendPath()`.
- Removed the `raw` argument from `$setPath()` and `$appendPath()` because it's something I originally thought we would need, but never have. None of the paths we append ever need to be URL encoded.