Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix path encoding #31

Merged
merged 7 commits into from
Mar 5, 2018
Merged

Fix path encoding #31

merged 7 commits into from
Mar 5, 2018

Conversation

jayeb
Copy link
Contributor

@jayeb jayeb commented Mar 2, 2018

This PR fixes the path-encoding logic for this library, to bring it in line with other libraries in our SDK. The logic is as follows:

  • All paths are expected to be passed in unencoded. If a path is provided that is already encoded, it will become double-encoded.
  • If the path matches /^https?:\/\//, it will be fully encoded, slashes and all. In this case, we're assuming it's a fully-qualified URL (this is necessary for web proxy sources). We had previously been using urlencode() for this, but version 1.2.0 introduced an option to use rawurlencode() instead. This PR removes that option and uses rawurlencode() in all cases. The general consensus in the PHP community seems to be that rawurlencode() is strictly better for our use-case--see this SO thread for more info.
  • If the path does not match /^https?:\/\//, it will be partially-encoded. Non-URL-safe characters (everything that matches /([^\w\-\/\:@])/) will be encoded with rawurlencode(), and all other characters will be left alone.

In addition, I added tests for the UrlHelper.formatPath() method. Test coverage in this library is still wildly insufficient, but this PR at least makes things better than they were yesterday.

Note: this PR should fix #28.

@jayeb jayeb requested a review from nathan-heskia March 2, 2018 21:18
$uh = new URLHelper("test.imgix.net", $path);

// The pre-encoded characters should now be *double* encoded
$this->assertEquals($uh->formatPath($path), "/http%3A%2F%2Fmywebsite.com%2Fimages%2Fp%25C3%25BCg.jpg");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to encode an already encoded path?

Copy link
Contributor Author

@jayeb jayeb Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically impossible to detect whether the path we're given has been intentionally or accidentally pre-encoded, so we need to assume all paths are given to us unencoded and encode them blindly. This test just makes sure that, in the rare event that someone chooses to pass in a pre-encoded path, it gets double-encoded as expected. More of a safeguard to make sure we don't break this edge-case behavior in the future by accident.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Thanks for clarifying.

@@ -10,23 +10,34 @@ class UrlHelper {
private $signKey;
private $params;

public function __construct($domain, $path, $scheme = "http", $signKey = "", $params = array(), $rawEncodePath = false) {
public function __construct($domain, $path, $scheme = "http", $signKey = "", $params = array()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this changes means you can also delete $rawEncodePath from both the signature and the call to this constructor in UrlBuilder's createURL method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks


$path = ($path[0] === '/' ? '' : '/') . $path;
return $path;
if (preg_match("/^https?:\/\//", $path)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you only match on https? Could someone also provide a path with http?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex matches both--the ? following the s makes it an optional character.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay! Got it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved the changes, but I think it would still work the same and be more readable to have the regex as /^http(s)?:\/\//. But I'll leave that up to you!

Copy link

@nathan-heskia nathan-heskia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this LGTM. If you'd like to clean up the call to UrlHelper in UrlBuilder I think this should be good.

@jayeb jayeb merged commit c4416d6 into master Mar 5, 2018
@jayeb jayeb deleted the fix-path-encoding branch March 5, 2018 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in version 1.2
2 participants