Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove location pathname decoding #656

Merged

Conversation

OliverJAsh
Copy link
Contributor

@OliverJAsh OliverJAsh commented Jan 10, 2019

Re. #505 (comment)

Fixes #505

To do:

  • update tests (I probably need help here)

@kevinkyyro
Copy link

It looks like a lot of the failing tests are for the behavior you're removing, so they should either be removed as well or updated to assert that the pathname remains encoded (a simple find-and-replace of "/歴史" to "/%E6%AD%B4%E5%8F%B2" might do it)

@OliverJAsh
Copy link
Contributor Author

@kevinkyyro Done! I had to delete some tests which weren't valid anymore.

@OliverJAsh OliverJAsh force-pushed the oja/fix/remove-location-pathname-decoding branch from e1c8d9e to ea4fac3 Compare January 23, 2019 08:21
TestSequences.PushInvalidPathname(history, done);
});
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was no longer valid, since we don't throw an error now we have no decoding. (We assume the pathname provided will be correctly encoded.)

@OliverJAsh OliverJAsh force-pushed the oja/fix/remove-location-pathname-decoding branch from ea4fac3 to 4a8985b Compare January 23, 2019 08:23
@OliverJAsh
Copy link
Contributor Author

@mjackson @kevinkyyro Any updates on this?

@oskarolsson-jimdo
Copy link

oskarolsson-jimdo commented Feb 1, 2019

Is there something one could potentially help out with to get this merged?

Quite keen to get these changes merged since it's a rather big issue for us.

Copy link

@kevinkyyro kevinkyyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed in CONTRIBUTING.md:

Also, please make sure the CHANGES.md document contains a short note about the nature of your change. Just follow the format of the existing entries in that document. If the most recent entry is a release, please start a new section under a ## HEAD heading.

This definitely seems worthy of a mention there

@@ -32,21 +32,6 @@ export function createLocation(path, state, key, currentLocation) {
location.state = state;
}

try {
location.pathname = decodeURI(location.pathname);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the owners of this library would prefer a warning:

// in try-catch, no sense throwing any longer
if (location.pathname !== decodeURI(location.pathname)) {
  // warn: pathnames are no longer automatically decoded
}

@OliverJAsh
Copy link
Contributor Author

Thanks for the review @kevinkyyro. I'm happy to make those changes once I've heard back from a maintainer of this repository, as then I will be able to batch it with any other requested changes. /cc @mjackson @pshrmn

@saravanansarz
Copy link

@OliverJAsh when is this expected to merge to mainstream branch and released officially or let us know a workaround for this, something like overriding createLocation function inside our project.

@OliverJAsh
Copy link
Contributor Author

@saravanansarz I am not a maintainer of this repo. We are awaiting feedback from a maintainer.

@saravanansarz
Copy link

@mjackson @pshrmn Any updates of merging @OliverJAsh changes ?

Copy link
Contributor

@pshrmn pshrmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (but am not certain) that this would have to be a major version bump of history in order to not break packages (e.g. React Router) that expect decoded pathnames.

Even if this PR was merged today, I couldn't give an estimate on when it would be published. Short term, publishing a fork is probably your best bet.

If you do publish a fork, I would leave a comment on the original issue about using Webpack's resolve.alias to use the fork with RR's build-in router components.

@OliverJAsh
Copy link
Contributor Author

@pshrmn I expected this would require a major version bump, but it's got to happen at some point, and there's a lot of demand for it.

What are the next steps to getting this PR merged?

In the meantime, I commented here about how we're maintaining a local fork (via patch-package), and how others can do the same. Alternatively, someone could publish a fork online.

@pshrmn
Copy link
Contributor

pshrmn commented Feb 13, 2019

To be frank, I don't know, which is why I suggest publishing a fork. That would also give some indication as to how much demand there is for this.

@kevinkyyro
Copy link

... That would also give some indication as to how much demand there is for this.

Please note that the level of demand can already be seen in #505, which at present is the issue with most 👍 reactions and comments of all issues, open or closed.

@oskarolsson-jimdo
Copy link

oskarolsson-jimdo commented Feb 21, 2019

I fail to see why this should be a fork. If it was really only done as a convenience to RR (as indicated by #505), then really it should be done on RR's side and not in this library -- which would make it a lot simpler to turn off by introducing a config flag.

Seems to me the sensible way to proceed is do a major bump with this PR, and then RR will have to adjust accordingly.

@pshrmn
Copy link
Contributor

pshrmn commented Mar 24, 2019

I've come around a bit on this and while encoded strings will be more annoying to work with, they're the only way that we can correctly support all valid pathnames.

Edit: I made the change myself, but I would appreciate if you take a look at the change and verify that this is fine with you.

@OliverJAsh can you add a dependency on https://www.npmjs.com/package/encodeurl and wrap the pathname in a call from that?

location.pathname = encodeURL(location.pathname);
return location;

That would ensure that every pathname is fully encoded, whether it comes from the browser or the user.

The LocationPathnameAlwaysSame test should also be updated to be "always encoded".

I don't have the power to merge this myself (well, technically I can but the decision isn't really mine to make), but those changes would be enough for me to support putting them out as a v5 release.

I have some thoughts on a future implementation that could support decoding, but they would also be breaking changes that would need to wait for a new React Router version, so that can wait.

@OliverJAsh
Copy link
Contributor Author

Thanks for the update @pshrmn.

Initially I thought this change should disable decoding, instead just using the pathname the user provides. (We've been using this change locally for many months at Unsplash and had no problems.)

However I see you're suggesting that we go further than this and proactively encode the URL (via encodeurl which only encodes non-encoded segments).

I think it's hard to think through the implications of this.

Is there any reason why we can't we just leave the pathname as it is (i.e. let the user decide how/when/if to encode it)?

One other thought I've just had: once this is done, we'll have to remember to decode params in React Router (#505 (comment)).

@pshrmn
Copy link
Contributor

pshrmn commented Mar 25, 2019

@OliverJAsh Consistency. I believe that the pathname should always be the same, no matter if it comes from the browser (where it would be encoded) or the user (where it may be non-encoded, encoded, or partially encoded).

I am open to removing the change if there is a good reason to support mixed pathname states, but I feel that that would only lead to bugs.

As far as React Router goes, I think that the "safest" approach would be to release this under v5, leave React Router's history dependency on v4, and clearly document that user's can manually install history v5 to have encoded pathnames. This isn't perfect; if there are bug fixes it would mean maintaining two versions. It would also mean that users start out with the "incorrect" implementation, but it is "safer" than bumping everyone up to v5 and telling everyone whose apps break to downgrade to `v4.

That would give us time to figure out the best approach for React Router v6, which would have encoded pathnames as default (although possibly giving the user the opportunity to provided a function for decoding it).

@OliverJAsh
Copy link
Contributor Author

OliverJAsh commented Mar 25, 2019

Let's put it to the users: #505 (comment).

@audunhalland
Copy link

I've been following this discussion for a while; our project too has problems with history and decoding. A clean and clear API is very important. In my experience, correctness of URL encoding is not something that the average developer pays a lot of attention to. The number of bugs on the internet related to this mess is immense, it's not just react.

"Auto-encoding" of a URL cannot be done unambiguously. Therefore, I think the new major version of the lib should only accept valid URLs as location, and IMO could even throw on invalid input, at least never touch or modify it. This would force developers to think more actively about encoding.

Just my 2 cents, as someone who has spent too many hours fixing sloppy URL code (disclaimer: my native alphabet is non-ASCII 😄 )

@Dalzhim
Copy link

Dalzhim commented Mar 25, 2019

@pshrmn : It seems to me that the best path forward is to apply @kevinkyyro 's suggestion to generate a warning. However, I believe the correct implementation for that warning would be to use the following comparison : location.pathname !== encodeurl(location.pathname). The encodeurl function doesn't try to encode anything that is already valid within an URL, thus if users of the library submit well formed URLs, the call to encodeurl should result in no changes at all. Any change means the original url was not well formed and it is a programming mistake that must be solved by the user of the library.

@Dalzhim
Copy link

Dalzhim commented Mar 25, 2019

@pshrmn : One more thing, considering the timeline to fix this 2year-old known issue is still stretching out, I would urge the maintainers of this repository to accept #675 in order to document this issue appropriately.

@jacobwgillespie
Copy link

jacobwgillespie commented Mar 25, 2019

"Auto-encoding" of a URL cannot be done unambiguously. Therefore, I think the new major version of the lib should only accept valid URLs as location, and IMO could even throw on invalid input, at least never touch or modify it. This would force developers to think more actively about encoding.

Just want to echo this - as someone handling URLs with unicode characters, I would much prefer the history library behavior to match the browser's behavior. There are some complex URL scenarios, and IMO it's important that code using history is able to have full access to the Location object and have the ability to implement whatever logic is necessary for the specific situation.

It's also important to avoid divergent URL behavior based on the entrypoint to a specific location, for instance visiting a URL directly, vs clicking on an <a> tag, vs navigating back/forward, vs navigating with pushState, vs navigating via history / <Link>. Those can produce different pathname behaviors if history doesn't align with the browser.

Today, it's not possible to have a consistent experience due to the auto-encoding behavior, so we have to work around this by directly accessing window.location and bypassing history and the router.

@pshrmn
Copy link
Contributor

pshrmn commented Mar 25, 2019

Throwing on invalid pathnames is an interesting idea.

<Link to={`/user/${encodeURIComponent("Beyoncé")}`}>Beyoncé</Link>

It is certainly more verbose for users, but if it is safer and has the same end result, I can support that.

I realized that we'll actually have to modify encodeurl because it leaves square brackets un-encoded (and undoes their encoding). This is done for IPv6 in a URL's host component, but shouldn't happen for the pathname.

modules/LocationUtils.js Outdated Show resolved Hide resolved
@mjackson
Copy link
Member

The last time I merged a PR that had to do with changing the way we do encoding, we ended up in this mess. @OliverJAsh Can you please summarize for me what your plan is here? There are so many changed files and so many comments...

@pshrmn
Copy link
Contributor

pshrmn commented Mar 26, 2019

Just leaving a note that I reverted my changes for the time being because I don't want to hold this PR up while deciding what to do with non-fully-encoded pathnames.

@OliverJAsh
Copy link
Contributor Author

OliverJAsh commented Mar 26, 2019

@mjackson

@OliverJAsh Can you please summarize for me what your plan is here?

The best summary is still the one I linked to in this PR's description: #505 (comment). Essentially, with this change, we stop decoding pathname and just leave it as is. This fixes #505.

There was some discussion about whether to proactively encode, but it seems this was dropped due to issues.

There is still some ongoing discussion in this PR about whether to warn or throw when users provide pathnames which are not correctly encoded. However I'm not sure whether this should block this PR or not.

It's still not clear how to manage this upgrade with regards to React Router, but @pshrmn has some thoughts. @mjackson We will likely need your help on this.

When we do upgrade React Router to use this change, we will have to consider what to do with match.params. IIUC, they are currently decoded because they are taken from pathname, which was decoded, but won't be after this change. Thus we will have to decide whether or not React Router should decode match.params.

My two cents: it should decode them, because that's what other frameworks like Express do. Furthermore, it should decode them using decodeURIComponent, not decodeURI. This will in turn fix another bug whereby not all characters are decoded inside match.params:

@mjackson
Copy link
Member

Thanks, @OliverJAsh. And thank you for sticking with this for so long. Let's get this merged.

Going forward, I think this library should have no opinion about encodings (it didn't, originally). When you read location.pathname it should be just like reading window.location.pathname. When you call history.push(url) it should work exactly like window.history.pushState(undefined, undefined, url) does.

Since this is a breaking change we will need to bump the major version. I'll add this to the roadmap.

@mjackson mjackson merged commit 845d690 into remix-run:master Mar 26, 2019
@OliverJAsh
Copy link
Contributor Author

Thank you for merging this! Please ping me in any upstream React Router changes which relate to this? :-)

@Dalzhim
Copy link

Dalzhim commented Mar 26, 2019

Great job @OliverJAsh on leading this PR! Thanks to @pshrmn and @mjackson for merging it! I'm looking forward for the upcoming release!

@mjackson mjackson mentioned this pull request Mar 26, 2019
25 tasks
@lock lock bot locked as resolved and limited conversation to collaborators May 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Path is decoded in createLocation
9 participants