Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uri::toString does not encode special characters #22

Open
sakiss opened this issue Mar 10, 2020 · 6 comments
Open

Uri::toString does not encode special characters #22

sakiss opened this issue Mar 10, 2020 · 6 comments

Comments

@sakiss
Copy link

sakiss commented Mar 10, 2020

Steps to reproduce the issue

In any layout file add the following code:

<?php
$uri = Uri::getInstance('index.php?option=com_content&var=#//:?&@\')');?>
<a href="<?php echo $uri;?>">some link</a>

Then check the link in the browser.

Expected result

The link contains special/reserved characters which are supposed to be encoded.
Otherwise the url is not functional.

Actual result

Non encoded reserved characters.
Even if we pass the characters encoded, those are decoded when the url is converted to string.

@mbabker
Copy link
Contributor

mbabker commented Mar 10, 2020

This sounds like a case where you would need to handle any required encoding to me after the URI class creates the string, I don't think this is something that you can address in a generally reusable abstraction class. Using the example within the HTML context, you're right in that it would be best if special characters were encoded. BUT, considering the URI class is used in other non-HTML contexts, assuming that the resulting URL should be encoded for a specific scenario isn't the best of ideas. That assumption could very well break the Framework's HTTP package as an example, which leads to other breakages in other systems.

So any change proposal needs to tread very carefully.

@mbabker
Copy link
Contributor

mbabker commented Mar 10, 2020

#21 has a past attempt at dealing with encoding.

@sakiss
Copy link
Author

sakiss commented Mar 10, 2020

Thanks for the feedback!

When the query contains reserved characters, those should not be encoded, no matter the usage?
https://tools.ietf.org/html/rfc3986#section-2.2

@mbabker
Copy link
Contributor

mbabker commented Mar 10, 2020

To be honest, I don't know the RFCs well enough in that regard to say what the absolute "right" behavior is. The way the RFC reads is that encoding is only mandatory if it doesn't conflict with a character's other use in that segment of a URI, so & only requires encoding if you're trying to pass that character inside the value of something in a query string, but doesn't seem to require encoding in the path or fragment segments (and I'll say I'm either completely over-simplifying this or am completely wrong here, but like I said, I'm clueless on what the "right" thing is).

Maybe bounce things off of how Laminas or Guzzle implement their PSR-7 URI classes, and see if there's something in those implementations that can set a guideline on how to change things here.

@sakiss
Copy link
Author

sakiss commented Mar 11, 2020

The way the RFC reads is that encoding is only mandatory if it doesn't conflict with a character's other use in that segment of a URI, so & only requires encoding if you're trying to pass that character inside the value of something in a query string.

This sounds reasonable and seems to be backed by the RFC3986.

but doesn't seem to require encoding in the path or fragment segments (and I'll say I'm either completely over-simplifying this or am completely wrong here, but like I said, I'm clueless on what the "right" thing is)

I think that the limitations apply in every component (except the host, the port and the user) and this seem to happen in the Guzzle:PSR7 class and the laminas as well.

Though i am not sure if the encoding should be done in every non-ascii character or just the reserved. Given that W3 encourages the use of UTF-8 characters (IRIs) and most web browsers (see example) do the encoding automatically for non reserved characters (not sure about other clients). Sites like wikipedia use IRIs since years.

@sakiss
Copy link
Author

sakiss commented Mar 12, 2020

@mbabker If we set some specs and this has chances to be merged in J4, i can contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants