Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

falcon.uri.encode_value does not encode the percent character #1872

Closed
vytas7 opened this issue Feb 28, 2021 · 11 comments
Closed

falcon.uri.encode_value does not encode the percent character #1872

vytas7 opened this issue Feb 28, 2021 · 11 comments

Comments

@vytas7
Copy link
Member

vytas7 commented Feb 28, 2021

falcon.uri.encode_value says:

An escaped version of uri, where all disallowed characters have been percent-encoded.

However, it seems that the percent character itself is not encoded...

Cf

>>> from urllib.parse import quote
>>> from falcon import uri
>>> quote('%26')
'%2526'
>>> uri.encode('%26')
'%26'
>>> uri.encode_value('%26')
'%26'
>>> uri.decode(uri.encode_value('%26'))
'&'
>>> uri.decode(quote('%26'))
'%26'
@vytas7 vytas7 added the bug label Feb 28, 2021
@vytas7
Copy link
Member Author

vytas7 commented Feb 28, 2021

Found this when trying to fix #1871...

@vytas7 vytas7 changed the title falcon.uri.encode_value does not encode the percent sign falcon.uri.encode_value does not encode the percent character Feb 28, 2021
@vytas7 vytas7 added good first issue needs contributor Comment on this issue if you'd like to volunteer to work on this. Thanks! labels Feb 28, 2021
@vytas7 vytas7 added this to the Version 3.1 milestone Feb 28, 2021
@MinesJA
Copy link
Contributor

MinesJA commented Mar 2, 2021

I'd be interested in taking this.

@vytas7
Copy link
Member Author

vytas7 commented Mar 2, 2021

That's awesome @MinesJA !
(If you're quick, this might still make it into 3.0.)

@MinesJA
Copy link
Contributor

MinesJA commented Mar 2, 2021

Will try to be quick then!

@vytas7 vytas7 removed the needs contributor Comment on this issue if you'd like to volunteer to work on this. Thanks! label Mar 2, 2021
@MinesJA
Copy link
Contributor

MinesJA commented Mar 4, 2021

So I think I found the cause of the bug, but had some questions on the functionality we're looking for. In this snippet, we check to see if a string has already been encoded, so if we encounter a string like 'abc%26def' we assume the '%26' was a result of the string already being encoded:

  tokens = uri.split('%')
  for token in tokens[1:]:
  hex_octet = token[:2]

  if not len(hex_octet) == 2:
      break

  if not (hex_octet[0] in _HEX_DIGITS and
              hex_octet[1] in _HEX_DIGITS):
      break
  else:
      # NOTE(kgriffs): All percent-encoded sequences were
      # valid, so assume that the string has already been
      # encoded.
      return uri

But it looks like 'quote' doesn't do that. I did notice this comment and was wondering why it was that we think there's a good chance the string has already been escaped by this point?

# NOTE(kgriffs): There's a good chance the string has already
# been escaped. Do one more check to increase our certainty.

@vytas7
Copy link
Member Author

vytas7 commented Mar 4, 2021

Heh, we need to check with @kgriffs then.

However, personally, I think this is a bug regardless of the good intentions mentioned above.
There might be cases where one wants to encode the whole URI as a parameter, and pass it along with other query parameters. (see, for instance this exotic use case: #1856).

Not encoding already escaped characters could then mangle that URI in the same way as, for instance, to_query_str was doing in #1871.

@kgriffs thoughts?

@kgriffs
Copy link
Member

kgriffs commented Mar 4, 2021

Here is the original issue: #68

Admittedly my original attempt at solution was implemented at the wrong layer; encoder() itself shouldn't short-circuit the encoding, at least not by default.

That being said, we should probably minimize breaking changes by retaining the current behavior for these Response properties/methods:

location
content_location
append_link()

@MinesJA
Copy link
Contributor

MinesJA commented Mar 4, 2021

Ok cool, do you think the best way to handle this would be to split the method to retain current behavior for those properties/methods but expose general encode / encode_value methods that don't perform that check? I can submit a pr that demonstrates that if it's easier to continue the discussion on an actual proposal.

@vytas7
Copy link
Member Author

vytas7 commented Mar 4, 2021

@MinesJA Alternatively, we could probably add a new parameter to these methods controlling this behaviour (to detect and ignore existing escapes). The above mentioned properties could then use the new parameter to retain the backwards compatible behaviour.

@MinesJA
Copy link
Contributor

MinesJA commented Mar 4, 2021

Ahh, that's much simpler. Ok, let me try to put something together for review.

MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 7, 2021
Change encode to escape percent by default even if percent
appears to have already been escaped.
Add check_is_escaped flag to allow for option to retain
previous behavior of ignoring strings that appeared escaped.
Use check_is_escaped=True behavior where encode and encode_value
are used in response.

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 7, 2021
Split tests into already existing encode and encode_value
tests.
Revert accidental formatting changes.
Fix comments.
Tweak doc changelog.

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 7, 2021
Fix trailing white spaces.
Makes cosmetic changes.

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 8, 2021
add additional check_is_escaped to expand code coverage

closes: falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 8, 2021
revert unnecssary tox.ini change

part of fix closing falconry#1872
@MinesJA
Copy link
Contributor

MinesJA commented Mar 9, 2021

@kgriffs @vytas7 Submitted a pr for this. Let me know what you think.

MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 15, 2021
create two new encode methods for encode and encode value
that also check for escapes

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 15, 2021
update docs to reflect 2 added encode
methods, encode_check_escaped and encode_value.
edit method names and doc strings

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 15, 2021
update bugfix message to explain change.
remove unnecessary comment.

closes falconry#1872
MinesJA added a commit to MinesJA/falcon that referenced this issue Mar 15, 2021
add imports to falcon uri to satisfy doc test

closes falconry#1872
@vytas7 vytas7 closed this as completed in f895ca6 Mar 16, 2021
@vytas7 vytas7 modified the milestones: Version 3.1, Version 3.0 Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants