Allow Receiver to accept requests if both the hash and payload are missing #49

Merged
merged 3 commits into from
Jan 9, 2019

Conversation

jcwilson
Contributor

@jcwilson jcwilson commented Oct 14, 2018

Fixes #51

It is an acceptable and expected use case that the sender not provide a content
hash when no content is available. The most common example of this would be
conventional GET requests.

These changes skip the receiver's content hash validation if there is no content to
hash, even if accept_untrusted_content is False.

See the js reference implementation here and here

@jcwilson jcwilson force-pushed the master branch 2 times, most recently from 424269f to 933d9ea Compare October 15, 2018 07:49
-wheel >= 0.29.0
-twine >= 1.6.5
+wheel == 0.29.0
+twine >= 1.6.5, < 1.8
Contributor Author

These changes were necessary to continue to support py26

- python: "3.7"
env: TOXENV=py37
sudo: required
dist: xenial
Contributor Author

hope this is ok

@kumar303
Owner

Hi @jcwilson, thanks for spending time on a patch. Just so I understand it better, can you give me an example of an exception you were running into and what the request looked like? I'm confused because I thought we already handle this case for GET requests. Were you encountering this for non-GET requests?

@jcwilson
Contributor Author

jcwilson commented Oct 17, 2018

Sure thing!

Just to be sure, I personally wasn't encountering this for non-GET requests, but I don't think it's a requirement that non-GET requests have a payload. For example, I wouldn't expect DELETE to have one, and I can imagine scenarios where POST wouldn't necessarily provide a payload either.

And the current implementation does handle GET requests, it's just that it imposes a constraint on the sender to provide a hash of a non-existent payload and to use and provide an irrelevant Content-Type header on the request, or have both sender and receiver "agree" on what content_type value to use if the payload is empty and the Content-Type header is missing. But in all cases the Receiver enforces that a hash value is always present in the Hawk header.

Here's a code sample that raises a MissingContent exception in the Receiver for what I believe would be a valid hawk-authed request (pretending that the https://iana.org site is protected with hawk):

import mohawk

CREDENTIALS = {
    'joe': {
        'id': 'joe',
        'key': 'supersecret',
        'algorithm': 'sha256'
    }
}
URL = 'https://www.iana.org/domains/reserved'
METHOD = 'GET'

sender = mohawk.Sender(
    credentials=CREDENTIALS['joe'],
    url=URL,
    method=METHOD,
    always_hash_content=False  # Omit the hash from the Hawk header
)

# Raises MissingContent
receiver = mohawk.Receiver(
    credentials_map=lambda user: CREDENTIALS[user],
    request_header=sender.request_header,
    url=URL,
    method=METHOD)

print(receiver.parsed_header)

And here's the relevant debug output from the curl -v https://www.iana.org/domains/reserved command. Notice the lack of Content-Type on the request (and lack of payload, of course):

> GET /domains/reserved HTTP/1.1
> Host: www.iana.org
> User-Agent: curl/7.59.0
> Accept: */*
>

< HTTP/1.1 200 OK
< Date: Wed, 17 Oct 2018 23:25:29 GMT
< X-Frame-Options: SAMEORIGIN
< Referrer-Policy: origin-when-cross-origin
< Content-Security-Policy: upgrade-insecure-requests
< Vary: Accept-Encoding
< Last-Modified: Tue, 21 Jul 2015 00:49:48 GMT
< Cache-control: public, s-maxage=900, max-age=7202
< Expires: Thu, 18 Oct 2018 01:25:29 GMT
< Content-Type: text/html; charset=UTF-8
* Server Apache is not blacklisted
< Server: Apache
< Strict-Transport-Security: max-age=48211200; preload
< X-Cache-Hits: 29
< Accept-Ranges: bytes
< Content-Length: 10225
< Connection: keep-alive
<<html content follows here>>

The changes in this PR would allow the Receiver to successfully validate the request and continue on to the print() statement.

One might point out that we'd get the same behavior just by providing accept_untrusted_content=True to the Receiver, but that would either be too broad, allowing a client to omit the hash on any call and bypass server-side enforcement, or unnecessarily cumbersome for the user of this library to check for the presence of the hash and content in their application in order to decide what value to provide for accept_untrusted_content.

@jcwilson
Contributor Author

Is there anything else that I can do to help get this accepted?

@aj-sp

aj-sp commented Oct 29, 2018

We have a similar problem. Hawkrest library relies on the default behavior of Mohawk, which expects all payload requests to be hashed, including GETs. This change would be greatly appreciated.

@jcwilson
Contributor Author

jcwilson commented Nov 2, 2018

I have started the work to add hawk (via mohawk) as a supported backend for falcon-auth, but I'm considering this a blocker for that work to proceed.

Owner

@kumar303 kumar303 left a comment

Sorry for the delayed review and thanks for your patience.

Originally, I intended not to allow the behavior you're requesting because I didn't think it was necessary. A client that is passing an empty payload can simply hash an empty payload. I see how this may be inconvenient, though, so I'm willing to accept a patch for it.

However, we need to minimize the footguns here. The NodeJS library you linked to has a lot of footguns (in my opinion) which is also why I did not model the mohawk interface on theirs. I have requested some changes to remove the footguns.
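For context on "a client ... can simply hash an empty payload": the sketch below shows, using only the standard library, how a Hawk payload hash is generally computed (a `hawk.1.payload` line, the bare media type, then the body, each newline-terminated). `hawk_payload_hash` is a hypothetical helper name for illustration, not part of mohawk's API, and treating `None` as empty is an assumption taken from the coercion discussed later in this thread.

```python
import base64
import hashlib


def hawk_payload_hash(content, content_type, algorithm="sha256"):
    """Return a base64 payload hash in the Hawk style (illustrative helper)."""
    # Assumption: None is treated the same as an empty body / empty type.
    if content is None:
        content = b""
    if isinstance(content, str):
        content = content.encode("utf-8")
    # Only the media type is hashed: drop parameters such as charset.
    media_type = (content_type or "").split(";")[0].strip().lower()

    digest = hashlib.new(algorithm)
    digest.update(b"hawk.1.payload\n")
    digest.update(media_type.encode("utf-8") + b"\n")
    digest.update(content + b"\n")
    return base64.b64encode(digest.digest()).decode("ascii")


# An empty payload still produces a well-defined hash value.
empty_hash = hawk_payload_hash("", "")
```

The inconvenience raised in this PR is not the hashing itself but that sender and receiver must agree on a `content_type` for a body that does not exist.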

mohawk/base.py Outdated
'(no hash in header)')
check_hash = False
content_hash = None
elif (resource.content is EmptyValue):
Owner

EmptyValue means that content has not been given an explicit value. What you meant was to look for actual empty values, such as an empty string or None.

We need to protect against someone defining a receiver object and forgetting to pass in the content keyword.

For example, the following code contains a terrible typo that would allow content tampering, something that the developer did not intend.

request = {
  'headers': {
    'Authorization': sender.request_header,
    'Content-Type': content_type
  },
  'url': url,
  'method': method,
  'content': 'Evil unhashed content'
}

# Whoops! The developer forgot to pass in the `content` keyword. This could have been a mistake.
receiver = Receiver(
  lookup_credentials,
  request['headers']['Authorization'],
  request['url'],
  request['method'],
  content_type=request['headers']['Content-Type'])
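The distinction being drawn here (an argument that was never passed versus one explicitly passed as `None` or `''`) is the usual sentinel-default pattern. A minimal sketch, assuming a stand-in sentinel rather than mohawk's actual `EmptyValue` object; `EMPTY_VALUE` and `classify_content` are hypothetical names:

```python
class _EmptyValueType:
    """Hypothetical stand-in for a 'not provided' sentinel like EmptyValue."""

    def __repr__(self):
        return "EmptyValue"


EMPTY_VALUE = _EmptyValueType()


def classify_content(content=EMPTY_VALUE):
    """Tell apart a forgotten keyword from an explicitly empty value."""
    if content is EMPTY_VALUE:
        # The caller never passed `content`: likely a bug, so a receiver
        # should refuse rather than silently skip the hash check.
        return "omitted"
    if content is None or content == "" or content == b"":
        # The caller explicitly said "there is no body".
        return "empty"
    return "present"
```

With this split, the typo in the example above lands in the "omitted" branch and can be rejected, while a deliberate `content=None` lands in "empty".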

mohawk/tests.py Outdated
always_hash_content=False))

def test_expected_unhashed_empty_content(self):
content = EmptyValue
Owner

When content is EmptyValue, MissingContent should still be raised. In other words, you should keep this test but add @raises(MissingContent) to the top. After that, add two more tests. One where content='' and another where content=None.

docs/usage.rst Outdated
@@ -393,7 +393,10 @@ without a declared hash using ``accept_untrusted_content=True``:

This will skip checking the hash of ``content`` and ``content_type`` only if
the ``Authorization`` header omits the ``hash`` attribute. If the ``hash``
-attribute is present, it will be checked as normal.
+attribute is present, it will be checked as normal. For requests that have no
Owner

Let's document this explicitly. Say that content can be an empty string or content can be None.

@kumar303
Owner

kumar303 commented Nov 2, 2018

I'm still a little confused because I thought I already made a compromise to allow the feature you are requesting in #45 🤔 It's been a while so I'll have to look back at it in detail.

@jcwilson
Contributor Author

jcwilson commented Nov 3, 2018

I'm still a little confused because I thought I already made a compromise to allow the feature you are requesting in #45 🤔 It's been a while so I'll have to look back at it in detail.

I saw that, too, and was hoping that it would suffice, but I couldn't find a valid set of content and content_type parameters to provide to the Receiver for it to accept a request without a payload and hash. I could just be missing it, though.

However, we need to minimize the footguns here. The NodeJS library you linked to has a lot of footguns (in my opinion) which is also why I did not model the mohawk interface on theirs. I have requested some changes to remove the footguns.

Agreed, and I'm happy to help out there :)

I think what would be helpful is to make sure my thinking aligns with your intent and if so, communicate it clearly to the user in the documentation in this PR. I may have misunderstood the intent of the EmptyValue default value in this PR so far, so I'd like to clarify here.

I'll keep the discussion focused on the Receiver use case and the content parameter since we'd only expect & examine the content_type parameter if the Receiver deems it's necessary to hash the content, and I believe the existing checks on its presence are sufficient.

We have essentially three "states" for the content parameter: EmptyValue, None and any other value (including the empty string). The user should never explicitly provide EmptyValue for the content parameter, as we intend for it to indicate that the user erroneously omitted that parameter from the call (correct?). None should indicate that no content was present on the request and that no hashing should occur. Any value other than None should expect the hash value to be present in the Hawk header, and the content will be hashed and validated against that.

Situations that might result in None for content would be things like HEAD, GET, DELETE requests that do not contain an entity body. One can determine this by checking for the absence of the Content-Length or Transfer-Encoding headers. It's not an error to include an entity body on these requests, though, so the user should use the existence of one of those headers when determining the content value.

All POST requests should have an entity body. If the Content-Length is 0 then content would be the empty string.

The logic for determining the content value would be consistent in all cases, though: if Content-Length or Transfer-Encoding are defined, content will be non-None.

Then the hash validation logic becomes:

  • First, consider it an error to pass EmptyValue for content (ie. omit it) when accept_untrusted_content == False - content should be None or the entity body.
  • Then consider it an error if content is given as None and a hash is found in the Hawk header. After all, how can one expect to hash null content? Any client library conforming to the reference JS implementation would not send such a request. I do not think coercing null content to the empty string is the correct course of action. (caveat: if accept_untrusted_content == True, skip hash validation)
  • Finally, if content is None, skip the hash validation, otherwise compare the hashes.
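The three rules above can be sketched as a small decision function. This is only a reading of the proposal as worded in this comment, restricted to the `accept_untrusted_content == False` case; it is not mohawk's code and not what was finally merged. `OMITTED` and `proposed_strict_outcome` are hypothetical names.

```python
OMITTED = object()  # hypothetical stand-in for mohawk's EmptyValue


def proposed_strict_outcome(content, header_has_hash):
    """Outcome of the proposal when accept_untrusted_content is False."""
    # Rule 1: omitting `content` entirely is an error.
    if content is OMITTED:
        return "error: content was not passed"
    if content is None:
        # Rule 2: a declared hash cannot be checked against null content.
        if header_has_hash:
            return "error: hash declared for null content"
        # Rule 3: no content and no hash, so there is nothing to validate.
        return "skip hash validation"
    # Any non-None content (including '') is expected to carry a hash.
    if not header_has_hash:
        return "error: hash expected for non-null content"
    return "compare hashes"
```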

Thank you for being patient with me, too! :)

@kumar303
Owner

kumar303 commented Nov 5, 2018

...we intend for EmptyValue to indicate that the user erroneously omitted that parameter from the call (correct?).

Correct.

None should indicate that no content was present on the request and that no hashing should occur.

Yes.

Any value other than None should expect the hash value to be present in the Hawk header, and the content will be hashed and validated against that.

If you are setting content based on a framework (Django, for example), it might set empty content to an empty string. I think it would be fine to treat an empty string as None since the effect is the same: the receiving end would not rely on content when processing the request.

The logic for determining the content value would be consistent in all cases, though: if Content-Length or Transfer-Encoding are defined, content will be non-None.

Depending on these headers seems out of scope for Receiver; it would be easy to write a wrapper function that sets content to None based on request headers, if it made sense to do so. Anyway, if we treat empty strings as None then it may not even matter.
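A wrapper of the kind mentioned here might look like the following. It is a minimal sketch that lives in the calling application, not in mohawk; `content_for_receiver` is a hypothetical name, and `headers` is assumed to be a plain mapping of header names to values.

```python
def content_for_receiver(headers, body):
    """Pick the `content` value to hand to a receiver (illustrative only).

    Returns None when the request declares no entity body, i.e. when
    neither Content-Length nor Transfer-Encoding is present.
    """
    # Header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    if "content-length" in names or "transfer-encoding" in names:
        # A body was declared, even if it is zero-length.
        return body if body is not None else ""
    return None
```

If empty strings and `None` end up being treated alike, as suggested above, this wrapper becomes optional rather than required.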

All POST requests should have an entity body...

A POST with an empty body is legal. Some apps do this to tell the browser that a destructive operation is happening which will configure caching appropriately.

First, consider it an error to pass EmptyValue for content (ie. omit it) when accept_untrusted_content == False

That seems right.

Then consider it an error if content is given as None and a hash is found in the Hawk header. After all, how can one expect to hash null content?

It is possible to hash an empty string so I think we can convert None to an empty string for this case and attempt to validate the hash.

Finally, if content is None, skip the hash validation, otherwise compare the hashes.

Again, it's possible to hash an empty string, so, if a hash is present, compare it. Otherwise, skip it.

@jcwilson
Contributor Author

jcwilson commented Nov 6, 2018

The logic for determining the content value would be consistent in all cases, though: if
Content-Length or Transfer-Encoding are defined, content will be non-None.

Depending on these headers seems out of scope for Receiver; it would be easy to write a wrapper function that sets content to None based on request headers, if it made sense to do so.

Yeah, that's fair. I should have clarified that I'd expect the caller to do this work, not mohawk.

Anyway, if we treat empty strings as None then it may not even matter.

My concern with this is that it's a departure from the reference JS implementation where it checks for payload content (even if it's just the empty string) and then errors if the hash is not present.

All POST requests should have an entity body...

A POST with an empty body is legal. Some apps do this to tell the browser that a destructive operation is happening which will configure caching appropriately.

Yep, and empty (as opposed to non-existent) to me would mean a content value of the empty string and as such would warrant the hashing. But I think this becomes a moot point with your suggestion to coerce None to the empty string when a hash is present. At the very least, it would correctly handle requests from a reference-conforming client.

I think I'm ok with coercing None to the empty string if the hash is present (same would apply for content_type I would imagine). I don't think there is a need to coerce the empty string to None yet (maybe there would be some checks that accept both None and '', though). I'll need some time to put all this together into a new PR with the new suggestions.

@kumar303
Owner

kumar303 commented Nov 6, 2018

My concern with this is that it's a departure from the reference JS implementation where it checks for payload content (even if it's just the empty string) and then errors if the hash is not present.

ok, I see. Yes, I suppose it would make sense to also raise an error for this case.

I'll need some time to put all this together into a new PR with the new suggestions.

Sounds good! No rush.

…e missing

It is an acceptable and expected use case that the sender not provide a content
hash when no content is available. The most common example of this would be
conventional GET requests.

These changes omit the content hash validation if there is no content to hash,
even if `accept_untrusted_content` is `False`.
@jcwilson
Contributor Author

Ok, I think I've wrapped this up. It turns out that once we treat None and '' as equivalent values, a lot of the confusion falls away. There's still the divergence from the reference implementation, but I think that's out of scope for this PR and IMO isn't of that much consequence even if it's never addressed.

I've replaced the unexpected unhashed content test with two tests: one for the None case and one for the empty string case.

Thanks for looking.

@jcwilson
Contributor Author

bump :)

@kumar303
Owner

kumar303 commented Dec 28, 2018 via email

@jcwilson
Contributor Author

No problem and no rush from my end. Thanks for the update and enjoy your vacation.

Owner

@kumar303 kumar303 left a comment

Thanks again for your patience. I requested some minor cleanup but I also wanted your feedback on whether or not we need to check content_type. I'm not sure. Empty content is empty content.

mohawk/base.py Outdated
# It is acceptable to not receive a hash if there is no content
# to hash.
log.debug('NOT calculating/verifiying payload hash '
'(no hash in header, but no content either)')
Owner

Since all that matters here is that there's an empty body, I suggest this: "NOT calculating/verifying payload hash (request body is empty)"

Contributor Author

I went with (no hash in header, request body is empty) to be consistent with the one below. I can change it back to this suggestion if you wish, though.

mohawk/base.py Outdated
# content_type values will be coerced to the empty string for
# hashing purposes.
log.debug('NOT calculating/verifiying payload hash '
'(no hash in header)')
Owner

To make this super clear, how about adding: "(no hash in header, accept_untrusted_content=True)"

mohawk/tests.py Outdated
# the payload when in fact there is literally no content. In this case,
# mohawk depends on the presence of the content hash in the auth header
# to determine how to treat the empty strings: no hash in the header
# implies that no hashing is expected to occur on the server.
self.receive(sender_kw=dict(content=EmptyValue,
Owner

This test relies on default params and those values are important. I suggest setting them explicitly:

diff --git a/mohawk/tests.py b/mohawk/tests.py
index 7d117ed..a784882 100644
--- a/mohawk/tests.py
+++ b/mohawk/tests.py
@@ -642,7 +642,9 @@ class TestReceiver(Base):
         # mohawk depends on the presence of the content hash in the auth header
         # to determine how to treat the empty strings: no hash in the header
         # implies that no hashing is expected to occur on the server.
-        self.receive(sender_kw=dict(content=EmptyValue,
+        self.receive(content='',
+                     content_type='',
+                     sender_kw=dict(content=EmptyValue,
                                     content_type=EmptyValue,
                                     always_hash_content=False))
 

mohawk/tests.py Outdated
@raises(MisComputedContentHash)
def test_unexpected_unhashed_content(self):
def test_expected_unhashed_empty_content(self):
# The receiver will receive empty strings for content and content_type
Owner

This comment would read better to me if it was clearly about the test itself. I suggest:

This test sets up a scenario where the receiver will receive empty strings...

mohawk/tests.py Outdated
self.receive(sender_kw=dict(content=EmptyValue,
content_type=EmptyValue,
always_hash_content=False))

def test_expected_unhashed_no_content(self):
# The receiver will receive None for content and content_type and no
Owner

Similarly, this comment would read better to me with a preamble:

This test sets up a scenario where the receiver will receive None for content...

mohawk/base.py Outdated
if not resource.content and not resource.content_type:
# It is acceptable to not receive a hash if there is no content
# to hash.
log.debug('NOT calculating/verifiying payload hash '
Owner

typo: verifiying (I think it was in the old code)

mohawk/base.py Outdated
# Allow the request, even if it has content. Missing content or
# content_type values will be coerced to the empty string for
# hashing purposes.
log.debug('NOT calculating/verifiying payload hash '
Owner

typo: verifiying (again)

log.info('request unexpectedly did not hash its content')
if 'hash' not in parsed_header:
# The request did not hash its content.
if not resource.content and not resource.content_type:
Owner

Does this really need to check content_type? As far as I can tell, the Hawk reference implementation does not check it for this case.

What I mean is: this line could maybe just be:

if not resource.content:
    log.debug('NOT calculating...')

If you think it does need to check content_type then we should at least add a test case for it:

diff --git a/mohawk/tests.py b/mohawk/tests.py
index 7d117ed..8ab4249 100644
--- a/mohawk/tests.py
+++ b/mohawk/tests.py
@@ -658,6 +658,14 @@ class TestReceiver(Base):
                                     content_type=EmptyValue,
                                     always_hash_content=False))
 
+    @raises(MisComputedContentHash)
+    def test_cannot_receive_partially_empty_content(self):
+        self.receive(content=None,
+                     content_type='text/plain',
+                     sender_kw=dict(content=EmptyValue,
+                                    content_type=EmptyValue,
+                                    always_hash_content=False))
+
     @raises(MissingContent)
     def test_cannot_receive_empty_content_only(self):
         content_type = 'text/plain'

Contributor Author

@jcwilson jcwilson Jan 8, 2019

I believe the content_type check is necessary for the cases where the server would expect a hash and the content == '' and the content_type is an actual non-empty string value. I've updated the tests to reflect this intent.

I think the only weird scenario now is test_expected_unhashed_no_content_with_content_type() but handling that differently would require us to revisit the coercing decision, which I think is outside the scope of this work.
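To make the resulting behavior easier to follow, here is a standalone sketch of the decision being discussed. It mirrors the `if not resource.content and not resource.content_type` condition quoted in the review and the earlier request that an omitted `content` still raise `MissingContent`; everything else (the `OMITTED` sentinel, the locally defined exception classes, and the `should_verify_hash` name) is hypothetical and not mohawk's actual implementation.

```python
OMITTED = object()  # hypothetical stand-in for mohawk's EmptyValue


class MissingContent(Exception):
    """Local stand-in for the exception named in this thread."""


class MisComputedContentHash(Exception):
    """Local stand-in for the exception named in this thread."""


def should_verify_hash(content, content_type, header_has_hash,
                       accept_untrusted_content=False):
    """Return True to verify the payload hash, False to skip it."""
    # Forgetting to pass content/content_type stays an error in strict mode.
    if not accept_untrusted_content and (
            content is OMITTED or content_type is OMITTED):
        raise MissingContent("content and content_type must be passed")

    if header_has_hash:
        # A declared hash is always checked; None would be coerced to ''.
        return True

    if not content and not content_type:
        # No hash in header, request body is empty: nothing to verify.
        return False
    if accept_untrusted_content:
        # No hash in header, but the caller opted in to untrusted content.
        return False
    # Content or content_type is non-empty and no hash was supplied.
    raise MisComputedContentHash("request did not hash its content")
```

Under this sketch, empty content with a non-empty `content_type` and no hash is rejected, which is the case the `content_type` check is meant to cover.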

@@ -3,9 +3,9 @@ mock >= 1.0.1
nose >= 1.3.0

# For documentation.
-Sphinx >= 1.2.1
+Sphinx >= 1.2.1, < 1.5
Owner

I couldn't get the docs to build but I was confused that the PR was green. I discovered that tox isn't triggering the html builder because of a typo (you can't specify -b more than once). Oops. Can you add this to your patch? It's working for me after this. I could put it on master but it will create a conflict.

diff --git a/requirements/dev.txt b/requirements/dev.txt
index 674106b..b34c439 100644
--- a/requirements/dev.txt
+++ b/requirements/dev.txt
@@ -3,6 +3,7 @@ mock >= 1.0.1
 nose >= 1.3.0
 
 # For documentation.
+docutils < 0.13.1
 Sphinx >= 1.2.1, < 1.5
 sphinx-rtd-theme >= 0.1.5
 
diff --git a/tox.ini b/tox.ini
index 7a40af1..d018a59 100644
--- a/tox.ini
+++ b/tox.ini
@@ -18,4 +18,5 @@ basepython=python2.7
 changedir=docs
 deps={[base]deps}
 commands=
-    sphinx-build -b html -b doctest -d {envtmpdir}/doctrees .  {envtmpdir}/html
+    sphinx-build -b html -d {envtmpdir}/doctrees .  {envtmpdir}/html
+    sphinx-build -b doctest -d {envtmpdir}/doctrees .  {envtmpdir}/doctest

@@ -395,6 +395,15 @@ This will skip checking the hash of ``content`` and ``content_type`` only if
the ``Authorization`` header omits the ``hash`` attribute. If the ``hash``
attribute is present, it will be checked as normal.

For requests whose ``content`` (and by extension ``content_type``) is ``None``
Owner

This is great, thanks. I expanded on it a bit to add more context. Can you add this?

diff --git a/docs/usage.rst b/docs/usage.rst
index 4704d2c..16a7339 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -395,10 +395,17 @@ This will skip checking the hash of ``content`` and ``content_type`` only if
 the ``Authorization`` header omits the ``hash`` attribute. If the ``hash``
 attribute is present, it will be checked as normal.
 
+Empty requests
+==============
+
 For requests whose ``content`` (and by extension ``content_type``) is ``None``
 or ``''``, it is acceptable for the sender to omit the declared hash,
 regardless of the ``accept_untrusted_content`` value provided to the
-:class:`mohawk.Receiver`. If the ``hash`` attribute is present and
+:class:`mohawk.Receiver`.
+For example, a ``GET`` request typically has empty content and some
+libraries may or may not hash the content.
+
+If the ``hash`` attribute *is* present for an empty request and
 ``accept_untrusted_content`` is ``False``, a ``None`` value for either
 ``content`` or ``content_type`` will be coerced to ``''`` prior to hashing.
 This is to account for some dependent libraries that may provide the empty

* Updated documentation with suggested clarifying edits
* Updated logging debug message with suggested clarifying edits
** Did a little massaging for consistency's sake
* Updated tests with further tests and better doc strings
* Updated tox.ini with suggested edits
Owner

@kumar303 kumar303 left a comment

Nice work, thanks again. I'll get a release out shortly. I didn't realize that there were other important changes to release, as well.

@raises(MisComputedContentHash)
def test_unexpected_unhashed_content(self):
self.receive(sender_kw=dict(content=EmptyValue,
def test_expected_unhashed_empty_content_with_content_type(self):
Owner

Thanks for adding this case. I'm not totally sure it's the right thing to do (the content is still empty even though there is a content type) but I'm fine with shipping like this. Someone will complain if it poses a problem.

@kumar303 kumar303 merged commit 9b27096 into kumar303:master Jan 9, 2019
@jcwilson
Contributor Author

jcwilson commented Jan 9, 2019

Thanks for all the help and attention on this. I'd be glad to help address any potential future issues regarding these changes

@kumar303
Owner

kumar303 commented Jan 9, 2019

Awesome -- future help would be great!

I just released this in 1.0.0 https://mohawk.readthedocs.io/en/latest/#changelog
