
Support for maximum body size and Content-Encoding #1571

Closed · wants to merge 5 commits

Conversation

@nnposter commented Apr 24, 2019

This patch combines the implementations of both #473 and #1493. Due to the circumstances, it was not practical to break the two features apart (though it can be done if absolutely necessary).

Content-Encoding

  • The library now transparently handles the Content-Encoding header, recognizing the identity and gzip encodings.
  • The body member of the HTTP response object now contains the processed (decoded) body.
  • The original body, as received from the server, is preserved in a new member, rawbody.
  • A corrupted encoded body is treated like any other response error, including the response status being nil.
  • An unsupported encoding does not trigger an error; it simply results in an undecoded body. Note that this body might not be the same as rawbody because other encodings might have already been processed.
  • New response member decoded contains a list of content encodings that were successfully processed.
  • New response member undecoded contains a list of encodings that could not be processed, either because they are not currently supported or because the body is corrupt. In other words, a body was successfully decoded if this list is empty (or nil, if no encodings were used in the first place).
  • The returned content-encoding and content-length headers are adjusted to remain consistent with body. (If all encodings got processed then the content-encoding header is removed altogether, which can serve as an alternative test for decoding problems.) A short usage sketch follows this list.
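
For illustration, a minimal sketch of how a script might consume these new members. The http.get call is the library's existing request function; the fetch_decoded helper name is made up for this example:

local http = require "http"
local stdnse = require "stdnse"

-- Hypothetical helper showing the new decoding-related members.
local function fetch_decoded (host, port, path)
  local response = http.get(host, port, path)
  if not response.status then return nil end
  if response.undecoded and #response.undecoded > 0 then
    -- At least one encoding was unsupported or corrupt,
    -- so response.body is not fully decoded.
    stdnse.debug1("Undecoded encodings: %s",
                  table.concat(response.undecoded, ", "))
  end
  -- response.body holds the processed body;
  -- response.rawbody holds the original bytes from the server.
  return response.body
end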

Body Size Limit

  • A newly implemented library parameter, http.max-body-size, defines the maximum response body size (in bytes) that the library is willing to receive. The default value is 2 MB. The limit can be disabled completely by setting the value to -1. Note that the limit is enforced by default.
  • This behavior can be overridden case-by-case with request option max_body_size, with its values having the same meaning.
  • This enforcement applies to raw bodies as they are received, regardless of the method used (Content-Length, Transfer-Encoding, or connection termination).
  • This enforcement also applies when bodies are getting decoded according to Content-Encoding. This provides protection against the so-called zip bomb.
  • Bodies that exceed the limit are still made available in truncated form through the returned response object. (See below for details.) The truncated body is guaranteed to match the size limit exactly.
  • The response object has its status member set to nil. In other words, the response is composed as a failure, so scripts will not use it unless they explicitly want to.
  • Just as with status, member body is nil, so scripts will not accidentally use the truncated body.
  • In the case of any response processing error, not just an oversized body, a newly implemented member incomplete contains the partially built response object. This way scripts can, for example, inspect the response status code (preserved as response.incomplete.status) even if the header or body parsing failed.
  • In summary, the truncated body is available as response.incomplete.body, and scripts that do not care about truncation can obtain the body with the expression response.body or response.incomplete.body. (A usage sketch follows this list.)
  • By setting a newly implemented library parameter http.truncated-ok to true, it is possible to suppress the oversized-body error, treat the response object as a success, and return the truncated body in the body member as usual. In other words, member incomplete becomes the response itself, and a newly implemented response member truncated is set to true.
  • Library parameter http.truncated-ok can be overridden case-by-case with request option truncated_ok.
  • Truncated responses are never cached.
  • By enforcing the body size limit, the library would stop receiving socket data when this limit is reached, leaving the open socket in a state unfit for further use. In other words, the library would not try to "suck up" and discard the remainder of the body.
  • When using a pipeline, the connection is restarted, beginning with the next request. This means that one or more requests following a truncated response could be submitted more than once. To compensate for this behavior, either do not use pipelining for state-changing requests or set parameter http.max-pipeline to 1, which still uses persistent connections but processes only one request at a time.
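
For illustration, a minimal sketch of the per-request overrides and truncation handling described above. The option and member names come from this PR; the fetch_limited helper is hypothetical:

local http = require "http"

-- Hypothetical helper showing max_body_size and truncated_ok.
local function fetch_limited (host, port, path)
  local options = {
    max_body_size = 1024 * 1024, -- cap this request at 1 MB
    truncated_ok = true,         -- accept a truncated body as a success
  }
  local response = http.get(host, port, path, options)
  if response.status then
    -- With truncated_ok set, member truncated flags an oversized body.
    return response.body, response.truncated
  end
  -- Without truncated_ok an oversized body is a failure; the partial
  -- object, including the truncated body, is under response.incomplete.
  return response.incomplete and response.incomplete.body
end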

Please review and comment.

@dmiller-nmap commented Apr 27, 2019

Thanks for this important work! I have a quick question before I review: how does this work in conjunction with HTTP pipelining? You said "leaving the open socket in a state unfit for further use," but if we're expecting to receive more responses on the socket, we'd have to continue to discard the rest of the body to get to the next response.

@nnposter (Author) commented Apr 27, 2019

Discarding the oversized body just to get to the next response would IMHO defeat the purpose of the size limit. I am instead restarting the connection. You can see it on line 2010:

 if connsent >= connlimit or resp.truncated or not socket:get_info() then
@dmiller-nmap left a comment

This looks like really great stuff. I left a few comments, but only minor things need to change. Looking forward to seeing this committed!

nselib/http.lua (outdated diff):
maxlen = maxlen - #part
if maxlen < 0 then
  table.insert(parts, part:sub(1, maxlen - 1))
  return nil, ERR_OVERSIZED_BODY, table.concat(parts)

@dmiller-nmap commented May 17, 2019

In a future revision, we might consider simply receiving and discarding the rest of the body in this case, at least while pipelining. It would take more time and bandwidth than just dropping the connection, but it would avoid missing later response bodies in the pipeline. I'm not going to suggest you hold up this PR just to make this change, but consider it for future improvement.

@nnposter (Author) commented May 19, 2019

At the moment I am leaning against this alternative. In my opinion we should prevent getting stuck with potentially infinite responses.

If somebody wants to take advantage of the persistent connection while avoiding resubmitted requests, it is always possible to set http.max-pipeline=1.
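
For context, a sketch of a pipelined fetch built on the library's existing pipeline_add/pipeline_go functions (the fetch_many helper is made up). Running with http.max-pipeline=1 keeps the connection persistent while avoiding resubmission after a truncated response:

local http = require "http"

-- Queue several GET requests on one persistent connection.
local function fetch_many (host, port, paths)
  local queue
  for _, path in ipairs(paths) do
    queue = http.pipeline_add(path, nil, queue, "GET")
  end
  -- If one response gets truncated, the connection is restarted and the
  -- requests after it may be sent again; http.max-pipeline=1 avoids that.
  return http.pipeline_go(host, port, queue)
end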


nnposter added some commits May 19, 2019

@nnposter (Author) commented May 19, 2019

@dmiller-nmap Thank you for reviewing the PR. Please let me know if you have any other questions or concerns.

@dmiller-nmap commented May 20, 2019

No other questions. Go ahead with this and we'll see how the community likes it! I'm guessing you're already using it in your own scans.

@nnposter (Author) commented May 21, 2019

Yes, I have been using the size limit for about a month and the gzip support for an additional two months. That said, I recognize that this is a non-trivial change to code that underpins a large variety of scripts, so I fully expect that bugs will crop up.

Committed as r37627 (df26932).

@nnposter (Author) commented Jun 10, 2019

Commit r37627 (df26932) is now considered sufficiently stable.

@nnposter closed this Jun 10, 2019
