Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue30458 (HTTP Header Injection) prevents crafting invalid requests #82397

Closed
jaraco opened this issue Sep 18, 2019 · 30 comments
Closed

Comments

@jaraco
Copy link
Member

jaraco commented Sep 18, 2019

BPO 38216
Nosy @gpshead, @jaraco, @larryhastings, @benjaminp, @ned-deily, @mcepl, @ambv, @webknjaz, @ammaraskar, @miss-islington, @tipabu, @tirkarthi
PRs
  • bpo-38216, bpo-36274: Allow subclasses to override validation and encoding behavior #16321
  • bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior #16448
  • [3.8] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) #16460
  • [3.7] bpo-38216, bpo-36274: Allow subclasses to separately override v… #16461
  • [3.6] bpo-38216, bpo-36274: Allow subclasses to separately override v… #16462
  • [3.5] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) #16475
  • [2.7] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) #16476
  • Files
  • 0001-bpo-38216-Only-forbid-CR-LF-and-SP-in-http-URLs.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-08.17:37:30.548>
    created_at = <Date 2019-09-18.16:14:30.701>
    labels = ['release-blocker']
    title = 'Fix for issue30458 (HTTP Header Injection) prevents crafting invalid requests'
    updated_at = <Date 2019-10-08.17:37:30.547>
    user = 'https://github.com/jaraco'

    bugs.python.org fields:

    activity = <Date 2019-10-08.17:37:30.547>
    actor = 'larry'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-08.17:37:30.548>
    closer = 'larry'
    components = []
    creation = <Date 2019-09-18.16:14:30.701>
    creator = 'jaraco'
    dependencies = []
    files = ['48618']
    hgrepos = []
    issue_num = 38216
    keywords = ['patch', '3.6regression', '3.7regression']
    message_count = 30.0
    messages = ['352738', '352759', '352802', '352827', '352830', '352831', '352839', '352847', '352850', '352870', '352899', '352912', '352930', '352937', '352982', '352983', '352988', '352990', '353001', '353040', '353344', '353447', '353449', '353453', '353464', '353481', '353484', '353501', '354159', '354225']
    nosy_count = 12.0
    nosy_names = ['gregory.p.smith', 'jaraco', 'larry', 'benjamin.peterson', 'ned.deily', 'mcepl', 'lukasz.langa', 'webknjaz', 'ammar2', 'miss-islington', 'tburke', 'xtreak']
    pr_nums = ['16321', '16448', '16460', '16461', '16462', '16475', '16476']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38216'
    versions = ['Python 2.7', 'Python 3.5']

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 18, 2019

    The fix for bpo-30458 prevents any request line from transmitting non-ascii characters. In some cases, it's useful to transmit these illegal bytes in order to simulate a maliciously-crafted request (such as to ensure a web server responds correctly to such a request). This limitation is a regression from previous behavior that allowed transmission of such invalid requests.

    For example, consider this comment, where tests in CherryPy were disabled due to emergent failures against nightly builds.

    Originally, I reported this issue in bpo-36274, but I believe this issue is distinct and has a different timeline from that issue.

    In this comment, xtreak suggests that the proper solution might be not to transmit raw invalid characters but to actually rely on URL quoting to "transmit" those bytes. While that does seem to exercise CherryPy's error detection properly, I believe it still fails to exercise the receipt and handling of raw invalid bytes, but it does provide a possible satisfactory mitigation to this issue.

    @ned-deily
    Copy link
    Member

    Thanks for identifying this issue and breaking it out into a separate bpo, Jason. If I understand correctly, the problematic fix for bpo-30458 has already been released in maintenance release 3.7.4 and security release 3.6.9, is in the current security release candidate 3.5.8rc1, as well as 3.8.0b4, and, without further action, will be in 2.7.17rc1 and continue to be in 3.7.5rc1. In other words, this issue potentially affects all currently maintained Python branches and/or releases. (In addition, there appear to be still unresolved questions about the original bpo-30458 and the CVE's associated with it. But let's ignore those here. My brain hurts enough already.)

    The immediate question for me is what to do about 3.7.5. We could:

    1. hold 3.7.5rc1 for a mitigation fix
    2. release 3.7.5rc1 and accept a fix for 3.7.5final or for an unplanned 3.7.5rc2
    3. fix in 3.7.6
    4. do nothing other than possibly a doc change

    Since 3.5.8rc1 is already released for testing, a similar decision needs to be made for it.

    And 3.8.0rc1 and 2.7.17rc1 are schedulded for tagging om the coming weeks.

    Since the problem. as best I understand, is most likely to impact tests rather than legitimate user cases (is that correct?) and, since at least some projects and users of 3.7.4 impacted by the change have developed workarounds, and since 3.7.5rc1 is being delayed pending a resolution of this, I think the best options for 3.7.5 at this point are either 2 or 3 above. So, unless someone expresses a major objection in the next few hours, I am going to proceed with 3.7.5rc1 as is with the hope that we will have final resolution prior to 3.7.5 final.

    Decisions will still have to be made by the other RMs for their branches.

    @ned-deily ned-deily added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes release-blocker labels Sep 18, 2019
    @larryhastings
    Copy link
    Contributor

    FWIW I planned to tag and release 3.5.8 final early next week. I don't have the domain knowledge to assess the severity of this bug--much less pitch in and help fix it--so I suspect this will simply hold up 3.5.8 final.

    Depending on the complexity of the fix for this issue, I may also insert a second rc into the 3.5.8 schedule.

    @gpshead
    Copy link
    Member

    gpshead commented Sep 20, 2019

    What's needed here is a Decision. (release managers and steering councils make those)

    IMNSHO, this regression is intentional and does not feel like a bug.

    The Python HTTP APIs were never designed with an explicit intent to allow violations of the protocol. That it was "allowed" was an accident because the very old original API design didn't go out of its way to check much. That doesn't mean it was intended. Hyrum's Law applies as the (lack of) behavior has been around forever. So of course someone has written code that depended on that. But it doesn't make said code right.

    We either "be strict in what we produce" or we ignore abuse of the APIs and effectively close all CVEs filed against them as "not a bug, this library is intended to allow abuse when given untrusted input."

    If we take the latter route and intended to allow such things, we'd need to explicitly document it as such and guarantee their behaviors. As noted, disallowing it has already shipped in two stable releases.

    When writing tests of particular out of spec queries such that servers can implement good "be lenient in what you accept" behavior, it is better not to depend on specific behaviors of a given http client library to allow you to do so for such tests.

    If the decision requires code changes, do not expect me to spend time creating them. Please feel free to loop me in on PR reviews but I'd rather not review anything without release manager approval of the direction they take things in the first place.

    @ned-deily
    Copy link
    Member

    Thanks for your comments, Greg. Here's my take as release manager for 3.7 (and for 3.6). What bothers me here is that we apparently changed de facto behavior between maintenance releases, in the middle of 3.7's lifecycle, without warning, no doubt because we didn't realize it would break third-party packages. But it has and that's a big no-no. A very important part of our maintenance strategy is that we implicitly promise our users that they can easily upgrade from any older release in a release family to the most current release without fear of incompatibilities (e.g. 3.7.0 to 3.7.4). In return for that, we will only provide fixes for the most recent maintenance release in a family (e.g. once 3.7.5 is released, 3.7.4 is dead and 2.7.3 through 3.7.0 were already dead). So it seems here we have violated that compatibility promise for 3.7 and 3.6 and are about to do so for 3.5 and 2.7. Since at least one project is known to have been impacted, it's not unreasonable to expect that more will be. So I think we should avoid such breakage and undo the change in behavior for 3.7 (and the older releases as well).

    Now, as for 3.8, as it hasn't released yet, we have more latitude and, although we're close to producing an RC, it may still be OK to document the changed behavior there. That should be Łukasz's call.

    Does that make sense?

    @ammaraskar
    Copy link
    Member

    What bothers me here is that we apparently changed de facto behavior between maintenance releases, in the middle of 3.7's lifecycle, without warning, no doubt because we didn't realize it would break third-party packages.

    Arguably, I think the programs that are affected by this vulnerability far outnumber the amount of third-party packages that will be broken. The trade-off here seems to be between the promise of compatibility and the promise of security, choosing compatibility strikes me as odd.

    @tirkarthi
    Copy link
    Member

    For reference this is how urllib handled the same issue in their test suite : urllib3/urllib3#1673.

    golang also received similar regression report as the change landed in 1.11.6 and discussion : golang/go#30903

    @gpshead
    Copy link
    Member

    gpshead commented Sep 20, 2019

    All bug fixes are behavior changes. Any broken behavior can be relied upon by someone.

    So far the only ones who have popped up with this change as being a problem is one project's test suite where the behavior was used by a test because it was a convenient hack. Not by application code. Reverting mitigations due to the undesired behavior being convenient for someones protocol-abuse tests seems over cautious. Those can be updated to not use our client library for their abuse scenarios.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 20, 2019

    Thanks for all the comments. I agree the current (secure by default) implementation is desirable. I also agree that such usage was never explicitly supported, so the "regression" here is perhaps over-stated. What I seek is to avoid the Go recommendation of "fork the implementation" when a lightweight hook could be provided as a means to achieve a reasonable, if unusual and discouraged, behavior. In particular, I'd like to support:

    Sending a single request with invalid bytes for the path.
    Allowing all requests for a client to support invalid bytes on the path.

    Ideally, the solution should allow sending other non-control bytes as well, supporting the use case reported in bpo-36274 also.

    I'll draft a patch. I'll try to get to it this weekend, but if I don't, don't feel like this needs to block releases.

    @tipabu
    Copy link
    Mannequin

    tipabu mannequin commented Sep 20, 2019

    Since at least one project is known to have been impacted, it's not unreasonable to expect that more will be.

    I can confirm at least one other: OpenStack Swift's stable jobs have been broken by bb8071a since 11 Sep; see https://bugs.launchpad.net/swift/+bug/1843816 . (Why we're testing against an untagged version of py27, I still need to investigate...)

    Now, the broken tests *do* represent something of a long-standing bug in application code: our front-end server was lenient in what it accepted but not strict in what it emitted -- it would forward on whatever crazy query params the client sent, directly to the back-end servers. Of course, as long as the back-end servers are *also* lenient in what they accept, it doesn't actually present a problem. Still, I'm happy to fix the issue.

    Actually *addressing* this has proven problematic, however, as we have numerous functional tests that send raw UTF-8 request paths -- to the point that I can only assume there exist clients sending such requests. Changing the tests to URL-encode the path would do a disservice to those clients, as my fix (or some future fix) may break them and we'd be none the wiser.

    We either "be strict in what we produce" or we ignore abuse of the APIs and effectively close all CVEs filed against them as "not a bug, this library is intended to allow abuse when given untrusted input."

    I think this is a false dichotomy; in https://bugs.python.org/issue36274#msg351834 Jason proposed a few alternatives that allow for a secure and obvious default API while adding a new, explicitly unsafe API.

    I'd like to add yet another option that may be useful specifically for maintenance releases: forbid only the problematic characters -- namely LF (and potentially CR and SP). This seems like a much more surgical fix for maintenance releases, allowing the null byte for CherryPy or the raw UTF-8 bytes for Swift, while still mitigating the CVE.

    3.8 and later can still have the more-robust check to ensure callers have done any requisite URL-escaping. Ideally, there would also be a reasonable hook in place to allow application developers to verify behavior in the presence of invalid requests; something short of writing raw bytes to a socket.

    For reference this is how urllib handled the same issue in their test suite : urllib3/urllib3#1673.

    I think the test suites developed by clients (like urllib3) and servers (like CherryPy or Swift) inherently have very different goals -- where a client would want to ensure that it *produces valid bytes* on the wire, a server ought to check that it doesn't completely fall over when it *receives invalid bytes* on the wire.

    What I seek is to avoid the Go recommendation of "fork the implementation" ...

    This is *exactly* the way we worked around https://bugs.python.org/issue36274 in Swift: swap out putrequest() in our func tests. (See openstack/swift@c0ae48b .) It looks like a similar change will be required to fix our stable branches.

    I agree, I'd much rather *not* have to do that. :-/

    @vstinner vstinner changed the title Fix for issue30458 prevents crafting invalid requests Fix for issue30458 (HTTP Header Injection) prevents crafting invalid requests Sep 20, 2019
    @gpshead
    Copy link
    Member

    gpshead commented Sep 20, 2019

    I think this is a false dichotomy; in https://bugs.python.org/issue36274#msg351834 Jason proposed a few alternatives that allow for a secure and obvious default API while adding a new, explicitly unsafe API.

    I'm not against that concept, but it is only appropriate for >= 3.9 as that'd be adding a feature. This issue is marked a release blocker to decide what to do for 3.5-3.7 (and maybe 3.8 if deemed a serious breaking change).

    I'd like to add yet another option that may be useful specifically for maintenance releases: forbid only the problematic characters -- namely LF (and potentially CR and SP). This seems like a much more surgical fix for maintenance releases, allowing the null byte for CherryPy or the raw UTF-8 bytes for Swift, while still mitigating the CVE.

    PRs with explicit tests for what is and isn't allowed welcome. Thankfully for the UTF-8 case, its multi-byte codepoint bytes will never contain LF, CR or SP.

    @tipabu
    Copy link
    Mannequin

    tipabu mannequin commented Sep 20, 2019

    Something like this for 3.7, say? I should probably go add some tests in test_httplib.py (for example, to demonstrate that http.client can still send a raw #, even if urllib appropriately drops the fragment), but I wanted some feedback on whether this is even an avenue worth pursuing.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 21, 2019

    I'm not against that concept, but it is only appropriate for >= 3.9 as that'd be adding a feature. This issue is marked a release blocker to decide what to do for 3.5-3.7 (and maybe 3.8 if deemed a serious breaking change).

    The key part of the regression for 2.7 and 3.5+ was that it became impractical to retain the existing behavior of sending invalid bytes. My recommendation is to provide a mechanism to achieve compatibility with the older versions of Python. Although it's a "feature", it's also a feature necessitated by the security bugfix and to alleviate the regression. For this reason, I think the change should be applied to all Python versions that were patched for the security issue.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 21, 2019

    I've added PR 16321 illustrating my proposed solution. This solution, while more invasive than Tim's more surgical solution, addresses the concerns brought about by this issue as well as those articulated originally in bpo-36274.

    I'm slightly inclined to suggest accepting this change for Python 3.8, and Tim's surgical patch for earlier versions except that doing so would not address the main concern of bpo-36274, which addresses a migration concern from Python 2.7 (a regression introduced in Python 3.0).

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 22, 2019

    In cherrypy/cherrypy#1807, I discovered that there is already a fairly straightforward means for a third-party package to override the putrequest character validation (just monkeypatch http.client._contains_disallowed_url_pchar_re with a less strict pattern; I used [\n]). This approach is barely worse than the proposal I made in the PR, the main differences being that a monkeypatch of that global variable is global (not selective to specific instances or subclasses of HTTPConnection) and doesn't have any tests in the CPython test suite to protect that as a supported mechanism.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 22, 2019

    Also, with the CherryPy approach, the Python 2.7 story is more complicated. I haven't yet addressed that in the CherryPy 17 maintenance branch (which supports Python 2.7).

    @ned-deily
    Copy link
    Member

    I am certainly not a domain expert but, at a high level, I think the approach in PR 16321 is a reasonable compromise and I would support merging it to 3.7 and 3.6 (I'll let Larry and Benjamin speak for 3.5 and 2.7) assuming there are no review objections. Since we seem to be getting close to a comprehensive solution here, I am going to hold off on tagging 3.7.5r1 for a few more days in the hopes we can reach agreement on this approach.

    @ned-deily
    Copy link
    Member

    Also, besides the normal news entry (via blurb), there should probably be a "Notable changes in x.y.z" entry added to the end of each affected release's Doc/whatsnew/x.y.rst file.

    @tirkarthi
    Copy link
    Member

    If I understand PR 16321 correctly it has a private hook to bypass validating invalid bytes in URL added in 3.7.4 and also has the fix to accept non-ascii values which is a regression from 2.7 to 3.0 . Will the latter to accept non-ascii values also be merged to security branches too given that it predates the security issue addressed ?

    @ned-deily
    Copy link
    Member

    Will the latter to accept non-ascii values also be merged to security branches too given that it predates the security issue addressed ?

    At this point, I'm willing to allow it in 3.6 unless someone identifies a compelling reason not to.

    @gpshead
    Copy link
    Member

    gpshead commented Sep 27, 2019

    Regardless, since things have already shipped in stable releases, there is a release that code _will_ encounter somewhere that does validate data but does not support overruling that behavior. so i'm not sure if it actually matters to have this in 3.7 or 3.6 as anyone who wants to avoid complaints from users who happen to be on those python patch levels will need to include a workaround in their code or explicitly avoid those versions.

    That said, PR 16321 looks overall like a good idea, I have no problem with it being backported to 3.7 and 3.6.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 28, 2019

    New changeset 7774d78 by Jason R. Coombs in branch 'master':
    bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448)
    7774d78

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 28, 2019

    New changeset 80dd66a by Jason R. Coombs in branch '3.7':
    [3.7] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (GH-16461)
    80dd66a

    @miss-islington
    Copy link
    Contributor

    New changeset 8f478b4 by Miss Islington (bot) in branch '3.8':
    bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448)
    8f478b4

    @ned-deily
    Copy link
    Member

    New changeset 5b18ce6 by Ned Deily (Jason R. Coombs) in branch '3.6':
    [3.6] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (GH-16462)
    5b18ce6

    @gpshead gpshead removed 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes labels Sep 29, 2019
    @larryhastings
    Copy link
    Contributor

    So, following this recent flurry of activity, all that remains are to sort out 2.7 and 3.5. 3.5.8 is still in a holding pattern; at this point I think I'm going to insert another RC, so I can add the new version of expat.

    Will a makes-everyone-happy PR appear for 3.5 soon?

    @gpshead
    Copy link
    Member

    gpshead commented Sep 29, 2019

    I'd imagine an equivalent of the 3.6 PR would work for 3.5.

    Someone should look at how to do similar in 2.7 _if_ the project(s) that complained about the problem rely on such behavior in their last 2.7 compatible releases.

    @jaraco
    Copy link
    Member Author

    jaraco commented Sep 29, 2019

    Someone should look at how to do similar in 2.7 _if_ the project(s) that complained about the problem rely on such behavior in their last 2.7 compatible releases.

    Looking at the history, it seems that only two projects were mentioned, CherryPy and urllib3. For CherryPy, a workaround is in place such that the 2.7 maintenance branch does not encounter this issue (the offending test was removed), and the master branch only supports Python 3. Similarly, urllib3 has a workaround that bypasses transmission of these bytes entirely. Therefore, the urgency is reduced.

    I do feel a little uneasy recommending not to backport the bugfix to the versions where the change was introduced, as it leaves an opportunity for another project to encounter. I'll go ahead and prep a backport for Python 3.5 and 2.7, with the understanding that the 2.7 change may be rejected or deferred until requested.

    @benjaminp
    Copy link
    Contributor

    New changeset f5b1abb by Benjamin Peterson (Jason R. Coombs) in branch '2.7':
    [2.7] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16476)
    f5b1abb

    @larryhastings
    Copy link
    Contributor

    New changeset 2784e78 by larryhastings (Jason R. Coombs) in branch '3.5':
    [3.5] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (bpo-16475)
    2784e78

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants