Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgi.FieldStorage memory usage can spike in line-oriented ops #41503

Closed
chrism mannequin opened this issue Jan 30, 2005 · 16 comments
Closed

cgi.FieldStorage memory usage can spike in line-oriented ops #41503

chrism mannequin opened this issue Jan 30, 2005 · 16 comments
Assignees
Labels
release-blocker stdlib Python modules in the Lib dir

Comments

@chrism
Copy link
Mannequin

chrism mannequin commented Jan 30, 2005

BPO 1112549
Nosy @gvanrossum, @birkenfeld
Files
  • FieldStorage-readline.patch: Fieldstorage readline patch.
  • FieldStorage-readline.patch: FieldStorage readline patch with proper boundary recognition (after Guido's comment)
  • test_cgi.py: Updated test_cgi.py
  • FieldStorage-readline-svn-50879.patch: FieldStorage readline patch against just-post-2.5-beta1 svn checkout
  • test_cgi-svn-50879.patch: Patch to test_cgi.py which exercises readline patch just-post-2.5b1
  • test_output_test_cgi-svn-50879.patch: Patch to tests/output/test_cgi to make regrtest.py pass when two below patches are applied.
  • cgi-module-fixes-bundle-svn-50879.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2006-08-10.18:12:14.000>
    created_at = <Date 2005-01-30.13:40:48.000>
    labels = ['library', 'release-blocker']
    title = 'cgi.FieldStorage memory usage can spike in line-oriented ops'
    updated_at = <Date 2006-08-10.18:12:14.000>
    user = 'https://bugs.python.org/chrism'

    bugs.python.org fields:

    activity = <Date 2006-08-10.18:12:14.000>
    actor = 'nnorwitz'
    assignee = 'gvanrossum'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2005-01-30.13:40:48.000>
    creator = 'chrism'
    dependencies = []
    files = ['1570', '1571', '1572', '1573', '1574', '1575', '1576']
    hgrepos = []
    issue_num = 1112549
    keywords = []
    message_count = 16.0
    messages = ['24079', '24080', '24081', '24082', '24083', '24084', '24085', '24086', '24087', '24088', '24089', '24090', '24091', '24092', '24093', '24094']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'nnorwitz', 'georg.brandl', 'chrism']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1112549'
    versions = ['Python 2.3']

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Jan 30, 2005

    Various parts of cgi.FieldStorage call its
    "read_lines_to_outerboundary", "read_lines" and
    "skip_lines" methods. These methods use the
    "readline" method of the file object that represents an
    input stream. The input stream is typically data
    supplied by an untrusted source (such as a user
    uploading a file from a web browser). The input data
    is not required by the RFC 822/1521/1522/1867
    specifications to contain any newline characters. For
    example, it is within the bounds of the specification
    to supply a a multipart/form-data input stream with a
    "file-data" part that consists of a 2GB string composed
    entirely of "x" characters (which happens to be
    something I did that led me to noticing this bug).

    The simplest fix is to make use of the "size" argument
    of the readline method of the file object where it is
    used within all parts of FieldStorage that make use of
    it. A patch against the Python 2.3.4 cgi.py module
    that does this is attached.

    @chrism chrism mannequin closed this as completed Jan 30, 2005
    @chrism chrism mannequin added the release-blocker label Jan 30, 2005
    @chrism chrism mannequin assigned gvanrossum Jan 30, 2005
    @chrism chrism mannequin added the stdlib Python modules in the Lib dir label Jan 30, 2005
    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Methinks that the fix isn't quite right: it would
    incorrectly recognize as a boundary a very long line
    starting with "--" followed by the appropriate random string
    at offset 2**16. This could probably be taken care of by
    adding a flag that is true initially and after that keeps
    track of whether the previous line ended in \n.

    Also, there's a call to fp.readline() in parse_multipart()
    that you didn't patch -- it wouldn't help because that code
    is saving the lines in a list anyway, but isn't that code
    vulnerable as well? Or is it not used?

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Apr 1, 2005

    Logged In: YES
    user_id=32974

    Re: parse_multipart.. yes, it looks like there's no use
    fixing that as it just turns around and puts the line into a
    list.. it is vulnerable but just by virtue of its non-use of
    a tempfile, it appears doomed anyway for large requests. I
    don't know of anything that uses it.

    Good catch wrt boundary recognition bug, I'm uploading
    another patch.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Can I tweak you into uploading a unit test?

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Apr 3, 2005

    Logged In: YES
    user_id=32974

    An updated test_cgi.py is attached. I test both the
    readline behavior and add a test for basic multipart parsing.

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Apr 3, 2005

    Logged In: YES
    user_id=32974

    FYI, I'd be happy to do the merging here if you wanted to
    give me checkin access.

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Jul 27, 2006

    Logged In: YES
    user_id=32974

    The files I've just uploaded are revisions to the cgi and test_cgi modules for the
    current state of the SVN trunk. If someone could apply these, it would be
    appreciated, or give me access and I'll be happy to.

    FTR, this is a bug which exposes systems which use the cgi.FieldStorage class
    (most Python web frameworks do) to a denial of service potential.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Aug 7, 2006

    Logged In: YES
    user_id=33168

    Doesn't this require a change to test/output/test_cgi or
    something like that? Does this test pass when run under
    regrtest.py ?

    The verify(x == y) should be vereq(x, y).
    type(a) != type(0), should be not isinstance(a, int).

    The last chunk of the patch in cgi.py should be:
    last_line_lfend = line.endswith('\n')

    rather than the 4 lines of if/else.

    I don't know if this patch really addresses the problem or
    creates other problems. However, if someone more
    knowledgable is confident about this patch, I'm fine with
    this going in. At this point, it might be better to wait
    for 2.5.1 though.

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Aug 7, 2006

    Logged In: YES
    user_id=32974

    Yup, test/output/test_cgi did need fixing. Apologies, I did not understand
    the test regime. A new patch file named test_output_test_cgi-
    svn-50879.patch has been uploaded with the required change. regrtest.py
    now passes.

    As far as verify vs. vereq, the test_cgi module uses verify all over the place.
    I'm apt to not change all of those places, so to change it in just that one place
    is probably ineffective. Same for type comparison vs. isinstance. I'm trying to
    make the smallest change possible as opposed to refactoring the test
    module.

    I've uploaded a patch which contains a) the fix to cgi.py, b) the fix to
    test_cgi.py, and the fix to output/test_cgi.

    The stylistic change wrt to last_line_lfend is fine with me, but I'll leave that
    jugment to someone else.

    I'm not sure how to ensure the fix doesn't create other problems other than
    what has already been done by proving it still passes the tests it was
    subjected to in the "old" test suite and adds new tests that prove it no longer
    has the denial of service problem.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    +1.

    minor nits:

    in the main patch: instead of

    + if line.endswith('\n'):
    + last_line_lfend = True
    + else:
    + last_line_lfend = False

    you can just use

      last_line_lfend = line.endswith('\n')

    in the unit test: instead of

      if type(a) != type(0):

    use

      if not isinstance(a, int):

    so that if some future release changes file.closed to return
    a bool (as it should :-) this test won't break.

    Is tehre a reason why you're not patching the fp.readline()
    call in parse_multipart()? It would seem to have the same
    issue (even if it isn't used in Zope :-).

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    BTW it would be better if all patches were in a single file
    -- then you can delete the older patches (if SF lets you do
    that).

    @chrism
    Copy link
    Mannequin Author

    chrism mannequin commented Aug 10, 2006

    Logged In: YES
    user_id=32974

    wrt parse_multipart: this function just turns around and puts the output from
    readline() into a list, as opposed to FieldStorage, which writes its chunked
    output to a tempfile, so just adding an argument to readline within it would
    not be useful. We'd only be able to fix the memory consumption issue in
    parse_multipart if we were to make it also use a tempfile, but it's not clear
    that *not* writing a tempfile isn't its expected behavior as the docs for
    parse_multipart state:

    Returns a dictionary just like parse_qs() keys are the field names, each
    value is a list of values for that field. This is easy to use but not much
    good if you are expecting megabytes to be uploaded -- in that case, use
    the FieldStorage class instead which is much more flexible.

    Is it OK to write a tempfile in this function (e.g. does that make it not useful
    on stuff like embedded systems)?

    If not, maybe we should just deprecate parse_multipart? I do find things that
    use it if I google hard enough but there are only 187 hits when I google for
    "cgi.parse_multipart" and 53,600 hits when I google for "cgi.FieldStorage".

    I'm uploading another file with your style change suggestions. It bundles all
    fixes to cgi.py, test_cgi.py, and test_cgi and includes the style changes, but
    does nothing about parse_multipart.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    OK, let's forget about parse_multipart(). Feel free to add a
    doc patch and a comment that deprecate it (actual warnings
    are not a good idea IMO).

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Checked in as r51190 (Chris's patch plus a warning added to
    an XXX comment for parse_multipart()) and r51191 (Misc/NEWS).

    @birkenfeld
    Copy link
    Member

    Logged In: YES
    user_id=849994

    Is this a backportable fix?

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Aug 10, 2006

    Logged In: YES
    user_id=33168

    Yes, I believe so.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants