bz2.BZ2Decompressor.decompress fails on large files #58606

Closed
LaurentGautier mannequin opened this issue Mar 24, 2012 · 18 comments
Labels
extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@LaurentGautier
Mannequin

LaurentGautier mannequin commented Mar 24, 2012

BPO 14398
Nosy @loewis, @birkenfeld, @benjaminp, @serhiy-storchaka
Files
  • testbz2.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-04-21.22:30:05.084>
    created_at = <Date 2012-03-24.16:15:19.174>
    labels = ['extension-modules', 'type-crash']
    title = 'bz2.BZ2DEcompressor.decompress fail on large files'
    updated_at = <Date 2013-04-21.22:30:05.083>
    user = 'https://bugs.python.org/LaurentGautier'

    bugs.python.org fields:

    activity = <Date 2013-04-21.22:30:05.083>
    actor = 'nadeem.vawda'
    assignee = 'nadeem.vawda'
    closed = True
    closed_date = <Date 2013-04-21.22:30:05.084>
    closer = 'nadeem.vawda'
    components = ['Extension Modules']
    creation = <Date 2012-03-24.16:15:19.174>
    creator = 'Laurent.Gautier'
    dependencies = []
    files = ['25015']
    hgrepos = []
    issue_num = 14398
    keywords = []
    message_count = 18.0
    messages = ['156698', '156701', '156705', '156709', '156710', '156711', '156713', '156714', '156715', '156717', '173471', '173479', '173481', '173483', '173484', '187083', '187298', '187533']
    nosy_count = 7.0
    nosy_names = ['loewis', 'georg.brandl', 'nadeem.vawda', 'benjamin.peterson', 'Laurent.Gautier', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue14398'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @LaurentGautier
    Mannequin Author

    LaurentGautier mannequin commented Mar 24, 2012

    The call ends with:
    Objects/stringobject.c:3884: bad argument to internal function

    sys.version:
    '2.7.2 (default, Jun 13 2011, 15:14:50) \n[GCC 4.4.5]'
    (on 64bit Linux)

    @LaurentGautier LaurentGautier mannequin added the type-crash A hard crash of the interpreter, possibly with a core dump label Mar 24, 2012
    @loewis
    Mannequin

    loewis mannequin commented Mar 24, 2012

    I can't reproduce this. Can you please provide a test script along with input data that allows us to reproduce this error?

    @LaurentGautier
    Mannequin Author

    LaurentGautier mannequin commented Mar 24, 2012

    Wow! Quick follow-up.

    The data file is about 1.6 GB. Is there a preferred way to pass it on? (I suspect that the bug tracker is not it.)

    The code goes like:

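    import bz2

    # Read the whole compressed file into memory.
    with open("foobar.bz2", "rb") as f:
        src_buf = f.read()

    decomp = bz2.BZ2Decompressor()
    # This is the call that ends with the error shown above.
    tmp = decomp.decompress(src_buf)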
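
For readers on Python 3.5 or later (not the 2.7.2 used in this report), BZ2Decompressor.decompress() accepts a max_length argument, so the output can be consumed in bounded pieces instead of as one multi-gigabyte string. A minimal sketch, assuming a single bz2 stream; iter_decompressed and its chunk-size parameters are hypothetical names chosen for illustration:

```python
import bz2


def iter_decompressed(fileobj, in_chunk=1 << 20, out_chunk=1 << 20):
    """Yield decompressed pieces of at most out_chunk bytes each.

    fileobj is any binary file-like object positioned at the start of a
    single bz2 stream. Requires Python 3.5+ for max_length/needs_input.
    """
    decomp = bz2.BZ2Decompressor()
    while not decomp.eof:
        if decomp.needs_input:
            # The decompressor has used up its buffered input.
            data = fileobj.read(in_chunk)
            if not data:
                raise EOFError("compressed stream ended before end-of-stream marker")
        else:
            # Output was capped by max_length; drain the internal buffer.
            data = b""
        piece = decomp.decompress(data, max_length=out_chunk)
        if piece:
            yield piece
```

Called as iter_decompressed(open("foobar.bz2", "rb")), this would let the caller write or process each piece as it arrives, so no single bytes object approaches the 2 GB range.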

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Mar 24, 2012

    I have been able to reproduce it; see attached script. It happens for
    inputs of 2 GB (decompressed), but not for ones of 1 GB.

    It seems that bz2module.c doesn't guard against 32-bit overflows when
    handling the size of the decompressed data. This affects both the
    BZ2Decompressor object's decompress() method and the module-level
    decompress() function. All Python versions prior to 3.3 are affected.
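
To illustrate why 1 GB works and 2 GB does not, here is a small sketch of what happens when a byte count is stored in a signed 32-bit C integer. It does not reproduce the actual code in bz2module.c; as_c_int32 is a hypothetical helper, and ctypes is used only because it truncates without overflow checking:

```python
import ctypes


def as_c_int32(n):
    """Return n as it would read back from a signed 32-bit C integer."""
    return ctypes.c_int32(n).value


ONE_GIB = 1 << 30
TWO_GIB = 1 << 31

print(as_c_int32(ONE_GIB))  # still positive: fits in 31 value bits
print(as_c_int32(TWO_GIB))  # wraps to a negative number
```

A negative size handed to an internal resize routine is one plausible way to end up with "bad argument to internal function", though the thread itself does not pin down the exact code path.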

    @nadeemvawda nadeemvawda mannequin added the extension-modules C modules in the Modules dir label Mar 24, 2012
    @nadeemvawda nadeemvawda mannequin self-assigned this Mar 24, 2012
    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Mar 24, 2012

    (The contents of the input file don't matter; I just pulled a
    bunch of zeros from /dev/zero and compressed them with bzip2.)
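
The attached testbz2.py is not shown in this thread, so the following is only a sketch of a reproducer along the lines described: compress a run of zero bytes incrementally, then decompress the result in one call. The helper names compressed_zeros and decompressed_size are hypothetical:

```python
import bz2


def compressed_zeros(total_bytes, chunk_bytes=1 << 20):
    """Return bz2-compressed data that expands to total_bytes zero bytes.

    The input is fed to the compressor in chunks so that the uncompressed
    data never has to exist in memory all at once.
    """
    comp = bz2.BZ2Compressor()
    zeros = b"\0" * chunk_bytes
    parts = []
    remaining = total_bytes
    while remaining > 0:
        n = min(remaining, chunk_bytes)
        parts.append(comp.compress(zeros[:n]))
        remaining -= n
    parts.append(comp.flush())
    return b"".join(parts)


def decompressed_size(payload):
    """Decompress payload in a single call and return the output length."""
    return len(bz2.BZ2Decompressor().decompress(payload))
```

Calling decompressed_size(compressed_zeros(1 << 31)) on an affected 64-bit build should exercise the 2 GB case; it needs a few GiB of free memory and takes a while, so try a smaller size first to check the script itself.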

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Mar 24, 2012

    This should be fixed for 2.7.3. I'll have a patch ready in the next day
    or two.

    @nadeemvawda nadeemvawda mannequin added the release-blocker label Mar 24, 2012
    @benjaminp
    Contributor

    This isn't a regression, is it? If it's not, I don't think it's essential to get into 2.7.3.

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Mar 24, 2012

    No, it's been around since at least 2.6. I wasn't really sure what the
    protocol was for bugs found during the RC process. It'd be nice to get
    a fix for this into 2.7.3 (and 3.2.3), but it's not urgent.

    @nadeemvawda nadeemvawda mannequin removed the release-blocker label Mar 24, 2012
    @loewis
    Mannequin

    loewis mannequin commented Mar 24, 2012

    Nadeem: the final release candidate of 2.7.3 was already made. Any further change would require another release candidate, which in turn would delay the release further. This has to wait for 2.7.4.

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Mar 24, 2012

    That's fine by me, then. Sorry for the confusion.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 21, 2012

    New changeset ebb8c7d79f52 by Nadeem Vawda in branch '3.2':
    Issue bpo-14398: Fix size truncation and overflow bugs in bz2 module.
    http://hg.python.org/cpython/rev/ebb8c7d79f52

    New changeset 25fdf297c077 by Nadeem Vawda in branch '3.3':
    Merge bpo-14398: Fix size truncation and overflow bugs in bz2 module.
    http://hg.python.org/cpython/rev/25fdf297c077

    New changeset d6bf506ea13f by Nadeem Vawda in branch 'default':
    Merge bpo-14398: Fix size truncation and overflow bugs in bz2 module.
    http://hg.python.org/cpython/rev/d6bf506ea13f

    @serhiy-storchaka
    Member

    What about 2.7?

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Oct 21, 2012

    I'm working on it now. Will push in the next 15 minutes or so.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 21, 2012

    New changeset f03a335621ce by Nadeem Vawda in branch '2.7':
    Issue bpo-14398: Fix size truncation and overflow bugs in bz2 module.
    http://hg.python.org/cpython/rev/f03a335621ce

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Oct 21, 2012

    All fixed, along with some other similar but harder-to-trigger bugs.

    Thanks for the bug report, Laurent!

    @nadeemvawda nadeemvawda mannequin closed this as completed Oct 21, 2012
    @benjaminp
    Contributor

    Why does only 2.7 have tests?

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Apr 18, 2013

    An oversight on my part, I think. I'll add tests for 3.x this weekend.

    @nadeemvawda nadeemvawda mannequin reopened this Apr 18, 2013
    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Apr 21, 2013

    Hmm, so actually most of the bugs fixed in 2.7 and 3.2 weren't present
    in 3.3 and 3.4, and those versions already had tests equivalent to the
    tests I added for 2.7/3.2.

    As for the changes that I did make to 3.3/3.4:

    • two of the three cover cases that only occur if the output data is
      larger than ~32GiB. Even if we have a buildbot with enough memory for
      it (which I don't think we do), actually running such tests would take
      forever and then some.

    • the third is for a condition that's actually pretty much impossible to
      trigger - grow_buffer() has to be called on a buffer that is already at
      least 8*((size_t)-1)/9 bytes long. On a 64-bit system this is
      astronomically large, while on a 32-bit system the OS will probably
      have reserved more than 1/9th of the virtual address space for itself,
      so it won't be possible to allocate a large enough buffer.
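
The 8/9 bound in the last bullet can be checked with plain integer arithmetic. This sketch assumes the buffer grows by roughly one eighth of its current size on each call, which is an inference from the 8/9 figure and not something stated in the thread; would_overflow is a hypothetical helper, not the real grow_buffer():

```python
def size_max(bits):
    """Largest value of an unsigned size_t that is `bits` wide."""
    return (1 << bits) - 1


def threshold(bits):
    """Largest buffer size not exceeding 8/9 of SIZE_MAX."""
    return 8 * size_max(bits) // 9


def would_overflow(size, bits):
    """True if growing `size` by one eighth no longer fits in size_t.

    Assumed growth rule: new_size = size + size // 8.
    """
    return size + size // 8 > size_max(bits)


for bits in (32, 64):
    t = threshold(bits)
    print(bits, t, would_overflow(t, bits), would_overflow(t + 8, bits))
```

On a 32-bit size_t the threshold comes out a little above 3.5 GiB of a 4 GiB address space, which matches the remark that the OS would need to leave more than 8/9 of the address space to a single buffer for the condition to be reachable.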

    @nadeemvawda nadeemvawda mannequin closed this as completed Apr 21, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022