bz2.BZ2File doesn't support multiple streams #45966
Comments
The BZ2File class only supports one stream per file. It is possible to have Once this is done, this would add the ability to open a file for appending, I'll probably try to do this, but the fact it's done in C (unlike gzip) |
If you're referring to an 'append' mode for bz2file objects, it may be a It may be possible to implement r/w/a using the lower-level |
Like gzip, you can concatenate two bzip2 files:
bzip2 -c /etc/passwd >/tmp/pass.bz2
bzip2 -c /etc/passwd >>/tmp/pass.bz2
bunzip2 will output both parts, generating two copies of the file. So nothing needs to be done on compression, but uncompression needs to |
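For illustration, here is a minimal Python 3 sketch of what reading such a concatenated file involves: start a new decompressor each time one stream ends and feed it the leftover bytes. The helper name `decompress_all_streams` and the sample payload are made up for this example; `BZ2Decompressor.eof` and `BZ2Decompressor.unused_data` are the documented attributes it relies on (`eof` requires Python 3.3 or later). This is not the patch discussed in this issue.

```python
import bz2


def decompress_all_streams(data):
    """Decompress every bzip2 stream stored back to back in `data`."""
    parts = []
    while data:
        decomp = bz2.BZ2Decompressor()  # one decompressor per stream
        parts.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("input ends in the middle of a bzip2 stream")
        # Whatever follows the finished stream is the start of the next one.
        data = decomp.unused_data
    return b"".join(parts)


# Hypothetical payload standing in for /etc/passwd.
payload = b"root:x:0:0:root:/root:/bin/sh\n"
two_streams = bz2.compress(payload) + bz2.compress(payload)
result = decompress_all_streams(two_streams)
```

A single `BZ2Decompressor` stops at the end of the first stream, which is why the loop is needed.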
The gzip module supports reopening an existing file to add another |
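As a rough illustration of the gzip behaviour referred to here, the sketch below reopens a file in append mode, which adds a second gzip member, and then reads both back in one pass. The file name and contents are placeholders chosen for the example.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "log.gz")  # placeholder file name

    with gzip.open(path, "wb") as f:
        f.write(b"first\n")

    # Appending writes a new, independent gzip member after the existing one.
    with gzip.open(path, "ab") as f:
        f.write(b"second\n")

    # Reading transparently walks through both members.
    with gzip.open(path, "rb") as f:
        combined = f.read()
```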
I've got a patch that fixes this. It allows BZ2File to read We originally wrote it against 2.5, but I've updated the patch to py3k |
sorry, the previous patch was from an old version. attaching the |
Some notes about posting patches:
I'll look at the patch itself another day, I don't have the time right |
Thanks for the reply. My company's legal dept. told me that we needed to put the boilerplate I'm reattaching just the patch to the bug now. I'll check with legal |
If the patch is substantial enough that legal boilerplate is even an |
I can remove the boilerplate from the code as long as I add the VMware, Inc. is providing this bz2 module patch to you under the terms |
As far as I can tell, the patch looks mostly good. As a sidenote, the bz2 module implementation seems to have changed quite |
Hrm...yeah, I should probably be setting it to closed as soon as I'll try and get a 2.7 patch done and uploaded in a day or two. |
A new patch will make it more likely that it will actually get applied :) Thanks for your work on this. |
Understandable. New patch attached. |
I'm not comfortable with the following change (which appears twice in
-        BZ2_bzReadClose(&bzerror, self->fp);
+        if (self->fp)
+            BZ2_bzReadClose(&bzerror, self->fp);
         break;
     case MODE_WRITE:
-        BZ2_bzWriteClose(&bzerror, self->fp,
-                         0, NULL, NULL);
+        if (self->fp)
+            BZ2_bzWriteClose(&bzerror, self->fp,
+                             0, NULL, NULL);
If you need to test for the file pointer, perhaps there's a logic flaw |
That was mostly just out of paranoia, since the comments mentioned |
You don't need to, but on the other hand I forgot to ask you to update |
Picking this back up again. There are actually no docs changes necessary: the docs never mentioned that the module didn't support multiple logical streams, and I didn't see any other mentions in the docs that seemed to need updating. I suppose I could add something along the lines of "BZ2File supports multiple logical streams in a single compressed file", if that's what you're looking for. Working on a patch for trunk as well. |
Dear all, first of all, thank you for the patch making multiple file streams in bz2 available in Python. Yesterday I tried to adapt the recent patch for Python 3k to the current Python 2.7. Most of the hunks could be adapted easily by editing just one or two things concerning the different file layout between py3k and Python 2.7. Unfortunately I was not able to completely downgrade this patch to Python 2.7. In particular, I could not successfully adapt the last hunk in the patch or the hunks related to self->rawfp. Could anybody help me make this patch available for Python 2.7, or does a patch for this Python version already exist? If you like, I can upload my recent changes to this patch for further investigation. Thank you! Best regards, |
Here is an update of the patch against current py3k. I've added a copyright mention at the top of Modules/bz2module.c which I hope manages to capture the essence of msg93721. Martin, what do you think? |
Thanks for the update. As I mentioned in my previous comment, I'm still searching for a solution/patch for Python 2.x that can handle multiple bz2 streams. Does anybody know a workaround, or have a solution for porting the py3k patch to good old Python 2.x? At the moment I am using this patch with py3k and it seems to work. But I need this multistream option for Python 2.x because most of the time I work with matplotlib, and matplotlib currently can't deal with py3k. So I would be very happy for any suggestion on running multiple streams with Python 2.x! Thank you very much, and best regards |
The patch here is totally out of date, following bpo-5863. |
Hi, I attach a patch to Python 3.3 Lib/bz2.py with updated tests: |
Thanks for the patch. I'll review it tomorrow. |
Wait, the tests seem wrong. I'll post an update later today. |
Right! I updated the patch and added a test for the aligned stream/buffer case. |
New changeset 8cebbc6473d9 by Nadeem Vawda in branch 'default':
New changeset 0be55601f948 by Nadeem Vawda in branch 'default': |
Committed. Once again, thanks for the patch! |
I made a few comments and asked two questions on the review page. (I should have said so here.) |
I seem to be unable to log in to rietveld, so I'll reply here.
Good point. I hadn't thought about other implementations. Also, you're right about the superfluous comments in test_bz2; I'll do a
I wouldn't think so. It's not as though there is an index that the code That said, I wouldn't be opposed to adding a test for that sort of thing |
If you’re logged into Roundup, you should automatically be logged into our Rietveld instance. You can file a bug on the meta-tracker (link in the left sidebar) if this does not work. |
New changeset 48e837b2a327 by Nadeem Vawda in branch 'default': |
I thought this was the case, but it isn't working for me. I've filed a |
New changeset 3e5200abf8eb by Nadeem Vawda in branch 'default': |
I ported the bz2ms.patch to Python 2.7.2 and it works correctly within the bz2 module. But when you open a multistream (tar)bz2 with the tarfile module, even though tarfile uses the BZ2File() class, some files are missing from the extraction. I'll now try with the current tip. |
With the current tip everything works correctly. I think it's because of the complete rewrite of the bz2 module in Python and the refactoring of _bz2.so. |
Attached patch is a revised version of bz2ms.patch against Python 2.7.2. The patch is tested using the tarfile and bz2 modules. It also passes the included tests correctly. It also imports a missing class from BytesIO to fix the tests. It's up to you whether to take that into the 2.7.x branch or not. |
We don’t add new features in stable releases. Nadeem has closed this bug as fixed for 3.3 and it can’t go in 2.7, so I think we’re done here. |
Ozan: Thanks for taking the time to backport the patch. Unfortunately, as |
This is all fine and well, but this is clearly a bug and not a feature. Can we please see this bug fix go into 2.7 at some point? |
+1. If we think of this as a bug, Python 2.x users will never be able to extract multiple-stream bz2 files. |
I mean "as a feature". |
No, it is not at all clear that this is a bug. I agree that this is a desirable
Incorrect. It is perfectly possible to extract a multi-stream bz2 file in 2.x - If there is really a large demand for these facilities in 2.x, I would be willing |
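One way to do this with only the low-level decompressor is sketched below. It is written for Python 3 and is only an illustration, not necessarily what the commenter had in mind: it reads a file in chunks and starts a fresh `BZ2Decompressor` whenever the current stream ends. The name `iter_decompress` and the chunk size are assumptions for this example; it relies only on `unused_data` and on `decompress()` raising `EOFError` once a stream has already ended.

```python
import bz2
import io


def iter_decompress(fileobj, chunk_size=64 * 1024):
    """Yield decompressed blocks from a file holding one or more bzip2 streams."""
    decomp = bz2.BZ2Decompressor()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        while chunk:
            try:
                out = decomp.decompress(chunk)
            except EOFError:
                # The previous stream ended exactly at a chunk boundary:
                # retry the same chunk with a fresh decompressor.
                decomp = bz2.BZ2Decompressor()
                continue
            if out:
                yield out
            # Non-empty only when a stream ended inside this chunk.
            chunk = decomp.unused_data
            if chunk:
                decomp = bz2.BZ2Decompressor()


# Three streams back to back, read through a deliberately tiny chunk size.
blob = b"".join(bz2.compress(part) for part in (b"one\n", b"two\n", b"three\n"))
extracted = b"".join(iter_decompress(io.BytesIO(blob), chunk_size=7))
```

Working on chunks keeps memory use bounded, which matters for inputs as large as the multi-gigabyte files mentioned in this thread.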
I think this support should be backported to Python 2.7 and 3.2. The current code can't decompress files generated by "pbzip2", which is fairly popular. I would consider that a bug, not a feature request. I am just recompressing a 77GB file because of this :-(. |
Sorry to hear that :(
Semantic issues aside, my concern here is that the patch for 2.7 is considerably An alternative solution I'd like to pursue is to backport 3.3's BZ2File How does that sound? |
Well, that was easier than I expected. It didn't take much work to get it |
Best regards |
Éric, the bz2 module in Python is documented as able to manage bz2 files. But that is not true, since it is unable to manage popular bz2 files. That looks like a bug to me. In fact, you can create those files from Python, files that Python cannot unpack later. Moreover, Python 2.7 is going to live for a long time. If the refusal to backport this to 3.2 and 2.7 is firm, I would beg you to document this limitation in the 2.7/3.2 docs. It is serious enough. Please reconsider backporting this to 2.7, at least. |
As the bug/feature judgment is not easy to make, I think python-dev should be asked. |
Really? How so? BZ2File only started accepting the "a" mode in 3.3
Of course. I'll add a note to the docs once I've created the bz2file
Fair enough; I was actually going to suggest consulting the 2.7 release |
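To make the "a" mode point in this reply concrete, here is a small sketch for Python 3.3 or later: appending with `BZ2File` writes a second stream to the file, and a multi-stream-aware reader returns both payloads. The file name and contents are placeholders for the example.

```python
import bz2
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bz2")  # placeholder file name

    with bz2.BZ2File(path, "w") as f:
        f.write(b"first\n")

    # "a" mode (accepted from 3.3 on) appends a new bzip2 stream.
    with bz2.BZ2File(path, "a") as f:
        f.write(b"second\n")

    # Reading in 3.3+ continues past the end of the first stream.
    with bz2.BZ2File(path, "r") as f:
        combined = f.read()
```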
New changeset e73d549b7458 by Nadeem Vawda in branch '2.7':
New changeset 190826ee0450 by Nadeem Vawda in branch '3.2': |
Just a thought: maybe the doc note should mention that bz2file is a backport of 3.3’s improved class, so that people know that 1) it’s well-supported code 2) a future Python version will remove the need for the external dependency. |
New changeset ad20324229f4 by Nadeem Vawda in branch '2.7':
New changeset e4c4595033ad by Nadeem Vawda in branch '3.2': |