Feature: Add support for encrypted archives #51

Closed
fdellwing opened this issue Jan 10, 2020 · 15 comments
Assignees
Labels
enhancement (New feature or request), for archiving (Issue on archiving, compression or encryption), for extraction (Issue on extraction, decompression or decryption), help wanted (Extra attention is needed), no-issue-activity

Comments

@fdellwing

fdellwing commented Jan 10, 2020

Is your feature request related to a problem? Please describe.
n/a

Describe the solution you'd like
Fully support compressing and decompressing with password.

Describe alternatives you've considered
n/a

Additional context
n/a

@miurahr added the enhancement and help wanted labels Jan 10, 2020
@miurahr added this to the v0.7 encryption milestone Jan 25, 2020
@miurahr
Owner

miurahr commented Jan 25, 2020

I am now trying to implement decryption on a topic branch: https://github.com/miurahr/py7zr/tree/topic-decrypt-aes
There is a big problem with AES encryption/decryption libraries for Python.

The branch uses pycrypto, but it was released in 2013 and is no longer updated. Because of that, installation fails on Windows 10/MSVC 2015/Python 3.6.

@fdellwing do you know of any library that supports AES and SHA256 on modern Python versions and OSes?
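
For the SHA256 part, the standard library's hashlib should be enough, so only AES needs a third-party package. As a rough sketch, and assuming the 7z key derivation works the way I understand it (one running SHA-256 fed with salt, the UTF-16LE password and an 8-byte little-endian round counter, for 2**cycles rounds), the key could be computed as below. The name `derive_7z_key` and its parameters are made up for illustration and are not py7zr API.

```python
import hashlib


def derive_7z_key(password: str, cycles: int, salt: bytes = b"") -> bytes:
    """Hypothetical helper: derive a 32-byte AES key from a password.

    Assumption: 7z-style derivation, i.e. one running SHA-256 updated
    2**cycles times with salt + UTF-16LE password + round counter.
    """
    pw = password.encode("utf-16-le")
    digest = hashlib.sha256()
    for counter in range(1 << cycles):
        digest.update(salt + pw + counter.to_bytes(8, "little"))
    return digest.digest()


key = derive_7z_key("1234", cycles=4)
print(len(key))  # 32
```

Real archives usually use a much larger cycle count than in this demo, so the loop is the slow part by design.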

@miurahr
Owner

miurahr commented Jan 25, 2020

https://nitratine.net/blog/post/python-encryption-and-decryption-with-pycryptodome/ may be an option.

@miurahr added the for archiving and for extraction labels Jan 28, 2020
@andormarkus

@miurahr

I have tested the module with a large .csv file and it cannot decompress it. I got the following error:

ValueError: Data must be padded to 16 byte boundary in CBC mode

How to reproduce the error:

  1. Get a large (few GB) csv file and compress it to 7z.
    My sample file: http://sdm.lbl.gov/fastbit/data/star2002-full.csv.gz
    I decompressed it and recompressed it with the following command:
    7z a large_w_pwd.7z star2002-full.csv -p1234

  2. Try to decompress it with py7zr
    import py7zr
    archive = py7zr.SevenZipFile('large_w_pwd.7z', mode='r', password='1234')
    print(archive.getnames())
    archive.extractall()

Full error log:

Traceback (most recent call last):
  File "unzip.py", line 19, in <module>
    archive.extractall()
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/py7zr.py", line 766, in extractall
    self.worker.extract(self.fp, multithread=multi_thread)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 288, in extract
    self.extract_single(fp, self.files, self.src_start)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 300, in extract_single
    self.decompress(fp, f.folder, fileish, f.uncompressed[-1], f.compressed)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 316, in decompress
    tmp = decompressor.decompress(inp, max_length)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 450, in decompress
    folder_data = self.decompressor.decompress(data, max_length=max_length)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 247, in decompress
    temp = self.cipher.decrypt(data)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/Crypto/Cipher/_mode_cbc.py", line 246, in decrypt
    raise ValueError("Data must be padded to %d byte boundary in CBC mode" % self.block_size)
ValueError: Data must be padded to 16 byte boundary in CBC mode

@miurahr
Owner

miurahr commented Jan 31, 2020

AES in CBC mode is a block cipher mode that works on 16-byte blocks, so every chunk passed to the cipher must be a multiple of 16 bytes long.
The current implementation does not handle this. https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation
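
One way to handle this could be to buffer the incoming data and pass only whole 16-byte blocks to the cipher, carrying the remainder over to the next call. The sketch below is not the py7zr implementation: `BlockAlignedDecryptor` is a hypothetical wrapper, and `ToyBlockCipher` is a stand-in (XOR, not AES) that only mimics the length check of a CBC cipher so the example runs without a crypto package.

```python
BLOCK_SIZE = 16  # AES block size in bytes


class ToyBlockCipher:
    """Stand-in for a CBC cipher object: NOT real encryption.

    It XORs every byte with a constant and, like a CBC cipher,
    rejects input whose length is not a multiple of the block size.
    """

    def decrypt(self, data: bytes) -> bytes:
        if len(data) % BLOCK_SIZE != 0:
            raise ValueError("Data must be padded to 16 byte boundary")
        return bytes(b ^ 0x5A for b in data)


class BlockAlignedDecryptor:
    """Hypothetical wrapper that feeds only whole blocks to the cipher."""

    def __init__(self, cipher) -> None:
        self.cipher = cipher
        self.pending = b""  # bytes waiting for the rest of their block

    def decrypt(self, data: bytes) -> bytes:
        buf = self.pending + data
        usable = len(buf) - (len(buf) % BLOCK_SIZE)
        self.pending = buf[usable:]  # keep the incomplete tail for later
        if usable == 0:
            return b""
        return self.cipher.decrypt(buf[:usable])


# Feed 64 bytes of toy "ciphertext" in chunks of 25, 25 and 14 bytes.
ciphertext = bytes(b ^ 0x5A for b in range(64))
dec = BlockAlignedDecryptor(ToyBlockCipher())
plain = b"".join(dec.decrypt(ciphertext[i:i + 25]) for i in range(0, 64, 25))
print(plain == bytes(range(64)))  # True
```

With a real CBC cipher object that keeps its chaining state between `decrypt` calls, the same buffering idea should apply unchanged.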

@andormarkus

Could you please modify the decompress method in the AESDecompressor class so that I can decompress large files with this package? It would be life-changing for me.

I tried padding the data before decryption and unpadding it afterwards, but it did not work:
def decompress(self, data: bytes, max_length: Optional[int] = None) -> bytes:
    padded_data = pad(data, AES.block_size)
    decrypted_data = self.cipher.decrypt(padded_data)
    unpadded_data = unpad(decrypted_data, AES.block_size)
    temp = unpadded_data
    return self.lzma_decompressor.decompress(temp, max_length)

  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 249, in decompress
    unpadded_data = unpad(decrypted_data, AES.block_size)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/Crypto/Util/Padding.py", line 90, in unpad
    raise ValueError("Padding is incorrect.")
ValueError: Padding is incorrect.
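
A likely reason the attempt above fails: `pad()` is meant for plaintext before encryption, and PKCS#7 padding always appends between 1 and 16 bytes. Applying it to a ciphertext chunk changes the data the cipher sees, and `unpad()` then inspects decrypted bytes that were never real padding. The pure-Python `pkcs7_pad` below is a hypothetical re-implementation for illustration only (it is not the Crypto.Util.Padding code) and just shows how the chunk lengths change.

```python
BLOCK_SIZE = 16


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Illustrative PKCS#7 padding: append n bytes, each with value n."""
    n = block_size - (len(data) % block_size)  # always 1..block_size
    return data + bytes([n]) * n


aligned_chunk = bytes(32)    # already a multiple of 16
unaligned_chunk = bytes(25)  # a read that ends in the middle of a block

print(len(pkcs7_pad(aligned_chunk)))    # 48
print(len(pkcs7_pad(unaligned_chunk)))  # 32
```

In the first case a whole extra block that is not ciphertext gets decrypted. In the second, 7 filler bytes take the place of ciphertext that only arrives with the next read, so that block decrypts to garbage.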

@miurahr
Owner

miurahr commented Feb 1, 2020

Could you provide me with test data of minimal size (a few KB) to reproduce the issue?

miurahr added a commit that referenced this issue Feb 1, 2020
Signed-off-by: Hiroshi Miura <miurahr@linux.com>
@miurahr
Owner

miurahr commented Feb 1, 2020

I was able to reproduce it and added a test case.

@andormarkus

Hi

I have tested with a 1 MB .txt file, which is 300 KB as a .7z, and I got the following error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    archive.test()
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/py7zr.py", line 688, in test
    return self._test_digests()
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/py7zr.py", line 528, in _test_digests
    if self._test_unpack_digest():
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/py7zr.py", line 520, in _test_unpack_digest
    self.worker.extract(self.fp) # TODO: print progress
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/compression.py", line 285, in extract
    self.extract_single(fp, self.files, self.src_start)
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/compression.py", line 297, in extract_single
    self.decompress(fp, f.folder, fileish, f.uncompressed[-1], f.compressed)
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/compression.py", line 317, in decompress
    tmp = decompressor.decompress(b'', max_length)
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/compression.py", line 447, in decompress
    folder_data = self.decompressor.decompress(data, max_length=max_length)
  File "/home/amarkus/.local/lib/python3.7/site-packages/py7zr/compression.py", line 226, in decompress
    return self.lzma_decompressor.decompress(b'', max_length)
_lzma.LZMAError: Corrupt input data

Sample raw file:
10000_Sales_Records.txt
source: http://eforexcel.com/wp/wp-content/uploads/2017/07/10000-Sales-Records.zip

How to reproduce:

  1. 7z the attached raw file
    7z a test_1mb_raw.7z '10000 Sales Records.txt' -p1234

  2. Run the following code
    import py7zr
    archive = py7zr.SevenZipFile('test_1mb_raw.7z', mode='r', password='1234')
    print(archive.getnames())
    archive.test()

Thanks,
Andor

@andormarkus

Hi

I have tested the latest commit with a 1.4 GB .7z (10 GB .csv) and I was able to process it.
I will let you know if I find another issue.

Thanks,
Andor

@andormarkus

Hi

I have tested the py7zr-0.6a1 release on AWS Glue. I could import the package without any problem and process the password-protected .7z.

Can I do more testing to help you further improve this package?

Thanks,
Andor

@github-actions

github-actions bot commented Mar 5, 2020

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@miurahr
Owner

miurahr commented Mar 5, 2020

Fully support compressing and decompressing with password.

Decompression is scheduled for release in v0.6, but compression is planned for a future release.
That is why the issue is still open.

@miurahr reopened this May 9, 2020
@miurahr
Owner

miurahr commented May 31, 2020

I'd like to reopen this issue when I start work on implementing encryption.

@miurahr self-assigned this May 31, 2020
@miurahr reopened this Jun 4, 2020
@miurahr
Owner

miurahr commented Jun 4, 2020

#142

@miurahr
Owner

miurahr commented Jun 13, 2020

Feature merged.

@miurahr closed this as completed Jun 13, 2020
3 participants