# File Compression

## `zlib`
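A module for compressing and decompressing data
- `compress()` and `decompress()` compress and restore `bytes` objects (a `str` must be encoded first)
- Useful when data must be sent at a **smaller size**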

In [1]:
import zlib

In [2]:
# Large string (350,000 characters, all ASCII, so 350,000 bytes in UTF-8)
data = "Life is too short, You need Python." * 10000

In [3]:
len(data)

350000

In [6]:
# zlib compression
## The str must be encoded to bytes (UTF-8 here) before compressing
compressed = zlib.compress(data.encode(encoding='utf-8'))
compressed

b'x\x9c\xed\xca\xa1\r\x800\x10\x00\xc0U~\x00\xc2$\x15X<%\xd4\xf4\x13Z\x04\xdb3\x06\xe6N_ig\x8d6bf\xc6\xb8\xf2\x9eK\xec\xf9D\xaf\xf5\x88\xed\x9dW\xf6\xb5(\x8a\xa2(\x8a\xa2(\x8a\xa2 ... (output truncated)

In [7]:
len(compressed)

1077

In [8]:
# Compression ratio (original size / compressed size)
print(f'zlib compression ratio : {round(len(data) / len(compressed), 2)}')

zlib compression ratio : 324.98
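
The ratio above is unusually high because the input repeats one 35-character sentence 10,000 times. As a rough illustration, the sketch below compares that input with random bytes of the same length. The helper name `compression_ratio` is hypothetical and not part of `zlib`.

```python
import os
import zlib


def compression_ratio(raw: bytes) -> float:
    """Return original size divided by compressed size (hypothetical helper)."""
    return len(raw) / len(zlib.compress(raw))


# Highly repetitive input, built the same way as the notebook's `data`
repetitive = ("Life is too short, You need Python." * 10000).encode("utf-8")
# Random bytes of the same length have almost no redundancy to exploit
random_bytes = os.urandom(len(repetitive))

print(f"repetitive: {compression_ratio(repetitive):.2f}")
print(f"random    : {compression_ratio(random_bytes):.2f}")
```

Real-world data usually falls somewhere between these two extremes, so the ratio measured here should not be read as typical.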


In [9]:
# zlib decompression
decompressed = zlib.decompress(compressed).decode('utf-8')
len(decompressed)

350000
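
Beyond the one-shot calls used above, `zlib.compress()` accepts a `level` argument, and `zlib.compressobj()` can compress data that arrives in chunks. The following is a minimal sketch of both. The 4096-byte chunk size and the variable names are arbitrary choices for illustration.

```python
import zlib

payload = ("Life is too short, You need Python." * 10000).encode("utf-8")

# level=1 favors speed, level=9 favors output size
fast = zlib.compress(payload, level=1)
small = zlib.compress(payload, level=9)

# Streaming: feed chunks to a compressor object instead of one large bytes object
compressor = zlib.compressobj()
chunks = []
for start in range(0, len(payload), 4096):
    chunks.append(compressor.compress(payload[start:start + 4096]))
chunks.append(compressor.flush())  # flush() emits any output still buffered
streamed = b"".join(chunks)

# Every variant decompresses back to the same bytes
print(zlib.decompress(fast) == payload)
print(zlib.decompress(small) == payload)
print(zlib.decompress(streamed) == payload)
```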

## `gzip`
A module for compressing and decompressing **files**<br>
Internally it uses `zlib` (the DEFLATE algorithm)

In [10]:
import gzip

In [11]:
# Save the string `data` from above to a file as the original data
with open('origin.txt', 'w', encoding='utf-8') as f:
    f.write(data)

In [12]:
# gzip compression
with gzip.open('origin.txt.gz', 'wb') as f:
    f.write(data.encode('utf-8'))

In [13]:
# gzip decompression
with gzip.open('origin.txt.gz', 'rb') as f:
    org_data = f.read().decode('utf-8')

In [14]:
len(org_data)

350000
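
Two variations on the cells above may be convenient: `gzip.compress()` and `gzip.decompress()` work on bytes in memory without touching a file, and the text modes `'wt'` and `'rt'` let `gzip.open()` handle the encoding step. This sketch assumes a temporary directory so that it does not overwrite `origin.txt.gz`.

```python
import gzip
import os
import tempfile

text = "Life is too short, You need Python." * 10000

# In-memory round trip: gzip.compress() / gzip.decompress() operate on bytes
blob = gzip.compress(text.encode("utf-8"))
restored = gzip.decompress(blob).decode("utf-8")

# Text mode ('wt' / 'rt') lets gzip.open() encode and decode the string
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "origin.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        from_file = f.read()

print(restored == text, from_file == text)
```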

## `zipfile`
A module for compressing multiple files **together** into a single .zip archive

In [15]:
import zipfile

In [16]:
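# Compress multiple files at once
# (the default is ZIP_STORED, which stores files without compressing them)
with zipfile.ZipFile('./sample/test.zip', 'w', compression=zipfile.ZIP_DEFLATED) as myzip: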
    myzip.write('./sample/test1.txt')
    myzip.write('./sample/test2.txt')
    myzip.write('./sample/test3.txt')

In [18]:
# Extract (into the current working directory)
with zipfile.ZipFile('./sample/test.zip') as myzip:
    myzip.extractall()
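
The cells above rely on files under `./sample/` already existing. As a self-contained sketch, the example below creates placeholder files in a temporary directory, stores them under flat names with `arcname`, lists the archive with `namelist()`, and extracts into a chosen directory. The file contents and the `extracted` directory name are made up for illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder files standing in for test1.txt ... test3.txt
    names = ["test1.txt", "test2.txt", "test3.txt"]
    for name in names:
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as f:
            f.write("Life is too short, You need Python.\n" * 100)

    zip_path = os.path.join(tmp_dir, "test.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as myzip:
        for name in names:
            # arcname stores the file under a flat name instead of its full path
            myzip.write(os.path.join(tmp_dir, name), arcname=name)

    out_dir = os.path.join(tmp_dir, "extracted")
    with zipfile.ZipFile(zip_path) as myzip:
        listed = myzip.namelist()       # names stored in the archive
        myzip.extractall(path=out_dir)  # extract into a chosen directory

    extracted = sorted(os.listdir(out_dir))

print(listed)
print(extracted)
```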

## `tarfile`
A module for bundling multiple files into a .tar archive (compression is optional, for example with mode `'w:gz'`)

In [19]:
import tarfile

In [20]:
# Archive multiple files at once (mode 'w' writes an uncompressed tar)
with tarfile.open('./sample/test.tar', 'w') as mytar:
    mytar.add('./sample/test1.txt')
    mytar.add('./sample/test2.txt')
    mytar.add('./sample/test3.txt')

In [21]:
# Extract (into the current working directory)
with tarfile.open('./sample/test.tar') as mytar:
    mytar.extractall()
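
Mode `'w'` only bundles files and does not shrink them. As a hedged sketch, the example below writes the same placeholder files once as a plain tar and once with mode `'w:gz'`, then compares the archive sizes. The temporary directory and file contents are assumptions made for illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder files standing in for test1.txt ... test3.txt
    names = ["test1.txt", "test2.txt", "test3.txt"]
    for name in names:
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as f:
            f.write("Life is too short, You need Python.\n" * 100)

    tar_path = os.path.join(tmp_dir, "test.tar")
    tgz_path = os.path.join(tmp_dir, "test.tar.gz")
    # 'w' writes an uncompressed archive, 'w:gz' writes a gzip-compressed one
    for path, mode in ((tar_path, "w"), (tgz_path, "w:gz")):
        with tarfile.open(path, mode) as mytar:
            for name in names:
                mytar.add(os.path.join(tmp_dir, name), arcname=name)

    tar_size = os.path.getsize(tar_path)
    tgz_size = os.path.getsize(tgz_path)

    # The default read mode detects the compression automatically
    with tarfile.open(tgz_path) as mytar:
        members = mytar.getnames()

print(tar_size, tgz_size)
print(members)
```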