# Description

Python's standard library includes support for the gzip, bz2, and xz compression formats. We would like to compare how much compression each format achieves at its default level. For this exercise, you will also want to use some of what you learned in lesson 1 of this course for general file handling.

As a test collection of files to compress with each format, you should include:

* The Python executable on the executable path for this test system
* The bash executable on the executable path for this test system
* The file `token.asc` in the current directory of this exercise.
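
The list above can be gathered along these lines. This is only a sketch: the helper name `locate_test_files` and its parameters are hypothetical, and it assumes that skipping anything that cannot be found is acceptable, which the exercise itself does not state.

```python
import shutil
from pathlib import Path


def locate_test_files(names=("python", "bash"), extra=("token.asc",)):
    """Return Path objects for the executables and local files that exist.

    `names` are looked up on the executable path; `extra` are paths
    relative to the current directory. Both defaults mirror the list
    above, but the function itself is only an illustration.
    """
    found = []
    for name in names:
        resolved = shutil.which(name)  # None when not on the executable path
        if resolved is not None:
            found.append(Path(resolved))
    for local in extra:
        candidate = Path(local)
        if candidate.exists():
            found.append(candidate)
    return found


for path in locate_test_files():
    # path.name is the bare file name, which is what the result keys use
    print(path.name, path.stat().st_size)
```

Checking for `None` matters because `shutil.which` returns `None` rather than raising when a program is not found, and `Path(None)` would fail with a `TypeError`.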

For each of these files, determine the size of each compressed version after writing it to a file on disk. Be sure to compress to files rather than entirely in memory (the resulting sizes may differ slightly between the two approaches). Store the results in a dictionary mapping each file's name (not its full path) to its size in bytes, for both the original files and the compressed versions.
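
A minimal sketch of compressing to disk and collecting sizes, assuming a throwaway sample file in a temporary directory rather than the real files of the exercise. The names `compressed_sizes` and `OPENERS` are illustrative and not part of the exercise; the `tmp-` prefix follows the naming used in the setup.

```python
import bz2
import gzip
import lzma
import shutil
import tempfile
from pathlib import Path

# Map each file extension to the module-level open() of its format
OPENERS = {"gz": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def compressed_sizes(source, outdir):
    """Compress `source` into `outdir` in each format; return name -> size."""
    sizes = {source.name: source.stat().st_size}
    for ext, opener in OPENERS.items():
        target = outdir / f"tmp-{source.name}.{ext}"
        # Stream from the source file to the compressed file on disk,
        # using each format's default compression level
        with source.open("rb") as src, opener(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        sizes[target.name] = target.stat().st_size
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"the quick brown fox\n" * 5000)
    demo_sizes = compressed_sizes(sample, Path(tmp))
    # In-memory gzip size of the same data, for comparison only
    in_memory = len(gzip.compress(sample.read_bytes()))

print(demo_sizes)
print("in-memory gzip:", in_memory)
```

The in-memory figure can differ from the on-disk one because, for example, a gzip file written through `gzip.open` may record the file name in its header, whereas `gzip.compress` has no file name to record.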

The setup shows the correct answer on the system where this exercise was created. The test platform will have different versions of the executables, with similar but not identical sizes. The correct answer will simply be a dictionary with different sizes as values, but presumably you will write one or more functions to arrive at those numbers. The test for this exercise takes about 30 seconds to run, since it actually compresses the files in question.

# Setup

In [1]:
import gzip, bz2, lzma

sizes = {
    'python': 15292625,
    'tmp-python.gz': 4915354,
    'tmp-python.bz2': 5070327,
    'tmp-python.xz': 3928031,
    'bash': 1201543,
    'tmp-bash.gz': 567177,
    'tmp-bash.bz2': 534225,
    'tmp-bash.xz': 480938,
    'token.asc': 13335,
    'tmp-token.asc.gz': 10122,
    'tmp-token.asc.bz2': 10201,
    'tmp-token.asc.xz': 10441
}
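
To make the comparison between formats easier to read, the sizes can be turned into compression ratios. The sketch below assumes the dictionary shape shown in this setup and copies its values; the names `setup_sizes` and `compression_ratios` are hypothetical.

```python
# Values copied from the setup dictionary above
setup_sizes = {
    'python': 15292625,
    'tmp-python.gz': 4915354,
    'tmp-python.bz2': 5070327,
    'tmp-python.xz': 3928031,
    'bash': 1201543,
    'tmp-bash.gz': 567177,
    'tmp-bash.bz2': 534225,
    'tmp-bash.xz': 480938,
    'token.asc': 13335,
    'tmp-token.asc.gz': 10122,
    'tmp-token.asc.bz2': 10201,
    'tmp-token.asc.xz': 10441,
}


def compression_ratios(sizes, originals=('python', 'bash', 'token.asc')):
    """Return compressed size divided by original size for each variant."""
    ratios = {}
    for original in originals:
        for ext in ('gz', 'bz2', 'xz'):
            name = f'tmp-{original}.{ext}'
            ratios[name] = sizes[name] / sizes[original]
    return ratios


ratios = compression_ratios(setup_sizes)
for name, ratio in ratios.items():
    # A smaller ratio means the format compressed that file better
    print(f'{name:20s} {ratio:.1%}')
```

On these particular numbers, xz produces the smallest output for both executables, while gzip produces the smallest output for `token.asc`.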

# Solution

In [2]:
from pathlib import Path
import shutil
import pickle

pyfile = Path(shutil.which('python'))
bashfile = Path(shutil.which('bash'))
token = Path('token.asc')

def do_compress(files=(pyfile, bashfile, token)):
    """Write gzip, bz2, and xz copies of each file to the current directory."""
    for file in files:
        content = file.read_bytes()
        # Each format is written at its default compression level
        with gzip.open(f'tmp-{file.name}.gz', 'wb') as f:
            f.write(content)
        with bz2.open(f'tmp-{file.name}.bz2', 'wb') as f:
            f.write(content)
        with lzma.open(f'tmp-{file.name}.xz', 'wb', format=lzma.FORMAT_XZ) as f:
            f.write(content)

def find_sizes(files=(pyfile, bashfile, token)):
    """Map each file name and its compressed variants to sizes in bytes."""
    result = {}
    for file in files:
        # Key on the bare file name, not the full path
        result[file.name] = file.stat().st_size
        for ext in ('gz', 'bz2', 'xz'):
            name = f'tmp-{file.name}.{ext}'
            result[name] = Path(name).stat().st_size
    return result

# Save the reference functions so the test cell can reload them
with open('utils.pkl', 'wb') as fh:
    pickle.dump([do_compress, find_sizes], fh)
do_compress()
sizes = find_sizes()

# Test Cases

In [3]:
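def test_sizes():
    import pickle
    # Reload the reference functions saved by the solution cell
    with open('utils.pkl', 'rb') as fh:
        do_compress, find_sizes = pickle.load(fh)
    do_compress()
    good_sizes = find_sizes()
    for fname in good_sizes:
        assert fname in sizes, f"{fname} is missing from sizes"
        assert sizes[fname] == good_sizes[fname], \
            f"{fname} should be {good_sizes[fname]}, got {sizes[fname]}"
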
test_sizes()