<a href="https://colab.research.google.com/github/pbeens/Zip-File-Tutorial/blob/main/Zip_File_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This Colab notebook and its supporting files can be found at:

https://github.com/pbeens/Zip-File-Tutorial

## Download the files we'll play with

In [1]:
!pip -q install wget # need to install wget

  Building wheel for wget (setup.py) ... done


In [2]:
import wget

file_urls = ['https://raw.githubusercontent.com/pbeens/Zip-File-Tutorial/main/files/Canada-Populations-by-Province-eng.csv',
             'https://github.com/pbeens/Zip-File-Tutorial/raw/main/files/Canada-Populations-by-Province-eng.xlsx',
             'https://github.com/pbeens/Zip-File-Tutorial/raw/main/files/Lorem-Ipsum.docx',
             'https://raw.githubusercontent.com/pbeens/Zip-File-Tutorial/main/files/Lorem-Ipsum.rtf',
             'https://raw.githubusercontent.com/pbeens/Zip-File-Tutorial/main/files/Lorem-Ipsum.txt',
             'https://github.com/pbeens/Zip-File-Tutorial/raw/main/files/snow-scene.jpg']

for url in file_urls:
    wget.download(url)

print('Files downloaded.')


Files downloaded.


## Declare the file variables (as a list)

In [3]:
files = []

for url in file_urls:
    filename = url.split('/')[-1] # filename is last part of URL
    files.append(filename) # add filename to files list

print(files)

['Canada-Populations-by-Province-eng.csv', 'Canada-Populations-by-Province-eng.xlsx', 'Lorem-Ipsum.docx', 'Lorem-Ipsum.rtf', 'Lorem-Ipsum.txt', 'snow-scene.jpg']
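
Splitting on `/` works for the URLs above. If a URL carried a query string or percent-encoded characters, the last piece would no longer be a clean filename. The following is a minimal sketch of a more defensive approach using the standard library; the helper name `filename_from_url` and the `example.com` URLs are hypothetical and used only for illustration.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string or fragment."""
    path = urlsplit(url).path                 # drops '?query' and '#fragment'
    return unquote(posixpath.basename(path))  # decodes escapes such as %20

# Hypothetical URLs, used only for illustration
sample_urls = [
    'https://example.com/files/Lorem-Ipsum.txt',
    'https://example.com/files/snow%20scene.jpg?raw=true',
]

names = [filename_from_url(u) for u in sample_urls]
print(names)
```

For the six URLs in this notebook, this helper should return the same names as `url.split('/')[-1]`.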


## Let's zip our files

In [4]:
from zipfile import ZipFile

zip_filename = 'files.zip'

with ZipFile(zip_filename, mode="w") as archive:
    for file in files:
        archive.write(file)

## What about file compression?

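Comparing the combined size of the original files with the size of the zip file shows that nothing has been compressed yet: by default, `ZipFile` only stores the files (the `ZIP_STORED` method). In fact, the zip file is slightly larger than the files it contains, because the archive adds a header and a directory entry for each file.

### Checking file sizes with no compression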

In [5]:
import os # needed for getsize()

zip_filename = 'files.zip'

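def get_and_print_filesizes():
    """Print the combined size of `files`, the size of the zip file, and the savings."""
    # Add up the sizes (in bytes) of the original files
    total_filesize = 0
    for file in files:
        total_filesize += os.path.getsize(file)

    size_of_zip_file = os.path.getsize(zip_filename)

    print(f'Size of all files: {total_filesize}')
    print(f'Size of zip file: {size_of_zip_file}')
    # A negative percentage means the zip file is larger than the originals
    print(f'Compression amount: {(1-size_of_zip_file/total_filesize)*100:.1f}%')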

get_and_print_filesizes()

Size of all files: 215985
Size of zip file: 216737
Compression amount: -0.3%

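
To see where the extra bytes come from, each entry in an archive can be inspected with `infolist()`. The sketch below builds a small archive in memory from made-up data (the names `notes.txt` and `data.csv` are hypothetical) and compares each entry's original size with its stored size. It assumes the default `ZIP_STORED` method, as in the cell above.

```python
import io
from zipfile import ZipFile

# Hypothetical in-memory data standing in for real files
samples = {
    'notes.txt': b'zip files are handy\n' * 50,
    'data.csv': b'province,population\n' * 50,
}

buffer = io.BytesIO()  # in-memory archive, so nothing is written to disk
with ZipFile(buffer, mode='w') as archive:  # default method is ZIP_STORED
    for name, data in samples.items():
        archive.writestr(name, data)

with ZipFile(buffer) as archive:
    entries = archive.infolist()
    for info in entries:
        # With ZIP_STORED the stored size equals the original size
        print(f'{info.filename}: {info.file_size} bytes, stored as {info.compress_size} bytes')

total_data = sum(len(d) for d in samples.values())
archive_size = len(buffer.getvalue())
print(f'Data: {total_data} bytes, archive: {archive_size} bytes, '
      f'overhead: {archive_size - total_data} bytes')
```

The overhead grows with the number of entries, which is why an uncompressed archive ends up slightly larger than its contents.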

### With file compression

To compress the files, we need to tell `ZipFile` which compression method and compression level to use.

For this tutorial, we will use the `ZIP_DEFLATED` method of compression.

With `ZIP_DEFLATED`, compression levels run from 0 to 9, with 0 being no compression and 9 being the highest. Higher levels take longer to compress, but generally have little effect on how long the files take to decompress. Note that the `compresslevel` argument requires Python 3.7 or later.

In [6]:
from zipfile import ZipFile, ZIP_DEFLATED

zip_filename = 'files.zip'

with ZipFile(zip_filename, "w", ZIP_DEFLATED, compresslevel=9) as archive:
    for file in files:
        archive.write(file)

get_and_print_filesizes()

Size of all files: 215985
Size of zip file: 159083
Compression amount: 26.3%
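
To get a feel for how the compression level affects the result, the same data can be zipped at several levels and the archive sizes compared. The following is a rough sketch using made-up, highly repetitive text held in memory; the helper name `zipped_size` and the sample data are hypothetical, and real files (especially already-compressed formats such as JPEG) will behave differently.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical, highly repetitive sample data; real files will compress differently
data = b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n' * 2000

def zipped_size(level):
    """Return the size in bytes of an in-memory archive holding `data` at the given level."""
    buffer = io.BytesIO()
    with ZipFile(buffer, 'w', ZIP_DEFLATED, compresslevel=level) as archive:
        archive.writestr('sample.txt', data)
    return len(buffer.getvalue())

# Try no compression (0), fastest (1), a middle level (6), and maximum (9)
sizes = {level: zipped_size(level) for level in (0, 1, 6, 9)}

for level, size in sizes.items():
    print(f'Level {level}: {size} bytes ({(1 - size / len(data)) * 100:.1f}% smaller)')
```

A negative percentage at level 0 means the archive is larger than the data, matching the uncompressed result seen earlier.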
