# Analyzing the filesystem footprint of Python 3 on Fedora

In [1]:
import collections
import pathlib

import humanize
import tabulate

from IPython.display import HTML, display

The list of Python 3 packages. `python-unversioned-command` is omitted here, as we will query Python 3.8 on Fedora 31.

In [2]:
python_pkgs = [
    'python3',
    'python3-libs',
    'python3-tkinter',
    'python3-idle',
    'python3-test',
    'python3-devel',
]

What version of Python 3.8 is installed?

In [3]:
!rpm -q python38

python38-3.8.1-1.fc31.x86_64


We do care about all files in the `python38` package. The differences between `python38` on Fedora 31 and `python3` subpackages on Fedora 32 are not relevant in this context. The standard library is the same.

In [4]:
python38_files = !rpm -ql python38
python38_files = [f for f in python38_files if not f.startswith('/usr/lib/.build-id')]  # avoid noise
python38_files[:5]

['/usr/bin/idle3.8',
 '/usr/bin/msgfmt3.8.py',
 '/usr/bin/pydoc3.8',
 '/usr/bin/pygettext3.8.py',
 '/usr/bin/python3.8']

For each Fedora 32 `python3` subpackage, we get the relevant files installed from `python38` on Fedora 31:

In [5]:
pkg_files = {}
for pkg in python_pkgs:
    pkg_files[pkg] = !repoquery --repo=rawhide -l {pkg} 2>/dev/null
    pkg_files[pkg] = [f for f in pkg_files[pkg] if f in python38_files]
pkg_files['python3-libs'][:5]

['/usr/include/python3.8',
 '/usr/lib/python3.8',
 '/usr/lib/python3.8/site-packages',
 '/usr/lib/python3.8/site-packages/__pycache__',
 '/usr/include/python3.8']

In [6]:
file_pkgs = {path: pkg for pkg in pkg_files for path in pkg_files[pkg]}
file_pkgs['/usr/lib64/python3.8/tkinter']

'python3-tkinter'
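The comprehension above inverts `pkg_files`. If a path is listed by more than one subpackage (for example a directory owned by two of them), the subpackage that comes later in `python_pkgs` wins. A minimal sketch of how such shared paths could be listed, using made-up sample data rather than real `repoquery` output:

```python
import collections

# Hypothetical sample data standing in for pkg_files.
sample_pkg_files = {
    'python3-libs': ['/usr/include/python3.8', '/usr/lib64/python3.8/os.py'],
    'python3-devel': ['/usr/include/python3.8', '/usr/include/python3.8/Python.h'],
}

# Collect every package that lists a given path.
owners = collections.defaultdict(list)
for pkg, paths in sample_pkg_files.items():
    for path in paths:
        owners[path].append(pkg)

# Paths listed by more than one package.
shared = {path: pkgs for path, pkgs in owners.items() if len(pkgs) > 1}

# Same inversion as in the notebook: the last package listing a path wins.
inverted = {path: pkg for pkg in sample_pkg_files for path in sample_pkg_files[pkg]}
```

With the sample data, `shared` contains only the include directory, and `inverted` assigns it to `python3-devel`.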

Finally, we get the size of every file. On different architectures, with different Python versions, or even with different compiler versions in different Fedora releases, the sizes might differ. But we don't care about small differences: we are after the big stuff, and we will assume that what is big here will be big everywhere. We are aiming for a long-term solution, so considering those differences would not be helpful anyway.

Note that the `Counter.most_common()` method gives us the largest files, but we will care about directories and file types more.

In [7]:
file_sizes = collections.Counter({p: pathlib.Path(p).stat().st_size for p in python38_files})
file_sizes.most_common()[:8]

[('/usr/lib64/libpython3.8.so', 3851336),
 ('/usr/lib64/libpython3.8.so.1.0', 3851336),
 ('/usr/lib64/python3.8/lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so',
  1096688),
 ('/usr/lib64/python3.8/pydoc_data/topics.py', 671801),
 ('/usr/lib64/python3.8/test/testtar.tar', 435200),
 ('/usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.opt-1.pyc',
  417962),
 ('/usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.opt-2.pyc',
  417962),
 ('/usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.pyc', 417962)]

A quick check of how directories are sized:

In [8]:
humanize.naturalsize(file_sizes['/usr/lib64/python3.8'], binary=True)

'12.0 KiB'

Clearly, not recursively. Note: when we measure the gain from removing a file, we only factor in the removal of that file, not the possible slimming down of the directory that contained it. We assume most directories are 4--92 KiB and hence ignore such changes as insignificant. Let us measure that to be sure:

In [9]:
directory_sizes = collections.Counter({p: pathlib.Path(p).stat().st_size for p in python38_files if pathlib.Path(p).is_dir()})
directory_sizes.most_common()[:8]

[('/usr/lib64/python3.8/test/__pycache__', 94208),
 ('/usr/lib64/python3.8/__pycache__', 32768),
 ('/usr/lib64/python3.8/test', 24576),
 ('/usr/lib64/python3.8/encodings/__pycache__', 20480),
 ('/usr/lib64/python3.8/idlelib/idle_test/__pycache__', 16384),
 ('/usr/lib64/python3.8', 12288),
 ('/usr/lib64/python3.8/ctypes/test/__pycache__', 12288),
 ('/usr/lib64/python3.8/distutils/tests/__pycache__', 12288)]

In [10]:
# Total space of directories larger than 4 KiB and hence possible to make smaller
humanize.naturalsize(sum(v for v in directory_sizes.values() if v > 4*1024), binary=True)

'244.0 KiB'
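Since `stat()` on a directory reports only the size of the directory entry itself, a recursive total has to be summed from the individual paths. A minimal sketch, assuming a mapping shaped like `file_sizes` (path to size in bytes); the helper name `recursive_sizes` and the sample numbers are made up for illustration:

```python
import collections
import pathlib


def recursive_sizes(file_sizes):
    """Add the size of every path to the path itself and to all its ancestors."""
    totals = collections.Counter()
    for path, size in file_sizes.items():
        p = pathlib.PurePosixPath(path)
        totals[str(p)] += size
        for parent in p.parents:
            totals[str(parent)] += size
    return totals


# Hypothetical sizes, for illustration only.
sample = {
    '/usr/lib64/python3.8': 12288,
    '/usr/lib64/python3.8/os.py': 1000,
    '/usr/lib64/python3.8/json': 4096,
    '/usr/lib64/python3.8/json/__init__.py': 500,
}
totals = recursive_sizes(sample)
```

Here `totals['/usr/lib64/python3.8']` covers the directory entry plus everything listed below it.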

## Filesystem footprint by subpackages

In total, this has a large footprint, although large chunks of this are already split out:

In [11]:
humanize.naturalsize(sum(file_sizes.values()), binary=True)

'111.0 MiB'

In [12]:
{pkg: humanize.naturalsize(sum(s for p, s in file_sizes.items() if p in pkg_files[pkg]), binary=True) for pkg in pkg_files}

{'python3': '21.0 KiB',
 'python3-libs': '37.5 MiB',
 'python3-tkinter': '2.0 MiB',
 'python3-idle': '4.2 MiB',
 'python3-test': '62.8 MiB',
 'python3-devel': '4.5 MiB'}

It can be seen that the "main" `python3` package is not very relevant here. The `python3-test` package is optional and pretty much only useful to test Python itself. It contains a lot of test data and we will not try to optimize its size. The `python3-idle` package contains an application and while we can aim to minimize anything, we will not focus on this package either.

**The main problem is in the `python3-libs` package** – it is always installed when Python is installed.

The `python3-tkinter` package is less problematic. It is optional and only recommended if *Tk* is installed.

The `python3-devel` package is quite big as well and it is used both for building Python extension modules and Python RPM packages. Getting it slimmed down might be nice, but we would also consider moving stuff from `python3-libs` into it.

## Filesystem footprint by filetype

The standard library (`/usr/lib64/python3.8/`, mostly in `python3-libs` and `python3-test`) contains several file types:

In [13]:
def ext(path):
    """Get a file extension, but treat .opt-?.pyc as a special case"""
    suffixes = pathlib.Path(path).suffixes
    if not suffixes:
        return None
    # Guard against a bare "name.pyc", which has only one suffix
    if len(suffixes) >= 2 and suffixes[-1] == '.pyc' and suffixes[-2].startswith('.opt-'):
        return suffixes[-2] + suffixes[-1]
    return suffixes[-1]
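To see why the helper looks at the last two suffixes, it may help to check how `pathlib` splits typical file names. The names below are illustrative only:

```python
import pathlib

names = [
    'os.py',
    'os.cpython-38.pyc',
    'os.cpython-38.opt-1.pyc',
    'Makefile',
]

# Map each illustrative name to the list of suffixes pathlib reports for it.
split = {name: pathlib.PurePosixPath(name).suffixes for name in names}

for name, suffixes in split.items():
    print(name, suffixes)
```

The optimization level sits in the second-to-last suffix, which is why `.opt-1.pyc` and `.opt-2.pyc` have to be reassembled from two parts, while a file such as `Makefile` has no suffix at all.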

In [14]:
stdlib_files = [p for p in python38_files if p.startswith('/usr/lib64/python3.8/')]
extensions = {path: ext(path) for path in stdlib_files}

In [15]:
exts = collections.Counter(extensions.values())
exts.most_common()[:10]

[('.py', 1640),
 ('.opt-1.pyc', 1623),
 ('.opt-2.pyc', 1623),
 ('.pyc', 1623),
 (None, 242),
 ('.decTest', 143),
 ('.txt', 109),
 ('.so', 75),
 ('.xml', 56),
 ('.pem', 22)]

In [16]:
extsizes = collections.Counter({ext: sum(s for p, s in file_sizes.items()
                                         if p in stdlib_files and extensions[p] == ext)
                                for ext in exts})

{ext: humanize.naturalsize(size, binary=True) for ext, size in extsizes.most_common()[:5]}

{'.py': '26.4 MiB',
 '.pyc': '22.0 MiB',
 '.opt-1.pyc': '22.0 MiB',
 '.opt-2.pyc': '19.8 MiB',
 '.so': '5.3 MiB'}

Only from `python3-libs`:

In [17]:
extensions_libs = {p: e for p, e in extensions.items() if p in pkg_files['python3-libs']}
exts_libs = collections.Counter(extensions_libs.values())
exts_libs.most_common()[:6]

[('.py', 607),
 ('.opt-1.pyc', 607),
 ('.opt-2.pyc', 607),
 ('.pyc', 607),
 (None, 84),
 ('.so', 67)]

In [18]:
extsizes_libs = collections.Counter({ext: sum(s for p, s in file_sizes.items()
                                         if p in stdlib_files and p in pkg_files['python3-libs'] and extensions[p] == ext)
                                for ext in exts})

{ext: humanize.naturalsize(size, binary=True) for ext, size in extsizes_libs.most_common()[:5]}

{'.py': '9.8 MiB',
 '.pyc': '6.7 MiB',
 '.opt-1.pyc': '6.7 MiB',
 '.opt-2.pyc': '5.2 MiB',
 '.so': '4.9 MiB'}
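The two size-by-extension cells above scan all files once per extension and test membership in lists, which is linear per lookup. A single-pass variant could look like the sketch below. The function name `sizes_by_extension`, its `keep` parameter, and the simplified `simple_ext` stand-in for `ext()` are assumptions made for this example:

```python
import collections
import pathlib


def sizes_by_extension(file_sizes, ext, keep=lambda path: True):
    """Sum sizes per extension in one pass over a path -> size mapping."""
    totals = collections.Counter()
    for path, size in file_sizes.items():
        if keep(path):
            totals[ext(path)] += size
    return totals


def simple_ext(path):
    """Simplified stand-in for the notebook's ext(): last suffix or None."""
    return pathlib.PurePosixPath(path).suffix or None


# Hypothetical sizes, for illustration only.
sample = {'/a/x.py': 10, '/a/y.py': 5, '/a/z.so': 7, '/b/w.py': 100}
result = sizes_by_extension(sample, simple_ext, keep=lambda p: p.startswith('/a/'))
```

In the notebook, `keep` could be a membership test against `set(pkg_files['python3-libs'])`, which makes each lookup constant time on average.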

## Filesystem footprint by module (and package)

In [19]:
module_sizes_by_extension = collections.defaultdict(lambda: collections.defaultdict(int))
msbe = module_sizes_by_extension

In [20]:
for path in stdlib_files:
    pkg = file_pkgs[path]
    libdir = '/usr/lib64/python3.8/'
    _path = path[len(libdir):]
    if _path.startswith('lib-dynload/'):
        _path = _path[len('lib-dynload/'):]
    elif _path.startswith('__pycache__/'):
        _path = _path[len('__pycache__/'):]

    if '/' in _path:
        modname = _path.partition('/')[0]
    else:
        modname = _path.partition('.')[0]

    msbe[(modname, pkg)][ext(path)] += file_sizes[path]
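The path-to-module mapping used in the loop above can be isolated into a small function, which makes it easier to check on individual paths. This is a sketch that aims to mirror the same rules; the function name `module_name` is made up here:

```python
LIBDIR = '/usr/lib64/python3.8/'


def module_name(path, libdir=LIBDIR):
    """Map a stdlib file path to its top-level module or package name."""
    rel = path[len(libdir):]
    # Extension modules and top-level bytecode live one directory deeper.
    for prefix in ('lib-dynload/', '__pycache__/'):
        if rel.startswith(prefix):
            rel = rel[len(prefix):]
            break
    head, sep, _ = rel.partition('/')
    if sep:
        # Inside a package directory: the directory is the module name.
        return head
    # A single file: everything before the first dot.
    return rel.partition('.')[0]


examples = {
    p: module_name(p)
    for p in [
        LIBDIR + 'os.py',
        LIBDIR + '__pycache__/codecs.cpython-38.pyc',
        LIBDIR + 'lib-dynload/unicodedata.cpython-38-x86_64-linux-gnu.so',
        LIBDIR + 'asyncio/__pycache__/streams.cpython-38.pyc',
    ]
}
```

Note that everything under a package directory, including nested test directories, is attributed to the top-level package name.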

In [21]:
by_total = collections.Counter({m: sum(e.values()) for m, e in msbe.items()})
by_total.most_common()[:7]

[(('test', 'python3-test'), 58359936),
 (('idlelib', 'python3-idle'), 4406246),
 (('encodings', 'python3-libs'), 2598516),
 (('unittest', 'python3-test'), 2437651),
 (('pydoc_data', 'python3-libs'), 1934377),
 (('distutils', 'python3-libs'), 1870804),
 (('asyncio', 'python3-libs'), 1505039)]

In [22]:
def ns(num):
    if num:
        return humanize.naturalsize(num, binary=True)
    return '-'

In [23]:
sizes = [
    [
        m[0],
        m[1][len("python3-"):],
        ns(msbe[m]['.py']),
        ns(msbe[m]['.pyc']),
        ns(msbe[m]['.opt-1.pyc']),
        ns(msbe[m]['.opt-2.pyc']),
        ns(msbe[m]['.so']),
        ns(msbe[m][None]),
        ns(sum(s for s in msbe[m].values())),
    ]
    for m, _ in by_total.most_common()]

In [24]:
hdr = ['module', 'pkg', '.py', '.pyc', '.opt-1.pyc', '.opt-2.pyc', '.so', 'other', 'total']
display(HTML(tabulate.tabulate([hdr] + sizes, tablefmt='html')))

| module | pkg | .py | .pyc | .opt-1.pyc | .opt-2.pyc | .so | other | total |
|---|---|---|---|---|---|---|---|---|
| test | test | 13.3 MiB | 12.2 MiB | 12.1 MiB | 11.8 MiB | - | 560.1 KiB | 55.7 MiB |
| idlelib | idle | 1.1 MiB | 977.9 KiB | 976.6 KiB | 839.9 KiB | - | 95.0 KiB | 4.2 MiB |
| encodings | libs | 1.4 MiB | 378.4 KiB | 377.9 KiB | 362.4 KiB | - | 24.0 KiB | 2.5 MiB |
| unittest | test | 552.8 KiB | 605.0 KiB | 605.0 KiB | 601.8 KiB | - | 16.0 KiB | 2.3 MiB |
| pydoc_data | libs | 656.1 KiB | 408.3 KiB | 408.3 KiB | 408.3 KiB | - | 8.0 KiB | 1.8 MiB |
| distutils | libs | 647.1 KiB | 421.3 KiB | 420.5 KiB | 321.1 KiB | - | 16.9 KiB | 1.8 MiB |
| asyncio | libs | 441.2 KiB | 365.8 KiB | 363.6 KiB | 291.2 KiB | - | 8.0 KiB | 1.4 MiB |
| lib2to3 | test | 353.0 KiB | 333.0 KiB | 322.9 KiB | 321.1 KiB | - | 32.4 KiB | 1.3 MiB |
| tkinter | tkinter | 350.5 KiB | 354.9 KiB | 354.9 KiB | 225.3 KiB | - | 8.0 KiB | 1.3 MiB |


## PEP 594 -- Removing dead batteries from the standard library

In [25]:
removed = """aifc
asynchat
asyncore
audioop
binhex
cgi
cgitb
chunk
crypt
formatter
fpectl
imghdr
imp
macpath
msilib
nntplib
nis
ossaudiodev
parser
pipes
smtpd
sndhdr
spwd
sunau
telnetlib
uu
xdrlib""".splitlines()

nested = """email.message.Message
email.mime
email.policy.Compat32""".splitlines()

In [26]:
!du /usr/lib64/python3.8/email/mime/

72	/usr/lib64/python3.8/email/mime/__pycache__
108	/usr/lib64/python3.8/email/mime/


In [27]:
!du /usr/lib64/python3.8/email/message.py /usr/lib64/python3.8/email/__pycache__/message.*.pyc

48	/usr/lib64/python3.8/email/message.py
40	/usr/lib64/python3.8/email/__pycache__/message.cpython-38.opt-1.pyc
24	/usr/lib64/python3.8/email/__pycache__/message.cpython-38.opt-2.pyc


In [28]:
!du /usr/lib64/python3.8/email/policy.py /usr/lib64/python3.8/email/__pycache__/policy.*.pyc

12	/usr/lib64/python3.8/email/policy.py
12	/usr/lib64/python3.8/email/__pycache__/policy.cpython-38.opt-1.pyc
4	/usr/lib64/python3.8/email/__pycache__/policy.cpython-38.opt-2.pyc


In [29]:
ns(sum(s for m in removed for s in msbe[m, 'python3-libs'].values()) + (108 + 48 + 40 + 24 + 12 + 12 + 4) * 1024)

'1.4 MiB'
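The constant added in the cell above is the sum of the `du` figures for the nested `email` entries, which the notebook treats as 1 KiB blocks. The sketch below only restates that arithmetic with labels; the dictionary keys are descriptive labels chosen for this example, not real paths:

```python
# KiB figures copied from the three `du` outputs above.
du_kib = {
    'email/mime/ (including __pycache__)': 108,
    'email/message.py': 48,
    'message.cpython-38.opt-1.pyc': 40,
    'message.cpython-38.opt-2.pyc': 24,
    'email/policy.py': 12,
    'policy.cpython-38.opt-1.pyc': 12,
    'policy.cpython-38.opt-2.pyc': 4,
}

nested_kib = sum(du_kib.values())
nested_bytes = nested_kib * 1024
```

This counts `message.py` and `policy.py` in full, although the `nested` list names only `email.message.Message` and `email.policy.Compat32`, so the contribution of the nested entries is best read as an upper bound.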

## Developer modules

In [30]:
!du /usr/share/python-wheels/{pip,setuptools}*

1212	/usr/share/python-wheels/pip-19.1.1-py2.py3-none-any.whl
348	/usr/share/python-wheels/setuptools-41.2.0-py2.py3-none-any.whl


In [31]:
ns((1212 + 348) * 1024)

'1.5 MiB'

In [32]:
devmodules = """pydoc_data
distutils
ensurepip
lib2to3
unittest
pydoc
doctest
venv
""".splitlines()

In [33]:
ns(sum(s for m in devmodules for s in msbe[m, 'python3-libs'].values()))

'6.1 MiB'

## Zip encodings and pydoc_data

In [34]:
!du /usr/lib64/python3.8/encodings/ /usr/lib64/python3.8/pydoc_data/

1132	/usr/lib64/python3.8/encodings/__pycache__
2868	/usr/lib64/python3.8/encodings/
420	/usr/lib64/python3.8/pydoc_data/__pycache__
1088	/usr/lib64/python3.8/pydoc_data/


In [35]:
!zip -9 /tmp/encodings.zip -r /usr/lib64/python3.8/encodings/

updating: usr/lib64/python3.8/encodings/ (stored 0%)
updating: usr/lib64/python3.8/encodings/cp1253.py (deflated 76%)
updating: usr/lib64/python3.8/encodings/iso8859_3.py (deflated 77%)
updating: usr/lib64/python3.8/encodings/cp850.py (deflated 80%)
updating: usr/lib64/python3.8/encodings/iso8859_5.py (deflated 78%)
updating: usr/lib64/python3.8/encodings/ascii.py (deflated 61%)
updating: usr/lib64/python3.8/encodings/shift_jis.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/koi8_u.py (deflated 77%)
updating: usr/lib64/python3.8/encodings/cp863.py (deflated 80%)
updating: usr/lib64/python3.8/encodings/cp864.py (deflated 80%)
updating: usr/lib64/python3.8/encodings/gbk.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/mac_latin2.py (deflated 77%)
updating: usr/lib64/python3.8/encodings/iso8859_11.py (deflated 76%)
updating: usr/lib64/python3.8/encodings/cp1254.py (deflated 76%)
updating: usr/lib64/python3.8/encodings/cp874.py (deflated 76%)
updating: usr/lib64/python3.

updating: usr/lib64/python3.8/encodings/__pycache__/punycode.cpython-38.pyc (deflated 49%)
updating: usr/lib64/python3.8/encodings/__pycache__/cp857.cpython-38.pyc (deflated 54%)
updating: usr/lib64/python3.8/encodings/__pycache__/cp1257.cpython-38.opt-2.pyc (deflated 47%)
updating: usr/lib64/python3.8/encodings/__pycache__/utf_32_le.cpython-38.pyc (deflated 54%)
updating: usr/lib64/python3.8/encodings/__pycache__/base64_codec.cpython-38.opt-2.pyc (deflated 60%)
updating: usr/lib64/python3.8/encodings/__pycache__/quopri_codec.cpython-38.pyc (deflated 56%)
updating: usr/lib64/python3.8/encodings/__pycache__/iso8859_16.cpython-38.opt-1.pyc (deflated 45%)
updating: usr/lib64/python3.8/encodings/__pycache__/utf_32.cpython-38.opt-2.pyc (deflated 60%)
updating: usr/lib64/python3.8/encodings/__pycache__/palmos.cpython-38.pyc (deflated 45%)
updating: usr/lib64/python3.8/encodings/__pycache__/palmos.cpython-38.opt-2.pyc (deflated 47%)
updating: usr/lib64/python3.8/encodings/__pycache_

updating: usr/lib64/python3.8/encodings/__pycache__/cp855.cpython-38.opt-2.pyc (deflated 54%)
updating: usr/lib64/python3.8/encodings/__pycache__/mac_farsi.cpython-38.pyc (deflated 46%)
updating: usr/lib64/python3.8/encodings/__pycache__/unicode_escape.cpython-38.pyc (deflated 54%)
updating: usr/lib64/python3.8/encodings/__pycache__/iso8859_15.cpython-38.opt-2.pyc (deflated 47%)
updating: usr/lib64/python3.8/encodings/__pycache__/utf_16_le.cpython-38.opt-1.pyc (deflated 52%)
updating: usr/lib64/python3.8/encodings/__pycache__/mac_arabic.cpython-38.pyc (deflated 53%)
updating: usr/lib64/python3.8/encodings/__pycache__/aliases.cpython-38.opt-1.pyc (deflated 61%)
updating: usr/lib64/python3.8/encodings/__pycache__/iso8859_16.cpython-38.opt-2.pyc (deflated 46%)
updating: usr/lib64/python3.8/encodings/__pycache__/mac_romanian.cpython-38.pyc (deflated 44%)
updating: usr/lib64/python3.8/encodings/__pycache__/cp1252.cpython-38.opt-1.pyc (deflated 45%)
updating: usr/lib64/python3.8/en

updating: usr/lib64/python3.8/encodings/kz1048.py (deflated 77%)
updating: usr/lib64/python3.8/encodings/iso2022_jp_2004.py (deflated 66%)
updating: usr/lib64/python3.8/encodings/cp037.py (deflated 76%)
updating: usr/lib64/python3.8/encodings/hz.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/iso2022_jp.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/koi8_r.py (deflated 78%)
updating: usr/lib64/python3.8/encodings/gb18030.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/shift_jis_2004.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/cp865.py (deflated 80%)
updating: usr/lib64/python3.8/encodings/__init__.py (deflated 65%)
updating: usr/lib64/python3.8/encodings/cp852.py (deflated 81%)
updating: usr/lib64/python3.8/encodings/utf_32_be.py (deflated 63%)
updating: usr/lib64/python3.8/encodings/cp857.py (deflated 80%)
updating: usr/lib64/python3.8/encodings/iso8859_2.py (deflated 78%)
updating: usr/lib64/python3.8/encodings/iso8859_8.py (

In [36]:
!zip -9 /tmp/pydoc_data.zip -r /usr/lib64/python3.8/pydoc_data/

updating: usr/lib64/python3.8/pydoc_data/ (stored 0%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/ (stored 0%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.opt-2.pyc (deflated 71%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.pyc (deflated 71%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/__init__.cpython-38.opt-1.pyc (deflated 26%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/__init__.cpython-38.opt-2.pyc (deflated 26%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/topics.cpython-38.opt-1.pyc (deflated 71%)
updating: usr/lib64/python3.8/pydoc_data/__pycache__/__init__.cpython-38.pyc (deflated 26%)
updating: usr/lib64/python3.8/pydoc_data/__init__.py (stored 0%)
updating: usr/lib64/python3.8/pydoc_data/_pydoc.css (deflated 15%)
updating: usr/lib64/python3.8/pydoc_data/topics.py (deflated 79%)


In [37]:
!du /tmp/encodings.zip /tmp/pydoc_data.zip

972	/tmp/encodings.zip
500	/tmp/pydoc_data.zip


In [38]:
ns((2868+1088)*1024)

'3.9 MiB'

In [39]:
ns((972+500)*1024)

'1.4 MiB'

In [40]:
ns(((2868+1088)-(972+500))*1024)

'2.4 MiB'
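The same kind of estimate can be made without the `zip` command, using the standard `zipfile` module and an in-memory archive. This is a sketch: the helper name `zipped_size` is made up, and it reports apparent file sizes (`st_size`) rather than disk blocks, so its numbers will not match `du` exactly:

```python
import io
import pathlib
import zipfile


def zipped_size(directory):
    """Return (uncompressed_bytes, zip_bytes) for the regular files under directory."""
    root = pathlib.Path(directory)
    buffer = io.BytesIO()
    raw = 0
    with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as archive:
        for path in sorted(root.rglob('*')):
            if path.is_file():
                raw += path.stat().st_size
                # Store paths relative to the directory being measured.
                archive.write(str(path), arcname=str(path.relative_to(root)))
    return raw, len(buffer.getvalue())
```

For example, `zipped_size('/usr/lib64/python3.8/encodings')` would return both sizes for the `encodings` package on a system where that path exists.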

In [41]:
allzipfiles = [p for p in pkg_files['python3-libs'] if p.startswith('/usr/lib64/python3.8/') and not p.endswith('.so')]
allzipfiles[:10]

['/usr/lib64/python3.8/LICENSE.txt',
 '/usr/lib64/python3.8/__future__.py',
 '/usr/lib64/python3.8/__phello__.foo.py',
 '/usr/lib64/python3.8/__pycache__',
 '/usr/lib64/python3.8/__pycache__/__future__.cpython-38.opt-1.pyc',
 '/usr/lib64/python3.8/__pycache__/__future__.cpython-38.opt-2.pyc',
 '/usr/lib64/python3.8/__pycache__/__future__.cpython-38.pyc',
 '/usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.opt-1.pyc',
 '/usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.opt-2.pyc',
 '/usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.pyc']

In [42]:
uncompressed = sum(file_sizes[p] for p in allzipfiles)
ns(uncompressed)

'28.8 MiB'

In [43]:
len(allzipfiles)

2525

In [44]:
!zip -9 /tmp/pylib.zip {' '.join(allzipfiles[:1024])}

updating: usr/lib64/python3.8/LICENSE.txt (deflated 68%)
updating: usr/lib64/python3.8/__future__.py (deflated 67%)
updating: usr/lib64/python3.8/__phello__.foo.py (deflated 3%)
updating: usr/lib64/python3.8/__pycache__/ (stored 0%)
updating: usr/lib64/python3.8/__pycache__/__future__.cpython-38.opt-1.pyc (deflated 54%)
updating: usr/lib64/python3.8/__pycache__/__future__.cpython-38.opt-2.pyc (deflated 51%)
updating: usr/lib64/python3.8/__pycache__/__future__.cpython-38.pyc (deflated 54%)
updating: usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.opt-1.pyc (deflated 26%)
updating: usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.opt-2.pyc (deflated 26%)
updating: usr/lib64/python3.8/__pycache__/__phello__.foo.cpython-38.pyc (deflated 26%)
updating: usr/lib64/python3.8/__pycache__/_bootlocale.cpython-38.opt-1.pyc (deflated 41%)
updating: usr/lib64/python3.8/__pycache__/_bootlocale.cpython-38.opt-2.pyc (deflated 42%)
updating: usr/lib64/python3.8/__pycache__/_bootlocale

updating: usr/lib64/python3.8/__pycache__/codecs.cpython-38.pyc (deflated 68%)
updating: usr/lib64/python3.8/__pycache__/codeop.cpython-38.opt-1.pyc (deflated 59%)
updating: usr/lib64/python3.8/__pycache__/codeop.cpython-38.opt-2.pyc (deflated 48%)
updating: usr/lib64/python3.8/__pycache__/codeop.cpython-38.pyc (deflated 59%)
updating: usr/lib64/python3.8/__pycache__/colorsys.cpython-38.opt-1.pyc (deflated 47%)
updating: usr/lib64/python3.8/__pycache__/colorsys.cpython-38.opt-2.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/colorsys.cpython-38.pyc (deflated 47%)
updating: usr/lib64/python3.8/__pycache__/compileall.cpython-38.opt-1.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/compileall.cpython-38.opt-2.pyc (deflated 42%)
updating: usr/lib64/python3.8/__pycache__/compileall.cpython-38.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/configparser.cpython-38.opt-1.pyc (deflated 62%)
updating: usr/lib64/python3.8/__pycache__/configparser.cpython-38

updating: usr/lib64/python3.8/__pycache__/inspect.cpython-38.opt-2.pyc (deflated 54%)
updating: usr/lib64/python3.8/__pycache__/inspect.cpython-38.pyc (deflated 59%)
updating: usr/lib64/python3.8/__pycache__/io.cpython-38.opt-1.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/io.cpython-38.opt-2.pyc (deflated 46%)
updating: usr/lib64/python3.8/__pycache__/io.cpython-38.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/ipaddress.cpython-38.opt-1.pyc (deflated 69%)
updating: usr/lib64/python3.8/__pycache__/ipaddress.cpython-38.opt-2.pyc (deflated 65%)
updating: usr/lib64/python3.8/__pycache__/ipaddress.cpython-38.pyc (deflated 69%)
updating: usr/lib64/python3.8/__pycache__/keyword.cpython-38.opt-1.pyc (deflated 39%)
updating: usr/lib64/python3.8/__pycache__/keyword.cpython-38.opt-2.pyc (deflated 31%)
updating: usr/lib64/python3.8/__pycache__/keyword.cpython-38.pyc (deflated 39%)
updating: usr/lib64/python3.8/__pycache__/linecache.cpython-38.opt-1.pyc (deflated 44%)

updating: usr/lib64/python3.8/__pycache__/pydoc.cpython-38.opt-2.pyc (deflated 55%)
updating: usr/lib64/python3.8/__pycache__/pydoc.cpython-38.pyc (deflated 55%)
updating: usr/lib64/python3.8/__pycache__/queue.cpython-38.opt-1.pyc (deflated 63%)
updating: usr/lib64/python3.8/__pycache__/queue.cpython-38.opt-2.pyc (deflated 61%)
updating: usr/lib64/python3.8/__pycache__/queue.cpython-38.pyc (deflated 63%)
updating: usr/lib64/python3.8/__pycache__/quopri.cpython-38.opt-1.pyc (deflated 46%)
updating: usr/lib64/python3.8/__pycache__/quopri.cpython-38.opt-2.pyc (deflated 44%)
updating: usr/lib64/python3.8/__pycache__/quopri.cpython-38.pyc (deflated 46%)
updating: usr/lib64/python3.8/__pycache__/random.cpython-38.opt-1.pyc (deflated 53%)
updating: usr/lib64/python3.8/__pycache__/random.cpython-38.opt-2.pyc (deflated 49%)
updating: usr/lib64/python3.8/__pycache__/random.cpython-38.pyc (deflated 53%)
updating: usr/lib64/python3.8/__pycache__/re.cpython-38.opt-1.pyc (deflated 59%)
updating: usr

updating: usr/lib64/python3.8/__pycache__/telnetlib.cpython-38.pyc (deflated 56%)
updating: usr/lib64/python3.8/__pycache__/tempfile.cpython-38.opt-1.pyc (deflated 59%)
updating: usr/lib64/python3.8/__pycache__/tempfile.cpython-38.opt-2.pyc (deflated 58%)
updating: usr/lib64/python3.8/__pycache__/tempfile.cpython-38.pyc (deflated 59%)
updating: usr/lib64/python3.8/__pycache__/textwrap.cpython-38.opt-1.pyc (deflated 56%)
updating: usr/lib64/python3.8/__pycache__/textwrap.cpython-38.opt-2.pyc (deflated 48%)
updating: usr/lib64/python3.8/__pycache__/textwrap.cpython-38.pyc (deflated 56%)
updating: usr/lib64/python3.8/__pycache__/this.cpython-38.opt-1.pyc (deflated 40%)
updating: usr/lib64/python3.8/__pycache__/this.cpython-38.opt-2.pyc (deflated 40%)
updating: usr/lib64/python3.8/__pycache__/this.cpython-38.pyc (deflated 40%)
updating: usr/lib64/python3.8/__pycache__/threading.cpython-38.opt-1.pyc (deflated 63%)
updating: usr/lib64/python3.8/__pycache__/threading.cpython-38.opt-2.pyc (def

updating: usr/lib64/python3.8/asyncio/__pycache__/selector_events.cpython-38.opt-1.pyc (deflated 60%)
updating: usr/lib64/python3.8/asyncio/__pycache__/selector_events.cpython-38.opt-2.pyc (deflated 60%)
updating: usr/lib64/python3.8/asyncio/__pycache__/selector_events.cpython-38.pyc (deflated 60%)
updating: usr/lib64/python3.8/asyncio/__pycache__/sslproto.cpython-38.opt-1.pyc (deflated 58%)
updating: usr/lib64/python3.8/asyncio/__pycache__/sslproto.cpython-38.opt-2.pyc (deflated 56%)
updating: usr/lib64/python3.8/asyncio/__pycache__/sslproto.cpython-38.pyc (deflated 58%)
updating: usr/lib64/python3.8/asyncio/__pycache__/staggered.cpython-38.opt-1.pyc (deflated 46%)
updating: usr/lib64/python3.8/asyncio/__pycache__/staggered.cpython-38.opt-2.pyc (deflated 34%)
updating: usr/lib64/python3.8/asyncio/__pycache__/staggered.cpython-38.pyc (deflated 46%)
updating: usr/lib64/python3.8/asyncio/__pycache__/streams.cpython-38.opt-1.pyc (deflated 60%)
updating: usr/lib64/python3.8/asyncio/__pycac

updating: usr/lib64/python3.8/config-3.8-x86_64-linux-gnu/Makefile (deflated 76%)
updating: usr/lib64/python3.8/configparser.py (deflated 78%)
updating: usr/lib64/python3.8/contextlib.py (deflated 78%)
updating: usr/lib64/python3.8/contextvars.py (deflated 35%)
updating: usr/lib64/python3.8/copy.py (deflated 70%)
updating: usr/lib64/python3.8/copyreg.py (deflated 65%)
updating: usr/lib64/python3.8/crypt.py (deflated 58%)
updating: usr/lib64/python3.8/csv.py (deflated 71%)
updating: usr/lib64/python3.8/ctypes/ (stored 0%)
updating: usr/lib64/python3.8/ctypes/__init__.py (deflated 72%)
updating: usr/lib64/python3.8/ctypes/__pycache__/ (stored 0%)
updating: usr/lib64/python3.8/ctypes/__pycache__/__init__.cpython-38.opt-1.pyc (deflated 57%)
updating: usr/lib64/python3.8/ctypes/__pycache__/__init__.cpython-38.opt-2.pyc (deflated 55%)
updating: usr/lib64/python3.8/ctypes/__pycache__/__init__.cpython-38.pyc (deflated 57%)
updating: usr/lib64/python3.8/ctypes/__pycache__/_aix.cpython-38.opt-1.

updating: usr/lib64/python3.8/distutils/__pycache__/dist.cpython-38.opt-2.pyc (deflated 58%)
updating: usr/lib64/python3.8/distutils/__pycache__/dist.cpython-38.pyc (deflated 59%)
updating: usr/lib64/python3.8/distutils/__pycache__/errors.cpython-38.opt-1.pyc (deflated 62%)
updating: usr/lib64/python3.8/distutils/__pycache__/errors.cpython-38.opt-2.pyc (deflated 73%)
updating: usr/lib64/python3.8/distutils/__pycache__/errors.cpython-38.pyc (deflated 62%)
updating: usr/lib64/python3.8/distutils/__pycache__/extension.cpython-38.opt-1.pyc (deflated 53%)
updating: usr/lib64/python3.8/distutils/__pycache__/extension.cpython-38.opt-2.pyc (deflated 43%)
updating: usr/lib64/python3.8/distutils/__pycache__/extension.cpython-38.pyc (deflated 53%)
updating: usr/lib64/python3.8/distutils/__pycache__/fancy_getopt.cpython-38.opt-1.pyc (deflated 51%)
updating: usr/lib64/python3.8/distutils/__pycache__/fancy_getopt.cpython-38.opt-2.pyc (deflated 49%)
updating: usr/lib64/python3.8/distutils/__pycache__

updating: usr/lib64/python3.8/distutils/errors.py (deflated 64%)
updating: usr/lib64/python3.8/distutils/extension.py (deflated 68%)
updating: usr/lib64/python3.8/distutils/fancy_getopt.py (deflated 69%)
updating: usr/lib64/python3.8/distutils/file_util.py (deflated 67%)
updating: usr/lib64/python3.8/distutils/filelist.py (deflated 72%)
updating: usr/lib64/python3.8/distutils/log.py (deflated 65%)
updating: usr/lib64/python3.8/distutils/msvc9compiler.py (deflated 72%)
updating: usr/lib64/python3.8/distutils/msvccompiler.py (deflated 72%)
updating: usr/lib64/python3.8/distutils/spawn.py (deflated 67%)
updating: usr/lib64/python3.8/distutils/sysconfig.py (deflated 70%)
updating: usr/lib64/python3.8/distutils/text_file.py (deflated 69%)
updating: usr/lib64/python3.8/distutils/unixccompiler.py (deflated 69%)
updating: usr/lib64/python3.8/distutils/util.py (deflated 64%)
updating: usr/lib64/python3.8/distutils/version.py (deflated 66%)
updating: usr/lib64/python3.8/distutils/v

In [45]:
!zip -9 /tmp/pylib.zip -u {' '.join(allzipfiles[1024:])}

In [46]:
!du /tmp/pylib.zip

11260	/tmp/pylib.zip


In [47]:
ns(11260*1024)

'11.0 MiB'

In [48]:
ns(uncompressed-11260*1024)

'17.8 MiB'

## How many lines?

In [49]:
!cloc /usr/lib64/python3.8/

    2005 text files.
    1975 unique files.                                          
     342 files ignored.

github.com/AlDanial/cloc v 1.82  T=5.79 s (291.3 files/s, 131184.4 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
Python                                1602         110542         133457         509412
HTML                                     3             70             11           1628
make                                     1            213            266           1431
Bourne Shell                             4             90            164            599
Windows Module Definition                4             22              0            547
C                                        2             77             72            376
XML                  

## Do bytecodes differ?

In [50]:
import filecmp

pairs = []
saved = 0
same_counts = collections.Counter()

for path in stdlib_files:
    if ext(path) == '.pyc' and path in pkg_files['python3-libs']:
        opt1 = path[:-3] + 'opt-1.pyc'
        opt2 = path[:-3] + 'opt-2.pyc'
        matches = 0
        if filecmp.cmp(path, opt1, shallow=False):
            pairs.append((path, opt1))
            matches += 1
            same_counts[0, 1] += 1
            saved += file_sizes[path]
        if filecmp.cmp(path, opt2, shallow=False):
            pairs.append((path, opt2))
            same_counts[0, 2] += 1
            matches += 1
            saved += file_sizes[path]
        if filecmp.cmp(opt1, opt2, shallow=False):
            pairs.append((opt1, opt2))
            same_counts[1, 2] += 1
            matches += 1
            saved += file_sizes[opt1]
        if matches == 3:
            saved -= file_sizes[path]
            same_counts[0, 1, 2] += 1

In [51]:
len(pairs)

584

In [52]:
ns(saved)

'4.0 MiB'

In [53]:
same_counts.most_common()

[((0, 1), 454), ((1, 2), 68), ((0, 2), 62), ((0, 1, 2), 62)]

### On my workstation with Python 3.7

In [54]:
all_my_37_pycs = !locate .cpython-37.pyc | grep ^/usr/lib

In [55]:
all_my_37_pycs[:10]

['/usr/lib/cura/plugins/3MFReader/__pycache__/ThreeMFReader.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFReader/__pycache__/ThreeMFWorkspaceReader.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFReader/__pycache__/WorkspaceDialog.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFReader/__pycache__/__init__.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFWriter/__pycache__/ThreeMFWorkspaceWriter.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFWriter/__pycache__/ThreeMFWriter.cpython-37.pyc',
 '/usr/lib/cura/plugins/3MFWriter/__pycache__/__init__.cpython-37.pyc',
 '/usr/lib/cura/plugins/CuraDrive/__pycache__/__init__.cpython-37.pyc',
 '/usr/lib/cura/plugins/CuraDrive/src/__pycache__/DriveApiService.cpython-37.pyc',
 '/usr/lib/cura/plugins/CuraDrive/src/__pycache__/DrivePluginExtension.cpython-37.pyc']

In [60]:
my_saved = 0
total = 0
my_same_counts = collections.Counter()

for path in all_my_37_pycs:
    size = pathlib.Path(path).stat().st_size
    total += size
    opt1 = path[:-3] + 'opt-1.pyc'
    opt2 = path[:-3] + 'opt-2.pyc'
    matches = 0
    try:
        if filecmp.cmp(path, opt1, shallow=False):
            my_same_counts[0, 1] += 1
            matches += 1
            my_saved += size
        total += size
    except FileNotFoundError:
        pass
    try:
        if filecmp.cmp(path, opt2, shallow=False):
            my_same_counts[0, 2] += 1
            matches += 1
            my_saved += size
        total += size
    except FileNotFoundError:
        pass
    try:
        size = pathlib.Path(opt1).stat().st_size
        if filecmp.cmp(opt1, opt2, shallow=False):
            my_same_counts[1, 2] += 1
            matches += 1
            my_saved += size
        total += size
    except FileNotFoundError:
        pass
    if matches == 3:
        my_saved -= pathlib.Path(path).stat().st_size
        my_same_counts[0, 1, 2] += 1

In [61]:
ns(my_saved)

'107.5 MiB'

In [63]:
ns(total)

'360.4 MiB'

In [62]:
my_same_counts.most_common()

[((0, 1), 13622), ((1, 2), 492), ((0, 2), 466), ((0, 1, 2), 466)]
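In the loop above, `total` adds the size of the unoptimized file as a stand-in for each optimization level that exists, and adds the size of `opt1` once more in the third block, so the reported total is probably an overestimate. A sketch of an alternative accounting that counts every existing file exactly once is shown below; the helper name `variant_stats` is made up for this example:

```python
import filecmp
import pathlib


def variant_stats(pyc_path):
    """Return (total_bytes, saved_bytes) for a .pyc and its opt-1/opt-2 siblings.

    saved_bytes is what would be freed if identical files were stored once.
    """
    base = str(pyc_path)[:-len('pyc')]
    candidates = [
        pathlib.Path(pyc_path),
        pathlib.Path(base + 'opt-1.pyc'),
        pathlib.Path(base + 'opt-2.pyc'),
    ]
    # Missing optimization levels are simply skipped.
    existing = [p for p in candidates if p.is_file()]
    total = sum(p.stat().st_size for p in existing)

    # Keep one representative per distinct file content.
    unique = []
    for p in existing:
        if not any(filecmp.cmp(p, u, shallow=False) for u in unique):
            unique.append(p)
    saved = total - sum(p.stat().st_size for p in unique)
    return total, saved
```

Summing both return values over `all_my_37_pycs` would give a total and a saving on the same basis, which makes the ratio between them easier to interpret.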