# The Python Standard Library

## Subjective overview

The Python Standard Library (PSL) takes up the largest part of the official documentation: https://docs.python.org/3.6/contents.html
Some modules are obsolete (Template?), some are specialized (media, shlex, NNTP!?), and some are just overlooked.

## 8.3. collections

* 8.3.1. ChainMap objects
  * 8.3.1.1. ChainMap Examples and Recipes
* 8.3.2. Counter objects
* 8.3.3. deque objects
  * 8.3.3.1. deque Recipes
* 8.3.4. defaultdict objects
  * 8.3.4.1. defaultdict Examples
* 8.3.5. namedtuple() Factory Function for Tuples with Named Fields
* 8.3.6. OrderedDict objects
  * 8.3.6.1. OrderedDict Examples and Recipes
* 8.3.7. UserDict objects
* 8.3.8. UserList objects
* 8.3.9. UserString objects



In [4]:
a = {'a': 1}
b = {'b': 2}
a.update(b)
print(a['b'])

2


In [6]:
from collections import ChainMap
a = [{'a': 1},
     {'b': 2},
     {'c': 3}]
c = ChainMap(*a)
print(c['c'])

3


In [7]:
from collections import Counter

c = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
c.most_common(2)


[('blue', 3), ('red', 2)]

## 8.3.3. deque objects

*class* collections.**deque**([*iterable*[, *maxlen*]])

Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same **O(1) performance in either direction**.

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

If maxlen is not specified or is None, deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. **Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.** Bounded length deques provide functionality similar to the tail filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
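
A minimal sketch of the bounded-length behaviour described above. The item names are made up for illustration; only `deque`, `maxlen`, `append`, `appendleft`, `pop` and `popleft` come from the standard library.

```python
from collections import deque

# Keep only the three most recent items, similar to `tail -n 3`.
recent = deque(maxlen=3)
for line in ['one', 'two', 'three', 'four', 'five']:
    recent.append(line)      # once full, the oldest item drops off the left

print(list(recent))          # ['three', 'four', 'five']

# Appending on the left discards from the right instead.
recent.appendleft('zero')
print(list(recent))          # ['zero', 'three', 'four']

# Pops work from either end.
print(recent.popleft(), recent.pop())   # zero four
```

Doing the same with a list would need `pop(0)`, which shifts every remaining element.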

In [8]:
# https://gist.github.com/hrldcpr/2012250
from collections import defaultdict

def tree(): return defaultdict(tree)

taxonomy = tree()
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Felidae']['Felis']['cat']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Felidae']['Panthera']['lion']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Canidae']['Canis']['dog']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Canidae']['Canis']['coyote']
taxonomy['Plantae']['Solanales']['Solanaceae']['Solanum']['tomato']
taxonomy['Plantae']['Solanales']['Solanaceae']['Solanum']['potato']
taxonomy['Plantae']['Solanales']['Convolvulaceae']['Ipomoea']['sweet potato']

import json
json.dumps(taxonomy)

'{"Animalia": {"Chordata": {"Mammalia": {"Carnivora": {"Felidae": {"Felis": {"cat": {}}, "Panthera": {"lion": {}}}, "Canidae": {"Canis": {"dog": {}, "coyote": {}}}}}}}, "Plantae": {"Solanales": {"Solanaceae": {"Solanum": {"tomato": {}, "potato": {}}}, "Convolvulaceae": {"Ipomoea": {"sweet potato": {}}}}}}'

In [10]:
from collections import namedtuple
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv
# newline="" is what the csv module expects; `with` closes the file afterwards
with open("employees.csv", newline="") as f:
    for emp in map(EmployeeRecord._make, csv.reader(f)):
        print(emp.name, emp.title)

John CEO
Jack CTO


## 8.3.6. OrderedDict objects
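
A short sketch of what `OrderedDict` offers beyond a plain dict: reordering, popping from either end, and order-sensitive equality. The keys and values are hypothetical.

```python
from collections import OrderedDict

visits = OrderedDict()
visits['home'] = 1
visits['docs'] = 2
visits['blog'] = 3

visits.move_to_end('home')            # 'home' becomes the newest entry
print(list(visits))                   # ['docs', 'blog', 'home']

oldest = visits.popitem(last=False)   # pop from the front (FIFO)
print(oldest)                         # ('docs', 2)

# Equality between two OrderedDicts takes order into account.
same_order = OrderedDict([('a', 1), ('b', 2)]) == OrderedDict([('b', 2), ('a', 1)])
print(same_order)                     # False
```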

## 4. Built-in Types

* 4.1. Truth Value Testing
* 4.2. Boolean Operations — and, or, not
* 4.3. Comparisons
* 4.4. Numeric Types — int, float, complex
* 4.5. Iterator Types
* 4.6. Sequence Types — list, tuple, range
* 4.7. Text Sequence Type — str
* 4.8. Binary Sequence Types — bytes, bytearray, memoryview
* **4.9. Set Types — set, frozenset**
* 4.10. Mapping Types — dict
* 4.11. Context Manager Types
* 4.12. Other Built-in Types
* 4.13. Special Attributes


In [64]:
a = set([1, 2, 3, 4, 5, 6, 7])
b = set([1, 2, 3, 4, 5, 8, 9])

intersection = a & b
exclusive = a ^ b

print('Same: %s, different: %s' % (intersection, exclusive))

a |= {100, 200}
print(a)


Same: {1, 2, 3, 4, 5}, different: {6, 7, 8, 9}
{1, 2, 3, 4, 5, 6, 7, 200, 100}


In [15]:
a = frozenset([1, 2, 3])
b = frozenset([2, 3, 4])
{a: 'a', b: 'b'}

{frozenset({1, 2, 3}): 'a', frozenset({2, 3, 4}): 'b'}

## 8. Data Types

* 8.1. datetime — Basic date and time types
* 8.2. calendar — General calendar-related functions
* **8.3. collections — Container datatypes**
* **8.4. collections.abc — Abstract Base Classes for Containers**
* 8.5. heapq — Heap queue algorithm
* 8.6. bisect — Array bisection algorithm
* 8.7. array — Efficient arrays of numeric values
* 8.8. weakref — Weak references
* 8.9. types — Dynamic type creation and names for built-in types
* 8.10. copy — Shallow and deep copy operations
* **8.11. pprint — Data pretty printer**
* **8.12. reprlib — Alternate repr() implementation**
* 8.13. enum — Support for enumerations


## 6. Text Processing Services

* **6.1. string — Common string operations**
* **6.2. re — Regular expression operations**
* **6.3. difflib — Helpers for computing deltas**
* **6.4. textwrap — Text wrapping and filling**
* 6.5. unicodedata — Unicode Database
* 6.6. stringprep — Internet String Preparation
* 6.7. readline — GNU readline interface
* 6.8. rlcompleter — Completion function for GNU readline


In [None]:
"My quest is {name}"              # References keyword argument 'name'
'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')

"Weight in tons {0.weight}"       # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}"   # First element of keyword argument 'players'.

"Harold's a clever {0!s}"        # Calls str() on the argument first
"Bring out the holy {name!r}"    # Calls repr() on the argument first
"More {!a}"                      # Calls ascii() on the argument first

points, total = 19, 22            # example values
'Correct answers: {:.2%}'.format(points/total)

![https://xkcd.com/208/](img/regular_expressions.png)

https://xkcd.com/208/

![https://xkcd.com/1171/](img/perl_problems.png)

https://xkcd.com/1171/

```
In [14]: %timeit 'sentence' in 'Some long sentence'
24.7 ns ± 0.0553 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: %timeit re.search(r'sentence', 'Some long sentence')
390 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: %timeit 'Some long sentence'.replace('sentence', 'text')
121 ns ± 0.896 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [17]: %timeit re.sub(r'sentence', 'text', 'Some long sentence')
501 ns ± 1.52 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [19]: expr = re.compile('sentence')

In [20]: %timeit expr.sub('text', 'Some long sentence')
218 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

## 10. Functional Programming Modules

* 10.1. itertools — Functions creating iterators for efficient looping
* 10.2. functools — Higher-order functions and operations on callable objects
* 10.3. operator — Standard operators as functions


In [None]:
from itertools import chain, combinations, permutations, groupby

list(chain('ABC', 'DEF'))
list(chain.from_iterable(['ABC', 'DEF']))
# ['A', 'B', 'C', 'D', 'E', 'F']


list(combinations('ABCD', 2))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
list(permutations('ABCD', 2))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D'), ('D', 'A'), ('D', 'B'), ('D', 'C')]

[list(g) for k, g in groupby('AAAABBBCCD', key=str)]
# [['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]
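
One thing the `groupby` example above does not show: it only merges adjacent items, so unsorted input yields the same key more than once. A small sketch with made-up words and a hypothetical `first_letter` key function:

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana']

def first_letter(word):
    return word[0]

# Unsorted input: 'a' and 'b' each show up as two separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]
print(unsorted_groups)

# Sorting by the same key first gives one group per letter.
by_letter = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in groupby(by_letter, key=first_letter)]
print(sorted_groups)
```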

In [66]:
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

[fib(n) for n in range(16)]
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

fib.cache_info()

CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)

In [78]:
from time import sleep, asctime

class A:
    @property
    @lru_cache(maxsize=None)
    def expensive(self):
        sleep(5)
        return 1
    
a = A()
print('Started at %s' % asctime())
print('Got %d at %s' % (a.expensive, asctime()))
print('Got %d at %s' % (a.expensive, asctime()))

Started at Sun Jan  7 20:39:02 2018
Got 1 at Sun Jan  7 20:39:07 2018
Got 1 at Sun Jan  7 20:39:07 2018
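
A caveat worth checking before copying the pattern above: `lru_cache` on a method uses `self` as part of the cache key, so there is one shared cache with an entry per instance, and the cache keeps those instances alive. The sketch below uses a hypothetical `Report` class and a call counter instead of `sleep` to make that visible.

```python
from functools import lru_cache

class Report:
    calls = 0  # counts how often the expensive body actually runs

    @property
    @lru_cache(maxsize=None)
    def expensive(self):
        Report.calls += 1
        return 1

first, second = Report(), Report()
first.expensive
first.expensive    # served from the cache
second.expensive   # different instance, so a new cache entry

# The property's getter is the lru_cache wrapper, so cache_info() is reachable via fget.
info = Report.expensive.fget.cache_info()
print(Report.calls, info.hits, info.misses, info.currsize)   # 2 1 2 2
```

On Python 3.8 and later, `functools.cached_property` stores the value on the instance itself and may be a better fit for this use case.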


In [10]:
from functools import partial
basetwo = partial(int, base=2)
basetwo.__doc__ = 'Convert base 2 string to an int.'
basetwo('10010')

18

In [9]:
# https://hynek.me/articles/serialization/

from datetime import datetime
from functools import singledispatch

@singledispatch
def to_serializable(val):
    """Used by default."""
    return str(val)
    
@to_serializable.register(datetime)
def ts_datetime(val):
    """Used if *val* is an instance of datetime."""
    return val.isoformat() + "Z"

import json
# utcnow() rather than now(): the "Z" suffix appended above denotes UTC
json.dumps({"msg": "hi", "ts": datetime.utcnow()}, default=to_serializable)

'{"msg": "hi", "ts": "2018-01-07T18:26:20.746777Z"}'

## 11. File and Directory Access

* **11.1. pathlib — Object-oriented filesystem paths**
* 11.2. os.path — Common pathname manipulations
* 11.3. fileinput — Iterate over lines from multiple input streams
* 11.4. stat — Interpreting stat() results
* 11.5. filecmp — File and Directory Comparisons
* 11.6. tempfile — Generate temporary files and directories
* 11.7. glob — Unix style pathname pattern expansion
* 11.8. fnmatch — Unix filename pattern matching
* 11.9. linecache — Random access to text lines
* 11.10. shutil — High-level file operations
* 11.11. macpath — Mac OS 9 path manipulation functions

In [23]:
# http://journalpanic.com/pyhi/posts/pathlib-and-ospath/
import os
from pathlib import Path

old_path = os.path.join('/', 'home', 'jwas', 'src')
new_path = Path('/', 'home', 'jwas', 'src')
new_path = Path('/') / 'home' / 'jwas' / 'src'

input_path = os.path.join(old_path, 'pystok-stdlib', 'employees.csv')
with open(input_path) as foo:
    print(foo.readline())

input_path = new_path / 'pystok-stdlib' / 'employees.csv'
with input_path.open() as foo:
    print(foo.readline())
    
print([str(path) for path in Path.cwd().rglob('*.py')])

print(input_path.parts)

John,20,CEO,IT,a lot

John,20,CEO,IT,a lot

['/home/jwas/src/pystok-stdlib/demo.py', '/home/jwas/src/pystok-stdlib/decorators.py', '/home/jwas/src/pystok-stdlib/conf.py']
('/', 'home', 'jwas', 'src', 'pystok-stdlib', 'employees.csv')


## 13. Data Compression and Archiving

* 13.1. zlib — Compression compatible with gzip
* **13.2. gzip — Support for gzip files**
* 13.3. bz2 — Support for bzip2 compression
* 13.4. lzma — Compression using the LZMA algorithm
* 13.5. zipfile — Work with ZIP archives
* 13.6. tarfile — Read and write tar archive files

## 14. File Formats

* **14.1. csv — CSV File Reading and Writing**
* 14.2. configparser — Configuration file parser
* 14.3. netrc — netrc file processing
* 14.4. xdrlib — Encode and decode XDR data
* 14.5. plistlib — Generate and parse Mac OS X .plist files


In [26]:
import csv, gzip
from pathlib import Path
path = Path('employees.csv.gz')
with gzip.open(path, 'rt', encoding='utf8') as file:
    reader = csv.reader(file)
    print({row[0] for row in reader})


{'John', 'Jack'}
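
For trying this without an existing `employees.csv.gz`, here is a self-contained round trip through a temporary directory. The first row matches the sample output shown earlier; the second row's age and pay grade are invented for the example.

```python
import csv
import gzip
import tempfile
from pathlib import Path

rows = [['John', '20', 'CEO', 'IT', 'a lot'],
        ['Jack', '30', 'CTO', 'IT', 'plenty']]   # second row is made up

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'employees.csv.gz'

    # 'wt' / 'rt' open the archive in text mode; newline='' is what csv expects.
    with gzip.open(path, 'wt', encoding='utf8', newline='') as file:
        csv.writer(file).writerows(rows)

    with gzip.open(path, 'rt', encoding='utf8', newline='') as file:
        names = {row[0] for row in csv.reader(file)}

print(names)
```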


## 17. Concurrent Execution

* 17.1. threading — Thread-based parallelism
* 17.2. multiprocessing — Process-based parallelism
* 17.3. The concurrent package
* **17.4. concurrent.futures — Launching parallel tasks**
* 17.5. subprocess — Subprocess management
* 17.6. sched — Event scheduler
* 17.7. queue — A synchronized queue class
* 17.8. dummy_threading — Drop-in replacement for the threading module
* 17.9. _thread — Low-level threading API
* 17.10. _dummy_thread — Drop-in replacement for the _thread module


In [40]:
import concurrent.futures
from itertools import groupby
import time
from random import randint

batches = [{'name': 'first', 'data': 'aaabb'},
           {'name': 'second', 'data': 'ddggiii'},
           {'name': 'third', 'data': 'xx'}]

def process(batch, name):
    result = [list(g) for k, g in groupby(batch)]
    time.sleep(randint(300, 500) / 1000)
    print('Done with {}'.format(name))
    return result

with concurrent.futures.ThreadPoolExecutor(max_workers=len(batches)) as executor:
    future_to_batch = {}
    for batch in batches:
        future = executor.submit(process, batch['data'], batch['name'])
        future_to_batch[future] = batch['name']
        
    results = list(concurrent.futures.as_completed(future_to_batch))
    print('Collected all results')
    for future in results:
        name = future_to_batch[future]
        groups = future.result()
        print(name, groups)

Done with third
Done with second
Done with first
Collected all results
third [['x', 'x']]
second [['d', 'd'], ['g', 'g'], ['i', 'i', 'i']]
first [['a', 'a', 'a'], ['b', 'b']]


## 18. Interprocess Communication and Networking

* 18.1. socket — Low-level networking interface
* 18.2. ssl — TLS/SSL wrapper for socket objects
* 18.3. select — Waiting for I/O completion
* 18.4. selectors — High-level I/O multiplexing
* **18.5. asyncio — Asynchronous I/O, event loop, coroutines and tasks**
* 18.6. asyncore — Asynchronous socket handler
* 18.7. asynchat — Asynchronous socket command/response handler
* 18.8. signal — Set handlers for asynchronous events
* 18.9. mmap — Memory-mapped file support

requests > urllib.request

aiohttp > asyncio > socket + select

https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html

https://github.com/Opentopic/http-mamba/blob/master/http-mamba.py
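
As a rough counterpart to the `ThreadPoolExecutor` example, this is a minimal asyncio sketch that uses only the standard library. `fetch` is a hypothetical coroutine in which `asyncio.sleep` stands in for a network call; a real HTTP client such as aiohttp would replace it.

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep yields control to the event loop, like waiting on a socket would.
    await asyncio.sleep(delay)
    return name

async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(fetch('first', 0.03),
                                fetch('second', 0.01),
                                fetch('third', 0.02))

# An explicit loop works on Python 3.6; newer versions can use asyncio.run(main()).
loop = asyncio.new_event_loop()
try:
    results = loop.run_until_complete(main())
finally:
    loop.close()

print(results)   # ['first', 'second', 'third']
```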

## 21. Internet Protocols and Support

* ...
* **21.8. urllib.parse — Parse URLs into components**
* ...
* 21.20. uuid — UUID objects according to RFC 4122
* ...




In [59]:
from urllib.parse import urlparse, urlunparse, urljoin, parse_qs
# must start with scheme or //
url = '//www.cwi.nl:80/%7Eguido/Python.html?a=b'
parts = urlparse(url)
print(parts)

# don't have to manually split netloc!
print(parts.hostname, parts.port, parts.username, parts.password)

scheme, netloc, path, params, query, fragment = parts
print(urlunparse([scheme or 'http', netloc, 'FAQ.html', params, query, fragment]))

print(urljoin(url, 'FAQ.html'))

# use urlencode to reconstruct back to query string
print(parse_qs(parts.query))

ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='a=b', fragment='')
www.cwi.nl 80 None None
http://www.cwi.nl:80/FAQ.html?a=b
//www.cwi.nl:80/%7Eguido/FAQ.html
{'a': ['b']}
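
The comment about `urlencode` can be sketched as a round trip: parse the query, change it, and rebuild the URL. The extra `tag` parameters are an assumption added to show repeated keys.

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

url = 'http://www.cwi.nl:80/%7Eguido/Python.html?a=b&tag=x&tag=y'
parts = urlparse(url)

query = parse_qs(parts.query)   # {'a': ['b'], 'tag': ['x', 'y']}
query['a'] = ['c']              # change one parameter

# doseq=True expands each list into repeated key=value pairs.
new_query = urlencode(query, doseq=True)

# ParseResult is a namedtuple, so _replace swaps a single component.
new_url = urlunparse(parts._replace(query=new_query))
print(new_url)
```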


## 16. Generic Operating System Services

* 16.1. os — Miscellaneous operating system interfaces
* 16.2. io — Core tools for working with streams
* **16.3. time — Time access and conversions**
* **16.4. argparse — Parser for command-line options, arguments and sub-commands**
* 16.5. getopt — C-style parser for command line options
* **16.6. logging — Logging facility for Python**
* 16.7. logging.config — Logging configuration
* 16.8. logging.handlers — Logging handlers
* 16.9. getpass — Portable password input
* 16.10. curses — Terminal handling for character-cell displays
* 16.11. curses.textpad — Text input widget for curses programs
* 16.12. curses.ascii — Utilities for ASCII characters
* 16.13. curses.panel — A panel stack extension for curses
* 16.14. platform — Access to underlying platform’s identifying data
* 16.15. errno — Standard errno system symbols
* 16.16. ctypes — A foreign function library for Python


## 27. Debugging and Profiling

* 27.1. bdb — Debugger framework
* 27.2. faulthandler — Dump the Python traceback
* **27.3. pdb — The Python Debugger**
* **27.4. The Python Profilers**
* **27.5. timeit — Measure execution time of small code snippets**
* 27.6. trace — Trace or track Python statement execution
* 27.7. tracemalloc — Trace memory allocations

In [None]:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse


def doit():
    print('Done')


def main(dry_run=False):
    doit()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Demo script")
    parser.add_argument('-d', '--dry_run', action='store_true',
                        help='don\'t execute any actions, just log them')
    options = parser.parse_args()
    main(dry_run=options.dry_run)


In [None]:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse


def doit():
    import pdb; pdb.set_trace()
    print('Done')


def main(dry_run=False):
    doit()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Demo script")
    parser.add_argument('-d', '--dry_run', action='store_true',
                        help='don\'t execute any actions, just log them')
    options = parser.parse_args()
    main(dry_run=options.dry_run)


```
% python demo.py

> /home/jwas/src/pystok-stdlib/demo.py(9)doit()
-> print('Done')
(Pdb) 
```

```
% python -m pdb demo.py 

> /home/jwas/src/pystok-stdlib/demo.py(4)<module>()
-> import argparse
(Pdb) b doit
Breakpoint 1 at /home/jwas/src/pystok-stdlib/demo.py:7
(Pdb) run -d
Restarting demo.py with arguments:
	demo.py
> /home/jwas/src/pystok-stdlib/demo.py(4)<module>()
-> import argparse
(Pdb) 
```

```
% python -m timeit '"%s" % 5'
100000000 loops, best of 3: 0.00672 usec per loop

% python -m timeit '"{}".format(5)'
10000000 loops, best of 3: 0.115 usec per loop
```
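
The 0.00672 usec figure should be read with some caution: it is low enough to suggest that the interpreter folded `"%s" % 5` into a constant at compile time, in which case the loop measures almost nothing. A hedged way to compare the two styles is to keep the operand in a variable via `setup`, as sketched below with the `timeit` module's Python API. The loop count is arbitrary and absolute numbers will vary by machine and Python version.

```python
import timeit

setup = 'x = 5'    # a variable operand cannot be precomputed at compile time
number = 100000    # arbitrary loop count for this sketch

percent = timeit.timeit('"%s" % x', setup=setup, number=number)
fmt = timeit.timeit('"{}".format(x)', setup=setup, number=number)

print('%%-formatting: %.3f usec per loop' % (percent / number * 1e6))
print('str.format:   %.3f usec per loop' % (fmt / number * 1e6))
```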

```
% ipython
Python 3.6.4 (default, Dec 23 2017, 19:07:07) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %timeit "{}".format(5)
117 ns ± 0.319 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

In [None]:
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)


def timelog(method):
    @wraps(method)
    def wrapper(*args, **kwargs):
        before = time.perf_counter()
        result = method(*args, **kwargs)
        after = time.perf_counter()

        logger.debug('Timed %s (%s, %s): %.2f s',
                     method.__name__, args, kwargs, after - before)
        return result

    return wrapper

In [None]:
@timelog
def doit():
    time.sleep(5)
    print('Done')


```
% ./demo.py 
Done
2018-01-04 22:37:50,908 DEBUG decorators Timed doit ((), {}): 5.01 s
```
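
The log line above implies that a handler is configured at DEBUG level with a timestamp, level and logger name in its format. The actual configuration is not shown, so the format string below is an assumption reverse-engineered from that output; the sketch writes to an in-memory stream to keep it self-contained.

```python
import io
import logging

stream = io.StringIO()   # stands in for stderr or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s %(name)s %(message)s'))

# 'decorators' is the logger name visible in the output above.
logger = logging.getLogger('decorators')
logger.setLevel(logging.DEBUG)   # without this, debug() calls are dropped
logger.addHandler(handler)

logger.debug('Timed %s (%s, %s): %.2f s', 'doit', (), {}, 5.01)
print(stream.getvalue(), end='')
```

In a script, `logging.basicConfig(level=logging.DEBUG, format=...)` with the same format string is the usual shortcut.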

In [None]:
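# https://gist.github.com/nealtodd/2489618
import io
import logging
import pstats
from cProfile import Profile
from functools import wraps

logger = logging.getLogger(__name__)


# Tuple defaults avoid the shared-mutable-default pitfall.
def proflog(sort_args=('cumulative',), print_args=(10,)):
    profiler = Profile()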

    def decorator(method):
        @wraps(method)
        def wrapper(*args, **kwargs):
            try:
                result = profiler.runcall(method, *args, **kwargs)
            finally:
                s = io.StringIO()
                stats = pstats.Stats(profiler, stream=s)
                stats.strip_dirs().sort_stats(*sort_args).\
                    print_stats(*print_args)
                logger.debug('Profiled %s (%s, %s): %s',
                             method.__name__, args, kwargs, s.getvalue())
            return result
        return wrapper
    return decorator

```
% ./demo.py 
Done
2018-01-04 22:45:24,738 DEBUG decorators Profiled doit ((), {}):          4 function calls in 5.005 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.005    5.005 demo.py:11(doit)
        1    5.005    5.005    5.005    5.005 {built-in method time.sleep}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

## 26. Development Tools

* 26.1. typing — Support for type hints
* 26.2. pydoc — Documentation generator and online help system
* **26.3. doctest — Test interactive Python examples**
* 26.4. unittest — Unit testing framework
* 26.5. unittest.mock — mock object library
* 26.6. unittest.mock — getting started
* 26.7. 2to3 - Automated Python 2 to 3 code translation
* 26.8. test — Regression tests package for Python
* 26.9. test.support — Utilities for the Python test suite


In [None]:
@timelog
def doit(n=None):
    """Returns the n argument or prints Done.

    >>> [doit(5) for n in range(3)]
    [5, 5, 5]"""
    time.sleep(1)
    if n is None:
        print('Done')
    return n

```
% ./demo.py                
Done
2018-01-04 22:56:55,359 DEBUG decorators Timed doit ((), {}): 1.00 s

% python -m doctest demo.py
2018-01-04 22:56:40,537 DEBUG decorators Timed doit ((5,), {}): 1.00 s
2018-01-04 22:56:41,539 DEBUG decorators Timed doit ((5,), {}): 1.00 s
2018-01-04 22:56:42,540 DEBUG decorators Timed doit ((5,), {}): 1.00 s
```

## Summary

* RTFM, even when doing SODD (Stack Overflow-driven development)
* %timeit
* KISS


## Q&A