# The Python Standard Library

The Python Standard Library is a collection of modules which come "pre-baked" when you install Python. It includes lots of really useful modules, such as:

* **os, pathlib, sys, argparse**: Interface with your operating system. Work with files etc.
* **string, re**: Working with strings and regular expressions
* **math, random, statistics, decimal, fractions**: Basic mathematics (we'll stick with `numpy`)
* **time, datetime**: Work with dates and times, datetime interfaces well with `pandas`
* **zlib, gzip, bz2, lzma, zipfile, tarfile**: Working with compression
* **json, csv**: Dealing with files (see also `pandas`)
* **email, smtplib, urllib**: Internet protocols
* **hashlib, hmac, secrets**: Working with cryptography
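* **unittest**: Testing!
* **collections, itertools**: Specialized containers and tools for iteration
* **pickle, shelve, dbm, sqlite3**: Saving (persisting) Python objects and data
* **subprocess, threading, multiprocessing**: Running other programs and concurrent code
* **asyncio, socket, ssl**: Networking and asynchronous I/O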


But there is [*much* more](https://docs.python.org/3/library/), more than we could ever hope to cover. The [documentation for the standard library](https://docs.python.org/3/library/) is pretty good, so I'm just going to cherry-pick a few modules I find interesting. You've used some of these (e.g. `string`) implicitly already.

## os, sys & pathlib

Together these modules can make interfacing with the rest of your computer easier. 

### OS
The `os` module deals with miscellaneous tasks for your operating system. The idea is that whether you are running Python on Mac, Windows or Linux, there are some common tasks you'll probably want to do, and you shouldn't have to worry too much about the specific OS. The `os` module provides a portable way of accessing those system-dependent tasks.

A good example is the environment. All three operating systems have some concept of environment variables, and `os` can help you examine them:

In [None]:
import os
os.environ

So it returns a dictionary-like object containing all of my environment variables. I can also set variables:

In [None]:
os.environ['NAME'] = 'iana'
os.getenv('NAME')

In [None]:
os.getenv('SECOND_NAME', 'Not defined')

It also has functions for working with directories, such as finding the current working directory:

In [None]:
os.getcwd()

`os` also has functions for working with files (in gory detail!), and for working with processes (though `subprocess` is often better).
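As a small sketch of the file and directory side of `os`, the cell below builds a nested directory, writes a file into it and lists it. It works inside a temporary directory (via `tempfile`) so nothing is left behind; the names `data/raw` and `notes.txt` are purely hypothetical.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing is left behind afterwards
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data", "raw")   # joins with the right separator for this OS
    os.makedirs(data_dir, exist_ok=True)          # creates intermediate directories too

    notes_file = os.path.join(data_dir, "notes.txt")
    with open(notes_file, "w") as f:
        f.write("hello\n")

    listing = os.listdir(data_dir)
    existed = os.path.exists(notes_file)
    print(listing)   # ['notes.txt']
    print(existed)   # True

# The temporary directory (and everything in it) has now been removed
print(os.path.exists(tmp))  # False
```

`os.path.join` is what keeps this portable: it uses `\` on Windows and `/` elsewhere.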

### Pathlib

[`pathlib`](https://docs.python.org/3/library/pathlib.html) is a good partner module to `os`. As the name suggests, it deals with representing filesystem paths:

In [None]:
from pathlib import Path

In [None]:
p = Path('.')
p

That's POSIX for "this directory". We can look for files in this directory

In [None]:
files = list(p.glob('*.ipynb'))
files

(`.glob` returns a generator, so we passed it to `list` to actually build a list of the files.)

In [None]:
p.exists()

In [None]:
p.is_dir()

In [None]:
p.name

In [None]:
files[0].name

In [None]:
files[0].suffix

You can combine paths, figure out file types (socket, FIFO, ...) and do much more. Take a look at the [pathlib docs](https://docs.python.org/3/library/pathlib.html) for more details.
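Here is a rough sketch of combining paths with the `/` operator, creating parent directories and reading/writing a file. It runs inside a temporary directory, and the `results/summary.csv` name is just a hypothetical example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    report = base / "results" / "summary.csv"         # "/" joins path components
    report.parent.mkdir(parents=True, exist_ok=True)  # like os.makedirs(..., exist_ok=True)
    report.write_text("name,value\nalpha,1\n")

    print(report.name)         # summary.csv
    print(report.stem)         # summary
    print(report.suffix)       # .csv
    print(report.parent.name)  # results

    header = report.read_text().splitlines()[0]
    print(header)              # name,value

    # rglob searches recursively, unlike glob
    found = sorted(p.name for p in base.rglob("*.csv"))
    print(found)               # ['summary.csv']
```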

## Regular expressions (`re`)

Regular expressions are sequences of characters that define a search pattern. If you've never used them before they can look like gibberish, but with a little practice they can be extremely powerful. The [`re`](https://docs.python.org/3/library/re.html) module includes facilities for searching, replacing and otherwise manipulating strings with regular expressions.

In [None]:
import re

There are two main methods for searching: `search` and `match`. `search` is more general and will scan through the entire string looking for a match, while `match` only looks at the start of the string. The idea with `re` is to "compile" your regular expression into a regular expression object, which then has methods for searching and other tasks.
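A quick way to see the difference between `match` and `search` is to try both on a string where the pattern is not at the very start. The test string here is just an excerpt chosen for illustration.

```python
import re

pattern = re.compile("the")
text = "as he faced the firing squad"

# match() only succeeds if the pattern occurs at the very start of the string
print(pattern.match(text))               # None
print(pattern.match("the end").span())   # (0, 3)

# search() scans the whole string and stops at the first occurrence
print(pattern.search(text).span())       # (12, 15)

# The module-level functions work too, without compiling first
print(re.search("the", text).group())    # the
```

Compiling mostly helps readability and reuse when the same pattern is applied many times.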

In [None]:
opening = """Many years later, as he faced the firing squad, Colonel Aureliano Buendia was to remember that distant afternoon when his father took him to discover ice.
"""
re1 = re.compile("the")
match1 = re1.search(opening)
match1

`search` stops at the first occurrence and returns an `re.Match` object. The span of the match tells us where the occurrence was:

In [None]:
opening[30:33]

The power of regular expressions is in succinctly encoding all of the properties of the class of things you want to match, e.g. it should be case-independent, there should only be one space between words, the letter following `s` must be either `t` or `h`, and so on. Here is an example:

In [None]:
githubRE = re.compile(
    # raw string (r'...') so that \. reaches the regex engine unchanged
    r'https://(?P<domain>[^,/]+)/(?P<owner>[^,/]+)/(?P<repo>[^,/]+)(\.git)?'
)

If you're unfamiliar with regular expressions this can be hard to digest, but basically this regular expression tries to match something like

  * https://github.com/ianabc/WestGridRSSNotes
  
and splits it up so that we can play with the various parts (the `<domain>`, `<owner>` and `<repo>`). Here it is in action:

In [None]:
teststr = "Take a look at https://github.com/ianabc/WestGridRSSNotes"

repo_info = githubRE.search(teststr)
repo_info

In [None]:
print(f"The repository is on {repo_info.group('domain')}, "
      f"owned by {repo_info.group('owner')} "
      f"and is called {repo_info.group('repo')}")
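The properties listed earlier (case-independence, single spaces between words, `s` followed by `t` or `h`) can each be encoded in a short pattern. The last part is a hypothetical variant of `githubRE` (not the pattern above) that also stops at whitespace and keeps a trailing `.git` out of the `<repo>` group, used with `finditer` and `groupdict` to pull out every URL in a string. The second URL is made up for illustration.

```python
import re

# "s" followed by either "t" or "h", ignoring case
s_then_t_or_h = re.compile(r"s[th]", re.IGNORECASE)
print(s_then_t_or_h.findall("Stop shouting, Sam."))   # ['St', 'sh']

# Collapse any run of whitespace into a single space
print(re.sub(r"\s+", " ", "too   many \t spaces"))    # too many spaces

# Hypothetical variant of githubRE: [^,/\s] also stops at whitespace, and the
# non-greedy repo group plus the lookahead keeps a trailing ".git" out of <repo>
url_re = re.compile(
    r"https://(?P<domain>[^,/\s]+)/(?P<owner>[^,/\s]+)/"
    r"(?P<repo>[^,/\s]+?)(?:\.git)?(?=[\s,]|$)"
)
text = ("Clone https://github.com/ianabc/WestGridRSSNotes.git, "
        "or see https://gitlab.com/someone/example")

# finditer yields one Match per URL; groupdict() maps group names to text
repos = [m.groupdict() for m in url_re.finditer(text)]
for info in repos:
    print(info)
```

With the original `githubRE`, the greedy `[^,/]+` would swallow the `.git` suffix into `<repo>`, which is why this sketch uses a non-greedy group.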


## Hashlib

`hashlib` and its relatives `hmac` and `secrets` are obviously important for working with cryptography, but hashing comes up in lots of contexts where you want to be sure you know when something has changed.

In [None]:
import hashlib

In [None]:
message = "We slept in what had once been the gymnasium."

hashlib.sha224(message.encode()).hexdigest()

In [None]:
message = message + '\n'
# Same algorithm as above (sha224), so any difference comes from the newline
hashlib.sha224(message.encode()).hexdigest()

One common use for `hashlib` is when working with internet protocols. When you send messages back and forth to some API, it is a good idea (and often required) to hash and sign your messages. The upside of this is increased security, but the downside is that it can be ***extremely*** challenging to make sure that you are doing exactly the same hashing calculation on both sides of the conversation. In the example above I added a single newline to my text, and look at what it did to the hash!
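Signing is usually done with an HMAC (a hash keyed with a shared secret) rather than a bare hash. This is a minimal sketch using `hmac` and `hashlib.sha256`; the key is generated with `secrets` purely for illustration (a real API would give you the key), and `verify` is a hypothetical helper, not part of any library.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; in practice this comes from the API provider
secret_key = secrets.token_bytes(32)
message = b'{"action": "list_repos"}'

# The sender computes a signature and sends it alongside the message
signature = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, msg: bytes, received_signature: str) -> bool:
    """Recompute the signature on the receiving side and compare."""
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, received_signature)

print(verify(secret_key, message, signature))          # True
print(verify(secret_key, message + b"\n", signature))  # False: one extra newline breaks it
```

Both sides must agree on the exact bytes, the hash algorithm and the key, which is where the "extremely challenging" part usually comes from.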

## Collections

As the name suggests, `collections` adds a few more specialized container datatypes to the usual `list`, `dict`, `set` and `tuple`. Here is a summary:

* `namedtuple`: Like a tuple, but each position also has a name
* `deque`: A list-like container with fast appends and pops at either end
* `ChainMap`: Combine several dictionaries into a single view
* `Counter`: Turns an iterable into a dictionary of keys and value counts
* `OrderedDict`: Like a dictionary, but with extra order-aware methods
* `defaultdict`: Like a dictionary, but supplies a default value for missing keys
* `UserDict`, `UserList`, `UserString`: Wrappers around `dict`, `list` and `str`, intended for subclassing

The 3 items in the last bullet-point are a bit different. They are meant as a starting point for subclassing. The idea is that if you want to build something that looks like a `dict` but with some fancy features, making a new class which inherits from `UserDict` will be a lot easier than trying to inherit directly from dict.
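As a rough illustration of subclassing `UserDict`, here is a hypothetical `CaseInsensitiveDict` that lowercases string keys. `UserDict` keeps its contents in a real dict called `self.data`, and its constructor and `update()` go through `__setitem__`, so overriding a handful of methods is enough for this sketch.

```python
from collections import UserDict

class CaseInsensitiveDict(UserDict):
    """Hypothetical example: a dict-like container with case-insensitive string keys."""

    @staticmethod
    def _normalize(key):
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        self.data[self._normalize(key)] = value

    def __getitem__(self, key):
        return self.data[self._normalize(key)]

    def __delitem__(self, key):
        del self.data[self._normalize(key)]

    def __contains__(self, key):
        return self._normalize(key) in self.data

headers = CaseInsensitiveDict({'Content-Type': 'text/html'})
headers['Accept'] = '*/*'
print(headers['content-type'])   # text/html
print('CONTENT-TYPE' in headers) # True
print(headers)                   # {'content-type': 'text/html', 'accept': '*/*'}
```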

### namedtuple

`namedtuple` returns a tuple with names for each position in the tuple.

In [None]:
from collections import namedtuple

In [None]:
Animal = namedtuple('Animal', 'name, legs, wings, eyes')
wasp = Animal('Wasp', 6, 2, 3)
elephant = Animal('Elephant', 4, 0, 2)

In [None]:
print(f"An {elephant.name} has {elephant.legs} legs")

It also has a useful `__repr__()`

In [None]:
elephant

### deque

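I haven't actually used this much, but the idea is that a `deque` (double-ended queue) works like a list with fast appends and pops at *both* ends. Inserting or removing at the front of a list is slow because every other element has to shift along, whereas a `deque` does it in constant time.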

In [None]:
from collections import deque

mylist = ['a', 'b', 'c', 'd']
mydeque = deque(mylist)

You get the usual `append` method and also an `appendleft` for prepending items

In [None]:
mydeque.append('e')
mydeque.appendleft('z')
mydeque

You can remove the elements with `pop` and `popleft`.

In [None]:
mydeque.pop()
mydeque.popleft()
mydeque

You can also `clear`, `reverse` or `count` the elements in your deque:

In [None]:
mydeque.append('a')
mydeque.count('a')
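Two other `deque` features worth knowing: a `maxlen` turns it into a fixed-size buffer that silently drops items from the opposite end, and `rotate` shifts elements around. The "readings" below are made-up numbers for illustration.

```python
from collections import deque

# With maxlen set, appending to a full deque discards items from the other end
recent = deque(maxlen=3)
for reading in [10, 11, 12, 13, 14]:
    recent.append(reading)
print(recent)   # deque([12, 13, 14], maxlen=3)

# rotate(n) shifts items n steps to the right (a negative n rotates left)
d = deque(['a', 'b', 'c', 'd'])
d.rotate(1)
print(d)        # deque(['d', 'a', 'b', 'c'])
```

The `maxlen` version is a handy way to keep "the last N things" without any manual bookkeeping.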

### ChainMap

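A `ChainMap` groups several dictionaries into a single view (the underlying list of dictionaries is available as `.maps`). Lookups search each dictionary in turn and return the first match. That doesn't sound too exciting, but it can be pretty convenient: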

In [None]:
from collections import ChainMap

In [None]:
dict1 = { 'a' : 1, 'b' : 2 }
dict2 = { 'c' : 3, 'b' : 4 }
chain_map = ChainMap(dict1, dict2)
chain_map

In [None]:
chain_map['c']

I didn't have to specify which dictionary I wanted, or manually loop over the possibilities and test for existence.

In [None]:
chain_map.keys()
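Because lookups stop at the first dictionary containing the key, a `ChainMap` works nicely for layered settings, where user overrides sit in front of defaults. In the sketch below the setting names are hypothetical; note that writes always go to the *first* mapping, and `new_child()` adds a new front layer without touching the others.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'iana'}
settings = ChainMap(overrides, defaults)

print(settings['user'])    # iana: found in overrides first
print(settings['color'])   # red: falls through to defaults

# Assignments only ever modify the first mapping
settings['color'] = 'blue'
print(overrides)           # {'user': 'iana', 'color': 'blue'}
print(defaults)            # {'color': 'red', 'user': 'guest'}

# new_child() puts a fresh dict in front, e.g. for temporary settings
local = settings.new_child({'user': 'temp'})
print(local['user'], settings['user'])   # temp iana
```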

### Counter

This might be my favourite. Technically it is a subclass of `dict`, but it adds some cool features:

In [None]:
from collections import Counter

In [None]:
message = "we slept in what had once been the gymnasium"

letter_count = Counter(message)

Without any work we can figure out exactly how many of each letter occur in the string. `Counter` expects an iterable, and in this case we passed it a string, so the iteration is over the letters of the string. The counter walks that iterable, keeping track of how many times it has seen each element. I don't know how many times I manually wrote that piece of logic before I found `Counter`, but let's say ... embarrassingly often.

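On top of this there are some convenience methods. The counts above are kept in the order in which each letter was first seen, but `.most_common(n)` sorts them by count and returns the `n` most common (or all of them if you leave out `n`).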

In [None]:
letter_count.most_common()
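`Counter` works on any iterable of hashable items, not just strings. This sketch counts words in a made-up sentence, and shows that missing items count as zero and that counters can be updated and added together.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
word_count = Counter(words)

top_two = word_count.most_common(2)
print(top_two)               # [('the', 3), ('and', 2)]
print(word_count['dog'])     # 0: missing items count as zero instead of raising KeyError

# Counters can be updated with more items and combined with +
word_count.update(["cat", "cat"])
print(word_count['cat'])     # 3
print(Counter("aab") + Counter("abc"))   # Counter({'a': 3, 'b': 2, 'c': 1})
```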

### OrderedDict

Since Python 3.7, ordinary dictionaries keep their keys in insertion order, so `OrderedDict` is much less essential than it used to be. It still adds a few order-aware features, such as `move_to_end()`, `popitem(last=False)` and equality tests that take order into account. In general it's best to use ordinary dicts and sort the keys or values as you need, but *sometimes* you really do need a dictionary where order matters:

In [None]:
from collections import OrderedDict

In [None]:
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

od

In [None]:
for key, value in od.items():
    print(key, value)
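The order-aware extras are where `OrderedDict` still differs from a plain `dict`. This small sketch reorders keys, pops from the front, and compares equality with and without order mattering.

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)
od.move_to_end('a')              # move an existing key to the end
print(list(od))                  # ['b', 'c', 'a']

first = od.popitem(last=False)   # pop from the *front* instead of the back
print(first)                     # ('b', 2)

# Equality between OrderedDicts takes order into account; plain dicts do not
print(OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1))   # False
print(dict(x=1, y=2) == dict(y=2, x=1))                 # True
```

`move_to_end` plus `popitem(last=False)` is the classic building block for a least-recently-used cache.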

### defaultdict

Normally when you ask for a key of a dictionary which doesn't exist, you get a `KeyError`:

In [None]:
from collections import defaultdict

In [None]:
mydict = {
    'first': 1,
    'second' : 2
}

In [None]:
mydict['third']

With a `defaultdict` you can define another behaviour for keys that don't exist.

In [None]:
mydefaultdict = defaultdict(int)
mydefaultdict['first'] = 1
mydefaultdict['second'] = 2

mydefaultdict['third']

We had to give the `defaultdict` a way to make new values (here `int`; calling `int()` returns `0`), and it used that to return a sensible default. In fact `defaultdict` accepts any zero-argument callable as its "default factory" to define what "sensible" should be, so a typical use is to pass a type such as `list`, or a function (including a lambda function) that computes the default value.
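A common pattern is `defaultdict(list)` for grouping, since each new key starts out as an empty list that you can append to straight away. The fruit list and the `'unknown'` default below are hypothetical examples; note that looking up a missing key also *inserts* it.

```python
from collections import defaultdict

fruits = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = defaultdict(list)        # list() supplies [] for each new key
for fruit in fruits:
    by_letter[fruit[0]].append(fruit)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Any zero-argument callable works as the factory, including a lambda
status = defaultdict(lambda: 'unknown')
status['wasp'] = 'flying'
print(status['elephant'])     # unknown
print('elephant' in status)   # True: the lookup inserted the default
```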