# A Brief Tour of the Python Standard Library

## Topics
* [What is the Standard Library?](#What-is-the-Standard-Library?)
* [Scripting Modules](#Scripting-Modules)
    * `os` module
    * `os.path` module
    * `sys` module
    * `shutil` module
    * `glob` module
    * `argparse` module
    * `re` module
* [Special Data Types](#Special-Data-Types)
    * `collections` module
    * `decimal` module
    * `datetime` module
* [Concurrency](#Concurrency)
    * `subprocess` module
    * `threading` and `multiprocessing` modules
    * `asyncio` module
    

## What is the Standard Library?
The Python Standard Library fulfills Python's "Batteries Included" philosophy. It is a set of packages and modules contributed by the Python community and adopted into the core Python distribution.
* Installed by default with most distributions of Python
* Just regular modules and packages
* Some of it may require extra system packages
* Continually evolving
* Hundreds of modules! https://docs.python.org/3/library/index.html

In [1]:
import sys
sys.path

['/home/llacarbonara/iea-cohort-07/05_python_for_devops',
 '/usr/lib64/python37.zip',
 '/usr/lib64/python3.7',
 '/usr/lib64/python3.7/lib-dynload',
 '',
 '/usr/local/lib64/python3.7/site-packages',
 '/usr/local/lib/python3.7/site-packages',
 '/usr/lib64/python3.7/site-packages',
 '/usr/lib/python3.7/site-packages',
 '/usr/local/lib/python3.7/site-packages/IPython/extensions',
 '/home/llacarbonara/.ipython']

## Scripting Modules

### The `os` and `os.path` modules
* portable access to operating system services
* e.g., working with files, directories, environment variables, and processes
* `os.path` handles cross-platform path issues (don't do this manually!)
* can also run commands outside of Python

In [4]:
import os
os.system('ls') # doesn't print anything in the notebook, 
# but try it in Python shell

0

In [5]:
os.system('pluma')

0

In [7]:
os.system('touch newfile')
os.system('ls newfile')

0

In [6]:
# get the current working directory
os.getcwd()

'/home/llacarbonara/iea-cohort-07/05_python_for_devops'

In [8]:
# Does the file 'newfile' exist?
os.path.exists('newfile')

True

In [9]:
# create a directory
os.mkdir('newdir')

In [10]:
# is 'newdir' a file?
os.path.isfile('newdir')

False

In [11]:
#is 'newdir' a directory?
os.path.isdir('newdir')

True

In [17]:
print(os.path.join('/home', 'jr/code'))

# a component that starts with '/' is absolute, so everything before it is discarded

print(os.path.join('/home', '/jr/code'))

/home/jr/code
/jr/code
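
The join behavior above is one of several `os.path` helpers for taking paths apart and putting them back together. The following is a minimal sketch, assuming a POSIX system; the path `/home/jr/code/script.py` is a made-up example.

```python
import os.path

# Build a path from components; the separator is chosen for the current OS
path = os.path.join('/home', 'jr', 'code', 'script.py')

# Split off the last component, then split the extension from the file name
directory, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)

print(path)
print(directory, filename)
print(stem, extension)

# An absolute component discards everything joined before it
reset = os.path.join('/home', '/jr/code')
print(reset)
```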


### The __`sys`__ module
* system-specific parameters and functions
* we've already seen some examples, __`argv`__ and __`path`__

In [18]:
import sys
sys.path

['/home/llacarbonara/iea-cohort-07/05_python_for_devops',
 '/usr/lib64/python37.zip',
 '/usr/lib64/python3.7',
 '/usr/lib64/python3.7/lib-dynload',
 '',
 '/usr/local/lib64/python3.7/site-packages',
 '/usr/local/lib/python3.7/site-packages',
 '/usr/lib64/python3.7/site-packages',
 '/usr/lib/python3.7/site-packages',
 '/usr/local/lib/python3.7/site-packages/IPython/extensions',
 '/home/llacarbonara/.ipython']

In [19]:
sys.maxsize

9223372036854775807

In [20]:
2 ** 63 - 1

9223372036854775807

In [21]:
# To exit a Python script, use sys.exit()
# Won't work here, because we're in the notebook
sys.exit()

SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
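
The notebook reports `SystemExit` because `sys.exit()` works by raising that exception rather than stopping the interpreter directly. The sketch below illustrates this with a hypothetical `main()` function (not part of the original notebook) that returns an exit status, which is one common way to structure a script.

```python
import sys

def main(argv):
    """Hypothetical entry point: return an exit status instead of exiting directly."""
    if len(argv) < 2:
        print("usage: prog NAME", file=sys.stderr)
        return 2
    print("Hello,", argv[1])
    return 0

# sys.exit() raises SystemExit, so it can be caught like any other exception
try:
    sys.exit(main(['prog']))
except SystemExit as exc:
    exit_code = exc.code

print("exit code:", exit_code)
```

In a real script the last lines would typically be just `sys.exit(main(sys.argv))`, so the shell sees the returned status.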


### __`shutil`__ module
* shell utilities
* e.g., high-level file operations

In [22]:
import os
print(os.system('ls newfileCopy'))

0


In [23]:
import shutil
# create a copy of a file
shutil.copy('newfile', 'newfileCopy')
# os.system('cp newfile newfileCopy')

'newfileCopy'

In [24]:
os.system('ls newfileCopy')

0

In [25]:
shutil.move('newfileCopy', 'newerfile')

'newerfile'

In [26]:
os.system('ls newerfile')

0
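
The copy and move steps above can also be tried without leaving files behind. This is a small sketch that repeats them inside a temporary directory (an assumption made for the example) and then removes the whole directory with `shutil.rmtree`.

```python
import os
import shutil
import tempfile

# Work in a throwaway directory so nothing in the current directory is touched
workdir = tempfile.mkdtemp()

original = os.path.join(workdir, 'newfile')
with open(original, 'w') as f:
    f.write('hello\n')

# copy() and move() both return the destination path
copied = shutil.copy(original, os.path.join(workdir, 'newfileCopy'))
moved = shutil.move(copied, os.path.join(workdir, 'newerfile'))

names = sorted(os.listdir(workdir))
print(names)

# rmtree() removes a directory and everything inside it
shutil.rmtree(workdir)
print(os.path.exists(workdir))
```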

### __`glob`__ module
* __`glob()`__ function matches file or directory names using Unix shell wildcard rules (`*`, `?`, `[]`) rather than regular expression syntax

In [27]:
import glob
glob.glob('n*')

['newerfile', 'newfile', 'newdir']

In [28]:
glob.glob('*e')

['datetime', 'newerfile', 'newfile']

In [29]:
glob.glob('???')

['abc']

In [30]:
import os
os.system('touch abc')

0

In [31]:
glob.glob('???')

['abc']

In [32]:
glob.glob("x*")

[]
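
The results above depend on whatever happens to be in the current directory. The sketch below shows the same wildcard rules against a known set of files; the file names and the helper `match()` are made up for the illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a few empty files with known names
    for name in ('abc', 'abd', 'newfile', 'notes.txt'):
        open(os.path.join(workdir, name), 'w').close()

    def match(pattern):
        """Return sorted base names in workdir matching a shell-style pattern."""
        hits = glob.glob(os.path.join(workdir, pattern))
        return sorted(os.path.basename(hit) for hit in hits)

    three_chars = match('???')       # ? matches exactly one character
    char_class = match('ab[cx]')     # [] matches one character from a set
    starts_with_n = match('n*')      # * matches any run of characters

print(three_chars, char_class, starts_with_n)
```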

### `argparse` module
* Parses command line arguments for more complex commands
* Follows the conventions of Linux commands (short and long options, `--help`)
* Generates help and usage messages automatically, nicely formatted

In [40]:
import argparse

parser = argparse.ArgumentParser(
    description='argparse example')

parser.add_argument('-a', action="store_true",
                    default=False)
parser.add_argument('-b', action="store", dest="blog")
parser.add_argument('-c', action="store", dest="c",
                    type=int)
parser.add_argument('--version', action='version', 
                    version='%(prog)s 2.0')

# parse args from command line, which won't work in the notebook
#args = parser.parse_args()

# $ python3 myscript.py -a -b happy
# each command line word is a separate list element;
# '-b happy' as one element would store ' happy' with a leading space
args = parser.parse_args(['-a', '-b', 'happy'])

print(args)

if args.a:
    print("-a was passed")
else:
    print("-a wasn't passed")
if args.blog:
    print("-b", args.blog, "was passed")
else:
    print("-b wasn't passed")
if args.c is not None:
    print("-c", args.c, "was passed (int)")
else:
    print("-c wasn't passed")
    
os.system('ls -al')

Namespace(a=True, blog='happy', c=None)
-a was passed
-b happy was passed
-c wasn't passed
0


In [41]:
parser.parse_args(["--help"])

usage: ipykernel_launcher.py [-h] [-a] [-b BLOG] [-c C] [--version]

argparse example

optional arguments:
  -h, --help  show this help message and exit
  -a
  -b BLOG
  -c C
  --version   show program's version number and exit


SystemExit: 0
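
The parser above uses only single-letter options. As a hedged sketch of a slightly more typical layout, the example below adds a positional argument, long option names, and help strings. The option names (`filename`, `--count`, `--verbose`) and the script name `myscript.py` are invented for this illustration.

```python
import argparse

def build_parser():
    """Build a parser with one positional argument and two options."""
    parser = argparse.ArgumentParser(description='argparse sketch')
    parser.add_argument('filename', help='file to process')
    parser.add_argument('-n', '--count', type=int, default=1,
                        help='number of repetitions (default: 1)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print extra detail')
    return parser

# Passing a list stands in for the real command line:
# $ python3 myscript.py --count 3 -v data.txt
args = build_parser().parse_args(['--count', '3', '-v', 'data.txt'])
print(args.filename, args.count, args.verbose)
```

Keeping the parser in its own function makes it easy to exercise from a notebook or a test by passing a list, while a script would call `parse_args()` with no arguments to read `sys.argv`.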

### `re` module
* `re` module allows regular expression processing inside Python
* Provides several different functions for matching text patterns
* Supports subgroups in matching

### Quick Review: Regular Expressions
* special sequence of characters that helps you find specific text sequences in strings, files, etc.
* "wildcard" characters take the place of a group of characters

### RE Metacharacters
```
. = any character except newline
^ = beginning of line/string
$ = end of line/string
* = 0+ of the preceding RE
+ = 1+ of the preceding RE
? = 0 or 1 instances of preceding RE
{n} = exactly n instances of the preceding RE
[] = match character set or range, e.g., [aeiou], [a-z], etc.
(…) = matches the RE inside the parens, and creates a group 
```
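
A few of the metacharacters in the table can be tried directly with `re.search` and `re.findall`. This is a small sketch; the sample strings are made up for the illustration.

```python
import re

# . and * : greedy match from the first 'l' to the last 'e'
greedy = re.search(r'l.*e', 'alphabet').group()

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r'^alpha', 'alphabet'))
ends = bool(re.search(r'alpha$', 'alphabet'))

# [] matches any one character from the set
vowels = re.findall(r'[aeiou]', 'alphabet')

# ? makes the preceding 'u' optional
optional = re.findall(r'colou?r', 'color colour')

# (...) creates groups; {n} repeats the preceding RE exactly n times
m = re.search(r'([0-9]{4})-([0-9]{2})-([0-9]{2})', 'released 1991-02-20')
year, month, day = m.groups()

print(greedy, starts, ends)
print(vowels, optional)
print(year, month, day)
```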

In [None]:
import re
re.match('a.*a', 'alphabet')

In [None]:
re.match('h.*t', 'alphabet')

In [None]:
re.search('h.*t', 'alphabet')

In [None]:
re.search('a.*z', 'alphabet')

In [None]:
# you can search for fixed strings, rather than using wildcards...
import re

# "with" closes the file when the loop ends;
# enumerate() does the line counting
with open('poem.txt') as poem:
    for linenum, line in enumerate(poem, start=1):
        if re.search('the', line):
            print('{}: {}'.format(linenum,
                    re.sub('the', '---', line)), end='')

In [None]:
!cat poem.txt

In [None]:
import re
o = re.search('l.*e', 'alphabet')
o.re

In [None]:
o.re.pattern

In [None]:
o.string

In [None]:
o.start(), o.end()

In [None]:
o.string[o.start():o.end()]

## Lab: Write a Cheap Imitation of __`grep`__ in Python
* using the modules we've learned, write a Python program which takes two command line arguments, a filename and a regex pattern
* your program should act like __`grep`__ in that it should search for the pattern in each line of the file
* if the pattern matches a given line, print out the line
* BONUS: Provide extra options for your script to change the behavior.

## Bonus Lab: Pluralization
* write a program (or function) which takes a word as a command line argument and outputs the plural of that word
* your program should follow these rules:
  * if the word ends in 's', 'x', or 'z', the plural adds 'es', e.g., ax => axes, loss => losses
  * if the word ends in an 'h' which is not preceded by a vowel or 'd', 'g', 'k', 'p', 'r', or 't', the plural adds 'es', e.g., match => matches, but moth => moths
  * if the word ends in a 'y' which is not preceded by a vowel, then the plural strips the 'y' and adds 'ies', e.g., baby => babies, but boy => boys
  * otherwise just add 's'

## Special Data Types

### The `collections` module
* contains specialized data structures
* specialized dictionaries `defaultdict` and `Counter`
* double-ended queue `deque`

In [44]:
employees = [("Accounting", 'Steve'), ("Engineering",'Susan'), ("Accounting", 'Bob'), ("Marketing",'Dan')]
by_dept = dict(employees)
by_dept

{'Accounting': 'Bob', 'Engineering': 'Susan', 'Marketing': 'Dan'}

In [45]:
regular_dict = {}
for dept, name in employees:
    if dept not in regular_dict:
        regular_dict[dept] = []
    regular_dict[dept].append(name)
regular_dict

{'Accounting': ['Steve', 'Bob'],
 'Engineering': ['Susan'],
 'Marketing': ['Dan']}

In [49]:
import collections

name_by_org = collections.defaultdict(list)

In [50]:
type(list)

type

In [51]:
name_by_org["Accounting"] = ["Joe", "Bob", "Steve"]
name_by_org["Accounting"].append("Jeff")
name_by_org

defaultdict(list, {'Accounting': ['Joe', 'Bob', 'Steve', 'Jeff']})

In [52]:
name_by_org["Platforms"].append("Lisa")
name_by_org["Engineering"].append("Sam")
name_by_org

defaultdict(list,
            {'Accounting': ['Joe', 'Bob', 'Steve', 'Jeff'],
             'Platforms': ['Lisa'],
             'Engineering': ['Sam']})

In [54]:
deq = collections.deque(["First", "Second", "Third", "Fourth"])
deq

deque(['First', 'Second', 'Third', 'Fourth'])

In [55]:
deq.rotate(2)
deq

deque(['Third', 'Fourth', 'First', 'Second'])

In [56]:
deq.popleft()

'Third'

In [57]:
deq.popleft()

'Fourth'
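
`Counter` is listed above but not demonstrated. The sketch below reuses the `employees` list from earlier in this section to show `defaultdict` replacing the manual "is the key there yet?" check, and `Counter` tallying entries per department.

```python
import collections

employees = [("Accounting", 'Steve'), ("Engineering", 'Susan'),
             ("Accounting", 'Bob'), ("Marketing", 'Dan')]

# defaultdict(list) creates the empty list on first access to a new key
by_dept = collections.defaultdict(list)
for dept, name in employees:
    by_dept[dept].append(name)

# Counter tallies how many times each department appears
headcount = collections.Counter(dept for dept, _ in employees)

print(dict(by_dept))
print(headcount['Accounting'])    # 2
print(headcount['Sales'])         # 0, missing keys count as zero
print(headcount.most_common(1))   # [('Accounting', 2)]
```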

### The `decimal` module
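* provides a `Decimal` type that represents base-10 values exactly and rounds to a configurable precision
* import when you can NOT have unexpected rounding (e.g., financial calculations)
* implements IBM's General Decimal Arithmetic Specification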

In [58]:
import decimal
d = decimal.Decimal(5)
d

Decimal('5')

In [59]:
d = decimal.Decimal(5.34)
d

Decimal('5.339999999999999857891452847979962825775146484375')

In [60]:
d = decimal.Decimal("5.34")
d

Decimal('5.34')

In [61]:
decimal.getcontext().prec

28

In [62]:
decimal.getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[FloatOperation], traps=[InvalidOperation, DivisionByZero, Overflow])

### Rounding Modes
```
decimal.ROUND_CEILING
Round towards Infinity.

decimal.ROUND_DOWN
Round towards zero.

decimal.ROUND_FLOOR
Round towards -Infinity.

decimal.ROUND_HALF_DOWN
Round to nearest with ties going towards zero.

decimal.ROUND_HALF_EVEN
Round to nearest with ties going to nearest even integer.

decimal.ROUND_HALF_UP
Round to nearest with ties going away from zero.

decimal.ROUND_UP
Round away from zero.

decimal.ROUND_05UP
Round away from zero if last digit after rounding towards zero would have been 0 or 5; otherwise round towards zero.
```
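
The difference between the rounding modes shows up on ties. The sketch below rounds a made-up value, `0.125`, to two places with `quantize()` under three of the modes listed above, and contrasts `Decimal` with `float` on a classic sum.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

price = Decimal('0.125')   # exactly halfway between 0.12 and 0.13
cents = Decimal('0.01')    # quantize() rounds to this exponent

half_even = price.quantize(cents, rounding=ROUND_HALF_EVEN)
half_up = price.quantize(cents, rounding=ROUND_HALF_UP)
down = price.quantize(cents, rounding=ROUND_DOWN)
print(half_even, half_up, down)   # 0.12 0.13 0.12

# Binary floats cannot represent 0.1 exactly; Decimal built from strings can
float_sum_ok = (0.1 + 0.2 == 0.3)
decimal_sum_ok = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
print(float_sum_ok, decimal_sum_ok)   # False True
```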

In [63]:
float_pi = 22 / 7
dec_pi = decimal.Decimal(22) / decimal.Decimal(7)
print(float_pi, dec_pi)

3.142857142857143 3.142857142857142857142857143


In [64]:
decimal.Decimal(355) / decimal.Decimal(113)

Decimal('3.141592920353982300884955752')

### The `datetime` module
* handles date and time math
* provides `date`, `time`, and `datetime` types
* flexible string formatting
* does NOT provide timezone lists (they change a lot!)

In [None]:
import datetime

python_birthday = datetime.datetime.strptime("02/20/91", "%x")

print(python_birthday.year)
print(python_birthday.month)
print("Day of the week (Monday = 0)", python_birthday.weekday())

In [None]:
python_birthday.isoformat()

In [None]:
now = datetime.date.today()
three_weeks_ago = now - datetime.timedelta(weeks=3)
three_weeks_ago
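
The `"%x"` format used above depends on the current locale. As a hedged alternative, the sketch below parses the same date with an explicit format string and shows that subtracting datetimes yields a `timedelta`. The second date, 2001-02-20, is chosen only for the arithmetic.

```python
import datetime

# An explicit format string avoids the locale dependence of "%x"
release = datetime.datetime.strptime("1991-02-20", "%Y-%m-%d")

# strftime() goes the other way: datetime to formatted string
friendly = release.strftime("%A, %B %d, %Y")
print(friendly)

# Subtracting two datetimes gives a timedelta
ten_years = datetime.datetime(2001, 2, 20) - release
print(ten_years.days)   # 3653 (includes three leap days)

# Adding a timedelta gives a new datetime
later = (release + datetime.timedelta(days=30)).date()
print(later)            # 1991-03-22
```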

**NOTE**: Timezones change frequently for social, political, and various other reasons.  You can manage these manually, or use the third-party package `dateutil`, which provides a timezone database and related functionality and is compatible with the regular `datetime` module.  

### Lab: datetime manipulation
* Using functions from the `datetime` module, write a small script called `convert_date.py` that converts an epoch timestamp to something human readable.
* Have your script prompt the user for an epoch time, or allow the user to pipe in an epoch time from bash like so:  `date +%s | python3 convert_date.py`
* BONUS: Provide extra options to your script to switch output between a "friendly" timestamp and an ISO 8601 format timestamp

In [71]:
import datetime
epoch=204868944
current = datetime.datetime.fromtimestamp(epoch)
print(current)

1976-06-29 00:02:24


In [87]:
#help(datetime)
print(datetime.datetime.now())
#print((datetime.datetime.now()).isoformat())

2022-05-13 10:01:22.975838


In [83]:
# `+%s` is shell syntax for the `date` command, not Python;
# timestamp() converts a datetime back to epoch seconds
int(current.timestamp())

204868944

## Concurrency

### `subprocess` module
* supersedes __`os.system()`__ and the __`os.spawn*()`__ family, which used to be the standard ways to run programs outside of Python
* Allows running and controlling other programs, even interactively

In [88]:
import subprocess
ret = subprocess.getoutput('date')
ret

'Fri May 13 10:08:13 EDT 2022'

In [89]:
ret = subprocess.getoutput('ls')
ret

'01 Introduction - Python for DevOps.ipynb\n01 Introduction - Python for DevOps - LL.ipynb\n02 More Python for DevOps - LL.ipynb\n03 A Brief Tour of the Standard Library - LL.ipynb\n04 The Python Ecosystem.ipynb\nabc\ndatetime\nhamlet.txt\nimages\nmymodule.py\nnewdir\nnewerfile\nnewfile\npoem.txt\nprog1.py\n__pycache__\nrequirements.txt\nsubprocess'

In [90]:
print(ret)

01 Introduction - Python for DevOps.ipynb
01 Introduction - Python for DevOps - LL.ipynb
02 More Python for DevOps - LL.ipynb
03 A Brief Tour of the Standard Library - LL.ipynb
04 The Python Ecosystem.ipynb
abc
datetime
hamlet.txt
images
mymodule.py
newdir
newerfile
newfile
poem.txt
prog1.py
__pycache__
requirements.txt
subprocess


In [91]:
help(subprocess.run)

Help on function run in module subprocess:

run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs)
    Run command with arguments and return a CompletedProcess instance.
    
    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.
    
    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.
    
    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "std

In [92]:
procinfo = subprocess.run(["grep", "Python", "hamlet.txt"])
print(type(procinfo))
print(procinfo.args, "returned", procinfo.returncode)
print(procinfo.stdout, procinfo.stderr)

<class 'subprocess.CompletedProcess'>
['grep', 'Python', 'hamlet.txt'] returned 1
None None


In [93]:
procinfo = subprocess.run("cat poem.txt | grep wood", shell=True, capture_output=True)
procinfo

CompletedProcess(args='cat poem.txt | grep wood', returncode=0, stdout=b'Two roads diverged in a yellow wood,\t\nTwo roads diverged in a wood, and I\xe2\x80\x94\t\n', stderr=b'')
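
The examples above return `bytes` and leave error handling to the caller. The sketch below shows the `text` and `check` parameters of `subprocess.run`. It launches the current Python interpreter via `sys.executable` as the child program, an assumption made so that the example does not depend on `grep` or `cat` being installed.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,   # collect stdout and stderr (Python 3.7+)
    text=True,             # decode the captured bytes to str
    check=True,            # raise CalledProcessError on a non-zero exit
)
print(result.returncode, repr(result.stdout))

# With check=True a failing command raises instead of returning
try:
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
print(failed_code)
```

Passing the command as a list, without `shell=True`, avoids shell quoting problems when arguments come from user input.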

### The `threading` and `multiprocessing` modules
* Similar interfaces - one creates *threads* and one creates *processes*
* Fine-grained control with `Thread` and `Process` types
    * Simpler concurrency with `ThreadPoolExecutor` and `ProcessPoolExecutor` available in the `concurrent.futures` module
* Tradeoffs
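    * Threading - the GIL (Global Interpreter Lock) lets only one thread run Python bytecode at a time, so CPU-bound threads do not benefit from multiple cores
    * Multiprocessing - each process has its own memory, so it costs more memory and data must be passed between processes explicitly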
* Concurrent code is hard - don't make it your hammer!
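
As a minimal sketch of the simpler `concurrent.futures` interface mentioned above, the example below runs a stand-in I/O-bound task on a thread pool. The function `fetch()` and its 0.1 second sleep are hypothetical placeholders for something like a network request.

```python
import concurrent.futures
import time

def fetch(record_id):
    """Stand-in for an I/O-bound task such as a network request."""
    time.sleep(0.1)
    return record_id * 2

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns results in the order of the inputs
    results = list(pool.map(fetch, range(10)))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s")
```

Run one after another, the ten calls would sleep for about a second in total; with five workers the waits overlap. `ProcessPoolExecutor` offers the same `map()` interface for CPU-bound work.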

### What are threads, anyway?
* We studied **processes**: a process is a running program plus all of its associated information.
* The kernel schedules processes to run on the CPU
* Within a given process, a program can create *threads of execution* which can run on a CPU (or core)
* Threads **share all resources** within the process, so we must write *thread-safe* code
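
One common way to make shared state thread-safe is to guard it with a `threading.Lock`. This is a small sketch, assuming a shared counter held in a dictionary; the names `state` and `add_many` are made up for the illustration.

```python
import threading

state = {'count': 0}       # shared by every thread in the process
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below
        with lock:
            state['count'] += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state['count'])   # 40000
```

Without the lock, two threads can read the same old value and both write back the same new one, so some increments may be lost.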

In [97]:
import os
import threading
import time

# This function gets started in a new thread
def worker():
    thread_name = threading.current_thread().name
    pid = os.getpid()
    print("Hello from thread", thread_name, "in process", pid)
    print(f"Worker {pid}:{thread_name} is done!")
    time.sleep(2)
    print('sleeping for 2')
# This code runs first, in the main thread
print(
    "Starting in the main thread:", 
    threading.current_thread().name,
    "pid =",
    os.getpid())
# Create the threads
# "target" is what the thread should run
threads = [threading.Thread(target=worker) for i in range(10)]
print("Running worker threads and going to sleep for 10 seconds")

# Start the threads running
for thread in threads:
    thread.start()
time.sleep(10)
print("Waiting for threads to finish")
# Wait for the threads to finish up
for thread in threads:
    thread.join()
print("Main thread is done!")

Starting in the main thread: MainThread pid = 7503
Running worker threads and going to sleep for 10 seconds
Hello from thread Thread-34 in process 7503Hello from thread
Worker 7503:Thread-34 is done!
 Thread-35 in process 7503
Worker 7503:Thread-35 is done!
Hello from thread Thread-36 in process 7503
Worker 7503:Thread-36 is done!
Hello from thread Thread-37 in process 7503
Worker 7503:Thread-37 is done!
Hello from thread Thread-38 in process 7503
Worker 7503:Thread-38 is done!
Hello from thread Thread-39 in process 7503
Worker 7503:Thread-39 is done!
Hello from thread Thread-40 in process 7503
Worker 7503:Thread-40 is done!
Hello from thread Thread-41 in process 7503
Worker 7503:Thread-41 is done!
Hello from thread Thread-42 in process 7503
Worker 7503:Thread-42 is done!
Hello from thread Thread-43 in process 7503
Worker 7503:Thread-43 is done!
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleeping for 2
sleepin

In [98]:
import os
import multiprocessing
import threading
import time

# This function gets started in a new subprocess
def worker():
    thread_name = threading.current_thread().name
    pid = os.getpid()
    print("Hello from thread", thread_name, "in process", pid)
    print(f"Worker {pid}:{thread_name} is done!")
    

# This code runs first, in the parent process    
print(
    "Starting in the main thread:", 
    threading.current_thread().name,
    "pid =",
    os.getpid())

# Create the subprocesses
# "target" is what the child process should run
processes = [multiprocessing.Process(target=worker) for i in range(10)]
print("Running worker threads and going to sleep for 10 seconds")
# Start the processes running
for proc in processes:
    proc.start()
time.sleep(10)
print("Waiting for threads to finish")
# Wait for the processes to finish up
for proc in processes:
    proc.join()
print("Main thread is done!")

Starting in the main thread: MainThread pid = 7503
Running worker threads and going to sleep for 10 seconds
Hello from thread MainThread in process 19246
Worker 19246:MainThread is done!
Hello from thread MainThread in process 19247
Hello from thread MainThread in process 19248
Hello from thread MainThread in process 19250
Worker 19247:MainThread is done!
Hello from thread MainThread in process 19249
Worker 19250:MainThread is done!
Worker 19248:MainThread is done!
Worker 19249:MainThread is done!
Hello from thread MainThread in process 19279
Hello from thread MainThread in process 19259
Hello from thread MainThread in process 19273
Worker 19279:MainThread is done!
Worker 19259:MainThread is done!
Worker 19273:MainThread is done!
Hello from thread MainThread in process 19282
Worker 19282:MainThread is done!
Hello from thread MainThread in process 19291
Worker 19291:MainThread is done!
Waiting for threads to finish
Main thread is done!


In [105]:
import os
import multiprocessing
import queue
import time

def worker(q):
    pid = os.getpid()
    task=0
    while True:
        try:
            next_task = q.get(timeout=1)
            task+=1
        except queue.Empty:
            print("Worker", pid, "quitting.")
            break
        print("Worker", pid, "processing:", "Task #",task, next_task)
        time.sleep(1)

# Create work tasks in the main process
# and use a Queue to distribute work to the
# child processes
to_do = multiprocessing.Queue()
for i in range(100):
    to_do.put(f"Record #{i}")

processes = [
    multiprocessing.Process(target=worker, args=(to_do,)) 
    for i in range(10)]

for proc in processes:
    proc.start()
for proc in processes:
    proc.join()
print("All work complete!")

Worker 28933 processing: Task # 1 Record #0
Worker 28934 processing: Task # 1 Record #1
Worker 28935 processing: Task # 1 Record #2
Worker 28936 processing: Task # 1 Record #3
Worker 28939 processing: Task # 1 Record #4
Worker 28947 processing: Task # 1 Record #6
Worker 28944 processing: Task # 1 Record #5
Worker 28955 processing: Task # 1 Record #8
Worker 28950 processing: Task # 1 Record #7
Worker 28958 processing: Task # 1 Record #9
Worker 28933 processing: Task # 2 Record #10
Worker 28934 processing: Task # 2 Record #11
Worker 28935 processing: Task # 2 Record #12
Worker 28936 processing: Task # 2 Record #13
Worker 28947 processing: Task # 2 Record #14
Worker 28944 processing: Task # 2 Record #15
Worker 28939 processing: Task # 2 Record #16
Worker 28950 processing: Task # 2 Record #17
Worker 28958 processing: Task # 2 Record #19
Worker 28955 processing: Task # 2 Record #18
Worker 28933 processing: Task # 3 Record #20
Worker 28934 processing: Task # 3 Record #21
Worker 28935 process

### The `asyncio` module
* Allows asynchronous processing WITHOUT creating threads or processes
* Uses the new `async` and `await` keywords (Python 3.5+)
* Uses an *event loop* to schedule *coroutines* as tasks

### Cooperative Multitasking vs. Preemptive Multitasking
* `threading` and `multiprocessing` both use *preemptive multitasking*
    * Operating system is aware of the threads and processes
    * The OS (kernel) can preempt (interrupt) a thread or process **at any time** and we have no control over it!
    * Requires OS-level synchronization objects and can lead to subtle bugs like *race conditions*
* `asyncio` uses *cooperative multitasking*
    * Only runs in a single thread, no OS-level synchronization
    * Event loop keeps track of ready tasks vs. waiting tasks
    * Currently running task must *voluntarily yield control* back to the event loop

In [None]:
import asyncio

# Async declares this as a coroutine
async def blip_on_2():
    # Await yields control to the event loop
    for i in range(10):
        await asyncio.sleep(2)
        print(f"Blip #{i}!")

# Coroutines can take args, just like regular functions
async def bloop_on_X(x):
    for i in range(10):
        await asyncio.sleep(x)
        print(f"Bloop #{i}!")
        
async def read_poem():
    with open("poem.txt") as poem:
        for line in poem:
            print(line)
            await asyncio.sleep(0.5)
            
            
async def main():
    # Create tasks so these coroutines can all run concurrently
    blip_task = asyncio.create_task(blip_on_2())
    bloop_task = asyncio.create_task(bloop_on_X(3))
    read_poem_task = asyncio.create_task(read_poem())
    
    print("Main: Waiting on tasks to complete!")
    # Wait for all tasks to complete
    await blip_task
    await bloop_task
    await read_poem_task
    print("All Done!")

# Starts the event loop with the main() coroutine
asyncio.run(main())
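
Coroutines can also return values. The sketch below uses `asyncio.gather` to run two coroutines concurrently and collect their results; the coroutine `wait_and_return()` and its delays are invented for the illustration. It is written to run as a plain script, since `asyncio.run()` may fail inside an environment that already has an event loop running.

```python
import asyncio
import time

async def wait_and_return(name, delay):
    await asyncio.sleep(delay)   # yields control to the event loop
    return f"{name} finished after {delay}s"

async def main():
    # gather() runs the coroutines concurrently and
    # returns their results in argument order
    return await asyncio.gather(
        wait_and_return("slow", 0.3),
        wait_and_return("fast", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Both waits overlap, so the elapsed time should be close to the longer delay rather than the sum of the two.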

### Concurrency Recap
* Concurrency doesn't magically speed up code - it simply takes advantage of the time your code is *already sitting idle* waiting on I/O
* Only `multiprocessing` can truly run *parallel* code on multiple cores, but you pay a resource cost
* Concurrent code synchronization can be difficult - look for easier cases
    * Tasks that are completely independent
    * Tasks that wait around for file or network I/O
    * Tasks that can be easily broken into batches and combined at the end