# Lab 6: Standard Library

## Overview

The goal of this lab is to become familiar with the tools provided by Python's standard library. We want you to gain practice with the most common utilities of the standard library and also to be aware of the rest of the tools in case you ever need them.

**We expect that most of the time in this lab will be spent _reading_ documentation for the modules in the standard library that you find intriguing or applicable to your interests.** If you have any questions about any standard library features, please ask us!

## Read

We get it. At first, reading documentation doesn't sound like a fun way to spend an afternoon. However, this is one of a rare few times when you will have dedicated class time to take a deep dive into a library tool. Python's standard library is huge, and although your interests may not span the whole library, we're willing to bet that you can find something you enjoy in the library.

Remember that you can follow along with the documentation's examples in the interactive interpreter - we recommend this approach, so that you're both reading about and practicing with the modules you like.

Several of the documentation pages have links to the module's source code - if you're interested in seeing examples of well-crafted Python modules, there's no better place to look than the standard library!

Above all, explore and ask questions! You should plan to spend **around the first half of lab time in this section**, reading about features of the standard library.

If you don't know which modules to look at, we have a list of some of our favorite modules that *weren't* covered in lecture, based on common general interests. Ask us about what you'd like to learn more about, and we'll point you in the right general direction.

The top-level categories of tools in the standard library are:

- Built-in [Functions](https://docs.python.org/3/library/functions.html), [Constants](https://docs.python.org/3/library/constants.html), [Types](https://docs.python.org/3/library/stdtypes.html), and [Exceptions](https://docs.python.org/3/library/exceptions.html)
- [Text Processing Services](https://docs.python.org/3/library/text.html)
- [Binary Data Services](https://docs.python.org/3/library/binary.html)
- [Data Types](https://docs.python.org/3/library/datatypes.html)
- [Numeric and Mathematical Modules](https://docs.python.org/3/library/numeric.html)
- [Functional Programming Modules](https://docs.python.org/3/library/functional.html)
- [File and Directory Access](https://docs.python.org/3/library/filesys.html)
- [Data Persistence](https://docs.python.org/3/library/persistence.html)
- [Data Compression and Archiving](https://docs.python.org/3/library/archiving.html)
- [File Formats](https://docs.python.org/3/library/fileformats.html)
- [Cryptographic Services](https://docs.python.org/3/library/crypto.html)
- [Generic Operating System Services](https://docs.python.org/3/library/allos.html)
- [Concurrent Execution](https://docs.python.org/3/library/concurrency.html)
- [Context Variables](https://docs.python.org/3/library/contextvars.html)
- [Networking and Interprocess Communication](https://docs.python.org/3/library/ipc.html)
- [Internet Data Handling](https://docs.python.org/3/library/netdata.html)
- [Structured Markup Processing Tools](https://docs.python.org/3/library/markup.html)
- [Internet Protocols and Support](https://docs.python.org/3/library/internet.html)
- [Multimedia Services](https://docs.python.org/3/library/mm.html)
- [Internationalization](https://docs.python.org/3/library/i18n.html)
- [Program Frameworks](https://docs.python.org/3/library/frameworks.html)
- [Graphical User Interfaces with Tk](https://docs.python.org/3/library/tk.html)
- [Development Tools](https://docs.python.org/3/library/development.html)
- [Debugging and Profiling](https://docs.python.org/3/library/debug.html)
- [Software Packaging and Distribution](https://docs.python.org/3/library/distribution.html)
- [Python Runtime Services](https://docs.python.org/3/library/python.html)
- [Custom Python Interpreters](https://docs.python.org/3/library/custominterp.html)
- [Importing Modules](https://docs.python.org/3/library/modules.html)
- [Python Language Services](https://docs.python.org/3/library/language.html)

### [Take Me To The Standard Library (Click Me!)](https://docs.python.org/3/library/)

## Write

In this section, you'll gain practice with some of the common modules in the Python standard library.

### Manipulating `collections`

**Before continuing, read the [`collections` documentation](https://docs.python.org/3/library/collections.html) at least through the section on `namedtuple()`.**

#### Working with `collections.namedtuple`

In this section, you'll modify code that prints a message about each animal in a list.

Rewrite the following code to be more Pythonic by using `collections.namedtuple` to add readable attribute references. The attributes for these animals are `'name'`, `'species'`, `'color'`, and `'age'`.
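For comparison, here is a small `namedtuple` example on a hypothetical `Point` type (not part of the exercise). It shows attribute access, ordinary tuple unpacking, and `_replace`, which returns a modified copy because namedtuples are immutable:

```Python
import collections

# A hypothetical record type, unrelated to the animals below.
Point = collections.namedtuple('Point', ['x', 'y'])

p = Point(3, 4)
print(p.x, p.y)           # attribute access instead of p[0], p[1]
print(p)                  # Point(x=3, y=4)

x, y = p                  # still unpacks like a plain tuple
moved = p._replace(x=10)  # build a new Point; p itself is unchanged
print(moved)              # Point(x=10, y=4)
```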

```Python
# Rewrite me to be more Pythonic!
import collections

lassie = ('Lassie', 'dog', 'black', 12)
buddy = ('Buddy', 'pupper', 'red', 0.5)  # Woof! Follow me on insta @buddypelu
astro = ('Astro', 'doggo', 'grey', 15)
mrpb = ('Mr. Peanutbutter', 'dog', 'golden', 35)
bojack = ('BoJack Horseman', 'horse', 'brown', 52)
pc = ('Princess Carolyn', 'cat', 'pink', 34)
tinkles = ('Mr. Tinkles', 'cat', 'white', 7)
pupper = ('Bella', 'pupper', 'brown', 0.5)
doggo = ('Max', 'doggo', 'brown', 5)
seuss = ('The Cat in the Hat', 'cat', 'stripey', 27)
pluto = ('Pluto (Disney)', 'dog', 'orange', 3)
plu2o = ('Pluto (space)', 'planet', 'brownish', 4500000000)
yertle = ('Yertle', 'turtle', 'green', 130)
horton = ('Horton', 'elephant', 'blue', 79)

for animal in [lassie, buddy, astro, mrpb, bojack, pc, tinkles, pupper, doggo, seuss, pluto, plu2o, yertle, horton]:
    if animal[1] == 'dog' or animal[1] == 'doggo' or animal[1] == 'pupper':
        if animal[3] > 5:
            print(animal[0] + ' is an old ' + animal[2] + ' ' + animal[1] + ' who is ' + str(animal[3]) + ' years old.')
        else:
            print(animal[0] + ' is a young ' + animal[2] + ' ' + animal[1] + ' who is ' + str(animal[3]) + ' years old.')
    else:
        print(animal[0] + ' is a ' + str(animal[3]) + '-year-old non-canine ' + animal[2] + ' ' + animal[1] + '.')

# Prints out:
# Lassie is an old black dog who is 12 years old.
# Buddy is a young red pupper who is 0.5 years old.
# Astro is an old grey doggo who is 15 years old.
# Mr. Peanutbutter is an old golden dog who is 35 years old.
# BoJack Horseman is a 52-year-old non-canine brown horse.
# Princess Carolyn is a 34-year-old non-canine pink cat.
# Mr. Tinkles is a 7-year-old non-canine white cat.
# Bella is a young brown pupper who is 0.5 years old.
# Max is a young brown doggo who is 5 years old.
# The Cat in the Hat is a 27-year-old non-canine stripey cat.
# Pluto (Disney) is a young orange dog who is 3 years old.
# Pluto (space) is a 4500000000-year-old non-canine brownish planet.
# Yertle is a 130-year-old non-canine green turtle.
# Horton is a 79-year-old non-canine blue elephant.
```

#### Using `collections.defaultdict` and `collections.Counter`

Using `/usr/share/dict/words` (alternatively, `https://stanfordpython.com/res/misc/words` if you are on Windows) as a data source, what are the three most common word lengths in the English language? Remember to strip off trailing whitespace.
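As a warm-up, here's a sketch of `collections.Counter` on a tiny made-up list of lines standing in for the dictionary file. `most_common(n)` returns the `n` most frequent `(value, count)` pairs, most frequent first:

```Python
import collections

# A made-up stand-in for the lines of the word file (each ends in '\n').
lines = ['cat\n', 'dog\n', 'bird\n', 'fish\n', 'horse\n', 'ox\n', 'frog\n']

# Strip the trailing newline before measuring each word.
length_counts = collections.Counter(len(line.strip()) for line in lines)
print(length_counts.most_common(2))  # [(4, 3), (3, 2)]
```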

```Python
# Change me to another file location if you've downloaded a copy of the word list.
# Recall that this file has one word per line.
FILENAME = '/usr/share/dict/words'

# TODO(you): Print the three most common word lengths in the English language.
```

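```Python
import collections

def mask(word, letter):
    """Replace every character of word that isn't letter with '-' (e.g. 'sees' -> 's--s')."""
    return ''.join('-' if letter != ch else letter for ch in word)


def largest_families(words, letter, num_families=3):
    # TODO(you): Group words that share the same mask for letter, and return
    # the num_families largest groups (each a list of words), largest first.
    pass


# Quick test
words = ['sees', 'says', 'sass']
print(largest_families(words, 's', num_families=1)[0])  # => Should print ['sees', 'says']
```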
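The grouping step is a natural fit for `collections.defaultdict(list)`, which creates an empty list the first time a key is seen. This sketch groups some made-up words by a different key (their first letter) and then sorts the groups by size:

```Python
import collections

# Group made-up words by first letter; no need to check "if key in groups".
groups = collections.defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry', 'apricot']:
    groups[word[0]].append(word)

# Order the groups from largest to smallest.
by_size = sorted(groups.values(), key=len, reverse=True)
print(by_size[0])  # ['apple', 'avocado', 'apricot']
```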

#### Working Together

Use tools from the `collections` module to implement an `Employee` database, which maintains organizational relationships among employees. Suppose that your data is provided in a tab-separated file:

```
employee_name    employee_manager    salary    department    title
employee_name    employee_manager    salary    department    title
...
employee_name    employee_manager    salary    department    title
```

If you'd like sample data to work with, you can use the following:
```
sredmond	poohbear	0	CS	Instructor
poohbear	sahami	500	CS	Lecturer
tigger	poohbear	100	CS	Tiger
htiek	sahami	500	CS	Lecturer
sahami	mtl	5000	CS	Professor
guido	guido	50000	PSF	BDFL
```
Save the above text to a file, making sure that your text editor doesn't automatically replace the tabs with spaces!

After writing code to load this information from a file, implement the following functions.

```Python
def directly_reports_to(employee, manager):
    """Return whether or not employee directly reports to manager"""
    pass

def indirectly_reports_to(employee, manager):
    """Return whether or not employee indirectly reports to manager"""
    pass
    
def in_department(dept):
    """Return a collection of all employees of a given department"""
    pass
    
def cost_of(dept):
    """Return the sum total of salaries for all employees of a given department"""
    pass
```

The main part of this section is parsing the file and storing the employees in a data structure of your choice, keyed by some of the employees' information.
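One possible way to start the parsing step is sketched below, using the `csv` module with a tab delimiter (swapped in here; splitting each line on `'\t'` works too), a `namedtuple` per record, and a `defaultdict` index by department. The names `Employee`, `employees`, and `by_department` are illustrative choices, and `io.StringIO` stands in for the real file:

```Python
import collections
import csv
import io

Employee = collections.namedtuple('Employee', ['name', 'manager', 'salary', 'department', 'title'])

# io.StringIO stands in for an open file; with a real file, use open(FILENAME, newline='').
sample = io.StringIO('sredmond\tpoohbear\t0\tCS\tInstructor\n'
                     'guido\tguido\t50000\tPSF\tBDFL\n')

employees = {}                                  # name -> Employee
by_department = collections.defaultdict(list)   # department -> [Employee, ...]
for row in csv.reader(sample, delimiter='\t'):
    emp = Employee(row[0], row[1], int(row[2]), row[3], row[4])
    employees[emp.name] = emp
    by_department[emp.department].append(emp)

print(employees['guido'].title)                 # BDFL
print([e.name for e in by_department['CS']])    # ['sredmond']
```

In the sample data, `guido` is listed as his own manager and `sahami`'s manager `mtl` has no row of their own, so an `indirectly_reports_to` that walks up the manager chain should stop on cycles and on unknown managers.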

```Python
import collections

# Replace me with the name of a file containing employment data.
FILENAME = 'replace-me.txt'

# TODO(you): Read the data file and store the data in a data structure.


def directly_reports_to(employee, manager):
    """Return whether or not employee directly reports to manager"""
    pass


def indirectly_reports_to(employee, manager):
    """Return whether or not employee indirectly reports to manager"""
    pass


def in_department(dept):
    """Return a collection of all employees of a given department"""
    pass


def cost_of(dept):
    """Return the sum total of salaries for all employees of a given department"""
    pass
```

### Extracting data with `re`

If you're fairly new to regular expressions, we recommend you read through [the official Python HOWTO](https://docs.python.org/3/howto/regex.html) and walk through those examples instead of solving this portion of the lab.

Otherwise, **read through the official [`re` documentation](https://docs.python.org/3/library/re.html) through "Match Objects"** (although the next section provides some neat examples).

#### Wordplay

Using the list of words found at `/usr/share/dict/words` (or alternatively, `http://stanfordpython.com/res/misc/words`), determine all words that have all five vowels in order. That is, words that contain an `'a'`, `'e'`, `'i'`, `'o'`, and `'u'` in order, with any number (including 0) of non-vowel word characters before the 'a', between the vowels, and after the 'u'.

For example, your list should contain both `"abstemious"` and `"facetious"`. We found a total of 14 matches.
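To get a feel for compiled patterns before tackling the vowels, here is a sketch with a different, hypothetical puzzle: words that start and end with the same letter. It uses a backreference (`\1` repeats whatever group 1 matched) and `fullmatch`, so the whole word must fit the pattern:

```Python
import re

# Hypothetical example, not the vowel puzzle: same first and last letter.
pattern = re.compile(r'(\w)\w*\1')

words = ['level', 'banana', 'stats', 'python', 'area']
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['level', 'stats', 'area']
```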

```Python
import re

# Change me to another file location if you've downloaded a copy of the word list.
# Recall that this file has one word per line.
FILENAME = '/usr/share/dict/words'
pattern = re.compile('your-regular-expression-here')

# TODO(you): Print out any words that have five vowels in order.
```

#### Regex Crossword Checker

Take a moment to play one round of [Regex Crossword](https://regexcrossword.com/) (a highly entertaining site, if you've got hours to spare).

In the spirit of Regex Crossword, we will write a function that checks arbitrary regex crosswords. Your function should take in two lists, one representing horizontal clues and one representing vertical clues, as well as a potential solution to the crossword in the form of a list-of-lists in row-major order (i.e. the elements are lists representing rows of the crossword). You should return whether or not the potential solution is in fact valid.

```Python
def regex_crossword_check(horizontal_patterns, vertical_patterns, candidate):
    pass  # Your implementation here
```

For example, the call corresponding to the first "Beginner" puzzle (it's called "Beatles") would look like:

```Python
horiz = [r'HE|LL|O+', r'[PLEASE]+']
vert = [r'[^SPEAK]+', r'EP|IP|EF']
candidate = [
    ['H', 'E'],
    ['L', 'P']
]
regex_crossword_check(horiz, vert, candidate)  # => True
```

and the call corresponding to the second "Experienced" puzzle (it's called "Royal Dinner") would look like:

```Python
horiz = [r'(Y|F)(.)\2[DAF]\1', r'(U|O|I)*T[FRO]+', r'[KANE]*[GIN]*']
vert = [r'(FI|A)+', r'(YE|OT)K', r'(.)[IF]+', r'[NODE]+', r'(FY|F|RG)+']
candidate = [
    ['F', 'O', 'O', 'D', 'F'],
    ['I', 'T', 'F', 'O', 'R'],
    ['A', 'K', 'I', 'N', 'G']
]
regex_crossword_check(horiz, vert, candidate)  # => True
```

Some implementation notes:

* You may want to use `re.fullmatch` instead of `re.match` or `re.search`. The former matches a pattern string against an entire string, whereas the latter methods check to see if any prefix string or any substring, respectively, match the pattern.
* You can get the width and height of the crossword from the length of the vertical and horizontal clue lists, respectively.
* Remember your friend, `zip`!
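The two hints above can be seen in isolation in this short sketch: how `fullmatch`, `match`, and `search` differ on the same string, and how `zip(*rows)` turns the rows of a candidate into its columns:

```Python
import re

# fullmatch must consume the whole string; match needs only a prefix; search, any substring.
print(bool(re.fullmatch(r'HE|LL|O+', 'HELL')))  # False
print(bool(re.match(r'HE|LL|O+', 'HELL')))      # True (prefix 'HE')
print(bool(re.search(r'LL', 'HELL')))           # True (substring 'LL')

# zip(*candidate) transposes a list-of-lists, turning rows into columns.
candidate = [['H', 'E'],
             ['L', 'P']]
rows = [''.join(row) for row in candidate]
cols = [''.join(col) for col in zip(*candidate)]
print(rows, cols)  # ['HE', 'LP'] ['HL', 'EP']
```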

```Python
import re
import string


def regex_crossword_check(horizontal_patterns, vertical_patterns, candidate):
    pass  # Your implementation


# Quick tests.
horiz = [r'HE|LL|O+', r'[PLEASE]+']
vert = [r'[^SPEAK]+', r'EP|IP|EF']
candidate = [
    ['H', 'E'],
    ['L', 'P']
]
print(regex_crossword_check(horiz, vert, candidate))  # => True


horiz = [r'(Y|F)(.)\2[DAF]\1', r'(U|O|I)*T[FRO]+', r'[KANE]*[GIN]*']
vert = [r'(FI|A)+', r'(YE|OT)K', r'(.)[IF]+', r'[NODE]+', r'(FY|F|RG)+']
candidate = [
    ['F', 'O', 'O', 'D', 'F'],
    ['I', 'T', 'F', 'O', 'R'],
    ['A', 'K', 'I', 'N', 'G']
]
print(regex_crossword_check(horiz, vert, candidate))  # => True
```

#### Regex Crossword Solver (challenge)

This problem is hard - skip it unless you're feeling up for an algorithmic challenge.

Write a function to solve arbitrary regular expression crosswords.

Your function should take in two lists, one representing horizontal clues and one representing vertical clues, as well as a keyword argument representing the possible alphabet. Return (or lazily generate) a list of all answers consistent with the constraints, where an answer is formed by joining the characters in row-major order (consistent with their website).

```Python
import re
import string
def regex_crossword_solve(horizontal_patterns, vertical_patterns, alphabet=string.ascii_uppercase):
    pass
```

For example, the call corresponding to the first "Beginner" puzzle (it's called "Beatles") would look like:

```Python
horiz = [r'HE|LL|O+', r'[PLEASE]+']
vert = [r'[^SPEAK]+', r'EP|IP|EF']
regex_crossword_solve(horiz, vert)
```

and would return the final answer `['HELP']` derived from the (unique, in this case) solution `[['H', 'E'], ['L', 'P']]`. If there are multiple answers, return them all.
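If you want a baseline to compare against, here is a deliberately naive brute-force sketch (the name `brute_force_solve` is hypothetical). It enumerates every row string of the right width, keeps those that fit each row's clue, then checks the columns of every combination. Each row costs `len(alphabet) ** width` regex checks, so it only scales to tiny puzzles: "Royal Dinner" would need 26**5, roughly 11.9 million, candidates per row. The real challenge is doing much better than this.

```Python
import itertools
import re
import string


def brute_force_solve(horizontal_patterns, vertical_patterns, alphabet=string.ascii_uppercase):
    """Hypothetical naive solver: yield each answer string in row-major order."""
    width = len(vertical_patterns)
    # For each row, keep only the strings of the right width that satisfy its clue.
    row_options = []
    for pattern in horizontal_patterns:
        candidates = (''.join(chars) for chars in itertools.product(alphabet, repeat=width))
        row_options.append([row for row in candidates if re.fullmatch(pattern, row)])
    # Try every combination of surviving rows and check the columns.
    for rows in itertools.product(*row_options):
        columns = (''.join(col) for col in zip(*rows))
        if all(re.fullmatch(p, col) for p, col in zip(vertical_patterns, columns)):
            yield ''.join(rows)


horiz = [r'HE|LL|O+', r'[PLEASE]+']
vert = [r'[^SPEAK]+', r'EP|IP|EF']
print(list(brute_force_solve(horiz, vert)))  # ['HELP']
```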

```Python
import re
import string


def regex_crossword_solve(horizontal_patterns, vertical_patterns, alphabet=string.ascii_uppercase):
    pass


# Quick test.
horiz = [r'HE|LL|O+', r'[PLEASE]+']
vert = [r'[^SPEAK]+', r'EP|IP|EF']
print(regex_crossword_solve(horiz, vert))
```

#### Multidirectional (super challenge)

If you look through the Regex Crossword site linked above, you'll see that some puzzles (from "Double Cross" onward) support multiple directions. Update your function above to work first with bidirectional clues (as in "Double Cross", "Cities", "Volapük", and "Hamlet"). If you finish that, see if you can solve the types of puzzles shown in "Hexagonal."

```Python
import re
import string


def regex_crossword_solve_multidimensional(horizontal_patterns_lr, vertical_patterns_tb,
                                           horizontal_patterns_rl, vertical_patterns_bt,
                                           alphabet=string.ascii_uppercase):
    pass
```

#### Minimal Regex (super challenge)

Given a finite set of positive samples and a finite set of negative samples, can we build a regular expression that matches the positives but rejects the negatives? Of course! We could just explicitly include the positives and explicitly reject the negatives. However, this approach leads to regexes that are quite long. For this part, write an algorithm that approximately generates the smallest regular expression that matches a list of positive samples and rejects a list of negative samples. Our metric for smallest will default to shortest, but feel free to come up with your own metric.

*Note: this problem is NP-hard and is tied to some deep results in complexity theory. For more information, check out [this CSTheory.SE post](http://cstheory.stackexchange.com/questions/1854/is-finding-the-minimum-regular-expression-an-np-complete-problem).*
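The trivial baseline described above can be written in a few lines: escape each positive and join them with `|`. Used with `re.fullmatch`, anything not in the positive list, including every negative, is rejected. The function name `naive_regex` is hypothetical:

```Python
import re


def naive_regex(positives):
    """Hypothetical baseline: an alternation of the escaped positive samples."""
    return '|'.join(re.escape(p) for p in positives)


pattern = naive_regex(['cat', 'car', 'cart'])
print(pattern)  # cat|car|cart
print([w for w in ['cat', 'cart', 'cab'] if re.fullmatch(pattern, w)])  # ['cat', 'cart']
```

For these three words, the hand-written `ca(t|rt?)` accepts exactly the same strings in 9 characters instead of 12; finding compressions like that automatically is the hard part.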

```Python
import re
import string

# This is a super challenging problem!
def minimal_regex(positives, negatives):
    pass
```

### Working with `itertools`

**Before continuing, make sure you read all of the [`itertools` documentation](https://docs.python.org/3/library/itertools.html).**

#### Tabulation

Write a `tabulate` function that lazily generates a lookup table of computed values. `tabulate` should take three arguments: a function, a start number (default 0), and a step size (default 1).

```Python
def tabulate(f, start=0, step=1):
    pass
```

This function can be used as follows:

```Python
sqgen = tabulate(lambda x: x ** 2)
next(sqgen)  # => 0 (which is equal to f(0))
next(sqgen)  # => 1 (which is equal to f(1))
next(sqgen)  # => 4 (which is equal to f(2))
next(sqgen)  # => 9 (which is equal to f(3))
```

For reference, our implementation is one line and 43 characters.

Hint: take a look at the `itertools.count` function!
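To see how `itertools.count` behaves on its own: it is an infinite iterator, so pull values with `next` or take a finite piece with `itertools.islice`:

```Python
import itertools

# count(start, step) yields start, start + step, start + 2*step, ... forever.
counter = itertools.count(10, 5)
first_two = (next(counter), next(counter))
print(first_two)  # (10, 15)

# islice takes a finite slice of an infinite iterator, so this terminates.
print(list(itertools.islice(itertools.count(3, -1), 4)))  # [3, 2, 1, 0]
```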

```Python
import itertools


def tabulate(f, start=0, step=1):
    pass


sqgen = tabulate(lambda x: x ** 2)
print(next(sqgen))  # => 0 (which is equal to f(0))
print(next(sqgen))  # => 1 (which is equal to f(1))
print(next(sqgen))  # => 4 (which is equal to f(2))
print(next(sqgen))  # => 9 (which is equal to f(3))
```

### JSON

**Before continuing, make sure you read the [`json` documentation](https://docs.python.org/3/library/json.html) through "Basic Usage."**

Think of any broad topic that interests you. Spend a few minutes on the internet looking for a JSON file that has data that is related to your interest. If you can, download this data as a `.json` file and load it up in Python. Can you print out any information about this data set?

For example, you can get the current posts on the `/r/wallpapers` subreddit by visiting `https://www.reddit.com/r/wallpapers.json`.

The U.S. Government also makes JSON datasets publicly available at `https://catalog.data.gov/dataset?res_format=JSON`.
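Once the data is loaded, it is ordinary Python dicts and lists. This sketch uses a small made-up JSON string instead of a downloaded file; with a file, you would call `json.load(f)` on the open file object instead of `json.loads(text)`:

```Python
import json

# A made-up document standing in for a downloaded .json file.
text = '{"topic": "coffee", "shops": [{"name": "Blue Cup", "rating": 4.5}, {"name": "Bean There", "rating": 3.8}]}'
data = json.loads(text)  # JSON objects become dicts, arrays become lists

best = max(data['shops'], key=lambda shop: shop['rating'])
print(best['name'])  # Blue Cup

# json.dumps turns Python data back into (here, indented) JSON text.
print(json.dumps(data['shops'][0], indent=2))
```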

```Python
import json

JSON_FILE = 'my-downloaded-file.json'  # Rename me after downloading a file.

# Load the JSON file into a Python data structure.

# Process something about the data.
```

### `random`

**Before continuing, make sure you read the [`random` documentation](https://docs.python.org/3/library/random.html) through "Functions for Sequences."**

There's no code in this section - just read the documentation! It's rather short.
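If you'd like something to try in the interpreter while reading, this sketch touches a few of the functions covered in that part of the documentation. Seeding makes the sequence reproducible; the actual values printed will depend on the seed:

```Python
import random

random.seed(41)  # the same seed produces the same sequence of results
first = [random.randint(1, 6) for _ in range(5)]
random.seed(41)
second = [random.randint(1, 6) for _ in range(5)]
print(first == second)  # True

deck = list(range(10))
print(random.choice(deck))         # one element of deck
picked = random.sample(deck, 3)    # three distinct elements; deck is unchanged
print(picked)
random.shuffle(deck)               # reorders deck in place and returns None
print(sorted(deck) == list(range(10)))  # True
```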

### Using `sys` for command-line tools

#### Addition

Write a Python script `add.py` (a new file) that can be run on the command line with any number of additional arguments representing numbers that you want to add up. Your script should print the sum of numeric arguments. If there are arguments that can't be converted to floats, ignore them. You can use what we learned about exceptional control flow to determine if a number is convertible to a float. If there are no additional arguments to your script, you should print an error message and exit.

Recall you can use `sys.argv` to access the command-line arguments.
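A sketch of the float check, assuming a hypothetical helper named `to_float_or_none`. Note that `float()` also accepts strings such as `'inf'` and `'nan'`, so those would be counted as numbers:

```Python
import sys


def to_float_or_none(text):
    """Hypothetical helper: return float(text), or None if text isn't numeric."""
    try:
        return float(text)
    except ValueError:
        return None


# sys.argv[0] is the script name; the user's arguments start at sys.argv[1].
print(sys.argv[0])
print([to_float_or_none(arg) for arg in ['4', '1.5', 'hello']])  # [4.0, 1.5, None]
```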

You should be able to invoke your script from the command line as follows:

```
(cs41-env)$ python add.py 4 1
5.0
(cs41-env)$ python add.py 17 38 "Hey wassup" "hello"
55.0
(cs41-env)$ python add.py 8 6 7 5 3 0 9
38.0
(cs41-env)$ python add.py
Usage: python add.py <nums>
    
    Add some numbers together
```

#### Argument Parsing with `argparse`

Python's [`argparse` module](https://docs.python.org/3/library/argparse.html) provides a nicer way to define scripts that accept command-line arguments. Read through the `argparse` documentation and then rewrite the above program using the tools provided by `argparse`.
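A minimal `argparse` sketch for a numeric positional argument. Passing an explicit list to `parse_args` is handy for experimenting; called with no arguments, it reads `sys.argv[1:]`:

```Python
import argparse

parser = argparse.ArgumentParser(description='Add some numbers together')
# nargs='+' requires at least one value; type=float converts each one.
parser.add_argument('nums', nargs='+', type=float, help='numbers to add')

args = parser.parse_args(['4', '1'])
print(sum(args.nums))  # 5.0
```

One design choice to be aware of: with `type=float`, a non-numeric argument makes `argparse` print a usage error and exit instead of being ignored, so reproducing the original "ignore non-numbers" behavior means accepting strings and filtering them yourself.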

#### `tree` (challenge)

Write a program that emulates the command-line utility `tree`, which pretty-prints the directory structure rooted at the directory named by its argument. If there is no argument, use the current working directory. For example,

```
$ python3 tree.py python-labs/
python-labs/
├── LICENSE
├── NOTES.md
├── README.md
├── markdown
│   ├── lab1-warmup.md
│   ├── lab2-datastructures.md
│   ├── lab3-functions.md
│   ├── lab4-fp.md
│   ├── lab5-oop.md
│   ├── lab6-standardlibrary.md
│   ├── lab7-thirdparty.md
│   └── lab8-pythonecosystem.md
└── notebooks
    ├── lab1-warmup-notebook.ipynb
    ├── lab2-datastructures-notebook.ipynb
    ├── lab3-functions-notebook.ipynb
    ├── lab4-fp-notebook.ipynb
    ├── lab5-oop-notebook.ipynb
    ├── lab6-standardibrary-notebook.ipynb
    ├── lab7-thirdparty-notebook.ipynb
    └── lab8-pythonecosystem-notebook.ipynb
```

The above is just an example - don't worry if your actual `python-labs/` directory doesn't look like this.

Use the [`pathlib` library](https://docs.python.org/3/library/pathlib.html) for filesystem navigation. For implementation details, check out `tree`'s [man page](http://linux.die.net/man/1/tree) or this [more helpful description](http://www.computerhope.com/unix/tree.htm). You don't need to implement any of the command-line flags for this part - just focus on navigating the file system.
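One way to structure the recursion is sketched below with `Path.iterdir` and `Path.is_dir`; the function name `tree_lines` is hypothetical. The prefix passed to each recursive call records, for every ancestor level, whether to draw a vertical bar or blank padding. It builds a throwaway directory with `tempfile` so the output is predictable. A real `tree` also has to think about symbolic links: `is_dir()` follows them, so a link pointing back up the tree could recurse very deeply.

```Python
import pathlib
import tempfile


def tree_lines(directory, prefix=''):
    """Yield one tree-style display line for each entry below directory."""
    # Sort by name so the output order is deterministic.
    entries = sorted(directory.iterdir(), key=lambda path: path.name)
    for index, path in enumerate(entries):
        is_last = index == len(entries) - 1
        yield prefix + ('└── ' if is_last else '├── ') + path.name
        if path.is_dir():
            # Below the last entry, pad with spaces instead of a vertical bar.
            yield from tree_lines(path, prefix + ('    ' if is_last else '│   '))


# Build a throwaway directory so the example gives the same output everywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'notes').mkdir()
    (root / 'notes' / 'a.md').write_text('')
    (root / 'README.md').write_text('')
    lines = list(tree_lines(root))

for line in lines:
    print(line)
```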

#### Improving `tree` (super challenge)

Update your `tree` program to handle more advanced use cases, listed in the man page above. Can you handle symbolic links, maximum depth recursion, or pattern matching?

You can make this tool as powerful as you'd like.

### All Together Now

This final problem will incorporate all of the modules we've seen so far. We'll build a tool to find the shortest flight journeys (in number of segments) between any two airports.

#### Airport Data
First, let's look at our data. OpenFlights publishes the following data files:

* [Airlines](https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat)
* [Airports](https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat)
* [Routes](https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat)

For information about the data itself, [DataHub](https://datahub.io/dataset/open-flights) has a good writeup on the schema.

The information [by OpenFlights itself](https://openflights.org/data.html) is also quite good for getting an overview of the data.

You will write a script that, when given two airport codes (like SFO and JFK) and a maximum segment count, prints all possible ways to get from the source airport to the destination airport in at most that many segments:

```
$ python3 flights.py SFO JFK 2
SFO -> JFK
SFO -> LAX -> JFK
SFO -> ORD -> JFK
SFO -> DFW -> JFK
...
SFO -> PDX -> JFK
```
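The search itself can be sketched independently of the data files. Below, a handful of made-up routes (not real OpenFlights data) stand in for the graph you would build from `routes.dat` after checking its schema. A depth-first search with an explicit stack yields every path that reaches the destination within `max_segments` flights without revisiting an airport. The names `routes` and `itineraries` are hypothetical. A breadth-first search with `collections.deque` would instead produce paths in order of increasing segment count.

```Python
import collections

# Made-up routes for illustration only: source airport -> set of destinations.
routes = collections.defaultdict(set)
for src, dst in [('SFO', 'JFK'), ('SFO', 'LAX'), ('LAX', 'JFK'),
                 ('SFO', 'ORD'), ('ORD', 'JFK'), ('JFK', 'SFO')]:
    routes[src].add(dst)


def itineraries(source, destination, max_segments):
    """Yield every loop-free path from source to destination with at most max_segments flights."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == destination:
            yield path
            continue
        if len(path) - 1 == max_segments:
            continue  # no flights left in the budget
        for nxt in sorted(routes.get(path[-1], ())):
            if nxt not in path:
                stack.append(path + [nxt])


for path in sorted(itineraries('SFO', 'JFK', 2), key=len):
    print(' -> '.join(path))
```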

How powerful can you make this script? Consider adding extra features that utilize all of the standard library modules we've seen here.

### Cute Modules

#### `turtle` - Turtle graphics

Run the following code. A graphical window should appear that shows your new turtle friend! What other interesting shapes can you make?

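```Python
import turtle

# Move 200 pixels left of the origin, then face right again.
turtle.left(180)
turtle.forward(200)
turtle.left(180)

turtle.color('red', 'yellow')  # pen color, fill color
turtle.begin_fill()

# 36 turns of 170 degrees add up to 17 full rotations, so the star closes after
# 36 segments. The abs(pos()) check measures distance from the origin (0, 0);
# since this star starts at (-200, 0), the loop bound is what ends the drawing.
for _ in range(36):
    turtle.forward(400)
    turtle.left(170)
    if abs(turtle.pos()) < 1:
        break

turtle.end_fill()
turtle.done()
```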

#### `unicodedata` - Unicode Database

Think about your favorite emoji. Can you guess its official name?

```Python
import unicodedata

print(unicodedata.lookup('SLICE OF PIZZA'))  # => '🍕'

print(unicodedata.name('👌'))  # => 'OK HAND SIGN'
```

#### `this` and `antigravity`

Just run the following lines of code.

```Python
import this
```

```Python
import antigravity
```

## Import Semantics

If you've made it this far, congratulations! This was a long lab. If you're interested in the nitty-gritty details of Python's import mechanics, you can read through the [specification of the import system in the official language reference](https://docs.python.org/3/reference/import.html). It's a fairly long read, but it precisely answers any lingering questions you might have about exactly how Python imports modules and packages.

> With <3 by @sredmond