# Writing Scripts

Several modules in the Python standard library are especially useful when writing scripts to run at the command line.  You will also use them in large applications, microservices, notebooks, and other contexts, but the emphasis of this lesson is on scripting small, repeatable tasks, such as those in system administration.

For this purpose, we look at `fileinput`, `argparse`, `time`, `secrets`, and `tempfile`. 

## Module: fileinput

The `fileinput` module really just does one thing, but it avoids boilerplate and potential errors in a very common pattern for command-line tools, especially on Unix-like systems.  

Many system tools like `cut` or `grep` or `tac` use a pattern where they process lines input from all files named as arguments, and if no arguments are named, they take input from standard input.  Often these tools also follow a convention where `'-'` indicates standard input as a pseudo-filename.
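
To see what boilerplate this convention implies, here is a rough sketch of the pattern written by hand. The helper name `iter_lines` and its `stdin` parameter are hypothetical, chosen only for illustration; the real `fileinput` module does more (for example, it tracks the current filename and line number).

```python
import io
import sys


def iter_lines(names, stdin=None):
    """Yield lines from each named file; '-' (or no names at all) means stdin."""
    stdin = sys.stdin if stdin is None else stdin
    for name in names or ['-']:
        if name == '-':
            # The pseudo-filename '-' stands for standard input
            yield from stdin
        else:
            with open(name) as fh:
                yield from fh


# A StringIO object stands in for piped input so the sketch runs anywhere
fake_stdin = io.StringIO("first\nsecond\n")
demo = list(iter_lines([], stdin=fake_stdin))
print(demo)
```

In a script, the call would look something like `iter_lines(sys.argv[1:])`. Even this small version has several places to make a mistake, which is the argument for letting `fileinput` handle it.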

As a toy example, let us write a script that does the following:
    
* Produce a histogram of the lengths of input lines
* Ignore lines marked as citation indicators by starting with `*** Source:`
* Write citation lines to standard error
* Accept inputs from filenames and/or standard input

### line-histogram

```python
#!/usr/bin/env python
import fileinput
from sys import stderr
from collections import Counter
```
```python
lengths = list()
for line in fileinput.input():
    if line.startswith('*** Source: '):
        print(line[12:].strip(), file=stderr)
    else:
        lengths.append(len(line))
        
print(Counter(lengths).most_common())
```

In [1]:
%%bash
./line-histogram *.txt

[(30, 4), (31, 3), (27, 2), (33, 1), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1)]


Henry Carey, "Namby Pamby", 1725
William King, "Useful Transactions in Philosophy", 1708
John Newbery, "Mother Goose's Melody", 1765


In [2]:
%%bash
./line-histogram king.txt carey.txt newberry.txt

[(30, 4), (31, 3), (27, 2), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (33, 1), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1)]


William King, "Useful Transactions in Philosophy", 1708
Henry Carey, "Namby Pamby", 1725
John Newbery, "Mother Goose's Melody", 1765


In [3]:
%%bash
cat *.txt | ./line-histogram

[(30, 4), (31, 3), (27, 2), (33, 1), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1)]


Henry Carey, "Namby Pamby", 1725
William King, "Useful Transactions in Philosophy", 1708
John Newbery, "Mother Goose's Melody", 1765


In [4]:
%%bash
cat newberry.txt | ./line-histogram king.txt carey.txt

[(30, 4), (31, 2), (27, 2), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (33, 1)]


William King, "Useful Transactions in Philosophy", 1708
Henry Carey, "Namby Pamby", 1725


In [5]:
%%bash
cat newberry.txt | ./line-histogram king.txt - carey.txt 

[(30, 4), (31, 3), (27, 2), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1), (33, 1)]


William King, "Useful Transactions in Philosophy", 1708
John Newbery, "Mother Goose's Melody", 1765
Henry Carey, "Namby Pamby", 1725


## Module: argparse

The `argparse` module provides general purpose parsing of command-line arguments and options.

The standard library provides an older `getopt` module as well, but it is mostly relevant only when porting directly from the C equivalent.  Several third-party tools, such as `click`, `prompt_toolkit`, and `docopt`, each have their own philosophy about building command-line interfaces.

Unless you have a compelling reason to do otherwise, `argparse` should be your first and usual choice.  Here we enhance the `line-histogram` tool with a few options, saving the result as `line-histoplus`.

```python
from sys import stderr, stdin
from collections import Counter
import argparse
import fileinput
```
```python
parser = argparse.ArgumentParser()
parser.add_argument('-v', '--verbose', action='store_true',
                    help="Display output in 'pretty' format")
parser.add_argument('-c', '--cite', action='store_true',
                    help="Echo citation information to STDERR")
parser.add_argument('files', nargs='*',
                    help="Files to process (default to STDIN)")
parser.add_argument('--limit', type=int, default=99,
                    help="Do not process more than this number of files")
args = parser.parse_args()
```

```python
lengths = list()
for line in fileinput.input(args.files[:args.limit] or '-'):
    if line.startswith('*** Source: '):
        if args.cite:
            print(line[12:].strip(), file=stderr)
    else:
        lengths.append(len(line))
```
```python
if args.verbose:
    for length, num in Counter(lengths).most_common():
        print(f"Length {length}: {num}")
else:
    print(Counter(lengths).most_common())
```
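
The expression `args.files[:args.limit] or '-'` in the loop above is compact enough to deserve a closer look. The sketch below rebuilds just the relevant part of the parser and passes explicit argument lists to `parse_args()`, which is a convenient way to experiment without a shell. The helper name `chosen_inputs` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog='line-histoplus')
parser.add_argument('-c', '--cite', action='store_true')
parser.add_argument('files', nargs='*')
parser.add_argument('--limit', type=int, default=99)


def chosen_inputs(argv):
    """Return what would be handed to fileinput.input() for a given argv."""
    args = parser.parse_args(argv)
    # An empty list is falsy, so `or` falls back to the '-' pseudo-filename
    return args.files[:args.limit] or '-'


print(chosen_inputs([]))
print(chosen_inputs(['a.txt', 'b.txt', 'c.txt']))
print(chosen_inputs(['--limit', '2', 'a.txt', 'b.txt', 'c.txt']))
```

One consequence of this design is that `--limit 0` also produces an empty list, so the tool would read standard input rather than process nothing.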

In [6]:
%%bash
./line-histoplus --help

usage: line-histoplus [-h] [-v] [-c] [--limit LIMIT] [files [files ...]]

positional arguments:
  files          Files to process (default to STDIN)

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Display output in 'pretty' format
  -c, --cite     Echo citation information to STDERR
  --limit LIMIT  Do not process more than this number of files


In [7]:
%%bash
# By default we now do not output citation to STDERR
cat carey.txt | ./line-histoplus newberry.txt king.txt -

[(30, 4), (31, 3), (27, 2), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (33, 1)]


In [8]:
%%bash
# We can enable the citation lines if we like
cat carey.txt | ./line-histoplus --cite newberry.txt king.txt -

[(30, 4), (31, 3), (27, 2), (14, 1), (17, 1), (25, 1), (15, 1), (21, 1), (16, 1), (29, 1), (34, 1), (26, 1), (22, 1), (33, 1)]


John Newbery, "Mother Goose's Melody", 1765
William King, "Useful Transactions in Philosophy", 1708
Henry Carey, "Namby Pamby", 1725


In [9]:
%%bash
# A more human readable histogram might be desirable
./line-histoplus -v *.txt

Length 30: 4
Length 31: 3
Length 27: 2
Length 33: 1
Length 16: 1
Length 29: 1
Length 34: 1
Length 26: 1
Length 22: 1
Length 14: 1
Length 17: 1
Length 25: 1
Length 15: 1
Length 21: 1


In [10]:
%%bash
# Combine multiple switches
./line-histoplus --verbose --limit 2 -c *.txt

Length 30: 4
Length 27: 2
Length 31: 2
Length 33: 1
Length 16: 1
Length 29: 1
Length 34: 1
Length 26: 1
Length 22: 1


Henry Carey, "Namby Pamby", 1725
William King, "Useful Transactions in Philosophy", 1708


## Module: time

The `time` module deals with basic details of the system clock and allows operations to be timed.  The `datetime` module, which deals with times in the sense of calendars and durations, is the subject of a separate course; the topics overlap, and that course has more information.  A typical use of `time` in scripts is simply timing how long steps take and recording when events occurred.

In [11]:
import time
start = time.time()
print("Started at:", time.ctime(start))
print("Waiting...")
time.sleep(5)
end = time.time()
print("Ended at:", time.ctime(end))
print(f"Duration: {end-start:.4f} seconds")

Started at: Mon Sep  7 20:43:28 2020
Waiting...
Ended at: Mon Sep  7 20:43:33 2020
Duration: 5.0058 seconds
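
When only the duration matters, and not the wall-clock start and end, `time.perf_counter()` is often a better fit than `time.time()`, because the system clock can be adjusted while a script runs. The following is a minimal sketch; the helper name `timed` is hypothetical.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


total, seconds = timed(sum, range(1_000_000))
print(f"sum took {seconds:.4f} seconds")
```

The value returned by `perf_counter()` has no meaning on its own; only the difference between two calls is useful.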


We can view the clock either as a number of seconds, or in a more structured way.

In [12]:
import time
print("Time in seconds-since-epoch:")
print(time.time(), '\n-----')
print("Time as a structure:")
print(time.localtime())

Time in seconds-since-epoch:
1599525813.6711996 
-----
Time as a structure:
time.struct_time(tm_year=2020, tm_mon=9, tm_mday=7, tm_hour=20, tm_min=43, tm_sec=33, tm_wday=0, tm_yday=251, tm_isdst=1)


We can convert between these formats and format times as strings in explicit formats.

In [13]:
now = time.localtime(time.time())
now

time.struct_time(tm_year=2020, tm_mon=9, tm_mday=7, tm_hour=20, tm_min=43, tm_sec=33, tm_wday=0, tm_yday=251, tm_isdst=1)

In [14]:
time.strftime("%a, %d %b %Y %H:%M:%S", now)

'Mon, 07 Sep 2020 20:43:33'
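
The conversions also run in the other direction. As a sketch, the example below uses UTC (`time.gmtime` and `calendar.timegm`) so that the round trip does not depend on the local timezone; `time.mktime` is the local-time counterpart. The timestamp value is arbitrary.

```python
import calendar
import time

FORMAT = "%Y-%m-%d %H:%M:%S"

stamp = 1_600_000_000                     # arbitrary seconds-since-epoch value
as_struct = time.gmtime(stamp)            # seconds -> struct_time (UTC)
as_text = time.strftime(FORMAT, as_struct)    # struct_time -> string

parsed = time.strptime(as_text, FORMAT)   # string -> struct_time
round_trip = calendar.timegm(parsed)      # struct_time (UTC) -> seconds

print(as_text)
print(round_trip == stamp)
```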

Should we want to look at them, some fairly low-level details are available.  For example, clocks exist for the system since last boot, within each thread, and so on.  These details are less commonly needed, but they matter when you want to, for example, profile the time spent within particular threads.  The functions shown here are available on Unix-like systems.

In [15]:
(
time.clock_getres(time.CLOCK_MONOTONIC),
time.clock_getres(time.CLOCK_THREAD_CPUTIME_ID),
time.clock_gettime_ns(time.CLOCK_MONOTONIC),
time.clock_gettime_ns(time.CLOCK_THREAD_CPUTIME_ID),    
)

(1e-09, 1e-09, 2782174481816218, 547932891)
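
For many scripts, the higher-level functions `time.monotonic()` and `time.thread_time()` are enough to separate elapsed time from CPU time without naming clock IDs directly. The sketch below assumes Python 3.7 or later, where `thread_time()` is available; the helper name `wall_and_cpu` is hypothetical.

```python
import time


def wall_and_cpu(n):
    """Return (total, wall_seconds, cpu_seconds) for a mix of work and waiting."""
    wall_start = time.monotonic()       # a clock that never goes backwards
    cpu_start = time.thread_time()      # CPU time of the current thread only
    total = sum(i * i for i in range(n))    # CPU-bound work
    time.sleep(0.05)                        # waiting, which uses almost no CPU
    wall = time.monotonic() - wall_start
    cpu = time.thread_time() - cpu_start
    return total, wall, cpu


total, wall, cpu = wall_and_cpu(100_000)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

The wall-clock figure includes the sleep, while the thread CPU figure should mostly reflect the summation.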

## Module: secrets

Python has a module `random` that produces pseudo-random numbers, item selections, or distributions.  It is very useful, but the emphasis is on **pseudo**.  By design, `random` is precisely reproducible when a seed is used.  This makes it unsuitable for many actual security or cryptographic purposes (but extremely useful for things like simulating noise and Monte Carlo simulations).

When you want actual **secrets**, the `secrets` module is what you should use.  It is much less general purpose by design. For generating passwords, nonces, security tokens, and the like, it has just those capabilities you need, but not others that would distract from those purposes.

In [29]:
import random
# I have no idea what will be produced here...
random.seed()
print([random.randrange(1, 100) for i in range(15)])

# With the seed 42, we will always choose 82
rand_nums = []
for i in range(15):
    random.seed(42)
    rand_nums.append(random.randrange(1, 100))
print(rand_nums)

[30, 48, 26, 15, 55, 64, 66, 56, 36, 44, 67, 25, 86, 83, 87]
[82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82]


The basic functions are simply generating tokens in a few flavors.

In [17]:
import secrets
secrets.token_bytes(20)

b'\x9c\xf8\xd8\xb7\xb2\x08\xe6Y\xf9\xbf\xa2\xf0g\xd0\xf3j\x93\x92\xa4\x08'

In [18]:
secrets.token_hex(30)

'448c73cc884f5601f5676354cddce519d88035432e8a0b1c374f09375e96'

In [19]:
secrets.token_urlsafe(30)

'w9bHlbyl9T8dEsE3WIW3LduTjRplXb2IFegzjT9c'
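
Note that the argument to each of these functions is a number of random bytes, not the length of the resulting string. A quick sketch makes the relationship visible:

```python
import secrets

raw = secrets.token_bytes(20)       # 20 random bytes
hexed = secrets.token_hex(30)       # 30 random bytes -> 60 hex digits
safe = secrets.token_urlsafe(30)    # 30 random bytes -> 40 Base64 characters

print(len(raw), len(hexed), len(safe))
```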

There are just a couple more general "random number" functions.  It is not hard to generalize these to other distributions or the like, but you rarely need to.

In [20]:
secrets.choice(range(100, 200))

150

In [21]:
secrets.randbelow(1_000_000)

935273

In [33]:
# A random 32-bit integer
secrets.randbits(32)

2599241311
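
As one possible use of `secrets.choice`, the sketch below assembles a password from letters and digits, in the spirit of the recipes in the module documentation. The alphabet, the default length, and the helper name `make_password` are assumptions made for illustration, not recommendations.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def make_password(length=12):
    """Build a password by drawing each character with secrets.choice."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


password = make_password(16)
print(password)

# compare_digest is meant to reduce the risk of timing attacks on comparisons
matches = secrets.compare_digest(password, password)
print(matches)
```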

## Module: tempfile

Working with temporary files is often useful in scripts.  For a variety of reasons, you may need a file interface to put data into and pull data out of; but sometimes you do not have a persistent and predictable file path that makes sense for that.

One common, but by no means exclusive, situation when this comes up is in writing unit tests.  In this scenario, data gets written to a file that would be explicitly specified in "normal" operation, but it needs to be a fresh unused file each time the test is run.

There are just *two* main interfaces we will look at in the `tempfile` module, `TemporaryFile` and `NamedTemporaryFile`.  The only difference is that the latter is guaranteed to have a name within the filesystem.  A few supporting classes and functions are described in the module documentation (for example, for creating temporary directories rather than files).

Whether named or unnamed, a temporary file is removed when the file handle is closed, including during garbage collection if the name falls out of scope.  `NamedTemporaryFile` has a parameter `delete` which defaults to `True`, but it can be set to `False` to allow the filesystem to retain the file after the program completes.
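
A `with` block makes the moment of removal explicit. This is a small sketch using the default `delete=True`; the variable names are only illustrative.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(mode='w+t') as tmp:
    path = tmp.name
    tmp.write("scratch data\n")
    tmp.flush()                              # push the data out to the file
    existed_while_open = os.path.exists(path)
    tmp.seek(0)
    content = tmp.read()

# Leaving the block closes the handle, which removes the file
existed_after_close = os.path.exists(path)
print(existed_while_open, existed_after_close)
```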

In [23]:
import tempfile
nameless = tempfile.TemporaryFile()
print("No actual name, but a handle number:", nameless.name)

No actual name, but a handle number: 46


In [24]:
named = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
tmpname = named.name
print("A name in an OS-appropriate location:", named.name)

A name in an OS-appropriate location: /tmp/tmpk7iq4qfs


There is nothing at all special about these files other than knowing they did not exist before.  The choice of unique names is not cryptographically secure; if you need that, the `secrets` module is a way to get a securely unique name.
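
If an unguessable name is what you need, one way to build it is sketched below. The `scratch-` prefix and `.tmp` suffix are hypothetical, and the sketch assumes the system temporary directory is writable.

```python
import os
import secrets
import tempfile

# Hypothetical naming scheme: a fixed prefix plus 16 random bytes as hex
unguessable = f"scratch-{secrets.token_hex(16)}.tmp"
path = os.path.join(tempfile.gettempdir(), unguessable)

# Mode 'x' creates the file and fails if the path already exists
with open(path, 'x') as fh:
    fh.write("private scratch data\n")

created = os.path.exists(path)
os.remove(path)    # unlike a tempfile object, nothing cleans this up for us
print(created, os.path.exists(path))
```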

In [25]:
for i in range(1000):
    print(i, file=named)
    named.write('---\n')
named.seek(102)
print(named.read(20))

16
---
17
---
18
---


In [26]:
named.close()
!tail -5 $tmpname

---
998
---
999
---
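
Because a file created with `delete=False` survives `close()`, cleaning it up becomes the script's responsibility. The sketch below shows that cleanup, and also `tempfile.TemporaryDirectory`, one of the supporting interfaces for working with directories rather than files. File names and contents are made up for illustration.

```python
import os
import tempfile

# delete=False leaves the file behind, so removing it is the script's job
keep = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
keep.write("retained after close\n")
keep.close()
still_there = os.path.exists(keep.name)
os.remove(keep.name)

# A temporary directory is removed, with its contents, when the block exits
with tempfile.TemporaryDirectory() as workdir:
    scratch = os.path.join(workdir, 'scratch.txt')
    with open(scratch, 'w') as fh:
        fh.write("hello\n")
    inside = os.path.exists(scratch)

gone = not os.path.exists(workdir)
print(still_there, inside, gone)
```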
