# Assignment 1: Auto-Correct

_See [Assignment 1: Auto-correct](https://sikoried.github.io/sequence-learning/01/autocorrect/)._

## String Distances

Gemischtes Doppel 1 | Gemischtes Doppel 2 | Gemischtes Doppel 3
-|-|-
![Gemischtes Doppel 1](res/gem_doppel_1.jpg) | ![Gemischtes Doppel 2](res/gem_doppel_2.jpg) | ![Gemischtes Doppel 3](res/gem_doppel_3.jpg)

In the first part of the exercise, we will compute the Hamming and edit distances for the string pairs above (source: Gemischtes Doppel, Süddeutsche).
Let's start with the simpler one: [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance).
This distance is only defined for strings of equal length, so for your implementation, make a reasonable modification to support different lengths.

In [10]:
import sys

gem_doppel = [
    ("GCGTATGAGGCTAACGC", "GCTATGCGGCTATACGC"),
    ("kühler schrank", "schüler krank"),
    ("the longest", "longest day"),
    ("nicht ausgeloggt", "licht ausgenockt"),
    ("gurken schaben", "schurkengaben")
]

In [11]:
def hamming(x, y):
    if len(x) != len(y):
        print("lengths do not match; using approximation", file=sys.stderr)

    diffs = 0
    for i in range(0, min(len(x), len(y))):
        if x[i] != y[i]:
            diffs += 1

    # add the length difference as approximation
    return diffs + abs(len(x) - len(y))

for (a, b) in gem_doppel:
    print("hamming('%s', '%s') = %d" % (a, b, hamming(a, b)))



hamming('GCGTATGAGGCTAACGC', 'GCTATGCGGCTATACGC') = 10
hamming('kühler schrank', 'schüler krank') = 13
hamming('the longest', 'longest day') = 11
hamming('nicht ausgeloggt', 'licht ausgenockt') = 4
hamming('gurken schaben', 'schurkengaben') = 14


lengths do not match; using approximation
lengths do not match; using approximation
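
The length modification can also be written more compactly. Here's a minimal sketch using `itertools.zip_longest`; the name `hamming_padded` is ours, and counting every unpaired position as a mismatch is just one reasonable choice (it yields the same numbers as the approximation above).

```python
from itertools import zip_longest


def hamming_padded(x, y):
    """Hamming-style distance; positions missing in the shorter string count as mismatches."""
    # zip_longest pads the shorter string with None, which never equals a character
    return sum(a != b for a, b in zip_longest(x, y))


print(hamming_padded("nicht ausgeloggt", "licht ausgenockt"))
print(hamming_padded("abc", "abcde"))
```

Unlike `hamming` above, this variant does not warn when the lengths differ, so use it only where the silent padding is acceptable.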


As you can see, the Hamming distances are quite large, since the only available "operations" are _match_ (no cost) and _replace_ (cost 1). For a more nuanced measure, implement the [edit distance](https://en.wikipedia.org/wiki/Edit_distance), which also allows for insertions and deletions. Make sure the cost of each operation is configurable.

_Hint:_ This is a good opportunity to familiarize yourself with [`numpy`](https://numpy.org/) and its matrices and range operators; we'll use those throughout the semester.

In [12]:
import numpy as np

def edit(x, y, cost={'m': 0, 's': 1, 'i': 1, 'd': 1}):
    D = np.zeros((len(x) + 1, len(y) + 1), dtype=int)

    # for the empty word, we pay one insertion (or deletion) per character of the other string
    D[0, 1:] = cost['i'] * np.arange(1, len(y) + 1)
    D[1:, 0] = cost['d'] * np.arange(1, len(x) + 1)
    
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            delta = cost['m'] if x[i-1] == y[j-1] else cost['s']
            D[i, j] = min(
                D[i-1, j] + cost['d'],
                D[i, j-1] + cost['i'],
                D[i-1, j-1] + delta
            )

    return D[len(x), len(y)]


for (a, b) in gem_doppel:
    print("edit('%s', '%s') = %d" % (a, b, edit(a, b)))

    

edit('GCGTATGAGGCTAACGC', 'GCTATGCGGCTATACGC') = 3
edit('kühler schrank', 'schüler krank') = 6
edit('the longest', 'longest day') = 8
edit('nicht ausgeloggt', 'licht ausgenockt') = 4
edit('gurken schaben', 'schurkengaben') = 7
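
If you only need the distance and not the full matrix, two rows of the table are enough. The following is a sketch under that assumption; `edit_two_rows` is a hypothetical name, and it uses the same `cost` keys as `edit` above. It also shows what configurable costs buy us: with a substitution cost of 2, a substitution is never cheaper than a deletion plus an insertion.

```python
def edit_two_rows(x, y, cost=None):
    """Edit distance with configurable costs, keeping only two rows in memory."""
    cost = cost or {'m': 0, 's': 1, 'i': 1, 'd': 1}

    # row for the empty prefix of x: insert every character of y
    prev = [j * cost['i'] for j in range(len(y) + 1)]

    for i in range(1, len(x) + 1):
        # first column: delete the first i characters of x
        cur = [i * cost['d']]
        for j in range(1, len(y) + 1):
            delta = cost['m'] if x[i-1] == y[j-1] else cost['s']
            cur.append(min(
                prev[j] + cost['d'],     # deletion
                cur[j-1] + cost['i'],    # insertion
                prev[j-1] + delta        # match or substitution
            ))
        prev = cur

    return prev[len(y)]


print(edit_two_rows("kitten", "sitting"))
print(edit_two_rows("kitten", "sitting", cost={'m': 0, 's': 2, 'i': 1, 'd': 1}))
```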


As you can see, the edit distances relate much better to the similarity of the strings, but they still don't really tell us where and how the strings differ.
Extend your implementation from above by also computing a backtrace of the operations which can be used to print the alignment of the two strings.

In [13]:
import operator

from functools import reduce
from numpy import argmin


def edit2(x, y, cost={'m': 0, 's': 1, 'i': 1, 'd': 1}):
    D = np.zeros((len(x) + 1, len(y) + 1), dtype=int)

    # for the empty word, we pay one insertion (or deletion) per character of the other string
    D[0, 1:] = cost['i'] * np.arange(1, len(y) + 1)
    D[1:, 0] = cost['d'] * np.arange(1, len(x) + 1)

    # this array will hold the journal of operations for backtracking
    T = np.zeros((len(x) + 1, len(y) + 1), dtype=np.object_)
    T[0, 0] = 'ε'
    T[0, 1:] = 'i'
    T[1:, 0] = 'd'
    
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            diag = 'm' if x[i-1] == y[j-1] else 's'
            
            costs = [
                ('d', D[i-1, j] + cost['d']),
                ('i', D[i, j-1] + cost['i']),
                (diag, D[i-1, j-1] + cost[diag])
            ]
        
            op, c = min(costs, key=operator.itemgetter(1))
            D[i, j] = c
            T[i, j] = op
    
    # compute trace
    a, b = len(x), len(y)
    tr = []
    while a > 0 or b > 0:
        op = T[a, b]
        tr.append(op)
        if op == 'm' or op == 's':
            a -= 1
            b -= 1
        elif op == 'd':
            a -= 1
        elif op == 'i':
            b -= 1
        else:
            raise ValueError('Invalid operator: ' + str(op))
    
    # join the trace in forward order; unlike reduce, this also works for two empty strings
    return D[len(x), len(y)], ''.join(reversed(tr))


for (a, b) in gem_doppel:
    d, tr = edit2(a, b)
    print("edit('%s', '%s') = %d (%s)" % (a, b, d, tr))

    for i, op in enumerate(tr):
        if op == 'i':
            a = a[:i] + '-' + a[i:]
        if op == 'd':
            b = b[:i] + '-' + b[i:]

    print('  ' + a)
    print('  ' + tr.replace('m', ' ').replace('s', '*'))
    print('  ' + b)

edit('GCGTATGAGGCTAACGC', 'GCTATGCGGCTATACGC') = 3 (mmdmmmmsmmmmmimmmm)
  GCGTATGAGGCTA-ACGC
    d    *     i    
  GC-TATGCGGCTATACGC
edit('kühler schrank', 'schüler krank') = 6 (ssmimmmmsddmmmm)
  küh-ler schrank
  ** i    *dd    
  schüler k--rank
edit('the longest', 'longest day') = 8 (ddddmmmmmmmiiii)
  the longest----
  dddd       iiii
  ----longest day
edit('nicht ausgeloggt', 'licht ausgenockt') = 4 (smmmmmmmmmmsmssm)
  nicht ausgeloggt
  *          * ** 
  licht ausgenockt
edit('gurken schaben', 'schurkengaben') = 7 (siimmmmmsdddmmmm)
  g--urken schaben
  *ii     *ddd    
  schurkeng---aben
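
The alignment can also be rebuilt from the trace string alone, one operation at a time, instead of patching gaps into the strings by index. A minimal sketch (`render_alignment` is a hypothetical helper; it assumes a trace over the symbols `m`, `s`, `i`, `d` as produced by `edit2`):

```python
def render_alignment(x, y, trace):
    """Return the two gapped strings described by a trace of m/s/i/d operations."""
    top, bottom = [], []
    i = j = 0
    for op in trace:
        if op in ('m', 's'):
            # match or substitution consumes one character of each string
            top.append(x[i])
            bottom.append(y[j])
            i += 1
            j += 1
        elif op == 'd':
            # deletion: character of x has no partner in y
            top.append(x[i])
            bottom.append('-')
            i += 1
        elif op == 'i':
            # insertion: character of y has no partner in x
            top.append('-')
            bottom.append(y[j])
            j += 1
        else:
            raise ValueError('Invalid operator: ' + str(op))
    return ''.join(top), ''.join(bottom)


a, b = render_alignment('the longest', 'longest day', 'ddddmmmmmmmiiii')
print(a)
print(b)
```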


## Spelling Correction

For spelling correction, we will use prior knowledge to put some learning into our system.
The underlying idea is the noisy channel model: the user intends to write a word w, but due to some noise in the process, ends up typing the word x.

The correct word ŵ is the valid candidate that has the highest probability given x:

$$
\begin{eqnarray}
\DeclareMathOperator*{\argmax}{argmax}
\hat{w} & = & \argmax_{w \in V} P(w | x) \\
        & = & \argmax_{w \in V} \frac{P(x|w) P(w)}{P(x)} \\
        & = & \argmax_{w \in V} P(x|w) P(w)
\end{eqnarray}
$$

- The candidates V can be obtained from a vocabulary.
- The probability $P(w)$ of a word w can be learned (counted) from data.
- The probability $P(x|w)$ is more complicated... It could be learned from data, but we could also use a heuristic that relates to the edit distance, e.g. rank by distance.
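
To make the argmax concrete, here is a toy sketch. Everything in it is made up for illustration: the three-word prior, the `channel` heuristic (probability drops by a factor of 10 per mismatching position), and the helper name `most_probable`. It is not the model we'll build below, just the decision rule in a few lines.

```python
def channel(x, w):
    """Toy P(x|w): each mismatching position costs a factor of 10 (assumption)."""
    mismatches = sum(a != b for a, b in zip(x, w)) + abs(len(x) - len(w))
    return 0.1 ** mismatches


def most_probable(x, prior, channel):
    """Return argmax_w P(x|w) * P(w) over the candidate words in prior."""
    return max(prior, key=lambda w: channel(x, w) * prior[w])


# hypothetical prior P(w) over a tiny vocabulary
prior = {'pirates': 0.2, 'minutes': 0.7, 'private': 0.1}

print(most_probable('pirutes', prior, channel))
```

Note that "minutes" is far more frequent in this toy prior, but "pirates" still wins because it is only one position away from the typed word.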

You can find word statistics and training data at: <http://norvig.com/ngrams/> (The single word counts are part of this repo).

### Further Reading

- <http://norvig.com/spell-correct.html>
- Mays, Eric, Fred J. Damerau and Robert L. Mercer. 1991. Context based spelling correction. _Information Processing and Management,_ 27(5), 517–522. (IBM)
- Kernighan, Mark D., Kenneth W. Church, and William A. Gale. 1990. A spelling correction program based on a noisy channel model. _Proceedings of COLING 1990,_ 205–210. (Bell Labs)

### Step 1: Read in vocabulary and counts

In [14]:
from numpy import log

# contains lines of "word <count>"
counts_fn = 'data/count_1w.txt.gz'

# read in vocabulary
voc = {}
all_counts = 0
num = 0

import gzip

with gzip.open(counts_fn, "rb") as f:
    for line in f:
        w, c = line.strip().split()
        voc[w.decode('ascii')] = int(c)
        all_counts += int(c)
        num += 1

# normalize the counts
for k in voc:
    voc[k] = log(voc[k] / all_counts)
    
print("Read in %d lemmas." % len(voc))


Read in 333333 lemmas.
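
As an aside, `gzip.open` can also decode the file for us when opened in text mode, which saves the per-word `decode` call. A self-contained sketch on made-up in-memory data (`read_log_probs` and the three toy counts are assumptions for illustration, not part of the assignment data):

```python
import gzip
import io
import math


def read_log_probs(fileobj):
    """Read 'word count' lines from a gzip stream and return log relative frequencies."""
    counts = {}
    # 'rt' = read as text; gzip.open accepts a file name or a file object
    with gzip.open(fileobj, 'rt', encoding='ascii') as f:
        for line in f:
            word, count = line.split()
            counts[word] = int(count)

    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}


# toy stand-in for data/count_1w.txt.gz
raw = gzip.compress(b"the\t6\nof\t3\nand\t1\n")
log_probs = read_log_probs(io.BytesIO(raw))
print(log_probs)
```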


### Step 2: Baseline implementation

Implement a (pretty inefficient) spell corrector that, for a given word `w`, suggests at most `max_cand=5` candidate words.
To speed up the computation a little, consider only words whose length differs from that of `w` by less than `max_dist=3`.

In [15]:
from operator import itemgetter

# suggest a list of candidates for the entered word
def suggest(w, max_cand=5, max_dist=3):
    # if we have an exact hit, just return that.
    if w in voc:
        return [(w, 0, voc[w])]
    
    # maps a word to (other, edit-dist, other-rel-freq)
    def check(w, kv):
        ed = edit(w, kv[0])
        return (kv[0], ed, kv[1])
    
    # compute edit distance of all words that differ at most max_dist in length
    res = [check(w, kv) for kv in voc.items() if abs(len(w) - len(kv[0])) < max_dist]
    
    # now sort descending by relative frequency then ascending by edit distance
    res = sorted(res, key=itemgetter(2), reverse=True)
    res = sorted(res, key=itemgetter(1))

    
    return res[:max_cand]


examples = [
    "pirates",    # in-voc
    "pirutes",    # pirates?
    "continoisly",  # continuously?
]

for w in examples:
    print(w, suggest(w, max_cand=3))

pirates [('pirates', 0, -11.408058827802126)]
pirutes [('pirates', 1, -11.408058827802126), ('minutes', 2, -8.717825438953103), ('viruses', 2, -11.111468702571859)]
continoisly [('continously', 1, -15.735337826575178), ('continuously', 2, -11.560071979871001), ('continuosly', 2, -17.009283000138204)]
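
The two stable sorts can be folded into a single ranking key, and since we only want a handful of candidates, a full sort isn't needed either. A sketch using `heapq.nsmallest` (the helper name `top_candidates` is hypothetical; it expects tuples in the same `(word, edit-dist, log-rel-freq)` layout that `suggest` produces):

```python
import heapq


def top_candidates(cands, max_cand=5):
    """Pick the best candidates: smallest edit distance first, then highest frequency."""
    # negate the log frequency so that "more frequent" sorts first within a distance
    return heapq.nsmallest(max_cand, cands, key=lambda t: (t[1], -t[2]))


cands = [
    ('viruses', 2, -11.111468702571859),
    ('minutes', 2, -8.717825438953103),
    ('pirates', 1, -11.408058827802126),
]
print([w for (w, ed, lp) in top_candidates(cands, 3)])
```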


### Step 3: Better Heuristic

Let's use a more sophisticated heuristic that doesn't sort the data twice but combines the distance with the relative frequency.
Here's the gist of it: while the distances are (typically) 0, 1, 2, 3..., the relative frequencies are very small numbers.

To model the Bayesian rule above, we need two quantities and a way to balance them:

- $P(w)$: this is just the relative frequency
- $P(x|w)$: let's assume that about 1/2 of the time we type the word exactly, 1/3 of the time we're one symbol off, 1/4 for two, etc. (These don't really form probabilities, but we just want something probability-like :-)
- a weight $\beta$: we may want to balance those quantities, since they might be orders of magnitude apart

Mathematically speaking, we want something like

$$
\hat{w} = \argmax_{w \in V} P(x|w) \cdot P(w)^\beta \quad .
$$

For numerical reasons, we can apply the `log` and drop terms that are the same for all words:

$$
\begin{eqnarray}
\DeclareMathOperator*{\argmax}{argmax}
\hat{w} & = & \argmax_{w \in V} \log(P(x|w)) + \beta \log(P(w)) \\
        & = & \argmax_{w \in V} \left[ \log \frac{1}{2 + \text{edit}(w, x)} + \beta \log \frac{\text{count}(w)}{\sum_{w'} \text{count}(w')} \right] \\
        & = & \argmax_{w \in V} \left[ \beta \log \text{count}(w) - \log\left(2 + \text{edit}(w, x)\right) \right]
\end{eqnarray}
$$
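
To get a feeling for how $\beta$ shifts the balance between frequency and distance, here is a small numeric sketch of the last line of the derivation. The two counts are invented for illustration (they are not taken from the count file), and `score` and `best` are hypothetical helpers.

```python
import math


def score(count, dist, beta):
    """beta * log(count) - log(2 + edit distance), as in the derivation above."""
    return beta * math.log(count) - math.log(2 + dist)


# word -> (edit distance to the typed word, made-up count)
candidates = {
    'continously': (1, 5_000),
    'continuously': (2, 5_000_000),
}


def best(beta):
    return max(candidates, key=lambda w: score(candidates[w][1], candidates[w][0], beta))


for beta in (0.1, 0.01):
    print(beta, best(beta))
```

With the larger weight, the frequent word wins despite the larger distance; with the smaller weight, the distance dominates and the closer (but rarer) word is chosen.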

In [16]:
# reload the voc since we're now working with counts directly
with gzip.open(counts_fn, "rb") as f:
    for line in f:
        w, c = line.strip().split()
        voc[w.decode('ascii')] = int(c)

print("Read %d lemmas." % len(voc))

Read 333333 lemmas.


In [17]:
def suggest2(w, beta, max_cand=5, max_dist=3, cost={'m': 0, 's': 1, 'i': 1, 'd': 1}):
    # if we have an exact hit, just return that.
    if w in voc:
        return [(w, log(2), beta*log(voc[w]))]
    
    def check(w, kv):
        # pass the configured costs on to the edit distance
        ed = edit(w, kv[0], cost)
        return (kv[0], log(2 + ed), beta*log(kv[1]))

    res = [check(w, kv) for kv in voc.items() if abs(len(w) - len(kv[0])) < max_dist]
    res = sorted(res, key=lambda x: x[2] - x[1], reverse=True)
    return res[:max_cand]
  

examples = [
    "pirates",    # in-voc
    "pirutes",    # pirates?
    "continoisly",  # continuously?
]

for w in examples:
    print(w, suggest2(w, 0.1, max_cand=3))

pirates [('pirates', 0.6931471805599453, 1.569214519355234)]
pirutes [('pirates', 1.0986122886681098, 1.569214519355234), ('minutes', 1.3862943611198906, 1.8382378582401362), ('prices', 1.6094379124341003, 1.9310363514927111)]
continoisly [('continuously', 1.3862943611198906, 1.5540132041483465), ('continously', 1.0986122886681098, 1.1364866194779288), ('continuity', 1.6094379124341003, 1.561483500710584)]


## Efficient Implementation

Use a prefix tree to efficiently compute the edit distance for a large number of words.

In [18]:
# prefix tree over the vocabulary; a node carries a count if a word ends there

class PrefixTree:    
    voc = dict()
    def __init__(self, parent, prefix, count=None):
        self.parent = parent
        self.prefix = prefix
        self.count = count
        self.succ = dict()
        self.hypref = None
    
    def size(self):
        return len(self.voc)
    
    def clean_hyprefs(self):
        agenda = [self]
        while agenda:
            n = agenda.pop()
            n.hypref = {}
            agenda.extend([s for (k, s) in n.succ.items()])    
    
    def insert(self, word, count):
        it = self
        for (i, c) in enumerate(word):
            if c not in it.succ:
                it.succ[c] = PrefixTree(it, word[:i+1])
            it = it.succ[c]
        if it.count:
            raise ValueError("%s already in tree" % word)
        it.count = count
        self.voc[word] = count

    # query the score of a word in the tree
    def query(self, word):
        it = self
        for i in word:
            if i not in it.succ:
                raise ValueError("%s not found" % word)
            it = it.succ[i]
        return it.count
    
    def __str__(self):
        return str({'prefix': self.prefix, 'succ': list(self.succ.keys()), 'count': self.count})
    
    def __repr__(self):
        return self.prefix if self.prefix else 'None'

    def to_string_lines(self):
        res = []
        agenda = [(n, 1) for (k, n) in sorted(self.succ.items(), reverse=True)]
        while agenda:
            n, d = agenda.pop()
            if not n.count:
                res.append("%s %s" % (' '*d, n.prefix))
            else:
                res.append("%s %s %d" % (' '*d, n.prefix, n.count))
                
            for (k, s) in sorted(n.succ.items(), reverse=True):
                agenda.append((s, d+1))
        
        return res

    

# read all entries into the prefix tree
root = PrefixTree(None, 'ε')

# populate with some words
# for (w1, w2) in gem_doppel:
#     root.insert(w1, 1)
#     root.insert(w2, 1)
    

# tiny example
# root.insert('haus', 1)
# root.insert('habe', 1)
# root.insert('hau', 1)
# root.insert('auto', 1)
# root.insert('autark', 1)

# root.insert('toupi', 1)

# root.insert('pirates', 1)
# root.insert('pirutes', 1)

import gzip
with gzip.open(counts_fn, "rb") as f:
    for line in f:
        w, c = line.strip().split()
        root.insert(w.decode('ascii'), int(c))

# print('\n'.join(root.to_string_lines()))

# print(root.query('pirate'))
# print(root.query('continuous'))
# print(root.query('continous'))  # lol.

print("Indexed %d words" % len(root.voc.keys()))

Indexed 333333 words


In [8]:
# efficient implementation of edit distance for large vocabulary
def edit3(root, w, max_dist=3, cost={'m': 0, 's': 1, 'i': 1, 'd': 1}):
    # effectively, we'll build a shadow tree with refs to the original nodes 
    # initial, eps-row in D
    eps = {
        'token': 'ε',
        'noderef': root,
        'backref': None,
        'succ': dict(),
        'D': list(range(len(w)+1))
    }
    
    # populate the eps-cols; use items to be able to sort by key
    agenda = [(eps, 0)]
    while agenda:
        cur, depth = agenda.pop()
    
        for (c, n) in sorted(cur['noderef'].succ.items()):
            node = {
                'token': c,
                'noderef': n,
                'backref': cur,
                'succ': dict(),
                'D': [depth+1]
            }
            cur['succ'][c] = node
            agenda.append((node, depth+1))
    
    # we'll do a depth-first search, one char at a time
    eds = {}
    for (j, c) in enumerate(w, start=1):
        # start at depth=1
        agenda = [(t, sn, 1) for (t, sn) in sorted(eps['succ'].items())]
        while agenda:
            token, shadow_node, depth = agenda.pop()
            delta = cost['m'] if c == token else cost['s']
            
            # costs for each step
            cost_del = shadow_node['backref']['D'][j] + cost['d']  # D[i-1, j] one letter "up" = backref!
            cost_ins = shadow_node['D'][j-1] + cost['i']           # D[i, j-1] one letter "left" = same line
            cost_dia = shadow_node['backref']['D'][j-1] + delta    # D[i-1, j-1] one up+left

            # ...decide
            step = min(cost_del, cost_ins, cost_dia)
        
            shadow_node['D'].append(step)
            
            agenda.extend([(t, sn, depth+1) for (t, sn) in sorted(shadow_node['succ'].items())])                
            
            # at the end of the input word, if we have a word, update the edit distances accordingly
            if j == len(w) and step < max_dist:
                n = shadow_node['noderef']
                if n.count:
                    eds[n.prefix] = step  # (step, n.count)
            
    return eds

print(edit3(root, 'pirutes'))


# compare timings
# for e in examples[0:1]:
#     print(e)
    
#     #%time individual = [(w, edit(e, w)) for w in root.voc]
#     #print(sorted(individual.items(), key=itemgetter(1))[:10])
    
#     %time prefixed = edit3(root, e)
#     print(sorted(prefixed.items(), key=itemgetter(1))[:10])

{}
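
For comparison, here is a compact sketch of the same idea on a plain nested-dict trie: every trie node gets exactly one row of the distance table, so words with a common prefix share the rows of that prefix, and a whole subtree is skipped once no entry of its row is within `max_dist`. This is not the `PrefixTree`/`edit3` code from above; `make_trie` and `search` are hypothetical names, costs are fixed to 1, and `None` is used as the end-of-word key.

```python
def make_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = w
    return root


def search(trie, word, max_dist):
    """Return {vocabulary word: edit distance} for all words within max_dist."""
    results = {}

    def walk(node, ch, prev_row):
        # one new row per trie node; columns run over the prefixes of the query word
        row = [prev_row[0] + 1]
        for j in range(1, len(word) + 1):
            delta = 0 if word[j-1] == ch else 1
            row.append(min(row[j-1] + 1, prev_row[j] + 1, prev_row[j-1] + delta))

        if None in node and row[-1] <= max_dist:
            results[node[None]] = row[-1]

        # row minima never decrease further down, so we can prune here
        if min(row) <= max_dist:
            for nxt, child in node.items():
                if nxt is not None:
                    walk(child, nxt, row)

    first_row = list(range(len(word) + 1))
    for ch, child in trie.items():
        if ch is not None:
            walk(child, ch, first_row)
    return results


trie = make_trie(['pirates', 'pirate', 'private', 'minutes', 'tirades'])
print(search(trie, 'pirutes', 1))
```

The recursion depth is bounded by the length of the longest vocabulary word, so Python's recursion limit is not an issue for natural-language vocabularies.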


In [39]:
class Hyp:
    _cost = {'m': 0, 's': 1, 'i': 1, 'd': 1}
    def __init__(self, d, j, noderef):
        self.d = d  # depth (=row number)
        self.j = j  # character offset (=col number)
        self.noderef = noderef
        
        self.c = -1
        # some cost can be found right there
        if d == 0 and j == 0:
            self.c = 0
        elif d == 0:
            self.c = j * Hyp._cost['i']
        elif j == 0:
            self.c = d * Hyp._cost['d']
        
        # back-refs
        self.refi = None
        self.refd = None
        self.refs = None
    
    def __str__(self):
        return "(%s, %d, %d, %s)" % (self.noderef.prefix, self.d, self.j, self.c)
    
    def __eq__(self, other):
        return self.d == other.d and self.j == other.j and self.noderef == other.noderef

    def cost(self, word):
        if self.c < 0:
            vals = []
            if self.refs:
                delta = Hyp._cost['m'] if self.noderef.prefix[-1] == word[self.j-1] else Hyp._cost['s']
                vals.append(self.refs.cost(word) + delta)
            if self.refd:
                vals.append(self.refd.cost(word) + Hyp._cost['d'])
            if self.refi:
                vals.append(self.refi.cost(word) + Hyp._cost['i'])
            if not vals:
                raise ValueError("Unable to obtain cost; check agenda sorting! %s %s" % (str(self), word))

            self.c = min(vals)

        return self.c

    
def edit4(root, w, max_cost=3):
    # clear out any hypref in the tree
    root.clean_hyprefs()
    
    # edit distances to return
    eds = {}
    
    # initial hypothesis
    eps = Hyp(0, 0, root)
    root.hypref[0] = eps
    
    agenda = [eps]
    while agenda:
        h = agenda.pop(0)
    
        # print(h.cost(w), h)
        
        if h.cost(w) > max_cost:
            # print("Abandoning %s" % h)
            continue
            
        if h.j == len(w):
            n = h.noderef
            if n.count:
                eds[n.prefix] = h.cost(w)
        
        # don't expand longer than the word
        if h.j in h.noderef.hypref:
            i = h.noderef.hypref[h.j]
            i.refi = h
            # print("~%s.refi = %s" %(i, h))
        else:
            if h.j < len(w) and h.j * Hyp._cost['i'] < max_cost:
                i = Hyp(h.d, h.j+1, h.noderef)
                i.refi = h
                if h.refd:
                    i.refs = h.refd
                agenda.append(i)
                h.noderef.hypref[h.j] = i
                # print("+i %s" % i)
        
        # only expand "downwards" if there's more successors (depth)
        for (t, n) in h.noderef.succ.items():
            if h.j in n.hypref:
                # get(n.prefix, h.d+1, h.j)
                d = n.hypref[h.j]
                d.refd = h
                # print("~%s.refd = %s" %(d, h))
            else:
                if h.d * Hyp._cost['d'] < max_cost:
                    d = Hyp(h.d+1, h.j, n)
                    d.refd = h
                    if h.refi:
                        d.refs = h.refi
                    agenda.append(d)
                    n.hypref[h.j] = d
                    # print("+d %s" % d)
        
            if (h.j+1) in n.hypref:
                # get(n.prefix, h.d+1, h.j+1)
                s = n.hypref[h.j+1]
                s.refs = h
                # print("~%s.refs = %s" %(s, h))
            else:
                if h.j < len(w):
                    s = Hyp(h.d+1, h.j+1, n)
                    s.refs = h
                    agenda.append(s)
                    n.hypref[h.j+1] = s
                    # print("+s %s" % s)
    
    return eds


# for e in ['hans']:
for e in examples:
    print(e, list(edit4(root, e).items())[:4])

pirates [('tomates', 3), ('timated', 3), ('timates', 2), ('ticktes', 3)]
pirutes [('timates', 3), ('ticktes', 3), ('tirades', 3), ('airlies', 3)]
continoisly [('continuosly', 2), ('continually', 3), ('continously', 1), ('continuously', 2)]


In [40]:
# final benchmark
for e in examples:
    print(e)
    
    # old-school
    %time individual = [(w, edit(e, w)) for w in root.voc]
    
    # matrix in prefix tree
    %time prefixed = edit3(root, e)

    # sort-of-beamsearch
    %time beamed = edit4(root, e)

    #print(sorted(individual.items(), key=itemgetter(1))[:10])
    #print(sorted(prefixed.items(), key=itemgetter(1))[:10])
    #print(sorted(beamed.items(), key=itemgetter(1))[:10])

pirates
CPU times: user 32.5 s, sys: 380 ms, total: 32.9 s
Wall time: 36 s
CPU times: user 13.9 s, sys: 259 ms, total: 14.1 s
Wall time: 14.3 s
CPU times: user 18 s, sys: 154 ms, total: 18.1 s
Wall time: 18.2 s
pirutes
CPU times: user 30.4 s, sys: 165 ms, total: 30.6 s
Wall time: 31.1 s
CPU times: user 13.9 s, sys: 211 ms, total: 14.1 s
Wall time: 14.3 s
CPU times: user 17.6 s, sys: 128 ms, total: 17.8 s
Wall time: 17.9 s
continoisly
CPU times: user 45.4 s, sys: 347 ms, total: 45.7 s
Wall time: 45.8 s
CPU times: user 18.9 s, sys: 223 ms, total: 19.1 s
Wall time: 19.4 s
CPU times: user 18.6 s, sys: 117 ms, total: 18.7 s
Wall time: 18.8 s
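
The `%time` magic only works inside IPython/Jupyter. If you want to run the benchmark as a plain script, a small wrapper around `time.perf_counter` does the job; `timed` is a hypothetical helper, and the example call below just times a built-in to keep the sketch self-contained.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


result, elapsed = timed(sum, range(1_000_000))
print("sum = %d, took %.4f s" % (result, elapsed))
```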
