# Description

This exercise differs somewhat from those in other lessons.  We keep the same basic premise of creating multiple files, each containing 20 natural numbers, and we again generate 1000 of them.  However, the Global Interpreter Lock (GIL) is a design feature of CPython itself rather than an API it provides, so there is no particular API or pattern to program against in order to dive deeper into this lesson.

The main takeaway of this lesson is probably that I/O-bound code "releases the GIL" and will often benefit from threading on multiple cores.  CPU-bound code in pure Python, in contrast, can only be concurrent, not truly parallel.
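A minimal sketch of the first half of that claim, using `time.sleep()` as a stand-in for a blocking read (an assumption made only to keep the example self-contained; `fake_io` and `run_threads` are hypothetical helper names).  Because sleeping releases the GIL, four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds rather than 0.8:

```python
import threading
import time

def fake_io(seconds):
    # time.sleep() releases the GIL while waiting, much as a blocking read does
    time.sleep(seconds)

def run_threads(target, args, n_threads):
    # Start n_threads copies of target and return the total wall-clock time
    threads = [threading.Thread(target=target, args=args)
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threads(fake_io, (0.2,), 4)
print(f"4 threads x 0.2s of waiting took {elapsed:.2f}s")
```

Swapping `fake_io` for a pure-Python arithmetic loop would typically show no comparable gain on a standard CPython build, since only one thread can execute bytecode at a time.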

Let us return to the exercise in lesson 3, which asked you to read in all 1000 files with names like `tmp-?????.numbers` and perform an accumulation across corresponding lines of each file (considering the files in alphabetical order).  The essence of the algorithm was provided in the solution.  It assumes that you have read a given line number of each of the 1000 files into a list of 1000 integers.  So for line 17 of each file, for example, we calculate:

```python
top = ((99 + 99) * 99) ** 99  # modulo bigger than any number
data = line17_numbers + [top]
accum = data[0]
for j in range(1, len(data), 4):
    b, c, d, e = data[j:j+4]
    accum = (((accum + b) * c) ** d) % e
```
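To see the arithmetic on a scale small enough to check by hand, here is the same loop wrapped in a function and applied to eight made-up numbers.  The name `accumulate` and the sample values are illustrative assumptions; the real input is 1000 numbers per line, and the loop expects a count divisible by four.

```python
def accumulate(numbers):
    # Same steps as above: append the large modulus, then consume
    # the data four values at a time (add, multiply, exponent, modulus)
    top = ((99 + 99) * 99) ** 99
    data = list(numbers) + [top]
    accum = data[0]
    for j in range(1, len(data), 4):
        b, c, d, e = data[j:j+4]
        accum = (((accum + b) * c) ** d) % e
    return accum

# Step 1: ((1 + 2) * 3) ** 2 % 7   == 81 % 7 == 4
# Step 2: ((4 + 1) * 1) ** 1 % top == 5
result = accumulate([1, 2, 3, 2, 7, 1, 1, 1])
print(result)  # 5
```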

There is no need even to use the `threading` module for this exercise.  You simply want to evaluate how much time is spent in the I/O operations (reading from 1000 files) versus how much is spent in the mathematical operations.  The answer will vary depending on the kind of CPU and the kind of disk on the machine where you run this exercise.  Write general functions `time_io()` and `time_cpu()` that each return a number estimating the corresponding time in microseconds.  This will give you some sense of the theoretically best possible thread parallelism in pure Python.
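One rough way to turn the two estimates into a bound is sketched below.  It assumes a deliberately simplified model that is not part of the exercise itself: I/O time divides evenly across threads, while CPU time stays fully serialized by the GIL.  The function `best_case_speedup` and its parameter names are hypothetical.

```python
def best_case_speedup(io_us, cpu_us, n_threads):
    # Simplified model (an assumption): only the I/O portion overlaps
    serial = io_us + cpu_us
    threaded = cpu_us + io_us / n_threads
    return serial / threaded

# With the placeholder values of 50_000 microseconds each and 4 threads
speedup = best_case_speedup(50_000, 50_000, 4)
print(speedup)  # 1.6
```

Under this model the speedup can never exceed `(io_us + cpu_us) / cpu_us`, however many threads are used, which is why the ratio of the two timings matters more than their absolute values.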

# Setup

In [1]:
from generate import create_files
create_files('lesson-5')

def time_io():
    return 50_000   # microseconds

def time_cpu():
    return 50_000   # microseconds

# Solution

In [2]:
from time import perf_counter
from glob import glob
from statistics import median
from random import randint

def time_io():
    times = []
    files = glob('tmp-*.numbers')
    for _ in range(11):  # Typical read (try several times)
        start = perf_counter()
        for fname in files:
            with open(fname) as fh:  # Close each file promptly
                numbers = fh.readlines()
        times.append(perf_counter() - start)

    # Median of the repeated runs, converted to microseconds
    return int(median(times) * 1_000_000)

def time_cpu():
    times = []
    top = ((99 + 99) * 99) ** 99  # Modulo bigger than any number
    for _ in range(20):  # Simulate the 20 lines
        # Perhaps different numbers change timing significantly
        data = [randint(1, 99) for _ in range(1000)] + [top]
        accum = data[0]
        start = perf_counter()
        for j in range(1, len(data), 4):
            b, c, d, e = data[j:j+4]
            accum = (((accum + b) * c) ** d) % e
        times.append(perf_counter() - start)

    # Total across all 20 simulated lines, converted to microseconds
    return int(sum(times) * 1_000_000)

# Test Cases

In [3]:
def test_plausible_io():
    assert 1000 < time_io() < 100_000
    
test_plausible_io()

In [4]:
def test_plausible_cpu():
    assert 1000 < time_cpu() < 100_000
    
test_plausible_cpu()