# Description

This exercise continues with a setup similar to the other exercises.  We generate 50 files on disk, each containing 20 natural numbers below 100, one per line.  As in the prior exercise, the program uses 20 threads, each responsible for one line number across all of the files.

For this task, each thread, named 'Line-N' (where N is the number of the line it handles), will carry out a series of instructions, each of which consists of:

1. Read the content of three specified files, 'A', 'B', and 'C'.
2. Shuffle the values on line N between the files in a "clockwise" fashion.
3. Write each file to disk with adjusted content.

For example, if line 17 of the three files initially contains A=23, B=14, and C=99, then after one operation line 17 of each file will contain A=99, B=23, and C=14.
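The rotation in that example can be sketched as a single tuple assignment. This is a minimal illustration only; `rotate_clockwise` is a hypothetical helper name, not part of the exercise code.

```python
def rotate_clockwise(a, b, c):
    """Return the new values of A, B, C after one clockwise shuffle."""
    # A receives C's old value, B receives A's, C receives B's
    return c, a, b

a, b, c = rotate_clockwise(23, 14, 99)
print(a, b, c)  # 99 23 14
```

Because the right-hand side is built before anything is assigned, no value is overwritten before it has been read. Assigning the three lines one after another would copy the already-replaced value of A into B.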

As with other examples, for these small files on a fast disk, a serial approach remains faster.  But as the problem scales to larger files with slower access, threads would begin to win.

There are twin dangers here.  If you simply write to the same file from different threads, without locks, you will most likely encounter a race condition: each thread rewrites the whole file, so one thread's write can discard a line that another thread has just shuffled.  On the other hand, if every file is locked before use, different threads may try to acquire competing resources in a circular manner, causing deadlock.
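One common way to rule out the circular wait is to have every thread take its locks in the same global order. The sketch below illustrates that idea only; it is not the approach used in the Solution section, and `demo_locks`, `acquire_sorted`, `worker`, and the file names are all hypothetical.

```python
from contextlib import ExitStack
from threading import Lock, Thread

# Hypothetical stand-ins for the per-file locks
demo_locks = {name: Lock() for name in ("x.numbers", "y.numbers", "z.numbers")}

def acquire_sorted(names, locks):
    """Acquire the locks for `names` in sorted order.

    Returns an ExitStack; leaving its `with` block releases every lock.
    """
    stack = ExitStack()
    try:
        for name in sorted(set(names)):
            stack.enter_context(locks[name])
    except BaseException:
        stack.close()  # release whatever was already acquired
        raise
    return stack

counter = {"value": 0}

def worker(names, repeats=200):
    for _ in range(repeats):
        with acquire_sorted(names, demo_locks):
            counter["value"] += 1  # protected by all three locks

# Each worker asks for the same files in a different (circular) order
workers = [
    Thread(target=worker, args=(["x.numbers", "y.numbers", "z.numbers"],)),
    Thread(target=worker, args=(["z.numbers", "x.numbers", "y.numbers"],)),
    Thread(target=worker, args=(["y.numbers", "z.numbers", "x.numbers"],)),
]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Since every thread sorts the names before locking, no thread can hold a "later" lock while waiting for an "earlier" one, so a cycle of waiting threads cannot form.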

The `oplist` variable contains entries like those below, each describing one action.  Each thread should act only on the actions addressed to it and disregard all others (as mentioned, a serial approach would achieve the functional purpose, but is not what this exercise is for).  Scanning `oplist` in every thread poses no danger: the access is **read-only**, so no race condition can occur.  Note also that, for any given line, the shuffles must be performed in exactly the listed order to obtain the same final state.  Operation descriptions look something like this:

```python
[...,
 ['Line-19', 'tmp-Abume.numbers', 'tmp-DTfsx.numbers', 'tmp-jXmRn.numbers'],
 ['Line-2', 'tmp-TiyKw.numbers', 'tmp-QwKin.numbers', 'tmp-pFocs.numbers'],
 ['Line-13', 'tmp-DTSWm.numbers', 'tmp-yJmoQ.numbers', 'tmp-DTfsx.numbers'],
 ...
]
```
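Selecting the operations for one thread while preserving their listed order might look like the following sketch. `sample_oplist` simply repeats the entries shown above, and `ops_for_line` is a hypothetical helper rather than something `generate` provides.

```python
sample_oplist = [
    ['Line-19', 'tmp-Abume.numbers', 'tmp-DTfsx.numbers', 'tmp-jXmRn.numbers'],
    ['Line-2', 'tmp-TiyKw.numbers', 'tmp-QwKin.numbers', 'tmp-pFocs.numbers'],
    ['Line-13', 'tmp-DTSWm.numbers', 'tmp-yJmoQ.numbers', 'tmp-DTfsx.numbers'],
]

def ops_for_line(ops, lineno):
    """Return the (A, B, C) file triples addressed to `lineno`, in listed order."""
    line_name = f"Line-{lineno}"
    # A list comprehension keeps the original ordering of the operations
    return [tuple(files) for name, *files in ops if name == line_name]

print(ops_for_line(sample_oplist, 13))
```

The function only reads `ops`, so any number of threads can call it on the shared list at the same time.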

Create a suitable `shuffle()` function that will neither deadlock nor create a race condition.

# Setup

In [1]:
from threading import Thread, Lock
from pathlib import Path

from generate import operations
names, oplist = operations('lesson-4')

# Create lock for each file
locks = {name: Lock() for name in names}

def shuffle(lineno):
    # This version will DEADLOCK!
    line_name = f"Line-{lineno}"
    for opnum, op in enumerate(oplist):
        if op[0] == line_name:
            A, B, C = op[1:]
            with locks[A], locks[B], locks[C]:
                linesA = Path(A).read_text().split('\n')
                linesB = Path(B).read_text().split('\n')
                linesC = Path(C).read_text().split('\n')
                # 1-based line numbers, 0-based list
                # Rotate in one step so no value is overwritten before it is read
                i = lineno - 1
                linesA[i], linesB[i], linesC[i] = linesC[i], linesA[i], linesB[i]
                # Write the shuffled data back
                Path(A).write_text('\n'.join(linesA))
                Path(B).write_text('\n'.join(linesB))
                Path(C).write_text('\n'.join(linesC))

# After running all threads, the files on disk should contain the shuffled values
threads = [Thread(target=shuffle, args=(i,), name=f"Line-{i}") for i in range(1, 21)]

# Should be able to run these to get correct modifications
# [t.start() for t in threads]
# [t.join() for t in threads]

# Solution

In [2]:
# Note, the solution presented here is the SIMPLEST refactoring
# For extra credit consider other approaches!
#  * For example, one thread could handle I/O while others shuffle
#  * Or, first pull off all "line N" values, and operate each
#       shuffle thread on independent data (assemble at end)
#  * Or ....

def shuffle(lineno):
    line_name = f"Line-{lineno}"
    for opnum, op in enumerate(oplist):
        if op[0] == line_name:
            A, B, C = op[1:]
            all_locks = (locks[A], locks[B], locks[C])
            
            # Keep trying to get all locks
            while not all(locks_good := [l.acquire(timeout=0.01) for l in all_locks]):
                for i, acquired in enumerate(locks_good):
                    if acquired:
                        all_locks[i].release()
            
            linesA = Path(A).read_text().split('\n')
            linesB = Path(B).read_text().split('\n')
            linesC = Path(C).read_text().split('\n')
            # 1-based line numbers, 0-based list
            # Rotate in one step so no value is overwritten before it is read
            i = lineno - 1
            linesA[i], linesB[i], linesC[i] = linesC[i], linesA[i], linesB[i]
            # Write the shuffled data back
            Path(A).write_text('\n'.join(linesA))
            Path(B).write_text('\n'.join(linesB))
            Path(C).write_text('\n'.join(linesC))
            
            # Release all locks
            for lock in all_locks:
                lock.release()
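
One of the extra-credit ideas in the comments above is to pull off all "line N" values first and let each thread work on independent data. A rough sketch of that idea follows, using small hypothetical in-memory data (`contents`, `ops`) in place of the real files; reading the files beforehand and writing them back afterwards is left out, and `shuffle_column` is an illustrative name.

```python
from threading import Thread

def shuffle_column(column, ops, lineno):
    """Apply, in listed order, every op addressed to `lineno`.

    `column` maps file name -> value on that one line; only the thread
    for this line ever touches it, so no locks are needed.
    """
    line_name = f"Line-{lineno}"
    for name, a, b, c in ops:
        if name == line_name:
            column[a], column[b], column[c] = column[c], column[a], column[b]

# Hypothetical data: three "files" of two lines each
contents = {"f1": ["1", "2"], "f2": ["3", "4"], "f3": ["5", "6"]}
ops = [
    ["Line-1", "f1", "f2", "f3"],
    ["Line-2", "f3", "f1", "f2"],
    ["Line-1", "f2", "f3", "f1"],
]

# One private dict per line number: threads share no mutable state
columns = {n: {name: lines[n - 1] for name, lines in contents.items()}
           for n in (1, 2)}
workers = [Thread(target=shuffle_column, args=(columns[n], ops, n))
           for n in columns]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Reassemble the per-line results into per-file line lists
result = {name: [columns[n][name] for n in sorted(columns)] for name in contents}
print(result)
```

The trade-off is memory: every file has to be held in full before the threads start, which may not suit the larger files the exercise has in mind.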

# Test Cases

In [3]:
def test_thread_count():
    assert len(threads) >= 20
    
test_thread_count()

In [4]:
def test_deadlock():
    from timeout import timeout
    
    # Setup original state of files
    names, oplist = operations('lesson-4')
    # Create fresh threads
    threads = [Thread(target=shuffle, args=(i,), name=f"Line-{i}") for i in range(1, 21)]
    
    # Try to process all operations
    with timeout:
        try:
            [t.start() for t in threads]
            [t.join() for t in threads]
        except TimeoutError as err:
            assert False, "Timeout indicates probable deadlock"
        
test_deadlock()
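
The `timeout` helper imported in the test above is not shown here. As an alternative that relies only on the standard library, a probable deadlock can also be detected by joining with a deadline and checking which threads are still alive. The sketch below assumes daemon threads, so that a stuck thread cannot keep the interpreter from exiting; `run_with_deadline` is a hypothetical name.

```python
import time
from threading import Thread

def run_with_deadline(threads, seconds):
    """Start `threads`, wait at most `seconds` in total, and
    return the names of the threads that are still running."""
    deadline = time.monotonic() + seconds
    for t in threads:
        t.start()
    for t in threads:
        # Give each join only the time remaining until the shared deadline
        t.join(timeout=max(0.0, deadline - time.monotonic()))
    return [t.name for t in threads if t.is_alive()]

quick = [Thread(target=lambda: None, name=f"Line-{i}", daemon=True)
         for i in range(1, 4)]
stuck = run_with_deadline(quick, seconds=5.0)
print(stuck)  # []
```

A non-empty result does not prove a deadlock; it only shows that some threads did not finish within the allotted time.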

In [6]:
def test_modifications():
    # DO NOT run this test until test_deadlock() passes!
    from hashlib import md5
    from pathlib import Path
    # Setup original state of files
    names, oplist = operations('lesson-4')
    
    # Should NEVER fail since it only verifies utility func
    cat = ''.join(Path(name).read_text() for name in names)
    hash_orig = md5(cat.encode()).hexdigest()
    assert hash_orig == '32f11af64b0391f624d954b2988695c6', f"Wrong MD5sum {hash_orig}"

    # Create fresh threads
    threads = [Thread(target=shuffle, args=(i,)) for i in range(1, 21)]
    [t.start() for t in threads]
    [t.join() for t in threads]
    cat = ''.join(Path(name).read_text() for name in names)
    hash_mod = md5(cat.encode()).hexdigest()
    assert hash_mod == 'f1fc8076c8905c27a196fb41d8454b2a', f"Wrong MD5sum {hash_mod}"  
    
test_modifications()