# Lab 2
In this lab, we will implement [Conway's Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) in a naïve way using the GPU in TensorFlow. As discussed in the lecture, there are more efficient algorithms that exploit recurring patterns within the Life board to speed up the processing.

First, we will prepare our environment, then we'll get back to what the rules for the game are. We are using a small auxiliary script `lifereader.py` that you can inspect yourself.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import lifereader
import numpy as np
import tensorflow as tf

# Uncomment the following two lines to disable GPU usage and to log the device placement of every TensorFlow operation.
##tf.config.set_visible_devices([], 'GPU')
##tf.debugging.set_log_device_placement(True)


We will download a zip file with lots of Game of Life patterns, and try at least one of them in our code. Specifically,
you should download [lifep.zip](http://www.ibiblio.org/lifepatterns/lifep.zip) from [Alan Hensel's](http://www.ibiblio.org/lifepatterns/) page.

On an UPPMAX machine, you can download and unzip this file with the following commands:

    wget http://www.ibiblio.org/lifepatterns/lifep.zip
    unzip lifep.zip
    
Now we can try loading one file.

In [None]:
board = lifereader.readlife('BREEDER3.LIF', 2048)

Let's check what this looks like. 

In [None]:
plt.figure(figsize=(20,20))
plt.imshow(board[768:1280,768:1280])

You can check qualitatively that this looks similar to the initial step on the Wikipedia [Breeder](https://en.wikipedia.org/wiki/Breeder_(cellular_automaton)) page.

Let's zoom out a bit and check the full picture.

In [None]:
plt.figure(figsize=(20,20))
plt.imshow(board)

Since we will be using TensorFlow, we should convert this board to a tensor. We will even do it four times,
once for each of four data types (`bool`, `uint8`, `int32` and `float16`). Feel free to decide which one you use in your implementation.



In [None]:
boardtfbool = tf.cast(board, dtype=tf.bool)
boardtfuint8 = tf.cast(board, dtype=tf.uint8)
boardtfint32 = tf.cast(board, dtype=tf.int32)
boardtffloat16 = tf.cast(board, dtype=tf.half)

The standard rules of Game of Life are pretty simple:
* Each cell has 8 neighbors, i.e. the 8 adjacent cells in each direction (including diagonals). All behavior is defined from the current state of a cell and its neighbors.
* A live cell is a cell containing `1` or `True`. The opposite is a dead cell.
* In each iteration of the game, all cells are updated based on the state of that cell and its neighbors in the previous iteration. It doesn't matter which neighbors are turning dead/live during the same iteration.
* Any live cell with two or three live neighbors survives
   * All other live cells die
* Any dead cell with exactly three live neighbors becomes alive
   * All other dead cells stay dead
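
To make the rules above concrete, here is a minimal NumPy sketch of a single update step. It is meant only as a slow reference for checking results on small boards, not as the TensorFlow implementation asked for in this lab. The name `life_step_reference` is hypothetical, and the sketch assumes that cells outside the board count as dead.

```python
import numpy as np

def life_step_reference(board):
    """Apply one Game of Life step to a 2-D array of 0/1 values.

    Assumption: everything outside the board is treated as dead.
    """
    board = np.asarray(board, dtype=np.uint8)
    h, w = board.shape
    padded = np.pad(board, 1)  # surround the board with a border of dead cells

    # Sum the 8 shifted copies of the board to count live neighbors per cell.
    neighbors = np.zeros((h, w), dtype=np.uint8)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            if (dy, dx) != (1, 1):  # skip the cell itself
                neighbors += padded[dy:dy + h, dx:dx + w]

    survive = (board == 1) & ((neighbors == 2) | (neighbors == 3))
    born = (board == 0) & (neighbors == 3)
    return (survive | born).astype(np.uint8)

# A "blinker" (three cells in a row) oscillates with period 2.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
after_one = life_step_reference(blinker)
after_two = life_step_reference(after_one)
print(after_one)
print(np.array_equal(after_two, blinker))
```

Comparing a few steps of your `runlife` against such a reference on a small board is one way to catch off-by-one errors in the neighbor count.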
   
You should implement the function `runlife` below. It accepts a Game of Life board tensor and the number of iterations. It should return a new tensor with the relevant updates. Try to use existing functions in the [TensorFlow API](https://www.tensorflow.org/versions/r2.1/api_docs/python/tf) rather than rolling your own. Note that inspecting the state of your neighbors and yourself might be possible to express as a convolution, but it might not be the fastest way. There might be a bug in some configurations with doing GPU convolutions for `int32` data.

We tag this function as `@tf.function` in order to make TensorFlow optimize the full graph. You might want to remove that for making debugging easier (feel free to copy code out of Jupyter if you want to debug in another environment, as well).

Note: You do not have to implement any specific behavior for cells right at the edge, as long as dead cells with only dead neighbors stay dead.

In [None]:
@tf.function
def runlife(board, iters):
    # Init work
    for _ in range(iters):
        # Per iteration
        pass
    # Final work
    return board

We will now run the code. In this version, it was adapted to the `float16` board.
If you used another version instead, change the code.

In [None]:
%time boardresult = runlife(boardtffloat16, 1500)
# np.cast was removed in NumPy 2.0; convert the tensor to an array and cast explicitly.
boardresult = boardresult.numpy().astype(np.int32)
plt.figure(figsize=(20,20))
plt.imshow(boardresult)

If you rerun the cell, you might notice that it is far faster the second time. Is this due to some caching of the result?

You can check that by rerunning it after loading another .LIF file at the start.

What actually happens is that the first call traces and compiles a TensorFlow graph that is specific to the exact number of iterations (with a Python `for` loop as in the template, the loop is unrolled during tracing). Once that is done, you can run any board of the same shape and data type with that number of iterations very quickly.

## Things to try
* What speed do you get if you run on CPU instead? Remember that the default notebook setting is 4 CPU cores.

* What happens if you remove `@tf.function`?

* What happens if you change the size of the grid to something other than 2048? What happens if you change the number of iterations? Does the runtime change match your expectations?
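
For the CPU versus GPU comparison, one possible way to time a call on a chosen device is sketched below. The helper `time_on_device` is a hypothetical name, not part of the lab material, and `tf.square` merely stands in for your own function. Since the first call on a device may include tracing time, it is probably fairer to time a second call as well.

```python
import time
import tensorflow as tf

def time_on_device(fn, device, *args):
    """Run fn(*args) under tf.device(device); return (result, wall-clock seconds)."""
    with tf.device(device):
        start = time.perf_counter()
        result = fn(*args)
        result.numpy()  # copy to host so that pending device work has finished
        elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; replace tf.square with e.g. a call to runlife.
data = tf.constant([1.0, 2.0, 3.0])
result, seconds = time_on_device(tf.square, "/CPU:0", data)
print(result.device, seconds)
```

Replacing `"/CPU:0"` with `"/GPU:0"` should place the work on the first GPU, assuming one is visible to TensorFlow.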

## How to report
E-mail your *saved* final runlife method to [martin.kronbichler@it.uu.se](mailto:martin.kronbichler@it.uu.se) (either inside this notebook or as a separate text file), with some comments entered on the speed of your implementation on GPU vs CPU, what you tried, etc.