<a href="https://colab.research.google.com/github/samialabed/r244_dataflow_tutorial/blob/main/Dataflow_programming_using_TensorFlow_Student_Copy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dataflow programming using TensorFlow 


## Introduction
The goal of this tutorial is to understand how dataflow programming can be used to construct computational graphs, and how these graphs can be executed using distributed workers and parameter servers.

This tutorial is structured as follows. First, we will use a high-level TensorFlow API that hides much of the dataflow programming complexity; the goal here is to get familiar with TF and understand what an end user expects from it. Then we will unpack the abstraction layers one by one, looking at what TF is doing behind the scenes, and finally attempt to run the computation in a distributed fashion.


The final exercise on data structures in computation graphs is meant for students with substantial prior experience. There is no expectation to complete all exercises, and if you are entirely new to TensorFlow, you should prioritize understanding the first exercises well.


### Extra help:
* [TensorFlow v1.x Tutorials](https://github.com/tensorflow/docs/tree/master/site/en/r1/tutorials)
* [Distributed TensorFlow V1.5 Tutorial](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/distributed.md)
* [TensorFlow v1.15 API Docs](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf)

#### What is this "Colab" environment: 
  * You can think of it as a collaborative Jupyter notebook: you can write and execute Python code in the cloud without any setup, and it provides real-time sharing and pair-programming capabilities. It comes preloaded with most of the popular data science and machine learning libraries.
  * If you need to install extra packages (outside the scope of this tutorial), you can do so by executing a cell with the command `!pip install <name of package>`.
  * [Introductory video on Colab](https://www.youtube.com/watch?v=inN8seMm7UI)


# Setup and loading packages
The next cell imports all the libraries needed for this exercise and loads the correct version of TensorFlow (v1.15) rather than the newer TensorFlow (v2.3).

This tutorial is meant to demonstrate the construction of *static* computational graphs, whereby a computation is first designed by connecting graph fragments and later executed by explicitly invoking the graph.
This is in contrast to imperative ('eager' or 'define-by-run') execution, where the code is executed directly like any other Python code. Eager TensorFlow relies on various utilities to transform the executed code into a static graph (for purposes of deployment and optimization). The goal of the exercise is hence to understand how static and eager implementations correspond. Eager execution is the default in TensorFlow >= 2.0.
We will explore eager execution at the end of this tutorial.
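
To make the static/eager distinction concrete, here is a minimal pure-Python sketch. It is not TensorFlow code: the names `Node`, `constant`, `placeholder`, and `run` are hypothetical and only mimic the idea. Building an expression merely records a graph; a separate `run` call executes it.

```python
# Toy illustration of "define, then run". This is NOT TensorFlow code;
# Node, constant, placeholder, and run are hypothetical names.

class Node:
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # "const", "placeholder", "add" or "mul"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # only used by constants
        self.name = name      # only used by placeholders

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def constant(value):
    return Node("const", value=value)


def placeholder(name):
    return Node("placeholder", name=name)


def run(node, feed_dict=None):
    """Evaluate a node by recursively evaluating its inputs."""
    feed_dict = feed_dict or {}
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed_dict[node.name]
    left, right = (run(inp, feed_dict) for inp in node.inputs)
    return left + right if node.op == "add" else left * right


# Graph construction: nothing is computed here.
x = placeholder("x")
y = x * constant(3.0) + constant(1.0)

# Graph execution: the same graph can be run with different inputs.
result_a = run(y, {"x": 2.0})
result_b = run(y, {"x": 10.0})

# Eager style, for comparison: the value is computed immediately.
eager_result = 2.0 * 3.0 + 1.0
```

In this toy, `y` is only a description of the computation until `run` is called, which is loosely the role that a session plays for a TensorFlow v1 graph.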


In [None]:
#@title Execute this cell! 
# We will be using TF v1.15 in this tutorial
%tensorflow_version 1.x

# Load the TensorBoard notebook extension.
%load_ext tensorboard

# import package as normal, TF is installed by default 
import tensorflow as tf
import numpy as np
import tensorboard
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt
# Clear any tensorboard logs from previous runs
!rm -rf ./logs/ 
# Define the TensorBoard settings, will be explained later.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = 'logs/func/%s' % stamp


In [None]:
#@title Hello TensorFlow: Example executing TFv1.15 on Colab


# check that the TensorFlow version is indeed 1.x
print(f"TensorFlow version: {tf.__version__}")


# Take the parameter input as a widget - feel free to change it on the right side.
scaler_input = 5 #@param {type:"number"}
# Execute a simple scalar-vector multiplication using TF
ones_vector = tf.ones(shape=(3,))
# Create a TF session to execute the TF graph.
sess = tf.compat.v1.Session()

print(sess.run(ones_vector * scaler_input))

# Linear regression using TensorFlow

In this task, you will learn about placeholders, Tensor variables and constants in the TensorFlow graph. You will implement a simple linear regression and optimise its weights using TensorFlow's gradient descent optimiser. A machine learning task in TensorFlow typically means creating a loss function on the output of the graph and then letting TensorFlow update the weights in the graph via any gradient-based optimisation method (such as stochastic gradient descent). This is possible because TensorFlow has registered gradients for most of its operators and can hence automatically compute gradients of most functions you may define via [reverse-mode automatic differentiation](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation).
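
As a rough illustration of what the optimiser does, the sketch below fits `y = w * x + b` by gradient descent in plain Python, with the gradients of the mean squared error written out by hand. TensorFlow derives such gradients automatically; the toy data, the learning rate, and all variable names here are illustrative assumptions, and this is not the TensorFlow solution to the exercise.

```python
# Plain-Python gradient descent on mean squared error (no TensorFlow).
# The toy data follows y = 2x + 1 exactly, so the fit should approach w=2, b=1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.05     # illustrative value
n = len(xs)

for step in range(500):
    # Prediction error for every sample under the current parameters.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # d/dw mean(e^2) = 2 * mean(e * x);  d/db mean(e^2) = 2 * mean(e)
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    # Step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
```

In the exercise, the two hand-written gradient lines are what `minimize(loss_function)` adds to the graph for you.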


In [None]:
#@title Linear regression: Generate data points


# Often in ML you create a fake data generator to test your model;
# the data generator simplifies the process of creating training data.
# Here we create a simple sine function with added noise.

# Generate samples of a function we are trying to predict:
samples = 100 #@param {'type': 'number'}

# Fit a line from the x-axis -5 to +5
training_X = np.linspace(-5, 5, samples)
training_Y = np.sin(training_X) + np.random.uniform(-0.5, 0.5, samples)


print(f'training_X.shape={training_X.shape}, training_Y.shape={training_Y.shape}')


# plot the generator function 
sns.lineplot(x=training_X, y=training_Y)


In [None]:
# @title Linear regression: Creating the model
# @markdown Make sure to complete all todos here: 



# Reset the TF graph session to "forget" the previously defined model.
tf.reset_default_graph()

# First, create TensorFlow placeholders for input data (xs) and
# output (ys) data. Placeholders are inputs to the computation graph.
# When we run the graph, we need to feed values for the placeholders into the graph.

# TODO: create placeholders for inputs and outputs
X = #TODO
Y = #TODO


# We will try minimizing the mean squared error between our predictions and the
# output. Our predictions will take the form X*W + b, where X is input data,
# W are our weights, and b is a bias term:
# minimize ||(X*w + b) - y||^2
# To do so, you will need to create some variables for W and b. Variables
# need to be initialised; often a normal distribution is used for this.


# TODO create weight and bias variables
W =#TODO
b = #TODO
# Next, you need to create a node in the graph combining the variables to predict
# the output: Y = X * w + b. Find the appropriate TensorFlow operations to do so.

# TODO: Create the prediction variable
predictions = #TODO

# Finally, we need to define a loss that can be minimized using gradient descent:
# The loss should be the mean squared difference between predictions and outputs.

# TODO: Create loss
loss_function =#TODO


# Add tensorboard visualisation 
writer = tf.summary.FileWriter(logdir, graph=tf.get_default_graph())


In [None]:
#@title Linear regression: Optimisation loop

# Use gradient descent to optimise your variables

# Feel free to play with the optimisation loop 
learning_rate = 0.001 #@param {'type':'number'}
# Maybe experiment using other optimisers such as Adam
optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate).minimize(loss_function)

# We create a session to use the graph and initialize all variables
session = tf.compat.v1.Session()
session.run(tf.compat.v1.global_variables_initializer())

# Optimisation loop
epochs = 1000 #@param {'type':'number'} 

previous_loss = 0.0
for epoch in range(epochs):
    #TODO run the optimize op: look up "feed_dict" in session.run api docs
    _, loss = # TODO run session for optimizer and loss

    if epoch % 10 == 0:
        print(f'Epoch: {epoch}, Loss = {loss}')
    # Termination condition for the optimization loop
    if np.abs(previous_loss - loss) < 0.000001:
        break
    previous_loss = loss

In [None]:
#@title Visualisation 

# TODO try plotting the predictions by using the model to predict outputs,
#TODO, run the prediction operation
model_predictions = # TODO
#TODO Plot using plt.plot both the prediction and compare against the real function
plt.show()

# (non)-Linear regression part 2
Try to understand what the program above did. 
How well did your function fit the data?

In the task so far, you tried to fit a linear model (a single line) to a non-linear function (the sine function) using gradient-based optimisation.
Now try expanding your model so that it can fit non-linear functions, effectively turning it into a neural network. You can do this either by manually adding weights, layers and activations, or by using TensorFlow's existing layer API.
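
One way to see why activations matter: without them, stacking linear layers still yields a linear (affine) function. The scalar sketch below checks this in plain Python and shows that inserting `tanh` breaks the linearity. The weights and function names are arbitrary illustrations, not part of the exercise.

```python
import math

# Arbitrary illustrative weights for two scalar "layers".
w1, b1 = 0.5, -1.0
w2, b2 = 3.0, 2.0


def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2


def collapsed_single_layer(x):
    # Algebraically identical: (w2*w1) * x + (w2*b1 + b2)
    return (w2 * w1) * x + (w2 * b1 + b2)


def with_activation(x):
    return w2 * math.tanh(w1 * x + b1) + b2


def is_linear_on(f, points):
    """Check that f has a constant slope across equally spaced points."""
    slopes = [f(b) - f(a) for a, b in zip(points, points[1:])]
    return all(abs(s - slopes[0]) < 1e-9 for s in slopes)


points = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear_stack_is_linear = is_linear_on(two_linear_layers, points)
activated_is_linear = is_linear_on(with_activation, points)
```

This is why simply adding more weights without an activation would not help the model fit the sine curve.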

While you are at it, it helps to visualise what TensorFlow is doing behind the scenes. For that we will explore TensorBoard, a debugging tool that lets you visualise the dataflow programming model of TF.

If you look at the first code cell in Colab, you will see that we have already imported tensorboard and loaded its notebook extension using `%load_ext tensorboard`. If you are doing this exercise locally, read the [TensorBoard docs](https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/graph_viz.md) to learn how to set it up.




In [None]:
#@title starting tensorboard
%tensorboard --logdir logs

## Tensorboard

Examine TensorBoard and try to pinpoint how each variable and placeholder you defined is structured in the graph.

To understand how nodes are represented, read the [TensorBoard docs](https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/graph_viz.md).


### Tensorboard exercise

Answer these questions and attach your answers to your submission:
* What happens when you declare a TensorFlow variable and assign it random weights?
* Can you identify points that are not sequentially dependent on each other and are therefore opportunities for parallelisation?
* Explore the loss function (you can expand nodes in the TensorBoard graph) and try to describe what is happening there.


# Non-linear regression in TensorFlow

In [None]:
#@title Making the Linear regression non-linear

# Reset the TF graph session to "forget" the previously defined model.
tf.reset_default_graph()

hidden_units = 64 #@param{'type': 'number'}
number_of_layers = 2 #@param{'type': 'number'}
learning_rate = 0.0001 #@param{'type': 'number'}

# Create the model 
model =  tf.keras.Sequential()
#TODO add layers to the model.

# Add tensorboard visualisation 
writer = tf.summary.FileWriter(logdir, graph=tf.get_default_graph())
# Display the model
model.summary()



In [None]:
epochs = 1000 #@param {'type':'number'}
#TODO fit the model 

In [None]:
#@title Visualisation 

#@markdown TODO, run the prediction operation for the model and plot prediction against real function 

plt.show()

## Non-linear regression: More ML

Observe what you have just done. Did your model fit the data perfectly? Did it overfit the data? Try giving your model less data and see how well it fits; for example, try a 70/30 train/test split and evaluate on the test set.
If your model is not generalising (performing well) on your test data, you could explore regularisation techniques or increase the number of training epochs.
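
A 70/30 split could be sketched as follows in plain Python. The helper name `train_test_split`, its parameters, and the toy data are assumptions for illustration; the same index-shuffling idea applies to the `training_X` and `training_Y` arrays.

```python
import random

# Hypothetical helper: shuffle paired samples and split them 70/30.
def train_test_split(xs, ys, train_fraction=0.7, seed=0):
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)       # shuffle indices, not the data
    cut = round(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test = ([xs[i] for i in test_idx], [ys[i] for i in test_idx])
    return train, test


# Toy paired data: each y is exactly twice its x.
xs = [i / 10.0 for i in range(100)]
ys = [2.0 * x for x in xs]
(train_x, train_y), (test_x, test_y) = train_test_split(xs, ys)
```

Shuffling indices rather than the values keeps every input paired with its own output, and the held-out 30% is never shown to the model during training.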

Consider adding another term to the generator function: how well will your model generalise?
Is your model too complex, with too many parameters? Consider tuning it with Bayesian optimisation (see the advanced section).

If you have made it this far without any prior TensorFlow experience, well done! It is very confusing at the beginning, with a lot of terminology thrown around. Feel free to ask for additional materials if needed.



# Distributed TensorFlow

After completion of the linear regression exercise, you should have a basic intuition of how to create and run dataflow graphs. In the following exercise, you will expand upon this by running distributed TensorFlow.

When using distributed TensorFlow to run large computation graphs in clusters, you have to create a training server and communicate with it:
``` python
import tensorflow as tf
c = tf.constant("Hello, distributed TensorFlow!")
server = tf.train.Server.create_local_server()
sess = tf.Session(server.target)  # Create a session on the server.
sess.run(c)  # Expected result: b'Hello, distributed TensorFlow!' (bytes under Python 3)
```

While the intricacies of the different distributed training modes are beyond the scope of this tutorial, you may want to read the first sections of the [tutorial](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/distributed.md).

The purpose of distributed TensorFlow is to have fine-grained control over which part of your computation is executed on which device. For example, when training large machine learning models, you may typically want to use your CPU to read and preprocess training data and your GPU to compute updates on the model. In the next task, you will distribute a simple computation and learn about device scopes and cluster specifications.


In [None]:
import tensorflow as tf
c = tf.constant("Hello, distributed TensorFlow!")
server = tf.train.Server.create_local_server()
sess = tf.Session(server.target)  # Create a session on the server.
sess.run(c)  # Expected result: b'Hello, distributed TensorFlow!' (bytes under Python 3)

# Distributed MapReduce

MapReduce is a parallel programming model in which a task is split across multiple workers that perform map() operations on the input (such as filtering or sorting data), while another task runs a reduce() operation to aggregate the results. A typical example would be filtering Tweets for certain hashtags in real time as a map() task, with a reduce() task summing up and presenting the resulting statistics (image source: Wikipedia):
![](https://www.cl.cam.ac.uk/research/srg/netos/projects/cambridgeplus/2020/640px-Mapreduce_Overview.svg.png)
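
Before moving to TensorFlow, the map/reduce pattern can be sketched in plain Python. The example below computes a global mean from two chunks that could each live on a different worker; the function names and the two-way split are illustrative assumptions, not the TensorFlow solution to the task.

```python
from functools import reduce

# Plain-Python MapReduce sketch (no TensorFlow): compute a global mean
# from chunks that could each live on a different worker.

def map_chunk(chunk):
    """Map step: summarise one chunk as (sum, count)."""
    return (sum(chunk), len(chunk))


def combine(left, right):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (left[0] + right[0], left[1] + right[1])


data = [float(i) for i in range(100)]
chunks = [data[:50], data[50:]]            # one chunk per hypothetical worker

partials = [map_chunk(c) for c in chunks]  # would run in parallel on workers
total, count = reduce(combine, partials)
global_mean = total / count
```

Carrying `(sum, count)` pairs keeps the result correct even when chunks have unequal sizes; when all chunks are the same size, averaging the per-chunk means gives the same answer.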


The goal of your task is to distribute a computation implementing the MapReduce paradigm in distributed TensorFlow. First, you will familiarise yourself with cluster specifications. The principal idea behind running a TensorFlow cluster is to create a cluster specification describing jobs and devices, then initialising a server with this spec:

```python
cluster_spec = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster_spec, job_name="local", task_index=task)
```



Because the Colab environment is actually a virtual machine running on Google servers, it is not trivial to run multiple workers in parallel. We will use two tricks to get around this.

First, you will need to download the "server" file.

The code inside the server file is very simple:
```python
import tensorflow.compat.v1 as tf
import sys


def main(argv):
    # Parse command <name of the task>
    task = int(argv[1])

    # Only two tasks are supported, 0 or 1:
    # task 0 maps to port 2222, task 1 maps to port 2223
    cluster_spec = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
    server = tf.train.Server(cluster_spec, job_name="local", task_index=task)

    print("Initialising server {}".format(task))
    server.start()

    # This server will now wait for instructions - n.b. that we did not
    # define an interrupt signal, so you have to close the terminal to kill it
    server.join()


if __name__ == '__main__':
    main(sys.argv)
```


Once the file is downloaded to your Colab environment, we will execute it using a Colab magic:
`%%bash --bg` runs the cell's bash commands in the background.
We will start two separate "workers" that wait for tasks from a manager. The next two cells do just that.


In [None]:
!wget "https://raw.githubusercontent.com/samialabed/r244-large-scale-systems/master/server_init.py"

# Download the server file. Feel free to inspect it

In [None]:
%%bash --bg

python server_init.py 0

# run a worker on a background thread

In [None]:
%%bash --bg

python server_init.py 1
# run a second worker on a separate thread

In [None]:
cluster_spec = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
task_input = tf.placeholder(tf.float32, 100)

# First part: compute mean of half of the input data
with tf.device("/job:local/task:0"):
    local_input_task0 = tf.slice(task_input, [50], [-1])
    local_mean_task0 = tf.reduce_mean(local_input_task0)

# TODO do another half of the computation using another device
with # TODO: 
    pass 

# TODO compute the overall result by combining both results
global_mean = # TODO

# TODO Fill in the session specification
with TODO as sess:
    # Sample some data for the computation
    data = np.random.random(100)

    # Run the graph, feeding in the input data, and output the result.
    mean = sess.run(global_mean, feed_dict={task_input: data})
    print(mean)

# Advanced tutorial

If you have completed the whole tutorial and still want to do more:

## Investigate the "new" Eager execution API:
The old TF API (v1.x) exposed explicit dataflow programming, which many data scientists and machine learning practitioners found difficult to use.
However, dataflow programming is an important computing paradigm: it enables better runtime optimisations for programs and easier handling of multiple processors.

You can set the TensorFlow version in Colab by executing this command in its own cell: `%tensorflow_version 1.x` or `%tensorflow_version 2.x`

Start a new cell with `%tensorflow_version 2.x` and try to redo the whole tutorial, including distributed training and TensorBoard.
* [TensorFlow V2.x tutorials](https://www.tensorflow.org/tutorials)


* **Distributed training in V2**: There is very good documentation on how to do [distributed training on Colab](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/distributed_training.ipynb)


## End-to-end computation graphs

In the first exercise, you used Python code to create TensorFlow computations. You then repeatedly invoked a TensorFlow session to perform training. Each session call results in a context switch between your Python program and the TensorFlow runtime. A more sophisticated implementation would include the loop as part of the computation graph so that only a single call to TensorFlow is necessary. Look at the TensorFlow control flow operations (e.g. a while loop) and try modifying your code so that all control flow is in-graph.


## Advanced exercise: data structures in computation graphs

In the prior exercises, you implemented largely functional transformations using different graph paradigms. The computation required you to send input data through a graph, and the optimizer updated internal state automatically (network weights). In some subfields of machine learning like reinforcement learning (RL), algorithms must manage substantial amounts of state. For example, an RL algorithm often needs to write data to a buffer, and later needs to sample from that buffer. Implementing stateful data structures in TensorFlow can be difficult because there are restrictions on variable manipulation and control flow. In this exercise, you will implement a priority queue in pure session-based TensorFlow.

The priority queue should enable users to insert simple data (e.g. a single integer) with an associated priority. The priority queue has a limited size n (e.g. 10), so you need to think about how priority is maintained when elements are inserted and removed if the queue is full. Further, the priority queue must support a dequeue operation which lets users read any number (< n) of records from the queue, ordered by priority. Hint: Think about how data and priorities may be managed separately. 
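
Before writing the TensorFlow version, it can help to pin down the intended behaviour with a plain-Python reference model. The sketch below is not TensorFlow and not the exercise solution. The class name, the eviction rule (a new item replaces the lowest-priority item only if it outranks it), and the choice that `dequeue` removes what it returns are all assumptions that the exercise leaves open.

```python
# Plain-Python reference model of a bounded priority queue (NOT TensorFlow).
# Assumption: when the queue is full, a new item replaces the current
# lowest-priority item only if the new priority is higher.

class BoundedPriorityQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # stored values
        self.priorities = []  # priorities[i] belongs to data[i]

    def insert(self, value, priority):
        if len(self.data) < self.capacity:
            self.data.append(value)
            self.priorities.append(priority)
            return True
        lowest = min(range(self.capacity), key=lambda i: self.priorities[i])
        if priority <= self.priorities[lowest]:
            return False  # new item does not outrank anything stored
        self.data[lowest] = value
        self.priorities[lowest] = priority
        return True

    def dequeue(self, k):
        """Remove and return up to k values, highest priority first."""
        order = sorted(range(len(self.data)),
                       key=lambda i: self.priorities[i], reverse=True)
        taken = order[:k]
        result = [self.data[i] for i in taken]
        keep = [i for i in range(len(self.data)) if i not in taken]
        self.data = [self.data[i] for i in keep]
        self.priorities = [self.priorities[i] for i in keep]
        return result


queue = BoundedPriorityQueue(capacity=3)
queue.insert(10, priority=0.1)
queue.insert(20, priority=0.9)
queue.insert(30, priority=0.5)
replaced = queue.insert(40, priority=0.7)   # evicts value 10
rejected = queue.insert(50, priority=0.05)  # too low to enter
top_two = queue.dequeue(2)
```

Keeping values and priorities in two parallel lists mirrors the hint above: in a graph-based version, each list could become its own fixed-size variable.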

## Tune the network parameters with Bayesian optimisation
Machine learning in general has many parameters that govern the speed of convergence (or whether training converges at all) and the quality of the resulting network.
Choosing these parameters is a difficult task. There are several methods for choosing them: random search, grid search, and Bayesian optimisation.
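
As a baseline to compare against, random search can be sketched in a few lines of plain Python. Here `toy_objective` is a hypothetical stand-in for "train a model with this learning rate and return its loss"; the search range and the number of trials are likewise illustrative assumptions.

```python
import math
import random

# Toy random search over a learning rate (no TensorFlow, no real training).
# toy_objective is a hypothetical stand-in for "train a model, return its loss";
# by construction it is smallest at learning_rate = 1e-2.
def toy_objective(learning_rate):
    return (math.log10(learning_rate) + 2.0) ** 2


rng = random.Random(0)  # fixed seed so the search is repeatable
trials = []
for _ in range(50):
    # Sample on a log scale between 1e-5 and 1e0.
    learning_rate = 10.0 ** rng.uniform(-5.0, 0.0)
    trials.append((toy_objective(learning_rate), learning_rate))

# Tuples compare by their first element, so min() picks the lowest loss.
best_loss, best_learning_rate = min(trials)
```

Random search treats every trial independently. Bayesian optimisation instead uses the results of earlier trials to decide where to sample next, which matters when each trial is an expensive training run.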

In this course you will come across Bayesian optimisation (BO) in many different papers, as it is a framework that has found applicability in many different fields.
BO attempts to find a set of parameters that minimises an objective function.
This can be the RSE 
