**Neural Networks**

    1. Try out an auto-diff library (e.g. tensorflow) to automatically calculate derivatives of some test functions

Let's start by generating some mock data and a test function together with its analytical derivative. For simplicity, I'll choose a test function that is not singular.

In [64]:
import numpy as np
import torch
def generate_mock_data(n):
    # A single fixed value keeps the checks below scalar; n is currently unused
    return 8 #np.random.uniform(-10,10, n)

def x_squared(x):
    return x**2

def x_squared_diff(x):
    return 2*x

data = generate_mock_data(n = 1000)

Alright, now let's take our mock data and turn it into a TensorFlow variable (tf.Variable).

In [65]:
#!pip install tensorflow
import tensorflow as tf
data_tensorflow = tf.Variable(data, dtype = float)

We can get the original data back by calling ".numpy()":

In [66]:
data_tensorflow.numpy()

8.0

We can ask TensorFlow to record the operations applied to the data by opening a "with tf.GradientTape() as tape:" block. Inside that block we can manipulate the data as we like, and afterwards we can ask the tape for gradients. Let's run the test function on the tensor, ask for the gradient, and check that it matches the analytical derivative.

In [67]:
with tf.GradientTape() as tape:
    # For the squared function
    y_squared = x_squared(data_tensorflow)
# Ask the tape for dy/dx once the recording block has closed
y_squared_gradient = tape.gradient(y_squared, data_tensorflow)

print(y_squared_gradient.numpy() == x_squared_diff(data_tensorflow.numpy()))

True


OK. So it works! You can read more about automatic differentiation in TensorFlow here: https://www.tensorflow.org/guide/autodiff
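To make the idea of "record the operations, then ask for gradients" more concrete, here is a toy sketch in plain Python. The class name "Var" and its methods are made up for illustration; this is not how TensorFlow implements its tape, only a minimal scalar version of the same chain-rule bookkeeping.

```python
class Var:
    """Toy scalar that remembers how it was computed (hypothetical, for illustration only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def __pow__(self, exponent):
        # Power rule for a constant exponent: d(x**n)/dx = n * x**(n - 1)
        return Var(self.value ** exponent, ((self, exponent * self.value ** (exponent - 1)),))

    def backward(self, upstream=1.0):
        # Chain rule: accumulate the incoming gradient, then pass it on to every parent
        self.grad += upstream
        for parent, local_derivative in self.parents:
            parent.backward(upstream * local_derivative)


x = Var(8.0)
y = x ** 2      # same test function as x_squared
y.backward()
print(y.value, x.grad)  # 64.0 16.0
```

The recursion visits every path through the expression separately, which is fine for a toy but would be far too slow for real networks; actual libraries process each recorded operation once.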

Now, let's try the same in PyTorch instead. First we turn the data into a PyTorch tensor.

In [68]:
import torch
data_torch = torch.tensor([data], dtype = torch.float, requires_grad = True) 

When we specify "requires_grad = True" in the call to torch.tensor, we ask PyTorch to keep track of the operations applied to data_torch so that gradients can be computed later. Conceptually, this plays the same role as "tf.GradientTape()" did for TensorFlow above. Let's now call "x_squared" on "data_torch" and compare the gradient from PyTorch with our analytical one.

In [69]:
y_squared = x_squared(data_torch)
y_squared.backward(retain_graph = True) # calculates gradients via backpropagation
print(data_torch.grad.item() == x_squared_diff(data_torch.item()))

True


OK! So it works! You can read more about autodiff in PyTorch here: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
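Both checks above compare floats with "==", which works here because the derivative of x**2 at 8 is exactly representable. As an independent sanity check that does not rely on either library, a central finite difference can be compared against the analytical derivative using a tolerance. This is a minimal sketch; the helper name "central_difference" and the step size h are my own choices, not part of the exercise.

```python
import math

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric finite difference (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def x_squared(x):
    return x**2

def x_squared_diff(x):
    return 2*x

x0 = 8.0
numerical = central_difference(x_squared, x0)
analytical = x_squared_diff(x0)
# Compare with a relative tolerance instead of exact equality
print(math.isclose(numerical, analytical, rel_tol=1e-6))
```

The same tolerance-based comparison is a safer default for autodiff results on functions whose derivatives are not exactly representable as floats.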

    2. Use this in your simple bisection search from exercise 3.1

Alright. Let's wrap the autodiff code above into a function first. A few changes are made to the torch backend so that it accommodates arrays instead of scalars.

In [103]:
import torch
def evaluate_function_and_get_gradients(f, data, backend = 'torch'):
    if backend == 'torch':
        data_torch = torch.tensor(data, dtype = torch.float, requires_grad = True)
        y = f(data_torch)
        # y is a vector, so backward() needs one upstream gradient per element;
        # a vector of ones gives dy_i/dx_i for an element-wise f
        y.backward(gradient = torch.ones_like(y))
        gradient = data_torch.grad.detach().numpy()
        y = y.detach().numpy()
    elif backend == 'tensorflow':
        data_tensorflow = tf.Variable(data, dtype = float)
        with tf.GradientTape() as tape:
            y = f(data_tensorflow)
        # Convert to NumPy so that both backends return the same types
        gradient = tape.gradient(y, data_tensorflow).numpy()
        y = y.numpy()
    else:
        raise ValueError(f'Received backend {backend} but this is not supported.')
    return y, gradient

OK. Let's snatch the code from the old exercise and adapt it to use autodiff:

In [105]:
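def bisection_gradient(interval, f, backend = 'torch'):
    y, gradient = evaluate_function_and_get_gradients(f,interval, backend)
    gradient = np.asarray(gradient) # make sure both backends give a NumPy array
    iterations = 1
    stop = False
    allowed_indices = np.arange(0, len(y),1)
    middle_index = len(allowed_indices)//2
    # Bisect on the sign of the autodiff gradient until it is exactly zero
    while (stop == False) and (gradient[middle_index] != 0):
        if gradient[middle_index] > 0:
            # f is increasing here, so the minimum lies to the left
            allowed_indices[middle_index:] = -1
        else:
            # f is decreasing here, so the minimum lies to the right
            allowed_indices[:middle_index] = -1
        remaining_indices = allowed_indices[allowed_indices != -1]
        middle_index = remaining_indices[len(remaining_indices)//2]
        iterations += 1
        if iterations > 500: # safety cap in case the gradient never hits zero on the grid
            stop = True
    return middle_index, y[middle_index], iterations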

interval = np.arange(0,1000000,1)
backend = 'tensorflow'
middle_index, lowest_point, iterations = bisection_gradient(interval, x_squared, backend= backend)
# middle_index is a position in the grid, so look up the corresponding x value
print(f"Found Gradient to be 0 at x = {interval[middle_index]} in {iterations} iterations using {backend}")

backend = 'torch'
middle_index, lowest_point, iterations = bisection_gradient(interval, x_squared, backend= backend)
print(f"Found Gradient to be 0 at x = {interval[middle_index]} in {iterations} iterations using {backend}")
print(f"True 0-point at x= {interval[x_squared_diff(interval)==0]}")

Found Gradient to be 0 at x = 0 in 20 iterations using tensorflow
Found Gradient to be 0 at x = 0 in 20 iterations using torch
True 0-point at x= [0]
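The search above works on a precomputed grid. As a rough sketch of the same idea on a continuous interval, the function below bisects on the sign of a derivative callable, which could be supplied by either autodiff backend. The name "bisect_stationary_point", the tolerance, and the shifted parabola are my own assumptions for illustration, and the sketch assumes the derivative is negative at the lower bound and positive at the upper bound (a single minimum in between).

```python
def bisect_stationary_point(derivative, lower, upper, tolerance=1e-8, max_iterations=500):
    """Locate a zero of `derivative` in [lower, upper] by bisecting on its sign.

    Hypothetical helper: assumes derivative(lower) <= 0 <= derivative(upper).
    """
    if derivative(lower) > 0 or derivative(upper) < 0:
        raise ValueError("Expected derivative(lower) <= 0 <= derivative(upper).")
    iterations = 0
    while upper - lower > tolerance and iterations < max_iterations:
        middle = 0.5 * (lower + upper)
        slope = derivative(middle)
        iterations += 1
        if slope == 0:
            return middle, iterations
        if slope > 0:
            upper = middle   # function is increasing: the minimum lies to the left
        else:
            lower = middle   # function is decreasing: the minimum lies to the right
    return 0.5 * (lower + upper), iterations


# Derivative of (x - 3)**2, written analytically here; an autodiff gradient would also work
x_min, n_iterations = bisect_stationary_point(lambda x: 2 * (x - 3), -10.0, 10.0)
print(f"Stationary point near x = {x_min:.6f} after {n_iterations} iterations")
```

Unlike the grid version, this only evaluates the derivative at the points it visits, so the cost grows with the logarithm of the interval width divided by the tolerance rather than with the grid size.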
