In [None]:
%run supportvectors-common.ipynb

# Neural Networks as Universal Approximators

In the theory session, we learned that neural networks are universal approximators. In this lab, we verify this empirically on a few simple univariate functions.
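
As a reminder, one common form of the claim can be written out as follows. This is a restatement for a single hidden layer with a continuous, non-polynomial activation $\sigma$, and may differ in detail from the version given in the theory session. For any continuous $f$ on a closed interval such as $[0,1]$ and any $\varepsilon > 0$, there exist a width $N$ and parameters $\alpha_i, w_i, b_i$ such that:

\begin{equation}
\sup_{x \in [0,1]} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i x + b_i) \right| < \varepsilon
\end{equation}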

To make it easier to play around with the concept, use the ``UnivariateApproximator`` class in the ``svlearn.approximator`` module.

In [None]:
import numpy as np
import torch
from typing import Callable, List
from svlearn.approximator.univariate_approximator import (UnivariateApproximator, 
                                                          UnivariatePrediction)

In [None]:
torch.cuda.is_available()


In [4]:
# Ignore this for now, till we learn about dashboards.
#import wandb
#wandb.login()

## A sigmoid-like function



In [None]:
def sigmoid_like(x):
    # Logistic curve with steepness 15, centered at x = 10/15 = 2/3
    return 1 / (1.0 + np.exp(10 - 15 * x))

approximator = UnivariateApproximator(sigmoid_like)
approximator.train(1)
approximator.evaluate_model()
correlation = approximator.correlation()
print(f'The Pearson correlation between ground truth and prediction is {correlation}')
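
The value printed above is a Pearson correlation between the true function values and the predictions. How `correlation()` computes it internally is not shown in this notebook, so the following is only a minimal sketch of the same quantity in NumPy. The sample grid and `fake_prediction` (the truth plus a little noise) are hypothetical stand-ins for the real model output.

```python
import numpy as np

def sigmoid_like(x):
    return 1 / (1.0 + np.exp(10 - 15 * x))

# Ground truth on a grid over [0, 1] (the grid is an assumption for illustration).
x = np.linspace(0.0, 1.0, 200)
y_true = sigmoid_like(x)

# Hypothetical stand-in for a model prediction: the truth plus small noise.
rng = np.random.default_rng(42)
fake_prediction = y_true + rng.normal(scale=0.01, size=x.shape)

# Pearson correlation: covariance divided by the product of standard deviations.
centered_truth = y_true - y_true.mean()
centered_pred = fake_prediction - fake_prediction.mean()
manual = np.mean(centered_truth * centered_pred) / (y_true.std() * fake_prediction.std())

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
pearson_r = np.corrcoef(y_true, fake_prediction)[0, 1]
print(f"manual={manual:.6f}, corrcoef={pearson_r:.6f}")
```

A correlation close to 1 means the prediction rises and falls with the truth, but it does not by itself measure the size of the error, so it is worth reading alongside the plots.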

### Plot the original function and its neural approximation

In [None]:
fig = approximator.create_plots()

## A weird function

Let us now consider something more complex:

\begin{equation}
    y = (7 - 5 x + x^2 - 1.5 x^3) \sin(10 x^2) 
\end{equation}

    

In [None]:
# Create the complex function
def weird_x(x):
    return (7 - 5 * x + x * x - 1.5 * x ** 3) * np.sin(10 * x * x)

approximator = UnivariateApproximator(weird_x)
approximator.train(1)

# Now evaluate the model, and plot it
approximator.evaluate_model()
correlation = approximator.correlation()
print(f'The Pearson correlation between ground truth and prediction is {correlation}')

In [None]:
fig = approximator.create_plots()

## Sinc(x)

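Consider the function:

\begin{equation}
y = \mathrm{sinc}(3x) = \frac{\sin(3 \pi x)}{3 \pi x}
\end{equation}

with $y = 1$ at $x = 0$. This is the normalized sinc, which is what `np.sinc` computes. Let us consider this function over the domain $x \in [0,1]$.


In [None]:
# Define the function. Note that np.sinc(t) is the normalized sinc, sin(pi*t) / (pi*t).
def xsinx(x: float) -> float:
    return np.sinc(3 * x)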

approximator = UnivariateApproximator(xsinx)
approximator.train(1)
approximator.evaluate_model()
correlation = approximator.correlation()
print(f'The Pearson correlation between ground truth and prediction is {correlation}')
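
Since `np.sinc` uses the normalized convention, it can help to check its values by hand. The sketch below compares `np.sinc(3 * x)` with the explicit formula at a few hypothetical sample points, and also shows how the unnormalized form $\sin(3x)/x$ could be obtained by rescaling the argument.

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5, 1.0])

# np.sinc is the normalized sinc: sin(pi*t) / (pi*t), with the value 1 at t = 0.
normalized = np.sinc(3 * x)

# The same values written out by hand, guarding the removable singularity at 0.
t = 3 * np.pi * x
by_hand = np.ones_like(x)
nonzero = t != 0
by_hand[nonzero] = np.sin(t[nonzero]) / t[nonzero]

# The unnormalized form sin(3x)/x (limiting value 3 at x = 0)
# can be recovered by rescaling the argument of np.sinc.
unnormalized = 3 * np.sinc(3 * x / np.pi)

print(normalized)
print(unnormalized)
```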

### Plot the original function and its neural approximation

In [None]:
fig = approximator.create_plots()

## An interesting function

\begin{equation}
y = \sin(2 \sin(2 \sin(2 \sin(10 x))))
\end{equation}


In [None]:
# define the function
def too_many_sines(x: float) -> float:
    return np.sin(2 * np.sin(2 * np.sin(2 * np.sin(10 * x))))

approximator = UnivariateApproximator(too_many_sines)
approximator.train(1)
approximator.evaluate_model()
correlation = approximator.correlation()
print(f'The Pearson correlation between ground truth and prediction is {correlation}')

### Plot the original function and its neural approximation

In [None]:
fig = approximator.create_plots()

## Homework

### Code walkthrough

Carefully walk through the code in the `svlearn.approximator.univariate_approximator` Python module. In particular, look at the function `create_network()` to see how the regression network is built. Can you explain why the input and output layers have only one node each?

Now, review the main training loop in the function `train()`. See how the outer loop iterates over the epochs (each epoch is one complete pass through the data while learning). Note also the inner loop, where each learning step works with only one mini-batch from the data loader.
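
For orientation before reading the module, here is a minimal PyTorch sketch of a univariate regression network with an epoch loop and a mini-batch loop. It is not the code of `create_network()` or `train()`. The target function, layer sizes, activation, optimizer, learning rate, batch size, and loss are all assumptions chosen for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical target function and sample grid, for illustration only.
x = torch.linspace(0.0, 1.0, 512).unsqueeze(1)   # shape (512, 1): one input feature
y = torch.sin(10 * x)                            # shape (512, 1): one output value

loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

# One input node (x) and one output node (y), with hidden layers in between.
model = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

epoch_losses = []
for epoch in range(5):                 # outer loop: one pass over the data per epoch
    running = 0.0
    for xb, yb in loader:              # inner loop: one mini-batch at a time
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)  # forward pass and loss on this batch
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
        running += loss.item() * xb.size(0)
    epoch_losses.append(running / len(x))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

The first `nn.Linear` takes one feature and the last one emits one value because the function being learned maps a single number $x$ to a single number $y$.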

### Different activation functions

Which activation function does the `UnivariateApproximator` use? Replace it with other activation functions, and see how each affects the speed of training as well as the final model quality (loss).
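
One reason the choice of activation can affect training speed is the size of its gradient. The sketch below evaluates three common activations and their derivatives at a few sample inputs. It says nothing about which activation `UnivariateApproximator` actually uses; the three candidates and the sample points are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each entry maps a name to (activation, derivative of the activation).
activations = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (np.tanh, lambda z: 1.0 - np.tanh(z) ** 2),
    "relu": (lambda z: np.maximum(z, 0.0), lambda z: (z > 0).astype(float)),
}

z = np.array([-5.0, 0.0, 5.0])
grads = {}
for name, (f, df) in activations.items():
    grads[name] = df(z)
    print(f"{name:8s} values={np.round(f(z), 4)} gradients={np.round(grads[name], 4)}")
```

The sigmoid gradient never exceeds 0.25, and both sigmoid and tanh have gradients close to zero for inputs of large magnitude, while ReLU has gradient 1 for positive inputs and 0 otherwise.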

### Different learning rates

What is the learning rate in the `UnivariateApproximator`? What would happen if you increased or decreased the learning rate by a few orders of magnitude? Try it out, and discuss the results in our course Slack channel.
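
To build intuition before changing the real model, the effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on $f(w) = w^2$ with three learning rates. The function, starting point, step count, and learning rates are all hypothetical choices, unrelated to the settings inside `UnivariateApproximator`.

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 1.0) -> float:
    """Minimize f(w) = w**2 from w0 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # plain gradient-descent update
    return w

# Three learning rates spanning several orders of magnitude.
results = {lr: gradient_descent(lr) for lr in (1e-3, 1e-1, 1.5)}
for lr, w in results.items():
    print(f"lr={lr:g}: final w = {w:.3e}")
```

With the smallest rate, `w` has barely moved from its starting value after 50 steps; with the middle rate, it is very close to the minimum at 0; with the largest rate, each step overshoots and the iterates diverge.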

### Structure of the neural network

What would happen if you increase or decrease the number of layers in the neural network? In particular, what happens with only one hidden layer? Try it and find out. Can you get good results with only one hidden layer? If so, what do you observe about the number of nodes that layer needs?
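
A quick way to explore the width question without any training loop is the sketch below. It uses a shortcut rather than backpropagation: the hidden layer has random, fixed weights, and only the output layer is fitted, by least squares. The weight scales, the widths, and the sample grid are assumptions for illustration, and this is not how `UnivariateApproximator` trains.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.sin(2 * np.sin(2 * np.sin(10 * x))))

x = np.linspace(0.0, 1.0, 400)
y = target(x)

# One hidden layer of tanh units with random, fixed input weights and biases.
max_width = 256
w = rng.normal(scale=20.0, size=max_width)
b = rng.uniform(-20.0, 20.0, size=max_width)
hidden = np.tanh(np.outer(x, w) + b)            # shape (400, max_width)

errors = {}
for width in (4, 16, 64, 256):
    features = hidden[:, :width]                # use only the first `width` units
    # Fit only the output layer, by least squares (a shortcut, not backpropagation).
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    errors[width] = float(np.sqrt(np.mean((features @ coef - y) ** 2)))
    print(f"width={width:4d}  RMSE={errors[width]:.4f}")
```

Because each wider layer contains the units of the narrower ones, the fit on these sample points cannot get worse as the width grows, which hints at why a single hidden layer may need many nodes to match a deeper network.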
