# Neural Nets: the Universal Approximator

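It has been proven that a neural net with a single hidden layer, given enough neurons in that layer, can approximate any continuous function on a bounded interval to an arbitrary level of accuracy (this is the universal approximation theorem). Let's play with that a little!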
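For reference, this is what such a network computes when it has one input and one output. With $N$ hidden neurons, hidden weights $w_i$, hidden biases $b_i$, output weights $v_i$, output bias $c$, and an activation $\sigma$ such as the sigmoid, the prediction is (the symbol names are our own notation):

$$\hat{y}(x) = c + \sum_{i=1}^{N} v_i\,\sigma(w_i x + b_i), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Intuitively, a sigmoid with a large $w_i$ behaves like a smoothed step located at $x = -b_i/w_i$, and adding enough scaled steps together can trace out the shape of almost any function on a bounded interval.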

The following blocks create 4 datasets. Each has the same feature values, namely 200 evenly spaced points on the interval [-2, 2] (stored in variable `xs`), but very different target values (stored in variables `y1`, `y2`, `y3`, and `y4`). Those functions are then plotted.

In [None]:
import numpy as np
import plotly.graph_objects as go

In [None]:
xs=np.linspace(-2,2,num=200)

y1=2*np.cos(2*xs)       # cosine
y2=np.abs(xs)           # absolute value
y3=-.5*xs*xs+xs-2       # downward-opening parabola
y4=np.where(xs<0,-2,2)  # step: -2 for x<0, 2 for x>=0

In [None]:
p1=go.Scatter(x=xs,y=y1,name='cos')
p2=go.Scatter(x=xs,y=y2,name='abs')
p3=go.Scatter(x=xs,y=y3,name='parab')
p4=go.Scatter(x=xs,y=y4,name='step')
layout=go.Layout(height=500)
fig=go.Figure(data=[p1,p2,p3,p4],layout=layout)
fig.show()

If we wanted to do linear regression on all of these, we would need to hand-design a set of features for each function. Let's just use a neural net instead, starting with the cosine function. The next two blocks import the necessary libraries and then create PyTorch tensors called `data` and `targets`, which contain the x and y values we're trying to fit.

The `reshape(-1,1)` call needs a word of explanation. PyTorch expects input and output data to be matrices with one row per data point and one column per feature, so it wants 2D arrays even when each data point is a single number. In our toy examples, that means turning our 1D arrays into 2D ones: `xs.reshape(-1,1)` turns an array of shape `(200,)` into one of shape `(200,1)`, where the `-1` tells NumPy to infer that dimension from the array's size.
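A quick way to see what `reshape(-1,1)` does is to compare shapes before and after. This small sketch recreates `xs` so it runs on its own.

```python
import numpy as np

xs = np.linspace(-2, 2, num=200)   # same 200 evenly spaced points as above
print(xs.shape)                    # (200,): a flat 1D array

col = xs.reshape(-1, 1)            # -1 means "infer this dimension from the array size"
print(col.shape)                   # (200, 1): 200 samples, 1 feature each

# The values are unchanged; only the shape (how they are indexed) differs.
print(col[:3, 0], xs[:3])
```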

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
data=torch.tensor(xs.reshape(-1,1), dtype=torch.float32)
targets=torch.tensor(y1.reshape(-1,1),dtype=torch.float32)
print(f'data shape: {data.shape}, targets shape: {targets.shape}')

OK, now let's build a neural net. Using [the class notes as a guide](https://www.usna.edu/Users/cs/SD312/notes/10NNs/NeuronWeb.html), create a net that takes in a data point with 1 feature and consists of a single hidden layer of 3 neurons, followed by a Sigmoid activation function, followed by an output layer that outputs 1 value with no activation.
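Here is one possible sketch of such a network as an `nn.Module` subclass. The class name `SmallNet` and its `hidden` parameter are our own choices, and the class notes may organize the code differently.

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """1 input -> 3 hidden neurons -> Sigmoid -> 1 output (no output activation)."""

    def __init__(self, hidden=3):
        super().__init__()
        self.hidden = nn.Linear(1, hidden)   # 1 feature in, `hidden` neurons out
        self.act = nn.Sigmoid()              # nonlinearity applied to the hidden layer
        self.out = nn.Linear(hidden, 1)      # combine hidden activations into 1 value

    def forward(self, x):
        return self.out(self.act(self.hidden(x)))

model = SmallNet()
print(model)

# Quick shape check on a batch of 200 single-feature inputs.
x = torch.linspace(-2, 2, 200).reshape(-1, 1)
print(model(x).shape)   # torch.Size([200, 1])
```

An equivalent, more compact option is `nn.Sequential(nn.Linear(1, 3), nn.Sigmoid(), nn.Linear(3, 1))`. Either way the network has 10 trainable parameters: 3 weights and 3 biases in the hidden layer, plus 3 weights and 1 bias in the output layer.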

Create an instance of your network class and print it. What do you see?

Again using the class notes as a guide, train your network on this data using mean squared error (MSE) as the loss.
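A minimal full-batch training loop might look like the following. The optimizer (Adam), learning rate (0.01), epoch count (2000), and seed are assumptions, not values from the class notes; the notes may use a different optimizer or mini-batches.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the run repeatable

# Recreate the cosine data from above so this block stands alone.
xs = np.linspace(-2, 2, num=200)
data = torch.tensor(xs.reshape(-1, 1), dtype=torch.float32)
targets = torch.tensor((2 * np.cos(2 * xs)).reshape(-1, 1), dtype=torch.float32)

# Same architecture as described: 1 -> 3 (Sigmoid) -> 1.
model = nn.Sequential(nn.Linear(1, 3), nn.Sigmoid(), nn.Linear(3, 1))

loss_fn = nn.MSELoss()                               # mean squared error
optimizer = optim.Adam(model.parameters(), lr=0.01)  # assumption: Adam, lr=0.01

losses = []
for epoch in range(2000):
    optimizer.zero_grad()            # clear gradients from the previous step
    preds = model(data)              # forward pass on all 200 points at once
    loss = loss_fn(preds, targets)   # compare predictions with the true values
    loss.backward()                  # backpropagate to compute gradients
    optimizer.step()                 # update the weights
    losses.append(loss.item())
    if epoch % 500 == 0:
        print(f'epoch {epoch}: loss {loss.item():.4f}')

print(f'final loss: {losses[-1]:.4f}')
```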

Next, create a plot showing the data you're trying to fit alongside your predictions. To do this, first make a set of predictions, then call `.detach()` to get a tensor that is no longer connected to the computation graph PyTorch uses to track gradients. The result is a 200x1 tensor; call `.flatten()` to turn it into a size `(200,)` vector for your plot (add `.numpy()` if you want a plain NumPy array).

How close are your predictions to the true function?  Be sure to use a legend so we can tell which is the truth and which is your prediction.

You can probably get that pretty close, right?  Now train the same network on the absolute value target (`y2`).  How close are the predictions?  Show me a plot again.

Now show me the other two functions (the parabola and step functions).
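To avoid copying the training loop for each target, you could wrap it in a helper. The `fit` function below is hypothetical (our own name and defaults: Adam, learning rate 0.01, 2000 epochs) and trains a fresh 1 -> 3 -> 1 sigmoid network on each of the remaining targets.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

xs = np.linspace(-2, 2, num=200)
data = torch.tensor(xs.reshape(-1, 1), dtype=torch.float32)
y2 = np.abs(xs)
y3 = -.5 * xs * xs + xs - 2
y4 = np.where(xs < 0, -2, 2)

def fit(y, epochs=2000, lr=0.01, seed=0):
    """Train a fresh 1 -> 3 (Sigmoid) -> 1 net on target y; return (model, last MSE).

    Hypothetical helper: the name and defaults are ours, not from the class notes.
    """
    torch.manual_seed(seed)
    targets = torch.tensor(np.asarray(y).reshape(-1, 1), dtype=torch.float32)
    model = nn.Sequential(nn.Linear(1, 3), nn.Sigmoid(), nn.Linear(3, 1))
    loss_fn = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        optimizer.step()
    return model, loss.item()

results = {}
for name, y in [('abs', y2), ('parab', y3), ('step', y4)]:
    model, final_loss = fit(y)
    results[name] = final_loss
    print(f'{name}: final MSE {final_loss:.4f}')
```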

That last one, the step function, was probably the worst, right? There are a number of changes we can make:
- We can make the hidden layer wider by giving it more neurons.
- We can change the [activation function](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) (ReLU is especially popular because its gradient is cheap to calculate).
- We can add more hidden layers, each with its own activation function.
- We can train it longer.
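One way to experiment with width, depth, and activation without rewriting the class each time is a small builder function. `make_net` and its parameters are hypothetical; the defaults reproduce the original 1 -> 3 (Sigmoid) -> 1 network. Training longer just means raising the epoch count in your training loop.

```python
import torch
import torch.nn as nn

def make_net(width=3, depth=1, activation=nn.Sigmoid, out_features=1):
    """Build a 1-input MLP with `depth` hidden layers of `width` neurons each.

    Hypothetical helper: the names and defaults are ours, chosen so the
    default reproduces the 1 -> 3 (Sigmoid) -> 1 network.
    """
    layers = []
    in_features = 1
    for _ in range(depth):
        layers.append(nn.Linear(in_features, width))
        layers.append(activation())       # a fresh activation module per hidden layer
        in_features = width
    layers.append(nn.Linear(in_features, out_features))  # no output activation
    return nn.Sequential(*layers)

baseline = make_net()
wider = make_net(width=50)
relu_deep = make_net(width=32, depth=3, activation=nn.ReLU)

x = torch.linspace(-2, 2, 200).reshape(-1, 1)
for name, net in [('baseline', baseline), ('wider', wider), ('relu_deep', relu_deep)]:
    n_params = sum(p.numel() for p in net.parameters())
    print(f'{name}: {n_params} parameters, output shape {tuple(net(x).shape)}')
```

Parameter counts grow quickly with depth: the three-layer, 32-wide ReLU net has 2209 parameters, compared with 151 for the 50-wide single layer and 10 for the baseline.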

Try some things.  How closely can you fit it?  How low can you get the loss to go?

Now for the real trick: we can build a neural net that regresses on multiple functions at once. Take the neural net that worked best on the step function above and redefine it below so that, rather than outputting a single value, it outputs 4.
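As a sketch, suppose your best step-function network turned out to be a two-hidden-layer, 32-wide ReLU net (an assumption for illustration). Only the final `nn.Linear` changes: its `out_features` becomes 4.

```python
import torch
import torch.nn as nn

# Example only: assumes a 32-wide, two-hidden-layer ReLU net was the best fit.
multi = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),   # 4 outputs: cos, abs, parabola, step
)

x = torch.linspace(-2, 2, 200).reshape(-1, 1)
out = multi(x)
print(out.shape)   # torch.Size([200, 4])
```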

The cell below creates a 200x4 tensor `targets` in which each row holds the cosine, absolute value, parabola, and step function values for the corresponding data point.

In [None]:
alltargets=np.zeros((200,4))
alltargets[:,0]=y1
alltargets[:,1]=y2
alltargets[:,2]=y3
alltargets[:,3]=y4
targets=torch.tensor(alltargets,dtype=torch.float32)
print(f'targets shape is {targets.shape}')

Train your four-output network on this 200x4 target.

Make a plot showing the actual functions, and your approximations to those functions (maybe as dotted lines to make it readable).  Your graph will have 8 lines on it.