# Neural Networks Introduction

- <https://ml-visualized.com/chapter4/neural_network.html>

An artificial neural network is a family of algorithms that tries to recognize underlying relationships in a set of data through a process loosely inspired by the way the human brain operates. 
In this sense, neural networks refer to systems of neurons, either organic or artificial in nature. Neural networks can adapt to changing input, so the network generates the best possible 
result without needing to redesign the output criteria. In general, neural networks are useful to approximate non-linear, multi-dimensional functions, that is, functions for which we have no information about the functional form, only inputs and outputs.
Early models resembled biological neural networks (see the perceptron).

[Neural networks](https://en.wikipedia.org/wiki/Neural_network?useskin=vector) are usually arranged into layers, where information passes from one layer to the next: from the first one (the input layer), possibly through some intermediate layers (the hidden layers), 
to the last one (the output layer). For each neuron, the input is a linear combination of the outputs of the neurons in the previous layer that connect to it, and the neuron output is given by the so-called 
activation function. The strengths, or weights, of the network connections are adjusted during training so that the network output approaches the training data as closely as possible (according to some minimization procedure), usually following an algorithm like [back-propagation](https://en.wikipedia.org/wiki/Backpropagation?useskin=vector).  
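
The layered arrangement described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer sizes (3, 4, 1), the random weights and biases, the sigmoid activation, and the helper name `forward` are all hypothetical choices made for this example, not something fixed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Propagate x through a list of (weights, bias) pairs, layer by layer."""
    activation = x
    for weights, bias in layers:
        # Linear combination of the previous layer outputs, then the activation
        activation = sigmoid(weights @ activation + bias)
    return activation

# Hypothetical architecture: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]
layers = [
    (rng.uniform(-1, 1, size=(n_out, n_in)), rng.uniform(-1, 1, size=n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = np.array([1.0, 0.0, 1.0])
output = forward(x, layers)
print(output)
```

With untrained random weights the output carries no meaning yet; training consists of adjusting the entries of `layers` so that the output matches the data.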

<div style="text-align: center;">
    <img src="https://upload.wikimedia.org/wikipedia/commons/9/99/Neural_network_example.svg" alt="Image Description" width="600">
    <figcaption>From: "https://upload.wikimedia.org/wikipedia/commons/9/99/Neural_network_example.svg"</figcaption>
</div>

![Neural network with several layers (Credit http://alexlenail.me/NN-SVG/index.html)](fig/nn.svg)

The input for each neuron is computed as 
$$
z = b + \sum_i w_i X_i,
$$
where $b$ is the bias, $w_i$ are the weights, and $X_i$ are the outputs of the neurons in the previous layer that are connected to this one. Then, the output of this neuron is computed as
$$
X = f(z),
$$ 
where $f$ is some [activation function](https://en.wikipedia.org/wiki/Activation_function?useskin=vector). Examples of [activation functions](https://en.wikipedia.org/wiki/Activation_function?useskin=vector#Folding_activation_functions) are
- ReLU
- Tanh
- Sigmoid
- Linear
- ...
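
As a small worked example of the two formulas above, the following sketch evaluates a single neuron with several of the listed activation functions. The input values, weights, and bias are made-up numbers chosen only for illustration, and the helper name `neuron_output` is hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

ACTIVATIONS = {
    "relu": relu,
    "tanh": np.tanh,
    "sigmoid": sigmoid,
    "linear": lambda z: z,
}

def neuron_output(X_prev, w, b, f):
    """Compute X = f(z) with z = b + sum_i w_i X_i."""
    z = b + np.dot(w, X_prev)
    return f(z)

# Illustrative (made-up) values for the previous layer outputs, weights and bias
X_prev = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
b = 0.2
# Here z = 0.2 + 0.05 - 0.4 - 0.6 = -0.75

results = {name: neuron_output(X_prev, w, b, f) for name, f in ACTIVATIONS.items()}
for name, value in results.items():
    print(f"{name:8s} -> {value:.4f}")
```

The same pre-activation `z` gives quite different outputs depending on the activation: ReLU clips the negative value to zero, while the linear activation passes it through unchanged.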

Training progress is measured in *epochs*: one epoch is one full pass over the training data (basically, one iteration of the training loop in the simple example used later on). 



These are some recommended resources to get familiar with neural networks:
- A visual introduction: https://www.youtube.com/watch?v=UOvPeC8WOt8
- But what is a neural network? | Chapter 1, Deep learning: https://www.youtube.com/watch?v=aircAruvnKk
- Neural networks full playlist: https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- Visualization of a fully connected neural network, version 1: https://www.youtube.com/watch?v=Tsvxx-GGlTg
- Watching Neural Networks Learn: https://www.youtube.com/watch?v=TkwXa7Cvfr8
- Neural network Visualization: http://alexlenail.me/NN-SVG/index.html

## TensorFlow Playground
Now let's play a bit with a neural network: https://playground.tensorflow.org

![TensorFlow Playground (https://playground.tensorflow.org)](fig/tensorflow-playground.png)

## Some applications to basic sciences

### **Biology & Bioinformatics**

1.  Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. *Nature, 596*(7873), 583–589. [https://doi.org/10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2)

2.  Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., ... & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. *Nature Methods, 18*(10), 1196–1203. <https://doi.org/10.1038/s41592-021-01252-x>

3.  Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. *Science, 379*(6637), 1123-1130. [https://doi.org/10.1126/science.ade2574](https://doi.org/10.1126/science.ade2574)

4.  Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisen, H. E., ... & Baker, D. (2023). De novo design of protein structure and function with RFdiffusion. *Nature, 620*(7976), 1089–1100. [https://doi.org/10.1038/s41586-023-06415-8](https://doi.org/10.1038/s41586-023-06415-8)

### **Chemistry & Materials Science**

- Satorras, V. G., Hoogeboom, E., & Welling, M. (2021). E(n) equivariant graph neural networks. *Proceedings of the 38th International Conference on Machine Learning, PMLR 139*, 9323-9332. [https://proceedings.mlr.press/v139/satorras21a.html](https://proceedings.mlr.press/v139/satorras21a.html)

- Żurański, A. M., Martinez Alvarado, J. I., Shields, B. J., & Doyle, A. G. (2021). Predicting Reaction Yields via Supervised Learning. Accounts of Chemical Research, 54(8), 1856–1865. <https://doi.org/10.1021/acs.accounts.0c00770>

- Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G., & Cubuk, E. D. (2023). Scaling deep learning for materials discovery. Nature, 624(7990), 80–85. https://doi.org/10.1038/s41586-023-06735-9

- Kulik, H. J., & Tiwary, P. (2022). Artificial intelligence in computational materials science. MRS Bulletin, 47(9), 927–929. https://doi.org/10.1557/s43577-022-00431-1

### **Physics**

- Iten, R., Metger, T., Wilming, H., del Rio, L., & Renner, R. (2020). Discovering Physical Concepts with Neural Networks. Physical Review Letters, 124(1). https://doi.org/10.1103/physrevlett.124.010508

- Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., & Yang, L. (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6), 422–440. https://doi.org/10.1038/s42254-021-00314-5

- Kochkov, D., Smith, J. A., Alieva, A., Wang, Q., Brenner, M. P., & Hoyer, S. (2021). Machine learning–accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences, 118(21). https://doi.org/10.1073/pnas.2101784118

- Villaescusa-Navarro, F., Anglés-Alcázar, D., Genel, S., Spergel, D. N., S. Somerville, R., Dave, R., Pillepich, A., Hernquist, L., Nelson, D., Torrey, P., Narayanan, D., Li, Y., Philcox, O., La Torre, V., Maria Delgado, A., Ho, S., Hassan, S., Burkhart, B., Wadekar, D., … Bryan, G. L. (2021). The CAMELS Project: Cosmology and Astrophysics with Machine-learning Simulations. The Astrophysical Journal, 915(1), 71. https://doi.org/10.3847/1538-4357/abf7ba

- Hartnett, G. S., Parker, E., & Geist, E. (2018). Replica symmetry breaking in bipartite spin glasses and neural networks. Physical Review E, 98(2). https://doi.org/10.1103/physreve.98.022116

## Simple neural network: A perceptron
Based on https://www.youtube.com/watch?v=kft1AJ9WVDk

This is what we want to train our neural network with:

![Alt Text](fig/neuralnetwork/inputs.png)

And we want to predict the new output (try to guess the rule)

![Alt Text](fig/neuralnetwork/newoutput.png)

This is the neural network that we are going to use (you can also use http://alexlenail.me/NN-SVG/index.html)


In [None]:
from nnv import NNV

layersList = [
    {"title":"input", "units": 3, "color": "darkBlue"},
    {"title":"hidden 1\n(sigmoid)", "units": 1, "edges_color":"red", "edges_width":2},
    {"title":"output\n(sigmoid)", "units": 1,"color": "darkBlue"},
]

NNV(layersList).render()

To better understand the training, let's show the weights explicitly:
![weights](fig/neuralnetwork/weights.png)


Here $\phi$ is the activation function, for which there are several possible choices. We will use the sigmoid function
$$
\phi(z) = \dfrac{1}{1+\exp(-z)},
$$
where $z = \sum_i x_i w_i$.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_context('poster')
sns.set_style("whitegrid")

def sigmoid(x) :
    return 1.0/(1 + np.exp(-x))

xdata = np.linspace(-6.0, 6.0, 100)
plt.plot(xdata, sigmoid(xdata))
# Highlight x=0 and y=0 axes
plt.axhline(0, color='black', linestyle='--', linewidth=2.5)  # Horizontal line at y=0
plt.axvline(0, color='black', linestyle='--', linewidth=2.5)  # Vertical line at x=0
# Add labels to the x-axis and y-axis
plt.xlabel("x")
plt.ylabel(rf"sigmoid(x)")


### Basic Implementation
For this very basic neural network, we will:
- set the input or start of the algorithm:
    + Random weights $w_i$
    + Set the training inputs and outputs
- Create an iteration function to perform the training for `nsteps` (initially 1)

Then we just iterate once and check what happens.

In [None]:
import numpy as np

def sigmoid(x) :
    return 1.0/(1 + np.exp(-x))

def get_training_inputs():
    return np.array([[0, 0, 1],
                     [1, 1, 1], 
                     [1, 0, 1],
                     [0, 1, 1]])

def get_training_outputs():
    return np.array([0, 1, 1, 0]).reshape(4, 1)

def get_init_weights():
    """
    Initially, simply return random weights in [-1, 1)
    """
    return np.random.uniform(-1.0, 1.0, size=(3, 1))

def training_one_step(training_inputs, training_outputs, initial_weights):
    # Forward pass
    # iter only once
    input_layer = training_inputs
    outputs = sigmoid(np.dot(input_layer, initial_weights))
    return outputs

In [None]:
np.random.seed(1) # what happens if you comment this?
inputs_t = get_training_inputs()
outputs_t = get_training_outputs()
weights = get_init_weights()
print(inputs_t)
print(outputs_t)
print(weights)

In [None]:
outputs = training_one_step(inputs_t, outputs_t, weights)
print("Training outputs:")
print(outputs_t)
print("Results after one step training:")
print(outputs)

### Improving the training
These results are not optimal and depend strongly on the initial weights. Also, we are not yet comparing with the expected output for the training data. We are now going to include that comparison and add correction terms to the weights, that is, we will use back-propagation. Our algorithm is now:
- Take each input from the training data.
- Compute the error, i.e. the difference between the output and the expected one, `output - expectedoutput`, $\Delta = \hat y - y$. 
- According to the error, adjust the weights.
- Repeat this many times, hopefully reaching convergence, so that the network can also be applied to new cases not used during training.

But how to adjust the weights? There are several techniques based on the actual error $\Delta$. 

First, we will define a cost/loss function as the usual squared error (the MSE for a single output, with a conventional factor of $1/2$)
\begin{equation}
COST = \frac{1}{2} (\hat y - y)^2,
\end{equation}
where $\hat y$ is the predicted value and $y$ is the expected one. Notice that there is no sum over outputs because the network has only one output. 

Second, we will apply gradient descent to our problem in order to improve the weights for our data. This means that we will propagate the errors back into the weights:
\begin{equation}
w_i' = w_i - \alpha \frac{\partial COST}{\partial w_i},
\end{equation}
where $\alpha$ is the learning rate. Taking into account that 
\begin{equation}
\hat y = \phi(z) = \frac{1}{1 + \exp(-z)},
\end{equation}
and 
\begin{equation}
z = \sum w_i x_i,
\end{equation}
then, by using the chain rule and the sigmoid identity $\phi'(z) = \phi(z)(1-\phi(z))$, we have
\begin{equation}
\frac{\partial COST}{\partial w_i} = \frac{\partial COST}{\partial \hat y} \frac{\partial \hat y}{\partial z} \frac{\partial z}{\partial w_i} = (\hat y - y)\,\phi(z)\,(1-\phi(z))\,x_i. 
\end{equation}
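
One way to gain confidence in the chain-rule result above is to compare it against a finite-difference estimate of the gradient. The sketch below does this for a single training sample; the specific weights, input, and target are made-up values, and the function names (`cost`, `analytic_gradient`, `numerical_gradient`) are hypothetical helpers introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, x, y):
    """COST = 1/2 (y_hat - y)^2 with y_hat = sigmoid(sum_i w_i x_i)."""
    y_hat = sigmoid(np.dot(w, x))
    return 0.5 * (y_hat - y) ** 2

def analytic_gradient(w, x, y):
    """Chain-rule result: (y_hat - y) * phi(z) * (1 - phi(z)) * x_i."""
    y_hat = sigmoid(np.dot(w, x))
    return (y_hat - y) * y_hat * (1.0 - y_hat) * x

def numerical_gradient(w, x, y, eps=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (cost(w + step, x, y) - cost(w - step, x, y)) / (2 * eps)
    return grad

# Made-up sample: weights, one input row, and its expected output
w = np.array([0.3, -0.2, 0.5])
x = np.array([1.0, 0.0, 1.0])
y = 1.0

g_analytic = analytic_gradient(w, x, y)
g_numeric = numerical_gradient(w, x, y)
max_diff = np.max(np.abs(g_analytic - g_numeric))
print("analytic :", g_analytic)
print("numerical:", g_numeric)
print("max abs difference:", max_diff)
```

Note that the gradient component for an input $x_i = 0$ vanishes, so the corresponding weight is not updated by that sample.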


In [None]:
def sigmoid_prime(x):
    return sigmoid(x)*(1-sigmoid(x))

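def train_nn(training_inputs, training_outputs, initial_weights, niter, errors_data, alpha = 1.0):
    """
    training_inputs: array of shape (nsamples, ninputs) with the training inputs
    training_outputs: array of shape (nsamples, 1) with the expected outputs
    initial_weights: array of shape (ninputs, 1) with the starting weights
    niter: number of training iterations (epochs)
    errors_data: output - stores the errors per iteration, shape (niter, nsamples)
    alpha: learning rate
    Returns the outputs computed in the last iteration and the trained weights
    """
    w = initial_weights
    for ii in range(niter):
        # Forward propagation
        input_layer = training_inputs
        z = np.dot(input_layer, w)
        outputs = sigmoid(z)
        # Backward propagation
        errors = outputs - training_outputs
        # sigmoid_prime takes the pre-activation z, not the already activated outputs
        deltaw = errors*sigmoid_prime(z)
        deltaw = np.dot(input_layer.T, deltaw)
        w = w - alpha*deltaw
        # Save errors for plotting later
        errors_data[ii] = errors.ravel()
    return outputs, w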

In [None]:
np.random.seed(1) # what happens if you comment this?
inputs_t = get_training_inputs()
outputs_t = get_training_outputs()
weights = get_init_weights()

In [None]:
NITER = 50000
errors = np.zeros((NITER, 4))
outputs, weights = train_nn(inputs_t, outputs_t, weights, NITER, errors, alpha=0.9)
print("Training outputs:")
print(outputs_t)
print("Results after training:")
print(outputs)
print(weights)


In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 5))
ax[0].plot(range(NITER), errors)
ax[0].set_xlabel("Epoch")
ax[0].set_ylabel("Errors")
ax[1].loglog(range(NITER), np.abs(errors))
ax[1].set_xlabel("Epoch")

It seems that our network is very well trained. But how does it perform with a new input? Let's check with `[1, 0, 0]`.


In [None]:
#print(weights)
#print(weights.shape)
input_new = np.array([1, 0, 0]).reshape(3, 1)
#print(input_new)
#print(input_new.shape)
#print(np.sum(weights*input_new))
print(sigmoid(np.sum(weights*input_new)))

The result is basically one, as expected, since in the training data the output simply copies the first input.

There are more topics related to this that we have not covered, like more layers, more neurons per hidden layer, a bias term in the input to the activation function, and many other details, but hopefully you now see how a neural network works at its core.
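
As a broader check than the single new input tested above, the following self-contained sketch retrains the same three-weight network and evaluates it on all eight binary inputs. It is a minimal sketch under assumptions: the number of iterations (10000), the learning rate (1.0), and the use of `np.random.default_rng` for the initial weights are choices made for this example, so the exact numbers will differ from the cells above.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same training data as in the text
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Assumed hyperparameters for this sketch
rng = np.random.default_rng(1)
w = rng.uniform(-1.0, 1.0, size=(3, 1))
alpha = 1.0
for _ in range(10000):
    y_hat = sigmoid(X @ w)
    # Gradient of the squared-error cost, summed over the training samples
    grad = X.T @ ((y_hat - y) * y_hat * (1.0 - y_hat))
    w = w - alpha * grad

# Evaluate on every possible binary input, including unseen ones
all_inputs = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
predictions = sigmoid(all_inputs @ w).ravel()
for row, p in zip(all_inputs.astype(int), predictions):
    print(row, f"{p:.3f}")
```

The input `[0, 0, 0]` always yields exactly 0.5 here, whatever the weights, because without a bias term the weighted sum is zero. This is one concrete reason why a bias is usually added.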

Recommended lectures:
- 3blue1brown Neural Networks: https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- Neural networks from scratch: https://www.youtube.com/watch?v=9RN2Wr8xvro
- Backprop basic: https://www.youtube.com/watch?v=wqPt3qjB6uA
- https://www.youtube.com/watch?v=khUVIZ3MON8&t=0s
