# Introduction to Neural Networks

### Brief historical context

* 1958: Frank Rosenblatt proposes the perceptron as a bio-inspired [connectionist](https://en.wikipedia.org/wiki/Connectionism) model. Neural networks are considered connectionist models because they are inspired by biological neurons: information is represented by a network of processing units, and learning is encoded in the strength of their connections;

* 1969: Minsky and Papert (MIT) present limitations of the original perceptron, such as its inability to learn functions that are not linearly separable (for example, XOR);

* 1970-2010: neural network winter. Artificial intelligence research focused much more on specialized models with theoretical foundations and mathematical guarantees, such as SVMs;

* Later, the availability of enormous amounts of training data and the use of GPUs for parallel processing made neural networks and deep learning skyrocket in research circles. Neural networks could now produce models with lower and lower error, resulting in a new spring of excitement in the field.


### Machine Learning (ML) vs Deep Learning (DL)

Machine Learning is a general field that includes Deep Learning. In ML, algorithms learn a function $f$ from a space of possible functions and a set of training data. "Shallow" methods infer a single function $f$ that relates the input directly to the output of the model. "Deep" methods are organized as a sequence of hierarchical steps, each feeding its output to the next, so that the model is a composition of functions.
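To make the distinction concrete, the sketch below contrasts a single linear map with a composition of two maps separated by a nonlinearity. The layer sizes, the random weights, and the helper names `shallow`, `f1`, `f2`, and `deep` are illustrative assumptions, not part of the original notes.

```python
import torch

torch.manual_seed(0)  # reproducible illustrative weights

x = torch.randn(4)  # one input with 4 features (arbitrary size)

# "Shallow": a single function maps the input directly to the output.
W = torch.randn(4, 2)

def shallow(x):
    return x @ W

# "Deep": the output is a composition f2(f1(x)) of simpler steps.
W1 = torch.randn(4, 8)
W2 = torch.randn(8, 2)

def f1(x):
    return torch.relu(x @ W1)  # intermediate (hidden) representation

def f2(h):
    return h @ W2

def deep(x):
    return f2(f1(x))

print(shallow(x).shape, deep(x).shape)
```

Both functions map 4 inputs to 2 outputs; the difference is that `deep` builds its result through an intermediate representation.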

# Introduction to PyTorch

In [1]:
import numpy as np
import torch

### Tensors:

Tensors are multidimensional arrays that generalize vectors and matrices and are widely used in neural network applications. PyTorch is a machine learning framework, based on the Torch library, for building deep learning models, especially neural networks. Its basic data structure is the tensor.
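As a quick illustration of what "generalized" means here, the following sketch creates tensors with 0, 1, 2, and 3 dimensions. The variable names and shapes are illustrative choices.

```python
import torch

scalar = torch.tensor(3.0)          # 0-D tensor (a single number)
vector = torch.tensor([1.0, 2.0])   # 1-D tensor
matrix = torch.ones(2, 3)           # 2-D tensor
batch = torch.zeros(4, 2, 3)        # 3-D tensor, e.g. a batch of 4 matrices

for t in (scalar, vector, matrix, batch):
    print(t.ndim, tuple(t.shape))
```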

Basic usage:
* Generation;
* Size;
* Reshaping;
* Indexing and slicing.

In [9]:
# Generation
A = torch.arange(1,10)
B = torch.ones([2,5])

A,B

(tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]),
 tensor([[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]))

In [18]:
# Size
A.shape, B.shape[1]

(torch.Size([9]), 5)

In [19]:
A.numel()

9

In [20]:
B.numel()

10

In [22]:
# Reshaping
A.reshape([3,3])

# Note: this is not an in-place operation; A itself keeps its original shape

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
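Two details of `reshape` are worth seeing in action: a dimension given as `-1` is inferred from the others, and for a contiguous tensor the result is a view that shares memory with the original. A minimal sketch, where the name `M` is an illustrative choice:

```python
import torch

A = torch.arange(1, 10)

M = A.reshape(3, -1)  # -1 asks PyTorch to infer the remaining dimension
print(M.shape)        # torch.Size([3, 3])
print(A.shape)        # torch.Size([9]); A itself is unchanged

# A is contiguous, so reshape returned a view that shares A's memory:
M[0, 0] = 100
print(A[0])           # tensor(100)
```

If an independent copy is needed, calling `.clone()` on the reshaped tensor avoids this sharing.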

In [23]:
# Indexing and slicing
A[0]

tensor(1)

In [24]:
A[-1]

tensor(9)

In [31]:
A[3:]

tensor([4, 5, 6, 7, 8, 9])

In [32]:
A[:-3]

tensor([1, 2, 3, 4, 5, 6])

Basic operations and functions:
* Unary operations;
* Binary element by element operations;
* Concatenation;
* Broadcasting.

In [33]:
C = torch.arange(1,10, dtype=torch.float32)

In [36]:
# Unary operations
print(torch.log(C))
print(torch.exp(C))

tensor([0.0000, 0.6931, 1.0986, 1.3863, 1.6094, 1.7918, 1.9459, 2.0794, 2.1972])
tensor([2.7183e+00, 7.3891e+00, 2.0086e+01, 5.4598e+01, 1.4841e+02, 4.0343e+02,
        1.0966e+03, 2.9810e+03, 8.1031e+03])


In [41]:
D = torch.ones(9)*3
D

tensor([3., 3., 3., 3., 3., 3., 3., 3., 3.])

In [46]:
# Binary element-by-element operations (operands must have the same shape or be broadcastable)
C + D, C + D, C**D, C / D

(tensor([ 4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]),
 tensor([ 4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]),
 tensor([  1.,   8.,  27.,  64., 125., 216., 343., 512., 729.]),
 tensor([0.3333, 0.6667, 1.0000, 1.3333, 1.6667, 2.0000, 2.3333, 2.6667, 3.0000]))

In [47]:
# Logical operations

D > 50

tensor([False, False, False, False, False, False, False, False, False])

In [48]:
D > C

tensor([ True,  True, False, False, False, False, False, False, False])

In [49]:
C[C > D]

tensor([4., 5., 6., 7., 8., 9.])
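Concatenation, listed among the basic operations above, joins tensors along an existing dimension. A minimal sketch with `torch.cat`, using illustrative tensors `X` and `Y` that are not part of the original notebook:

```python
import torch

X = torch.arange(6, dtype=torch.float32).reshape(2, 3)
Y = torch.ones(2, 3)

rows = torch.cat((X, Y), dim=0)  # stack along rows    -> shape (4, 3)
cols = torch.cat((X, Y), dim=1)  # stack along columns -> shape (2, 6)

print(rows.shape, cols.shape)
```

All dimensions other than the one being concatenated must match.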

In [50]:
# Broadcasting: handles operations between tensors of different size
x = torch.arange(4).reshape(4,1)
y = torch.arange(2).reshape(1,2)

x,y

(tensor([[0],
         [1],
         [2],
         [3]]),
 tensor([[0, 1]]))

In [52]:
x + y # x is expanded along the columns and y along the rows via broadcasting

tensor([[0, 1],
        [1, 2],
        [2, 3],
        [3, 4]])
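The implicit expansion can be made explicit with `torch.broadcast_tensors`, which returns both operands at their common shape. The sketch below also shows a pair of shapes that cannot be broadcast; the names `xb` and `yb` are illustrative.

```python
import torch

x = torch.arange(4).reshape(4, 1)
y = torch.arange(2).reshape(1, 2)

# Both operands are expanded to the common shape (4, 2).
xb, yb = torch.broadcast_tensors(x, y)
print(xb.shape, yb.shape)
print(torch.equal(x + y, xb + yb))  # True

# Shapes are compatible when, aligned from the right, each pair of sizes
# is equal or one of them is 1. Here 3 and 2 do not match:
try:
    torch.ones(4, 3) + torch.ones(2)
except RuntimeError as err:
    print("not broadcastable:", type(err).__name__)
```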

Tensor copies:
* Shallow copy;
* Deep copy.

In [60]:
C = torch.arange(1,10, dtype=torch.float32)

id_before = id(C)
C = C + D # Does not reuse memory/creates a new memory space

id(C) == id_before

False

In [61]:
C = torch.arange(1,10, dtype=torch.float32)

id_before = id(C)
C[:] = C + D # Reuses the existing memory (in-place assignment)

id(C) == id_before

True

In [62]:
id_before = id(C)

D = C # Shallow copy: no data is copied, D is another reference to the same tensor as C

id(D) == id_before

True

In [63]:
id_before = id(C)

D = C.clone() # Deep copy: D is a clone of C

id(D) == id_before

False
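Python's `id` only compares object identity. A complementary check, sketched below under the assumption that what matters is whether the underlying data is shared, uses `Tensor.data_ptr()`, which returns the address of the first element. The names `alias`, `clone`, and `view` are illustrative.

```python
import torch

C = torch.arange(1, 4, dtype=torch.float32)

alias = C          # same Python object, same memory
clone = C.clone()  # new tensor with its own memory
view = C[:2]       # new Python object, but it shares C's memory

C[0] = 100.0       # in-place write
print(alias[0], clone[0], view[0])  # tensor(100.) tensor(1.) tensor(100.)

print(id(view) == id(C))                 # False: different Python objects
print(view.data_ptr() == C.data_ptr())   # True: same underlying data
print(clone.data_ptr() == C.data_ptr())  # False: independent copy
```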

Vectors, matrices and array operations

* Definition;
* Tensor arithmetic;
* Reduction;
* Dot product;
* Matrix-vector operations;
* Matrix-matrix operations;
* Norm.

In [64]:
x = torch.arange(5, dtype=torch.float64)
A = torch.arange(25, dtype=torch.float64).reshape((5,5))
x, A

(tensor([0., 1., 2., 3., 4.], dtype=torch.float64),
 tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.]], dtype=torch.float64))

In [65]:
A.T

tensor([[ 0.,  5., 10., 15., 20.],
        [ 1.,  6., 11., 16., 21.],
        [ 2.,  7., 12., 17., 22.],
        [ 3.,  8., 13., 18., 23.],
        [ 4.,  9., 14., 19., 24.]], dtype=torch.float64)

In [67]:
# Reduction: operations that reduce the number of dimensions of a tensor.
A.sum(), A.sum(axis=1)

(tensor(300., dtype=torch.float64),
 tensor([ 10.,  35.,  60.,  85., 110.], dtype=torch.float64))

In [68]:
A.mean(axis=0)

tensor([10., 11., 12., 13., 14.], dtype=torch.float64)
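A reduction normally drops the reduced dimension. Passing `keepdim=True` keeps it with size 1, which lets the result broadcast against the original tensor. A minimal sketch, in which the row normalization is an illustrative use and not part of the original notebook:

```python
import torch

A = torch.arange(25, dtype=torch.float64).reshape(5, 5)

row_sums = A.sum(dim=1)                      # shape (5,)
row_sums_kept = A.sum(dim=1, keepdim=True)   # shape (5, 1)
print(row_sums.shape, row_sums_kept.shape)

# The (5, 1) result broadcasts across the columns of A.
# Every row sum of this A is nonzero, so the division is safe.
normalized = A / row_sums_kept
print(normalized.sum(dim=1))  # each row now sums to 1 (up to rounding)
```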

In [69]:
# Dot product
a = torch.arange(1,6, dtype=torch.float32)
b = torch.ones(5)*3
a,b
    

(tensor([1., 2., 3., 4., 5.]), tensor([3., 3., 3., 3., 3.]))

In [70]:
torch.dot(a,b)

tensor(45.)
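The dot product is the sum of the element-wise products, which can be checked directly. The name `manual` is an illustrative choice.

```python
import torch

a = torch.arange(1, 6, dtype=torch.float32)
b = torch.ones(5) * 3

manual = (a * b).sum()           # 3*1 + 3*2 + 3*3 + 3*4 + 3*5
print(manual, torch.dot(a, b))   # tensor(45.) tensor(45.)
```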

In [73]:
# Matrix operations

x = torch.arange(5, dtype=torch.float64)
A = torch.arange(25, dtype=torch.float64).reshape((5,5))

A, x, torch.mv(A,x)

(tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.]], dtype=torch.float64),
 tensor([0., 1., 2., 3., 4.], dtype=torch.float64),
 tensor([ 30.,  80., 130., 180., 230.], dtype=torch.float64))

In [75]:
A, A, torch.mm(A,A)

(tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.]], dtype=torch.float64),
 tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.]], dtype=torch.float64),
 tensor([[ 150.,  160.,  170.,  180.,  190.],
         [ 400.,  435.,  470.,  505.,  540.],
         [ 650.,  710.,  770.,  830.,  890.],
         [ 900.,  985., 1070., 1155., 1240.],
         [1150., 1260., 1370., 1480., 1590.]], dtype=torch.float64))

In [76]:
A@x # shortcut for torch.mv(A, x)

tensor([ 30.,  80., 130., 180., 230.], dtype=torch.float64)

In [79]:
# Vector norm
x, torch.norm(x) # euclidean norm

(tensor([0., 1., 2., 3., 4.], dtype=torch.float64),
 tensor(5.4772, dtype=torch.float64))
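The Euclidean norm is the square root of the sum of squared entries. The sketch below checks this by hand and also computes the L1 norm with `torch.linalg.norm`, the interface PyTorch currently recommends over `torch.norm`. The names `manual`, `l2`, and `l1` are illustrative.

```python
import torch

x = torch.arange(5, dtype=torch.float64)

manual = torch.sqrt((x ** 2).sum())  # sqrt(0 + 1 + 4 + 9 + 16) = sqrt(30)
l2 = torch.linalg.norm(x)            # Euclidean (L2) norm, the default
l1 = torch.linalg.norm(x, ord=1)     # L1 norm: sum of absolute values

print(manual, l2, l1)
```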


## Example

In [84]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

B = torch.ones(3, 2)

C = torch.arange(3, dtype=torch.float32)

In [85]:
A, C, torch.mv(A,C)

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([0., 1., 2.]),
 tensor([ 5., 14.]))

In [88]:
A, B, torch.mm(A,B)

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[1., 1.],
         [1., 1.],
         [1., 1.]]),
 tensor([[ 3.,  3.],
         [12., 12.]]))

In [89]:
A@B

tensor([[ 3.,  3.],
        [12., 12.]])

In [1]:
import torch
import numpy as np

In [4]:
w = torch.tensor([-1, 0, 0.5, -0.2, 0.1, 0.0], dtype=torch.float32).reshape((3,2))
b = torch.tensor([0.5, 0.5], dtype=torch.float32)
x = torch.tensor([1, 2, 3], dtype=torch.float32)

w, b, x

(tensor([[-1.0000,  0.0000],
         [ 0.5000, -0.2000],
         [ 0.1000,  0.0000]]),
 tensor([0.5000, 0.5000]),
 tensor([1., 2., 3.]))

In [5]:
x@w + b

tensor([0.8000, 0.1000])
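The expression `x@w + b` is the affine map computed by a fully connected layer. As a sketch, the same result can be reproduced with `torch.nn.Linear` by copying the weights in by hand; the `layer` object is an illustration and not part of the original notebook.

```python
import torch

w = torch.tensor([-1, 0, 0.5, -0.2, 0.1, 0.0], dtype=torch.float32).reshape((3, 2))
b = torch.tensor([0.5, 0.5], dtype=torch.float32)
x = torch.tensor([1, 2, 3], dtype=torch.float32)

layer = torch.nn.Linear(in_features=3, out_features=2)
with torch.no_grad():
    # nn.Linear stores its weight as (out_features, in_features), hence w.T
    layer.weight.copy_(w.T)
    layer.bias.copy_(b)

print(torch.allclose(layer(x), x @ w + b))  # True
```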

In [19]:
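def q_loss(y_hat, y):
    # Quadratic (squared-error) loss, summed over all entries
    l = (y.reshape(y_hat.shape) - y_hat)**2
    return l.sum()

def cross_entropy(y_hat, y):
    # Cross entropy between the target y and the predicted probabilities y_hat
    # (torch.log keeps the computation in PyTorch; y_hat must be strictly positive)
    l = -y.reshape(y_hat.shape)*torch.log(y_hat)
    return l.sum()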

In [20]:
y_hat = torch.tensor([0.2, 0.1, 0.2, 0.1 , 0.1, 0.3], dtype=torch.float32)
y = torch.tensor([0, 0, 0, 0, 0, 1], dtype=torch.float32)

print('Quadratic loss: ', q_loss(y_hat, y))
print('Cross entropy: ', cross_entropy(y_hat, y))

Quadratic loss:  tensor(0.6000)
Cross entropy:  tensor(1.2040)


In [21]:
y_hat = torch.tensor([1/6, 1/6, 1/6, 1/6 , 1/6, 1/6], dtype=torch.float32)
y = torch.tensor([0, 0, 0, 0, 0, 1], dtype=torch.float32)

print(y_hat, y)

print('Quadratic loss: ', q_loss(y_hat, y))
print('Cross entropy: ', cross_entropy(y_hat, y))

tensor([0.1667, 0.1667, 0.1667, 0.1667, 0.1667, 0.1667]) tensor([0., 0., 0., 0., 0., 1.])
Quadratic loss:  tensor(0.8333)
Cross entropy:  tensor(1.7918)
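With a one-hot target, the cross entropy reduces to the negative log of the probability assigned to the true class, which is why the uniform prediction gives $\log 6 \approx 1.7918$. The sketch below checks this for the first example and compares it with `torch.nn.functional.nll_loss`, assuming `y_hat` already holds probabilities. The names `true_class`, `manual`, and `builtin` are illustrative.

```python
import torch
import torch.nn.functional as F

y_hat = torch.tensor([0.2, 0.1, 0.2, 0.1, 0.1, 0.3], dtype=torch.float32)
y = torch.tensor([0, 0, 0, 0, 0, 1], dtype=torch.float32)

true_class = y.argmax()                  # index of the 1 in the one-hot target
manual = -torch.log(y_hat[true_class])   # -log(0.3)

# F.nll_loss expects log-probabilities with a batch dimension
# and integer class indices as targets.
builtin = F.nll_loss(torch.log(y_hat).unsqueeze(0), true_class.unsqueeze(0))

print(manual, builtin)  # both approximately 1.2040
```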
