# Session 2 - 19/01

## Recap on Predictive Modelling:

- Unlike econometrics, the objective is to predict rather than to explain;

- Hence the importance of having a held-out test set to assess the model's accuracy and precision on unseen data;

- A key step in predictive modelling is finding the right level of model complexity: if the model is too simple, it fails to capture important information (underfitting), while if it is too complex, it lets the noise in the data influence it too much (overfitting);

    - A good way to control for this issue is through regularization (i.e. imposing a penalty on the magnitude/number of the parameters);

    - A good way to check whether the balance is right is through the bias-variance trade-off, which in practice means comparing performance on the training and test samples (a large gap between the two signals overfitting)

        - Usually, improving test-sample performance comes at the cost of a higher error in the training sample
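The regularization and bias-variance points above can be illustrated with ridge regression, i.e. an L2 penalty $\lambda \lVert \beta \rVert^2$ on the coefficients. The sketch below is a minimal illustration under assumptions of our own: the data-generating process (a sine curve plus Gaussian noise), the deliberately over-flexible degree-12 polynomial, and the penalty grid are all made up. Training error can only rise as the penalty grows, while test error typically falls at first and then rises again once the model becomes too constrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed data-generating process: a smooth signal plus Gaussian noise
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

def poly_features(x, degree):
    # Columns 1, x, x^2, ..., x^degree (the first column is the constant)
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Ridge regression: minimise ||y - X b||^2 + lam * ||b||^2.
    # Solved as an augmented least-squares problem for numerical stability;
    # for simplicity the intercept is penalised as well.
    k = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(k)])
    y_aug = np.concatenate([y, np.zeros(k)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)
degree = 12  # deliberately over-flexible for only 30 training points
X_train, X_test = poly_features(x_train, degree), poly_features(x_test, degree)

results = {}
for lam in [0.0, 1e-6, 1e-3, 1e-1, 10.0]:
    beta = fit_ridge(X_train, y_train, lam)
    results[lam] = (mse(X_train, y_train, beta), mse(X_test, y_test, beta))
    print(f"lambda={lam:g}  train MSE={results[lam][0]:.3f}  test MSE={results[lam][1]:.3f}")
```

With $\lambda = 0$ the model fits the training sample closely but generalises poorly; the gap between the two columns is the overfitting signal described above.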

## Neural Networks - Multi-layer Perceptron

- Neural networks are a family of flexible predictive models composed of multiple interconnected neurons;

- State-of-the-art performance on problems involving high-dimensional (text, image) and unstructured data (e.g. images, stored as 3D arrays of Height × Width × Colour channels);

- A network is composed of many simple non-linear functions (units): $y_{iu} = f_u(x_i; \beta_u) = \sigma(x_i^\top \beta_u)$;

- where $x_i$ and $\beta_u$ are $(k+1)$-dimensional vectors (the $k$ inputs plus a constant 1 for the intercept) and $\sigma$ is a non-linear activation function;

- In a network, a unit's output is not the output of the model but an intermediate step, i.e. a transformation of the input;

- **Hidden Layers** $f^{(1)}, \dots, f^{(L-1)}$ recursively compute intermediate transformations of the input;

- **Output Layer** $f^{(L)}$ produces the model's prediction from the input it receives from $f^{(L-1)}$;

- These units can be seen as learning the functional form from the data, i.e. the non-linearities and interactions that would otherwise have to be specified by hand;

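- Activation functions are chosen to ease optimisation: e.g. they should be differentiable, monotonic, zero-centred and cheap to compute

    - when the derivative has a narrow range (the sigmoid's derivative is at most 0.25), gradients can vanish as they are multiplied back through many layers, leaving the early layers barely trained and the network dysfunctional

    - ReLU replaced the sigmoid as the default choice: it is cheaper to compute and its derivative is 1 for all positive inputs, so gradients do not shrink as they pass through active units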

- Even if the network is very large, overfitting need not be a concern as long as it is properly trained
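The unit and layer definitions above can be sketched as a plain NumPy forward pass. This is a minimal illustration, not a trained model: the input values, coefficients, and the architecture (3 inputs, two hidden layers of 4 units, 1 output) are hypothetical, the weights are random, and the linear output layer assumes a regression target. A single unit uses the sigmoid $\sigma$, as in the formula for $y_{iu}$, while the hidden layers use ReLU, the more common modern choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# A single unit: y_iu = sigma(x_i' beta_u), with the constant 1 as the first entry of x_i
x_i = np.array([1.0, 0.5, -0.2, 0.3])     # constant + k = 3 inputs (made-up values)
beta_u = np.array([0.1, -0.4, 0.2, 0.7])  # made-up coefficients
y_iu = sigmoid(x_i @ beta_u)

def add_constant(h):
    # Prepend a column of ones so the first row of each weight matrix is the intercept
    return np.hstack([np.ones((h.shape[0], 1)), h])

def mlp_forward(X, weights, activation=relu):
    # Hidden layers f^(1), ..., f^(L-1): each column of B is one unit's beta_u
    h = X
    for B in weights[:-1]:
        h = activation(add_constant(h) @ B)
    # Output layer f^(L): linear output, as for a regression target (an assumption)
    return add_constant(h) @ weights[-1]

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]  # hypothetical: 3 inputs, two hidden layers of 4 units, 1 output
weights = [rng.normal(0.0, 0.5, size=(d_in + 1, d_out))
           for d_in, d_out in zip(sizes[:-1], sizes[1:])]
X = rng.normal(size=(5, 3))  # 5 observations
y_hat = mlp_forward(X, weights)
print(y_iu, y_hat.shape)  # y_hat.shape is (5, 1)
```

Each hidden layer is just many units evaluated in parallel, which is why a layer reduces to one matrix product followed by an element-wise activation.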


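```python
import torch

# Report the PyTorch version and whether Apple's Metal (MPS) backend can be used
print("PyTorch:", torch.__version__)
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

# Fall back to the CPU when MPS is not available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Quick smoke test: a large matrix product allocated directly on the chosen device
x = torch.randn(10000, 10000, device=device)
y = x @ x
print("Device:", x.device)
```

```
PyTorch: 2.9.1
MPS built: True
MPS available: True
Device: mps:0
```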
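The vanishing-gradient remark about sigmoid versus ReLU can be checked with PyTorch's autograd. The sketch below builds two otherwise identical deep MLPs and compares the gradient norm that reaches the first layer. The settings are assumptions chosen for illustration: 20 hidden layers of width 64, He (Kaiming) initialisation with zero biases for both networks, random inputs, and a dummy squared-output loss. It runs on the CPU, so it does not depend on MPS being available.

```python
import torch
from torch import nn

torch.manual_seed(0)

def first_layer_grad_norm(activation_cls, depth=20, width=64):
    # Deep MLP: `depth` hidden layers, each a Linear followed by the activation
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), activation_cls()]
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    # Same initialisation for both activations (He/Kaiming normal, zero biases),
    # so the only difference between the two runs is the activation function
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            nn.init.zeros_(m.bias)

    x = torch.randn(256, width)
    loss = net(x).pow(2).mean()  # arbitrary scalar loss, only used to get gradients
    loss.backward()
    # Norm of the gradient reaching the first (input-side) layer's weights
    return net[0].weight.grad.norm().item()

grad_norms = {act.__name__: first_layer_grad_norm(act) for act in (nn.Sigmoid, nn.ReLU)}
for name, g in grad_norms.items():
    print(f"{name}: first-layer gradient norm = {g:.3e}")
```

With the sigmoid, each layer multiplies the backpropagated signal by a derivative of at most 0.25, so after 20 layers the first-layer gradient is many orders of magnitude smaller than with ReLU, whose derivative is 1 on active units.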
