# nmi | spring 2024
## special lecture 04 : neural networks, briefly


### 3 advanced topics, briefly


#### 3.1 other architectures


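* multi-layer perceptrons (MLP). stacked fully connected layers of perceptrons; a common kind of FNN.

* feedforward neural networks (FNN). information flows one way, from input to output, with no loops. for simple predictions, classification.

* convolutional neural networks (CNN). specialized convolutional layers that detect spatial patterns (edges, textures, shapes) in images; used for image classification.

* recurrent neural networks (RNN). a loop feeds each step's hidden state into the next step, giving a sequential "memory" of previous inputs; used to predict next words, stock prices, genes in DNA.

* long short-term memory networks (LSTM). an RNN with gated memory cells, built to remember over longer sequences, for complex tasks like translating paragraphs or predicting plots.

* generative adversarial networks (GAN). one network (the generator) tells lies by producing fake samples, and another (the discriminator) calls it out by telling fakes from real data; each improves by competing with the other. used for realistic images, music, text, etc.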
</br>
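a minimal pure-python sketch of the recurrent "memory" in the rnn bullet, assuming one scalar input per step and a scalar hidden state. the weights `w_x`, `w_h`, `b` are hypothetical hand-picked values, not trained ones. the hidden state is updated as h_t = tanh(w_x·x_t + w_h·h_(t-1) + b), so each step's state feeds back into the next.

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.9, b=0.0, h0=0.0):
    """Run a scalar RNN over a sequence and return every hidden state.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); the default weights are
    illustrative values, not learned ones.
    """
    h = h0
    states = []
    for x in xs:
        # the previous state h feeds back into the update: this is the "loop"
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# same last two inputs, different history -> different final state
with_history = rnn_forward([1.0, 0.0, 0.0])
no_history = rnn_forward([0.0, 0.0, 0.0])
print(with_history[-1])  # positive: the first input is still "remembered"
print(no_history[-1])    # 0.0
```

the remembered signal fades a little each step (it is multiplied by w_h and squashed by tanh), which is the short-memory problem LSTM gates are designed to ease.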


#### 3.2 wrt output layer

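<b>vector-valued activation functions</b> take a whole vector of inputs and output a vector of multiple elements, instead of acting on each neuron separately. softmax, for example, turns a layer's raw scores into class probabilities that sum to 1.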
</br></br>
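a minimal pure-python sketch of softmax, the usual example of a vector-valued activation: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). subtracting the largest score first is a common numerical-stability trick that does not change the result. the scores are made-up values.

```python
import math

def softmax(z):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(z)                           # shifting by the max avoids overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                 # raw outputs of a 3-neuron layer
probs = softmax(scores)
print([round(p, 3) for p in probs])      # [0.659, 0.242, 0.099]
```

every output element depends on every input score, which is what makes this a vector-valued activation rather than an element-wise one.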

<b>gated units</b> combine linear transformations with gates, usually sigmoid activations with outputs between 0 and 1, that multiply the signal and so regulate how much information flows through. this lets an architecture like long short-term memory (LSTM) learn what to remember or forget, going beyond a simple weighted sum.
</br></br>
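a minimal sketch of an LSTM-style memory update with scalars, assuming hypothetical hand-picked weights (not trained). a forget gate f and an input gate i, each a sigmoid in (0, 1), decide how much of the old memory c_prev to keep and how much of a new candidate to write: c = f·c_prev + i·candidate. a full LSTM also has an output gate and works with vectors and weight matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_update(c_prev, x, h_prev, w):
    """One LSTM-style memory update for scalars.

    w holds hypothetical weights for the forget gate ('f'), input gate ('i')
    and candidate ('c'); each entry is (weight on x, weight on h_prev, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre('f'))           # forget gate: fraction of old memory kept
    i = sigmoid(pre('i'))           # input gate: fraction of new content written
    candidate = math.tanh(pre('c')) # candidate new content in (-1, 1)
    return f * c_prev + i * candidate

weights = {'f': (0.0, 0.0, 5.0),    # large bias -> forget gate near 1 (keep)
           'i': (0.0, 0.0, -5.0),   # large negative bias -> input gate near 0
           'c': (1.0, 0.0, 0.0)}
kept = gated_update(c_prev=2.0, x=1.0, h_prev=0.0, w=weights)
print(round(kept, 3))               # close to 2.0: the old memory is kept
```

flipping the two gate biases does the opposite: the old memory is mostly forgotten and the cell holds roughly the new candidate, tanh(1).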

<b>multiple perceptron layers</b> can stack, producing an output vector whose elements represent different features or class probabilities.
</br></br>
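a minimal sketch of two stacked fully connected layers in plain python, assuming hypothetical hand-picked weights (W1, b1, W2, b2 are illustrative, not trained). each layer computes activation(W·x + b), the first layer's output becomes the second layer's input, and the final output is a vector with one element per output neuron.

```python
def dense(x, W, b, activation=None):
    """One fully connected layer: activation(W @ x + b) for a list x."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    return [activation(v) for v in z] if activation else z

def relu(v):
    return max(0.0, v)

x = [1.0, 2.0]                                                    # 2 input features
W1, b1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 1.0]  # layer 1: 2 -> 3
W2, b2 = [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]], [0.0, 0.5]          # layer 2: 3 -> 2

hidden = dense(x, W1, b1, relu)  # hidden layer with relu
output = dense(hidden, W2, b2)   # output layer: a vector of 2 elements
print(hidden, output)            # [0.0, 1.5, 4.0] [4.0, -0.5]
```

passing the output vector through softmax would turn its elements into class probabilities.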

<b>output layer design</b> determines the nature of the output. for tasks that require set-like outputs, the output layer might have multiple neurons, each representing a possible member of the set, with its output giving that member's membership (e.g. a probability).
</br>
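one common design for set-like (multi-label) outputs, sketched in plain python: one output neuron per possible member, each with its own sigmoid, read as the probability that the member belongs to the set, then thresholded. the labels, logits, and the 0.5 threshold are assumptions for illustration. unlike softmax outputs, these probabilities are independent and need not sum to 1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_set(logits, labels, threshold=0.5):
    """Turn one raw output per candidate member into a predicted set.

    Each logit gets its own sigmoid (an independent membership probability);
    members at or above the threshold are included.
    """
    probs = {lab: sigmoid(z) for lab, z in zip(labels, logits)}
    members = {lab for lab, p in probs.items() if p >= threshold}
    return members, probs

labels = ['cat', 'dog', 'car']            # hypothetical label set
members, probs = predict_set([2.0, -1.0, 0.3], labels)
print(sorted(members))                    # ['car', 'cat']
```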


#### 3.3 optimization methods


##### 3.3.1 stochastic gradient descent


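* <b>stochastic gradient descent (SGD, minibatch).</b> gradient descent that estimates the gradient from one randomly chosen training point, or a small random batch (a minibatch), at a time instead of the whole data set. each step is noisy but cheap, which makes it efficient for large data sets.

* <b>momentum SGD.</b> keeps a running, exponentially decaying sum of past gradients (a "velocity") and steps along it. this speeds convergence along directions where gradients agree, damps oscillations, and can help carry the parameters past shallow local minima.

* <b>nesterov accelerated gradient (NAG).</b> a momentum variant that makes a more informed update: it evaluates the gradient at the approximate future position of the parameters (where the momentum step is about to take them) instead of the current position. can speed convergence, improve performance.

* <b>adaptive moment estimation (Adam).</b> combines momentum with RMSprop-style per-weight scaling. it keeps running averages of past gradients (first moment) and of their squared values (second moment) and uses them to set an effective learning rate for each weight individually. often faster convergence and better performance than vanilla SGD.

* <b>root mean square propagation (RMSprop).</b> divides each weight's step by the root mean square of its recent gradients, computed as an exponentially decaying average of squared gradients, so recent gradients count more than old ones. unlike Adam, it has no momentum term. can be useful for problems where gradient magnitudes fluctuate significantly.

* <b>adaptive gradient (AdaGrad).</b> like RMSprop but accumulates squared gradients over all past updates instead of a decaying average. since that sum only grows, the effective learning rate can become very small later in training, which is why Adam and RMSprop are used more.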
</br>
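a minimal pure-python sketch of the update rules above on a one-parameter toy loss f(w) = (w - 3)², assuming common textbook default hyperparameters (beta = 0.9, Adam's b2 = 0.999, eps = 1e-8) and one shared learning rate of 0.1 for every method. library implementations differ in details (e.g. where eps goes), and the learning rate is not tuned per method, so this compares update rules, not best-case performance. the toy gradient is exact, so the "stochastic" part of SGD (sampling a minibatch) is left out here; the hand-crafted regressor in the next example shows the sampling.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

# each optimizer maps (w, lr, step t, state dict s) -> new w

def sgd(w, lr, t, s):
    return w - lr * grad(w)

def momentum(w, lr, t, s, beta=0.9):
    s['v'] = beta * s.get('v', 0.0) + grad(w)         # velocity: decaying sum of gradients
    return w - lr * s['v']

def nesterov(w, lr, t, s, beta=0.9):
    v = s.get('v', 0.0)
    s['v'] = beta * v + grad(w - lr * beta * v)       # gradient at the look-ahead point
    return w - lr * s['v']

def adagrad(w, lr, t, s, eps=1e-8):
    g = grad(w)
    s['G'] = s.get('G', 0.0) + g * g                  # sum of ALL past squared gradients
    return w - lr * g / (math.sqrt(s['G']) + eps)

def rmsprop(w, lr, t, s, rho=0.9, eps=1e-8):
    g = grad(w)
    s['E'] = rho * s.get('E', 0.0) + (1 - rho) * g * g    # decaying average of g**2
    return w - lr * g / (math.sqrt(s['E']) + eps)

def adam(w, lr, t, s, b1=0.9, b2=0.999, eps=1e-8):
    g = grad(w)
    s['m'] = b1 * s.get('m', 0.0) + (1 - b1) * g          # first moment (momentum)
    s['v'] = b2 * s.get('v', 0.0) + (1 - b2) * g * g      # second moment (as in RMSprop)
    m_hat = s['m'] / (1 - b1 ** t)                        # bias correction: m, v start at 0
    v_hat = s['v'] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def run(optimizer, steps=200, w=0.0, lr=0.1):
    state = {}
    for t in range(1, steps + 1):
        w = optimizer(w, lr, t, state)
    return w

results = {f.__name__: run(f) for f in [sgd, momentum, nesterov, adagrad, rmsprop, adam]}
for name, w in results.items():
    print(f"{name:9s} w = {w:.4f}")
```

with the same learning rate, AdaGrad has not reached w = 3 after 200 steps: its ever-growing sum of squared gradients keeps shrinking the step, the weakness noted in the list. RMSprop and Adam take steps of roughly the learning rate's size regardless of the gradient's scale, so with a fixed learning rate they can end up hovering near the minimum rather than landing exactly on it.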


##### example, SGD optimization: hand-crafted, scikit-learn, tensorflow


###### code, hand-crafted


In [None]:
import numpy as np


class SGDRegressor:
  def __init__(self, learning_rate=0.01, epochs=100, batch_size=1, reg=None, reg_param=0.0):
    """
    Constructor for the SGDRegressor.

    Parameters:
    learning_rate (float): The step size used in each update.
    epochs (int): Number of passes over the training dataset.
    batch_size (int): Number of samples to be used in each batch.
    reg (str): Type of regularization ('l1' or 'l2'); None if no regularization.
    reg_param (float): Regularization parameter.

    The weights and bias are initialized as None and will be set during the fit method.
    """
    self.learning_rate = learning_rate
    self.epochs = epochs
    self.batch_size = batch_size
    self.reg = reg
    self.reg_param = reg_param
    self.weights = None
    self.bias = None

  def fit(self, X, y):
    """
    Fits the SGDRegressor to the training data.

    Parameters:
    X (numpy.ndarray): Training data, shape (m_samples, n_features).
    y (numpy.ndarray): Target values, shape (m_samples,).

    This method initializes the weights and bias, and then updates them over a number of epochs.
    """
    m, n = X.shape  # m is number of samples, n is number of features
    self.weights = np.zeros(n)
    self.bias = 0

    for _ in range(self.epochs):
      indices = np.random.permutation(m)
      X_shuffled = X[indices]
      y_shuffled = y[indices]

      for i in range(0, m, self.batch_size):
        X_batch = X_shuffled[i:i+self.batch_size]
        y_batch = y_shuffled[i:i+self.batch_size]

        b = len(y_batch)  # the last batch can be smaller than batch_size
        residual = y_batch - np.dot(X_batch, self.weights) - self.bias
        # gradients of the batch mean squared error w.r.t. weights and bias
        gradient_w = -2 * np.dot(X_batch.T, residual) / b
        gradient_b = -2 * np.sum(residual) / b

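        # add the gradient of the penalty used in _get_regularization_loss
        # (a subgradient for l1; d/dw of reg_param * w**2 is 2 * reg_param * w)
        if self.reg == 'l1':
          gradient_w += self.reg_param * np.sign(self.weights)
        elif self.reg == 'l2':
          gradient_w += 2 * self.reg_param * self.weights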

        self.weights -= self.learning_rate * gradient_w
        self.bias -= self.learning_rate * gradient_b

  def predict(self, X):
    """
    Predicts the target values using the linear model.

    Parameters:
    X (numpy.ndarray): Data for which to predict target values.

    Returns:
    numpy.ndarray: Predicted target values.
    """
    return np.dot(X, self.weights) + self.bias

  def compute_loss(self, X, y):
    """
    Computes the loss of the model: the mean squared error plus the
    regularization penalty.

    Parameters:
    X (numpy.ndarray): The input data.
    y (numpy.ndarray): The true target values.

    Returns:
    float: The computed loss value.
    """
    return np.mean((y - self.predict(X)) ** 2) + self._get_regularization_loss()

  def _get_regularization_loss(self):
    """
    Computes the regularization loss based on the regularization type.

    Returns:
    float: The regularization loss.
    """
    if self.reg == 'l1':
      return self.reg_param * np.sum(np.abs(self.weights))
    elif self.reg == 'l2':
      return self.reg_param * np.sum(self.weights ** 2)
    else:
      return 0

  def get_weights(self):
    """
    Returns the weights of the model.

    Returns:
    numpy.ndarray: The weights of the linear model.
    """
    return self.weights


<b>SGD, scikit-learn.</b>
</br>


In [None]:
from sklearn.linear_model import SGDRegressor

# Create and fit the model
model = SGDRegressor(max_iter=1000)
model.fit(X, y)

# Making predictions
predictions = model.predict(X)


<b>SGD, tensorflow</b>
</br>


In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

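# Create a simple neural network for regression: one linear output unit
model = Sequential([
  tf.keras.Input(shape=(X.shape[1],)),
  Dense(64, activation='relu'),
  Dense(1)
])

sgd = SGD(learning_rate=0.01)

# Compile with the SGD optimizer; a single linear output needs a regression
# loss (mean squared error), not categorical cross-entropy / accuracy
model.compile(optimizer=sgd, loss='mse', metrics=['mae'])

# Train the model; batch_size is the minibatch size used for each SGD update
model.fit(X, y, epochs=10, batch_size=32)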




##### 3.3.2 learning rate


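the learning rate sets the size of each update step. too high and updates can overshoot a minimum and oscillate or even diverge, making training unstable; too low and convergence is slow, and training may get stuck in a poor local minimum or stop improving before reaching a minimum.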
</br>
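a minimal sketch on the toy loss f(x) = x², whose gradient is 2x, so each gradient descent step multiplies x by (1 - 2·lr). the learning rates below are arbitrary picks to show each regime. this convex toy has a single minimum, so it shows overshooting and slow convergence but not getting stuck in a local minimum.

```python
def gradient_descent(lr, steps=50, x=1.0):
    """Minimize f(x) = x**2 (gradient 2x) with a fixed learning rate."""
    for _ in range(steps):
        x = x - lr * 2 * x        # same as x *= (1 - 2 * lr)
    return x

for lr in [0.001, 0.1, 0.9, 1.1]:
    print(f"lr = {lr:<5} x after 50 steps = {gradient_descent(lr):.3g}")
```

lr = 0.001 barely moves from the start (too low), lr = 0.1 converges smoothly, lr = 0.9 still converges but overshoots on every step (x changes sign each time), and above lr = 1 the factor |1 - 2·lr| exceeds 1, so x diverges.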


<b>learning rate scheduling</b> involves adjusting the learning rate over time.
</br></br>

* <b>time-based decay.</b> the learning rate shrinks a little after every update, e.g. lr = lr0 / (1 + k * t) at update t.

* <b>step decay.</b> reduce the learning rate by a fixed factor (e.g. halve it) every set number of epochs.

* <b>exponential decay.</b> decrease the learning rate exponentially, e.g. lr = lr0 * exp(-k * t).

* <b>adaptive learning rate.</b> methods like AdaGrad, RMSprop, Adam adjust the learning rate automatically, per weight, during training.
</br>
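a minimal sketch of the three fixed schedules as plain functions, assuming illustrative constants (k, drop, epochs_per_drop are hypothetical defaults, not recommended values). keras ships ready-made versions such as tf.keras.optimizers.schedules.ExponentialDecay and InverseTimeDecay; these hand-written ones just make the formulas concrete.

```python
import math

def time_based(lr0, t, k=0.01):
    """lr = lr0 / (1 + k * t): shrinks a little after every update t."""
    return lr0 / (1 + k * t)

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply lr0 by `drop` once every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential(lr0, t, k=0.05):
    """lr = lr0 * exp(-k * t)."""
    return lr0 * math.exp(-k * t)

for t in [0, 10, 20, 50]:
    print(t, time_based(0.1, t), step_decay(0.1, t), exponential(0.1, t))
```

step decay keeps the rate flat between drops, while the other two shrink it at every step; which works better depends on the problem.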


#### 3.9 example: wildfires


##### data, wildfires


###### code, google drive


In [None]:
# google drive stuff
google_drive = "/content/drive/My Drive/Colab Notebooks/" # colab home directory
local = "nmi/nmi_2401/" # if subdirectory; eg, "test/" # yes, "/" is important
path = google_drive + local


In [None]:
from google.colab import drive
drive.mount('/content/drive')

# this is shell script executed in colab notebook
#!ls "/content/drive/My Drive/Colab Notebooks/"
# "doc", "homework", "recitation" are folders;
# files would show with any associated extensions

Mounted at /content/drive


###### code, import data


In [None]:
import pandas as pd

file = "areaburntbywildfiresbyweek_24044.csv" # file in that local path
df = pd.read_csv(path+file) # csv file read into pandas.dataframe
display(df)


| Unnamed: 0 | Entity | Code | Year | area burnt by wildfires in 2024 | burnt by wildfires in 2023 | burnt by wildfires in 2022 | burnt by wildfires in 2021 | burnt by wildfires in 2020 | area burnt by wildfires in 2019 | a burnt by wildfires in 2018 | area burnt by wildfires in 2017 | area burnt by wildfires in 2016 | area burnt by wildfires in 2015 | area burnt by wildfires in 2014 | area burnt by wildfires in 2013 | area burnt by wildfires in 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1 | 391 | 0 | 0 | 1246 | 0 | 0 | 117 | 0 | 0 | 368 | 0 | 0 | 0 |
| 1 | Africa | 0 | 1 | 6036545 | 6666179 | 5144202 | 5588420 | 6755805 | 7348274 | 7633876 | 8487175 | 8236975 | 10298865 | 8682071 | 8065530 | 561955 |
| 2 | Akrotiri and Dhekelia | OWID_AKD | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Aland Islands | ALA | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Albania | ALB | 1 | 0 | 0 | 0 | 0 | 317 | 0 | 0 | 0 | 507 | 0 | 0 | 185 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13411 | Western Sahara | ESH | 52 | 0 | 0 | 0 | 0 | 262 | 0 | 145 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13412 | World | OWID_WRL | 52 | 0 | 399923200 | 364051420 | 384226340 | 408747400 | 407480060 | 329695260 | 411776500 | 416143200 | 443511650 | 406300200 | 387962430 | 437670750 |
| 13413 | Yemen | YEM | 52 | 0 | 730 | 1365 | 1732 | 360 | 892 | 971 | 517 | 0 | 498 | 1909 | 1164 | 798 |
| 13414 | Zambia | ZMB | 52 | 0 | 19000474 | 20358664 | 18246986 | 19424936 | 21792472 | 20552240 | 21293072 | 23266800 | 23737028 | 21109304 | 23450868 | 23413762 |


### and so on


##### resources


* deep learning architectures [@dsc](https://www.datasciencecentral.com/concise-visual-summary-of-deep-learning-architectures/)

* mlcroissant [@colab](https://colab.research.google.com/github/mlcommons/croissant/blob/main/python/mlcroissant/recipes/introduction.ipynb)

* croissant [@huggingface](https://huggingface.co/spaces/MLCommons/croissant-editor)


##### lecture resources


* SGD code, cristian leo [@medium](https://towardsdatascience.com/stochastic-gradient-descent-math-and-python-code-35b5e66d6f79)


* wildfires [@kaggle](https://www.kaggle.com/datasets/willianoliveiragibin/wildfires/data)
