### **Hebbian Learning**

- Neurons strengthen their connection with each other based on repeated co-activation. This is often summarized as **"cells that fire together, wire together."**
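The idea above is often written as the update $ \Delta w_i = \eta \, x_i \, y $, where $ x_i $ is the input activity and $ y $ the output activity. The snippet below is a minimal sketch of that rule; the function name `hebbian_update` and the learning rate `eta` are illustrative assumptions, not part of these notes.

```python
def hebbian_update(weights, x, y, eta=0.1):
    """Return new weights after one Hebbian step: w_i <- w_i + eta * x_i * y."""
    return [w + eta * xi * y for w, xi in zip(weights, x)]


w = [0.0, 0.0]
# Input 0 fires together with the output three times; input 1 stays silent.
for _ in range(3):
    w = hebbian_update(w, x=[1.0, 0.0], y=1.0)

# Only the connection whose input was co-active with the output has grown.
print(w)
```

The first weight ends near 0.3 while the second stays at 0, which is the "fire together, wire together" behaviour in miniature.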

### **Simple Perceptron (McCulloch-Pitts Model)**

The perceptron outputs the sign of a weighted sum of its inputs plus a bias:

$$
O = \text{sign}(w^T x + b)
$$
Where:
- $ w $ is the vector of weights.
- $ x $ is the input vector.
- $ b $ is the bias term.
- $ \text{sign}() $ is the activation function that outputs either +1 or -1 depending on the result of the linear sum.
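As a rough sketch, the formula can be evaluated in a few lines of plain Python. The weights and bias below are made-up values for illustration, and treating $ \text{sign}(0) $ as +1 is a convention assumed here, since the notes do not specify it.

```python
def sign(z):
    # Assumed convention: a sum of exactly zero maps to +1.
    return 1 if z >= 0 else -1


def perceptron_output(w, x, b):
    """Compute O = sign(w^T x + b) for lists w and x of equal length."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sign(z)


w = [1.0, 1.0]   # hypothetical weights
b = -1.5         # hypothetical bias

print(perceptron_output(w, [1, 1], b))  # linear sum is 0.5, so the output is +1
print(perceptron_output(w, [0, 1], b))  # linear sum is -0.5, so the output is -1
```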

---

### **Network Topologies**

- **Types:**
  - **Feedforward Networks**: Neurons are arranged in layers, with no connections going backward (no loops). This is the simplest form of an artificial neural network.
  - **Recurrent Networks**: Neurons have feedback loops, meaning the output of a neuron can influence its own future input. These are often used in time-series or sequence-based tasks.
  - **Deep Neural Networks (DNNs)**: These are multi-layered networks, usually with many hidden layers between the input and output layers. DNNs can learn more complex, hierarchical representations of data.

- **Deep Neural Networks**: These can solve problems where simpler networks fail, as they can automatically extract complex features from data. However, they are computationally expensive and require more data for training.

---

- **Common Activation Functions**:
  1. **Step function (Heaviside function)**: Outputs either 0 or 1 (binary classification).
  2. **Sigmoid function**: Outputs values between 0 and 1. Often used for probabilistic outputs.
  3. **Tanh function**: Outputs values between -1 and 1. It is a scaled and shifted version of the sigmoid: $ \tanh(z) = 2\sigma(2z) - 1 $, where $ \sigma $ is the sigmoid.
  4. **ReLU (Rectified Linear Unit)**: Outputs 0 for negative inputs and the input itself for positive inputs. It has become one of the most widely used activation functions for hidden layers.

![Activation Functions](https://www.researchgate.net/profile/Aaron-Stebner-2/publication/341310767/figure/fig7/AS:890211844255749@1589254451431/Common-activation-functions-in-artificial-neural-networks-NNs-that-introduce.ppm)
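The four functions listed above can be written directly with Python's standard `math` module. This is a minimal sketch for scalar inputs; returning 1 from the step function at exactly $ z = 0 $ is an assumed convention.

```python
import math


def step(z):
    # Heaviside step; the value at z == 0 is a convention (1 is assumed here).
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    # Simple form; very large negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    return math.tanh(z)


def relu(z):
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sigmoid(z), tanh(z), relu(z))
```

Evaluating `2 * sigmoid(2 * z) - 1` for any moderate `z` gives the same value as `tanh(z)` up to floating-point rounding, which is one way to see how closely the two functions are related.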

---

### **Learning Rule**

The **learning rule** defines how a neural network learns from its training data. In supervised learning, the network is trained on labeled examples, and the goal is to adjust weights and biases to minimize the error between predicted and actual outputs.

- **Training Process**: During training, the weights and biases are adjusted using a learning algorithm (e.g., **gradient descent**).
  
- **Epochs**: An epoch is one complete pass through the entire training dataset. Training usually runs for several epochs so that the network fits the data well enough to generalize to unseen data.

- **Generalization**: The ability of a neural network to perform well on unseen data after being trained on examples.
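To make the training loop concrete, here is a small sketch of gradient descent over several epochs. Everything specific in it is an assumption chosen for illustration: a single weight with no bias, a squared-error loss, toy data following $ y = 2x $, and a learning rate of 0.1.

```python
# Toy data following y = 2x (hypothetical, for illustration only).
data = [(1.0, 2.0), (2.0, 4.0)]

w = 0.0      # single weight, no bias, to keep the sketch short
eta = 0.1    # learning rate (assumed value)

for epoch in range(20):            # one epoch = one pass over all examples
    for x, y in data:
        prediction = w * x
        error = prediction - y
        gradient = error * x       # derivative of 0.5 * (w*x - y)**2 w.r.t. w
        w -= eta * gradient        # gradient descent step

print(w)  # ends close to 2, the slope that generated the data
```

Each pass reduces the remaining error, which is why a single epoch is rarely enough.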

### **Perceptron Learning Rule**

- For each training example $ (x_p, y_p) $, where $ x_p $ is the input vector and $ y_p $ is the desired output, the perceptron computes the output $ O_p $ as:
  $$
  O_p = \text{sign}(w^T x_p + b)
  $$
  - If the output $ O_p $ is incorrect, the weights are updated as follows:
  $$
  w' = w + \eta y_p x_p
  $$
  Where:
  - $ \eta $ is the **learning rate** (a small constant, typically between 0 and 1).
  - $ y_p $ is the true label (+1 or -1).
  - $ x_p $ is the input vector for the current pattern.
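The rule above can be sketched as a short training loop, shown here on the AND function with labels coded as +1/-1. Two details are assumptions not stated in the notes: the bias is updated alongside the weights as $ b' = b + \eta y_p $, and $ \text{sign}(0) $ is treated as +1. The learning rate 0.5 and the epoch cap are arbitrary illustrative choices.

```python
def sign(z):
    # Assumed convention: sign(0) = +1.
    return 1 if z >= 0 else -1


def predict(w, b, x):
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


# AND function with targets in {+1, -1}.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]

w = [0.0, 0.0]
b = 0.0
eta = 0.5  # learning rate (assumed value)

for epoch in range(100):                 # cap on the number of passes
    mistakes = 0
    for x, y in data:
        if predict(w, b, x) != y:        # update only on misclassified patterns
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            b += eta * y                 # bias update (assumption, see lead-in)
            mistakes += 1
    if mistakes == 0:                    # every pattern classified correctly
        break

print(w, b, [predict(w, b, x) for x, _ in data])
```

Because AND is linearly separable, the loop stops after a handful of epochs with all four patterns classified correctly.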

---

The **AND logical function** is **linearly separable** because you can draw a straight line that separates the true outputs from the false ones.

- **Non-linearly separable**: Problems like **XOR** (exclusive OR) are not linearly separable, meaning no single line can separate the two classes.

#### **XOR Table**:

| Input  | Output |
|--------|--------|
| (0,0)  |   0    |
| (0,1)  |   1    |
| (1,0)  |   1    |
| (1,1)  |   0    |

#### **Solution**:  
- You can use a **multi-layer network** (e.g., a multilayer perceptron with a hidden layer) to transform the data into a new space where it becomes linearly separable.
- **Transformation**: We apply two functions, $ g_1(\mathbf{x}) $ and $ g_2(\mathbf{x}) $, to map the inputs to a new space.
   - We compute:  
     $$
     y_1 = \text{sign}(g_1(\mathbf{x})), \quad y_2 = \text{sign}(g_2(\mathbf{x}))
     $$
- **New space**: In the $ (y_1, y_2) $ space the transformed data points become **linearly separable**, meaning a straight line can now divide the classes.
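The notes do not say which $ g_1 $ and $ g_2 $ to use, so the sketch below picks one possible choice as an assumption: $ g_1(\mathbf{x}) = x_1 + x_2 - 0.5 $ (an OR-like unit) and $ g_2(\mathbf{x}) = x_1 + x_2 - 1.5 $ (an AND-like unit). The output line $ y_1 - y_2 - 1 $ is likewise a hand-picked separator in the new space, not a learned one.

```python
def sign(z):
    # Assumed convention: sign(0) = +1 (no input below hits exactly zero).
    return 1 if z >= 0 else -1


def g1(x):
    return x[0] + x[1] - 0.5   # hypothetical choice: positive if any input is 1


def g2(x):
    return x[0] + x[1] - 1.5   # hypothetical choice: positive only if both are 1


def xor_network(x):
    # Hidden layer: map the input into the (y1, y2) space.
    y1, y2 = sign(g1(x)), sign(g2(x))
    # Output layer: one straight line separates the classes in (y1, y2).
    out = sign(y1 - y2 - 1)
    return (out + 1) // 2      # convert +1/-1 back to 1/0


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_network(x))
```

In the new space the four inputs collapse onto three points, $ (-1,-1) $, $ (+1,-1) $ and $ (+1,+1) $, and only the middle one belongs to the "true" class, so a single line is enough.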
