# Neural Network Formulas

### 1. Probability of All Outcomes
$$
P(\text{all}) = \prod_{i=1}^{m} \left[ y_i p_i + (1 - y_i)(1 - p_i) \right]
$$
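
As a minimal sketch of the product above, assuming binary labels \( y_i \in \{0, 1\} \), predicted probabilities \( p_i = P(y_i = 1) \), and independent samples (the function name and data are hypothetical):

```python
import math

def likelihood(labels, probs):
    """Probability that every observed label occurs, assuming independent samples."""
    # Each factor is p_i when y_i = 1 and (1 - p_i) when y_i = 0.
    return math.prod(y * p + (1 - y) * (1 - p) for y, p in zip(labels, probs))

# Hypothetical data, for illustration only.
labels = [1, 0, 1]
probs = [0.8, 0.3, 0.6]
print(likelihood(labels, probs))  # 0.8 * 0.7 * 0.6, about 0.336
```

Because the product shrinks quickly as samples are added, it is usually replaced by a sum of logarithms.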

### 2. Logarithmic Loss Example  
$$
- \ln(0.6) - \ln(0.2) - \ln(0.1) - \ln(0.7) \approx 4.78
$$
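
The arithmetic can be checked numerically. The sketch below assumes natural logarithms and also confirms that the sum of negative logs equals the negative log of the product of the four probabilities:

```python
import math

probs = [0.6, 0.2, 0.1, 0.7]

# Sum of negative natural logs of each probability.
total = -sum(math.log(p) for p in probs)
print(round(total, 2))  # 4.78

# Same value from the log of the product.
from_product = -math.log(math.prod(probs))
```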

### 3. Cross-Entropy Loss (Binary Classification)  
$$
\text{Cross-entropy} = - \sum_{i=1}^m \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right]
$$
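
A minimal sketch of this sum, assuming every \( p_i \) lies strictly between 0 and 1 so that both logarithms are defined (the function name and data are hypothetical):

```python
import math

def binary_cross_entropy(labels, probs):
    """Summed binary cross-entropy; probs must lie strictly in (0, 1)."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    )

# Hypothetical data, for illustration only.
labels = [1, 1, 0]
probs = [0.8, 0.7, 0.1]
loss = binary_cross_entropy(labels, probs)
```

For these inputs the result equals \( -\ln(0.8 \cdot 0.7 \cdot 0.9) \), the negative log of the probability of all outcomes.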

### 4. Cross-Entropy Loss (Multiclass Classification)  
$$
\text{Cross-entropy} = - \sum_{i=1}^m \sum_{j=1}^n y_{ij} \ln(p_{ij})
$$
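
A possible sketch of the double sum, assuming one-hot labels where the outer index runs over samples and the inner index over classes (the function name and data are hypothetical):

```python
import math

def categorical_cross_entropy(one_hot_labels, probs):
    """Summed multiclass cross-entropy for one-hot labels."""
    return -sum(
        y_ij * math.log(p_ij)
        for y_row, p_row in zip(one_hot_labels, probs)
        for y_ij, p_ij in zip(y_row, p_row)
        if y_ij  # skip zero terms so log(0) is never evaluated for other classes
    )

# Hypothetical data: two samples, three classes.
one_hot_labels = [[1, 0, 0], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
loss = categorical_cross_entropy(one_hot_labels, probs)
```

With one-hot labels only the probability assigned to the true class contributes, so here the loss is \( -\ln(0.7) - \ln(0.6) \).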

### 5. Error Function for Binary Classification  
$$
\text{Error function} = -\frac{1}{m} \sum_{i=1}^m \left[ (1 - y_i) \ln(1 - \hat{y}_i) + y_i \ln(\hat{y}_i) \right]
$$

### 6. Error Function with Sigmoid Activation  
$$
E(W, b) = -\frac{1}{m} \sum_{i=1}^m \left[ (1 - y_i) \ln(1 - \sigma(W x^{(i)} + b)) + y_i \ln(\sigma(W x^{(i)} + b)) \right]
$$
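
As an illustration, \( E(W, b) \) can be evaluated for a single linear unit. This is a sketch under stated assumptions: \( W \) is a plain list of weights, each \( x^{(i)} \) is a list of features, and the names `sigmoid` and `error` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(W, b, X, y):
    """Mean binary cross-entropy of sigmoid(W . x + b) over all m samples."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w * x for w, x in zip(W, x_i)) + b  # linear score W x + b
        y_hat = sigmoid(z)                          # predicted probability
        total += (1 - y_i) * math.log(1 - y_hat) + y_i * math.log(y_hat)
    return -total / m

# Hypothetical data: with zero weights every prediction is 0.5.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [0, 1]
baseline = error([0.0, 0.0], 0.0, X, y)
```

With all-zero parameters the error is \( \ln 2 \approx 0.693 \), a useful baseline when checking that training reduces the loss.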

### 7. Error Function for Multiclass Classification  
$$
\text{Error function} = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^n y_{ij} \ln(\hat{y}_{ij})
$$


Several other formulas are central to training and optimizing neural networks:

---

### 8. **Gradient Descent Update Rule**  
The basic formula for updating weights in gradient descent:  
$$
w \leftarrow w - \eta \frac{\partial E}{\partial w}
$$  
Where:  
- \( w \): Weight parameter  
- \( \eta \): Learning rate  
- \( \frac{\partial E}{\partial w} \): Gradient of the loss function with respect to the weight  
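
The update rule can be sketched on a toy problem. The example below assumes a one-dimensional loss \( E(w) = (w - 3)^2 \) with gradient \( 2(w - 3) \); the loss, starting point, and learning rate are chosen only for illustration.

```python
def gradient_descent(grad, w0, eta, steps):
    """Repeatedly apply w <- w - eta * dE/dw."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Toy loss E(w) = (w - 3)^2, whose minimum is at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, eta=0.1, steps=100)
```

Each step here shrinks the distance to the minimum by a factor of 0.8, so `w_final` ends up very close to 3. A learning rate above 1.0 would make this particular loss diverge.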

---

### 9. **Chain Rule for Backpropagation**  
The chain rule is used to compute gradients during backpropagation:  
$$
\frac{\partial E}{\partial w} = \frac{\partial E}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}
$$  
Where:  
- \( a \): Activation  
- \( z \): Weighted sum before activation  
- \( w \): Weight  
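
A small sketch of the three factors for a single neuron, assuming \( z = wx + b \), a sigmoid activation, and a squared-error loss \( E = \tfrac{1}{2}(a - y)^2 \). These choices and all names are assumptions made for the example. The analytic gradient is compared with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def grad_w(w, x, b, y):
    """dE/dw as the product dE/da * da/dz * dz/dw."""
    a = sigmoid(w * x + b)
    dE_da = a - y          # derivative of 0.5 * (a - y)^2 with respect to a
    da_dz = a * (1 - a)    # derivative of the sigmoid
    dz_dw = x              # derivative of w * x + b with respect to w
    return dE_da * da_dz * dz_dw

# Hypothetical values, for illustration only.
w, x, b, y = 0.5, 2.0, -0.3, 1.0
eps = 1e-6
numeric = (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)
analytic = grad_w(w, x, b, y)
```

Comparing `analytic` with `numeric` is a common way to catch mistakes in a hand-derived gradient.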

---

### 10. **Sigmoid Activation Function**  
The sigmoid activation function maps any real input to \( (0, 1) \) and is commonly used as the output activation in binary classification:  
$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$  
Its derivative is:  
$$
\sigma'(z) = \sigma(z)(1 - \sigma(z))
$$  
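
One practical detail: evaluating \( e^{-z} \) directly raises `OverflowError` in Python's `math.exp` when \( z \) is very negative. The sketch below uses an algebraically equivalent branch for negative inputs; the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Sigmoid that avoids overflow in exp() for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) lies in [0, 1) and cannot overflow
    return e / (1.0 + e)

def sigmoid_prime(z):
    """Derivative sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

The derivative peaks at 0.25 when \( z = 0 \) and approaches 0 for large \( |z| \), which is one reason deep sigmoid networks can suffer from vanishing gradients.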

---

### 11. **ReLU (Rectified Linear Unit) Activation Function**  
The ReLU activation function is commonly used in hidden layers:  
$$
f(z) = \max(0, z)
$$  
Its derivative (undefined at \( z = 0 \), where it is set to 0 by convention):  
$$
f'(z) =
\begin{cases} 
1 & \text{if } z > 0 \\
0 & \text{if } z \leq 0 
\end{cases}
$$  
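
A direct transcription of both definitions, as a minimal sketch that follows the piecewise convention above for \( z = 0 \) (function names are hypothetical):

```python
def relu(z):
    """max(0, z)"""
    return max(0.0, z)

def relu_prime(z):
    """1 for z > 0, otherwise 0 (including z = 0, by convention)."""
    return 1.0 if z > 0 else 0.0
```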

---

### 12. **Softmax Function**  
For multiclass classification, the softmax function converts raw scores into probabilities:  
$$
\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}}
$$  
Where \( z_i \) is the \( i \)-th raw score, and \( n \) is the number of classes.
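
A minimal sketch of softmax over a list of raw scores. Subtracting the largest score before exponentiating is an added assumption, not part of the formula above; it leaves the result unchanged mathematically and avoids overflow for large scores.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)                           # stabilizing shift
    exps = [math.exp(z - shift) for z in scores]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores, for illustration only.
probs = softmax([2.0, 1.0, 0.1])
```

The outputs preserve the ordering of the scores, so the class with the highest raw score also gets the highest probability.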

---

### 13. **Weight Initialization**  
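To reduce the risk of vanishing or exploding gradients, weights are often drawn from a zero-mean normal distribution whose standard deviation \( \sigma \) depends on \( n \), the number of inputs to the layer:  
**He Initialization (for ReLU):**  
$$
w \sim \mathcal{N}(0, \sigma^2), \quad \sigma = \sqrt{\frac{2}{n}}
$$  
**Xavier Initialization (for sigmoid/tanh), in its simplified fan-in form:**  
$$
w \sim \mathcal{N}(0, \sigma^2), \quad \sigma = \sqrt{\frac{1}{n}}
$$  
The original Xavier (Glorot) formulation uses variance \( \frac{2}{n_{\text{in}} + n_{\text{out}}} \), which also accounts for the number of outputs.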
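
A sketch of both schemes using only the standard library. It treats the square-root term as the standard deviation and \( n \) as the number of inputs to the layer; the function name, the `scale` parameter, and the layer sizes are assumptions for the example.

```python
import math
import random

def init_weights(n_in, n_out, scale, rng):
    """Draw an n_out x n_in weight matrix from N(0, sigma^2), sigma = sqrt(scale / n_in).

    scale=2.0 gives He initialization; scale=1.0 gives the simplified Xavier form.
    """
    std = math.sqrt(scale / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)  # fixed seed so the example is reproducible
he_weights = init_weights(n_in=200, n_out=50, scale=2.0, rng=rng)
xavier_weights = init_weights(n_in=200, n_out=50, scale=1.0, rng=rng)
```

Note that `random.gauss` takes a standard deviation, not a variance, as its second argument.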

---

### 14. **Mean Squared Error (MSE)**  
A loss function commonly used for regression tasks, as an alternative to cross-entropy:  
$$
\text{MSE} = \frac{1}{m} \sum_{i=1}^m (\hat{y}_i - y_i)^2
$$  
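
A minimal sketch of the formula for plain Python lists (the function name and data are hypothetical):

```python
def mse(predictions, targets):
    """Mean of squared differences between predictions and targets."""
    m = len(targets)
    return sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets)) / m

# Hypothetical data: squared errors are 0.25, 0.25, and 0.
value = mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
```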

---

### 15. **L2 Regularization (Weight Decay)**  
L2 regularization reduces overfitting by adding a penalty on large weights to the loss:  
$$
\text{Regularized Loss} = E(W, b) + \lambda \sum_{i=1}^n w_i^2
$$  
Where \( \lambda \) is the regularization parameter and the sum runs over the \( n \) weights in the network.
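
As a sketch, the penalty and its effect on the gradient can be written as follows. It assumes the weights are given as a flat list and that `base_loss` is an already computed value of \( E(W, b) \); all names are hypothetical.

```python
def l2_regularized_loss(base_loss, weights, lam):
    """E(W, b) + lambda * sum of squared weights."""
    return base_loss + lam * sum(w * w for w in weights)

def l2_gradient_term(w, lam):
    """Extra gradient contributed by the penalty for one weight: d/dw (lam * w^2)."""
    return 2.0 * lam * w

# Hypothetical values: the penalty is 0.1 * (1 + 4 + 9) = 1.4.
total = l2_regularized_loss(base_loss=0.5, weights=[1.0, -2.0, 3.0], lam=0.1)
```

The extra gradient term is proportional to the weight itself, so each gradient descent step pulls every weight slightly toward zero, which is why the technique is also called weight decay.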

---

These formulas are central to understanding how neural networks work, how they are trained, and how they make predictions.