# Machine Learning Algorithms

## **Supervised Learning Algorithms**

### **1. Linear Regression**
Used for predicting continuous values.

$$
Y = \beta_0 + \beta_1X + \epsilon
$$

Where:  
- $Y$ = dependent variable  
- $X$ = independent variable  
- $\beta_0$ = intercept  
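- $\beta_1$ = slope coefficient  
- $\epsilon$ = error term

Cost Function (mean squared error over $m$ training examples, writing the parameters as $\theta$ so that the prediction is $h_\theta(x) = \theta_0 + \theta_1 x$):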

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2
$$

Gradient Descent Update Rule (applied to every $\theta_j$ simultaneously, with learning rate $\alpha$):

$$
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
$$
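
Below is a minimal plain-Python sketch of batch gradient descent for the single-feature model above. The function name `gradient_descent`, the learning rate, the iteration count, and the toy data are illustrative assumptions and do not come from the source.

```python
def gradient_descent(xs, ys, alpha=0.05, iters=5000):
    """Batch gradient descent on J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)
    for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        # Prediction errors h_theta(x_i) - y_i under the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives dJ/dtheta0 and dJ/dtheta1
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are computed before either parameter changes
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly by y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

The step size matters. If `alpha` is too large for the scale of the data, the updates overshoot and diverge instead of converging.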

---

### **2. Logistic Regression**
Used for binary classification.

$$
h_{\theta}(x) = \frac{1}{1 + e^{-\theta^T x}}
$$

Loss Function (Binary Cross-Entropy):

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[y_i \log(h_{\theta}(x_i)) + (1 - y_i) \log(1 - h_{\theta}(x_i))\right]
$$
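
The sketch below evaluates the sigmoid hypothesis and the binary cross-entropy loss directly from the two formulas. It assumes each input row already carries a leading 1 for the intercept. The name `bce_loss` and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    # h = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(theta, X, y):
    """Binary cross-entropy J(theta) averaged over m examples.
    Each row of X is assumed to start with a 1 for the intercept term."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))  # h_theta(x_i)
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
print(bce_loss([0.0, 0.0], X, y))  # log(2): every prediction is 0.5
print(bce_loss([0.0, 3.0], X, y))  # much smaller for a theta that separates the classes
```

For very large |z|, `1 - h` can round to 0 and `math.log` then fails. Production code therefore usually computes the loss from the raw logits.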

---

### **3. Support Vector Machine (SVM)**
Used for classification: finds the hyperplane that separates the classes with the largest margin.

Decision Boundary:

$$
w^T x + b = 0
$$

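Soft-Margin Objective (hinge loss plus $L_2$ regularization, with labels $y_i \in \{-1, +1\}$ and $C$ trading off margin width against training violations):

$$
J(w, b) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \max(0, 1 - y_i (w^T x_i + b))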
$$
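
This sketch only evaluates the soft-margin objective and the decision rule for a given `w` and `b`. It does not train them: a solver such as subgradient descent or a QP method would minimize this value. The function names and data are hypothetical.

```python
def svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses; labels y_i must be -1 or +1."""
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hinge += max(0.0, 1.0 - margin)  # zero once the point is on the correct side, outside the margin
    return reg + C * hinge


def predict(w, b, x):
    # Which side of the hyperplane w^T x + b = 0 the point falls on
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]
print(svm_objective([0.5, 0.5], 0.0, X, y))  # 0.75: 0.25 regularization + 0.5 hinge from [-1, 0]
```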

---

### **4. k-Nearest Neighbors (KNN)**
Used for classification (majority vote among the $k$ nearest training points) and regression (average of their target values).

Distance Metric (Euclidean):

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$
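
Here is a minimal classification sketch that ranks training points by the Euclidean distance above and takes a majority vote among the `k` closest. The function names, `k=3`, and the toy data are assumptions. For regression you would average the neighbors' targets instead.

```python
import math
from collections import Counter


def euclidean(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(train_X, train_y, query, k=3):
    # Sort training points by distance to the query and keep the k closest
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_X, train_y, [0.5, 0.5]))  # a
print(knn_classify(train_X, train_y, [5.5, 5.5]))  # b
```

An odd `k` avoids tied votes in two-class problems. Note that the sort makes each query O(n log n) over the whole training set.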

---

### **5. Naive Bayes Classifier**
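Used for classification. It is "naive" because it assumes the features are conditionally independent given the class, so $P(X|Y) = \prod_i P(X_i|Y)$.

Bayes' Theorem: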

$$
P(Y|X) = \frac{P(X|Y) P(Y)}{P(X)}
$$

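Gaussian Naive Bayes (each feature $X_i$ is modeled as normally distributed within class $y$, with mean $\mu_{i,y}$ and variance $\sigma_{i,y}^2$ estimated from the training data):

$$
P(X_i \mid Y = y) = \frac{1}{\sqrt{2\pi\sigma_{i,y}^2}} e^{-\frac{(X_i - \mu_{i,y})^2}{2\sigma_{i,y}^2}}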
$$
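
The sketch below fits per-class priors, means, and variances, then picks the class with the highest posterior. It works in log space and drops $P(X)$, which is the same for every class. The tiny constant added to each variance is an assumption of this sketch to avoid division by zero, and all names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior P(y) and the per-feature mean/variance within each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - mu) ** 2 for v in col) / n + 1e-9 for col, mu in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mu, var):
    # log of the Gaussian density P(X_i | Y = y)
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict_gaussian_nb(model, x):
    # argmax_y of log P(y) + sum_i log P(x_i | y), using the naive independence assumption
    scores = {
        label: math.log(prior) + sum(log_gaussian(xi, mu, var) for xi, mu, var in zip(x, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 8.0], [5.5, 7.5], [4.5, 8.5]]
y = ["small", "small", "small", "large", "large", "large"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # small
print(predict_gaussian_nb(model, [5.2, 7.9]))  # large
```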

---

### **6. Decision Tree**
Used for classification and regression.

Entropy (where $p_i$ is the fraction of samples in $S$ belonging to class $i$):

$$
H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i
$$

Gini Impurity:

$$
Gini = 1 - \sum_{i=1}^{n} p_i^2
$$

Information Gain (for a split of $S$ into subsets $S_1, \dots, S_m$):

$$
IG = H(S) - \sum_{j=1}^{m} \frac{|S_j|}{|S|} H(S_j)
$$
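
The three split criteria translate directly into a few helper functions. The sketch below computes them from lists of class labels; the function names and labels are illustrative.

```python
import math
from collections import Counter


def proportions(labels):
    # p_i: fraction of samples belonging to each class
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    return -sum(p * math.log2(p) for p in proportions(labels))


def gini(labels):
    return 1 - sum(p * p for p in proportions(labels))


def information_gain(parent, subsets):
    # H(S) minus the size-weighted entropy of the child subsets S_j
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)


parent = ["yes", "yes", "no", "no"]
print(entropy(parent))  # 1.0
print(gini(parent))     # 0.5
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0, a perfect split
```

A tree-growing algorithm evaluates every candidate split this way and keeps the one with the highest gain (or the largest Gini reduction).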

---

### **7. Random Forest**
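An ensemble of decision trees, each trained on a bootstrap sample of the data and restricted to a random subset of features at each split.

Prediction (regression: the average of the $n$ trees' outputs $h_i(x)$; classification uses a majority vote instead):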

$$
\hat{y} = \frac{1}{n} \sum_{i=1}^{n} h_i(x)
$$
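
Below is a simplified sketch of the bootstrap-and-average idea using one feature and depth-1 regression trees ("stumps"). With a single feature, the per-split feature sampling that distinguishes a random forest from plain bagging has no effect, so this is really bagging. The names, tree count, seed, and data are assumptions.

```python
import random


def fit_stump(xs, ys):
    """Depth-1 regression tree: pick the threshold t that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x in the sample: predict its mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: len(xs) indices drawn with replacement
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    # y_hat = (1/n) * sum_i h_i(x): average of the individual tree predictions
    return sum(tree(x) for tree in trees) / len(trees)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.8, 5.0]
forest = fit_forest(xs, ys)
print(predict_forest(forest, 2), predict_forest(forest, 7))
```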

---

### **8. Gradient Boosting (GBM) & XGBoost**
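A boosting algorithm that builds an additive model stage by stage: each new weak learner $h_m$ (usually a shallow decision tree) is fitted to the negative gradient of the loss with respect to the current predictions $F_{m-1}(x)$. XGBoost is an optimized gradient boosting library that adds regularization to this scheme.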

Update Rule:

$$
F_m(x) = F_{m-1}(x) + \gamma h_m(x)
$$

$\gamma$ is the learning rate (shrinkage), which scales down each learner's contribution.
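
Here is a minimal gradient boosting sketch for squared-error loss. For that loss the negative gradient is just the residual, so each stump is fitted to the current residuals. It assumes one feature with at least two distinct values and a fixed `gamma` with no line search. It illustrates plain gradient boosting, not XGBoost's regularized, second-order algorithm. All names and data are illustrative.

```python
def fit_stump(xs, targets):
    """Depth-1 regression tree fitted by least squares (assumes >= 2 distinct x values)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=100, gamma=0.1):
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimizes squared error
    learners = []

    def F(x):
        # F_m(x) = F_0 + gamma * (h_1(x) + ... + h_m(x))
        return f0 + gamma * sum(h(x) for h in learners)

    for _ in range(n_rounds):
        # For the loss (y - F)^2 / 2, the negative gradient is the residual y - F(x)
        residuals = [y - F(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))  # F_m = F_{m-1} + gamma * h_m
    return F


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.8, 5.0]
model = gradient_boost(xs, ys)
print(model(2), model(7))
```

A smaller `gamma` needs more rounds but usually generalizes better. That trade-off is why the learning rate and the number of rounds are tuned together.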

---

## **Unsupervised Learning Algorithms**

### **9. k-Means Clustering**
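Partitions data into $k$ clusters by alternating two steps until assignments stop changing: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster $C_j$.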

Centroid Update:

$$
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

Assignment Distance (squared Euclidean; each point joins the cluster whose centroid minimizes it):

$$
d(x, \mu) = \|x - \mu\|^2
$$
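
This is a minimal sketch of Lloyd's algorithm, the standard k-means loop. It initializes centroids from `k` randomly chosen data points, which is an assumption: libraries often use k-means++ seeding instead. The names, seed, and data are illustrative.

```python
import random


def sq_dist(x, mu):
    # ||x - mu||^2
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct data points as initial centroids
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: mu_j = mean of the points in C_j (keep the old centroid if C_j is empty)
        new_centroids = [
            [sum(coord) / len(c) for coord in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, clusters = kmeans(points, 2)
print(sorted(centroids))
```

k-means only finds a local optimum, so in practice it is restarted from several initializations and the result with the lowest total squared distance is kept.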

---

### **10. Principal Component Analysis (PCA)**
Used for dimensionality reduction.

Eigenvalue Equation (with $X$ mean-centered, so that $X^T X$ is proportional to the covariance matrix):

$$
X^T X v = \lambda v
$$

Projection:

$$
Z = XW
$$

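Where the columns of $W$ are the eigenvectors $v$ with the $k$ largest eigenvalues $\lambda$, so $Z$ is the data projected onto the top $k$ principal components.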
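
The sketch below finds only the first principal component ($k = 1$). It uses power iteration on $X^T X$, a swapped-in technique chosen to avoid a linear-algebra dependency; libraries typically use an SVD instead. It assumes the starting vector of ones is not orthogonal to the top eigenvector. The names and data are illustrative.

```python
import math


def center(X):
    # Subtract each column's mean so X^T X is proportional to the covariance matrix
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def top_eigenvector(X, iters=200):
    """Power iteration on X^T X; returns the dominant unit eigenvector v and its eigenvalue."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        # Multiply by X^T X, then renormalize to unit length
        w = [sum(XtX[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T (X^T X) v gives lambda for a unit vector v
    lam = sum(v[i] * sum(XtX[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, lam


X = center([[2.0, 1.9], [0.5, 0.6], [2.2, 2.1], [1.0, 1.1], [3.0, 2.9]])
v, lam = top_eigenvector(X)
Z = [sum(x * vi for x, vi in zip(row, v)) for row in X]  # Z = XW with W = [v]
print(v, lam)
```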

---

### **11. Hierarchical Clustering**
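Builds a nested hierarchy of clusters, most commonly bottom-up (agglomerative): start with each point as its own cluster and repeatedly merge the two closest clusters.

Distance Between Clusters (linkage criteria):

- Single Linkage: $d(A, B) = \min_{a \in A,\, b \in B} d(a, b)$
- Complete Linkage: $d(A, B) = \max_{a \in A,\, b \in B} d(a, b)$
- Average Linkage: $d(A, B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$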
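
A naive agglomerative sketch using the three linkage rules with Euclidean point distance follows. It rescans all cluster pairs at every merge, so it is roughly cubic in the number of points and meant only to illustrate the definitions. The function names and data are illustrative.

```python
import math
from itertools import product


def linkage_distance(A, B, method="single"):
    # All pairwise Euclidean distances d(a, b) for a in A, b in B
    dists = [math.dist(a, b) for a, b in product(A, B)]
    if method == "single":
        return min(dists)
    if method == "complete":
        return max(dists)
    if method == "average":
        return sum(dists) / (len(A) * len(B))
    raise ValueError(f"unknown linkage: {method}")


def agglomerative(points, n_clusters, method="single"):
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > n_clusters:
        # Find the closest pair (i < j) under the chosen linkage and merge j into i
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so popping j leaves index i valid
    return clusters


A, B = [[0, 0], [0, 1]], [[3, 0], [4, 0]]
print(linkage_distance(A, B, "single"), linkage_distance(A, B, "complete"))
print(agglomerative([[0], [1], [10], [11]], 2))
```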

---

## **Reinforcement Learning Algorithms**

### **12. Q-Learning**
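A model-free, off-policy algorithm that learns the action-value function $Q(s, a)$ directly from observed transitions $(s, a, r, s')$.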

Q-Value Update Rule:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

Where:
- $\alpha$ = learning rate
- $\gamma$ = discount factor
- $r$ = reward
- $s'$ = next state
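
This is a minimal tabular sketch on a hypothetical five-state corridor: action 0 moves left, action 1 moves right, and reaching state 4 ends the episode with reward 1. Exploration is epsilon-greedy. It also assumes the common convention, not stated above, that a terminal $s'$ contributes no future value. The environment, names, and hyperparameters are all illustrative.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical corridor: states 0..4, reaching state 4 ends the episode


def step(s, a):
    # Action 0 moves left, action 1 moves right; reward 1 only on reaching the goal
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL


def q_update(Q, s, a, r, s_next, alpha, gamma, terminal):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    # A terminal s' has no future value, so the target is just r there.
    target = r if terminal else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy exploration; ties are broken at random
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next, r, done = step(s, a)
            q_update(Q, s, a, r, s_next, alpha, gamma, done)
            s = s_next
    return Q


Q = train()
print([0 if q[0] > q[1] else 1 for q in Q[:GOAL]])  # greedy action per non-terminal state
```

After training, the greedy policy should move right in every state, and $Q(s, \text{right})$ should approach $\gamma^{3-s}$. The learning is off-policy: the update uses $\max_{a'}$ even though the behavior policy sometimes explores.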

---

### **13. Deep Q Network (DQN)**
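Approximates $Q(s, a)$ with a neural network with parameters $\theta$. Training typically samples transitions from an experience-replay buffer and computes targets with a separate target network whose parameters $\theta^-$ are periodically copied from $\theta$.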

Loss Function:

$$
L(\theta) = \mathbb{E} \left[ (r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta))^2 \right]
$$
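
The sketch below shows how the loss above is estimated on a minibatch. Two hypothetical lookup tables stand in for the online network ($\theta$) and the target network ($\theta^-$). In a real DQN both would be neural networks, and an autodiff library would backpropagate this loss through the online network only. It also assumes, as is common, that terminal transitions do not bootstrap.

```python
def dqn_loss(batch, q_online, q_target, gamma=0.99):
    """Mean squared TD error over transitions (s, a, r, s_next, done).
    q_online(s) and q_target(s) return one Q-value per action."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Target uses the frozen parameters theta^-; terminal transitions do not bootstrap
        y = r if done else r + gamma * max(q_target(s_next))
        total += (y - q_online(s)[a]) ** 2
    return total / len(batch)


# Hypothetical lookup tables standing in for the two networks
online = {0: [1.0, 2.0], 1: [0.5, 0.0]}
target = {0: [1.0, 1.5], 1: [0.0, 3.0]}
batch = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, True)]
print(dqn_loss(batch, online.get, target.get, gamma=0.5))  # 0.25
```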

---

### **14. Policy Gradient Methods**
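Learns a parameterized policy $\pi_\theta(a \mid s)$ directly by gradient ascent on the expected return, instead of deriving it from a value function.

Policy Gradient Update:

$$
\theta \leftarrow \theta + \alpha \nabla_{\theta} J(\theta)
$$

Where $J(\theta)$ is the expected return of trajectories $\tau$ generated by the policy:

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} [ R(\tau) ]
$$

Its gradient can be estimated from sampled trajectories with the likelihood-ratio (REINFORCE) form:

$$
\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t} \nabla_{\theta} \log \pi_\theta(a_t \mid s_t) \, R(\tau) \right]
$$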
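
This REINFORCE sketch runs on a hypothetical multi-armed bandit, where each episode is a single action and $R$ is that arm's fixed reward. The policy is a softmax over logits $\theta$, for which $\partial \log \pi(a) / \partial \theta_k = \mathbb{1}[k = a] - \pi(k)$. The rewards, step size, and names are illustrative, and no baseline is used.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a softmax policy pi_theta(a) over one logit per arm."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample a ~ pi_theta
        R = rewards[a]
        # Gradient ascent: theta_k += alpha * R * (1[k == a] - pi(k))
        theta = [t + alpha * R * ((1.0 if k == a else 0.0) - p) for k, (t, p) in enumerate(zip(theta, probs))]
    return softmax(theta)


print(reinforce_bandit([0.2, 1.0]))  # probability mass should shift toward the second arm
```

Subtracting a baseline, such as the average reward, from `R` would leave the gradient estimate unbiased while reducing its variance.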

---

## **Neural Networks and Deep Learning**

### **15. Artificial Neural Networks (ANN)**
Forward Propagation (layer $l$ with weights $W^{[l]}$, biases $b^{[l]}$, activation function $g$, and $a^{[0]} = x$):

$$
a^{[l]} = g(W^{[l]} a^{[l-1]} + b^{[l]})
$$

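Backpropagation (the chain rule applied layer by layer from the output back to the input; gradient descent then uses the resulting gradients to update $W^{[l]}$ and $b^{[l]}$):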

$$
\frac{\partial J}{\partial W^{[l]}} = \frac{\partial J}{\partial a^{[l]}} \cdot \frac{\partial a^{[l]}}{\partial W^{[l]}}
$$
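
The sketch below assumes a hypothetical network: two inputs, two ReLU hidden units, one sigmoid output, and binary cross-entropy loss. It implements the forward rule $a^{[l]} = g(W^{[l]} a^{[l-1]} + b^{[l]})$ and chain-rule gradients for the weights (bias gradients omitted for brevity). The weights and names are illustrative.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, a_prev, g):
    # a^[l] = g(W^[l] a^[l-1] + b^[l]); W holds one row of weights per output unit
    return [g(sum(w * a for w, a in zip(row, a_prev)) + bi) for row, bi in zip(W, b)]


def forward(params, x):
    W1, b1, W2, b2 = params
    a1 = dense(W1, b1, x, relu)      # hidden layer
    a2 = dense(W2, b2, a1, sigmoid)  # single output probability
    return a1, a2


def loss(params, x, y):
    # Binary cross-entropy for one example
    p = forward(params, x)[1][0]
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def backprop(params, x, y):
    """Chain-rule gradients dJ/dW1 and dJ/dW2 for one example."""
    W1, b1, W2, b2 = params
    a1, a2 = forward(params, x)
    # Output layer: for sigmoid + cross-entropy, dJ/dz2 simplifies to a2 - y
    delta2 = a2[0] - y
    dW2 = [[delta2 * a for a in a1]]
    # Hidden layer: dJ/dz1 = (W2^T delta2) * relu'(z1), where relu'(z1) is 1 for active units
    delta1 = [W2[0][j] * delta2 * (1.0 if a1[j] > 0 else 0.0) for j in range(len(a1))]
    dW1 = [[d * xk for xk in x] for d in delta1]
    return dW1, dW2


params = ([[0.5, -0.2], [0.2, 0.8]], [0.1, -0.1], [[1.0, -1.5]], [0.05])
x, y = [1.0, 2.0], 1.0
dW1, dW2 = backprop(params, x, y)
print(dW1, dW2)
```

A gradient-descent step would then subtract $\alpha$ times each gradient from the matching weight. Comparing the analytic gradients with finite differences of `loss` is a cheap way to check a backpropagation implementation.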

---

### **16. Convolutional Neural Networks (CNN)**
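Used for grid-structured data such as images: small learned kernels slide across the input to detect local patterns.

Convolution Operation (input $X$, kernel $K$; strictly, this is cross-correlation, which is what most deep learning libraries compute under the name "convolution"):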

$$
S(i, j) = \sum_{m} \sum_{n} X(i+m, j+n) \cdot K(m, n)
$$
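
Here is a direct translation of the sum above for a single-channel input. It assumes stride 1 and no padding ("valid" mode), so a $3 \times 3$ input and a $2 \times 2$ kernel give a $2 \times 2$ output. The function name and arrays are illustrative.

```python
def conv2d(X, K):
    """'Valid' 2D cross-correlation: S(i, j) = sum_m sum_n X(i+m, j+n) * K(m, n)."""
    kh, kw = len(K), len(K[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    return [
        [sum(X[i + m][j + n] * K[m][n] for m in range(kh) for n in range(kw)) for j in range(out_w)]
        for i in range(out_h)
    ]


X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
K = [[1, 0],
     [0, -1]]
print(conv2d(X, K))  # [[-4, -4], [-4, -4]]
```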

---

### **17. Recurrent Neural Networks (RNN)**
Used for sequential data.

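Hidden State Update ($\sigma$ is an activation function, commonly $\tanh$; the same $W_h$, $W_x$, and $b$ are shared across all time steps $t$):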

$$
h_t = \sigma(W_h h_{t-1} + W_x x_t + b)
$$
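
The sketch below unrolls the recurrence over a short sequence, using $\tanh$ as the activation $\sigma$ (an assumption). It shows that the same weights are reused at every step. The sizes, weights, and function name are illustrative.

```python
import math


def rnn_forward(xs, W_h, W_x, b, h0):
    """Apply h_t = tanh(W_h h_{t-1} + W_x x_t + b) at each step; returns every h_t."""
    h = h0
    states = []
    for x in xs:
        # The comprehension reads the previous h; h is rebound only after the new state is built
        h = [
            math.tanh(sum(W_h[i][j] * h[j] for j in range(len(h)))
                      + sum(W_x[i][k] * x[k] for k in range(len(x)))
                      + b[i])
            for i in range(len(h))
        ]
        states.append(h)
    return states


W_h = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0], [-1.0]]
states = rnn_forward([[1.0], [0.0]], W_h, W_x, [0.0, 0.0], [0.0, 0.0])
print(states)  # the second state is nonzero although x_2 = 0: it carries memory of x_1
```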

---

### **18. Long Short-Term Memory (LSTM)**
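An RNN variant whose gated memory cell $C_t$ helps it capture long-term dependencies in sequences.

Cell State Update ($\odot$ is element-wise multiplication; $f_t$ is the forget gate, $i_t$ the input gate, and $\tilde{C}_t$ the candidate cell state):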

$$
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
$$

Forget Gate:

$$
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
$$
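
The source gives only the cell-state and forget-gate equations. The one-step sketch below fills in the rest following the standard LSTM formulation: input gate $i_t$, output gate $o_t$, candidate $\tilde{C}_t = \tanh(\cdot)$, and $h_t = o_t \odot \tanh(C_t)$, each gate acting on the concatenation $[h_{t-1}, x_t]$. The parameter layout `p` and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps each gate name to (W, b) acting on [h_{t-1}, x_t]."""
    hx = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(z + b) for z, b in zip(matvec(p["f"][0], hx), p["f"][1])]  # forget gate
    i = [sigmoid(z + b) for z, b in zip(matvec(p["i"][0], hx), p["i"][1])]  # input gate
    o = [sigmoid(z + b) for z, b in zip(matvec(p["o"][0], hx), p["o"][1])]  # output gate
    c_tilde = [math.tanh(z + b) for z, b in zip(matvec(p["c"][0], hx), p["c"][1])]  # candidate
    # C_t = f_t * C_{t-1} + i_t * C_tilde_t (element-wise)
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Forget gate saturated open and input gate nearly closed: the cell keeps its old value
p = {
    "f": ([[0.0, 0.0]], [10.0]),
    "i": ([[0.0, 0.0]], [-10.0]),
    "o": ([[0.0, 0.0]], [0.0]),
    "c": ([[0.0, 1.0]], [0.0]),
}
h, c = lstm_step([1.0], [0.0], [2.0], p)
print(h, c)  # c stays close to 2.0
```

This gating is what lets an LSTM carry information across many steps. When $f_t \approx 1$ and $i_t \approx 0$, the cell state passes through almost unchanged.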

---

## **Generative Models**

### **19. Generative Adversarial Networks (GANs)**
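A generator $G$ maps noise $z \sim p_z$ to synthetic samples, and a discriminator $D$ outputs the probability that its input is real data $x \sim p_{\text{data}}$; the two networks are trained against each other.

Generator Loss (non-saturating form):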

$$
L_G = -\mathbb{E} [\log D(G(z))]
$$

Discriminator Loss:

$$
L_D = -\mathbb{E} [\log D(x)] - \mathbb{E} [\log (1 - D(G(z)))]
$$
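
In practice both expectations are estimated by minibatch averages. The sketch below computes the two losses from hypothetical discriminator outputs (probabilities) on real samples and on generated samples $G(z)$. It does not include the networks themselves, and the function names are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """L_D = -E[log D(x)] - E[log(1 - D(G(z)))], estimated by minibatch means."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return -real_term - fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss L_G = -E[log D(G(z))]."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# When D cannot tell real from fake it outputs 0.5 everywhere
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # 2 * log(2)
print(generator_loss([0.5, 0.5]))                 # log(2)
```

Minimizing $L_G$ pushes $D(G(z))$ toward 1. Compared with minimizing $\log(1 - D(G(z)))$, this form gives stronger gradients early in training, when the discriminator easily rejects generated samples.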

---

### **20. Variational Autoencoders (VAE)**
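An encoder $q(z|x)$ maps each input to a distribution over latent codes $z$, a decoder $p(x|z)$ reconstructs the input from a sampled code, and $p(z)$ is the prior (usually a standard normal).

Objective (the evidence lower bound, or ELBO, which is maximized; the training loss is its negative):

$$
\mathcal{L} = \mathbb{E}_{q(z|x)} [\log p(x|z)] - D_{KL}(q(z|x) \,\|\, p(z))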
$$
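
The sketch below assumes the common setup of a diagonal Gaussian encoder $q(z|x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior, where the KL term has a closed form. The reconstruction log-likelihood is taken as a hypothetical input computed by the decoder elsewhere. The function names are illustrative.

```python
import math
import random


def kl_diag_gaussian_to_standard(mu, log_var):
    """D_KL(N(mu, diag(exp(log_var))) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1), so z is a differentiable function of mu and log_var
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def negative_elbo(recon_log_likelihood, mu, log_var):
    # Training loss = -(E_q[log p(x|z)] - D_KL(q(z|x) || p(z)))
    return -recon_log_likelihood + kl_diag_gaussian_to_standard(mu, log_var)


print(kl_diag_gaussian_to_standard([0.0, 0.0], [0.0, 0.0]))  # 0.0: q already equals the prior
print(negative_elbo(-3.0, [1.0], [0.0]))                     # 3.5
z = reparameterize([0.0, 1.0], [0.0, 0.0], random.Random(0))
```

Predicting `log_var` rather than $\sigma$ keeps the variance positive without constraints. The reparameterization trick is what allows gradients to flow through the sampling step.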