# Machine Learning



# **Logistic Regression**

Logistic regression is a supervised learning algorithm used for **binary classification** problems, where the output variable can take only two possible values, like "Yes/No," "Pass/Fail," or "0/1." It estimates the probability that an input belongs to the positive class and turns that probability into a class label.

---

## **1. How Logistic Regression Works**

### **Step 1: Input Features**
- Logistic regression takes in one or more input features, $(x_1, x_2, \dots, x_n)$, which could represent things like:
  - Age
  - Income
  - Hours studied, etc.

### **Step 2: Linear Combination (Log-Odds)**
- Logistic regression first calculates a weighted sum of the inputs:
  
  $
  z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b
  $
  
  Here:
  - $(w_1, w_2, \dots, w_n)$: Weights or coefficients that determine the importance of each feature.
  - \(b\): Bias term, which shifts the curve.

  This value \(z\) is called the **log-odds**.

---

### **Step 3: Sigmoid Function**
- The log-odds (\(z\)) is then passed through the **sigmoid function** to map it to a probability value between 0 and 1:
  $
  P(y=1) = \frac{1}{1 + e^{-z}}
  $
  The sigmoid function is an "S"-shaped curve, and its output always lies strictly between 0 and 1, approaching the two limits without ever reaching them.

---

### **Step 4: Decision Rule**
- Based on the probability \(P(y=1)\), logistic regression makes a classification decision:
  - If \(P(y=1) > 0.5\), predict \(y=1\).
  - If \(P(y=1) \leq 0.5\), predict \(y=0\).
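
Steps 2 to 4 can be sketched in a few lines of plain Python. This is a minimal illustration rather than a full implementation: the function names and the weights `w`, bias `b`, and input `x` are hypothetical values chosen for the example, and no training is performed.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Steps 2 and 3: weighted sum of the features, then the sigmoid."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)

def predict(x, w, b, threshold=0.5):
    """Step 4: predict 1 when P(y=1) exceeds the threshold, otherwise 0."""
    return 1 if predict_proba(x, w, b) > threshold else 0

# Hypothetical weights, bias, and input, chosen only for illustration.
w = [0.8, -0.4]
b = 0.1
x = [2.0, 1.0]

p = predict_proba(x, w, b)
label = predict(x, w, b)
print(f"P(y=1) = {p:.3f}, predicted class = {label}")
```

The threshold is exposed as a parameter because 0.5 is a convention from the decision rule above, not a requirement of the model.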

---

## **2. Mathematical Intuition**
- The output of logistic regression is the probability of \(y=1\), calculated as:
  $
  P(y=1) = \frac{1}{1 + e^{-(w_1x_1 + w_2x_2 + \dots + w_nx_n + b)}}
  $
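  - As \(z\) becomes large and positive, \(e^{-z}\) shrinks toward 0 and the probability approaches 1; as \(z\) becomes large and negative, \(e^{-z}\) grows without bound and the probability approaches 0.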

- Alternatively, in terms of odds (the ratio of probabilities):
  $
  \text{Odds} = \frac{P(y=1)}{1 - P(y=1)} = e^z
  $

- Taking the log of odds (logit function):
  $
  \log\left(\frac{P(y=1)}{1 - P(y=1)}\right) = z
  $
  Logistic regression essentially models this relationship.
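
The relationship between probability, odds, and log-odds can be checked numerically. The sketch below assumes nothing beyond the formulas above; the helper names `sigmoid` and `logit` and the sample values of `z` are chosen for illustration.

```python
import math

def sigmoid(z):
    """Probability P(y=1) for a given log-odds value z."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for z in [-2.0, 0.0, 1.5]:
    p = sigmoid(z)
    odds = p / (1.0 - p)
    # The odds should match e^z, and logit(p) should recover z.
    print(f"z={z:+.1f}  P={p:.4f}  odds={odds:.4f}  "
          f"e^z={math.exp(z):.4f}  logit(P)={logit(p):+.4f}")
```

In each row the odds column agrees with $e^z$ and the logit column returns the original $z$, up to floating-point rounding.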

---

## **3. Example: Predicting if a Student Passes a Test**

### **Dataset**
| Hours Studied (\(x\)) | Pass (\(y\)) |
|------------------------|--------------|
| 1                      | 0            |
| 2                      | 0            |
| 3                      | 0            |
| 4                      | 1            |
| 5                      | 1            |
| 6                      | 1            |

### **Step-by-Step Process**
1. **Linear Combination**:
   Suppose we model the relationship as:
   $
   z = 2x - 5
   $
   Where:
   - \(w=2\): Weight for "hours studied."
   - \(b=-5\): Bias term.

2. **Sigmoid Transformation**:
   Apply the sigmoid function:
   $
   P(y=1) = \frac{1}{1 + e^{-(2x - 5)}}
   $
   Example for \(x=3\):
   $
   z = 2(3) - 5 = 1
   $
   $
   P(y=1) = \frac{1}{1 + e^{-1}} \approx 0.73
   $
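   So the model estimates a 73% chance that the student passes.

3. **Decision**:
   - Since \(P(y=1) > 0.5\), predict "Pass." Note that the table records \(x=3\) as a fail: the values \(w=2\) and \(b=-5\) are illustrative rather than fitted to the data, and they place the decision boundary at \(x=2.5\).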
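
The same calculation can be repeated for every row of the table. The sketch below uses the illustrative values $w=2$ and $b=-5$ from the text (they are not fitted to the data) and compares each prediction with the recorded label; the variable names are chosen for this example.

```python
import math

# Dataset from the table above.
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

# Illustrative weight and bias from the text, not learned from the data.
w, b = 2.0, -5.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rows = []
for x, y in zip(hours, passed):
    p = sigmoid(w * x + b)          # probability of passing
    pred = 1 if p > 0.5 else 0      # decision rule with a 0.5 threshold
    rows.append((x, y, p, pred))
    print(f"x={x}  label={y}  P(y=1)={p:.3f}  prediction={pred}")

# Fraction of rows where the prediction agrees with the label.
accuracy = sum(y == pred for _, y, _, pred in rows) / len(rows)
print(f"accuracy = {accuracy:.2f}")
```

Printing the label next to the prediction makes it easy to see which rows this particular choice of $w$ and $b$ gets right.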

---

## **5. Advantages of Logistic Regression**
- **Interpretable**: The weights (\(w_i\)) show the impact of each feature.
- **Efficient**: Works well on small to medium-sized datasets.
- **Probabilistic Output**: Provides confidence scores, not just predictions.

---

## **6. Limitations**
- **Linear Boundary**: Assumes a linear relationship between features and log-odds. Complex data may require other algorithms.
- **Outliers**: Sensitive to extreme values in the input features.

---

## **7. Use Cases**
- Medical Diagnosis: Predicting if a patient has a disease.
- Email Spam Detection: Classifying emails as spam or not spam.
- Marketing: Predicting if a customer will buy a product.

---

![image.png](attachment:e41749c1-3f9e-425a-b46b-91511ed72294.png)


# **Random Forest**

Random Forest is a machine learning algorithm used for both classification and regression tasks. It combines the predictions of multiple decision trees to improve accuracy and reduce overfitting.

---

## **1. How Random Forest Works**

### **Step 1: Decision Trees**
- A **decision tree** splits the data into branches based on questions like "Is $\text{Feature 1} > 3.5$?"
- At each split (node), the tree chooses the best feature and threshold to separate the data.
- The process continues until a stopping condition is met (e.g., maximum depth or pure leaf nodes).
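
To make "the best feature and threshold" concrete, here is a minimal sketch of choosing a single split. It assumes Gini impurity as the quality measure, which the text above does not specify, and it tries midpoints between sorted feature values as candidate thresholds. The function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y):
    """Try every feature and every midpoint between its sorted values.

    Returns (feature_index, threshold, weighted_impurity) for the split
    with the lowest size-weighted Gini impurity of the two children.
    """
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y_i for row, y_i in zip(X, y) if row[j] <= t]
            right = [y_i for row, y_i in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical toy data: two features per row, two classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 1.0], [5.0, 6.0], [6.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

feature, threshold, impurity = best_split(X, y)
print(f"split on feature index {feature} at {threshold} (impurity {impurity:.3f})")
```

On this toy data the chosen split is the first feature at 3.5, which separates the two classes exactly. A full tree would apply the same search recursively to each child until a stopping condition is met.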

### **Step 2: Building a Forest**
- Random Forest creates **multiple decision trees** using random subsets of the dataset (bootstrap sampling).
- Each tree is trained on its own bootstrap sample, and each split within a tree considers only a random subset of the features. This randomness makes the trees differ from one another.

### **Step 3: Aggregating Predictions**
- For classification:
  - Each tree votes on the class, and the majority vote determines the final prediction.
- For regression:
  - The predictions from all trees are averaged.

---

## **2. Mathematical Intuition**

### **Tree Construction**
- For each tree:
  1. Select a random sample of the dataset (with replacement), called **bootstrap sampling**.
  2. At each split (node), choose a random subset of features and find the best split.
  3. Continue until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf).
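
The two sources of randomness in tree construction can be sketched as follows. The helper names are hypothetical, and the default subset size (the integer square root of the number of features) is a common heuristic assumed here, not something stated in the text.

```python
import math
import random

def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement; some rows repeat, others are left out."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx], idx

def random_feature_subset(n_features, rng, k=None):
    """Pick k distinct feature indices to consider at one split.

    Assumption: k defaults to the integer square root of n_features.
    """
    if k is None:
        k = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))

# Hypothetical toy data: six rows, four features each.
X = [[float(r * 4 + c) for c in range(4)] for r in range(6)]
y = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)  # fixed seed so the run is repeatable
X_boot, y_boot, idx = bootstrap_sample(X, y, rng)
never_drawn = sorted(set(range(len(X))) - set(idx))
features = random_feature_subset(4, rng)

print("sampled row indices:", idx)
print("rows never drawn:", never_drawn)
print("features considered at this split:", features)
```

Each tree would receive its own bootstrap sample, and a fresh feature subset would be drawn at every split inside that tree.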

### **Aggregation**
- For classification:
  $
  \text{Final Prediction} = \text{Mode}(\text{Predictions from all trees})
  $
- For regression:
  $
  \text{Final Prediction} = \frac{1}{N} \sum_{i=1}^{N} \hat{y}_i
  $
  where $N$ is the number of trees, and $\hat{y}_i$ is the prediction from the $i$-th tree.
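
Both aggregation rules are short enough to write out directly. The sketch below assumes the per-tree predictions are already available as plain lists; the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_predictions):
    """Majority vote: the most common class label across all trees.

    Ties go to the label seen first, an arbitrary choice in this sketch.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Average of the numeric predictions from all trees."""
    return mean(tree_predictions)

# Hypothetical predictions from five classification trees.
votes = [1, 0, 1, 1, 0]
# Hypothetical predictions from four regression trees.
values = [200.0, 220.0, 210.0, 190.0]

print(aggregate_classification(votes))  # 1 (three votes against two)
print(aggregate_regression(values))     # 205.0
```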

---

## **3. Example: Classifying Points Based on Two Features**

### **Dataset**
- **Features**: "Feature 1" and "Feature 2".
- **Classes**: 0 (Blue) and 1 (Red).
- (Refer to the scatterplot at the end of this section for a visual representation.)

### **Process**
1. **Individual Trees**:
   - Each tree splits the dataset differently due to randomness in data and feature selection.
2. **Forest**:
   - The Random Forest aggregates predictions from all trees to make the final decision.

---

## **4. Visual Representation**

1. **Decision Tree**:
   - The first diagram shows a single decision tree used in the forest.
   - Each split is based on a feature threshold to separate data points.
2. **Data Points**:
   - The second diagram shows how the data is distributed across two features, with each class color-coded.

---

## **5. Advantages of Random Forest**
- **Robust to Overfitting**: Combining multiple trees reduces variance and improves generalization.
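- **Handles Missing Data**: Some tree implementations can handle missing values directly (for example, through surrogate splits), while others require the values to be imputed first.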
- **Feature Importance**: Random Forest can rank features based on their contribution to predictions.

---

## **6. Limitations**
- **Training Time**: Building many trees can be computationally expensive for large datasets.
- **Interpretability**: The model is less interpretable than a single decision tree.

---

## **7. Use Cases**
- **Classification**: Spam detection, medical diagnosis.
- **Regression**: Predicting house prices, stock market trends.
- **Feature Selection**: Identifying the most important features in a dataset.

---

![image.png](attachment:254921e7-3bfe-4e39-b6e5-647e72927bfa.png)
![image.png](attachment:eb3e49f9-563f-49fe-8064-a95fba30ba2f.png)