# Decision Tree Assignment - Theoretical Questions


### 1. What is a Decision Tree, and how does it work?  
- A Decision Tree is a supervised learning algorithm used for classification and regression.  

- It splits the data into branches based on feature values, forming a tree-like structure.  

- The tree consists of decision nodes (where splits occur) and leaf nodes (final predictions).  

- It follows a recursive splitting process to maximize information gain or minimize impurity.  
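
As a rough illustration of the structure described above, the sketch below hard-codes a tiny tree as nested dictionaries and walks it from the root to a leaf. The feature names, thresholds, and class labels are invented for the example, and `predict` is a hypothetical helper, not part of any library.

```python
# A hand-built tree: decision nodes hold a feature, a threshold, and two children;
# leaf nodes hold only a prediction. All values are made up for illustration.
toy_tree = {
    "feature": "petal_length",
    "threshold": 2.5,
    "left": {"prediction": "setosa"},
    "right": {
        "feature": "petal_width",
        "threshold": 1.7,
        "left": {"prediction": "versicolor"},
        "right": {"prediction": "virginica"},
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, going left when feature <= threshold."""
    while "prediction" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["prediction"]


print(predict(toy_tree, {"petal_length": 1.4, "petal_width": 0.2}))  # setosa
print(predict(toy_tree, {"petal_length": 5.1, "petal_width": 2.0}))  # virginica
```

In practice the tree is learned from data rather than written by hand; only the traversal logic is shown here.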

----


### 2. What are impurity measures in Decision Trees?  
- Impurity measures indicate the disorder in a dataset at a node.  

- Common impurity measures include **Gini Impurity** and **Entropy**.  

- Lower impurity means better homogeneity of data in a node.  

- These measures guide the tree in making optimal splits.  
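
To make the two measures concrete, here is a minimal sketch that computes them from a list of class labels. The function names `gini` and `entropy` and the sample labels are assumptions chosen for this example.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(-p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


pure = ["yes", "yes", "yes", "yes"]   # one class only: no disorder
mixed = ["yes", "yes", "no", "no"]    # 50/50 split: maximum disorder for two classes

print(gini(pure), entropy(pure))      # 0.0 0.0
print(gini(mixed), entropy(mixed))    # 0.5 1.0
```

A pure node scores 0 on both measures, while an evenly mixed two-class node reaches each measure's maximum.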

----

### 3. What is the mathematical formula for Gini Impurity?  

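- Gini Impurity is the probability of misclassifying a randomly chosen element if it were labeled at random according to the class distribution in the node:

$$
\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2
$$

- Here $p_i$ is the proportion of samples in the node that belong to class $i$, and $C$ is the number of classes.

![alt text](gini.png)

---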

### 4. What is the mathematical formula for Entropy?


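- Entropy measures the uncertainty of the class distribution in a node:

$$
\text{Entropy} = -\sum_{i=1}^{C} p_i \log_2 p_i
$$

- Here $p_i$ is the proportion of samples in the node that belong to class $i$, and $C$ is the number of classes.

![alt text](entropy.png)

---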


### 5. What is Information Gain, and how is it used in Decision Trees?  
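- **Information Gain (IG)** measures the reduction in impurity (usually entropy) achieved by a split.
- It is calculated as the entropy of the parent node minus the size-weighted entropy of the child nodes:

$$
IG(S, A) = \text{Entropy}(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \, \text{Entropy}(S_v)
$$

- Here $S$ is the set of samples at the node, $A$ is the feature being split on, and $S_v$ is the subset of $S$ that falls into branch $v$.

![alt text](dt.png)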

- The split with the highest information gain is chosen.  

- It ensures the tree makes the most informative decisions.  
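
The calculation can be sketched in a few lines. This is a minimal example assuming entropy as the impurity measure; `information_gain` and the label lists are hypothetical names and data used only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes completely.
perfect_split = [["yes"] * 4, ["no"] * 4]
# A split whose children have the same class mix as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```
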
---


### 6. What is the difference between Gini Impurity and Entropy?  

- **Gini Impurity** measures the probability of misclassification, while **Entropy** measures the uncertainty (information content) of the class distribution.

- For two classes, Gini ranges from 0 to 0.5 and Entropy from 0 to 1 bit.

- Entropy is slightly more expensive to compute because it requires logarithms.

- Gini is a common default (for example in CART) because it is cheaper to compute.

- Both usually produce similar trees, although they can occasionally select different splits.

----



### 7. What is the mathematical explanation behind Decision Trees?  

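- Decision Trees are built by recursive partitioning: at each node the algorithm evaluates candidate splits and keeps the one that most reduces impurity. CART uses binary splits, while ID3 and C4.5 allow multi-way splits on categorical features.

- For classification, the **best split** is the one that maximizes Information Gain (entropy-based) or minimizes the weighted Gini Impurity of the child nodes.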

- The process continues until a stopping criterion is met (e.g., max depth, min samples per leaf).  

- A tree can be pruned to prevent overfitting.  
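
The sketch below shows one step of that search for a single numeric feature, assuming Gini Impurity as the criterion and midpoints between sorted values as candidate thresholds. The function `best_threshold` and the data are hypothetical; a full tree builder would repeat this step on each child node until a stopping criterion is met.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted_gini)."""
    n = len(values)
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: impurity without any split
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(ages, bought))  # (32.5, 0.0)
```

The search is greedy: it picks the best split for the current node only, without looking ahead to later splits.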

--- 



### 8. What is Pre-Pruning in Decision Trees?  

- Pre-Pruning stops tree growth early to avoid overfitting.  

- It applies constraints like **maximum depth** or **minimum samples per leaf**.  

- It reduces model complexity and improves generalization.  

- However, it may stop before finding the best splits.  
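
A pre-pruning rule can be expressed as a simple check that runs before each split. The helper below is a hypothetical sketch; the parameter names `max_depth` and `min_samples` and their default values are assumptions chosen for illustration.

```python
def should_stop(labels, depth, max_depth=3, min_samples=4):
    """Return True when a node should become a leaf instead of being split."""
    if len(set(labels)) <= 1:      # node is already pure
        return True
    if depth >= max_depth:         # depth limit reached
        return True
    if len(labels) < min_samples:  # too few samples to justify another split
        return True
    return False


print(should_stop(["a", "a", "a", "a"], depth=0))       # True  (pure node)
print(should_stop(["a", "b", "a", "b"], depth=3))       # True  (depth limit)
print(should_stop(["a", "b"], depth=1))                 # True  (too few samples)
print(should_stop(["a", "b", "a", "b", "a"], depth=1))  # False (keep splitting)
```

Tighter limits give smaller trees, at the risk of stopping before a useful split is reached.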

----



### 9. What is Post-Pruning in Decision Trees?  

- Post-Pruning removes branches after the tree is fully grown.  

- It prunes nodes that do not improve model performance on validation data.  

- Common methods include **cost complexity pruning** and **reduced error pruning**.  

- It helps balance bias and variance for better generalization.  
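
For cost complexity pruning, one common formulation (weakest-link pruning, as used in CART) scores each internal node by how much the training error rises per leaf removed if its subtree is collapsed. The sketch below assumes the error values are already known; the node names and numbers are invented for illustration.

```python
def weakest_link_alpha(node_error, subtree_error, n_leaves):
    """g(t) = (R(t) - R(T_t)) / (leaves(T_t) - 1): error increase per leaf removed."""
    return (node_error - subtree_error) / (n_leaves - 1)


# Hypothetical internal nodes: (name, error if collapsed to a single leaf,
# error of the subtree below it, number of leaves in that subtree).
candidates = [
    ("A", 0.10, 0.04, 4),
    ("B", 0.08, 0.07, 2),
]

alphas = {
    name: weakest_link_alpha(r_node, r_subtree, leaves)
    for name, r_node, r_subtree, leaves in candidates
}

# The node with the smallest value is the "weakest link" and is pruned first.
first_to_prune = min(alphas, key=alphas.get)
print(first_to_prune)  # B
```

Repeating this step yields a sequence of progressively smaller trees, from which one is typically selected using validation data or cross-validation.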

----



### 10. What is the difference between Pre-Pruning and Post-Pruning?  

- **Pre-Pruning** stops the tree from growing beyond certain limits, while **Post-Pruning** trims an overgrown tree.  

- Pre-Pruning is **proactive**, while Post-Pruning is **reactive**.  

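- Pre-Pruning may miss useful splits that only pay off deeper in the tree, while Post-Pruning evaluates the fully grown tree before deciding what to remove.

- Post-Pruning often generalizes better, but it costs more computation because the full tree must be grown first.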

---



### 11. What is a Decision Tree Regressor?  

- A **Decision Tree Regressor** is used for predicting continuous values.  

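- Instead of predicting a class, each leaf predicts a number (typically the mean target value of its training samples), and splits are chosen to reduce the variance of target values within each child node.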

- Splits are chosen based on metrics like **Mean Squared Error (MSE)**.  

- It works well on non-linear relationships but may overfit.  
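
The sketch below illustrates a single regression split on one numeric feature, assuming MSE around the node mean as the criterion. The helper names and the house-size data are hypothetical.

```python
def mse(values):
    """Mean squared error around the node mean (the variance of the node)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_regression_split(xs, ys):
    """Return (threshold, weighted_mse) for the midpoint split with the lowest error."""
    n = len(xs)
    distinct = sorted(set(xs))
    best = (None, mse(ys))  # baseline: error without any split
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = len(left) / n * mse(left) + len(right) / n * mse(right)
        if score < best[1]:
            best = (threshold, score)
    return best


sizes = [50, 60, 70, 120, 130, 140]                    # e.g. floor area
prices = [100.0, 110.0, 105.0, 300.0, 310.0, 305.0]    # e.g. price
threshold, score = best_regression_split(sizes, prices)
print(threshold)  # 95.0
```

Each resulting leaf would then predict the mean of its own targets (105.0 for the small houses and 305.0 for the large ones in this toy data).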

---



### 12. What are the advantages and disadvantages of Decision Trees?  

**Advantages:**  

- Simple and easy to interpret.  

- Handles both numerical and categorical data.  

- Requires little data preprocessing (for example, no feature scaling or normalization).


**Disadvantages:**  

- Prone to overfitting without pruning.  

- High variance: small changes in the training data can produce a very different tree.

- Greedy algorithm may not find the optimal tree.  

---



### 13. How does a Decision Tree handle missing values?  

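- Handling of missing values depends on the implementation; there is no single built-in method.

- Some implementations **ignore missing values** when scoring a split and evaluate it using only the samples where the feature is present.

- For categorical data, missing values can be **assigned to the most frequent category**.

- Some implementations use **surrogate splits**: backup splits on other features that mimic the primary split when its feature is missing.

- Missing values can also be imputed (for example with the mean, median, or mode) before training.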
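
As an example of the last option, the sketch below fills missing entries (represented here as `None`) with the column mean for numeric data or the most frequent value for categorical data. The helper `impute_column` is hypothetical and assumes each column has at least one non-missing value.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Replace None with the mean (numeric column) or the mode (categorical column)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)                           # numeric: use the mean
    else:
        fill = Counter(present).most_common(1)[0][0]   # categorical: use the mode
    return [fill if v is None else v for v in values]


print(impute_column([1.0, None, 3.0]))              # [1.0, 2.0, 3.0]
print(impute_column(["red", None, "red", "blue"]))  # ['red', 'red', 'red', 'blue']
```

Imputation happens before training, so the tree itself never sees a missing value.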

---



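### 14. How does a Decision Tree handle categorical features?  

- Many implementations require numerical input, so categories are first converted with **one-hot encoding** or **label encoding**. Label encoding imposes an arbitrary order on the categories, which the tree may then split on.

- Some algorithms split on categorical features directly (e.g., ID3 and C4.5 create one branch per category, while CART groups categories into two subsets).

- Split selection is still based on Information Gain or Gini Impurity.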

- It creates binary or multi-way splits depending on the implementation.  
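
One-hot encoding, mentioned above, can be sketched without any library. The helper `one_hot` is a hypothetical function written for this example; it creates one 0/1 column per distinct category, ordered alphabetically.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


cats, rows = one_hot(["red", "green", "red", "blue"])
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

After encoding, a threshold split such as `red <= 0.5` on the new column acts as a binary test for "is the color red?".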

----



### 15. What are some real-world applications of Decision Trees?  

- **Medical Diagnosis:** Identifying diseases based on symptoms.  

- **Customer Segmentation:** Categorizing customers for marketing.  

- **Fraud Detection:** Detecting fraudulent transactions.  

- **Credit Scoring:** Assessing loan eligibility based on customer profiles.  

- **Recommendation Systems:** Suggesting products based on user preferences.  


----