# i. Decision Tree

**Brief:** A decision tree reaches a decision by working through a series of simple (typically binary) decisions.

A **`Decision Tree`** is a popular machine learning algorithm which can be used for classification as well as regression. It is a tree-like model in which internal nodes represent the features or attributes of the data, branches represent the decision rules, and the leaves represent the final decisions or outcomes.

**Process of Building a Decision Tree:** It involves selecting the best feature (one at a time) recursively to split the data into smaller subsets (and build a tree), based on some metric such as information gain or Gini impurity. Each split creates a new node in the tree, which is connected to its parent node by a branch. The leaf nodes of the tree represent the final decision, such as a classification or a predicted value.
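
The two split metrics named above can be written down in a few lines. The following is a minimal sketch in plain Python, not a library implementation; the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical labels: 1 = purchase, 0 = no purchase.
parent = [1, 1, 1, 0, 0, 0]
left, right = [1, 1, 1], [0, 0, 0]  # a split that separates the classes perfectly

print(gini(parent))                              # 0.5
print(entropy(parent))                           # 1.0
print(information_gain(parent, [left, right]))   # 1.0
```

A perfectly mixed two-class node has the maximum impurity (Gini 0.5, entropy 1 bit), and a split that yields pure children recovers all of it as gain.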

**Example:** Say, we have a dataset of customer information for a company, including attributes like age, income, and gender, and we want to use a decision tree to predict whether a customer will make a purchase. 

The decision tree algorithm would first look at all the attributes and **select the one that has the highest information gain, which is a measure of how well a feature can split the data into different classes**. Let's say that the age attribute has the highest information gain. The algorithm would then split the dataset into subsets based on age, with one branch representing customers under the age of 30 and another branch representing customers aged 30 or older. The algorithm would then repeat this process within each branch, selecting the attribute with the highest information gain and splitting the data again, until a stopping condition is met (for example, all customers in a node belong to the same class). The resulting leaf nodes represent the final classification of whether the customer will make a purchase or not.
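
The first step of this example can be sketched as follows. The customer records, the `FEATURES` mapping, and the `best_split` helper are all hypothetical, chosen so that age happens to be the most informative attribute; real data would rarely split this cleanly.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: (age, income in thousands, purchased).
customers = [
    (22, 40, 0), (25, 85, 0), (28, 30, 0), (29, 60, 0),
    (31, 35, 1), (35, 90, 1), (42, 50, 1), (50, 45, 1),
]
FEATURES = {"age": 0, "income": 1}


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows):
    """Return (gain, feature, threshold) for the highest-gain binary split."""
    labels = [r[-1] for r in rows]
    best = (0.0, None, None)
    for name, idx in FEATURES.items():
        values = sorted({r[idx] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[-1] for r in rows if r[idx] < t]
            right = [r[-1] for r in rows if r[idx] >= t]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(rows)
            gain = entropy(labels) - remainder
            if gain > best[0]:
                best = (gain, name, t)
    return best


gain, feature, threshold = best_split(customers)
print(feature, threshold, round(gain, 3))  # age 30.0 1.0
```

A full tree builder would call `best_split` again on the rows in each branch, which is the recursive step described above.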

### i.i Can decision trees handle missing data?

Yes, decision trees can handle missing data, using different methods for imputation and for splitting nodes. When missing data is encountered during tree construction, the decision tree algorithm can either **impute the missing value**, **ignore the missing value**, or **create a new branch in the tree for instances with missing values**.

**Imputation:** One common approach is to fill in the most common value for categorical features, or the mean or median for continuous features. This is appropriate **only when the missing data is not systematic and occurs randomly**. Another approach is to use regression imputation or other more complex methods to estimate the missing values based on the available data.
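
A minimal sketch of the simple imputation strategies, using only the standard library. The `impute` helper and the sample columns are assumptions for illustration; `None` stands in for a missing value.

```python
from statistics import median, mode

# Hypothetical columns; None marks a missing value.
ages = [25, 32, None, 41, 29, None, 38]
genders = ["F", "M", "F", None, "F", "M", None]


def impute(values, strategy):
    """Fill None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


print(impute(ages, median))   # [25, 32, 32, 41, 29, 32, 38]
print(impute(genders, mode))  # ['F', 'M', 'F', 'F', 'F', 'M', 'F']
```

The fill value should be computed on the training data only and then reused for validation and test data, so that no information leaks across the split.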

**Creating a new branch:** For splitting nodes, a decision tree algorithm can use different strategies depending on the type of feature and the nature of the data. One strategy is to create a separate branch for instances with missing values, and use the remaining data to split other nodes.
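
The separate-branch strategy can be sketched as a three-way partition at a node. The `split_with_missing` function, the branch names, and the sample rows are hypothetical and only illustrate the idea.

```python
def split_with_missing(rows, feature, threshold):
    """Partition rows into below / at-or-above / missing branches."""
    branches = {"below": [], "at_or_above": [], "missing": []}
    for row in rows:
        value = row.get(feature)  # absent keys are treated as missing
        if value is None:
            branches["missing"].append(row)
        elif value < threshold:
            branches["below"].append(row)
        else:
            branches["at_or_above"].append(row)
    return branches


# Hypothetical customer rows; the last two have no usable age.
rows = [{"age": 25}, {"age": 40}, {"age": None}, {"income": 50}]
branches = split_with_missing(rows, "age", 30)
print({name: len(group) for name, group in branches.items()})
# {'below': 1, 'at_or_above': 1, 'missing': 2}
```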

The choice of strategy depends on the nature of the data and the objectives of the analysis.

### i.ii Can decision trees overfit to the training data?

Yes, decision trees are prone to overfitting. They are non-parametric models and, as such, tend to fit the training data very closely (including its noise) when the tree is deep and complex.

To avoid overfitting, techniques such as **pruning**, **limiting the maximum depth of the tree**, and **using an appropriate minimum number of samples per leaf node** can be used.    
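
To see how a depth limit and a minimum leaf size constrain a tree, here is a minimal sketch of a tree builder for a single numeric feature. Everything in it (`build_tree`, `count_leaves`, `predict`, the noisy toy data) is an assumption for illustration and does not mirror any particular library.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, depth=0, max_depth=3, min_samples_leaf=1):
    """Grow a tree on (x, label) pairs with one numeric feature x.

    Returns a class label (leaf) or a tuple (threshold, left, right).
    """
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: depth limit reached or node already pure.
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[0] < t]
        right = [r for r in rows if r[0] >= t]
        # Reject splits that would create leaves that are too small.
        if min(len(left), len(right)) < min_samples_leaf:
            continue
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:
        return majority
    _, t, left, right = best
    return (t,
            build_tree(left, depth + 1, max_depth, min_samples_leaf),
            build_tree(right, depth + 1, max_depth, min_samples_leaf))


def count_leaves(node):
    if not isinstance(node, tuple):
        return 1
    return count_leaves(node[1]) + count_leaves(node[2])


def predict(node, x):
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x < t else right
    return node


def accuracy(node, rows):
    return sum(predict(node, x) == y for x, y in rows) / len(rows)


# Hypothetical noisy data: mostly 1 for x >= 5, with a few flipped labels.
data = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
        (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

deep = build_tree(data, max_depth=10)
shallow = build_tree(data, max_depth=1)
print(count_leaves(deep), accuracy(deep, data))
print(count_leaves(shallow), accuracy(shallow, data))
```

The unrestricted tree keeps splitting until it has memorized every flipped label, while the depth-1 tree is forced to settle for a single threshold. Pruning works toward the same goal from the other direction, by removing branches after the tree has been grown.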

### i.iii State the advantages and disadvantages of the decision tree.

### Advantages:

- Decision trees are **very intuitive** and can be easily understood by both technical and non-technical audiences. The decision rules are represented in a tree-like structure, making it easy to visualize the decision-making process.<br><br>

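- Decision trees **can, in principle, handle both numerical and categorical data without the need for data transformation** such as scaling or normalization. This makes them more versatile than machine learning algorithms that can only handle specific types of data. Note that some implementations (scikit-learn, for example) still require categorical features to be encoded as numbers.<br><br>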

- Decision trees are a **non-parametric method**, which means that they do not make any assumptions about the distribution of the data. This makes them useful for both linear and non-linear data.<br><br>

- Some decision tree implementations can handle missing data directly, for example by ignoring the missing values and splitting the data based on the available values. This can make them more robust to missing data than machine learning algorithms that require complete inputs.<br><br>

- **Feature Selection:** Decision trees assign an **importance score** to each feature **based on its contribution to splitting the data**. This score is calculated by measuring the decrease in impurity of the target variable when a feature is used for splitting. Features with higher importance scores are considered more important for predicting the target variable, and can be selected for further analysis.<br><br>

- Decision trees **can handle multi-class problems**, which means that they can classify data into more than two classes. This makes them useful for a wide range of classification problems.<br><br>

- Decision trees are **easy to validate** using methods such as cross-validation, which helps to detect overfitting and to estimate the generalization performance of the model.
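
The importance score from the feature selection point can be made concrete. One common formulation (an assumption here, since the text above does not fix a formula) is the weighted impurity decrease: for a node $t$ holding $N_t$ of the $N$ training samples, with impurity $I(t)$ and children $t_L$ and $t_R$,

$$
\Delta I(t) = \frac{N_t}{N}\left( I(t) - \frac{N_{t_L}}{N_t}\, I(t_L) - \frac{N_{t_R}}{N_t}\, I(t_R) \right),
\qquad
\text{importance}(f) = \sum_{t \,:\, t \text{ splits on } f} \Delta I(t)
$$

The scores are usually normalized so that they sum to 1 across features. A feature that is never used for a split gets an importance of 0.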

### Disadvantages:

- The probability of overfitting is high, especially when the tree is deep and complex. This can lead to poor generalization on unseen data.<br><br>

- **Instability:** Decision trees **are sensitive to small variations in the data**, and can produce different trees for different samples of data.<br><br>

- **Greediness:** Decision trees are greedy in nature: they choose the best split at each node based on the available features, without considering the global optimum (that is, what will happen at later levels of the tree). This can lead to suboptimal trees.<br><br>

- Training can be computationally expensive on large datasets with many features, because every candidate split of every feature is evaluated at each node.
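
The greediness point is easiest to see on XOR-style data, where no single feature is informative on its own. The sketch below uses hypothetical data and a hypothetical `gain` helper: it shows that the first split has zero information gain on either feature, even though two levels of splits classify the data perfectly.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, idx):
    """Information gain of splitting binary feature `idx` into 0 vs 1."""
    labels = [r[-1] for r in rows]
    groups = [[r[-1] for r in rows if r[idx] == v] for v in (0, 1)]
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups if g)
    return entropy(labels) - remainder


# Hypothetical XOR data: (a, b, label) with label = a XOR b.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print(gain(rows, 0), gain(rows, 1))  # 0.0 0.0

# After splitting on a anyway, b separates each branch perfectly.
branch_a0 = [r for r in rows if r[0] == 0]
print(gain(branch_a0, 1))  # 1.0
```

A greedy learner that only looks one level ahead sees no reason to prefer any first split here, which is why greedy trees can miss structure that depends on feature interactions.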

### i.iv Explain the bias-variance trade-off in the decision trees.

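**Generally, decision trees have `low bias` and `high variance`.** This is because decision trees are capable of fitting complex and non-linear relationships in the data, which gives them low bias. However, this flexibility can also lead to overfitting, which causes high variance. The trade-off is controlled mainly by tree complexity: a shallow or heavily pruned tree has higher bias but lower variance, while a deep, unpruned tree has lower bias but higher variance.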
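
For reference, the two terms come from the standard decomposition of expected squared error (shown for the regression case). Assuming $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and taking the expectation over training sets and noise:

$$
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
= \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Here $\hat{f}$ is the fitted tree. Growing the tree deeper shrinks the first term and inflates the second; limiting depth or pruning does the opposite.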

# ii. Random Forests