**1. What is the approximate depth of a Decision Tree trained (without restrictions) on a training set with one million instances?**

A well-balanced binary tree with m leaves has a depth of log2(m), rounded up. A Decision Tree trained without restrictions keeps splitting until every leaf is pure, which in the worst case means one leaf per training instance, and the resulting tree is usually roughly balanced. For one million instances the depth is therefore about log2(1,000,000) ≈ 20. In practice it is somewhat deeper, because the tree is never perfectly balanced, and it can be shallower if many leaves become pure while still holding several instances.
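As a quick order-of-magnitude check, the depth of an idealized tree can be computed directly. The sketch below assumes a perfectly balanced binary tree with one leaf per training instance, which a real tree only approximates:

```python
import math

n_instances = 1_000_000

# Depth of a perfectly balanced binary tree with one leaf per instance
# (an idealization: real trees are somewhat unbalanced, hence a bit deeper)
balanced_depth = math.ceil(math.log2(n_instances))
print("Approximate depth:", balanced_depth)  # 20
```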

**2. Is the Gini impurity of a node usually lower or higher than that of its parent? Is it always lower/higher, or only usually?**

A node's Gini impurity is usually lower than its parent's, but not always. Gini impurity measures how mixed the classes in a node are, and each split is chosen to minimize the weighted sum of the children's impurities, with each child weighted by its number of instances. That weighted sum does not exceed the parent's impurity, but an individual child can be more impure than its parent as long as the other child's lower impurity more than compensates.
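A small worked example makes the distinction between a single child and the weighted average concrete. The node contents and the helper `gini` below are made up for illustration: a parent holding four instances of class A and one of class B is split so that one child is mixed and the other is pure.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Hypothetical node, instances ordered along one feature
parent = ["A", "B", "A", "A", "A"]
left, right = parent[:2], parent[2:]   # split after the second instance

parent_gini = gini(parent)             # 1 - (4/5)^2 - (1/5)^2 = 0.32
left_gini = gini(left)                 # 1 - (1/2)^2 - (1/2)^2 = 0.5
right_gini = gini(right)               # pure node: 0.0

# Children are weighted by their share of the parent's instances
weighted_gini = (len(left) * left_gini + len(right) * right_gini) / len(parent)

print(parent_gini, left_gini, right_gini, weighted_gini)
```

The left child (0.5) is more impure than its parent (0.32), yet the weighted impurity of the two children (0.2) is lower, so the split still counts as an improvement.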

**3. Is it a good idea to reduce `max_depth` if a Decision Tree is overfitting the training set?**

Yes. An overfitting tree has grown deep enough to capture noise in the training data, so it generalizes poorly to new, unseen data. Lowering `max_depth` constrains (regularizes) the model: the tree becomes simpler and is more likely to generalize well. Monitor performance on a validation set, or use cross-validation, to find the depth that best balances bias and variance.
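The effect can be seen by comparing training and validation accuracy for a few depth limits. This is a minimal sketch on a synthetic moons dataset; the dataset size, noise level, and candidate depths are arbitrary choices for illustration, not recommended values:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=2000, noise=0.4, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Map each depth limit to (training accuracy, validation accuracy);
# max_depth=None means the tree grows without restriction
scores = {}
for max_depth in [None, 10, 5, 3, 2]:
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    tree.fit(X_train, y_train)
    scores[max_depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    print(f"max_depth={max_depth}: train={scores[max_depth][0]:.3f}, "
          f"validation={scores[max_depth][1]:.3f}")
```

The unrestricted tree fits the training set perfectly, and a large gap between its training and validation accuracy is the usual sign of overfitting. The depth to keep is the one with the best validation score, not the best training score.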

**4. Is it a good idea to try scaling the input features if a Decision Tree underfits the training set?**

No. Decision Trees are insensitive to the scale and centering of the input features, because each node only compares a single feature value with a threshold, and rescaling a feature just moves the threshold accordingly. Unlike algorithms such as k-nearest neighbors or support vector machines, Decision Trees do not rely on distances between instances. Scaling the features will therefore not fix underfitting; relaxing the tree's constraints (for example, increasing `max_depth`) or adding more informative features is more likely to help.
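This can be checked empirically by training the same tree on raw and on standardized features. The sketch below uses a small moons dataset and `max_depth=3`, both arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_scaled, y)

# Standardization preserves the ordering of each feature, so both trees
# partition the training instances in the same way
same_predictions = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print("Identical training predictions:", same_predictions)
print("Training accuracy (raw, scaled):",
      tree_raw.score(X, y), tree_scaled.score(X_scaled, y))
```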

**5. How much time will it take to train another Decision Tree on a training set of 10 million instances if it takes an hour to train a Decision Tree on a training set with 1 million instances?**

The exact time depends on the implementation and the hardware, but it can be estimated from the computational complexity of training a Decision Tree, which is roughly O(n × m log(m)) for n features and m instances. Multiplying the training set size by 10 therefore multiplies the training time by K = (n × 10m × log(10m)) / (n × m × log(m)) = 10 × log(10m) / log(m). With m = 1,000,000, K ≈ 11.7, so expect roughly 11.7 hours, assuming the number of features and the hyperparameters stay the same.
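A rough number can be worked out by assuming that training cost grows as m × log(m) for m instances with a fixed number of features, which is the usual complexity estimate for growing a tree. The helper name `training_time_ratio` below is hypothetical:

```python
import math

def training_time_ratio(m_old, m_new):
    """Ratio of training costs, assuming cost proportional to m * log(m)."""
    return (m_new * math.log2(m_new)) / (m_old * math.log2(m_old))

ratio = training_time_ratio(1_000_000, 10_000_000)   # 10 * 7/6
estimated_hours = 1.0 * ratio                        # baseline: 1 hour

print(f"Slowdown factor: {ratio:.2f}")
print(f"Estimated training time: {estimated_hours:.1f} hours")
```

Under that assumption the slowdown is about 11.7 rather than exactly 10, because the log(m) factor grows as well.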

**6. Will setting presort=True speed up training if your training set has 100,000 instances?**

No. Presorting the training set (`presort=True`) speeds up training only when the dataset is small, on the order of a few thousand instances or fewer. With 100,000 instances, it will slow training down considerably. Presorting sorts the training set on every feature once, before the tree is grown, and that up-front cost pays off only on small datasets. Note that the `presort` parameter was deprecated in scikit-learn 0.22 and removed in 0.24, so this question applies only to older versions.

**7. Follow these steps to train and fine-tune a Decision Tree for the moons dataset:**

a. Build a moons dataset using `make_moons(n_samples=10000, noise=0.4)`.

b. Split the dataset into a training set and a test set using `train_test_split()`.

c. Use grid search with cross-validation (with the `GridSearchCV` class) to find good hyperparameter values for a `DecisionTreeClassifier`. Try different values for `max_leaf_nodes`.

d. Train the model on the full training set using these hyperparameters, and measure its performance on the test set. You should get roughly 85% to 87% accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# a. Build the moons dataset
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)

# b. Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# c. Grid search over max_leaf_nodes with 5-fold cross-validation
param_grid = {'max_leaf_nodes': [10, 20, 30, 40, 50]}
tree_clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(tree_clf, param_grid, cv=5, verbose=1)
grid_search.fit(X_train, y_train)

# d. GridSearchCV refits the best model on the whole training set by default
# (refit=True), so best_estimator_ can be evaluated on the test set directly
best_tree_clf = grid_search.best_estimator_
y_pred = best_tree_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test set accuracy:", accuracy)
```



Fitting 5 folds for each of 5 candidates, totalling 25 fits
Test set accuracy: 0.87


**8. Follow these steps to grow a forest:**

a. Continuing the previous exercise, generate 1,000 subsets of the training set, each containing 100 instances selected at random. You can do this with Scikit-Learn's `ShuffleSplit` class.

b. Train one Decision Tree on each subset, using the best hyperparameter values found in the previous exercise. Evaluate these 1,000 Decision Trees on the test set. Because they were trained on smaller sets, they will likely perform worse than the first Decision Tree, reaching only about 80% accuracy.

c. Now comes the magic. For each test set instance, generate the predictions of the 1,000 Decision Trees and keep only the most frequent prediction (you can use SciPy's `mode()` function for this). This gives you majority-vote predictions over the test set.

d. Evaluate these predictions on the test set: you should get a slightly higher accuracy than the first model (about 0.5 to 1.5% higher). You have trained a Random Forest classifier!

```python
import numpy as np
from scipy.stats import mode
from sklearn.base import clone
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ShuffleSplit, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the same dataset and split as in the previous exercise
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# a. Create 1,000 subsets of the training set with 100 instances each
n_trees = 1000
n_instances = 100
subsets = []

rs = ShuffleSplit(n_splits=n_trees, test_size=len(X_train) - n_instances, random_state=42)
for train_index, _ in rs.split(X_train):
    subsets.append((X_train[train_index], y_train[train_index]))

# b. Train one Decision Tree per subset. clone() copies the hyperparameters of
# best_tree_clf (found by the grid search in the previous cell) without its fitted state
decision_trees = []
for X_subset, y_subset in subsets:
    tree_clf = clone(best_tree_clf)
    tree_clf.fit(X_subset, y_subset)
    decision_trees.append(tree_clf)

# c. Collect every tree's predictions, then keep the most frequent one per instance
y_preds = np.empty([n_trees, len(X_test)], dtype=np.uint8)
for i, tree_clf in enumerate(decision_trees):
    y_preds[i] = tree_clf.predict(X_test)

y_pred_majority_votes, _ = mode(y_preds, axis=0)

# d. Evaluate the majority-vote predictions; ravel() flattens the result whether
# mode() returns a 1-D or a 2-D array (this depends on the SciPy version)
accuracy = accuracy_score(y_test, y_pred_majority_votes.ravel())
print("Random Forest Test Set Accuracy:", accuracy)
```

Random Forest Test Set Accuracy: 0.872
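Step b also asks how the individual trees perform on the test set, which makes the gain from voting visible. The following self-contained sketch computes both numbers. It assumes `max_leaf_nodes=20` as a stand-in for whatever value the grid search selected (the name `assumed_max_leaf_nodes` is hypothetical), and it takes the majority vote with NumPy instead of SciPy's `mode()`, which works here because the labels are 0 and 1:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import ShuffleSplit, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

n_trees, n_instances = 1000, 100
assumed_max_leaf_nodes = 20  # assumption: replace with the grid search result

# Train one small tree per random subset and store its test set predictions
splitter = ShuffleSplit(n_splits=n_trees, train_size=n_instances, random_state=42)
y_preds = np.empty((n_trees, len(X_test)), dtype=np.uint8)
for i, (subset_idx, _) in enumerate(splitter.split(X_train)):
    tree = DecisionTreeClassifier(max_leaf_nodes=assumed_max_leaf_nodes,
                                  random_state=42)
    tree.fit(X_train[subset_idx], y_train[subset_idx])
    y_preds[i] = tree.predict(X_test)

# Accuracy of each tree on its own, and the average over all trees
per_tree_accuracy = (y_preds == y_test).mean(axis=1)
mean_tree_accuracy = per_tree_accuracy.mean()

# Binary labels: class 1 wins when more than half of the trees vote for it
majority_vote = (y_preds.mean(axis=0) > 0.5).astype(np.uint8)
forest_accuracy = (majority_vote == y_test).mean()

print("Mean accuracy of individual trees:", mean_tree_accuracy)
print("Majority-vote accuracy:", forest_accuracy)
```

The individual trees make different errors because each sees a different subset, and the vote averages many of those errors out. That is why the ensemble can beat its average member even though every tree was trained on only 100 instances.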


