#1. What is a Decision Tree, and how does it work in the context of classification?
   - A Decision Tree is a supervised machine learning algorithm that can be used for both classification and regression tasks. It models decisions in a tree-like structure, where each internal node represents a 'test' on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (in classification) or a numerical value (in regression).

   - In the context of classification, a Decision Tree works by:

   1. Splitting the Data: It starts with a root node that contains the entire dataset. The algorithm then looks for the best attribute (and, for a numerical attribute, the best threshold) to split the data into subsets. The 'best' attribute is typically chosen using a metric such as Gini impurity or information gain, which aims to create the 'purest' subsets possible.

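   2. Recursive Partitioning: The same splitting process is then applied recursively to each subset created in the previous step. This continues until a stopping criterion is met, for example when a node is pure, a maximum depth is reached, or a node has too few samples to split further.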
   
   3. Leaf Nodes: Once the splitting stops, the final nodes are called leaf nodes. Each leaf node is assigned a class label, which is typically the majority class of the samples that end up in that node.

   - How it makes predictions (classification):

      When a new, unseen data point comes in, it traverses the tree from the root node down to a leaf node. At each internal node, it follows the branch corresponding to the outcome of the test on that data point's attribute. Once it reaches a leaf node, the class label assigned to that leaf node is the prediction for the new data point.
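To make the traversal concrete, here is a sketch that walks a fitted scikit-learn tree by hand instead of calling `predict`. It relies on scikit-learn's low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`), where a child index of -1 marks a leaf. `predict_one` is a hypothetical helper written for illustration, and the Iris data and `max_depth=3` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_  # low-level arrays describing every node of the fitted tree


def predict_one(x):
    """Hypothetical helper: walk from the root to a leaf and return (label, path)."""
    x = np.asarray(x, dtype=np.float32)  # scikit-learn compares float32 inputs
    node, path = 0, []                   # node 0 is the root
    while tree.children_left[node] != -1:          # -1 means "no child", i.e. a leaf
        feat, thr = tree.feature[node], tree.threshold[node]
        goes_left = x[feat] <= thr                 # the "test" at this internal node
        path.append(f"{iris.feature_names[feat]} <= {thr:.2f}? {goes_left}")
        node = tree.children_left[node] if goes_left else tree.children_right[node]
    # tree.value holds the class distribution of the training samples in this leaf;
    # the predicted label is the majority class
    return clf.classes_[np.argmax(tree.value[node][0])], path


label, path = predict_one(X[100])
print("\n".join(path))
print("Hand-walked prediction:", iris.target_names[label])
print("clf.predict:           ", iris.target_names[clf.predict(X[100:101])[0]])
```

The hand-walked labels should agree with `clf.predict` for every sample, because both follow the same threshold tests from the root down to a leaf; the sketch casts inputs to float32 only so that its comparisons match the ones scikit-learn makes internally.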

#2. Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?
   - Gini Impurity: It tells us how often a randomly chosen data point from the node would be incorrectly classified if we labeled it at random according to the node's class proportions.
  Formula: Gini = 1 - (p1² + p2² + … + pk²), where pi is the proportion of samples in the node that belong to class i.
  If a node is pure (only one class), Gini is 0. A higher Gini means more impurity, i.e. more mixing of classes.

   - Entropy: It measures the amount of randomness or disorder in the node.
  Formula: Entropy = - Σ (pi log2 pi), where pi is the proportion of samples in the node that belong to class i.
  Entropy is 0 when the node is pure. Higher entropy means the node has more uncertainty because it contains a mix of classes.

   - How they affect the split:
  A decision tree tries to split the data so that the impurity after the split is as low as possible. The impurity after a split is the weighted average of the child nodes' impurities, where each child is weighted by its share of the parent's samples.
  For Entropy, the tree calculates "Information Gain," which is the reduction in entropy after the split, and chooses the split with the highest information gain. For Gini, the tree calculates the decrease in Gini Impurity after the split and chooses the split that reduces impurity the most.
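A small numerical sketch of these definitions, using NumPy and a made-up parent node with 4 samples of each class that is split into two children of 4 samples each. `gini`, `entropy`, and `impurity_decrease` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
import numpy as np


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)); only classes that occur are counted, so log2(0) never appears."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def impurity_decrease(parent, left, right, measure):
    """Parent impurity minus the sample-weighted average impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * measure(left) + len(right) / n * measure(right)
    return measure(parent) - weighted


parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 4 samples of class 0, 4 of class 1
left = np.array([0, 0, 0, 1])                 # mostly class 0
right = np.array([0, 1, 1, 1])                # mostly class 1

print("Gini(parent):", gini(parent))          # 0.5
print("Entropy(parent):", entropy(parent))    # 1.0
print("Gini decrease:", impurity_decrease(parent, left, right, gini))        # 0.125
print("Information gain:", impurity_decrease(parent, left, right, entropy))  # about 0.189
```

Both measures rank this split the same way here (each child is purer than the parent); they differ mainly in scale, with entropy reaching 1.0 for a 50/50 binary node while Gini reaches 0.5.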

#3. What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.
  - Pre-Pruning and Post-Pruning are two techniques used to stop a decision tree from becoming too large and overfitting the data.

- Pre-Pruning:
Pre-pruning stops the growth of the tree early, before it becomes too deep.
This is done by setting conditions like minimum samples required to split, maximum depth, or minimum information gain.
If the split does not improve the model enough, the tree stops growing at that point.

- Practical advantage of Pre-Pruning:
It saves time and computation because the tree does not grow unnecessarily large.

- Post-Pruning:
Post-pruning allows the tree to grow fully first. After that, branches that do not improve performance are removed or replaced.
A common way to do this is to check accuracy on a validation set and cut back the parts of the tree that cause overfitting.

- Practical advantage of Post-Pruning:
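It often generalizes better, because the model first learns all the patterns and only then removes the noisy or less useful parts, instead of stopping early at a split that looks weak on its own but would have led to useful splits further down.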
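A sketch of both approaches in scikit-learn, on the Iris data as an arbitrary example. Pre-pruning uses the `max_depth` and `min_samples_split` parameters (the values 3 and 10 are arbitrary). For post-pruning, note that scikit-learn does not implement the validation-set pruning described above; it offers minimal cost-complexity pruning through `ccp_alpha` and `cost_complexity_pruning_path`. Here the validation set is used to choose `ccp_alpha`, which is an assumption of this sketch; because the same validation set also reports the score, the post-pruned number is optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Pre-pruning: limits are applied while the tree is being grown.
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, get the increasing sequence of alphas at which
# cost-complexity pruning removes branches, and keep the alpha that does best on
# the validation set.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative values from rounding
    candidate = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
post = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)

print("Full tree:   leaves =", full.get_n_leaves(), "val acc =", round(full.score(X_val, y_val), 3))
print("Pre-pruned:  leaves =", pre.get_n_leaves(), "val acc =", round(pre.score(X_val, y_val), 3))
print("Post-pruned: leaves =", post.get_n_leaves(), "val acc =", round(best_score, 3))
```

The pre-pruned tree never builds the branches it rejects (the computational saving), while the post-pruned tree is cut back from the full tree, so it can never have more leaves than the full tree.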

#4. What is Information Gain in Decision Trees, and why is it important for choosing the best split?
   - Information Gain is a measure used in decision trees to decide which feature gives the best split. It tells us how much the impurity decreases when a node is split on a particular feature.

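- Information Gain is calculated as:
Information Gain = impurity of parent node - weighted average impurity of the child nodes after the split,
where each child is weighted by the fraction of the parent's samples it receives. Strictly, "information gain" uses entropy as the impurity measure; the same calculation with Gini impurity is usually called the Gini decrease.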

- A higher Information Gain means the feature helps separate the classes better and makes the child nodes purer.

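- It is important because, at each node, the decision tree greedily chooses the split with the highest Information Gain: this gives the best separation of the classes at that node and the purest child nodes. Because the choice is made one node at a time, the resulting tree is not guaranteed to be the best possible tree overall.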
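To see how information gain picks a split, the sketch below scans every candidate threshold on a single numerical feature, as a tree does for each feature at a node, and keeps the one with the highest gain. The data (hours studied vs. pass/fail) and the helper names are made up for illustration. Using midpoints between consecutive distinct values as candidate thresholds mirrors what scikit-learn does, but this is not its implementation.

```python
import numpy as np


def entropy(labels):
    """Entropy -sum(p_i * log2(p_i)) of the class labels in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def information_gain(y, mask):
    """Entropy of the parent minus the sample-weighted entropy of the two children."""
    left, right = y[mask], y[~mask]
    w_left = len(left) / len(y)
    return entropy(y) - (w_left * entropy(left) + (1 - w_left) * entropy(right))


def best_threshold(x, y):
    """Try midpoints between consecutive distinct values of x; return the highest-gain one."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    gains = [information_gain(y, x <= t) for t in candidates]
    i = int(np.argmax(gains))
    return candidates[i], gains[i]


# Made-up toy data: hours studied vs. pass (1) / fail (0)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1])
threshold, gain = best_threshold(hours, passed)
print(f"Best split: hours <= {threshold} (information gain = {gain:.3f})")
# Best split: hours <= 4.5 (information gain = 0.549)
```

The split at 4.5 wins because its left child is completely pure (four failures), which removes more entropy than any other threshold.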

#5. What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?
  - Decision trees are used in many real-world applications because they are simple to understand and interpret.

- Common real-world applications:

    1. Medical diagnosis - predicting diseases based on symptoms.

    2. Banking and finance - loan approval, credit risk assessment, fraud detection.

    3. Marketing - customer segmentation, predicting whether a customer will buy a product.

    4. Agriculture - predicting crop yield, disease detection, and weather-based decisions.

    5. Manufacturing - quality control and identifying defective products.

- Main advantages:
    1. Easy to understand and interpret, even by non-technical people.

    2. Can handle both numerical and categorical data (scikit-learn's implementation, however, expects categorical features to be encoded as numbers first).

    3. Requires little data preprocessing (no need for scaling or normalization).

    4. Works well for non-linear relationships.

- Main limitations:

    1. Prone to overfitting if not pruned.

    2. Small changes in the data can produce a completely different tree (unstable).

    3. Often less accurate than ensemble methods such as Random Forests or Gradient Boosting, which combine many trees.

    4. Can become very large and complex if not controlled.
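The claim that trees need no scaling can be checked directly. This sketch, using the Iris data as an arbitrary example, fits one tree on the raw features and one on standardized features (`StandardScaler`) and compares them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # per-feature (x - mean) / std

raw_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled_tree = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

same_leaves = raw_tree.get_n_leaves() == scaled_tree.get_n_leaves()
same_predictions = np.array_equal(raw_tree.predict(X), scaled_tree.predict(X_scaled))
print("Same number of leaves:", same_leaves)
print("Same training predictions:", same_predictions)
```

Standardizing is an increasing transformation of each feature, so the order of the samples along every feature, and therefore every possible partition a split can create, stays the same; only the threshold values are expressed in different units. Distance-based models such as k-NN do not have this property.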

#6. Write a Python program to: ● Load the Iris Dataset ● Train a Decision Tree Classifier using the Gini criterion ● Print the model’s accuracy and feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a Decision Tree that uses Gini impurity to choose its splits
model = DecisionTreeClassifier(criterion="gini", random_state=1)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)
print("Feature Importances:", model.feature_importances_)
```


Model Accuracy: 0.9555555555555556
Feature Importances: [0.02146947 0.02146947 0.06316954 0.89389153]
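`feature_importances_` is printed in the column order of `iris.data`, which is the order of `iris.feature_names` (sepal length, sepal width, petal length, petal width). Read that way, the output above attributes about 0.89 of the total importance to petal width. The sketch below refits the same model and prints each importance next to its feature name, sorted from most to least important.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1)
model = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X_train, y_train)

# feature_importances_ follows the column order of iris.data, i.e. iris.feature_names
ranked = sorted(zip(iris.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:<20} {importance:.3f}")
```

The importances sum to 1, so each value can be read as the share of the tree's total impurity reduction contributed by that feature.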


#7. Write a Python program to: ● Load the Iris Dataset ● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 40% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Pre-pruned tree: depth limited to 3
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=1)
pruned_tree.fit(x_train, y_train)
pruned_pred = pruned_tree.predict(x_test)
pruned_accuracy = accuracy_score(y_test, pruned_pred)

# Fully grown tree: no depth limit, grows until the leaves are pure or cannot be split
full_tree = DecisionTreeClassifier(random_state=1)
full_tree.fit(x_train, y_train)
full_pred = full_tree.predict(x_test)
full_accuracy = accuracy_score(y_test, full_pred)

print("Accuracy of Decision Tree (max_depth=3):", pruned_accuracy)
print("Accuracy of Fully-Grown Decision Tree:", full_accuracy)
```


Accuracy of Decision Tree (max_depth=3): 0.9666666666666667
Accuracy of Fully-Grown Decision Tree: 0.9666666666666667
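On this particular 60/40 split both trees reach the same test accuracy (0.9667), so the depth limit costs nothing here; the difference shows up in model size. A sketch that repeats the same split and prints each tree's depth and leaf count (`get_depth()`, `get_n_leaves()`) alongside its training and test accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(x_train, y_train)
full_tree = DecisionTreeClassifier(random_state=1).fit(x_train, y_train)

for name, tree in [("max_depth=3", pruned_tree), ("fully grown", full_tree)]:
    print(f"{name:<12} depth={tree.get_depth()} leaves={tree.get_n_leaves()} "
          f"train acc={tree.score(x_train, y_train):.3f} "
          f"test acc={tree.score(x_test, y_test):.3f}")
```

When two models score the same on unseen data, the smaller tree is usually preferable: it is easier to read and less likely to have fitted noise in the training set.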


#8. Write a Python program to: ● Load the Boston Housing Dataset ● Train a Decision Tree Regressor ● Print the Mean Squared Error (MSE) and feature importances

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# load_boston was removed from scikit-learn in version 1.2, so the Boston Housing
# data is downloaded from OpenML instead (requires an internet connection)
boston = fetch_openml(name="boston", version=1, as_frame=True)
X = boston.data
y = boston.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Fully grown regression tree (splits minimize squared error by default)
model = DecisionTreeRegressor(random_state=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Feature Importances:", model.feature_importances_)
```


Mean Squared Error (MSE): 12.61375
Feature Importances: [0.01867957 0.00071628 0.00141336 0.00230839 0.03006354 0.25015592
 0.00676335 0.08579168 0.00326494 0.00857855 0.01020894 0.02886154
 0.55319392]


#9. Write a Python program to: ● Load the Iris Dataset ● Tune the Decision Tree’s max_depth and min_samples_split using GridSearchCV ● Print the best parameters and the resulting model accuracy

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate values for the two hyperparameters (every combination is tried)
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 3, 4, 5, 10]
}

dt = DecisionTreeClassifier(random_state=1)

# 5-fold cross-validation on the training set for each combination
grid = GridSearchCV(dt, param_grid, cv=5)
grid.fit(x_train, y_train)

# Best combination, refit on the whole training set
best_model = grid.best_estimator_

y_pred = best_model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)

print("Best Parameters:", grid.best_params_)
print("Model Accuracy:", accuracy)
```


Best Parameters: {'max_depth': 4, 'min_samples_split': 10}
Model Accuracy: 1.0
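The test accuracy of 1.0 above is measured on only 45 flowers (30% of 150), so it is a noisy estimate. GridSearchCV also records `best_score_`, the mean accuracy of the best parameter combination across the 5 cross-validation folds of the training set, which is worth reporting next to it. This sketch repeats the search from the cell above and prints both.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"max_depth": [2, 3, 4, 5, None], "min_samples_split": [2, 3, 4, 5, 10]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid, cv=5)
grid.fit(x_train, y_train)

# best_score_: mean accuracy over the 5 validation folds for the best combination
print("Best parameters:", grid.best_params_)
print("Mean CV accuracy:", round(grid.best_score_, 3))
# With the default refit=True, grid.score uses best_estimator_
print("Test accuracy:", grid.score(x_test, y_test))
```

If the cross-validated accuracy is noticeably lower than the test accuracy, the perfect test score is more likely luck of the split than a property of the model.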


#10. Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values. Explain the step-by-step process you would follow to: ● Handle the missing values ● Encode the categorical features ● Train a Decision Tree model ● Tune its hyperparameters ● Evaluate its performance. Also describe what business value this model could provide in a real-world setting.
   - Understand the Problem and Data: Before the technical steps, understand the problem (predicting whether a patient has the disease) and the dataset: its size, which columns are numerical and which are categorical, and how many values are missing in each. This step also fixes the target variable and the candidate features.

   - Handle Missing Values: Impute numerical features with the mean or median (or a more advanced method such as k-NN imputation), and impute categorical features with the mode or treat "missing" as a category of its own. The choice depends on how much data is missing and how it could affect the model.
   - Encode Categorical Features: Convert categorical features into numbers, because scikit-learn's Decision Tree only accepts numerical input. Use One-Hot Encoding for nominal variables (no natural order, e.g. blood group) and Ordinal Encoding for ordinal variables (a natural order, e.g. disease stage); scikit-learn's LabelEncoder is meant for the target column, not for features.
   - Train a Decision Tree Model: Split the preprocessed dataset into training and test sets, stratified by the target so both sets keep the same share of patients with the disease. Then initialize and fit a DecisionTreeClassifier from scikit-learn, using either the Gini or the Entropy criterion (a Regressor would only be needed for a numerical target). Fitting the imputation and encoding steps on the training data only avoids leaking information from the test set.
   - Tune Model Hyperparameters: Tuning matters for Decision Trees because an unconstrained tree overfits easily. Common hyperparameters are 'max_depth', 'min_samples_split', 'min_samples_leaf', and 'criterion'. GridSearchCV or RandomizedSearchCV searches combinations of these with cross-validation and keeps the best one.
   - Evaluate Model Performance: Evaluate the tuned model on the unseen test set with accuracy, precision, recall, F1-score, and ROC AUC, and choose the metric that matches the business problem. In disease prediction a false negative (a sick patient who is missed) is usually more costly than a false positive, so high recall is especially important.
   - Describe Business Value: The model can support early disease detection, improve patient outcomes, optimize resource allocation (for example, prioritizing high-risk patients for further tests), reduce healthcare costs, and enable proactive interventions. Because a Decision Tree's rules can be read directly, clinicians can also see why a patient was flagged.
   - Summary: Understand the data, impute missing values, encode categorical features, train a Decision Tree, tune it with cross-validated search, and evaluate it on a held-out test set with recall-focused metrics. The result is an interpretable screening tool that helps the company identify patients at risk earlier.
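The steps above can be sketched end to end in scikit-learn. Everything about the data is an assumption made for illustration: the patient table, its column names (`age`, `blood_pressure`, `smoker`, `region`), the rule that generates `disease`, and the 10% missing rate are all synthetic, and the grid values are arbitrary. Wrapping imputation and encoding in a `Pipeline` keeps them inside cross-validation, so they are learned from the training folds only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical patient data: columns and the rule generating "disease" are made up
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(20, 90, n).astype(float),
    "blood_pressure": rng.normal(130, 20, n),
    "smoker": pd.Series(rng.choice(["yes", "no"], n), dtype=object),
    "region": pd.Series(rng.choice(["north", "south", "east", "west"], n), dtype=object),
})
risk = 0.03 * (df["age"] - 50) + 0.02 * (df["blood_pressure"] - 130) + (df["smoker"] == "yes") * 1.0
df["disease"] = (risk + rng.normal(0, 1, n) > 1.0).astype(int)

# Blank out about 10% of each feature column to simulate missing values
for col in ["age", "blood_pressure", "smoker", "region"]:
    df.loc[rng.random(n) < 0.1, col] = np.nan

numeric = ["age", "blood_pressure"]
categorical = ["smoker", "region"]
X, y = df[numeric + categorical], df["disease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Missing values: median for numbers, most frequent value for categories;
# then one-hot encode the (nominal) categorical columns
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([("prep", preprocess), ("tree", DecisionTreeClassifier(random_state=0))])

# Tune with 5-fold cross-validation, optimizing recall (missed patients are costly)
param_grid = {
    "tree__max_depth": [3, 5, 8, None],
    "tree__min_samples_leaf": [1, 5, 20],
    "tree__class_weight": [None, "balanced"],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)

# Evaluate on the untouched test set
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print("Best parameters:", search.best_params_)
print(classification_report(y_test, y_pred))
print("ROC AUC:", round(auc, 3))
```

Scoring on recall alone can reward a tree that flags almost everyone, so the precision column of the report should be checked alongside it; scoring="f1" is one alternative if both kinds of error matter.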