> Question 1: What is a Decision Tree, and how does it work in the
> context of classification?
>
> ANS 1 :- A Decision Tree is a **tree-like model of decisions**. It
> breaks a dataset down into smaller and smaller subsets while
> incrementally building the associated tree structure.
>
> ● The **root node** represents the entire dataset.  
> ● **Branches** represent the outcomes of a decision (a test on a
> feature).  
> ● **Leaf nodes** represent the final decision or class label.
>
> **How it works in Classification:**  
> 1.​ **Start at the Root Node**:​  
> The algorithm begins with the whole dataset.​  
> 2.​ **Splitting Based on Features**:​  
> ○​ At each node, the algorithm selects the *best feature* to split the
> data.​  
> ○ The "best" feature is chosen using a metric such as the **Gini Index**,
> **Entropy / Information Gain**, or **Chi-square**.  
> 3.​ **Recursive Partitioning**:​  
> The dataset is split into subsets based on the chosen feature, and the
> process repeats recursively for each subset.​  
> 4.​ **Stopping Criteria**:​  
> Splitting stops when one of these occurs:​  
> ○​ All samples belong to the same class.​  
> ○​ No further information gain can be achieved.​  
> ○​ A maximum tree depth is reached.​  
> 5.​ **Leaf Node Prediction**:​  
> The final leaf node represents the **predicted class** for that subset
> of data.​
>
> **Example (Classification):**  
> Suppose we want to predict whether a person will **play tennis** or
> not.
>
> ● **Root Node**: Weather (Sunny, Rainy, Overcast)
>
> ○ If **Sunny** → check Humidity (High → "No", Normal → "Yes")  
> ○ If **Overcast** → "Yes" (always plays)  
> ○ If **Rainy** → check Wind (Strong → "No", Weak → "Yes")
>
> Here, the tree makes decisions step by step until it reaches a class
> label.
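
The tennis tree above can be read as nested if/else rules. The following is a minimal sketch, assuming the three features arrive as plain strings; the function name `predict_play_tennis` is hypothetical and only illustrates how a prediction walks from the root to a leaf.

```python
def predict_play_tennis(weather: str, humidity: str, wind: str) -> str:
    """Walk the hand-built tree from the root (Weather) down to a leaf."""
    if weather == "Overcast":
        return "Yes"
    if weather == "Sunny":
        # Second-level test on the Sunny branch.
        return "No" if humidity == "High" else "Yes"
    if weather == "Rainy":
        # Second-level test on the Rainy branch.
        return "No" if wind == "Strong" else "Yes"
    raise ValueError(f"Unknown weather value: {weather!r}")


print(predict_play_tennis("Sunny", "High", "Weak"))
print(predict_play_tennis("Rainy", "Normal", "Weak"))
```

A learned tree has the same shape; the difference is that the algorithm chooses the tests and their order from the data instead of a person writing them by hand.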
>
> Question 2: Explain the concepts of Gini Impurity and Entropy as
> impurity measures. How do they impact the splits in a Decision Tree?
>
> ANS 2 :- **Impurity Measures in Decision Trees**  
> When building a Decision Tree, the algorithm needs a way to decide
> **which feature to split on** at each step.
>
> ●​ **Impurity measures** (like Gini and Entropy) tell us how “mixed”
> the classes are in a node.​  
> ●​ A node is **pure** if it contains data points from only one class.​
>
> **1. Gini Impurity**  
> The **Gini Impurity** measures the probability of incorrectly
> classifying a randomly chosen element if it was labeled according to
> the distribution of labels in that node.
>
> **Formula:**  
> $\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2$  
> where:  
> ● $C$ = number of classes  
> ● $p_i$ = proportion of samples belonging to class $i$
>
> ✅ **Interpretation:**  
> ● **Gini = 0** → perfectly pure node (all samples in one class).  
> ● **Higher Gini** → more impurity (mixed classes).
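
The Gini formula translates almost directly into code. This is a small sketch, assuming a node is represented as a plain list of class labels; `gini_impurity` is an illustrative helper, not a library function.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the class proportions p_i in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure_node = ["A"] * 8
mixed_node = ["A"] * 4 + ["B"] * 4

print(gini_impurity(pure_node))   # 0.0: every sample is in one class
print(gini_impurity(mixed_node))  # 0.5: the maximum for two classes
```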
>
> **2. Entropy (Information Gain)**
>
> Entropy is a measure from **Information Theory** that quantifies the
> uncertainty (or randomness) in a node.
>
> **Formula:**  
> $\text{Entropy} = -\sum_{i=1}^{C} p_i \log_2(p_i)$
>
> ✅ **Interpretation:**  
> ● **Entropy = 0** → perfectly pure node.  
> ● **Higher entropy** → more disorder (more mixed classes).
>
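> To decide a split, Decision Trees often use **Information Gain (IG):**  
> $IG = \text{Entropy}(\text{parent}) - \sum_{j} \frac{n_j}{n}\,\text{Entropy}(\text{child}_j)$  
> where $n_j$ is the number of samples in child node $j$ and $n$ is the
> number of samples in the parent node.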
>
> ●​ The split that **maximizes Information Gain** is chosen.
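
Entropy and Information Gain can be sketched in a few lines. The helper names and the toy labels below are assumptions made for illustration; the weighting of each child by its share of the parent's samples follows the IG formula above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits; p * log2(1/p) equals -p * log2(p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["Yes"] * 5 + ["No"] * 5
left = ["Yes"] * 4 + ["No"] * 1
right = ["Yes"] * 1 + ["No"] * 4

print(round(entropy(parent), 3))
print(round(information_gain(parent, [left, right]), 3))
```

The split is imperfect (each child still contains one sample of the other class), so the gain is positive but well below the parent entropy.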
>
> **3. How They Impact Splits**  
> ●​ **Both Gini and Entropy** aim to create the *purest possible child
> nodes*.​
>
> ●​ The chosen split is the one that **reduces impurity the most**.​
>
> ● **Gini** is slightly cheaper to compute (no logarithms).
>
> ● **Entropy** comes from information theory; in practice the two
> criteria usually select very similar splits.
>
> **Example**  
> Suppose we have 10 samples: 6 belong to Class A, 4 belong to Class B.
>
> ● **Gini:**  
> $1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 0.48$
>
> ● **Entropy:**  
> $-(0.6 \log_2 0.6 + 0.4 \log_2 0.4) \approx 0.97$
>
> Both values show the node is **impure**. The two measures are on
> different scales (for two classes Gini peaks at 0.5 and Entropy at
> 1.0), so the larger Entropy value does not by itself mean the node is
> more impure.
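
The arithmetic in this example can be checked numerically. The sketch below works directly on class proportions rather than label lists (an assumption made to keep it short) and also evaluates a 50/50 node to show the largest value each measure can take for two classes.

```python
import math


def gini(probabilities):
    """Gini impurity from a list of class proportions."""
    return 1.0 - sum(p * p for p in probabilities)


def entropy(probabilities):
    """Entropy in bits; terms with p == 0 contribute nothing and are skipped."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


example = [0.6, 0.4]   # 6 of 10 samples in Class A, 4 in Class B
balanced = [0.5, 0.5]  # the most impure two-class node

print(round(gini(example), 2), round(entropy(example), 2))
print(gini(balanced), entropy(balanced))
```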
>
> Question 3: What is the difference between Pre-Pruning and
> Post-Pruning in Decision Trees? Give one practical advantage of using
> each.
>
> ANS 3 :- **1. Pre-Pruning (Early Stopping)**  
> Pre-pruning means **stopping the tree from growing too large** during
> training.​  
> Instead of letting the tree grow fully and then cutting it back, we
> **set constraints upfront**.
>
> **Examples of pre-pruning methods:**  
> ●​ Maximum depth (max_depth)​
>
> ●​ Minimum number of samples required to split a node
> (min_samples_split)​
>
> ●​ Minimum number of samples per leaf (min_samples_leaf)​
>
> ●​ Maximum number of leaf nodes (max_leaf_nodes)​
>
> **Advantage:**  
> ●​ **Faster training** because the tree stops growing early.​
>
> ●​ Helps prevent overfitting before it happens.
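
A rough sketch of pre-pruning with scikit-learn, using the four constraints listed above. The specific limits are illustrative assumptions, not tuned values, and the tree is fitted on the full Iris data only to compare tree sizes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until no further split is possible.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops as soon as any constraint is hit.
pre_pruned = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    max_leaf_nodes=8,
    random_state=42,
).fit(X, y)

print("Full tree:  depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pre-pruned: depth", pre_pruned.get_depth(), "leaves", pre_pruned.get_n_leaves())
```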
>
> **2. Post-Pruning (Cost Complexity Pruning)**  
> Post-pruning means **growing the tree fully first** (allowing
> overfitting), then trimming back unnecessary branches.​  
> The goal is to simplify the tree while keeping predictive accuracy
> high.
>
> **Common method:**  
> ●​ **Cost Complexity Pruning (CCP)**: Remove branches that provide
> little improvement in accuracy compared to their complexity penalty.​
>
> **Advantage:**
>
> ●​ Usually results in a **more accurate and balanced tree**, since
> pruning decisions are based on actual performance (validation data or
> cross-validation).
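
One way to apply cost complexity pruning in scikit-learn is sketched below: `cost_complexity_pruning_path` lists the candidate `ccp_alpha` values, and cross-validation on the training split picks one. Choosing alpha by 5-fold cross-validated accuracy is an assumption of this sketch; other selection rules are possible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate alphas: each value is a point at which the fully grown tree
# would lose one more subtree under cost complexity pruning.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

# Pick the alpha with the best cross-validated accuracy on the training split.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    score = cross_val_score(tree, X_train, y_train, cv=5).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42)
pruned.fit(X_train, y_train)
print("Chosen ccp_alpha:", best_alpha)
print("Leaves after pruning:", pruned.get_n_leaves())
print("Test accuracy:", pruned.score(X_test, y_test))
```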
>
> Question 4: What is Information Gain in Decision Trees, and why is it
> important for choosing the best split?
>
> ANS 4 :-  
> **Information Gain (IG)** is a metric used in Decision Trees (when
> using **Entropy** as the impurity measure).​  
> It tells us **how much “information” (or certainty) we gain about the
> target variable** after splitting the dataset on a particular feature.
>
> In other words:  
> ●​ A good split should make the child nodes **purer** than the parent
> node.​  
> ●​ Information Gain measures **the reduction in uncertainty (entropy)**
> after a split.
>
> **Formula**  
> $IG(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v)$  
> where:  
> ● $S$ = current dataset (parent node)  
> ● $A$ = attribute (feature) used for splitting  
> ● $S_v$ = subset of $S$ for which attribute $A$ has value $v$  
> ● $|S_v| / |S|$ = proportion of samples in subset $S_v$
>
> **Why is it Important?**
>
> ● Decision Trees make decisions by asking **yes/no questions** about
> features.  
> ● To choose the **best split at each node**, we compare IG values for
> all features.  
> ● The feature with the **highest Information Gain** is
> selected, since it reduces uncertainty the most.
>
> **Example**
>
> Suppose we want to predict if students **Pass/Fail** based on **Study
> Hours**.
>
> ● At the root, the dataset is mixed (50% Pass, 50% Fail) → **Entropy =
> 1**.  
> ● If we split at "Study Hours \> 5":  
> ○ Left child (Pass) → Entropy = 0 (pure)  
> ○ Right child (Fail) → Entropy = 0 (pure)  
> ● Weighted entropy after the split = 0 →  
> $IG = 1 - 0 = 1$  
> This is a perfect split: for a two-class problem, 1 bit is the
> maximum possible gain.
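
The Study Hours example can be reproduced with a short sketch that scores several candidate thresholds. The eight data points are hypothetical and chosen so that the threshold 5 separates the classes exactly; `threshold_information_gain` is an illustrative helper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits; p * log2(1/p) equals -p * log2(p)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def threshold_information_gain(values, labels, threshold):
    """Information gain of the binary split `value > threshold`."""
    above = [lab for val, lab in zip(values, labels) if val > threshold]
    below = [lab for val, lab in zip(values, labels) if val <= threshold]
    n = len(labels)
    # Empty partitions are skipped so entropy() never divides by zero.
    weighted = sum(len(part) / n * entropy(part) for part in (above, below) if part)
    return entropy(labels) - weighted


# Hypothetical toy data: students who studied more than 5 hours passed.
study_hours = [2, 3, 4, 5, 6, 7, 8, 9]
outcome = ["Fail", "Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]

for threshold in (3, 5, 7):
    gain = threshold_information_gain(study_hours, outcome, threshold)
    print(f"Study Hours > {threshold}: IG = {gain:.3f}")
```

Only the threshold 5 yields two pure children, so it has the highest gain and would be the split a tree selects for this feature.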
>
> Question 5: What are some common real-world applications of Decision
> Trees, and what are their main advantages and limitations?
>
> ANS 5 :- **Real-World Applications of Decision Trees**
>
> 1. **Healthcare**  
> ○ Predicting diseases based on symptoms (e.g., flu vs. cold).  
> ○ Risk assessment (e.g., likelihood of diabetes).  
> 2. **Finance & Banking**  
> ○ Credit scoring: deciding whether to approve a loan.  
> ○ Fraud detection in transactions.  
> 3. **Marketing & Sales**  
> ○ Customer segmentation (who is likely to buy a product).  
> ○ Predicting customer churn.  
> 4. **Manufacturing & Operations**  
> ○ Quality control (classifying defective vs. non-defective products).  
> ○ Supply chain decision-making.  
> 5. **Education**  
> ○ Predicting student performance (pass/fail, dropout risk).  
> ○ Adaptive learning systems (personalized study paths).
>
> **Advantages of Decision Trees**
>
> 1. **Easy to Understand & Interpret**  
> ○ Resembles human decision-making with "if-else" rules.  
> 2. **No Feature Scaling Needed**  
> ○ Works with raw data (no need for normalization/standardization).  
> 3. **Handles Both Categorical & Numerical Data**  
> ○ Flexible with different data types (scikit-learn's implementation
> still requires categorical features to be encoded as numbers).  
> 4. **Non-Parametric**  
> ○ No assumptions about data distribution (unlike linear regression).  
> 5. **Fast and Efficient**  
> ○ Good for small to medium datasets.
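
The interpretability point can be made concrete by printing a fitted tree as text rules. This sketch uses scikit-learn's `export_text`; the choice of `max_depth=2` is an assumption made only to keep the printed rule set short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rule set short enough to read.
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Each indented line is one test on a feature; "class:" lines are leaves.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```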
>
> **Limitations of Decision Trees**
>
> 1. **Overfitting**  
> ○ Trees can grow too complex and fit noise instead of patterns.  
> ○ Mitigated by pruning or by ensemble methods such as Random Forests.  
> 2. **Instability**  
> ○ Small changes in data can lead to a very different tree.  
> 3. **Biased with Imbalanced Data**  
> ○ Trees may favor classes with more samples.  
> 4. **Greedy Splitting**  
> ○ Always chooses the best local split, but not necessarily the global
> best tree.  
> 5. **Not Great with Continuous Variables Alone**  
> ○ Continuous features can require many threshold splits, leading to
> deep, complex trees.
>
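> Dataset Info:  
> ● Iris Dataset for classification tasks (sklearn.datasets.load_iris()
> or provided CSV).  
> ● Boston Housing Dataset for regression tasks
> (sklearn.datasets.load_boston() or provided CSV). Note that
> load_boston() was removed in scikit-learn 1.2; Question 8 uses the
> California Housing dataset.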
>
> Question 6: Write a Python program to:  
> ● Load the Iris Dataset  
> ● Train a Decision Tree Classifier using the Gini criterion  
> ● Print the model’s accuracy and feature importances
>
> ANS 6 :-  
> from sklearn.datasets import load_iris  
> from sklearn.model_selection import train_test_split  
> from sklearn.tree import DecisionTreeClassifier  
> from sklearn.metrics import accuracy_score  
> import pandas as pd
>
> \# Load the Iris dataset  
> iris = load_iris()  
> X = iris.data
>
> y = iris.target
>
> \# Split into train and test sets  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.3, random_state=42  
> )
>
> \# Train Decision Tree Classifier using Gini criterion  
> clf = DecisionTreeClassifier(criterion='gini', random_state=42)  
> clf.fit(X_train, y_train)
>
> \# Predict on test set  
> y_pred = clf.predict(X_test)
>
> \# Calculate accuracy  
> accuracy = accuracy_score(y_test, y_pred)
>
> \# Feature importances  
> feature_importances = pd.DataFrame({  
> 'Feature': iris.feature_names,  
> 'Importance': clf.feature_importances\_  
> }).sort_values(by='Importance', ascending=False)
>
> print("Model Accuracy:", accuracy)
>
> print("\nFeature Importances:\n", feature_importances)
>
> **Output**
>
> ● **Model Accuracy:** 1.0 (100% on test set)
>
> ● **Feature Importances:**
>
> | Feature           | Importance |
> |-------------------|------------|
> | Petal length (cm) | 0.8933     |
> | Petal width (cm)  | 0.0876     |
> | Sepal width (cm)  | 0.0191     |
> | Sepal length (cm) | 0.0000     |
>
> Question 7: Write a Python program to:
>
> ● Load the Iris Dataset
>
> ● Train a Decision Tree Classifier with max_depth=3 and compare its
> accuracy to a fully-grown tree.
>
> ANS 7 :-  
> from sklearn.datasets import load_iris  
> from sklearn.model_selection import train_test_split  
> from sklearn.tree import DecisionTreeClassifier  
> from sklearn.metrics import accuracy_score
>
> \# Load the Iris dataset  
> iris = load_iris()  
> X = iris.data  
> y = iris.target
>
> \# Split dataset  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.3, random_state=42  
> )
>
> \# Fully grown Decision Tree (no depth limit)  
> clf_full = DecisionTreeClassifier(random_state=42)  
> clf_full.fit(X_train, y_train)  
> y_pred_full = clf_full.predict(X_test)  
> accuracy_full = accuracy_score(y_test, y_pred_full)
>
> \# Decision Tree with max_depth=3  
> clf_depth3 = DecisionTreeClassifier(max_depth=3, random_state=42)  
> clf_depth3.fit(X_train, y_train)  
> y_pred_depth3 = clf_depth3.predict(X_test)  
> accuracy_depth3 = accuracy_score(y_test, y_pred_depth3)
>
> print("Accuracy of fully-grown tree:", accuracy_full)  
> print("Accuracy of tree with max_depth=3:", accuracy_depth3)
>
> **Output**
>
> ● **Fully-grown Tree Accuracy:** 1.0
>
> ● **Tree with max_depth=3 Accuracy:** 1.0
>
> **Interpretation:**
>
> ● Both the **fully grown tree** and the **depth-limited tree
> (max_depth=3)** achieved **100% accuracy** on the Iris test set.
>
> ● This suggests that the Iris dataset is **simple enough** for a
> shallow tree to classify this 45-sample test split perfectly; a
> different split may give slightly different numbers.
>
> ● In larger/noisier datasets, limiting max_depth usually
> helps **reduce overfitting** while maintaining good accuracy.
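
To see the effect on noisier data, the comparison can be repeated on a synthetic dataset. This is a sketch under stated assumptions: the data comes from `make_classification` with `flip_y=0.2`, and all sizes are arbitrary choices. No particular accuracy is claimed; the point is to compare the train/test gap of the two trees.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 reassigns the class of 20% of samples at random.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    flip_y=0.2,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

for name, depth in [("fully grown", None), ("max_depth=3", 3)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    print(
        f"{name:>12}: train accuracy = {tree.score(X_train, y_train):.3f}, "
        f"test accuracy = {tree.score(X_test, y_test):.3f}"
    )
```

The fully grown tree memorizes the training split, noise included, so its training accuracy says little about how it generalizes.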
>
> Question 8: Write a Python program to:  
> ● Load the California Housing dataset from sklearn  
> ● Train a Decision Tree Regressor  
> ● Print the Mean Squared Error (MSE) and feature importances
>
> ANS 8 :-  
> from sklearn.datasets import fetch_california_housing  
> from sklearn.model_selection import train_test_split  
> from sklearn.tree import DecisionTreeRegressor  
> from sklearn.metrics import mean_squared_error  
> import pandas as pd
>
> \# Load California Housing dataset  
> california = fetch_california_housing()  
> X, y = california.data, california.target
>
> \# Split into train and test sets  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.3, random_state=42  
> )
>
> \# Train Decision Tree Regressor  
> regressor = DecisionTreeRegressor(random_state=42)  
> regressor.fit(X_train, y_train)
>
> \# Predict on test set  
> y_pred = regressor.predict(X_test)
>
> \# Calculate Mean Squared Error  
> mse = mean_squared_error(y_test, y_pred)
>
> \# Feature importances  
> feature_importances = pd.DataFrame({  
> 'Feature': california.feature_names,  
> 'Importance': regressor.feature_importances\_  
> }).sort_values(by='Importance', ascending=False)
>
> print("Mean Squared Error (MSE):", mse)  
> print("\nFeature Importances:\n", feature_importances)
>
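> **Expected Output (approximate)**
>
> ● **Mean Squared Error (MSE):** around 0.25 – 0.35 is a rough estimate
> only; rely on the value the script prints. The target is expressed in
> units of 100,000 USD, and because random_state is fixed the script
> prints the same value on every run with a given scikit-learn version.
>
> ● **Feature Importances:** each name below is paired with its column
> index in california.feature_names; the importance values are
> approximate.
>
> | Index | Feature    | Importance |
> |-------|------------|------------|
> | 0     | MedInc     | \~0.55     |
> | 5     | AveOccup   | \~0.15     |
> | 2     | AveRooms   | \~0.12     |
> | 7     | Longitude  | \~0.08     |
> | 6     | Latitude   | \~0.05     |
> | 1     | HouseAge   | \~0.03     |
> | 3     | AveBedrms  | \~0.02     |
> | 4     | Population | \~0.00     |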
>
> Question 9: Write a Python program to:
>
> ● Load the Iris Dataset
>
> ● Tune the Decision Tree’s max_depth and min_samples_split using
> GridSearchCV
>
> ● Print the best parameters and the resulting model accuracy
>
> ANS 9 :-  
> from sklearn.datasets import load_iris  
> from sklearn.tree import DecisionTreeClassifier  
> from sklearn.model_selection import GridSearchCV
>
> \# Load Iris dataset  
> iris = load_iris()  
> X, y = iris.data, iris.target
>
> \# Define parameter grid  
> param_grid = {  
> 'max_depth': \[2, 3, 4, 5, None\],  
> 'min_samples_split': \[2, 3, 4, 5, 6\]  
> }
>
> \# Initialize Decision Tree Classifier  
> dt = DecisionTreeClassifier(random_state=42)
>
> \# GridSearchCV for hyperparameter tuning  
> grid_search = GridSearchCV(dt, param_grid, cv=5, scoring='accuracy')  
> grid_search.fit(X, y)
>
> \# Print best parameters and accuracy  
> print("Best Parameters:", grid_search.best_params\_)  
> print("Best Cross-Validation Accuracy:", grid_search.best_score\_)
>
> Expected Output (approximate)  
> Best Parameters: {'max_depth': 3, 'min_samples_split': 2}  
> Best Cross-Validation Accuracy: 0.966
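
The program above reports cross-validated accuracy on the full dataset. A possible variation, sketched here as an assumption rather than part of the original answer, tunes on a training split and then reports accuracy on a held-out test split that the search never saw.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 3, 4, 5, 6],
}

# Tune on the training split only; the test split stays unseen until the end.
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Cross-validation accuracy:", grid_search.best_score_)
# score() uses the best estimator, refit on the whole training split.
print("Held-out test accuracy:", grid_search.score(X_test, y_test))
```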
>
> Question 10: Imagine you’re working as a data scientist for a
> healthcare company that wants to predict whether a patient has a
> certain disease. You have a large dataset with mixed data types and
> some missing values. Explain the step-by-step process you would follow
> to:  
> ● Handle the missing values  
> ● Encode the categorical features  
> ● Train a Decision Tree model  
> ● Tune its hyperparameters  
> ● Evaluate its performance
>
> And describe what business value this model could provide in the
> real-world setting.
>
> ANS 10 :-
>
> **Step-by-Step Approach**
>
> **1. Handle Missing Values**  
> ● **Numerical features**: Replace missing values with the **mean** or,
> when outliers are present, the **median** (which is robust against
> outliers).
>
> ● **Categorical features**: Replace missing values with the **most
> frequent category** (mode).
>
> ● Alternatively, use **KNN imputation** or an **iterative imputer** if
> patterns exist in the missing data (KNN imputation can be slow on
> very large datasets).  
> ● Important: fit the imputer on the **train** set only, then apply the
> same fitted imputer to the **test** set, so that no test-set
> statistics leak into training.
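
A minimal sketch of median imputation with scikit-learn's `SimpleImputer`. The single numeric column and its values are hypothetical; the outlier (100.0) is included to show why the median is a reasonable default.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric feature (e.g., a lab value) with gaps.
X_train = np.array([[1.0], [2.0], [np.nan], [3.0], [100.0]])
X_test = np.array([[np.nan], [4.0]])

imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)  # the median is learned from the training rows only
X_train_filled = imputer.transform(X_train)
X_test_filled = imputer.transform(X_test)  # reuses the training median

print(imputer.statistics_)      # median of 1, 2, 3, 100 is 2.5
print(X_test_filled.ravel())
```

With `strategy="mean"` the single outlier would pull the fill value up to 26.5, which is far from most observed values.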
>
> **2. Encode the Categorical Features**  
> ● For categorical variables:  
> ○ **One-Hot Encoding** for nominal features (e.g., blood type: A, B,
> AB, O).  
> ○ **Ordinal Encoding** for features with natural order (e.g.,
> disease stage: Stage I \< Stage II \< Stage III).  
> ● Use **sklearn.preprocessing.OneHotEncoder** or **pd.get_dummies()**
> for one-hot encoding, and **sklearn.preprocessing.OrdinalEncoder** for
> ordinal encoding.
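
The two encodings can be sketched as follows. The DataFrame, its column names and its values are made up for illustration; the stage order passed to `OrdinalEncoder` is stated explicitly so that Stage I maps to 0 and Stage III to 2.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical patient records; column names and values are made up.
df = pd.DataFrame({
    "blood_type": ["A", "O", "AB", "B", "O"],
    "disease_stage": ["Stage I", "Stage III", "Stage II", "Stage I", "Stage II"],
})

# Nominal feature: one binary column per category.
one_hot = OneHotEncoder(handle_unknown="ignore")
blood_encoded = one_hot.fit_transform(df[["blood_type"]]).toarray()

# Ordinal feature: the category order is stated explicitly.
ordinal = OrdinalEncoder(categories=[["Stage I", "Stage II", "Stage III"]])
stage_encoded = ordinal.fit_transform(df[["disease_stage"]])

print(one_hot.categories_[0])   # column order of the one-hot output
print(blood_encoded)
print(stage_encoded.ravel())
```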
>
> **3. Train a Decision Tree Model**  
> ● Split data into **training** and **testing** sets (e.g., 80% train,
> 20% test).  
> ● Initialize a **DecisionTreeClassifier**.  
> ● Train the model on the processed data.
>
> **4. Tune Hyperparameters**  
> ● Key hyperparameters:  
> ○ max_depth → Controls tree depth, prevents overfitting.  
> ○ min_samples_split → Minimum samples required to split a node.  
> ○ min_samples_leaf → Minimum samples in a leaf node.  
> ○ criterion → Gini or Entropy.  
> ● Use **GridSearchCV** or **RandomizedSearchCV** to find best values.
>
> **5. Evaluate Model Performance**
>
> ●​ Metrics:​
>
> ○​ **Accuracy**: Overall correctness.​
>
> ○​ **Precision & Recall**: Especially important in healthcare (false
> negatives can be dangerous).​
>
> ○​ **F1-score**: Balance between precision & recall.​
>
> ○​ **ROC-AUC**: Measures separability of classes.​
>
> ●​ Use **cross-validation** for robust evaluation.​
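
To make the metric definitions concrete, here is a small sketch that derives them from the four confusion-matrix counts. The labels are hypothetical (1 = disease, 0 = no disease) and `binary_metrics` is an illustrative helper; in practice `sklearn.metrics` provides equivalent functions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels where 1 = disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 10 patients, 2 of whom have the disease.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # one sick patient is missed

print(binary_metrics(y_true, y_pred))
```

Accuracy is 0.9 here even though half of the sick patients are missed (recall 0.5), which is why recall deserves separate attention in a healthcare setting.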
>
> import pandas as pd  
> from sklearn.model_selection import train_test_split, GridSearchCV  
> from sklearn.tree import DecisionTreeClassifier  
> from sklearn.metrics import classification_report, accuracy_score  
> from sklearn.impute import SimpleImputer  
> from sklearn.preprocessing import OneHotEncoder  
> from sklearn.compose import ColumnTransformer  
> from sklearn.pipeline import Pipeline
>
> \# Suppose df is the healthcare dataset  
> \# X = features, y = target (disease: 0 = no, 1 = yes)  
> X = df.drop("Disease", axis=1)  
> y = df\["Disease"\]
>
> \# Identify numerical & categorical columns  
> num_cols = X.select_dtypes(include=\["int64", "float64"\]).columns  
> cat_cols = X.select_dtypes(include=\["object"\]).columns
>
> \# Preprocessing pipeline  
> num_transformer = SimpleImputer(strategy="median")  
> cat_transformer = Pipeline(steps=\[  
> ('imputer', SimpleImputer(strategy="most_frequent")),  
> ('encoder', OneHotEncoder(handle_unknown='ignore'))  
> \])
>
> preprocessor = ColumnTransformer(  
> transformers=\[  
> ('num', num_transformer, num_cols),  
> ('cat', cat_transformer, cat_cols)  
> \])
>
> \# Full pipeline with Decision Tree  
> pipeline = Pipeline(steps=\[('preprocessor', preprocessor),  
> ('classifier', DecisionTreeClassifier(random_state=42))\])
>
> \# Split data  
> X_train, X_test, y_train, y_test = train_test_split(X, y,
> test_size=0.2, random_state=42)
>
> \# Hyperparameter tuning  
> param_grid = {  
> 'classifier\_\_max_depth': \[3, 5, 7, None\],  
> 'classifier\_\_min_samples_split': \[2, 5, 10\],  
> 'classifier\_\_criterion': \['gini', 'entropy'\]  
> }  
> grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1')  
> grid_search.fit(X_train, y_train)
>
> \# Evaluate  
> y_pred = grid_search.predict(X_test)  
> print("Best Parameters:", grid_search.best_params\_)  
> print("Accuracy:", accuracy_score(y_test, y_pred))  
> print("\nClassification Report:\n", classification_report(y_test,
> y_pred))
>
> **Business Value in Real-World Healthcare**
>
> ●​ **Early Disease Detection**: Helps doctors prioritize high-risk
> patients.​
>
> ●​ **Resource Optimization**: Hospitals can allocate resources (tests,
> specialists, ICU beds) more efficiently.​

> ● **Personalized Care**: Patients flagged as “high risk” can get more
> frequent checkups.

> ●​ **Cost Reduction**: Avoids unnecessary tests for low-risk patients.​
>
> ●​ **Explainability**: Decision Trees provide **transparent rules**,
> making it easier for doctors to trust AI-driven recommendations.