### Decision Tree Classification Explained

A decision tree is a widely used machine learning algorithm for classification and regression tasks. It works by recursively splitting the dataset into subsets, choosing at each node the feature (and split value) that best separates the target values. This process creates a tree-like structure whose leaves hold the predicted class labels or regression values.

### Key Concepts

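1. **Nodes**: Points in the tree where the data is either tested on a feature (decision nodes) or assigned a prediction (leaf nodes).
2. **Edges**: Branches that connect a node to its children, one for each outcome of the node's test.
3. **Root Node**: The topmost node. It holds the split that best separates the full training set.
4. **Internal Nodes**: Decision nodes below the root. Each one tests a single feature and routes a sample to one of its children.
5. **Leaf Nodes**: Terminal nodes that hold the class labels or regression values.
6. **Decision Rules**: The conditions tested at each node, typically a comparison of a feature value against a threshold. The path from the root to a leaf forms one complete rule.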
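
To make these terms concrete, the following minimal sketch fits a deliberately shallow tree on the Iris dataset and prints its structure with scikit-learn's `export_text`. The choices `max_depth=2` and `random_state=0` are illustrative assumptions, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A deliberately shallow tree so the printed structure stays readable
# (max_depth=2 and random_state=0 are illustrative choices)
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# In the text rendering, lines with a comparison are decision rules
# and lines starting with "class:" are leaf nodes
print(export_text(clf, feature_names=iris.feature_names))

# Count the different kinds of nodes in the fitted tree
tree = clf.tree_
n_leaves = clf.get_n_leaves()
n_internal = tree.node_count - n_leaves  # includes the root node
print(f"total nodes: {tree.node_count}, decision nodes: {n_internal}, leaves: {n_leaves}")

# Node 0 is always the root; show the rule it tests
root_feature = iris.feature_names[tree.feature[0]]
print(f"root rule: {root_feature} <= {tree.threshold[0]:.2f}")
```

In the printed output, each indentation level corresponds to one edge followed from a parent node to a child.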

### How Decision Tree Classification Works

1. **Select the Best Split**: Evaluate candidate splits and choose the feature (and split value) that produces the purest subsets. Common criteria are Gini impurity and information gain (the reduction in entropy).
2. **Split the Dataset**: Partition the dataset into subsets according to the selected split.
3. **Repeat**: Recursively apply the above steps to each subset until a stopping condition is met, for example when all data points in a subset belong to the same class or the subset contains fewer than a specified minimum number of data points.
4. **Create the Tree**: The result is a tree with decision nodes and leaf nodes.
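
The split-selection step above can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's actual implementation: the helper names `gini`, `entropy`, and `best_split` and the toy data are hypothetical, and the search simply tries every midpoint between consecutive distinct feature values.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits; information gain is the drop in this value."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y, impurity=gini):
    """Return (feature index, threshold, impurity decrease) of the best binary split."""
    n_samples, n_features = X.shape
    parent = impurity(y)
    best = (None, None, 0.0)
    for j in range(n_features):
        values = np.unique(X[:, j])  # sorted distinct values
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            # Impurity of the children, weighted by their share of the samples
            weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / n_samples
            gain = parent - weighted
            if gain > best[2]:
                best = (j, t, gain)
    return best

# Toy data (hypothetical): feature 0 separates the classes perfectly, feature 1 does not
X_toy = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])

feature, threshold, gain = best_split(X_toy, y_toy)
print(f"best split: feature {feature} <= {threshold}, impurity decrease = {gain}")
```

Passing `impurity=entropy` to `best_split` turns the returned decrease into the information gain. A full tree builder would call `best_split` recursively on the left and right subsets until a stopping condition is reached.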

### Example: Iris Dataset Classification

Let's use the Iris dataset to classify iris flowers into three different species based on features like sepal length, sepal width, petal length, and petal width.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the Decision Tree classifier model
dt_classifier = DecisionTreeClassifier()
dt_classifier.fit(X_train, y_train)

# Predict on the test set
y_pred = dt_classifier.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)
```


Output:

```
Accuracy: 1.0
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```



### Explanation of the Code

1. **Data Preparation**: We load the Iris dataset and split it into features (X) and target variable (y).
2. **Train-Test Split**: We split the data into training and testing sets for model evaluation.
3. **Model Creation and Training**: We create a DecisionTreeClassifier instance and fit it to the training data.
4. **Prediction and Evaluation**: We predict iris species on the test set and evaluate the model using accuracy, confusion matrix, and classification report.
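
The classifier predicts integer class indices (0, 1, 2) rather than species names. As a small optional variation on the steps above, the sketch below maps predictions back to names through `iris.target_names` and passes the same names to `classification_report`. Setting `random_state=42` on the classifier is an added assumption, used only to make tie-breaking between equally good splits reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# random_state on the classifier is an added assumption for reproducibility
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Map integer predictions back to species names
predicted_species = iris.target_names[y_pred]
print(predicted_species[:5])

# Label the rows of the report with species names instead of 0, 1, 2
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```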

### Conclusion

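Decision trees are versatile and easy to interpret. In principle they can handle both numerical and categorical data, although scikit-learn's `DecisionTreeClassifier` expects categorical features to be encoded as numbers. They are also relatively robust against outliers in the input features, because a split depends on the ordering of values rather than their magnitude. However, decision trees are prone to overfitting, especially when the tree is deep or the dataset is noisy. Techniques such as pruning, setting a maximum depth, or using ensemble methods like Random Forest can help mitigate overfitting.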
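
The mitigation techniques mentioned above can be compared with a minimal sketch like the following. The specific values (`max_depth=3`, `ccp_alpha=0.02`, 100 trees, 5-fold cross-validation) are illustrative assumptions, not tuned settings, and on a dataset as small and clean as Iris the differences between the models may be slight.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# All hyperparameter values below are illustrative, not tuned
models = {
    "unrestricted tree": DecisionTreeClassifier(random_state=0),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "cost-complexity pruned": DecisionTreeClassifier(ccp_alpha=0.02, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validation gives a less optimistic estimate than a single test split
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>24}: mean CV accuracy = {scores[name]:.3f}")

# Pruning removes branches, so the pruned tree is never deeper than the full one
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
print(f"depth without pruning: {full.get_depth()}, with pruning: {pruned.get_depth()}")
```

In practice, values such as `max_depth` and `ccp_alpha` are usually chosen by cross-validated search rather than fixed by hand.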
