# 1. Introduction

In this project, we implement a basic decision tree classifier in Python from scratch. A decision tree is a widely used machine learning algorithm that is particularly effective for classification tasks. It builds a model in the form of a tree structure. Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (the decision reached once the attribute tests along the path from the root have been applied).

The decision tree classifier works by recursively splitting the dataset into subsets based on the attribute that provides the highest information gain, a measure of how well an attribute separates the data into distinct classes. The process continues until a subset is homogeneous (all of its instances belong to the same class) or another stopping criterion is met.

This code is organized into several key components:

- TreeNode Class: Defines the structure of each node in the decision tree.
- Helper Functions: Includes functions for reading data from a CSV file, calculating entropy and information gain, and selecting the best attribute for splitting the data.
- Tree Building Function: Recursively constructs the decision tree from the dataset.
- Tree Printing Function: Provides a way to visually inspect the structure of the constructed tree.
- Classification Function: Classifies new instances using the constructed decision tree.
- Main Function: Orchestrates the entire process by reading input data, building the tree, and classifying new instances.

This implementation is meant as an educational tool for understanding the inner workings of decision trees: reading and processing data, building a tree, and using it for classification. It favors simplicity and readability over performance, which makes it a reasonable starting point for anyone learning about decision trees and machine learning in general.

# 2. Algorithm Description

### Introduction to the ID3 Algorithm

The ID3 (Iterative Dichotomiser 3) algorithm is a popular algorithm used to create decision trees, a type of predictive model used in machine learning for classification tasks. Developed by Ross Quinlan in 1986, ID3 is designed to construct a decision tree by employing a top-down, greedy approach. It uses information gain as a criterion to select the attribute that best separates the data into distinct classes at each step of the tree construction.

### Key Concepts

**1. Entropy:**
Entropy is a measure of the randomness or impurity in a dataset. In the context of decision trees, it quantifies the degree of uncertainty or disorder in the target variable (class labels). The entropy \( H \) of a dataset is given by:

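$$
H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i
$$

where \( n \) is the number of classes and \( p_i \) is the proportion of instances in \( S \) that belong to class \( i \). By convention, a term with \( p_i = 0 \) contributes 0.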
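As a worked example, consider a hypothetical dataset of 14 instances, 9 in one class and 5 in the other:

$$
H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.410 + 0.531 = 0.940
$$

A pure dataset gives \( H = 0 \), and an even two-class split gives \( H = 1 \) bit.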

**2. Information Gain:**
Information gain measures the reduction in entropy achieved by partitioning the data based on a particular attribute. It quantifies how well a given attribute separates the data into classes. The information gain \( IG \) of an attribute \( A \) is calculated as:

$$
IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} H(S_v)
$$

where \( S \) is the dataset, \( S_v \) is the subset of \( S \) where attribute \( A \) has value \( v \), and \( H(S_v) \) is the entropy of the subset \( S_v \).
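For a worked instance, take a hypothetical dataset of 14 instances (9 positive and 5 negative, so \( H(S) \approx 0.9403 \)). Suppose an attribute \( A \) splits it into three subsets with class counts (2, 3), (4, 0) and (3, 2). The two mixed subsets each have entropy \( \approx 0.9710 \), and the pure subset has entropy 0:

$$
IG(S, A) \approx 0.9403 - \left[\frac{5}{14}(0.9710) + \frac{4}{14}(0) + \frac{5}{14}(0.9710)\right] \approx 0.9403 - 0.6936 = 0.2467
$$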

### The ID3 Algorithm

The ID3 algorithm follows these steps:

1. **Start with the Entire Dataset:**
   Begin with the root node representing the entire dataset.

2. **Calculate Entropy:**
   Calculate the entropy of the current dataset.

3. **Select the Best Attribute:**
   For each attribute, calculate the information gain. Select the attribute with the highest information gain as the splitting criterion.

4. **Split the Data:**
   Partition the dataset into subsets based on the selected attribute's values.

5. **Create Child Nodes:**
   Create a child node for each subset and assign the corresponding subset to the child node.

6. **Repeat Recursively:**
   Repeat the process for each child node using the subset of data associated with that node. Exclude the attribute just used, so that each attribute is tested at most once along any root-to-leaf path.

7. **Stopping Criteria:**
   The recursion stops when one of the following conditions is met:
   - All instances in a subset belong to the same class.
   - There are no more attributes to split on.
   - The subset is empty. In this case the leaf takes the majority class of its parent node.

8. **Assign Class Labels:**
   Assign a class label to leaf nodes based on the majority class in the subset of data.
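The steps above can be condensed into a short recursive function. This is a minimal, self-contained sketch, not the notebook's implementation. It returns nested dictionaries instead of `TreeNode` objects. The `domains` argument is an assumption introduced here: it holds every value each attribute takes in the full training set, so a branch can receive an empty subset and all three stopping criteria can actually fire.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def majority(labels):
    """Most common label (Counter.most_common keeps first-seen order on ties)."""
    return Counter(labels).most_common(1)[0][0]


def id3(rows, attributes, target, domains, parent_majority=None):
    """Return a nested-dict tree {attribute: {value: subtree}} or a bare class label."""
    if not rows:                       # empty subset: use the parent's majority class
        return parent_majority
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # all instances share one class
        return labels[0]
    if not attributes:                 # no attributes left: majority class
        return majority(labels)

    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [row[target] for row in rows if row[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)   # greedy step: highest information gain
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:        # every known value, so a subset may be empty
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target, domains, majority(labels))
    return {best: branches}


rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
attributes = ["outlook", "windy"]
domains = {a: {row[a] for row in rows} for a in attributes}
print(id3(rows, attributes, "play", domains))
```

The rows are hypothetical. On this data `outlook` is chosen at the root, and only the `rain` branch needs a second split, on `windy`.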

### Example and Practical Use

Consider a dataset where we want to predict whether a customer will buy a computer based on attributes like age, income, student status, and credit rating. Using the ID3 algorithm, we would:

1. Calculate the initial entropy of the dataset.
2. Compute the information gain for each attribute.
3. Select the attribute with the highest information gain (e.g., 'age').
4. Split the dataset into subsets based on the selected attribute's values (e.g., 'youth', 'middle-aged', 'senior').
5. Repeat the process for each subset, recursively building the tree until the stopping criteria are met.
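To make steps 1 to 3 concrete, the sketch below scores each attribute on a 14-row toy table. The rows are illustrative values written for this example, not data shipped with the notebook. `entropy` and `information_gain` here are compact stand-ins for the notebook's functions of the same names.

```python
import math
from collections import Counter

# Illustrative toy rows: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle-aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle-aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle-aged", "medium", "no", "excellent", "yes"),
    ("middle-aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(data, attribute, target):
    """H(S) minus the size-weighted entropy of the subsets induced by `attribute`."""
    remainder = 0.0
    for value in {row[attribute] for row in data}:
        subset = [row[target] for row in data if row[attribute] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy([row[target] for row in data]) - remainder


gains = {a: information_gain(DATA, a, "buys_computer") for a in COLUMNS[:-1]}
for attribute, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attribute:>14}: {gain:.3f}")
```

With these rows, `age` scores highest (about 0.247), followed by `student`, `credit_rating` and `income`, so `age` would become the root split.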

### Advantages and Limitations

**Advantages:**
- **Simple and Intuitive:** The ID3 algorithm is easy to understand and implement.
- **Effective for Small to Medium-sized Datasets:** It performs well on datasets with a manageable number of attributes and instances.
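- **No Need for Feature Scaling:** Splits compare attribute values rather than distances, so scaling is unnecessary. Note, however, that classic ID3 works with categorical attributes. Numerical attributes must first be discretized into ranges (its successor C4.5 added native support for continuous attributes). This notebook likewise treats every CSV value as a categorical string.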

**Limitations:**
- **Overfitting:** The algorithm may produce a complex tree that overfits the training data, especially if the tree is deep.
- **Bias Towards Attributes with More Levels:** ID3 tends to favor attributes with more distinct values, which might not always be the best choice.
- **Missing Values:** The algorithm does not handle missing values inherently.
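The bias toward attributes with many values is easiest to see with an identifier column. Every row gets its own branch and every branch is pure, so the gain equals the full entropy \( H(S) \), even though the attribute cannot generalize to new instances. This is one reason the notebook's `read_csv` drops the `ID` column. A minimal sketch on hypothetical data:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(data, attribute, target):
    """H(S) minus the size-weighted entropy of the subsets induced by `attribute`."""
    remainder = 0.0
    for value in {row[attribute] for row in data}:
        subset = [row[target] for row in data if row[attribute] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy([row[target] for row in data]) - remainder


# Hypothetical data: 'ID' is unique per row, 'windy' is only weakly related to 'play'.
data = [
    {"ID": "1", "windy": "no", "play": "yes"},
    {"ID": "2", "windy": "no", "play": "yes"},
    {"ID": "3", "windy": "yes", "play": "no"},
    {"ID": "4", "windy": "yes", "play": "yes"},
    {"ID": "5", "windy": "no", "play": "no"},
    {"ID": "6", "windy": "yes", "play": "no"},
]
for attribute in ("ID", "windy"):
    print(attribute, round(information_gain(data, attribute, "play"), 3))
```

Here `ID` reaches the maximum possible gain of 1 bit, while `windy` scores about 0.082. C4.5's gain ratio was designed to counter exactly this effect.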

### Conclusion

The ID3 algorithm is a foundational method for constructing decision trees. Its direct successor, C4.5, extends it with support for continuous attributes, missing values, and pruning. CART, which was developed independently, is built on the same idea of recursive partitioning. ID3's simplicity makes it a valuable tool for teaching and for solving basic classification problems, and understanding it provides a solid grounding in decision tree learning and in how information theory is applied in machine learning.

# 3. The `TreeNode` Class

In this section, we will explore the `TreeNode` class, which is a crucial component of our decision tree implementation. This class represents a node in the decision tree, encapsulating the attributes, children, and class labels necessary for making decisions.


### Defining the `TreeNode` Class

The `TreeNode` class has several attributes and methods that help in building and navigating the decision tree.

```python
from collections import Counter

class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute  # Attribute used for splitting at this node
        self.children = {}  # Dictionary to store children nodes
        self.class_label = None  # Class label for leaf nodes
        self.counter = Counter()  # Counter to keep track of class label occurrences
```

### Attributes

- `attribute`: This is the attribute used for splitting the data at this node. For internal nodes, it determines which feature is used to partition the dataset.

- `children`: A dictionary where the keys are the possible values of the splitting attribute, and the values are the child nodes resulting from the split.

- `class_label`: This is the class label assigned to the node. It is `None` for internal nodes and holds the predicted class for leaf nodes.

- `counter`: A `Counter` object from the `collections` module to keep track of the frequency of class labels in the dataset reaching this node. This helps in making predictions and understanding the distribution of the data.

### Initialization

The `__init__` method initializes the node with the given attribute (default is `None`). It also initializes an empty dictionary for child nodes, sets the class label to `None`, and initializes a `Counter` to keep track of class labels.
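To show how the fields fit together, the sketch below wires up a one-level tree by hand. The `leaf` helper and the outlook/sunny/overcast values are hypothetical, chosen only for illustration. In the notebook, `build_tree` creates these nodes automatically.

```python
from collections import Counter


class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute  # attribute tested at this node (internal nodes)
        self.children = {}          # attribute value -> child TreeNode
        self.class_label = None     # predicted class (leaf nodes only)
        self.counter = Counter()    # class-label counts of the rows that reached this node


def leaf(label, count):
    """Hypothetical helper: build a leaf holding `count` examples of `label`."""
    node = TreeNode()
    node.class_label = label
    node.counter = Counter({label: count})
    return node


# Hand-built tree: the root tests 'outlook'; 'sunny' -> No, 'overcast' -> Yes
root = TreeNode(attribute="outlook")
root.children["sunny"] = leaf("No", 3)
root.children["overcast"] = leaf("Yes", 4)

print(root.attribute, sorted(root.children))   # outlook ['overcast', 'sunny']
print(root.children["sunny"].class_label)      # No
```

The root keeps `class_label = None` because it is an internal node. Only the leaves carry a label and a populated `counter`.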

```python
import csv  # Import the CSV module to handle CSV file operations
import math  # Import the math module to perform mathematical operations
from collections import Counter  # Import Counter from collections module to count occurrences of elements

# Define a class for tree nodes used in the decision tree
class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute  # Attribute used for splitting at this node
        self.children = {}  # Dictionary to store children nodes
        self.class_label = None  # Class label for leaf nodes
        self.counter = Counter()  # Counter to keep track of class label occurrences

# Function to read CSV file and return the data as a list of dictionaries
def read_csv(file_path):
    data = []  # Initialize an empty list to store the data
    with open(file_path, 'r', newline='') as file:  # Open the CSV file in read mode (newline='' as the csv docs recommend)
        reader = csv.DictReader(file)  # Create a CSV reader object to read the file as a dictionary
        for row in reader:  # Iterate over each row in the CSV file
            cleaned_row = {key: value for key, value in row.items() if key != 'ID'}  # Skip the 'ID' column
            data.append(cleaned_row)  # Append the cleaned row to the data list
    return data  # Return the list of dictionaries
```
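To see what `read_csv` returns, the sketch below writes a tiny hypothetical CSV to a temporary file and reads it back. The function is repeated so the block runs on its own. Every value comes back as a string, and the `ID` column is dropped while the remaining columns keep their header order.

```python
import csv
import os
import tempfile


def read_csv(file_path):
    data = []
    # newline='' is what the csv module documentation recommends for files it reads
    with open(file_path, "r", newline="") as file:
        reader = csv.DictReader(file)
        for row in reader:
            data.append({key: value for key, value in row.items() if key != "ID"})
    return data


# Hypothetical CSV content with an ID column, two attributes, and a class column
csv_text = "ID,outlook,windy,play\n1,sunny,no,no\n2,overcast,yes,yes\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(csv_text)
    path = tmp.name

try:
    rows = read_csv(path)
finally:
    os.remove(path)  # clean up the temporary file

print(rows)
```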

# 4. Entropy Calculation

In this section, we will explore the `entropy` function, which calculates the entropy of a dataset based on the class labels. Entropy is a measure of impurity or disorder in the dataset, and it helps in determining the effectiveness of attribute splits in decision tree algorithms.

## Defining the `entropy` Function

The `entropy` function calculates the entropy of a dataset based on the class labels.

```python
import math
from collections import Counter

def entropy(data, class_label):
    if not data:
        return 0
    label_column = [row[class_label] for row in data]
    label_counts = Counter(label_column)
    entropy_val = 0
    total_examples = len(data)
    for label in label_counts:
        label_prob = label_counts[label] / total_examples
        entropy_val -= label_prob * math.log2(label_prob)
    return entropy_val
```

### Explanation

- **data**: The dataset for which entropy needs to be calculated.
- **class_label**: The column name representing the class labels in the dataset.
- **label_column**: A list containing the class labels of all instances in the dataset.
- **label_counts**: A `Counter` object that counts the occurrences of each class label in the dataset.
- **total_examples**: The total number of instances in the dataset.
- **entropy_val**: Initialized to 0, it accumulates the entropy calculation.
- **for label in label_counts**: Iterates over each unique class label.
- **label_prob**: Calculates the probability of each class label occurrence in the dataset.
- **entropy_val -= label_prob * math.log2(label_prob)**: Subtracts each class's term \( p_i \log_2 p_i \), so the final value equals \( H(S) = -\sum_i p_i \log_2 p_i \).

```python
def entropy(data, class_label):
    if not data:  # If data is empty, return 0
        return 0
    label_column = [row[class_label] for row in data]  # Extract the class label column
    label_counts = Counter(label_column)  # Count occurrences of each class label
    entropy_val = 0  # Initialize entropy value
    total_examples = len(data)  # Get the total number of examples in the data
    for label in label_counts:  # Iterate over each class label
        label_prob = label_counts[label] / total_examples  # Calculate the probability of the class label
        entropy_val -= label_prob * math.log2(label_prob)  # Update the entropy value
    return entropy_val  # Return the calculated entropy
```
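A quick sanity check of the function on a few hand-made label lists: a pure set should give 0, an even two-class split 1 bit, and four equally likely classes \( \log_2 4 = 2 \) bits. The function is repeated so the block runs on its own. The `rows` helper and the `play` column name are hypothetical, added only to build the input.

```python
import math
from collections import Counter


def entropy(data, class_label):
    if not data:
        return 0
    label_counts = Counter(row[class_label] for row in data)
    entropy_val = 0
    total_examples = len(data)
    for label in label_counts:
        label_prob = label_counts[label] / total_examples
        entropy_val -= label_prob * math.log2(label_prob)
    return entropy_val


def rows(*labels):
    """Hypothetical helper: wrap bare labels as rows with a 'play' column."""
    return [{"play": label} for label in labels]


print(entropy(rows("yes", "yes", "yes"), "play"))   # 0.0 (pure set)
print(entropy(rows("yes", "no"), "play"))           # 1.0 (even two-class split)
print(entropy(rows("a", "b", "c", "d"), "play"))    # 2.0 (four equally likely classes)
```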


# 5. Information Gain Calculation

In this section, we will explore the `information_gain` function, which calculates the information gain of an attribute in a dataset. Information gain helps in selecting the best attribute for splitting the dataset in decision tree algorithms.

## Defining the `information_gain` Function

The `information_gain` function calculates the information gain of an attribute based on the class labels.

```python
def information_gain(data, attribute, class_label):
    attribute_values = set([row[attribute] for row in data])
    total_examples = len(data)
    attribute_entropy = 0
    for value in attribute_values:
        subset = [row for row in data if row[attribute] == value]
        subset_entropy = entropy(subset, class_label)
        subset_size = len(subset)
        attribute_entropy += (subset_size / total_examples) * subset_entropy
    return entropy(data, class_label) - attribute_entropy
```

### Explanation

- **data**: The dataset for which information gain needs to be calculated.
- **attribute**: The attribute for which information gain is calculated.
- **class_label**: The column name representing the class labels in the dataset.
- **attribute_values**: A set containing all unique values of the attribute.
- **total_examples**: The total number of instances in the dataset.
- **attribute_entropy**: Initialized to 0, it accumulates the entropy of subsets based on attribute values.
- **for value in attribute_values**: Iterates over each unique value of the attribute.
- **subset**: Filters the dataset to include only instances with the current attribute value.
- **subset_entropy**: Calculates the entropy of the subset using the `entropy` function.
- **subset_size**: The number of instances in the subset.
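- **attribute_entropy += ...**: Adds each subset's entropy weighted by its share of the data, building up \( \sum_v \frac{|S_v|}{|S|} H(S_v) \). The function then returns \( H(S) \) minus this sum, which is the information gain.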

```python
# Function to calculate the information gain of an attribute
def information_gain(data, attribute, class_label):
    attribute_values = set([row[attribute] for row in data])  # Get unique values of the attribute
    total_examples = len(data)  # Get the total number of examples in the data
    attribute_entropy = 0  # Initialize the attribute entropy
    for value in attribute_values:  # Iterate over each unique value of the attribute
        subset = [row for row in data if row[attribute] == value]  # Get the subset of data with the attribute value
        subset_entropy = entropy(subset, class_label)  # Calculate the entropy of the subset
        subset_size = len(subset)  # Get the size of the subset
        attribute_entropy += (subset_size / total_examples) * subset_entropy  # Update the attribute entropy
    return entropy(data, class_label) - attribute_entropy  # Return the information gain
```
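Information gain always lies between 0 and \( H(S) \). The sketch below checks both ends on hypothetical rows. `windy` copies the class label exactly, so splitting on it recovers all of \( H(S) \). `color` is independent of the label, so its gain is 0. Both functions are repeated in compact form so the block runs on its own.

```python
import math
from collections import Counter


def entropy(data, class_label):
    if not data:
        return 0
    counts = Counter(row[class_label] for row in data)
    entropy_val = 0
    for n in counts.values():
        p = n / len(data)
        entropy_val -= p * math.log2(p)
    return entropy_val


def information_gain(data, attribute, class_label):
    total = len(data)
    remainder = 0  # size-weighted entropy of the subsets
    for value in {row[attribute] for row in data}:
        subset = [row for row in data if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, class_label)
    return entropy(data, class_label) - remainder


# Hypothetical rows: 'windy' matches the label exactly, 'color' is unrelated to it.
data = [
    {"windy": "yes", "color": "red", "play": "no"},
    {"windy": "yes", "color": "blue", "play": "no"},
    {"windy": "no", "color": "red", "play": "yes"},
    {"windy": "no", "color": "blue", "play": "yes"},
]
print(information_gain(data, "windy", "play"))  # 1.0: perfect split recovers all of H(S)
print(information_gain(data, "color", "play"))  # 0.0: the split tells us nothing
```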

# 6. Choose Best Attribute for Splitting

In this section, we will explore the `choose_best_attribute` function, which selects the best attribute for splitting the dataset based on the highest information gain. This process is crucial in decision tree algorithms for determining the optimal attribute to partition the data.

## Defining the `choose_best_attribute` Function

The `choose_best_attribute` function selects the best attribute for splitting the dataset based on the highest information gain.

```python
def choose_best_attribute(data, attributes, class_label):
    gains = {attr: information_gain(data, attr, class_label) for attr in attributes}
    best_attribute = max(gains, key=gains.get)
    return best_attribute
```

### Explanation

- **data**: The dataset for which the best attribute needs to be selected.
- **attributes**: A list of attributes available for splitting.
- **class_label**: The column name representing the class labels in the dataset.
- **gains**: A dictionary comprehension that calculates the information gain for each attribute.
- **best_attribute**: Finds the attribute with the highest information gain using the `max` function.

```python
# Function to choose the best attribute for splitting the data
def choose_best_attribute(data, attributes, class_label):
    gains = {attr: information_gain(data, attr, class_label) for attr in attributes}  # Calculate information gain for each attribute
    best_attribute = max(gains, key=gains.get)  # Find the attribute with the maximum information gain
    return best_attribute  # Return the best attribute
```
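Two behaviours of `max` matter here. First, on ties it returns the first maximal key it meets, which is the attribute that comes first in `attributes` (Python dictionaries preserve insertion order). Second, an empty `attributes` list makes `max` raise `ValueError`. The sketch below isolates the selection step using made-up gain values. `pick_best` and its `default=None` fallback are hypothetical additions, not part of the notebook.

```python
def pick_best(gains):
    """Hypothetical helper: the selection step of choose_best_attribute, with a fallback."""
    # max() returns the first maximal key it meets (the earliest-inserted one on ties);
    # default=None avoids the ValueError that max() raises on an empty dict.
    return max(gains, key=gains.get, default=None)


tied = {"outlook": 0.25, "humidity": 0.25, "windy": 0.05}  # made-up gain values
print(pick_best(tied))                                  # outlook
print(pick_best(dict(reversed(list(tied.items())))))    # humidity
print(pick_best({}))                                    # None
```

Returning `None` only moves the problem to the caller, so a tree builder still needs an explicit rule for the "no attributes left" case, such as a majority-class leaf.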

# 7. Building the Decision Tree

In this section, we will explore the `build_tree` function, which constructs a decision tree recursively by selecting the best attribute for splitting at each node. Because ID3 chooses each split greedily, the resulting tree fits the training data well but is not guaranteed to be optimal (for example, the smallest possible tree).

## Defining the `build_tree` Function

The `build_tree` function constructs a decision tree recursively by selecting the best attribute for splitting at each node.

```python
def build_tree(data, attributes, class_label):
    if not data:
        return None

    # Create a new tree node
    node = TreeNode()

    # Check if all examples have the same class label
    class_column = [row[class_label] for row in data]
    if len(set(class_column)) == 1:
        node.class_label = class_column[0]
        node.counter = Counter(class_column)
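        return node

    # If no attributes are left to split on, make a leaf with the majority class
    if not attributes:
        node.counter = Counter(class_column)
        node.class_label = node.counter.most_common(1)[0][0]
        return node
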
    # Choose the best attribute to split on
    best_attribute = choose_best_attribute(data, attributes, class_label)
    node.attribute = best_attribute

    # Split data based on the chosen attribute
    attribute_values = set([row[best_attribute] for row in data])
    for value in attribute_values:
        subset = [row for row in data if row[best_attribute] == value]
        child_node = build_tree(subset, [attr for attr in attributes if attr != best_attribute], class_label)
        node.children[value] = child_node

    return node
```

### Explanation

- **data**: The dataset for which the decision tree needs to be constructed.
- **attributes**: A list of attributes available for splitting.
- **class_label**: The column name representing the class labels in the dataset.
- **node**: Represents a node in the decision tree.
- **class_column**: Extracts the class labels from the dataset.
- **if len(set(class_column)) == 1**: Checks if all instances have the same class label. If true, creates a leaf node with the class label.
- **best_attribute**: Selects the best attribute for splitting using the `choose_best_attribute` function.
- **attribute_values**: Extracts unique values of the best attribute.
- **for value in attribute_values**: Iterates over each unique value of the best attribute and recursively builds child nodes.

```python
# Function to build the decision tree
def build_tree(data, attributes, class_label):
    if not data:  # If data is empty, return None
        return None

    node = TreeNode()  # Create a new tree node

    class_column = [row[class_label] for row in data]  # Extract the class label column
    if len(set(class_column)) == 1:  # If all examples have the same class label
        node.class_label = class_column[0]  # Set the class label of the node
        node.counter = Counter(class_column)  # Set the counter for the class label
        return node  # Return the node

    if not attributes:  # If no attributes are left to split on
        node.counter = Counter(class_column)  # Count the class labels in this subset
        node.class_label = node.counter.most_common(1)[0][0]  # Use the majority class as the label
        return node  # Return the leaf node

    best_attribute = choose_best_attribute(data, attributes, class_label)  # Choose the best attribute to split on
    node.attribute = best_attribute  # Set the attribute of the node

    attribute_values = set([row[best_attribute] for row in data])  # Get unique values of the best attribute
    for value in attribute_values:  # Iterate over each unique value of the best attribute
        subset = [row for row in data if row[best_attribute] == value]  # Get the subset of data with the attribute value
        child_node = build_tree(subset, [attr for attr in attributes if attr != best_attribute], class_label)  # Build the subtree
        node.children[value] = child_node  # Add the subtree as a child of the current node

    return node  # Return the node
```
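An unpruned ID3 tree keeps splitting until its leaves are pure, so its size gives a rough signal of the overfitting risk noted earlier. The sketch below defines two inspection helpers, `tree_depth` and `count_leaves` (hypothetical names, not part of the notebook), and applies them to a small hand-built tree. `TreeNode` is repeated so the block runs on its own.

```python
from collections import Counter


class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute
        self.children = {}
        self.class_label = None
        self.counter = Counter()


def is_leaf(node):
    return node.class_label is not None


def tree_depth(node):
    """Number of attribute tests on the longest root-to-leaf path."""
    if is_leaf(node):
        return 0
    return 1 + max(tree_depth(child) for child in node.children.values())


def count_leaves(node):
    """Total number of leaf nodes below (and including) `node`."""
    if is_leaf(node):
        return 1
    return sum(count_leaves(child) for child in node.children.values())


def make_leaf(label):
    node = TreeNode()
    node.class_label = label
    node.counter = Counter({label: 1})
    return node


# Hand-built tree: outlook -> (overcast: Yes, sunny -> humidity -> (high: No, normal: Yes))
humidity = TreeNode("humidity")
humidity.children["high"] = make_leaf("No")
humidity.children["normal"] = make_leaf("Yes")
root = TreeNode("outlook")
root.children["overcast"] = make_leaf("Yes")
root.children["sunny"] = humidity

print(tree_depth(root), count_leaves(root))  # 2 3
```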

# 8. Printing the Decision Tree

In this section, we will explore the `print_tree` function, which prints the structure of a decision tree in a readable format. This visualization helps in understanding the decisions made at each node of the tree.

## Defining the `print_tree` Function

The `print_tree` function prints the structure of the decision tree.

```python
def print_tree(node, depth=0):
    if node.class_label is not None:
        print(f"{'    ' * depth}<Leaf> {node.class_label} ({', '.join(f'{k}: {v}' for k, v in node.counter.items())})")
    else:
        print(f"{'    ' * depth}<{node.attribute}>")
        for value, child_node in node.children.items():
            print(f"{'    ' * (depth + 1)}{value}: ", end='')
            print_tree(child_node, depth + 2)
```

### Explanation

- **node**: The current node in the decision tree.
- **depth**: The current depth in the tree, used for indentation.
- **if node.class_label**: Checks if the node is a leaf node. If true, prints the class label and the count of class labels.
- **else**: If the node is not a leaf, prints the attribute and recursively prints its children.
- **print(f"{'    ' * depth}<{node.attribute}>")**: Prints the attribute of the current node with indentation based on depth.
- **for value, child_node in node.children.items()**: Iterates over each child node and prints its value, then recursively calls `print_tree` on the child node with increased depth.

```python
def print_tree(node, depth=0):
    if node.class_label is not None:  # If the node is a leaf node
        print(f"{'    ' * depth}<Leaf> {node.class_label} ({', '.join(f'{k}: {v}' for k, v in node.counter.items())})")  # Print the class label and counter
    else:  # If the node is an internal node
        print(f"{'    ' * depth}<{node.attribute}>")  # Print the attribute
        for value, child_node in node.children.items():  # Iterate over each child node
            print(f"{'    ' * (depth + 1)}{value}: ", end='')  # Print the attribute value
            print_tree(child_node, depth + 2)  # Recursively print the child node
```
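Returning lines instead of printing them makes the output easy to test or write to a file. `format_tree` below is a hypothetical variant of `print_tree`. It uses the same `<attribute>` and `<Leaf>` markers but puts each branch value on its own line, which avoids the extra padding that `print_tree` emits after `value: ` when the child prints its own indentation. `TreeNode` is repeated so the block runs on its own.

```python
from collections import Counter


class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute
        self.children = {}
        self.class_label = None
        self.counter = Counter()


def format_tree(node, depth=0):
    """Return the tree as a list of indented lines instead of printing it."""
    pad = "    " * depth
    if node.class_label is not None:
        counts = ", ".join(f"{k}: {v}" for k, v in node.counter.items())
        return [f"{pad}<Leaf> {node.class_label} ({counts})"]
    lines = [f"{pad}<{node.attribute}>"]
    for value, child in node.children.items():
        lines.append(f"{pad}    {value}:")          # branch value, one level deeper
        lines.extend(format_tree(child, depth + 2))  # subtree, two levels deeper
    return lines


def make_leaf(label, count):
    node = TreeNode()
    node.class_label = label
    node.counter = Counter({label: count})
    return node


root = TreeNode("outlook")
root.children["sunny"] = make_leaf("No", 3)
root.children["overcast"] = make_leaf("Yes", 4)
print("\n".join(format_tree(root)))
```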


# 9. Classifying Instances with the Decision Tree

In this section, we will explore the `classify` function, which predicts the class label of an instance using a decision tree. This process involves traversing the decision tree based on the attribute values of the instance until a leaf node is reached, which provides the predicted class label.

## Defining the `classify` Function

The `classify` function predicts the class label of an instance using a decision tree.

```python
def classify(instance, node):
    if node.class_label is not None:
        return node.class_label

    attribute_value = instance.get(node.attribute)
    if attribute_value is None or attribute_value not in node.children:
        # Handle unknown attribute values by returning a default prediction
        # Modify this based on your specific dataset and requirements
        return "Unknown"

    return classify(instance, node.children[attribute_value])
```

### Explanation

- **instance**: The instance for which the class label needs to be predicted.
- **node**: The current node in the decision tree.
- **if node.class_label**: Checks if the current node is a leaf node. If true, returns the class label of the node.
- **attribute_value**: Retrieves the value of the attribute corresponding to the current node from the instance.
- **if attribute_value is None or attribute_value not in node.children**: Checks if the attribute value is unknown or not present in the decision tree. In such cases, returns a default prediction (e.g., "Unknown").
- **return classify(instance, node.children[attribute_value])**: Recursively traverses the decision tree based on the attribute values of the instance.

```python
# Function to classify a new instance using the decision tree
def classify(instance, node):
    if node.class_label is not None:  # If the node is a leaf node
        return node.class_label  # Return the class label

    attribute_value = instance.get(node.attribute)  # Get the attribute value of the instance
    if attribute_value is None or attribute_value not in node.children:  # If the attribute value is unknown
        return "Unknown"  # Return "Unknown"

    return classify(instance, node.children[attribute_value])  # Recursively classify the instance using the child node
```
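To judge a tree on more than one instance, a natural next step is to classify a held-out set of labelled rows and compute accuracy. The sketch below assumes the test rows use the same column names as the training CSV. `accuracy` is a hypothetical helper, and the hand-built tree and rows are illustrative. `classify` is repeated (with an `is not None` leaf check) so the block runs on its own. Note that "Unknown" predictions count as errors.

```python
from collections import Counter


class TreeNode:
    def __init__(self, attribute=None):
        self.attribute = attribute
        self.children = {}
        self.class_label = None
        self.counter = Counter()


def classify(instance, node):
    if node.class_label is not None:
        return node.class_label
    attribute_value = instance.get(node.attribute)
    if attribute_value is None or attribute_value not in node.children:
        return "Unknown"  # missing or unseen value
    return classify(instance, node.children[attribute_value])


def accuracy(rows, root, class_label):
    """Fraction of labelled rows whose prediction matches the true label."""
    if not rows:
        return 0.0
    correct = sum(classify(row, root) == row[class_label] for row in rows)
    return correct / len(rows)


def make_leaf(label):
    node = TreeNode()
    node.class_label = label
    return node


root = TreeNode("outlook")
root.children["sunny"] = make_leaf("No")
root.children["overcast"] = make_leaf("Yes")

test_rows = [
    {"outlook": "sunny", "play": "No"},      # correct
    {"outlook": "overcast", "play": "Yes"},  # correct
    {"outlook": "overcast", "play": "No"},   # wrong
    {"outlook": "rain", "play": "Yes"},      # unseen value -> "Unknown", wrong
]
print(accuracy(test_rows, root, "play"))  # 0.5
```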

# 10. Decision Tree Main Function

In this section, we'll explore the `main` function, which orchestrates the process of building a decision tree from a dataset and classifying new instances with the constructed tree.

## Imports

Before diving into the `main` function, let's import the necessary modules:

```python
import csv
import math
from collections import Counter
```

## Defining the `main` Function

The `main` function is the entry point of the decision tree construction and classification process. It reads a dataset from a CSV file, constructs a decision tree, prints the tree structure, and classifies new instances using the constructed tree.

```python
def main():
    # Prompt the user to enter the path to the CSV file
    file_path = input("Enter the path to the CSV file: ")
    
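    # Read the dataset from the CSV file
    data = read_csv(file_path)
    if not data:  # Guard against a CSV file with a header but no data rows
        print("The CSV file contains no data rows.")
        return

    # Extract attribute names from the header
    attributes = list(data[0].keys())  # Extract all attribute names from the first row of data

    # Remove the 'ID' column if present (read_csv already drops it; this is a safeguard)
    if 'ID' in attributes:
        attributes.remove('ID')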
    
    # Extract the class label column
    class_label = attributes[-1]  # Last remaining column is assumed to be the class label
    attributes = attributes[:-1]  # Exclude the class label column

    # Build the decision tree
    root_node = build_tree(data, attributes, class_label)

    # Print the decision tree structure
    print("Decision Tree:")
    print_tree(root_node)

    # Prompt the user to enter attribute values for a new instance
    new_instance = {}
    for attribute in attributes:
        value = input(f"Enter value for '{attribute}': ")
        new_instance[attribute] = value

    # Classify the new instance using the decision tree
    predicted_class = classify(new_instance, root_node)
    print(f"Predicted class for the new instance: {predicted_class}")

# Execute the main function if the script is run directly
if __name__ == "__main__":
    main()
```

### Explanation

- **Prompting for CSV file path**: Asks the user to enter the path to the CSV file containing the dataset.
- **Reading the dataset**: Uses the `read_csv` function to read the dataset from the CSV file.
- **Extracting attributes and class label**: Extracts attribute names and the class label from the dataset.
- **Building the decision tree**: Constructs a decision tree using the `build_tree` function.
- **Printing the decision tree**: Prints the structure of the decision tree using the `print_tree` function.
- **Prompting for attribute values**: Prompts the user to enter attribute values for a new instance.
- **Classifying the new instance**: Classifies the new instance using the constructed decision tree and the `classify` function.

```python
def main():
    file_path = input("Enter the path to the CSV file: ")  # Prompt user to enter the path to the CSV file
    data = read_csv(file_path)  # Read the CSV file and get the data
    if not data:  # Stop early if the file has a header but no data rows
        print("The CSV file contains no data rows.")
        return

    attributes = list(data[0].keys())  # Extract attribute names from the header
    if 'ID' in attributes:  # Safeguard: read_csv already drops 'ID'
        attributes.remove('ID')  # Remove 'ID' from the list of attributes

    class_label = attributes[-1]  # Last remaining column is assumed to be the class label
    attributes = attributes[:-1]  # Exclude the class label column

    root_node = build_tree(data, attributes, class_label)  # Build the decision tree
    print("Decision Tree:")  # Print a message
    print_tree(root_node)  # Print the decision tree

    new_instance = {}  # Initialize an empty dictionary for the new instance
    for attribute in attributes:  # Iterate over each attribute
        value = input(f"Enter value for '{attribute}': ")  # Prompt user to enter the value for the attribute
        new_instance[attribute] = value  # Add the attribute value to the new instance

    predicted_class = classify(new_instance, root_node)  # Classify the new instance using the decision tree
    print(f"Predicted class for the new instance: {predicted_class}")  # Print the predicted class

if __name__ == "__main__":
    main()  # Execute the main function
```

# 11. Conclusion

In conclusion, the code presented implements a decision tree classifier from scratch using Python. Here's a summary of its key components and functionalities:

1. **Data Handling**: The code reads a dataset from a CSV file, where each row represents an instance with multiple attributes and a class label.

2. **Decision Tree Construction**: It constructs a decision tree recursively using the ID3 algorithm. At each node, it selects the best attribute for splitting based on information gain and keeps partitioning the dataset until the leaves are homogeneous in class label or no further split is possible.

3. **Decision Tree Visualization**: The code provides a function to print the structure of the constructed decision tree in a readable format, allowing users to understand the decision-making process at each node.

4. **Instance Classification**: Once the decision tree is built, the code allows users to input attribute values for new instances interactively. It then predicts the class label of these instances using the constructed decision tree.

5. **Customization**: The code is flexible and can be adapted to different datasets by modifying the handling of unknown attribute values and other specific requirements.

Overall, this code provides a foundational understanding of decision tree algorithms and demonstrates how they can be implemented for classification tasks. It serves as a valuable learning tool for those interested in machine learning algorithms and data analysis.