
In [18]:
import numpy as np

# Define a simple dataset with two features: Outlook and Temperature, and a target variable: PlayTennis
# Outlook: Sunny=0, Overcast=1, Rainy=2
# Temperature: Hot=0, Mild=1, Cool=2
# PlayTennis: No=0, Yes=1
data = np.array([
    [0, 0, 0],  # Sunny, Hot, No
    [0, 1, 0],  # Sunny, Mild, No
    [1, 0, 1],  # Overcast, Hot, Yes
    [2, 2, 1],  # Rainy, Cool, Yes
    [2, 1, 1],  # Rainy, Mild, Yes
])

In [19]:

# Define a function to calculate entropy
def calculate_entropy(y):
    unique_labels, counts = np.unique(y, return_counts=True)
    probabilities = counts / len(y)
    entropy = -np.sum(probabilities * np.log2(probabilities))
    return entropy

In [20]:

# Define a function to calculate information gain
def calculate_information_gain(X, y, feature_index):
    # Calculate entropy before the split
    entropy_before_split = calculate_entropy(y)

    # Calculate entropy after the split
    unique_values, counts = np.unique(X[:, feature_index], return_counts=True)
    weighted_entropy_after_split = 0
    for value, count in zip(unique_values, counts):
        subset_y = y[X[:, feature_index] == value]
        weighted_entropy_after_split += (count / len(X)) * calculate_entropy(subset_y)

    # Calculate information gain
    information_gain = entropy_before_split - weighted_entropy_after_split
    return information_gain

In [21]:

# Define the ID3 algorithm
def id3(X, y, feature_names):
    if len(np.unique(y)) == 1:
        # If all instances have the same class label, return a leaf node with that label
        # (.item() converts the NumPy scalar to a plain Python value)
        return y[0].item()

    if X.shape[1] == 0:
        # If there are no more features to split on, return the most common class label
        return int(np.argmax(np.bincount(y)))

    # Calculate information gain for each feature
    information_gains = [calculate_information_gain(X, y, i) for i in range(X.shape[1])]

    # Choose the feature with the highest information gain
    best_feature_index = int(np.argmax(information_gains))
    best_feature_name = feature_names[best_feature_index]

    # Drop the chosen feature before recursing. Without this, the
    # "no more features" case above is never reached, and rows with identical
    # features but different labels would recurse forever.
    remaining_feature_names = (
        feature_names[:best_feature_index] + feature_names[best_feature_index + 1:]
    )

    # Create a decision tree node with the best feature
    tree = {best_feature_name: {}}

    # Recursively build the tree
    unique_values = np.unique(X[:, best_feature_index])
    for value in unique_values:
        subset_indices = np.where(X[:, best_feature_index] == value)[0]
        subset_X = np.delete(X[subset_indices], best_feature_index, axis=1)
        subset_y = y[subset_indices]
        subtree = id3(subset_X, subset_y, remaining_feature_names)
        tree[best_feature_name][value.item()] = subtree

    return tree

In [22]:
# Define feature names
feature_names = ['Outlook', 'Temperature']


In [23]:
# Separate features and target variable
X = data[:, :-1]
y = data[:, -1]

In [24]:
# Build the decision tree
decision_tree = id3(X, y, feature_names)

In [25]:
# Print the decision tree
print("Decision Tree:")
print(decision_tree)

Decision Tree:
{'Outlook': {0: 0, 1: 1, 2: 1}}
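
The nested dictionary printed above can be used to classify a new row by walking from the root to a leaf. The sketch below is one possible way to do that; `predict` is a hypothetical helper that is not part of the original notebook, and the tree literal is copied from the printed output. Returning `None` for a feature value the tree has never seen is an assumption made here, not a rule of ID3.

```python
def predict(tree, sample, feature_names):
    """Follow the branches of a nested-dict tree until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        # Each internal node is a dict with exactly one key: the feature it splits on
        feature_name = next(iter(node))
        feature_value = sample[feature_names.index(feature_name)]
        branches = node[feature_name]
        if feature_value not in branches:
            # Assumption: report "unknown" for values absent from the training data
            return None
        node = branches[feature_value]
    return node


feature_names = ['Outlook', 'Temperature']
decision_tree = {'Outlook': {0: 0, 1: 1, 2: 1}}  # copied from the output above

print(predict(decision_tree, [0, 2], feature_names))  # Sunny, Cool
print(predict(decision_tree, [1, 1], feature_names))  # Overcast, Mild
```

Because this tree only splits on Outlook, the Temperature value in each sample has no effect on the prediction.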


Explanation:

We start by defining a simple dataset with two features (Outlook and Temperature) and a target variable (PlayTennis).
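The calculate_entropy function computes the Shannon entropy (in bits) of a set of class labels: 0 when every label is the same, 1 for an even split between two classes.
The calculate_information_gain function computes the information gain of a feature: the entropy of the labels before the split minus the weighted average entropy of the subsets produced by splitting on that feature.
The id3 function implements the ID3 algorithm recursively. At each node it splits on the feature with the highest information gain, and it stops when all labels in a node agree or when no features remain, in which case it returns the most common label.
Finally, we print the resulting decision tree. Here a single split on Outlook classifies all five rows: Sunny (0) leads to No (0), while Overcast (1) and Rainy (2) lead to Yes (1).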
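
To see why Outlook is chosen at the root, it may help to compute the numbers directly. The sketch below recomputes the entropy H(S) = -sum(p_i * log2(p_i)) of the labels and the information gain of each feature on the five rows of the dataset. The helper names `entropy` and `information_gain` are illustrative stand-ins for the notebook's functions, written so the block runs on its own.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    gain = entropy(labels)
    for value in np.unique(feature):
        mask = feature == value
        # mask.mean() is the fraction of rows that take this feature value
        gain -= mask.mean() * entropy(labels[mask])
    return gain


# Columns of the dataset defined at the top of the notebook
outlook = np.array([0, 0, 1, 2, 2])
temperature = np.array([0, 1, 0, 2, 1])
play = np.array([0, 0, 1, 1, 1])

print(round(entropy(play), 3))                        # 0.971
print(round(information_gain(outlook, play), 3))      # 0.971
print(round(information_gain(temperature, play), 3))  # 0.171
```

Splitting on Outlook leaves every subset pure, so its gain equals the full entropy of the labels. Temperature leaves the Hot and Mild subsets evenly mixed, so its gain is much smaller.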