<a href="https://colab.research.google.com/github/shivanshu1303/Simple-ML-Algos-Implemented/blob/main/Decision_Trees.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Here, I implement decision trees, a family of algorithms that is widely used for classification problems.
## As the name `tree` suggests, a decision tree is made of nodes. Each node splits its dataset on one particular feature into child nodes that hold the samples that either have or don't have that feature.

### One of the fundamental concepts here is `entropy`, a measure of how mixed the classes in a node are. Its formula resembles the cost function of logistic regression (another algorithm used primarily for classification problems).

In [1]:
import numpy as np

In [2]:
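def entropy(y):
  # y is the fraction of positive samples in a node (a value between 0 and 1)
  if(y==0 or y==1):
    # A pure node has zero entropy; this also avoids evaluating log2(0)
    return 0
  else:
    return -((y*np.log2(y))+((1-y)*(np.log2(1-y))))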
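
As a quick sanity check, the sketch below restates the `entropy` function from the cell above (so that it runs on its own) and evaluates it at a few proportions. The chosen proportions are only illustrative.

```python
import numpy as np

def entropy(p):
    # Same definition as in the cell above, restated so this block runs by itself
    if p == 0 or p == 1:
        return 0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Illustrative proportions of positive samples in a node
for p in [0, 0.2, 0.5, 0.8, 1]:
    print(f"entropy({p}) = {entropy(p):.4f}")
```

The entropy is 0 for a pure node (`p` equal to 0 or 1), reaches its maximum of 1 at `p = 0.5`, and is symmetric, so `entropy(0.2)` and `entropy(0.8)` give the same value.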

## After this, we have to actually split the samples, i.e. form 2 child nodes from 1 parent node by sorting the sample indices into two lists

In [3]:
"""
If feature=1 - > left node
otherwise    - > right node
"""
def split_node(X,feature_number):
  left_indices=[]
  right_indices=[]

  for i,x in enumerate(X):
    if x[feature_number] ==1:
      left_indices.append(i)
    else:
      right_indices.append(i)

  return left_indices, right_indices
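
The same split can also be written without an explicit loop. The sketch below is one possible vectorized variant that uses a boolean mask; the name `split_node_vectorized` and the small `X_demo` array are hypothetical and only serve as an illustration.

```python
import numpy as np

def split_node_vectorized(X, feature_number):
    # Hypothetical helper: mirrors split_node, but uses a boolean mask
    X = np.asarray(X)
    mask = X[:, feature_number] == 1
    # flatnonzero returns the positions where the mask is True
    left_indices = np.flatnonzero(mask).tolist()
    right_indices = np.flatnonzero(~mask).tolist()
    return left_indices, right_indices

# Small made-up example: 4 samples, 2 binary features
X_demo = np.array([[1, 0],
                   [0, 1],
                   [1, 1],
                   [0, 0]])

print(split_node_vectorized(X_demo, 0))  # ([0, 2], [1, 3])
```

For small datasets like the one in this notebook, the loop version is just as good; the mask version mainly helps when `X` has many rows.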

### The next thing we do is calculate the `weighted entropy` of a split.
### To do this (in our case, where a node is split into 2 children), we multiply 2 values for each child node and then add up the results for both children:
* The weight of the child: the number of samples in the child divided by the number of samples in the parent node
* The entropy of the child, evaluated at the fraction of positive samples (`y[index]=1`) among all the samples in that child
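
Written as a formula, the two bullet points above combine as follows. The symbols are my own notation and do not appear in the code: $n$ is a number of samples, $p$ is the fraction of positive samples in a node, and $H$ is the entropy function.

```latex
H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

H_{\text{weighted}} = \frac{n_{\text{left}}}{n_{\text{parent}}} \, H(p_{\text{left}}) + \frac{n_{\text{right}}}{n_{\text{parent}}} \, H(p_{\text{right}})
```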

In [4]:
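def weighted_entropy(X,y,left_indices,right_indices):
  # Fraction of the parent's samples that falls into each child
  w_left=len(left_indices)/len(X)
  w_right=len(right_indices)/len(X)
  # Fraction of positive samples (y=1) in each child.
  # An empty child has weight 0, so its proportion is set to 0 to avoid dividing by zero.
  p_left=sum(y[left_indices])/len(left_indices) if len(left_indices)>0 else 0
  p_right=sum(y[right_indices])/len(right_indices) if len(right_indices)>0 else 0

  return (w_left)*(entropy(p_left))+(w_right)*(entropy(p_right))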

## This split at any node can be performed on various features, i.e. in multiple ways.
## How do we choose which one is the best?
## We calculate the 'information gain' that results from each possible split
## This is calculated by
* Calculating the entropy of the `parent node`. Its weight is 1 since it holds all the samples, so we simply evaluate the entropy at the number of positive samples divided by the total number of samples in the node
* Calculating the `weighted entropy` of the child nodes obtained from a particular split, using the method above

### Then, we subtract the second value from the first: information gain = entropy of the parent - weighted entropy of the children
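
In formula form, the subtraction can be sketched as below. The notation is my own: $H$ is the entropy function, $p$ is the fraction of positive samples in a node, and $w$ is the fraction of the parent's samples that ends up in a child.

```latex
\text{IG} = H(p_{\text{parent}}) - \left[ w_{\text{left}} \, H(p_{\text{left}}) + w_{\text{right}} \, H(p_{\text{right}}) \right]
```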

In [5]:
def info_gain(X,y,left_indices,right_indices):
  info_parent=(1)*entropy((sum(y))/(len(y)))

  info_children=weighted_entropy(X,y,left_indices,right_indices)

  info_gain=info_parent-info_children

  return info_gain

In [9]:
X_train = np.array([[1, 1, 1],
                    [0, 0, 1],
                    [0, 1, 0],
                    [1, 0, 1],
                    [1, 1, 1],
                    [1, 1, 0],
                    [0, 0, 0],
                    [1, 1, 0],
                    [0, 1, 0],
                    [0, 1, 0]])

y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

left_indices,right_indices = split_node(X_train, 0)

info_gain(X_train, y_train, left_indices, right_indices)

0.2780719051126377
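
To see where this number comes from, the sketch below redoes the calculation by hand for the split on column 0. The proportions are read directly off `X_train` and `y_train`; the short name `H` is a hypothetical stand-in for the entropy function so that the block runs on its own.

```python
import numpy as np

def H(p):
    # Binary entropy in bits (hypothetical short name used only for this check)
    if p == 0 or p == 1:
        return 0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Splitting on column 0 of X_train:
#   left child  (feature = 1): labels [1, 0, 1, 1, 1] -> 4 of 5 positive
#   right child (feature = 0): labels [1, 0, 0, 0, 0] -> 1 of 5 positive
# The parent holds all 10 samples, 5 of which are positive.
p_parent, p_left, p_right = 5 / 10, 4 / 5, 1 / 5
w_left, w_right = 5 / 10, 5 / 10

gain = H(p_parent) - (w_left * H(p_left) + w_right * H(p_right))
print(round(gain, 4))  # 0.2781
```

The parent is an even mix of both classes, so its entropy is 1. Both children are fairly pure (entropy of about 0.72 each), which is why the split gains roughly 0.28 bits.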

## Naturally, different splits produce different child nodes, and so each possible split has its own information gain value.

## The best split is the one with the **highest information gain value**

In [10]:
features=(["Feature 1","Feature 2","Feature 3"])
for i,feature_name in enumerate(features):
  left_indices,right_indices=split_node(X_train,i)
  information_gain=info_gain(X_train,y_train,left_indices,right_indices)

  print(f"The info gain when split by feature number {i} i.e {features[i]} is:{information_gain:0.04f}")

The info gain when split by feature number 0 i.e Feature 1 is:0.2781
The info gain when split by feature number 1 i.e Feature 2 is:0.0349
The info gain when split by feature number 2 i.e Feature 3 is:0.1245


## In the above output, we see that the 3 features that we could split this node on result in different information gain values.

## The best split is the one with the highest information gain, i.e. here, Feature 1 (column index 0) with a gain of 0.2781
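
Picking the winner can be automated. The sketch below is one possible way to do it, assuming binary features and binary labels as in this notebook. The helpers `info_gain_for_feature` and `best_split` are hypothetical names that are not part of the code above, and the data is the same `X_train` and `y_train` as before, restated so that the block runs on its own.

```python
import numpy as np

def entropy(p):
    # Binary entropy in bits; a pure node has entropy 0
    if p == 0 or p == 1:
        return 0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def info_gain_for_feature(X, y, feature_number):
    # Hypothetical helper: splits on one feature and returns the information gain
    mask = X[:, feature_number] == 1
    gain = entropy(y.mean())  # entropy of the parent node
    for child in (y[mask], y[~mask]):
        if len(child) > 0:  # an empty child contributes nothing
            gain -= len(child) / len(y) * entropy(child.mean())
    return gain

def best_split(X, y):
    # Hypothetical helper: returns the index of the feature with the highest gain
    gains = [info_gain_for_feature(X, y, j) for j in range(X.shape[1])]
    return int(np.argmax(gains)), gains

X_train = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 1],
                    [1, 1, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0]])
y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

best_feature, gains = best_split(X_train, y_train)
print(best_feature)  # 0, i.e. "Feature 1"
```

Note that `np.argmax` returns the first index in case of a tie, so with equal gains this sketch would prefer the feature with the lower column index.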