# Decision Tree Algorithm
A decision tree is a structure made up of a root node (the topmost node), branches, and leaf nodes. Each internal node represents a feature of the dataset, each branch represents a decision rule, and each leaf node represents an outcome and holds a class label.
To predict the class of a record, the algorithm starts at the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the result, follows the matching branch to the next node. The comparison is repeated at each node until a leaf is reached, and that leaf's class label is the prediction.

The example here is a classification tree: the target variable is categorical, and the tree is used to identify the class into which the target is most likely to fall.
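
The root-to-leaf walk described above can be sketched in plain Python. This is only an illustration: `toy_tree`, its thresholds, and `predict_one` are hypothetical and hand-written, not the tree that scikit-learn learns later in this notebook.

```python
# A hand-written toy tree; thresholds and labels are illustrative only.
# Internal nodes test one feature, leaves hold a class label.
toy_tree = {
    "feature": "foodid", "threshold": 1.5,
    "left": {"label": "Apple"},
    "right": {
        "feature": "Age", "threshold": 25.0,
        "left": {"label": "Grapefruit"},
        "right": {"label": "Oats"},
    },
}

def predict_one(node, record):
    """Walk from the root to a leaf, following one branch per comparison."""
    while "label" not in node:
        if record[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

print(predict_one(toy_tree, {"Age": 16, "foodid": 1}))  # Apple
print(predict_one(toy_tree, {"Age": 38, "foodid": 4}))  # Oats
```

Each loop iteration corresponds to one internal node: a single feature is compared with a threshold, and the result selects the branch to follow.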



In [1]:
import pandas as pd

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Read the 'whole' sheet of the Excel workbook into a DataFrame
df = pd.read_excel('resources/datasets/data_farmer.xlsx', sheet_name='whole')

print("Column headings:")
print(df.columns)
df

Column headings:
Index(['Name', 'Gender', 'Age', 'Most common search food', 'Date Bought',
       'Total Price', 'System will recommed'],
      dtype='object')


| | Name | Gender | Age | Most common search food | Date Bought | Total Price | System will recommed |
|---|---|---|---|---|---|---|---|
| 0 | William | M | 16 | Fruits | 2021-03-01 | 200 | Apple |
| 1 | John | M | 17 | Fruits | 2021-03-01 | 150 | Potatos |
| 2 | Jack | M | 19 | Vitamin C Fruit | 2021-03-01 | 220 | Cantaloupe |
| 3 | Mary | F | 21 | Vitamin C Fruit | 2021-03-01 | 300 | Grapefruit |
| 4 | Sam | F | 22 | Fruits | 2021-03-01 | 250 | Avocado |
| 5 | Alex | F | 25 | Fruits | 2021-03-01 | 110 | Watermelon |
| 6 | Nick | M | 25 | Fruits | 2021-04-01 | 450 | Watermelon |
| 7 | Sharon | F | 29 | Vegetables | 2021-04-01 | 300 | Ginger |
| 8 | Micheal | M | 30 | High carbohydrates food | 2021-04-01 | 450 | Sweet Potatoes |
| 9 | Randy | M | 38 | High carbohydrates food | 2021-04-01 | 302 | Oats |


In [26]:
# Map each food category to a code; anything unrecognised falls back to '5'.
# The codes are returned as strings, so `foodid` holds text rather than numbers.
FOOD_CODES = {
    'Fruits': '1',
    'Vitamin C Fruit': '2',
    'Vegetables': '3',
    'High carbohydrates food': '4',
}

def getAnalysis(x):
    return FOOD_CODES.get(x, '5')

df['foodid'] = df['Most common search food'].apply(getAnalysis)
df

| | Name | Gender | Age | Most common search food | Date Bought | Total Price | System will recommed | foodid |
|---|---|---|---|---|---|---|---|---|
| 0 | William | M | 16 | Fruits | 2021-03-01 | 200 | Apple | 1 |
| 1 | John | M | 17 | Fruits | 2021-03-01 | 150 | Potatos | 1 |
| 2 | Jack | M | 19 | Vitamin C Fruit | 2021-03-01 | 220 | Cantaloupe | 2 |
| 3 | Mary | F | 21 | Vitamin C Fruit | 2021-03-01 | 300 | Grapefruit | 2 |
| 4 | Sam | F | 22 | Fruits | 2021-03-01 | 250 | Avocado | 1 |
| 5 | Alex | F | 25 | Fruits | 2021-03-01 | 110 | Watermelon | 1 |
| 6 | Nick | M | 25 | Fruits | 2021-04-01 | 450 | Watermelon | 1 |
| 7 | Sharon | F | 29 | Vegetables | 2021-04-01 | 300 | Ginger | 3 |
| 8 | Micheal | M | 30 | High carbohydrates food | 2021-04-01 | 450 | Sweet Potatoes | 4 |
| 9 | Randy | M | 38 | High carbohydrates food | 2021-04-01 | 302 | Oats | 4 |


# Exploratory analysis

In [27]:
# checking for missing values
df.isnull().sum()

Name                       0
Gender                     0
Age                        0
Most common search food    0
Date Bought                0
Total Price                0
System will recommed       0
foodid                     0
dtype: int64

In [28]:
# Total number of users
len(df) 

20

# Descriptive analysis 

In [29]:
# descriptive analysis
df.describe()

| | Age | Total Price |
|---|---|---|
| count | 20.0 | 20.0 |
| mean | 36.65 | 224.25 |
| std | 14.375692 | 112.930288 |
| min | 16.0 | 70.0 |
| 25% | 24.25 | 140.0 |
| 50% | 38.5 | 210.0 |
| 75% | 48.5 | 300.0 |
| max | 60.0 | 450.0 |


# Separating input variables (x) and target variable (output)

In [31]:
# Keep only the two model inputs; every other column is left out
x = df[['Age', 'foodid']]
output = df['System will recommed']
x.head()

| | Age | foodid |
|---|---|---|
| 0 | 16 | 1 |
| 1 | 17 | 1 |
| 2 | 19 | 2 |
| 3 | 21 | 2 |
| 4 | 22 | 1 |


In [32]:
output.head()

0         Apple
1       Potatos
2    Cantaloupe
3    Grapefruit
4       Avocado
Name: System will recommed, dtype: object

# Train and test split, building the model, predicting and getting the accuracy


In [20]:
# Fit the model on all rows, then predict for two new [Age, foodid] pairs
model = DecisionTreeClassifier()
model.fit(x, output)
predictions = model.predict([[35, 4], [47, 5]])
predictions

array(['Oats', 'Eggs'], dtype=object)
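
To see which decision rules a fitted tree actually applies, scikit-learn's `export_text` prints the tree as indented text. The sketch below is a minimal illustration on four hypothetical stand-in rows (`X_demo`, `y_demo`), not the notebook's spreadsheet, so the rules it prints will differ from those of `model` above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in rows in [Age, foodid] order; not the notebook's data.
X_demo = [[16, 1], [19, 2], [29, 3], [38, 4]]
y_demo = ["Apple", "Cantaloupe", "Ginger", "Oats"]

# random_state is fixed only so the printed rules are repeatable.
demo_model = DecisionTreeClassifier(random_state=0)
demo_model.fit(X_demo, y_demo)

# Each indented line is a threshold test on one feature or a leaf's class.
rules = export_text(demo_model, feature_names=["Age", "foodid"])
print(rules)
```

Assuming the same feature order, the call should also work on the notebook's model, for example `export_text(model, feature_names=list(x.columns))`.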

In [10]:
model = DecisionTreeClassifier()
# Hold out 20% of the rows for testing and train on the remaining 80%.
# No random_state is set, so the split and the accuracy change between runs.
x_train, x_test, output_train, output_test = train_test_split(x, output, test_size=0.2)
model.fit(x_train, output_train)
predictions = model.predict(x_test)
accuracy = accuracy_score(output_test, predictions)
accuracy

0.25
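
With 20 rows and `test_size=0.2`, the test set holds only 4 rows, so each test row moves the score by 0.25, and a score of 0.25 means one of the four predictions matched. The sketch below reproduces that arithmetic in plain Python. The `accuracy` helper and the label lists are hypothetical stand-ins that mirror what `accuracy_score` computes by default (the fraction of exact matches); they are not the notebook's actual test rows.

```python
import math

n_rows = 20
test_size = 0.2
n_test = math.ceil(n_rows * test_size)  # number of rows held out for testing
print(n_test)  # 4

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Hypothetical labels for a 4-row test set: exactly one prediction is right.
y_true = ["Apple", "Oats", "Ginger", "Watermelon"]
y_pred = ["Apple", "Ginger", "Oats", "Avocado"]
print(accuracy(y_true, y_pred))  # 0.25
```

Because the score can only take the values 0, 0.25, 0.5, 0.75, or 1 on a test set this small, a single run says little about how well the model generalises.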