## Prediction using Decision Tree Algorithm
### By Rutwik V Jangam
### GRIPDEC20
Dataset : https://bit.ly/3kXTdox

### Importing and Understanding Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
df=pd.read_csv("../input/iris-data/Iris.csv")

In [None]:
df.head()

- We drop the `Id` column: it is a unique row identifier and carries no predictive information.

In [None]:
df.drop('Id', axis=1, inplace=True)

The dependent variable, Species, is categorical, so we encode it as integers for simplicity. The label encoder assigns codes starting from 0, in alphabetical order of the class names:
- 0 for Iris-setosa
- 1 for Iris-versicolor
- 2 for Iris-virginica
- For this we use the label encoder.
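
As a minimal sketch of what the encoder does, the snippet below fits a `LabelEncoder` on a small hand-written list of species names (illustrative values, not rows read from the CSV) and prints the resulting mapping. The codes start at 0 and follow the sorted order of the class names.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written sample labels, used only for illustration
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the original labels; the position of each label is its code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())

# inverse_transform recovers the original names from the integer codes
decoded = encoder.inverse_transform(codes).tolist()
```

Keeping the fitted encoder around is what later allows the integer predictions to be turned back into species names.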

In [None]:
from sklearn import preprocessing
df_categorical = df.select_dtypes(include=['object'])
le = preprocessing.LabelEncoder()
df_categorical = df_categorical.apply(le.fit_transform)
df_categorical.head()

- Merge the encoded variable back into the original data frame

In [None]:
df = df.drop(df_categorical.columns, axis=1)
df = pd.concat([df, df_categorical], axis=1)
df.head()

In [None]:
df.Species.value_counts()

- We now proceed to build a decision tree

In [None]:
df_copy=df.copy()
# Putting feature variable to X
X = df.drop('Species',axis=1)

# Putting response variable to y
y = df['Species']

In [None]:
from sklearn.tree import DecisionTreeClassifier
# Fitting the decision tree with default hyperparameters
dt_1 = DecisionTreeClassifier()
dt_1.fit(X, y)

#### Visualizing the Decision Tree

In [None]:
# !pip install dtreeviz

In [None]:
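# This import matches the dtreeviz 1.x API; dtreeviz 2.x exposes
# dtreeviz.model(...).view() instead
from dtreeviz.trees import dtreeviz

# Render the fitted tree with the original species names as class labels
viz = dtreeviz(dt_1, X, y,
               target_name="Species",
               feature_names=list(X.columns),
               class_names=list(le.classes_))

viz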
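
If `dtreeviz` is not installed, the learned rules can also be inspected as plain text. The sketch below is not part of the original notebook: it assumes scikit-learn's bundled copy of the Iris data (`load_iris`) in place of the CSV and uses `sklearn.tree.export_text`. The fixed `random_state` is only there to make the output repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv
iris = load_iris()

tree = DecisionTreeClassifier(random_state=0)
tree.fit(iris.data, iris.target)

# One line per split or leaf, indented by depth
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```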

- We can now feed new data to this classifier to predict its class. However, since the model was fitted on the entire dataset, there is no unseen data left to measure how well it generalizes.
- To check this, we build a second model after splitting the dataset into train and test sets.
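
The reason for holding out a test set can be sketched as follows: a fully grown tree can fit its training data almost perfectly, so accuracy measured on that same data says little about new samples. The sketch assumes scikit-learn's bundled Iris data (`load_iris`) rather than the CSV, and the `stratify` argument is an addition that keeps class proportions equal in both splits.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled Iris data instead of Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the three classes balanced across the two splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=99, stratify=y_demo
)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Accuracy on data the tree has seen vs. data it has not
train_acc = accuracy_score(y_tr, model.predict(X_tr))
test_acc = accuracy_score(y_te, model.predict(X_te))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

Only the test accuracy is a fair estimate of how the model behaves on flowers it has not seen.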

In [None]:
# Putting feature variable to X
X = df_copy.drop('Species',axis=1)

# Putting response variable to y
y = df_copy['Species']

In [None]:
# Importing train-test-split 
from sklearn.model_selection import train_test_split

In [None]:
# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state = 99)
X_train.head()

In [None]:
dt_2 = DecisionTreeClassifier()
dt_2.fit(X_train, y_train)

In [None]:
# Importing accuracy score from sklearn metrics
from sklearn.metrics import accuracy_score

# Making predictions
y_pred = dt_2.predict(X_test)

#### Accuracy and Classification

In [None]:
print(accuracy_score(y_test,y_pred))

- The accuracy on the held-out test set (30% of the data) is very good.

In [None]:
# classification metrics
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test, y_pred))

- The classification report shows that precision is very good as well.
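
To see where the precision values in the report come from, the sketch below builds a confusion matrix from a small set of made-up labels (not the notebook's actual predictions) and derives per-class precision and recall from it. Rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(true_labels, pred_labels)
print(cm)

# Diagonal entries are the correct predictions for each class
correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = correct / cm.sum(axis=1)     # divide by row totals (true counts)
print(precision, recall)
```

Applied to the notebook, `confusion_matrix(y_test, y_pred)` would give the same kind of table for the test split.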