# **Data Science and Business Analytics (GRIP May'21)**
## **Task 3 : Prediction using Decision Tree Algorithm**
### **Author : Jeet Sahoo**
#### Objective: Create the Decision Tree classifier and visualize it graphically. 


## **Technical Stack : scikit-learn, NumPy, Seaborn, pandas, Matplotlib**

#### Importing Required Libraries

In [None]:
# Importing the required libraries
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import metrics  # accuracy_score
from sklearn import tree  # plot_tree
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

#### Exploring and Understanding Data

In [None]:
# Loading data from file
link='Iris.csv'
iris_df = pd.read_csv(link)
print("Data load successful")

iris_df.head(20) #To see first 20 rows of data

In [None]:
# Understanding the data
iris_df.describe() #Data Description

In [None]:
iris_df.info() #Info of Dataset

In [None]:
iris_df.values #Values of Dataset

In [None]:
iris_df.columns #Columns of Dataset

In [None]:
iris_df.shape #To find the shape of data

In [None]:
iris_df.isnull().sum() #Checking for null values in Dataset

In [None]:
iris_df.duplicated().sum() #Checking for duplicate entries in Dataset

In [None]:
iris_df.corr(numeric_only=True) #Pairwise correlation of the numeric columns (the text column Species is skipped)

#### Visualizing Data

In [None]:
sns.pairplot(iris_df)

In [None]:
iris_df.hist(figsize=(15,15))

In [None]:
sns.heatmap(iris_df.corr(numeric_only=True)) #Heatmap of the numeric-column correlations

In [None]:
sns.pairplot(iris_df.dropna(),hue='Species')

In [None]:
# Scatter plot of data based on Sepal Length and Width features
sns.FacetGrid(iris_df,hue='Species').map(plt.scatter,'SepalLengthCm','SepalWidthCm').add_legend()
plt.show()

In [None]:
# Scatter plot of data based on Petal Length and Width features
sns.FacetGrid(iris_df,hue='Species').map(plt.scatter,'PetalLengthCm','PetalWidthCm').add_legend()
plt.show()

#### Dataset Splitting

In [None]:
X=iris_df[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
Y=iris_df['Species']
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=0)
print('Training split-',X_train.shape)
print('Testing split-',X_test.shape)
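
A plain random split does not guarantee that every species is equally represented in the test set. The following is a minimal sketch of a stratified split, assuming scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for `Iris.csv`; the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr` and `y_te` are illustrative and not part of the notebook above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print('Training split-', X_tr.shape)
print('Testing split-', X_te.shape)
print('Test samples per class-', np.bincount(y_te))
```

Whether stratification matters depends on the data; for a small, balanced dataset it mainly makes the test-set composition predictable.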

#### Fitting Model

In [None]:
# Model Training
iris_df_model=DecisionTreeClassifier()

In [None]:
iris_df_model.fit(X_train,Y_train)
print("Training Complete.")
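
`DecisionTreeClassifier()` with default settings grows the tree until every leaf is pure and breaks ties between equally good splits at random, so repeated runs can produce slightly different trees. The sketch below shows one possible way to make training repeatable and limit tree size; it assumes the bundled `load_iris` data in place of `Iris.csv`, and `max_depth=3` is an illustrative value rather than a tuned one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# random_state fixes the tie-breaking between equally good splits;
# max_depth=3 is an illustrative cap on tree depth, not a tuned value
demo_model = DecisionTreeClassifier(max_depth=3, random_state=0)
demo_model.fit(X_tr, y_tr)

print('Tree depth-', demo_model.get_depth())
print('Number of leaves-', demo_model.get_n_leaves())
```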

In [None]:
Y_Pred=iris_df_model.predict(X_test)
Y_Pred

#### Comparing Actual and Predicted Flower Classification

In [None]:
df = pd.DataFrame({'Actual': Y_test, 'Predicted': Y_Pred}) 
df

#### Decision Tree Visualization

In [None]:
featName=['sepal length (in cm)','sepal width (in cm)','petal length (in cm)','petal width (in cm)']
# class_names must follow the sorted order of the labels, which classes_ already provides
clsName=list(iris_df_model.classes_)

fig, axes = plt.subplots(nrows = 1, ncols = 1, figsize = (10,6), dpi = 350)

tree.plot_tree(iris_df_model, feature_names = featName, class_names = clsName, filled = True);
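
The plotted tree can also be read as plain text rules, which is handy when the figure is too dense. This is a small sketch using `sklearn.tree.export_text`; it assumes the bundled `load_iris` data in place of `Iris.csv` and trains a shallow illustrative model (`demo_model`, `max_depth=2`) so the output stays short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the bundled Iris data stands in for Iris.csv
iris = load_iris()

# A shallow tree keeps the printed rules short (illustrative setting)
demo_model = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_model.fit(iris.data, iris.target)

# One line per node: the split condition, or the predicted class at a leaf
rules = export_text(demo_model, feature_names=list(iris.feature_names))
print(rules)
```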

#### Evaluating Model

In [None]:
# Model Accuracy
evaluate_model=metrics.accuracy_score(Y_test,Y_Pred)
print('ACCURACY:',evaluate_model)
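
A single train/test split gives one accuracy figure that depends on which rows happened to land in the test set. As a hedged complement, the sketch below estimates accuracy with 5-fold cross-validation; it assumes the bundled `load_iris` data in place of `Iris.csv`, and `cv=5` is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)

# Each of the 5 folds is used once as the held-out test set
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_demo, y_demo, cv=5
)

print('Fold accuracies-', scores)
print('Mean accuracy-', scores.mean())
```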

#### Confusion Matrix

In [None]:
cm=confusion_matrix(Y_test,Y_Pred)
cm
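
The raw array returned by `confusion_matrix` has actual classes as rows and predicted classes as columns, but carries no labels. One way to make it easier to read is to wrap it in a labelled DataFrame, sketched here under the assumption that the bundled `load_iris` data stands in for `Iris.csv`; `demo_model`, `cm_demo` and `cm_table` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for Iris.csv
iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
demo_model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

# Rows are actual classes, columns are predicted classes
cm_demo = confusion_matrix(y_te, y_hat, labels=[0, 1, 2])
names = list(iris.target_names)
cm_table = pd.DataFrame(
    cm_demo,
    index=[f'actual {n}' for n in names],
    columns=[f'predicted {n}' for n in names],
)
print(cm_table)
```

Off-diagonal cells count the misclassified flowers, so a perfect classifier would leave them all at zero.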

In [None]:
plt.figure(figsize=(18,18))
tree.plot_tree(iris_df_model,filled=True,rounded=True,proportion=True,node_ids=True)
plt.show()

### Conclusion

I successfully carried out prediction using the Decision Tree algorithm, visualized the trained tree, and evaluated the model's accuracy score.

#### Thank You