<h1 align="center"> Mushroom Dataset Classification with ML</h1>
<center><img src="https://drugsandbadideas.com/images/articles/intros/sacred-magic-mushroom-tea.jpg" width="60%" ></center>


**In this notebook, we will try several approaches to the mushroom data, and as soon as one of them works well, we have succeeded.**
> **I have worked with this dataset before using deep learning, so this time I will try several classical machine learning algorithms instead.
Let's continue.**

In [None]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import missingno as msno
from pandas_profiling import ProfileReport
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.ensemble import RandomForestClassifier

# Read our data

In [None]:
data=pd.read_csv("../input/mushroom-classification-updated-dataset/mushroomsupdated.csv")
data.head()

In [None]:
data.shape

In [None]:
list(data.columns)

In [None]:
data.isnull().sum()

In [None]:
for i in data.columns:
  print(i, data[i].unique())

The output above prompts a closer look at a few columns before going further.

In [None]:
# The results contain a value "c" whose meaning is unclear to me.
data['cap-shape'].value_counts()    

In [None]:
# What does "non" mean? A missing or empty value, or that the mushroom has no ring? I could not find a reliable source.
data["ring-number"].value_counts()

In [None]:
# What does the question mark indicate? It appears in a sizeable share of the rows, so it clearly matters.
data["stalk-root"].value_counts()

In [None]:
# same thing 
data["ring-type"].value_counts()
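
If the `?` entries turn out to be placeholders for missing values (an assumption; nothing in the data confirms it), one option is to convert them to `NaN` so that `isnull()` can count them. The sketch below uses a small made-up frame named `toy` rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "stalk-root": ["b", "?", "e", "?", "c"],
    "ring-type": ["p", "e", "?", "p", "p"],
})

# Treat "?" as missing (assumption) so pandas can see it.
toy_clean = toy.replace("?", np.nan)
missing_counts = toy_clean.isnull().sum()
print(missing_counts)
```

Whether to drop, impute, or keep `?` as its own category afterwards depends on what the value really means.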

> **To get more details about the data, we generate a profiling report with `ProfileReport`.**

In [None]:
profile=ProfileReport(data, title="Mushroom Dataset Report")
profile

We keep the data as it is and move on to transforming it into a numeric form that the algorithms can accept.

In [None]:
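# Encode every column, including the target, as integer labels.
# The encoder is refit on each column in turn, so object_1 only keeps the mapping of the last column.
object_1=LabelEncoder()
for i in data.columns:
    data[i] = object_1.fit_transform(data[i])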
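
Because a single encoder is reused for every column, the mapping from integers back to the original letters is lost for all but the last column. As a minimal sketch, assuming you want to decode values later, one encoder per column can be kept in a dictionary. The `toy` frame and the `encoders` dictionary are hypothetical names used only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "odor": ["f", "n", "a", "f"],
})

encoders = {}          # hypothetical helper: column name -> fitted encoder
encoded = toy.copy()
for col in toy.columns:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# LabelEncoder sorts the classes, so "e" maps to 0 and "p" maps to 1 here.
print(encoded)
print(encoders["class"].inverse_transform([0, 1]))
```

With the encoders stored this way, `inverse_transform` recovers the original category for any column.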

In [None]:
data.head()

In [None]:
for i in data.columns:
  print(i, data[i].unique())

In [None]:
plt.hist(data["class"])
plt.show() 

In [None]:
msno.matrix(data)

In [None]:
# Drop 'veil-type', which has no connection with the rest of the data.
data.drop(['veil-type'], axis=1, inplace=True)
# Draw a correlation heat map.
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(data.corr(), annot=True, linewidths=.5, ax=ax)
plt.show()


In [None]:
corr_1=data.corr()
most_effect=corr_1.nlargest(10,"class")
most_effect
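
Note that `nlargest` ranks by signed correlation, so a feature that is strongly negatively correlated with `class` would not appear near the top. A possible alternative, sketched here on a made-up frame (`toy` and its columns are hypothetical), is to rank by absolute correlation.

```python
import pandas as pd

# Toy stand-in for the encoded data; the values are made up for illustration.
toy = pd.DataFrame({
    "class": [0, 0, 1, 1, 0, 1],
    "a": [0, 0, 1, 1, 0, 1],   # moves with class
    "b": [1, 1, 0, 0, 1, 0],   # moves against class
    "c": [0, 1, 0, 1, 1, 0],   # only weakly related
})

# Correlation of every feature with the target, excluding the target itself.
corr_with_class = toy.corr()["class"].drop("class")

# Rank by strength of the relationship, ignoring its direction.
ranked = corr_with_class.abs().sort_values(ascending=False)
print(ranked)
```

Keep in mind that correlation on label-encoded categories depends on the arbitrary integer order, so this ranking is only a rough guide.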

In [None]:
sns.scatterplot(x='cap-shape', y='cap-surface', data=data)

In [None]:
data.hist(figsize=(18,10))
plt.show()

In [None]:
most_effect.hist(figsize=(18,10))
plt.show()

In [None]:
# distplot is deprecated in recent seaborn releases; histplot with kde=True gives a similar plot.
sns.histplot(data['cap-shape'], color='b', kde=True)

> **The charts above confirm what we saw in the profiling report.**

# Now let's start with the data partitioning process

In [None]:
target= data["class"].values
feature= data.drop(["class"],axis=1)

In [None]:
scaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = scaler.fit_transform(feature)

In [None]:
x_train, x_test, y_train, y_test = train_test_split(X,target, test_size=0.2, random_state=11)

In [None]:
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
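
One thing to be aware of: the scaler above is fitted on the full feature matrix before the split, so statistics from the test rows leak into training. A minimal sketch of an alternative, assuming a scikit-learn `Pipeline` is acceptable here, is to split first and let the pipeline fit the scaler on the training rows only. The synthetic data from `make_classification` and the names `X_demo`, `Xtr`, `pipe` are stand-ins, not the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom features and target.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=11)

# Split first, so the test rows stay unseen during fitting.
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=11)

# The pipeline fits the scaler on Xtr only, then reuses it for Xte.
pipe = make_pipeline(StandardScaler(), SVC(C=0.5, kernel="rbf", gamma="auto"))
pipe.fit(Xtr, ytr)

test_score = pipe.score(Xte, yte)
print("Test accuracy:", test_score)
```

For this dataset the difference may be small, but the pattern avoids the leak by construction.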

In [None]:
SVCModel=SVC(C=0.5, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True,
                probability=False, tol=0.001, cache_size=101, class_weight=None,verbose=False,
                max_iter=-1, decision_function_shape='ovr', random_state=0)
SVCModel.fit(x_train, y_train)


In [None]:
#Calculating Details
print('SVCModel Train Score is : ' , SVCModel.score(x_train, y_train))
print('SVCModel Test Score is : ' , SVCModel.score(x_test, y_test))
#Calculating Prediction
y_pred = SVCModel.predict(x_test)
print('Predicted Value for SVCModel is : ' , y_pred[:20])
print("target values y_test........ is : " , y_test[:20])

In [None]:
#Calculating Confusion Matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

# Draw the confusion matrix with the count shown in each cell
sns.heatmap(CM, annot=True, fmt='d')
plt.show()
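
Accuracy alone hides which kind of mistake a model makes, and for mushrooms a poisonous sample predicted as edible is the costly one. As a sketch, precision and recall can be read directly from the confusion matrix. The labels below are made up, and treating `1` as the poisonous class is an assumption about how the encoder ordered the classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration; 1 is assumed to be the positive class.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_hat)
# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = cm.ravel()

precision = precision_score(y_true, y_hat)  # tp / (tp + fp)
recall = recall_score(y_true, y_hat)        # tp / (tp + fn)
print(cm)
print("precision:", precision, "recall:", recall)
```

A low recall on the poisonous class would mean dangerous mushrooms are slipping through, even if overall accuracy looks fine.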

In [None]:
DecisionTreeClassifierModel=DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=5,min_samples_split=5,
                                    min_samples_leaf=3,max_features=None,
                                    random_state=0, max_leaf_nodes=3)
DecisionTreeClassifierModel.fit(x_train, y_train)

In [None]:
#Calculating Details
print('DecisionTreeClassifierModel Train Score is : ' , DecisionTreeClassifierModel.score(x_train, y_train))
print('DecisionTreeClassifierModel Test Score is : ' , DecisionTreeClassifierModel.score(x_test, y_test))
print('DecisionTreeClassifierModel Classes are : ' , DecisionTreeClassifierModel.classes_)
print('DecisionTreeClassifierModel feature importances are : ' , DecisionTreeClassifierModel.feature_importances_)


In [None]:
#Calculating Prediction
y_pred = DecisionTreeClassifierModel.predict(x_test)
y_pred_prob = DecisionTreeClassifierModel.predict_proba(x_test)
print('Predicted Value for DecisionTreeClassifierModel is : ' , y_pred[:10])
print("real values of y_test                           is : " , y_test[:10] )
print('Prediction Probabilities Value for DecisionTreeClassifierModel is : ' , y_pred_prob[:10])


In [None]:
#Calculating Confusion Matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

# Draw the confusion matrix with the count shown in each cell
sns.heatmap(CM, annot=True, fmt='d')
plt.show()
 

**The results from the decision tree were not good, most likely because `max_leaf_nodes=3` limits the tree to three leaves.**
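
To see how much `max_leaf_nodes` constrains a tree, the sketch below fits two trees on synthetic data, one with the cap of three leaves and one without, and compares their leaf counts. The data from `make_classification` and the names `small` and `large` are stand-ins for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; not the mushroom dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Same depth limit, but one tree is capped at three leaves.
small = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=3, random_state=0)
large = DecisionTreeClassifier(max_depth=5, random_state=0)
small.fit(X_demo, y_demo)
large.fit(X_demo, y_demo)

print("capped leaves:  ", small.get_n_leaves())
print("uncapped leaves:", large.get_n_leaves())
print("capped train score:  ", small.score(X_demo, y_demo))
print("uncapped train score:", large.score(X_demo, y_demo))
```

With only three leaves the tree can make at most two splits, so relaxing `max_leaf_nodes` is a natural first thing to try.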

In [None]:
# max_features='auto' and min_impurity_split were removed in recent scikit-learn releases;
# 'sqrt' matches the old 'auto' behaviour for classifiers.
RandomForestClassifierModel=RandomForestClassifier(n_estimators=30, criterion='gini', max_depth=7,
                                min_samples_split=3, min_samples_leaf=1,min_weight_fraction_leaf=0.0,
                                max_features='sqrt',max_leaf_nodes=7,min_impurity_decrease=0.0,
                                bootstrap=True,oob_score=False, n_jobs=1,
                                random_state=0)
RandomForestClassifierModel.fit(x_train, y_train)


In [None]:
#Calculating Details
print('RandomForestClassifierModel Train Score is : ' , RandomForestClassifierModel.score(x_train, y_train))
print('RandomForestClassifierModel Test Score is : ' , RandomForestClassifierModel.score(x_test, y_test))
print('RandomForestClassifierModel features importances are : ' , RandomForestClassifierModel.feature_importances_)
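
The raw importance array is hard to read without the column names. A small sketch, assuming the names are available (in this notebook `feature.columns` would supply them), pairs each importance with its feature and sorts them. The synthetic data and the `names` list are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical feature names.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
names = ["f0", "f1", "f2", "f3"]

rf = RandomForestClassifier(n_estimators=30, max_depth=7, random_state=0)
rf.fit(X_demo, y_demo)

# Pair each importance with its feature name and sort, largest first.
importances = pd.Series(rf.feature_importances_, index=names)
importances = importances.sort_values(ascending=False)
print(importances)
```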


In [None]:
#Calculating Prediction
y_pred = RandomForestClassifierModel.predict(x_test)
y_pred_prob = RandomForestClassifierModel.predict_proba(x_test)
print('Predicted Value for RandomForestClassifierModel is : ' , y_pred[:10])
print("real values of target column y_test             is : " , y_test[:10])
print('Prediction Probabilities Value for RandomForestClassifierModel is : ' , y_pred_prob[:10])


> **With the random forest algorithm, we obtained very good accuracy with no sign of overfitting.**

In [None]:
#Calculating Confusion Matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

# Draw the confusion matrix with the count shown in each cell
sns.heatmap(CM, annot=True, fmt='d')
plt.show()
 

We have now tried several machine learning algorithms; some performed well and others only moderately.
> Of course, the results could be improved further, but since at least one algorithm reached a high score, that is good enough here.
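
One common way to look for better settings is a cross-validated grid search. The sketch below is only an illustration on synthetic data; the parameter grid is an arbitrary assumption, not a recommendation for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; not the mushroom dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Small, arbitrary grid chosen only to keep the example fast.
param_grid = {"max_depth": [3, 7], "n_estimators": [10, 30]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

The same pattern applies to `SVC` or `DecisionTreeClassifier` by swapping the estimator and the grid.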

# Thank you for your time. If you have any corrections or suggestions, let me know. Bye.