- version 1: Initial
- version 2: Removing Pie chart for better visualisation.

#  Mushroom Classification
### Context
Although this dataset was originally contributed to the UCI Machine Learning repository nearly 30 years ago, mushroom hunting (otherwise known as "shrooming") is enjoying new peaks in popularity. Learn which features spell certain death and which are most palatable in this dataset of mushroom characteristics. And how certain can your model be?

### Content
This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy.

Time period: Donated to UCI ML 27 April 1987

### Inspiration
What types of machine learning models perform best on this dataset?

Which features are most indicative of a poisonous mushroom?

![](https://jb004.k12.sd.us/my%20website%20info/PICS/mushroom_diagram.jpg)

In [None]:
# Importing required modules for data preparation and analysis
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
sns.set()
%matplotlib inline

In [None]:
# Reading CSV File 
data = pd.read_csv('../input/mushrooms.csv')
data.head() 

In [None]:
# Number of samples in each class
data['class'].value_counts()

As the counts above show, the dataset has two classes of mushroom: **Edible**, denoted by **'e'**, and **Poisonous**, denoted by **'p'**.
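
Raw counts can be hard to compare at a glance, so it may help to look at class proportions as well. The snippet below is a minimal sketch on a hypothetical toy Series (`toy_class` is not part of the notebook); on the real data the same call would be applied to `data['class']`.

```python
import pandas as pd

# Hypothetical toy labels standing in for data['class']
# ('e' = edible, 'p' = poisonous)
toy_class = pd.Series(['e', 'p', 'e', 'e', 'p', 'e', 'p', 'e'], name='class')

# normalize=True returns the share of each class instead of the raw count
class_share = toy_class.value_counts(normalize=True)
print(class_share)
```

If the shares are close to each other, plain accuracy is a reasonable metric; a strongly skewed split would call for something like precision and recall.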

# Analysis With Odor of Mushroom

Odor is an important cue in mushroom identification, yet many people do not know how poisonous and edible mushrooms smell. The dataset encodes odor with the following single-letter codes:
1. almond=a
2. anise=l
3. creosote=c
4. fishy=y
5. foul=f
6. musty=m
7. none=n
8. pungent=p
9. spicy=s

In [None]:
# Updating Labels for odor
odor_dict = {"a": "Almond", "l": "Anise", "c": "Creosote", "y": "Fishy", "f": "Foul", "m": "Musty", "n": "None", "p": "Pungent", "s": "Spicy"} 
data['odor'] = data['odor'].apply(lambda x:odor_dict[x])

In [None]:
# Odor counts among poisonous mushrooms
Poisonous_Odor = data[data['class']=='p']['odor'].value_counts()

# Bar chart of odor counts for poisonous mushrooms
odor_pos = Poisonous_Odor.plot(kind='bar',figsize=(12,5),fontsize=12)
odor_pos.set_title('Poisonous Mushroom with their Odor',fontsize=14)
odor_pos.tick_params(labelrotation=0)

Among poisonous mushrooms, Foul is the most common odor.

In [None]:
# Odor counts among edible mushrooms
Edible_Odor = data[data['class']=='e']['odor'].value_counts()

# Bar chart of odor counts for edible mushrooms
odor_ed = Edible_Odor.plot(kind='bar',figsize=(12,5),fontsize=12)
odor_ed.set_title('Edible Mushroom with their Odor',fontsize=14)
odor_ed.tick_params(labelrotation=0)

Among edible mushrooms, having no odor (None) is far more common than any other odor.
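
The two bar charts can also be summarised in a single table, which makes it easier to see which odors occur in only one class. The following is a small sketch using `pd.crosstab` on a hypothetical toy frame (`toy` and its rows are made up for illustration); on the notebook's data the call would take `data['odor']` and `data['class']`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    'class': ['p', 'p', 'p', 'e', 'e', 'e', 'e', 'p'],
    'odor':  ['Foul', 'Foul', 'Pungent', 'None', 'None', 'Almond', 'Anise', 'None'],
})

# Rows are odors, columns are classes, cells are sample counts
odor_by_class = pd.crosstab(toy['odor'], toy['class'])
print(odor_by_class)
```

An odor whose row has a zero in one column appears in only one class within the sample.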

# Analysis with bruises:
Another feature to consider when identifying mushrooms is whether they bruise or bleed a specific color, as shown in the image below. Certain mushrooms change color when damaged or injured, so cutting into a mushroom and observing any color change can be very important when trying to determine what it is.

Please note that color change alone is one of the least reliable ways to identify a mushroom. There are always variations, and color changes may not be trustworthy depending on the age of the fungus. Use bruising as just another tool in your identification arsenal.

![](https://www.mushroom-appreciation.com/image-files/xlactarius_indigo-bruise.jpg.pagespeed.ic.yuZDbFyfq8.jpg)

There are many famous blue-bruising mushrooms. A common rule for boletes is that you shouldn't eat one that has a red pore surface and bruises blue. Because so many blue-bruising boletes are **poisonous**, it's best to avoid them altogether.

In [None]:
# Bruises feature:
# t: bruises (True)
# f: no bruises (False)

Poisonous_Bruises = data[data['class']== 'p']['bruises'].value_counts()
Edible_Bruises = data[data['class']== 'e']['bruises'].value_counts()

# Select columns in a fixed order so they match the legend labels below
Bruises = pd.DataFrame([Poisonous_Bruises,Edible_Bruises],
                       index=['Poisonous','Edible'])[['f','t']]
Bruises.plot(kind = 'barh',stacked = True,figsize=(14,5),fontsize=12)
plt.title("Bruises Mushrooms",fontsize=14)
plt.legend(labels=["f = False","t = True"],fontsize=12)
plt.show()

As the stacked bar plot above shows, the share of bruising mushrooms differs by class:
- Edible: most mushrooms bruise.
- Poisonous: most mushrooms do not bruise.
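
To put a number on these ratios, one option is to compute the fraction of bruising samples per class. This is a minimal sketch on a hypothetical toy frame (`toy` is invented for illustration); with the notebook's data, `data['bruises']` and `data['class']` would be used instead.

```python
import pandas as pd

# Hypothetical toy rows: 't' = bruises, 'f' = no bruises
toy = pd.DataFrame({
    'class':   ['e', 'e', 'e', 'e', 'p', 'p', 'p', 'p'],
    'bruises': ['t', 't', 't', 'f', 'f', 'f', 'f', 't'],
})

# The mean of a boolean Series is the fraction of True values,
# so this gives the share of bruising mushrooms within each class
bruise_rate = toy['bruises'].eq('t').groupby(toy['class']).mean()
print(bruise_rate)
```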

# Analysis on Habitat Feature: 
The dataset uses 7 habitat types:
- grasses=g
- leaves=l 
- meadows=m
- paths=p 
- urban=u 
- waste=w 
- woods=d

In [None]:
# Replacing habitat codes with readable names
habitat_dict = {"g": "Grasses","l": "Leaves","m": "Meadows", "p": "Paths", "u": "Urban", "w": "Waste", "d": "Woods"}
data['habitat'] = data['habitat'].apply(lambda x:habitat_dict[x])

In [None]:
# Habitat counts among edible mushrooms
Edible_habitat = data[data['class'] == 'e']['habitat'].value_counts()
habit_ed = Edible_habitat.plot(kind='bar',figsize=(12,5),fontsize=12)
habit_ed.set_title('Edible Mushroom with their Habitat',fontsize=14)
habit_ed.tick_params(labelrotation=0)

The habitat counts show that edible mushrooms are found most often in Woods and Grasses.

In [None]:
# Habitat counts among poisonous mushrooms
Poisonous_habitat = data[data['class'] == 'p']['habitat'].value_counts()
habit_pos = Poisonous_habitat.plot(kind='bar',figsize=(12,5),fontsize=12)
habit_pos.set_title('Poisonous Mushroom with their Habitat',fontsize=14)
habit_pos.tick_params(labelrotation=0)
plt.show()

## Data Preprocessing
The data is stored as characters (strings). To apply machine learning algorithms, we need to convert it into integers.
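
Integer codes are one way to do this. Because the features here are nominal (they have no natural order), one-hot encoding is a common alternative that avoids implying an ordering between categories, which can matter for distance-based and linear models. The sketch below uses `pd.get_dummies` on a hypothetical toy frame (`toy` is not part of the notebook) and is shown only for comparison, not as the approach this notebook follows.

```python
import pandas as pd

# Hypothetical toy features standing in for columns of the real dataset
toy = pd.DataFrame({
    'odor':    ['Foul', 'None', 'Almond', 'None'],
    'habitat': ['Woods', 'Grasses', 'Woods', 'Paths'],
})

# Each category becomes its own indicator column, e.g. 'odor_Foul'
one_hot = pd.get_dummies(toy, columns=['odor', 'habitat'])
print(one_hot.columns.tolist())
```

The trade-off is a wider table: every category of every feature adds a column.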

In [None]:
data.head(3)

In [None]:
# basic label encoding
from sklearn.preprocessing import LabelEncoder

df = data.copy()
label_encoder = LabelEncoder()
for column in df.columns:
    df[column] = label_encoder.fit_transform(df[column])
df.head()    

### Let us look at the correlation between features:

In [None]:
# Plotting the correlation matrix as a seaborn heatmap
plt.figure(figsize=(15,15))
sns.heatmap(df.corr(), annot=True, linewidths=.5)
plt.show()

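As the heatmap above shows, the veil-type feature takes only one value, so its correlation with every other feature is undefined and it carries no information for classification. The correlations among the remaining features are as shown above.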
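
A constant column like this can be detected programmatically and dropped before modelling. The following is a minimal sketch on a hypothetical toy frame (`toy`, `constant_cols` and `reduced` are illustrative names); applied to the notebook's data, the same check would run over `df.columns`.

```python
import pandas as pd

# Hypothetical toy frame in which 'veil-type' has a single value
toy = pd.DataFrame({
    'veil-type': ['p', 'p', 'p', 'p'],
    'odor':      ['Foul', 'None', 'Almond', 'None'],
    'bruises':   ['t', 'f', 't', 'f'],
})

# A column with exactly one distinct value cannot help separate classes
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
reduced = toy.drop(columns=constant_cols)
print(constant_cols)
```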

## Classification of Mushrooms Using Different Machine Learning Models

In [None]:
# Separating the labels and features
Label = df['class']
Features = df.drop(['class'],axis=1)

In [None]:
# Splitting data into training and testing sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(Features,Label,random_state = 125)

In [None]:
# Importing Models
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier 

In [None]:
# Model List
models = []
models.append(('LogisticRegression',LogisticRegression()))
models.append(('GaussianNB',GaussianNB()))
models.append(('KNeighborsClassifier',KNeighborsClassifier()))
models.append(('SVC',SVC()))
models.append(('DecisionTreeClassifier',DecisionTreeClassifier()))
models.append(('RandomForestClassifier',RandomForestClassifier()))

In [None]:
from sklearn.model_selection import cross_val_score
acc = []
names = []
result = []

for name, model in models:
    # Cross Validation
    acc_of_model = cross_val_score(model, X_train, y_train, cv=10, scoring='accuracy')
    # Accuracy of model
    acc.append(acc_of_model)
    # Name of model
    names.append(name)
    
    Out = "Model: %s: Accuracy: %f" % (name, acc_of_model.mean())
    result.append(acc_of_model)
    print(Out)

From the cross-validation results above, Support Vector Machine, Decision Tree, and Random Forest are the most accurate models for this dataset.
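
The cross-validation above uses only the training split, so the held-out test split is never scored. A possible final check is to fit one of the models on the training data and evaluate it on the test data. The sketch below does this on synthetic data (the arrays `X` and `y` are generated for illustration and are not the mushroom features); with the notebook's variables, `X_train`, `X_test`, `y_train` and `y_test` would take their place.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic integer-coded features; the label depends only on the first column
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 5))
y = (X[:, 0] >= 2).astype(int)

# stratify=y keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=125, stratify=y)

# Fit on the training split only, then score on unseen test rows
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

test_acc = accuracy_score(y_te, y_pred)
cm = confusion_matrix(y_te, y_pred)  # rows: true class, columns: predicted class
print(test_acc)
print(cm)
```

For a poisonous-versus-edible task, the confusion matrix is worth reading alongside accuracy, since labelling a poisonous mushroom as edible is the costlier mistake.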