# Mushroom Classification

![](https://www.thetimes.co.uk/imageserver/image/%2Fmethode%2Ftimes%2Fprod%2Fweb%2Fbin%2Fbcdcf23a-187f-11eb-8493-5b46eb56a071.jpg?crop=7360%2C4140%2C0%2C383&resize=1180)

A mushroom is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source. It is sometimes called the 'meat' of the vegetable world. Mushrooms are used extensively in many cuisines, notably Chinese, Korean, European, and Japanese.

In this notebook we classify mushrooms as poisonous or edible.

In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Let us load the data set

In [None]:
mushroom = pd.read_csv('../input/mushroom-classification/mushrooms.csv')


# Data Analysis On Mushroom Data Set


In [None]:
mushroom.head(5)


In [None]:
mushroom.tail(5)


Let's check for duplicate rows in the data set.



In [None]:
mushroom.duplicated().sum()


In [None]:
mushroom.shape


In [None]:
mushroom.info()


In [None]:
mushroom.isnull().sum()


So, there are 8124 records and 23 columns. There are no null values and no duplicate rows.
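The individual checks above can be gathered into one small summary. The sketch below is only illustrative: `quality_summary` and the tiny `toy` frame are hypothetical names standing in for the real `mushroom` data, so the numbers it prints are not the ones reported above.

```python
import pandas as pd

# Hypothetical stand-in for the real `mushroom` frame (assumption for illustration).
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "odor": ["p", "a", "l", "f"],
    "veil-type": ["p", "p", "p", "p"],
})

def quality_summary(df):
    """Return row/column counts plus duplicate-row and null-cell totals."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "null_cells": int(df.isnull().sum().sum()),
    }

summary = quality_summary(toy)
print(summary)
```

Calling `quality_summary(mushroom)` on the loaded data should reproduce the shape, duplicate and null figures in a single dictionary.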



# Explanation of the relevant features

![](https://image.freepik.com/free-vector/mushroom-anatomy-labeled-biology-illustration_1995-566.jpg)

1. **Attribute Information:**

classes: edible = e, poisonous = p

cap-shape: bell = b, conical = c, convex = x, flat = f, knobbed = k, sunken = s

cap-surface: fibrous = f, grooves = g, scaly = y, smooth = s

cap-color: brown = n, buff = b, cinnamon = c, gray = g, green = r, pink = p, purple = u, red = e, white = w, yellow = y

bruises: yes = t, no = f

odor: almond = a, anise = l, creosote = c, fishy = y, foul = f, musty = m, none = n, pungent = p, spicy = s

gill-attachment: attached = a, descending = d, free = f, notched = n

gill-spacing: close = c, crowded = w, distant = d

gill-size: broad = b, narrow = n

gill-color: black = k, brown = n, buff = b, chocolate = h, gray = g, green = r, orange = o, pink = p, purple = u, red = e, white = w, yellow = y

stalk-shape: enlarging = e, tapering = t

stalk-root: bulbous = b, club = c, cup = u, equal = e, rhizomorphs = z, rooted = r, missing = ?

stalk-surface-above-ring: fibrous = f, scaly = y, silky = k, smooth = s

stalk-surface-below-ring: fibrous = f, scaly = y, silky = k, smooth = s

stalk-color-above-ring: brown = n, buff = b, cinnamon = c, gray = g, orange = o, pink = p, red = e, white = w, yellow = y

stalk-color-below-ring: brown = n, buff = b, cinnamon = c, gray = g, orange = o, pink = p, red = e, white = w, yellow = y

veil-type: partial = p, universal = u

veil-color: brown = n, orange = o, white = w, yellow = y

ring-number: none = n, one = o, two = t

ring-type: cobwebby = c, evanescent = e, flaring = f, large = l, none = n, pendant = p, sheathing = s, zone = z

spore-print-color: black = k, brown = n, buff = b, chocolate = h, green = r, orange = o, purple = u, white = w, yellow = y

population: abundant = a, clustered = c, numerous = n, scattered = s, several = v, solitary = y

habitat: grasses = g, leaves = l, meadows = m, paths = p, urban = u, waste = w, woods = d
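The single-letter codes can be decoded with a plain dictionary, and the list above also shows that `stalk-root` marks missing values with `?`, which `isnull()` does not count as null. The sketch below illustrates both points on a hypothetical `toy` frame; `ODOR_LABELS` is copied from the odor entry above, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Mapping taken from the odor entry in the attribute list.
ODOR_LABELS = {
    "a": "almond", "l": "anise", "c": "creosote", "y": "fishy", "f": "foul",
    "m": "musty", "n": "none", "p": "pungent", "s": "spicy",
}

# Hypothetical stand-in for the real `mushroom` frame.
toy = pd.DataFrame({
    "odor": ["p", "a", "l", "n"],
    "stalk-root": ["e", "c", "?", "b"],
})

# Decode the odor codes into readable labels.
toy["odor-label"] = toy["odor"].map(ODOR_LABELS)

# '?' is an ordinary string, so it has to be counted explicitly.
missing_roots = int((toy["stalk-root"] == "?").sum())
null_cells = int(toy.isnull().sum().sum())

print(toy)
print("stalk-root marked '?':", missing_roots, "| cells reported null:", null_cells)
```

The same comparison on the real data, `(mushroom['stalk-root'] == '?').sum()`, would show how many records carry the missing marker.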


# Exploratory Data Analysis

# **Class**

In [None]:
mushroom['class'].value_counts().to_frame()


In [None]:
plt.figure(figsize=(10,5))
plt.title('Mushrooms Poisonous v/s Edible', fontsize=14)
sns.countplot(x="class", data=mushroom, palette=('#9b111e','#50c878'))
plt.xlabel("Mushroom Type", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

**Observations:**


The two classes are roughly balanced.

Edible mushrooms slightly outnumber poisonous mushrooms in the data set.
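Class balance is easier to judge from proportions than from raw counts. A minimal sketch, using a hypothetical `toy` frame rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real `mushroom` frame.
toy = pd.DataFrame({"class": ["e", "e", "e", "p", "p"]})

# normalize=True returns the share of each class instead of the raw count.
shares = toy["class"].value_counts(normalize=True)

# Ratio of the largest class to the smallest; values near 1 mean a balanced target.
balance_ratio = shares.max() / shares.min()

print(shares)
print("largest / smallest class:", round(balance_ratio, 2))
```

On the real data the call would be `mushroom['class'].value_counts(normalize=True)`.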

# **Cap Shape**

In [None]:
mushroom.groupby(['cap-shape'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-shape'], ax=axarr[0], order=mushroom['cap-shape'].value_counts().index, palette="magma").set_title('Cap Shape Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Shape')
b = sns.countplot(x="cap-shape", data=mushroom, hue="class", palette=('#9b111e','#50c878'), order=mushroom['cap-shape'].value_counts().index, ax=axarr[1]).set_ylabel('Count')

**Observations:**

Convex(x) and flat(f) cap shapes are the most common in the dataset.

Bell(b) cap shaped mushrooms are mostly edible.

Knobbed(k) cap shaped mushrooms are mostly poisonous.

Sunken(s) cap shaped mushrooms are all edible, whereas conical(c) cap shaped mushrooms are all poisonous.

# **Cap Surface**


In [None]:
mushroom.groupby(['cap-surface'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-surface'], ax=axarr[0], order=mushroom['cap-surface'].value_counts().index, palette="magma").set_title('Cap Surface Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Surface')
b = sns.countplot(x="cap-surface", data=mushroom, hue="class", order=mushroom['cap-surface'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')


**Observations:**

Grooves(g) cap surface mushrooms are all poisonous and very few in number.

Smooth(s) and scaly(y) cap surface mushrooms are more often poisonous, whereas fibrous(f) cap surface mushrooms are more often edible.


# **Cap Color**

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-color'], ax=axarr[0], order=mushroom['cap-color'].value_counts().index, palette="magma").set_title('Cap Color Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Color')
b = sns.countplot(x="cap-color", data=mushroom, hue="class", order=mushroom['cap-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Brown(n) colored mushrooms are the most common, followed by gray(g) and red(e).

Most of the brown(n), white(w) and gray(g) colored mushrooms are edible, whereas most of the red(e) and yellow(y) colored mushrooms are poisonous.

All purple(u) and green(r) colored mushrooms are edible, but they are few in number.

# **Bruises**

In [None]:
mushroom.groupby(['bruises'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['bruises'], ax=axarr[0], order=mushroom['bruises'].value_counts().index, palette="magma").set_title('Bruise Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Bruises')
b = sns.countplot(x="bruises", data=mushroom, hue="class", order=mushroom['bruises'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Mushrooms that do not bruise(f) outnumber those that do(t).

A high percentage of the mushrooms that bruise(t) are edible, whereas most of those that do not bruise(f) are poisonous.

Note that not all bruising(t) mushrooms are edible, and not all non-bruising(f) mushrooms are poisonous; other factors are involved. Still, bruising can be one of the important features when predicting the class.
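The "high percentage" statements can be made concrete with a row-normalised cross-tabulation, which gives the share of each class within every feature value. A minimal sketch on a hypothetical `toy` frame (the counts are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the real `mushroom` frame.
toy = pd.DataFrame({
    "bruises": ["t", "t", "t", "t", "f", "f", "f", "f"],
    "class":   ["e", "e", "e", "p", "p", "p", "p", "e"],
})

# normalize="index" makes every row sum to 1, i.e. class shares per bruise value.
rates = pd.crosstab(toy["bruises"], toy["class"], normalize="index")

print(rates)
```

Replacing `toy` with `mushroom` gives the actual edible and poisonous shares for bruising and non-bruising mushrooms, and the same call works for any other feature column.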


# **Odor**

In [None]:
mushroom.groupby(['odor'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['odor'], ax=axarr[0], order=mushroom['odor'].value_counts().index, palette="magma").set_title('Odor Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Odor')
b = sns.countplot(x="odor", data=mushroom, hue="class", order=mushroom['odor'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Mushrooms with no odor(n) and foul(f) odor are the most common.

All pungent(p), foul(f), creosote(c), fishy(y), spicy(s) and musty(m) odor mushrooms are poisonous.

All almond(a) and anise(l) odor mushrooms are edible.

Mushrooms with no odor(n) can be either edible or poisonous, but the distribution shows that most of them are edible.

Thus, odor can be one of the most important features when predicting the class of mushrooms.
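One rough way to compare how informative single features are is to ask how accurate a rule would be that predicts the majority class for each category of that feature. The sketch below is an illustrative heuristic, not the notebook's modelling step; `majority_rule_accuracy` and the `toy` frame are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real `mushroom` frame.
toy = pd.DataFrame({
    "odor":      ["a", "a", "l", "f", "f", "n", "n", "n"],
    "cap-shape": ["x", "x", "x", "x", "f", "f", "f", "f"],
    "class":     ["e", "e", "e", "p", "p", "e", "e", "p"],
})

def majority_rule_accuracy(df, feature, target="class"):
    """Accuracy of predicting, for each feature value, its most frequent class."""
    counts = pd.crosstab(df[feature], df[target])
    # The majority count in each row is the number of correct predictions there.
    return float(counts.max(axis=1).sum() / len(df))

odor_acc = majority_rule_accuracy(toy, "odor")
shape_acc = majority_rule_accuracy(toy, "cap-shape")
print("odor:", odor_acc, "| cap-shape:", shape_acc)
```

A feature whose categories are mostly pure, as described for odor above, scores close to 1 on this measure, while a feature that mixes the classes in every category scores close to the share of the larger class.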

# **Gill Attachment**

In [None]:
mushroom.groupby(['gill-attachment'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-attachment'], ax=axarr[0], order=mushroom['gill-attachment'].value_counts().index, palette="magma").set_title('Gill Attachment')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Attachment')
b = sns.countplot(x="gill-attachment", data=mushroom, hue="class", order=mushroom['gill-attachment'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Mushrooms with free(f) gill attachment are the most common.

Most mushrooms with attached(a) gill attachment are edible.

For free(f) gill attachment there is not much difference between the edible and poisonous counts.

# **Gill Spacing**

In [None]:
mushroom.groupby(['gill-spacing'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-spacing'], ax=axarr[0], order=mushroom['gill-spacing'].value_counts().index, palette="magma").set_title('Gill Spacing')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Spacing')
b = sns.countplot(x="gill-spacing", data=mushroom, hue="class", order=mushroom['gill-spacing'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Close(c) gill spacing mushrooms are the most common.

Most of the crowded(w) gill spacing mushrooms are edible.

# **Gill Size**

In [None]:
mushroom.groupby(['gill-size'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-size'], ax=axarr[0], order=mushroom['gill-size'].value_counts().index, palette="flare").set_title('Gill Size')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Size')
b = sns.countplot(x="gill-size", data=mushroom, hue="class", order=mushroom['gill-size'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Overall, broad(b) gill size mushrooms are more common than narrow(n) ones.

Most of the narrow(n) gill size mushrooms are poisonous whereas most of the broad(b) gill size mushrooms are edible.

# **Gill Color**

In [None]:
mushroom.groupby(['gill-color'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-color'], ax=axarr[0], order=mushroom['gill-color'].value_counts().index, palette="magma").set_title('Gill Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill color')
b = sns.countplot(x="gill-color", data=mushroom, hue="class", order=mushroom['gill-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Buff(b) gill color mushrooms are the most common, followed by pink(p) and white(w).

All buff(b) gill color mushrooms are poisonous. Also, all green(r) gill color mushrooms are poisonous.

All red(e) and orange(o) gill color mushrooms are edible.

# **Stalk Shape**

In [None]:
mushroom.groupby(['stalk-shape'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-shape'], ax=axarr[0], order=mushroom['stalk-shape'].value_counts().index, palette="magma").set_title('Stalk Shape')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Shape')
b = sns.countplot(x="stalk-shape", data=mushroom, hue="class", order=mushroom['stalk-shape'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Tapering(t) stalk shape mushrooms are slightly more common than enlarging(e) ones.

Stalk shape does not separate the classes strongly: enlarging(e) stalk shape mushrooms are more often poisonous, whereas tapering(t) ones are more often edible.

# **Stalk Surface Above Ring**

In [None]:
mushroom.groupby(['stalk-surface-above-ring'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-surface-above-ring'], ax=axarr[0], order=mushroom['stalk-surface-above-ring'].value_counts().index, palette="magma").set_title('Stalk Surface Above Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Surface Above Ring')
b = sns.countplot(x="stalk-surface-above-ring", data=mushroom, hue="class", order=mushroom['stalk-surface-above-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Smooth(s) stalk surface above ring mushrooms are the most common, and scaly(y) ones are very few in number.

Most of the smooth(s) stalk surface above ring mushrooms are edible.

Most of the silky(k) stalk surface above ring mushrooms are poisonous.

# **Stalk Surface Below Ring**

In [None]:
mushroom.groupby(['stalk-surface-below-ring'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-surface-below-ring'], order=mushroom['stalk-surface-below-ring'].value_counts().index, ax=axarr[0], palette="magma").set_title('Stalk Surface Below Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Surface Below Ring')
b = sns.countplot(x="stalk-surface-below-ring", data=mushroom, hue="class", order=mushroom['stalk-surface-below-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Smooth(s) stalk surface below ring mushrooms are the most common, followed by silky(k) surface mushrooms.

Most of the smooth(s) stalk surface below ring mushrooms are edible.

Most of the silky(k) stalk surface below ring mushrooms are poisonous.





**From Stalk Surface Above & Below Ring:**

Most of the smooth(s) stalk surface above & below ring mushrooms are edible.

Most of the silky(k) stalk surface above & below ring mushrooms are poisonous.

# **Stalk Color Above Ring**

In [None]:
mushroom.groupby(['stalk-color-above-ring'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-color-above-ring'], ax=axarr[0], order=mushroom['stalk-color-above-ring'].value_counts().index, palette="magma").set_title('Stalk Color Above Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Color Above Ring')
b = sns.countplot(x="stalk-color-above-ring", data=mushroom, hue="class", order=mushroom['stalk-color-above-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

White(w) stalk color above ring mushrooms are the most common.

Most of the white(w) stalk color above ring mushrooms are edible.


Most of the pink(p) stalk color above ring mushrooms are poisonous.

# **Stalk Color Below Ring**

In [None]:
mushroom.groupby(['stalk-color-below-ring'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-color-below-ring'], order=mushroom['stalk-color-below-ring'].value_counts().index, ax=axarr[0], palette="magma").set_title('Stalk Color Below Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Color Below Ring')
b = sns.countplot(x="stalk-color-below-ring", data=mushroom, hue="class", order=mushroom['stalk-color-below-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

White(w) stalk color below ring mushrooms are the most common.

Most of the white(w) stalk color below ring mushrooms are edible.

Most of the pink(p) stalk color below ring mushrooms are poisonous.

# **Veil Type**

In [None]:
mushroom.groupby(['veil-type'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['veil-type'], ax=axarr[0], palette="magma").set_title('Veil Type')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Veil Type')
b = sns.countplot(x="veil-type", data=mushroom, hue="class", palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

There is only one veil type present in the dataset, i.e. partial(p).

It therefore carries no information for classifying edible and poisonous mushrooms.

We can drop this column while modelling.
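Columns with a single unique value, like veil-type here, can be found and dropped programmatically instead of by name. A minimal sketch, again using a hypothetical `toy` frame in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real `mushroom` frame.
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "odor": ["p", "a", "l", "f"],
    "veil-type": ["p", "p", "p", "p"],
})

# A column with one unique value cannot help separate the classes.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]

# drop() returns a new frame; the original is left untouched.
reduced = toy.drop(columns=constant_cols)

print("dropped:", constant_cols)
print("remaining:", list(reduced.columns))
```

Applied to the real frame, this would be expected to flag `veil-type`, in line with the observation above.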

# **Veil Color**

In [None]:
mushroom.groupby(['veil-color'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['veil-color'], ax=axarr[0], order=mushroom['veil-color'].value_counts().index, palette="magma").set_title('Veil Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Veil Color')
b = sns.countplot(x="veil-color", data=mushroom, hue="class", order=mushroom['veil-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

More than 90% of the mushrooms in the dataset have a white(w) veil color.

There is no significant difference between the edible and poisonous counts for white(w) veil color mushrooms.


# **Ring Number**

In [None]:
mushroom.groupby(['ring-number'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['ring-number'], ax=axarr[0], order=mushroom['ring-number'].value_counts().index, palette="magma").set_title('Ring Number')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Ring Number')
b = sns.countplot(x="ring-number", data=mushroom, hue="class", order=mushroom['ring-number'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')


**Observations:**

Mushrooms with one ring(o) are the most common and are hard to classify.

Mushrooms with no ring(n) are very few in number and all of them are poisonous.

Mushrooms with two rings(t) are mostly edible.

# **Ring Type**

In [None]:
mushroom.groupby(['ring-type'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['ring-type'], ax=axarr[0], order=mushroom['ring-type'].value_counts().index, palette="magma").set_title('Ring Type')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Ring Type')
b = sns.countplot(x="ring-type", data=mushroom, hue="class", order=mushroom['ring-type'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')


**Observations:**

Pendant(p) ring type mushrooms are the most common and most of them are edible.

All large(l) ring type mushrooms are poisonous.

# **Spore Print Color**

In [None]:
mushroom.groupby(['spore-print-color'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['spore-print-color'], order=mushroom['spore-print-color'].value_counts().index, ax=axarr[0], palette="magma").set_title('Spore Print Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Spore Print Color')
b = sns.countplot(x="spore-print-color", data=mushroom, hue="class", order=mushroom['spore-print-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

White(w) spore print color mushrooms are the most common, followed by brown(n), black(k) and chocolate(h).

More than 80% of black(k) & brown(n) spore print color mushrooms are edible.

More than 80% of white(w) & chocolate(h) spore print color mushrooms are poisonous.

This can also be one of the most important features when classifying mushrooms.

# **Population**

In [None]:
mushroom.groupby(['population'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['population'], ax=axarr[0], order=mushroom['population'].value_counts().index, palette="magma").set_title('Population')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Population')
b = sns.countplot(x="population", data=mushroom, hue="class", order=mushroom['population'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Population type several(v) mushrooms are the most common and most of them are poisonous.

All numerous(n) and abundant(a) population type mushrooms are edible.

Most of the scattered(s) & solitary(y) mushrooms are also edible.

# **Habitat**

In [None]:
mushroom.groupby(['habitat'])['class'].value_counts().to_frame()


In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['habitat'], ax=axarr[0], order=mushroom['habitat'].value_counts().index, palette="magma").set_title('Habitat')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Habitat')
b = sns.countplot(x="habitat", data=mushroom, hue="class", order=mushroom['habitat'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

**Observations:**

Mushrooms found in woods(d) are the most common and most of them are edible.

Most of the mushrooms found in grasses(g) are also edible.

Also, all the mushrooms found on waste(w) are edible.

# **THANK YOU!**