# Mushroom EDA and Poison Prediction

Hey there, mushroom hunters and data enthusiasts! You know that thrilling feeling of foraging for wild mushrooms, but let's face it — sometimes, that excitement comes with a tinge of uncertainty. Which of these fungi are a tasty addition to your dinner and which might lead to a regrettable stomach ache, or worse?

Enter the 'Mushroom Classification' dataset, a blast from the past hailing from the UCI Machine Learning Repository. Picture this: descriptions of hypothetical samples from 23 species of gilled mushrooms in the Agaricus and Lepiota family, straight out of The Audubon Society Field Guide to North American Mushrooms (1981). Each species is tagged as definitely edible, downright poisonous, or of unknown edibility and not recommended, and the dataset's creators lumped that last, murky group in with the poisonous ones.

Unlike poison oak and poison ivy, which come with a catchy warning rhyme, mushrooms offer no easy shortcut. The guidebook itself throws a curveball: there's no simple rule like "leaflets three, let it be" for telling an edible mushroom from a poisonous one. We're left to rely on keen observation and some serious data crunching.

This notebook is our ticket to deciphering the secrets hidden in mushroom characteristics. We're diving deep, armed with machine learning tricks and analytical mojo, to uncover patterns and insights. Our mission? To demystify the distinctions between what's a tasty treat and what's a potential trip to the ER. So, buckle up for an adventure into the quirky world of mushrooms! We're harnessing the power of data science to answer that burning question: just how sure can our models be about these funky fungi? Stick around as we explore, learn, and hopefully, make your next mushroom-picking adventure a whole lot safer and tastier!

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

mush = pd.read_csv('mushrooms.csv')
mush
```

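| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8119 | e | k | s | n | f | n | a | c | b | y | ... | s | o | o | p | o | o | p | b | c | l |
| 8120 | e | x | s | n | f | n | a | c | b | y | ... | s | o | o | p | n | o | p | b | v | l |
| 8121 | e | f | s | n | f | n | a | c | b | n | ... | s | o | o | p | o | o | p | b | c | l |
| 8122 | p | k | y | n | f | y | f | c | n | b | ... | k | w | w | p | w | o | e | w | v | l |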
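Since every column holds only a handful of repeated letters, pandas' `category` dtype is a natural fit: each distinct letter is stored once and the rows keep small integer codes, which usually cuts memory use. Here's a minimal sketch, assuming a tiny in-memory CSV as a stand-in for `mushrooms.csv` (the three rows and the name `mush_cat` are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical stand-in for mushrooms.csv, using the same single-letter style
csv_text = """class,cap-shape,odor
p,x,p
e,x,a
e,b,l
"""

# dtype="category" makes every column categorical while parsing
mush_cat = pd.read_csv(io.StringIO(csv_text), dtype="category")
print(mush_cat.dtypes)
print(mush_cat["class"].cat.categories.tolist())
```

If you go this route, keep in mind that recent pandas versions deprecate renaming categories through `Series.replace`; `Series.cat.rename_categories` accepts a dict (passing through any category it doesn't mention) and is the categorical counterpart of the letter-to-word renaming done later in this notebook.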


## Dataset description

As you can see above, the entire dataset is made of letters. None of the columns holds a numeric measurement: every feature is categorical, mostly visual characteristics such as shape and color (odor being the notable exception), each stored as a one-letter code. We will have to encode these categories before any modeling, but first it helps to keep a short legend of what each letter means, since this format is not self-explanatory.

* **class**: edible=e, poisonous=p
* **cap-shape**: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
* **cap-surface**: fibrous=f, grooves=g, scaly=y, smooth=s
* **cap-color**: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
* **bruises**: bruises=t, no=f
* **odor**: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
* **gill-attachment**: attached=a, descending=d, free=f, notched=n
* **gill-spacing**: close=c, crowded=w, distant=d
* **gill-size**: broad=b, narrow=n
* **gill-color**: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
* **stalk-shape**: enlarging=e, tapering=t
* **stalk-root**: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
* **stalk-surface-above-ring**: fibrous=f, scaly=y, silky=k, smooth=s
* **stalk-surface-below-ring**: fibrous=f, scaly=y, silky=k, smooth=s
* **stalk-color-above-ring**: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
* **stalk-color-below-ring**: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
* **veil-type**: partial=p, universal=u
* **veil-color**: brown=n, orange=o, white=w, yellow=y
* **ring-number**: none=n, one=o, two=t
* **ring-type**: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
* **spore-print-color**: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
* **population**: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
* **habitat**: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
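Every line of the legend above follows the same `attribute: word=code, word=code` pattern, so the lookup tables could also be built mechanically rather than typed by hand. A minimal sketch, assuming the legend is available as plain text (only three lines are copied here, and `parse_legend` is a hypothetical helper, not part of pandas):

```python
def parse_legend(text: str) -> dict[str, dict[str, str]]:
    """Turn lines like '* **gill-size**: broad=b, narrow=n' into {'gill-size': {'b': 'broad', 'n': 'narrow'}}."""
    mappings = {}
    for line in text.strip().splitlines():
        # Drop the bold markers and the leading bullet, leaving "name: word=code, ..."
        line = line.replace("**", "").strip().lstrip("* ")
        if not line:
            continue
        name, _, values = line.partition(":")
        pairs = {}
        for item in values.split(","):
            # rpartition keeps the code after the last '=' (handles 'missing=?')
            word, _, code = item.strip().rpartition("=")
            pairs[code] = word  # code -> word, the direction Series.replace expects
        mappings[name.strip()] = pairs
    return mappings


legend = """
* **bruises**: bruises=t, no=f
* **gill-size**: broad=b, narrow=n
* **stalk-root**: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
"""
maps = parse_legend(legend)
print(maps["gill-size"])  # {'b': 'broad', 'n': 'narrow'}
```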

## Exploring the data

```python
mush.isnull().sum()
```

```
class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64
```

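`isnull()` finds no empty cells, but that doesn't quite mean nothing is missing: as the legend above shows, `stalk-root` marks a missing value with `?`, so those gaps are stored as just another category rather than as `NaN`.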
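One caveat worth checking: because `stalk-root` codes missing values as the letter `?`, `isnull()` cannot see them (the UCI dataset notes report 2,480 such entries, all in `stalk-root`). A minimal sketch of counting them, assuming `?` is the only missing-value marker; `count_coded_missing` is a hypothetical helper and the toy rows are made up (on the real data you would pass `mush`):

```python
import numpy as np
import pandas as pd


def count_coded_missing(df: pd.DataFrame, marker: str = "?") -> pd.Series:
    """Count cells equal to `marker` in each column, treating them as missing."""
    # replace() returns a new frame, so the original '?' codes stay untouched
    as_nan = df.replace(marker, np.nan)
    counts = as_nan.isnull().sum()
    # Keep only the columns that actually contain the marker
    return counts[counts > 0]


# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "stalk-root": ["e", "?", "c", "?"],
})
print(count_coded_missing(toy))
```

Working on a copy matters here: the renaming step later maps `?` to the word `missing`, which only works if the `?` codes are still present.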

```python
mush.describe().T
```

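| | count | unique | top | freq |
|---|---|---|---|---|
| class | 8124 | 2 | e | 4208 |
| cap-shape | 8124 | 6 | x | 3656 |
| cap-surface | 8124 | 4 | y | 3244 |
| cap-color | 8124 | 10 | n | 2284 |
| bruises | 8124 | 2 | f | 4748 |
| odor | 8124 | 9 | n | 3528 |
| gill-attachment | 8124 | 2 | f | 7914 |
| gill-spacing | 8124 | 2 | c | 6812 |
| gill-size | 8124 | 2 | b | 5612 |
| gill-color | 8124 | 12 | b | 1728 |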
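For string columns, `describe()` reports four numbers: non-null values (`count`), distinct values (`unique`), the most frequent value (`top`) and how often it appears (`freq`). Reading the first row, 4208 of the 8124 mushrooms are edible (about 52%), so the two classes are roughly balanced. The sketch below recomputes those columns with `value_counts` and adds the top value's share of rows, which is easier to compare across columns; `summarize_categoricals` is a hypothetical helper and the toy frame is made up:

```python
import pandas as pd


def summarize_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Rebuild describe()'s count/unique/top/freq columns, plus top's share of all rows."""
    rows = {}
    for col in df.columns:
        counts = df[col].value_counts()  # sorted by frequency, most common first
        rows[col] = {
            "count": int(df[col].notna().sum()),
            "unique": int(df[col].nunique()),
            "top": counts.index[0],
            "freq": int(counts.iloc[0]),
            "top_share": counts.iloc[0] / len(df),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


toy = pd.DataFrame({
    "class": ["e", "p", "e", "e"],
    "gill-size": ["b", "b", "n", "b"],
})
print(summarize_categoricals(toy))
```

When two values tie for most frequent, `top` may differ from what `describe()` picks, so treat it as one of the most common values rather than a unique answer.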


## Visualizing the data

We will start by replacing each one-letter code with the word it stands for. This gives the visualizations readable labels, so we don't have to keep checking the legend in the `Dataset description` section.

```python
# One lookup table per column: single-letter code -> full word (from the legend above)
class0 = {'e': 'edible', 'p': 'poisonous'}
cap_shape0 = {'b': 'bell', 'c': 'conical', 'x': 'convex', 'f': 'flat', 'k': 'knobbed', 's': 'sunken'}
cap_surface0 = {'f': 'fibrous', 'g': 'grooves', 'y': 'scaly', 's': 'smooth'}
cap_color0 = {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'r': 'green', 'p': 'pink', 'u': 'purple', 'e': 'red', 'w': 'white', 'y': 'yellow'}
bruises0 = {'t': 'bruises', 'f': 'no'}
odor0 = {'a': 'almond', 'l': 'anise', 'c': 'creosote', 'y': 'fishy', 'f': 'foul', 'm': 'musty', 'n': 'none', 'p': 'pungent', 's': 'spicy'}
gill_attachment0 = {'a': 'attached', 'd': 'descending', 'f': 'free', 'n': 'notched'}
gill_spacing0 = {'c': 'close', 'w': 'crowded', 'd': 'distant'}
gill_size0 = {'b': 'broad', 'n': 'narrow'}
gill_color0 = {'k': 'black', 'n': 'brown', 'b': 'buff', 'h': 'chocolate', 'g': 'gray', 'r': 'green', 'o': 'orange', 'p': 'pink', 'u': 'purple', 'e': 'red', 'w': 'white', 'y': 'yellow'}
stalk_shape0 = {'e': 'enlarging', 't': 'tapering'}
stalk_root0 = {'b': 'bulbous', 'c': 'club', 'u': 'cup', 'e': 'equal', 'z': 'rhizomorphs', 'r': 'rooted', '?': 'missing'}
stalk_surface_above_ring0 = {'f': 'fibrous', 'y': 'scaly', 'k': 'silky', 's': 'smooth'}
stalk_surface_below_ring0 = {'f': 'fibrous', 'y': 'scaly', 'k': 'silky', 's': 'smooth'}
stalk_color_above_ring0 = {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'o': 'orange', 'p': 'pink', 'e': 'red', 'w': 'white', 'y': 'yellow'}
stalk_color_below_ring0 = {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'o': 'orange', 'p': 'pink', 'e': 'red', 'w': 'white', 'y': 'yellow'}
veil_type0 = {'p': 'partial', 'u': 'universal'}
veil_color0 = {'n': 'brown', 'o': 'orange', 'w': 'white', 'y': 'yellow'}
ring_number0 = {'n': 'none', 'o': 'one', 't': 'two'}
ring_type0 = {'c': 'cobwebby', 'e': 'evanescent', 'f': 'flaring', 'l': 'large', 'n': 'none', 'p': 'pendant', 's': 'sheathing', 'z': 'zone'}
spore_print_color0 = {'k': 'black', 'n': 'brown', 'b': 'buff', 'h': 'chocolate', 'r': 'green', 'o': 'orange', 'u': 'purple', 'w': 'white', 'y': 'yellow'}
population0 = {'a': 'abundant', 'c': 'clustered', 'n': 'numerous', 's': 'scattered', 'v': 'several', 'y': 'solitary'}
habitat0 = {'g': 'grasses', 'l': 'leaves', 'm': 'meadows', 'p': 'paths', 'u': 'urban', 'w': 'waste', 'd': 'woods'}

# Pair each CSV column name with its lookup table
column0s = {
    'class': class0,
    'cap-shape': cap_shape0,
    'cap-surface': cap_surface0,
    'cap-color': cap_color0,
    'bruises': bruises0,
    'odor': odor0,
    'gill-attachment': gill_attachment0,
    'gill-spacing': gill_spacing0,
    'gill-size': gill_size0,
    'gill-color': gill_color0,
    'stalk-shape': stalk_shape0,
    'stalk-root': stalk_root0,
    'stalk-surface-above-ring': stalk_surface_above_ring0,
    'stalk-surface-below-ring': stalk_surface_below_ring0,
    'stalk-color-above-ring': stalk_color_above_ring0,
    'stalk-color-below-ring': stalk_color_below_ring0,
    'veil-type': veil_type0,
    'veil-color': veil_color0,
    'ring-number': ring_number0,
    'ring-type': ring_type0,
    'spore-print-color': spore_print_color0,
    'population': population0,
    'habitat': habitat0,
}

# Swap codes for words column by column; codes absent from a table are left as they are
for column, renaming in column0s.items():
    mush[column] = mush[column].replace(renaming)
```
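`Series.replace` leaves any value it doesn't find in the dictionary untouched, so a typo or a forgotten code in one of the tables above would silently leave a one-letter value in the data. A quick sanity check is to list every value that isn't one of the expected words. This is a sketch with a hypothetical `unmapped_values` helper and a made-up toy frame; on the real data, `unmapped_values(mush, column0s)` should come back empty if the tables are complete:

```python
import pandas as pd


def unmapped_values(df: pd.DataFrame, mappings: dict) -> dict:
    """Return {column: sorted leftover values} for values that are not among the mapping's words."""
    leftovers = {}
    for column, renaming in mappings.items():
        expected = set(renaming.values())
        extra = set(df[column].unique()) - expected
        if extra:
            leftovers[column] = sorted(extra)
    return leftovers


# Toy example: 'x' has no entry in the cap-shape table, so it survives replace()
toy = pd.DataFrame({"class": ["e", "p"], "cap-shape": ["b", "x"]})
maps = {
    "class": {"e": "edible", "p": "poisonous"},
    "cap-shape": {"b": "bell"},
}
for column, renaming in maps.items():
    toy[column] = toy[column].replace(renaming)

print(unmapped_values(toy, maps))  # {'cap-shape': ['x']}
```

An alternative design is `Series.map(renaming)`, which turns unknown codes into `NaN` instead of keeping them, so they would show up in `isnull()` rather than needing a separate check.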