# Mushroom dataset: 1R and RIPPER
The following exercise is taken from <b>Machine Learning with R</b> by <b>Brett Lantz</b> (Third Edition).

The exercise uses the <b>Mushrooms</b> dataset, originally published by <b>Jeff Schlimmer</b> of <b>Carnegie Mellon University</b>. The copy used here is downloaded from the textbook's GitHub page. The dataset is also freely available from the <b>UCI Machine Learning Repository</b>.

## Step 1: Collecting the data

In [1]:
# Read every column as a factor; TRUE is spelled out because T can be reassigned
mushrooms <- read.csv("https://raw.githubusercontent.com/PacktPublishing/Machine-Learning-with-R-Third-Edition/master/Chapter05/mushrooms.csv",
                      stringsAsFactors = TRUE)

## Step 2: Exploring and preparing the data

In [2]:
str(mushrooms)

'data.frame':	8124 obs. of  23 variables:
 $ type                    : Factor w/ 2 levels "edible","poisonous": 2 1 1 2 1 1 1 1 2 1 ...
 $ cap_shape               : Factor w/ 6 levels "bell","conical",..: 3 3 1 3 3 3 1 1 3 1 ...
 $ cap_surface             : Factor w/ 4 levels "fibrous","grooves",..: 4 4 4 3 4 3 4 3 3 4 ...
 $ cap_color               : Factor w/ 10 levels "brown","buff",..: 1 10 9 9 4 10 9 9 9 10 ...
 $ bruises                 : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 2 2 2 2 ...
 $ odor                    : Factor w/ 9 levels "almond","anise",..: 8 1 2 8 7 1 1 2 8 1 ...
 $ gill_attachment         : Factor w/ 2 levels "attached","free": 2 2 2 2 2 2 2 2 2 2 ...
 $ gill_spacing            : Factor w/ 2 levels "close","crowded": 1 1 1 1 2 1 1 1 1 1 ...
 $ gill_size               : Factor w/ 2 levels "broad","narrow": 2 1 1 2 1 1 1 1 2 1 ...
 $ gill_color              : Factor w/ 12 levels "black","brown",..: 1 1 2 2 1 2 5 2 8 5 ...
 $ stalk_shape             : Factor w/

In [3]:
# veil_type is a factor with only one level, so it carries no information and is dropped
mushrooms$veil_type <- NULL

table(mushrooms$type)


   edible poisonous 
     4208      3916 
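
The two checks in this step can be sketched in base R. This is a minimal illustration, not part of the original exercise: the `toy` data frame and its values are made up, and the class counts are copied from the `table()` output above.

```r
# Toy data frame (hypothetical values) with one single-level factor column
toy <- data.frame(
  type      = factor(c("edible", "poisonous", "edible")),
  veil_type = factor(c("partial", "partial", "partial")),
  bruises   = factor(c("yes", "no", "yes"))
)

# Count the levels of each factor column and flag those with only one level
n_levels <- sapply(toy, nlevels)
constant_cols <- names(n_levels)[n_levels == 1]
print(constant_cols)

# Class proportions (in percent) from the counts reported above
type_counts <- c(edible = 4208, poisonous = 3916)
type_pct <- round(100 * prop.table(type_counts), 1)
print(type_pct)
```

The classes are close to balanced (roughly 52% edible and 48% poisonous), so plain accuracy is a reasonable first measure for the models that follow.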

## Step 3: Training a model on the data

In [5]:
# First attempt: the 1R algorithm, which picks the single most predictive feature
library(OneR)

mushroom_1R <- OneR(type ~ ., mushrooms)
mushroom_1R

"package 'OneR' was built under R version 3.6.3"



Call:
OneR.formula(formula = type ~ ., data = mushrooms)

Rules:
If odor = almond   then type = edible
If odor = anise    then type = edible
If odor = creosote then type = poisonous
If odor = fishy    then type = poisonous
If odor = foul     then type = poisonous
If odor = musty    then type = poisonous
If odor = none     then type = edible
If odor = pungent  then type = poisonous
If odor = spicy    then type = poisonous

Accuracy:
8004 of 8124 instances classified correctly (98.52%)
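
To see why a single feature (here `odor`) ends up as the rule, the idea behind 1R can be sketched in base R. This is a simplified illustration on made-up data, assuming ties and missing values need no special handling; the `OneR` package may differ in those details. The names `toy` and `one_feature_errors` are hypothetical.

```r
# Toy data (hypothetical values, not the mushroom data)
toy <- data.frame(
  odor    = c("foul", "foul", "none", "none", "almond", "none"),
  bruises = c("no", "yes", "yes", "no", "yes", "no"),
  type    = c("poisonous", "poisonous", "edible", "edible", "edible", "poisonous"),
  stringsAsFactors = TRUE
)

# For one feature: predict the majority class within each level and
# count how many rows that rule gets wrong
one_feature_errors <- function(feature, target) {
  counts <- table(feature, target)
  sum(counts) - sum(apply(counts, 1, max))
}

# Score every feature and keep the one with the fewest errors
features <- setdiff(names(toy), "type")
errors <- sapply(features, function(f) one_feature_errors(toy[[f]], toy$type))
best <- names(which.min(errors))
print(errors)
print(best)
```

On this toy data, `odor` misclassifies one row and `bruises` misclassifies two, so `odor` is chosen, which mirrors what happens on the full dataset.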


## Step 4: Evaluating model performance

In [6]:
mushroom_1R_pred <- predict(mushroom_1R, mushrooms)
table(actual = mushrooms$type, predicted = mushroom_1R_pred)

           predicted
actual      edible poisonous
  edible      4208         0
  poisonous    120      3796
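
The confusion matrix can be summarised numerically. The sketch below rebuilds the matrix from the counts printed above (rather than from the model object) and computes the overall accuracy and the share of poisonous mushrooms that were predicted edible; the variable names are illustrative.

```r
# Confusion matrix rebuilt from the printed counts (filled column by column)
conf <- matrix(c(4208, 120, 0, 3796), nrow = 2,
               dimnames = list(actual    = c("edible", "poisonous"),
                               predicted = c("edible", "poisonous")))

# Overall accuracy: correct predictions lie on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)

# Share of truly poisonous mushrooms that the rule labels as edible
missed_poisonous <- conf["poisonous", "edible"] / sum(conf["poisonous", ])

print(round(accuracy, 4))
print(round(missed_poisonous, 4))
```

The 120 errors all fall in the same cell: poisonous mushrooms classified as edible. That is the costly kind of mistake for this problem, which motivates trying a model with more than one rule.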

## Step 5: Improving model performance

In [9]:
# Apply the RIPPER algorithm (JRip) instead of 1R so the model can use more than one rule
library(RWeka)

mushroom_JRip <- JRip(type ~ . , mushrooms)
mushroom_JRip

JRIP rules:

(odor = foul) => type=poisonous (2160.0/0.0)
(gill_size = narrow) and (gill_color = buff) => type=poisonous (1152.0/0.0)
(gill_size = narrow) and (odor = pungent) => type=poisonous (256.0/0.0)
(odor = creosote) => type=poisonous (192.0/0.0)
(spore_print_color = green) => type=poisonous (72.0/0.0)
(stalk_surface_below_ring = scaly) and (stalk_surface_above_ring = silky) => type=poisonous (68.0/0.0)
(habitat = leaves) and (cap_color = white) => type=poisonous (8.0/0.0)
(stalk_color_above_ring = yellow) => type=poisonous (8.0/0.0)
 => type=edible (4208.0/0.0)

Number of Rules : 9


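- JRip learned a total of 9 rules: 8 rules that identify poisonous mushrooms plus a final default rule
- the rules are tried in order; any mushroom not covered by the first 8 rules is classified as edible
- the counts in parentheses show instances covered / instances misclassified, so none of the rules makes an error on this data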
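
The rule list reads as an ordered if/else chain. The sketch below is an illustration only: it hand-codes the first three rules plus the default on made-up rows, and `classify_by_rules` is a hypothetical helper, not something RWeka provides. It also checks the coverage counts copied from the output above.

```r
# Toy mushrooms (hypothetical rows) using column names from the rules above
toy <- data.frame(
  odor       = c("foul", "none", "pungent", "none"),
  gill_size  = c("broad", "narrow", "narrow", "broad"),
  gill_color = c("white", "buff", "black", "brown"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try the first three JRip rules in order,
# then fall through to the default "edible" rule
classify_by_rules <- function(df) {
  ifelse(df$odor == "foul", "poisonous",
  ifelse(df$gill_size == "narrow" & df$gill_color == "buff", "poisonous",
  ifelse(df$gill_size == "narrow" & df$odor == "pungent", "poisonous",
         "edible")))
}
pred <- classify_by_rules(toy)
print(pred)

# Instances covered by the eight "poisonous" rules, as printed by JRip
covered <- c(2160, 1152, 256, 192, 72, 68, 8, 8)
print(sum(covered))         # compare with the 3916 poisonous mushrooms
print(sum(covered) + 4208)  # compare with the 8124 rows in the dataset
```

The eight rule counts add up to the number of poisonous mushrooms in the data, and the default rule covers the remaining edible ones, which is consistent with every instance being classified correctly.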