Mushroom is simply the data, Jupyter notebook, and files that supported the writing of a Medium blog post, What Decision Trees Tell Us About Deadly Mushrooms.
The article attempts to answer three questions from a dataset containing 22 features of North American mushroom samples:
- Can a machine learning model reliably identify poisonous mushrooms based on the data?
- Does any one feature of the data reliably classify mushroom toxicity?
- Can we formulate simple, memorizable rules from the data that reliably classify mushroom toxicity?
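As a rough illustration of the second question, the sketch below scores a one-feature rule: for each value of a feature, predict whichever class is most common among records with that value. The function name `single_feature_accuracy`, the column names, and the four toy records are hypothetical and are not taken from the notebook or the real dataset.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, label="class"):
    """Accuracy of predicting the majority label for each value of `feature`."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[label]] += 1
    # For every feature value, the majority label is right this many times.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(rows)


# Hypothetical toy records, NOT real samples from the dataset.
toy_rows = [
    {"class": "poisonous", "odor": "foul", "cap_color": "brown"},
    {"class": "poisonous", "odor": "foul", "cap_color": "white"},
    {"class": "edible", "odor": "none", "cap_color": "brown"},
    {"class": "edible", "odor": "almond", "cap_color": "white"},
]

scores = {f: single_feature_accuracy(toy_rows, f) for f in ("odor", "cap_color")}
best_feature = max(scores, key=scores.get)
print(best_feature, scores[best_feature])
```

A score of 1.0 here only means the feature separates the toy records perfectly. It says nothing about the real data, and the score is measured on the same records used to form the rule.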
Data was obtained from the UCI Machine Learning Repository Mushroom Data Set, donated by Jeff Schlimmer and drawn from The Audubon Society Field Guide to North American Mushrooms (1981), G. H. Lincoff (Pres.), New York: Alfred A. Knopf.
Note there are two datasets in the UCI-hosted data folder, seemingly due to some data recovery efforts on the part of the donor:
- agaricus-lepiota.data, which has fewer records and uses single-character representations of categorical values
- expanded, which has more records and uses full-word representations of categorical values (unzipped from expanded.Z)
My exploration makes use of the expanded dataset.
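A minimal sketch of reading full-word records into dictionaries, assuming the file is comma-separated with the class label in the first field. The helper `read_records`, the column names, and the three-column sample text are hypothetical; the real file has a class label plus 22 features and may include extra header or footer lines, which this sketch skips only because their field count differs.

```python
import csv
import io


def read_records(text, column_names):
    """Parse comma-separated lines into dicts, skipping rows of unexpected width."""
    records = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(column_names):
            continue  # blank lines or non-data lines
        values = (field.strip().lower() for field in fields)
        records.append(dict(zip(column_names, values)))
    return records


# Hypothetical three-column excerpt, NOT copied from the real file.
sample_text = "EDIBLE,CONVEX,ALMOND\nPOISONOUS,FLAT,FOUL\n\n"
columns = ["class", "cap_shape", "odor"]
records = read_records(sample_text, columns)
print(records)
```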
View the Jupyter notebook.
To run the code locally you will need Python and JupyterLab or Jupyter Notebook as well as the following Python libraries:
The Jupyter notebook generates DOT data and several images from that data.
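For readers unfamiliar with DOT, the sketch below shows roughly what DOT data for a small decision tree looks like. It is not the notebook's method, which presumably relies on a library export; `tree_to_dot` and the nested-dict tree format are hypothetical, written only to show the shape of the output.

```python
def tree_to_dot(tree, name="mushroom_rules"):
    """Render a nested {'test', 'yes', 'no'} dict (leaves are strings) as DOT text."""
    lines = [f"digraph {name} {{"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        if isinstance(node, str):
            # Leaf: a class label drawn as a box.
            lines.append(f'  {node_id} [label="{node}", shape=box];')
        else:
            test = node["test"]
            lines.append(f'  {node_id} [label="{test}"];')
            for answer in ("yes", "no"):
                child_id = visit(node[answer])
                lines.append(f'  {node_id} -> {child_id} [label="{answer}"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical one-split tree, for illustration only.
toy_tree = {"test": "odor == foul", "yes": "poisonous", "no": "edible"}
dot_text = tree_to_dot(toy_tree)
print(dot_text)
```

If Graphviz is installed, text like this can be saved to a file and rendered to an image with the `dot` command, for example `dot -Tpng tree.dot -o tree.png`.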
Everything outside the /data folder is licensed under an MIT License. Use of the contents of /data should be cited appropriately (see the Machine Learning Repository's citation policy).