# Mushroom dataset

This notebook trains a simple classifier (scikit-learn's `LogisticRegressionCV`) on the UCI mushroom dataset. Before running the code, download the dataset either manually from https://archive.ics.uci.edu/ml/datasets/Mushroom or by running the following bash command.

```bash 
wget https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data
```
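
If `wget` is not available, the same file can be fetched from within Python. The following is a minimal sketch using only the standard library; the helper name `download_if_missing` and its default arguments are illustrative assumptions, not part of the original notebook.

```python
import os
import urllib.request

# Same URL as in the wget command above.
DATA_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
            "mushroom/agaricus-lepiota.data")


def download_if_missing(url=DATA_URL, path="agaricus-lepiota.data"):
    """Download `url` to `path` unless `path` already exists; return `path`.

    Hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_if_missing()` once before the loading cell leaves `agaricus-lepiota.data` in the working directory, which is where the `pd.read_csv` call expects it.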

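Once the dataset has been downloaded, we can fit a logistic regression model to it and evaluate it with 10-fold stratified cross-validation. As the classification reports below show, this is a very easy dataset to learn: precision, recall, and F1 are all 1.00 on the held-out folds.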

In [1]:
import pandas as pd 

from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV

from sklearn.metrics import classification_report

In [2]:
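# The file has no header row; column 0 is the class label ('e' or 'p')
df = pd.read_csv('agaricus-lepiota.data', header=None)

# Binary target: 1 where the label is 'e', 0 otherwise
y = (df[0].values == 'e').astype(int)
# One-hot encode the remaining (categorical) columns
X = pd.get_dummies(df[df.columns[1:]]).values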
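
To make the encoding step concrete, here is a small sketch on a made-up three-row frame. The values in `toy` are hypothetical and not taken from the dataset; only the two preprocessing expressions mirror the cell above.

```python
import pandas as pd

# Hypothetical toy data: column 0 is the label, columns 1 and 2 are
# single-letter categorical features, as in the mushroom file.
toy = pd.DataFrame({0: ['e', 'p', 'e'],
                    1: ['x', 'b', 'x'],
                    2: ['n', 'n', 'y']})

# Label 'e' becomes 1, anything else becomes 0.
toy_y = (toy[0].values == 'e').astype(int)

# Each distinct value of each feature column becomes its own indicator column.
toy_X = pd.get_dummies(toy[toy.columns[1:]])

print(toy_y)
print(toy_X)
```

Each feature column here has two distinct values, so the encoded frame has four indicator columns, and every row has exactly one active indicator per original feature.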

In [3]:
# 10-fold stratified cross-validation; rows are shuffled before splitting
fold = StratifiedKFold(n_splits=10, shuffle=True)

for fi, (train_index, test_index) in enumerate(fold.split(X, y)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # LogisticRegressionCV picks the regularization strength by internal cross-validation
    clf = LogisticRegressionCV()
    clf.fit(X_train, y_train)

    print('Classification report for fold {}/{}'.format(fi + 1, fold.n_splits))
    print(classification_report(y_test, clf.predict(X_test)))
    print()

Classification report for fold 1/10
             precision    recall  f1-score   support

          0       1.00      1.00      1.00       392
          1       1.00      1.00      1.00       421

avg / total       1.00      1.00      1.00       813


Classification report for fold 2/10
             precision    recall  f1-score   support

          0       1.00      1.00      1.00       392
          1       1.00      1.00      1.00       421

avg / total       1.00      1.00      1.00       813


Classification report for fold 3/10
             precision    recall  f1-score   support

          0       1.00      1.00      1.00       392
          1       1.00      1.00      1.00       421

avg / total       1.00      1.00      1.00       813


Classification report for fold 4/10
             precision    recall  f1-score   support

          0       1.00      1.00      1.00       392
          1       1.00      1.00      1.00       421

avg / total       1.00      1.00      1.00     