We made progress with [OneR](../04_boc/boc.ipynb), which makes predictions based on a single feature of the dataset. The next logical step is to make predictions based on combinations of features, which is what a [Decision Tree](https://en.wikipedia.org/wiki/Decision_tree) does.
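To make "combinations of features" concrete, here is a small sketch on a hypothetical toy dataset (the classic XOR pattern, not our review data): the label is 1 only when exactly one of two binary features is 1. A depth-1 tree, which like OneR bases its prediction on a single feature, can do no better than chance here, while an unrestricted tree splits on one feature and then the other and classifies every point correctly.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical XOR data: the label is 1 when exactly one feature is 1.
X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]

# A depth-1 tree (a "stump") can only split on one feature.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_xor, y_xor)
# An unrestricted tree can split on one feature and then on the other.
tree = DecisionTreeClassifier(random_state=0).fit(X_xor, y_xor)

print("stump accuracy:", stump.score(X_xor, y_xor))  # 0.5
print("tree accuracy:", tree.score(X_xor, y_xor))    # 1.0
```

Neither feature is informative on its own; only the combination is, which is exactly the kind of pattern a single-feature model cannot capture.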

## Easy button

We'll use the bag of characters feature extraction from the last chapter and feed the resulting features to a decision tree.

In [18]:
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from nlpbook import get_train_test_data

# Grab the data and extract the features and labels.
train, test = get_train_test_data()
features = ["review"]
label = "label"
X, y = train[features], train[label]
X_test, y_test = test[features], test[label]

# Set up the pipeline.
transformers = ColumnTransformer(
    [
        (
            "boc",
            CountVectorizer(analyzer="char", lowercase=False),
            "review",
        )
    ]
)
model = DecisionTreeClassifier()
pipeline = Pipeline(
    [("bag_chars", transformers), ("decision_tree", model)]
)

# Train it!
pipeline.fit(X, y)
# Score it!
pipeline.score(X_test, y_test)

0.5540183112919633

How does this compare to our previous models?

In [3]:
#| echo: false
from nlpbook import get_results

get_results(["Baseline", "OneR (boc)"])

| Model | Accuracy |
|---|---|
| Baseline | 0.501119 |
| OneR (boc) | 0.581282 |


Ah man, it actually performed worse than OneR (0.554 vs. 0.581)! But why?

## Analyzing a model

Models are not infallible. In fact, all of our models so far have been quite fallible, and it would be helpful to inspect them to figure out why they get things wrong.

Previously we used the OneR model to inspect and learn about the data (@sec-models-as-analysis-tools). We found that the most important feature for OneR was the number of question marks in a review. When we plotted the data, we saw that reviews with no question marks were more likely to be positive and reviews with question marks were more likely to be negative. We can take this a step further and look at how important the model considers every feature to be, and it's actually quite simple to do.

### Feature permutation

Once we have a trained model, we can measure how important it considers a feature by randomly shuffling that feature's values across the dataset and seeing how the accuracy changes. Shuffling keeps the feature's overall distribution but breaks its link to the labels. If the accuracy stays roughly the same, the model doesn't rely on the feature, so it isn't important. On the other hand, if the accuracy drops, the model depends on that feature for the accuracy we see.
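The procedure is easy to sketch outside our pipeline. Below, on hypothetical synthetic data where the label depends only on feature 0 and feature 1 is pure noise, we shuffle one column at a time and measure the accuracy lost. The helper `permutation_drop` is an illustrative name of our own; scikit-learn also ships a ready-made version, `sklearn.inspection.permutation_importance`, which repeats the shuffle `n_repeats` times and averages the drops.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the label depends only on
# feature 0; feature 1 is pure noise.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
X_train, X_eval, y_train, y_eval = X[:800], X[800:], y[:800], y[800:]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)


def permutation_drop(model, X, y, column, rng):
    """Hypothetical helper: accuracy lost when one column is shuffled across rows."""
    X_shuffled = X.copy()
    X_shuffled[:, column] = rng.permutation(X_shuffled[:, column])
    return model.score(X, y) - model.score(X_shuffled, y)


drops = [permutation_drop(model, X_eval, y_eval, c, rng) for c in range(X.shape[1])]
for c, drop in enumerate(drops):
    print(f"feature {c}: accuracy drop {drop:.3f}")

# The same idea using scikit-learn's built-in implementation.
result = permutation_importance(model, X_eval, y_eval, n_repeats=5, random_state=0)
print("mean importances:", result.importances_mean)
```

We should expect a large drop for feature 0 and essentially none for feature 1, since the tree never needs the noise column to make its predictions.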

Let's test this out on the question mark feature.

In [None]:
# Get the bag of characters feature extractor.
# The fitted column transformer exposes its transformers through `named_transformers_`:
boc = transformers.named_transformers_["boc"]

# Find the column index of "?" (the fitted vocabulary lives in `vocabulary_`).
i = boc.vocabulary_["?"]

# Transform the data.
X_permute = transformers.transform(X_test)
# Shuffle the feature values.
X_permute[:, i] =