# EBM Internals - Multiclass

This is part 3 of a 3 part series describing EBM internals and how to make predictions. For part 1, click [here](./ebm-internals-regression.ipynb). For part 2, click [here](./ebm-internals-classification.ipynb).

In this part we'll cover multiclass classification, specified bin cuts, term exclusion, and unknown values. Before reading it, you should be familiar with the information in [part 1](./ebm-internals-regression.ipynb) and [part 2](./ebm-internals-classification.ipynb).

In [None]:
# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
import numpy as np

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

In [None]:
# make a dataset composed of a nominal categorical feature, a constant
# middle feature that we will exclude, and a continuous feature
X = np.array([["Sudan", "U", 0.75], ["Germany", "U", 1.75], ["Sudan", "U", 2.75], [None, "U", None]])
y = np.array([110, 80, 70, 110]) # integer classes

# Fit an EBM with no interactions
# Specify exact bin cuts for the continuous feature
# Exclude the middle feature during fitting
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingClassifier(
    interactions=0, 
    feature_types=['nominal', 'nominal', [1.125, 2.75]], 
    mains=[0, 2], # this excludes the middle feature
    validation_size=0, early_stopping_rounds=1000, min_samples_leaf=1)
ebm.fit(X, y)
show(ebm.explain_global())



In [None]:
print(ebm.classes_)

Per scikit-learn convention, we store the list of classes in the ebm.classes_ attribute as a sorted array. In this example our classes are integers, but we also accept strings as seen in part 2.

In [None]:
print(ebm.feature_types)

In this example we passed feature_types into the \_\_init\_\_ function. Per scikit-learn convention, this is recorded unmodified in the ebm object.

In [None]:
print(ebm.feature_types_in_)

We translated the feature_types passed to \_\_init\_\_ into actualized feature types.

In [None]:
print(ebm.feature_names)

feature_names were not specified in the call to the \_\_init\_\_ function, so it was set to None following the scikit-learn convention of recording \_\_init\_\_ parameters unmodified.

In [None]:
print(ebm.feature_names_in_)

Since we passed in a numpy array without specifying column names, the EBM created some default names. If we had passed feature_names to the \_\_init\_\_ function, or if we had used a Pandas dataframe, then feature_names_in_ would have contained those names. Following scikit-learn's [SLEP007 convention](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep007/proposal.html), we recorded this in ebm.feature_names_in_.

In [None]:
print(ebm.term_features_)

In the call to ExplainableBoostingClassifier(), mains was set to [0, 2], which means we excluded the middle feature from the list of terms that we boost on. We can see that feature index 1 is absent from ebm.term_features_.

In [None]:
print(ebm.term_names_)

ebm.term_names_ is also missing the middle feature, since ebm.term_features_ does not include it.
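
As a rough illustration of how term names follow from term features, the sketch below rebuilds the names from feature indices. The default names (`feature_0000` and so on) and the `" & "` separator for multi-feature terms are assumptions made for this illustration; check the printed output above for the actual values.

```python
# Hypothetical stand-ins for ebm.feature_names_in_ and ebm.term_features_;
# the exact default names are an assumption for illustration only.
feature_names_in = ["feature_0000", "feature_0001", "feature_0002"]
term_features = [(0,), (2,)]  # mains=[0, 2] leaves out feature index 1

# Assumed convention: join the names of a term's features with " & ".
term_names = [
    " & ".join(feature_names_in[i] for i in features)
    for features in term_features
]
print(term_names)
```

Because each term is a tuple of feature indices, a main effect is a one-element tuple, and an excluded feature simply never appears in any tuple.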

In [None]:
print(ebm.bins_)

ebm.bins_ is a per-feature attribute, so the middle feature is listed here. We see however that the middle feature does not require binning since it does not affect the predictions of the model.

These bins are structured the same way as in the previous parts. One thing to note is that the continuous feature's bin cuts are the same as the ones specified in the feature_types parameter to the \_\_init\_\_ function.

It is also noteworthy that the last cut we specified is exactly equal to the last feature value. In this instance, where a feature value is identical to the cut value, the feature value gets placed into the upper bin.
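
The tie-breaking rule can be checked with a small sketch. It uses the standard library's `bisect_right` as a stand-in for `np.digitize` with its default `right=False`; both count how many cuts are less than or equal to the value.

```python
from bisect import bisect_right

cuts = [1.125, 2.75]  # the cuts passed through feature_types

binned = {}
for value in (0.75, 1.75, 2.75):
    # add 1 because bin 0 is reserved for missing values
    binned[value] = bisect_right(cuts, value) + 1
    print(value, "->", binned[value])
# 0.75 -> 1
# 1.75 -> 2
# 2.75 -> 3
```

The value 2.75 equals the second cut and lands in bin 3, the upper of the two bins it borders.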

In [None]:
print(ebm.intercept_)

As before, our intercept should be very close to the base rate. In the case of multiclass though, each class that we are predicting will have a logit value.
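
The link between the intercept and the base rate can be sketched with the labels from this example. This is only an illustration, assuming the intercept approximates the log of each class frequency up to a constant shift; the fitted values printed above need not match these numbers exactly.

```python
import math
from collections import Counter

y = [110, 80, 70, 110]  # the labels used above
counts = Counter(y)
classes = sorted(counts)  # sorted, like ebm.classes_
base_rates = [counts[c] / len(y) for c in classes]

# One logit per class. Adding the same constant to every logit
# leaves the softmax output unchanged.
logits = [math.log(p) for p in base_rates]

def softmax(values):
    # subtract the max for numerical stability
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(classes, base_rates)
print(softmax(logits))  # recovers the base rates
```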

In [None]:
print(ebm.term_scores_[0])

ebm.term_scores_[0] is once again the lookup table for the nominal categorical feature. For multiclass, though, each bin contains as many logits as there are classes being predicted.

The first index into the term scores of this additive term is the bin index of the nominal feature. Missing values are once again placed in the 0th bin index, shown above as the first row. The unknown bin is the last row of zeros.

Since the first feature is a nominal categorical, we use the dictionary {'Germany': 1, 'Sudan': 2} to look up which row of logits to use for each categorical string.
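
The row lookup described above can be sketched as follows. The logits in `term_scores` are made-up placeholders, not the fitted values; only the dictionary and the row layout come from the text.

```python
categories = {"Germany": 1, "Sudan": 2}  # the mapping from the text

# Made-up logits: one row per bin, one column per class.
term_scores = [
    [0.1, -0.2, 0.1],   # row 0: missing
    [-0.3, 0.5, -0.2],  # row 1: Germany
    [0.2, -0.4, 0.2],   # row 2: Sudan
    [0.0, 0.0, 0.0],    # row 3: unknown
]

def nominal_bin(value):
    if value is None:
        return 0  # missing values use the 0th bin
    # strings not seen during fitting fall into the last bin
    return categories.get(value, -1)

for value in ("Sudan", "Germany", None, "France"):
    print(value, "->", term_scores[nominal_bin(value)])
```

Using -1 for unknown values works because Python indexing counts from the end, so the index stays valid no matter how many categories there are.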

In [None]:
print(ebm.term_scores_[1])

ebm.term_scores_[1] is for the continuous feature in our dataset. As with categoricals, the 0th row and the last row (index 4) are for missing values and unknown values, respectively. This particular example has 5 bins (the 0th missing bin, the three partitions from the 2 cuts, and the unknown bin).
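
The five-bin layout can be reproduced with a short sketch. The helper `continuous_bin` is hypothetical and uses the standard library's `bisect_right` in place of `np.digitize`; treating NaN as missing is an assumption made here.

```python
import math
from bisect import bisect_right

cuts = [1.125, 2.75]
n_bins = len(cuts) + 3  # missing + (len(cuts) + 1) partitions + unknown

def continuous_bin(value):
    if value is None:
        return 0  # missing
    try:
        number = float(value)
    except ValueError:
        return n_bins - 1  # non-numeric strings are unknown
    if math.isnan(number):
        return 0  # assumption: NaN counts as missing
    return bisect_right(cuts, number) + 1

for value in (None, 0.75, "1.75", 2.75, "abc"):
    print(repr(value), "->", continuous_bin(value))
```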

This sample code incorporates everything discussed in all 3 parts. It could be used as a drop-in replacement for the existing predict/predict_proba functions of the EBMModel class.

In [None]:
from sklearn.utils.extmath import softmax

sample_scores = []
# we have 4 samples in X, so loop 4 times
for sample in X:
    # start from the intercept for each sample
    score = ebm.intercept_
    if not isinstance(ebm.intercept_, float):
        # make a copy of the ebm.intercept_ array
        score = ebm.intercept_.copy()

    # we have 2 terms, so add their score contributions
    for term_idx, features in enumerate(ebm.term_features_):
        # we'll be indexing into a tensor, so our index needs to be multi-dimensional
        tensor_index = []
        # for each feature that is a component of the term
        for feature_idx in features:
            feature_val = sample[feature_idx]

            if feature_val is None or (isinstance(feature_val, float) and np.isnan(feature_val)):
                # missing values are always in the 0th bin
                bin_idx = 0
            else:
                # we bin differently for main effects and pairs, so first 
                # get the list containing the bins for different resolutions
                bin_levels = ebm.bins_[feature_idx]

                # what resolution do we need for this term (main resolution, 
                # pair resolution, etc.), but limit to the last resolution available
                bins = bin_levels[min(len(bin_levels), len(features)) - 1]

                if isinstance(bins, dict):
                    # categorical feature
                    # 'unknown' category strings are in the last bin (-1)
                    bin_idx = bins.get(feature_val, -1)
                else:
                    # continuous feature
                    try:
                        # try converting to a float, if that fails it's 'unknown'
                        feature_val = float(feature_val)
                        # add 1 because the 0th bin is reserved for 'missing'
                        bin_idx = np.digitize(feature_val, bins) + 1
                    except ValueError:
                        # non-floats are 'unknown', which is in the last bin (-1)
                        bin_idx = -1
        
            tensor_index.append(bin_idx)
        score_tensor = ebm.term_scores_[term_idx]
        score += score_tensor[tuple(tensor_index)]
    sample_scores.append(score)

predictions = np.array(sample_scores)

if hasattr(ebm, 'classes_'):
    # classification
    if len(ebm.classes_) <= 2:
        # binary classification
        
        # softmax expects two logits for binary classification
        # the first logit is always equivalent to 0 for binary classification
        predictions = np.c_[np.zeros(predictions.shape), predictions]

    predictions = softmax(predictions)

if hasattr(ebm, 'classes_'):
    print("probabilities for classes " + str(ebm.classes_))
    print("")
    print(ebm.predict_proba(X))
else:
    print(ebm.predict(X))
print("")
print(predictions)
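
The binary branch above prepends a zero logit before calling softmax. A minimal check, with an arbitrary illustrative logit, shows why that works: softmax over [0, z] gives the same positive-class probability as the logistic sigmoid of z.

```python
import math

def softmax(values):
    # subtract the max for numerical stability
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

logit = 0.8  # arbitrary illustrative score for the positive class
probs = softmax([0.0, logit])
sigmoid = 1.0 / (1.0 + math.exp(-logit))

print(probs[1], sigmoid)  # the two values agree up to rounding
```

This is why a binary EBM only needs to store one logit per bin, while a multiclass EBM stores one logit per class.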