#### Usage example on congress dataset

These examples use the RIPPER algorithm. IREP usage is similar, with only slight hyperparameter differences.

In [1]:
import wittgenstein as lw
import pandas as pd

Load our dataset:

In [22]:
df = pd.read_csv('../datasets/adult.csv')
df.head(1)

Unnamed: 0,age,workclass,fnlwgt,education,education.num,marital.status,occupation,relationship,race,sex,capital.gain,capital.loss,hours.per.week,native.country,income
0,90,?,77053,HS-grad,9,Widowed,?,Not-in-family,White,Female,0,4356,40,United-States,<=50K


Split our data into train-test sets:

In [20]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, random_state=42)

#### Training

Create a ruleset classifier:

In [24]:
ripper_clf = lw.RIPPER(random_state=42, verbosity=2)
ripper_clf

<RIPPER(n_discretize_bins=10, max_rule_conds=None, random_state=42, max_rules=None, verbosity=2, max_total_conds=None, dl_allowance=64, prune_size=0.33, k=2)>

Train the ruleset classifier on the trainset:

In [None]:
ripper_clf.fit(train, class_feat='income', pos_class='>50K')
ripper_clf.ruleset_ # Access underlying model

fitting bins for features ['age', 'fnlwgt', 'education.num', 'capital.gain', 'capital.loss', 'hours.per.week']

growing ruleset...

pos_growset 3950 pos_pruneset 1947
neg_growset 12410 neg_pruneset 6113
grew rule: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75]
pruned rule unchanged
updated ruleset: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75]

pos_growset 3898 pos_pruneset 1920
neg_growset 12410 neg_pruneset 6113
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]
pruned rule unchanged
updated ruleset: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75] V [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]

pos_growset 3876 pos_pruneset 1910
neg_growset 12410 neg_pruneset 6113
grew rule: [marital.stat

pos_growset 3211 pos_pruneset 1583
neg_growset 12336 neg_pruneset 6076
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^occupation=Exec-managerial^hours.per.week=40-50^workclass=Private^age=38-43]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^education.num=10-13^occupation=Exec-managerial^hours.per.week=50-75^fnlwgt=106812-131298] V [marital.status=Married-civ-spouse^education.num=10-13^occupation=Exec-managerial^hours.per.week=40-50^workclass=Private^age=38-43]

pos_growset 3205 pos_pruneset 1580
neg_growset 12336 neg_pruneset 6076
grew rule: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^education=Prof-school^hours.per.week=40-50^age=38-43]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^education.num=10-13^occupation=Exec-managerial^hours.per.week=40-50^workclass=Private^age=38-43] V [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^educatio

pos_growset 2449 pos_pruneset 1207
neg_growset 11881 neg_pruneset 5852
grew rule: [marital.status=Married-civ-spouse^capital.gain=(0^0)^education=Some-college^hours.per.week=40-50^occupation=Craft-repair^native.country=United-States]
pruned rule: [marital.status=Married-civ-spouse^capital.gain=(0^0)^education=Some-college]
updated ruleset: ...[marital.status=Married-civ-spouse^education.num=(13^9)^age=48-55] V [marital.status=Married-civ-spouse^capital.gain=(0^0)^education=Some-college]

pos_growset 2353 pos_pruneset 1159
neg_growset 11838 neg_pruneset 5832
grew rule: [marital.status=Married-civ-spouse^occupation=Exec-managerial^age=48-55^capital.gain=(0^0)]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^capital.gain=(0^0)^education=Some-college] V [marital.status=Married-civ-spouse^occupation=Exec-managerial^age=48-55^capital.gain=(0^0)]

pos_growset 2343 pos_pruneset 1155
neg_growset 11838 neg_pruneset 5831
grew rule: [marital.status=Married-civ-spouse^o

pos_growset 2125 pos_pruneset 1047
neg_growset 11758 neg_pruneset 5792
grew rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^workclass=Private^age=38-43^hours.per.week=37-40]
pruned rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^workclass=Private^age=38-43]
updated ruleset: ...[marital.status=Married-civ-spouse^education=Some-college^age=43-48^hours.per.week=40-50^fnlwgt=158948-178686] V [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^workclass=Private^age=38-43]

pos_growset 2115 pos_pruneset 1043
neg_growset 11754 neg_pruneset 5790
grew rule: [marital.status=Married-civ-spouse^capital.gain=(0^0)^hours.per.week=40-50^education=HS-grad^workclass=Private^age=48-55]
pruned rule: [marital.status=Married-civ-spouse^capital.gain=(0^0)^hours.per.week=40-50^education=HS-grad]
updated ruleset: ...[marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-man

grew rule: [marital.status=Married-civ-spouse^age=43-48^native.country=United-States^workclass=Federal-gov^hours.per.week=37-40^occupation=Adm-clerical^capital.loss=2559-0]
pruned rule: [marital.status=Married-civ-spouse^age=43-48^native.country=United-States^workclass=Federal-gov^hours.per.week=37-40^occupation=Adm-clerical]
updated ruleset: ...[marital.status=Married-civ-spouse^age=43-48^native.country=United-States^workclass=Federal-gov^education=Assoc-acdm] V [marital.status=Married-civ-spouse^age=43-48^native.country=United-States^workclass=Federal-gov^hours.per.week=37-40^occupation=Adm-clerical]

pos_growset 2009 pos_pruneset 990
neg_growset 11732 neg_pruneset 5779
grew rule: [marital.status=Married-civ-spouse^education=Some-college^age=55-68^fnlwgt=178686-197036^workclass=Self-emp-inc]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^age=43-48^native.country=United-States^workclass=Federal-gov^hours.per.week=37-40^occupation=Adm-clerical] V [marital.

grew rule: [marital.status=Married-civ-spouse^education=Some-college^age=43-48^fnlwgt=106812-131298^relationship=Husband]
pruned rule: [marital.status=Married-civ-spouse^education=Some-college^age=43-48^fnlwgt=106812-131298]
updated ruleset: ...[marital.status=Married-civ-spouse^education=Some-college^occupation=Prof-specialty^fnlwgt=(328610^162667)] V [marital.status=Married-civ-spouse^education=Some-college^age=43-48^fnlwgt=106812-131298]

pos_growset 1898 pos_pruneset 936
neg_growset 11688 neg_pruneset 5757
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^age=48-55^hours.per.week=40-50^education=Assoc-voc^occupation=Craft-repair]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^education=Some-college^age=43-48^fnlwgt=106812-131298] V [marital.status=Married-civ-spouse^education.num=10-13^age=48-55^hours.per.week=40-50^education=Assoc-voc^occupation=Craft-repair]

pos_growset 1897 pos_pruneset 935
neg_growset 11688 neg_pruneset 5757
grew r

grew rule: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75]
grew rule: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75]

rule 1 of 87
original: [marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75]
replacement: [marital.status=Married-civ-spouse^education.num=(13^9)]
revision: [marital.status=Married-civ-spouse^education.num=(13^9)]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education=Bachelors^capital.gain=(0^0)^hours.per.week=50-75^occupation=Exec-managerial]
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]

rule 2 of 87
original: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]
replacement: [marital.status=Married-civ-sp


rule 14 of 87
original: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.loss=(2559^0)^workclass=Private^hours.per.week=37-40]
replacement: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors]
revision: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education.num=10-13^occupation=Exec-managerial^age=48-55^workclass=Federal-gov]
grew rule: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^hours.per.week=37-40^race=White^workclass=State-gov^education=Doctorate]

rule 15 of 87
original: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^hours.per.week=37-40^race=White^workclass=State-gov]
replacement: [marital.status=Married-civ-spouse^education.num=10-13]
revision: [marital.status=Married-civ-spouse^occupation=Prof-spe


rule 26 of 87
original: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^hours.per.week=37-40]
replacement: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors]
revision: [marital.status=Married-civ-spouse^occupation=Prof-specialty]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^hours.per.week=40-50^workclass=Private^fnlwgt=175360-65991]
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^hours.per.week=40-50^native.country=United-States^fnlwgt=197036-220187^occupation=Prof-specialty^workclass=Private]

rule 27 of 87
original: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^hours.per.week=40-50^native.country=United-States^fnlwgt=197036-220187^occupation=Prof-specialty]
replacement: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelor

grew rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^race=White^hours.per.week=50-75^fnlwgt=158948-178686]
grew rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^age=48-55^hours.per.week=40-50^workclass=Private^fnlwgt=158948-178686]

rule 40 of 87
original: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^age=48-55^hours.per.week=40-50]
replacement: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial]
revision: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^fnlwgt=220187-260617^age=48-55]
grew rule: [marital.status=Married-civ-spouse^hours.per.week=40-50^occupation=Prof-specialty^education.num=(13^9)^education=Prof-school^nati


rule 52 of 87
original: [marital.status=Married-civ-spouse^capital.gain=(0^0)^hours.per.week=40-50^education=HS-grad]
replacement: [marital.status=Married-civ-spouse^occupation=Sales^workclass=Self-emp-inc]
revision: [marital.status=Married-civ-spouse^capital.gain=(0^0)]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education=Some-college^age=38-43^occupation=Sales^hours.per.week=40-50^fnlwgt=65991-106812]
grew rule: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=(328610^162667)^age=38-43^occupation=Adm-clerical]

rule 53 of 87
original: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=(328610^162667)^age=38-43^occupation=Adm-clerical]
replacement: [marital.status=Married-civ-spouse^education=Some-college^age=38-43^occupation=Sales^hours.per.week=40-50]
revision: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=(328610^162667)]
*best: unchanged
best already i


rule 64 of 87
original: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=178686-197036^hours.per.week=37-40^occupation=Protective-serv]
replacement: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^age=34-38^workclass=Self-emp-inc]
revision: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=178686-197036^hours.per.week=37-40]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education.num=(13^9)^native.country=United-States^occupation=Sales^education=Masters^workclass=Self-emp-inc]
grew rule: [marital.status=Married-civ-spouse^occupation=Prof-specialty^age=34-38^education.num=(13^9)^education=Prof-school]

rule 65 of 87
original: [marital.status=Married-civ-spouse^occupation=Prof-specialty^age=34-38^education.num=(13^9)^education=Prof-school]
replacement: [marital.status=Married-civ-spouse^education.num=(13^9)^native.country=United-States^occupat

grew rule: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^fnlwgt=178686-197036]
grew rule: [marital.status=Married-civ-spouse^occupation=Prof-specialty^age=34-38^education=Assoc-acdm^capital.loss=2559-0]

rule 77 of 87
original: [marital.status=Married-civ-spouse^occupation=Prof-specialty^age=34-38^education=Assoc-acdm]
replacement: [marital.status=Married-civ-spouse^occupation=Prof-specialty^education.num=(13^9)^fnlwgt=178686-197036]
revision: [marital.status=Married-civ-spouse^occupation=Prof-specialty^age=34-38]
*best: unchanged
best already included in optimization -- retaining original

grew rule: [marital.status=Married-civ-spouse^education=Some-college^occupation=Exec-managerial^fnlwgt=131298-158948^native.country=United-States^age=34-38]
grew rule: [marital.status=Married-civ-spouse^education=Some-college^fnlwgt=220187-260617^workclass=Self-emp-inc^hours.per.week=50-75]

rule 78 of 87
original: [marital.status=Married-civ-spouse^education=Some

pos_growset 3898 pos_pruneset 1920
neg_growset 12410 neg_pruneset 6113
grew rule: [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]
pruned rule unchanged
updated ruleset: ...[marital.status=Married-civ-spouse^education.num=(13^9)^capital.gain=(0^0)^hours.per.week=50-75] V [marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)]

pos_growset 3876 pos_pruneset 1910
neg_growset 12410 neg_pruneset 6113
grew rule: [marital.status=Married-civ-spouse^education=Bachelors^occupation=Exec-managerial^hours.per.week=40-50^capital.loss=(2559^0)^workclass=Self-emp-inc]
pruned rule: [marital.status=Married-civ-spouse^education=Bachelors^occupation=Exec-managerial^hours.per.week=40-50^capital.loss=(2559^0)]
updated ruleset: ...[marital.status=Married-civ-spouse^education.num=10-13^education=Bachelors^capital.gain=(0^0)^fnlwgt=(328610^162667)] V [marital.status=Married-civ-sp
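
The pos_growset/pos_pruneset and neg_growset/neg_pruneset counts logged above come from splitting the examples that remain at each iteration into a set used to grow a rule and a set used to prune it, governed by prune_size (0.33 here). The sketch below is not wittgenstein's code: grow_prune_split is a hypothetical helper that assumes the grow set takes floor((1 - prune_size) * n) shuffled examples and the prune set takes the rest. Under that assumption it reproduces the counts from the first logged iteration.

```python
import random

def grow_prune_split(examples, prune_size=0.33, seed=42):
    """Shuffle examples, then split them into (growset, pruneset).

    Hypothetical helper: assumes the grow set takes
    floor((1 - prune_size) * n) examples and the prune set gets the rest.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_grow = int(len(shuffled) * (1 - prune_size))
    return shuffled[:n_grow], shuffled[n_grow:]

# First logged iteration: 3950 + 1947 positives, 12410 + 6113 negatives.
pos_grow, pos_prune = grow_prune_split(range(3950 + 1947))
neg_grow, neg_prune = grow_prune_split(range(12410 + 6113))

print("pos_growset", len(pos_grow), "pos_pruneset", len(pos_prune))
print("neg_growset", len(neg_grow), "neg_pruneset", len(neg_prune))
```

The positive counts shrink from one iteration to the next because examples covered by each accepted rule are removed before the next split.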

The fit method is flexible: it can be called with a single dataframe and class_feat (as above), with separate train_x and train_y, or with numpy arrays.

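The cells below use the congress voting data, with Party as the class feature and democrat as the positive class. Unlike dataframes, arrays don't have feature names...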

In [None]:
X_train, y_train = train.drop('Party', axis=1), train['Party']
X_array, y_array = X_train.values, y_train.values
ripper_clf.fit(X_array, y_array, pos_class='democrat')
ripper_clf.ruleset_

But we can pass them in:

In [7]:
X_array, y_array = train.drop('Party', axis=1).values, train['Party'].values
ripper_clf.fit(X_array, y_array,
               pos_class='democrat', class_feat='Party',
               feature_names=df.columns[1:])  # every column except the first (Party)
ripper_clf.ruleset_

<Ruleset [physician-fee-freeze=n] V [synfuels-corporation-cutback=y^adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n] V [adoption-of-the-budget-resolution=?^Handicapped-infants=y]>

We can force a simpler ruleset using max_rules, max_total_conds, or max_rule_conds.

In [8]:
ripper_clf = lw.RIPPER(max_rules=2, random_state=1)
ripper_clf.fit(train, class_feat='Party', pos_class='democrat')
ripper_clf.ruleset_ 

<Ruleset [physician-fee-freeze=n] V [synfuels-corporation-cutback=y^physician-fee-freeze=?]>
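
To see what the three limits constrain, it helps to count the pieces of a printed ruleset: rules are joined by V (or), and the conditions inside a rule are joined by ^ (and). The sketch below assumes each limit counts what its name suggests (number of rules, total conditions, conditions per rule). parse_ruleset is a hypothetical helper, not part of wittgenstein, and it is deliberately naive: it would mis-split values that themselves contain ^.

```python
def parse_ruleset(text):
    """Parse '[a=1^b=2] V [c=3]' into a list of rules.

    Each rule is a list of (feature, value) pairs. Hypothetical helper
    for illustration; it does not handle values containing '^'.
    """
    rules = []
    for rule_text in text.split(" V "):
        conds = rule_text.strip().strip("[]").split("^")
        rules.append([tuple(cond.split("=", 1)) for cond in conds])
    return rules

# The ruleset printed above, fitted with max_rules=2.
ruleset = parse_ruleset(
    "[physician-fee-freeze=n] V "
    "[synfuels-corporation-cutback=y^physician-fee-freeze=?]"
)

n_rules = len(ruleset)                              # compare with max_rules
total_conds = sum(len(rule) for rule in ruleset)    # compare with max_total_conds
longest_rule = max(len(rule) for rule in ruleset)   # compare with max_rule_conds
print(n_rules, total_conds, longest_rule)
```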

Verbosity allows us to view training steps...

In [9]:
ripper_clf = lw.RIPPER(random_state=42, verbosity=1) # Scale of 0-5
ripper_clf.fit(train, class_feat='Party', pos_class='democrat')
ripper_clf.ruleset_


GREW INITIAL RULESET:
[[physician-fee-freeze=n] V
[synfuels-corporation-cutback=y] V
[superfund-right-to-sue=? ^ export-administration-act-south-africa=y] V
[adoption-of-the-budget-resolution=y ^ anti-satellite-test-ban=n ^ immigration=n]]

optimization run 1 of 2

OPTIMIZED RULESET:
[[physician-fee-freeze=n] V
[synfuels-corporation-cutback=y] V
[superfund-right-to-sue=? ^ export-administration-act-south-africa=y] V
[adoption-of-the-budget-resolution=y ^ anti-satellite-test-ban=n ^ immigration=n]]

No changes were made. Halting optimization.
GREW FINAL RULES
[[physician-fee-freeze=n] V
[synfuels-corporation-cutback=y] V
[superfund-right-to-sue=? ^ export-administration-act-south-africa=y] V
[adoption-of-the-budget-resolution=y ^ anti-satellite-test-ban=n ^ immigration=n] V
[physician-fee-freeze=n] V
[synfuels-corporation-cutback=y] V
[superfund-right-to-sue=? ^ export-administration-act-south-africa=y] V
[adoption-of-the-budget-resolution=y ^ anti-satellite-test-ban=n ^ immigration=n]

<Ruleset [physician-fee-freeze=n] V [adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n^immigration=n]>

#### Model selection

Some sklearn methods are supported. Cross-validation:

In [10]:
from sklearn.model_selection import cross_val_score

# Dummify our data to make sklearn happy
X_train = pd.get_dummies(X_train, columns=X_train.select_dtypes('object').columns)
y_train = y_train.map(lambda x: 1 if x=='democrat' else 0)

ripper_clf = lw.RIPPER(random_state=42)
cross_val_score(ripper_clf, X_train, y_train)


array([0.95454545, 0.90769231, 0.89230769, 0.92307692, 0.90769231])
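
cross_val_score returns one accuracy per fold (five here, sklearn's default). A common way to summarize them is the mean and the spread across folds; the short sketch below does that for the scores printed above using only the standard library.

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the output above.
fold_scores = [0.95454545, 0.90769231, 0.89230769, 0.92307692, 0.90769231]

cv_mean = mean(fold_scores)
cv_spread = stdev(fold_scores)  # sample standard deviation across folds
print(f"accuracy: {cv_mean:.3f} (std across folds: {cv_spread:.3f})")
```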

Grid-search:

In [11]:
from sklearn.model_selection import GridSearchCV
param_grid = {"prune_size": [0.1, 0.25, 0.33, 0.5], "k": [1, 2]}
grid = GridSearchCV(estimator=ripper_clf, param_grid=param_grid)
grid.fit(X_train, y_train)
grid.best_params_

{'k': 1, 'prune_size': 0.33}
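
A grid search tries every combination in param_grid, so the grid above yields 4 x 2 = 8 candidate settings. The sketch below only enumerates those candidates with itertools; it does not fit anything. Assuming GridSearchCV's default 5-fold cross-validation, that amounts to 40 fits plus one final refit on the full training set.

```python
from itertools import product

param_grid = {"prune_size": [0.1, 0.25, 0.33, 0.5], "k": [1, 2]}

# Cartesian product of all parameter values, one dict per candidate.
names = list(param_grid)
candidates = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

print(len(candidates))
for params in candidates:
    print(params)
```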

Ensemble:

In [12]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

nb = GaussianNB()
tree = DecisionTreeClassifier(random_state=42)
estimators = [("rip", ripper_clf), ("tree", tree), ("nb", nb)]
ensemble_clf = StackingClassifier(
  estimators=estimators, final_estimator=LogisticRegression()
)


#### Testing

How good is our model?

In [13]:
X_test = test.drop('Party', axis=1)
y_test = test['Party']
ripper_clf = lw.RIPPER(random_state=42)
ripper_clf.fit(train, class_feat='Party', pos_class='democrat')
ripper_clf.score(X_test, y_test) # Default metric is accuracy

0.908256880733945

We can also score it on custom metrics, including sklearn's:

In [14]:
from sklearn.metrics import precision_score, recall_score
precision = ripper_clf.score(X_test, y_test, precision_score)
recall = ripper_clf.score(X_test, y_test, recall_score)
print(f'precision: {precision}')
print(f'recall: {recall}')

precision: 1.0
recall: 0.855072463768116
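
A precision of 1.0 with a recall of about 0.855 means every predicted democrat really was one, while some democrats were missed. The sketch below computes both metrics from boolean labels in plain Python. The confusion counts are not taken from the data: they are inferred, as an assumption, from the reported scores, which are consistent with 59 of 69 democrats recovered, no false positives, and 40 correctly rejected examples.

```python
def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall and accuracy for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, correct / len(y_true)

# Inferred counts (an assumption): 59 TP, 10 FN, 0 FP, 40 TN.
y_true = [True] * 59 + [True] * 10 + [False] * 40
y_pred = [True] * 59 + [False] * 10 + [False] * 40

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(f"precision: {precision}")
print(f"recall: {recall}")
print(f"accuracy: {accuracy}")
```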


#### Prediction

To make predictions, use the predict method.

In [15]:
ripper_clf.predict(X_test.tail(10))

[False, True, True, True, False, False, False, True, False, True]

For predicted probabilities, use predict_proba.

In [16]:
ripper_clf.predict_proba(X_test.tail(10))

array([[0.75903614, 0.24096386],
       [0.01086957, 0.98913043],
       [0.01086957, 0.98913043],
       [0.01086957, 0.98913043],
       [0.75903614, 0.24096386],
       [0.75903614, 0.24096386],
       [0.75903614, 0.24096386],
       [0.01086957, 0.98913043],
       [0.75903614, 0.24096386],
       [0.01086957, 0.98913043]])
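
Each row of the predict_proba output sums to 1. Assuming the second column is the probability of the positive class and that predict applies a 0.5 cutoff (both are assumptions, not documented behavior quoted here), thresholding the rows above reproduces the predict output shown earlier for the same ten examples.

```python
# Rows copied from the predict_proba output above.
probas = [
    [0.75903614, 0.24096386],
    [0.01086957, 0.98913043],
    [0.01086957, 0.98913043],
    [0.01086957, 0.98913043],
    [0.75903614, 0.24096386],
    [0.75903614, 0.24096386],
    [0.75903614, 0.24096386],
    [0.01086957, 0.98913043],
    [0.75903614, 0.24096386],
    [0.01086957, 0.98913043],
]

# Assumed convention: column 1 is P(positive class); predict True above 0.5.
predictions = [p_pos > 0.5 for _, p_pos in probas]
print(predictions)
```

Only two distinct rows appear, which suggests the probabilities depend on whether an example is covered by the ruleset rather than on which rule covers it.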

We can also ask our model to give us the reasons for its predictions.

In [18]:
ripper_clf.predict(X_test.tail(), give_reasons=True)

([False, False, True, False, True],
 [[],
  [],
  [<Rule [physician-fee-freeze=n]>],
  [],
  [<Rule [physician-fee-freeze=n]>]])
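
In the output above, each positive prediction comes with the rule that covered the example, and each negative prediction comes with an empty list. The sketch below imitates that behavior in plain Python under the assumption that an example is predicted positive when at least one rule covers it. The functions covers and predict_with_reasons and the two sample voters are hypothetical; the ruleset is the one printed earlier for random_state=42.

```python
def covers(rule, example):
    """A rule covers an example when every (feature, value) condition matches."""
    return all(example.get(feature) == value for feature, value in rule)

def predict_with_reasons(ruleset, examples):
    """Return (predictions, reasons); reasons lists the covering rules."""
    predictions, reasons = [], []
    for example in examples:
        matched = [rule for rule in ruleset if covers(rule, example)]
        predictions.append(bool(matched))  # positive if any rule fires
        reasons.append(matched)
    return predictions, reasons

# [physician-fee-freeze=n] V
# [adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n^immigration=n]
ruleset = [
    [("physician-fee-freeze", "n")],
    [
        ("adoption-of-the-budget-resolution", "y"),
        ("anti-satellite-test-ban", "n"),
        ("immigration", "n"),
    ],
]

# Hypothetical voters, for illustration only.
voters = [
    {"physician-fee-freeze": "y", "immigration": "y"},
    {"physician-fee-freeze": "n", "immigration": "y"},
]

preds, why = predict_with_reasons(ruleset, voters)
print(preds)
print(why)
```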