### Example: using RIPPER on the congressional voting dataset

In [1]:
import wittgenstein as lw
import pandas as pd

Load our dataset:

In [2]:
df = pd.read_csv('../datasets/house-votes-84.csv')

Split our data into train-test sets:

In [3]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, random_state=0)

Create a ruleset classifier:

In [4]:
ripper_clf = lw.RIPPER()
ripper_clf

<RIPPER object (unfit) (k=2, prune_size=0.33, dl_allowance=64)>

Train the ruleset classifier on the training set:

In [5]:
ripper_clf.fit(train, class_feat='Party', random_state=0)
ripper_clf.ruleset_ # Access underlying model

<Ruleset object: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V [synfuels-corporation-cutback=y^physician-fee-freeze=n]>
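
The printed ruleset reads as a disjunction (V) of rules, where each rule is a conjunction (^) of feature=value conditions: an example is predicted positive when at least one rule matches all of its conditions. The sketch below illustrates that reading in plain Python. The dict-based representation, the helper names `rule_covers` and `ruleset_predicts`, and the two voting records are assumptions made for illustration; they are not wittgenstein's internal data structures or API.

```python
# Illustrative stand-in for the fitted ruleset shown above.
# Each rule is a dict of feature -> required value (a conjunction);
# the ruleset is a list of such rules (a disjunction).
ruleset = [
    {"physician-fee-freeze": "n", "adoption-of-the-budget-resolution": "y"},
    {"synfuels-corporation-cutback": "y", "physician-fee-freeze": "n"},
]


def rule_covers(rule, record):
    """Return True when every condition in the rule matches the record."""
    return all(record.get(feat) == val for feat, val in rule.items())


def ruleset_predicts(ruleset, record):
    """Return True when at least one rule covers the record."""
    return any(rule_covers(rule, record) for rule in ruleset)


# Hypothetical voting records, made up for this example.
record_a = {
    "physician-fee-freeze": "n",
    "adoption-of-the-budget-resolution": "y",
    "synfuels-corporation-cutback": "n",
}
record_b = {
    "physician-fee-freeze": "y",
    "adoption-of-the-budget-resolution": "y",
    "synfuels-corporation-cutback": "y",
}

print(ruleset_predicts(ruleset, record_a))  # True: the first rule matches
print(ruleset_predicts(ruleset, record_b))  # False: both rules need physician-fee-freeze=n
```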

Setting verbosity lets us see the training steps as they happen:

In [6]:
ripper_clf.verbosity = 1 # Scale of 1-5
ripper_clf.fit(train, class_feat='Party', random_state=0)
ripper_clf.ruleset_


GREW INITIAL RULESET:
[[physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V
[synfuels-corporation-cutback=y^physician-fee-freeze=n] V
[synfuels-corporation-cutback=y^mx-missile=y] V
[adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n] V
[physician-fee-freeze=n] V
[Handicapped-infants=?] V
[synfuels-corporation-cutback=y^superfund-right-to-sue=n]]

optimization run 1 of 2

OPTIMIZED RULESET:
[[physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V
[synfuels-corporation-cutback=y^physician-fee-freeze=n] V
[synfuels-corporation-cutback=y^mx-missile=y] V
[adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n] V
[physician-fee-freeze=n] V
[Handicapped-infants=?] V
[synfuels-corporation-cutback=y^superfund-right-to-sue=n]]

No changes were made. Halting optimization.
GREW FINAL RULES
[[physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V
[synfuels-corporation-cutback=y^physician-fee-freeze=n] V
[synfuels-corporation-cutback=y^mx-missile=y] V
[ado

<Ruleset object: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V [synfuels-corporation-cutback=y^physician-fee-freeze=n]>

How good is our model?

In [7]:
test_X = test.drop('Party', axis=1)
test_y = test['Party']
ripper_clf.score(test_X, test_y) # Default metric is accuracy

0.8990825688073395

We can also score the model with a custom metric, such as one from sklearn:

In [8]:
from sklearn.metrics import precision_score
ripper_clf.score(test_X, test_y, precision_score)

1.0
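
A precision of 1.0 alongside an accuracy of about 0.90 means the model produced no false positives on the test set, so every error it made was a missed positive. The sketch below shows how both metrics follow from the confusion counts. The helper names and the ten toy labels are assumptions made for illustration; they are not the notebook's test data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Fraction of positive predictions that are truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Returning 0.0 when nothing is predicted positive is a choice made here.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels (illustrative only): two positives are missed, none are invented.
y_true = [True, True, True, True, False, False, False, False, False, False]
y_pred = [True, True, False, False, False, False, False, False, False, False]

print(accuracy(y_true, y_pred))   # 0.8
print(precision(y_true, y_pred))  # 1.0
```

A conservative ruleset like this one trades recall for precision: it only fires when a rule matches, so its mistakes tend to be positives that no rule covers.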

To make predictions, use the predict method, or use predict_proba to get predicted probabilities.

In [9]:
ripper_clf.predict(test_X.tail(10))

[False, True, True, True, True, True, False, True, False, False]

For explainability, we can ask which rules were responsible for each prediction (negative predictions have none):

In [10]:
ripper_clf.predict(test_X.tail(), give_reasons=True)

([True, False, True, False, False],
 [[<Rule object: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y]>,
   <Rule object: [synfuels-corporation-cutback=y^physician-fee-freeze=n]>],
  [],
  [<Rule object: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y]>,
   <Rule object: [synfuels-corporation-cutback=y^physician-fee-freeze=n]>],
  [],
  []])
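
The result is a pair: a list of predictions and a parallel list holding the rules that covered each example. The sketch below shows one way to walk the two lists together and turn them into readable explanations. It uses a stand-in for the output above, with plain strings in place of Rule objects, and the helper name `explain` is hypothetical.

```python
# Stand-in for the (predictions, reasons) tuple printed above;
# strings replace the Rule objects so the example runs on its own.
RULE_1 = "[physician-fee-freeze=n^adoption-of-the-budget-resolution=y]"
RULE_2 = "[synfuels-corporation-cutback=y^physician-fee-freeze=n]"

predictions, reasons = (
    [True, False, True, False, False],
    [[RULE_1, RULE_2], [], [RULE_1, RULE_2], [], []],
)


def explain(predictions, reasons):
    """Pair each prediction with the rules that covered that example."""
    lines = []
    for i, (pred, rules) in enumerate(zip(predictions, reasons)):
        if rules:
            why = " and ".join(str(rule) for rule in rules)
            lines.append(f"example {i}: predicted {pred}, covered by {why}")
        else:
            lines.append(f"example {i}: predicted {pred}; no rule covers it")
    return lines


for line in explain(predictions, reasons):
    print(line)
```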