### Example of IREP usage on the congress dataset

In [1]:
import ruleset
import pandas as pd

Load our dataset:

In [2]:
df = pd.read_csv('../datasets/house-votes-84.csv')

Split our data into train and test sets (`train_test_split` holds out 25% of the rows by default):

In [3]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, random_state=0)

Create a ruleset classifier:

In [4]:
irep_clf = ruleset.IREP()
irep_clf

<IREP object (unfit)>

Train the ruleset classifier on the training set:

In [5]:
irep_clf.fit(train, class_feat='Party', random_state=0)
irep_clf.ruleset_ # Access underlying model

<Ruleset object: [physician-fee-freeze=n] V [synfuels-corporation-cutback=y]>

Setting `verbosity` lets us trace the training steps as each rule is grown and pruned:

In [6]:
irep_clf.verbosity = 2 # Scale of 1-5
irep_clf.fit(train, class_feat='Party', random_state=0)
irep_clf.ruleset_

pos_growset 135 pos_pruneset 67
neg_growset 83 neg_pruneset 41
grew rule: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y]
pruned rule: [physician-fee-freeze=n]
updated ruleset: [physician-fee-freeze=n]

pos_growset 11 pos_pruneset 6
neg_growset 81 neg_pruneset 41
grew rule: [synfuels-corporation-cutback=y^immigration=n^superfund-right-to-sue=n]
pruned rule: [synfuels-corporation-cutback=y]
updated ruleset: ...[physician-fee-freeze=n] V [synfuels-corporation-cutback=y]

pos_growset 3 pos_pruneset 2
neg_growset 71 neg_pruneset 36
grew rule: [education-spending=n^physician-fee-freeze=?]
pruned rule unchanged

GREW RULESET:
[[physician-fee-freeze=n] V
[synfuels-corporation-cutback=y]]



<Ruleset object: [physician-fee-freeze=n] V [synfuels-corporation-cutback=y]>
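The first two lines of each iteration report how the remaining positive and negative examples are divided between a growing set and a pruning set. In the first iteration, 135 of 202 positives and 83 of 124 negatives go to growing, which is consistent with holding out about a third for pruning. A minimal sketch of such a split follows; the helper name `grow_prune_split`, the 1/3 fraction, and the plain shuffle are assumptions for illustration, and the library's actual splitting logic may differ.

```python
import random

def grow_prune_split(examples, prune_size=1/3, seed=0):
    """Shuffle examples and hold out roughly `prune_size` of them for pruning.

    Hypothetical helper, not part of the ruleset package.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_prune = int(len(shuffled) * prune_size)  # round down
    return shuffled[n_prune:], shuffled[:n_prune]

# Stand-ins for the 202 positive and 124 negative training examples.
positives = range(202)
negatives = range(124)

pos_grow, pos_prune = grow_prune_split(positives)
neg_grow, neg_prune = grow_prune_split(negatives)

print(len(pos_grow), len(pos_prune))  # 135 67
print(len(neg_grow), len(neg_prune))  # 83 41
```

Rules are grown on the first part and then simplified against the second, which is why the grown rule above is longer than the pruned one.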

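With IREP, we can turn off pruning by passing `prune_size=None`. It's fun, but not recommended: without pruning, the ruleset grows much longer and more specific to the training data, as the output below shows.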

In [7]:
unpruned_irep_clf = ruleset.IREP(prune_size=None)
unpruned_irep_clf

<IREP object (unfit)>

In [8]:
unpruned_irep_clf.fit(train, class_feat='Party', random_state=0)
unpruned_irep_clf.ruleset_ # Access underlying model

<Ruleset object: [physician-fee-freeze=n^adoption-of-the-budget-resolution=y] V [synfuels-corporation-cutback=y^physician-fee-freeze=n] V [synfuels-corporation-cutback=y^adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n] V [mx-missile=y^duty-free-exports=y^physician-fee-freeze=n] V [physician-fee-freeze=?^immigration=y] V [immigration=n^synfuels-corporation-cutback=y^export-administration-act-south-africa=n^Handicapped-infants=y] V [immigration=n^education-spending=n^mx-missile=y] V [physician-fee-freeze=n^anti-satellite-test-ban=?] V [export-administration-act-south-africa=?^el-salvador-aid=?^mx-missile=y] V [immigration=n^synfuels-corporation-cutback=y^export-administration-act-south-africa=n^Water-project-cost-sharing=n] V [physician-fee-freeze=n^education-spending=y] V [export-administration-act-south-africa=?^adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n] V [synfuels-corporation-cutback=y^superfund-right-to-sue=n^immigration=n]>

How good is our model?

In [9]:
test_X = test.drop('Party', axis=1)
test_y = test['Party']
irep_clf.score(test_X, test_y) # Default metric is accuracy

0.926605504587156

We can also score it on custom metrics, including sklearn's:

In [11]:
from sklearn.metrics import precision_score
irep_clf.score(test_X, test_y, precision_score)

0.9130434782608695
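A custom metric can be read as a function that takes the true and predicted labels in the same `(y_true, y_pred)` order as sklearn's metrics. Here is a minimal sketch of a hand-written recall function in that shape. Whether `score` accepts an arbitrary callable like this one is an assumption based on the `precision_score` call above, and the label lists are invented for illustration.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    actual_pos = sum(1 for t in y_true if t)
    return true_pos / actual_pos if actual_pos else 0.0

# Hypothetical boolean labels, not taken from the congress dataset.
y_true = [True, True, False, True, False]
y_pred = [True, False, False, True, True]

print(recall(y_true, y_pred))  # 2 of the 3 actual positives were found
```

Under that assumption, `recall` would be passed to `score` in place of `precision_score`.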

To make predictions:

In [12]:
irep_clf.predict(test_X.tail())

[True, False, True, False, False]

For explainability, we can query the reasons responsible for each prediction:

In [13]:
irep_clf.predict(test_X.tail(), give_reasons=True)

([True, False, True, False, False],
 [[<Rule object: [physician-fee-freeze=n]>,
   <Rule object: [synfuels-corporation-cutback=y]>],
  [],
  [<Rule object: [physician-fee-freeze=n]>,
   <Rule object: [synfuels-corporation-cutback=y]>],
  [],
  []])
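The output above can be read as follows: the ruleset is a disjunction (`V`) of rules, each rule is a conjunction (`^`) of `feature=value` conditions, an example is predicted positive when at least one rule matches, and the reasons are the rules that matched. A minimal sketch of that reading in plain Python; the dict-based rule representation and the `predict_with_reasons` helper are illustrative assumptions, not the library's internals, and the voting records are made up.

```python
# Ruleset copied by hand from the fitted model's printed output.
# Each rule maps feature -> required value; all conditions must hold.
rules = [
    {"physician-fee-freeze": "n"},
    {"synfuels-corporation-cutback": "y"},
]

def predict_with_reasons(rules, example):
    """Return (prediction, matching_rules) for a single example."""
    reasons = [
        rule for rule in rules
        if all(example.get(feat) == val for feat, val in rule.items())
    ]
    # Positive if any rule fired; the fired rules are the explanation.
    return bool(reasons), reasons

# Hypothetical voting records, invented for illustration.
both = {"physician-fee-freeze": "n", "synfuels-corporation-cutback": "y"}
neither = {"physician-fee-freeze": "y", "synfuels-corporation-cutback": "n"}

print(predict_with_reasons(rules, both))     # True, with both rules as reasons
print(predict_with_reasons(rules, neither))  # (False, [])
```

A negative prediction carries an empty reason list because no rule covers the example, matching the `[]` entries in the output above.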