# Create and evaluate handcrafted classification rules in decision-rules

In this tutorial we will create and evaluate decision rules for classification.

We begin by loading the iris dataset into a DataFrame.

In [1]:
import pandas as pd
IRIS_PATH = 'resources/iris.csv'
iris_df = pd.read_csv(IRIS_PATH)
display(iris_df)
print('Columns: ', iris_df.columns.values)
print('Class names:', iris_df['class'].unique())

Unnamed: 0,sepallength,sepalwidth,petallength,petalwidth,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


Columns:  ['sepallength' 'sepalwidth' 'petallength' 'petalwidth' 'class']
Class names: ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


The task is to predict the class of an example (Iris-setosa, Iris-versicolor or Iris-virginica) using the values in the other columns ('sepallength', 'sepalwidth', 'petallength', 'petalwidth'). We will store the predictors in the X variable and the target in y.

In [2]:
X = iris_df.drop(columns=['class'])
y = iris_df['class']



Someone suggested the following simple rules:

1. If $petallength < 2.5$, then it's Iris-setosa.
2. If $petallength \geq 2.5$ and $petalwidth < 1.65$, then it's Iris-versicolor.
3. Otherwise, it's Iris-virginica.
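
Read as plain Python, the three rules above amount to a short chain of checks. The sketch below only illustrates the logic; the function name `classify_iris` is made up for this example and is not part of decision-rules.

```python
def classify_iris(petallength: float, petalwidth: float) -> str:
    """Apply the three handcrafted rules in order (illustrative helper)."""
    if petallength < 2.5:
        return 'Iris-setosa'
    # From here on petallength >= 2.5 holds, so only petalwidth is checked.
    if petalwidth < 1.65:
        return 'Iris-versicolor'
    # Rule 3: the fallback when neither rule above applies.
    return 'Iris-virginica'


print(classify_iris(1.4, 0.2))
print(classify_iris(4.5, 1.5))
print(classify_iris(5.2, 2.3))
```

The rest of the tutorial expresses the same logic with decision-rules objects, which makes the rules measurable and serializable.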

Let's implement them, starting with the first one.

In [3]:
from decision_rules.classification.rule import ClassificationConclusion
from decision_rules.classification.rule import ClassificationRule
from decision_rules.conditions import ElementaryCondition, CompoundCondition

rule_1 = ClassificationRule(
    premise=ElementaryCondition(
        column_index=X.columns.get_loc('petallength'),
        right = 2.5
    ),
    conclusion=ClassificationConclusion(
        value='Iris-setosa',
        column_name='class',
    ),
    column_names=X.columns,
)

We use the `ClassificationRule` class to create the rule. Every rule has two parts: a premise (e.g. $petallength < 2.5$) and a conclusion (e.g. Iris-setosa).

You can create a premise using one of the conditions from decision_rules.conditions:

- `NominalCondition`: checks if a value of the attribute is equal to a value, e.g. $x = 1$. Useful for nominal attributes.
- `ElementaryCondition`: checks if a value is inside an interval, e.g. $x \in [2.3, 3.1)$.
- `CompoundCondition`: a conjunction or disjunction of `ElementaryCondition`s, e.g. $x \in [2.3, 3.1)$ and $y \in (8, +\infty)$.
- `AttributesCondition`: checks if a relationship between two attributes is met, e.g. $x < y$.

For this rule, the premise $petallength < 2.5$ can be written as $petallength \in (-\infty, 2.5)$, so we can use `ElementaryCondition`. The `column_index` argument is the index of the column holding the relevant attribute (petallength). The arguments `left` and `right` are the boundaries of the interval; they default to minus and plus infinity, respectively. The interval is open by default; to close either end, set `left_closed` or `right_closed` to `True`.
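
The interval semantics described above can be sketched in plain Python. This is not the library's implementation: `in_interval` is a hypothetical helper that only mirrors the argument names and defaults described in the text.

```python
import math


def in_interval(value, left=-math.inf, right=math.inf,
                left_closed=False, right_closed=False):
    """Check interval membership (hypothetical helper, not the library code)."""
    above_left = value >= left if left_closed else value > left
    below_right = value <= right if right_closed else value < right
    return above_left and below_right


# petallength < 2.5, i.e. the open interval (-inf, 2.5)
print(in_interval(1.4, right=2.5))
print(in_interval(2.5, right=2.5))
# Closing the right end makes the boundary value a member.
print(in_interval(2.5, right=2.5, right_closed=True))
```

With the default open interval, the boundary value 2.5 itself is excluded, which is why rule 2 later sets `left_closed=True` to express $petallength \geq 2.5$.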

The conclusion is a `ClassificationConclusion` object. It accepts two arguments: the predicted value (Iris-setosa) and the column name (class).

Now, let's create rule 2. Its premise is a conjunction (two parts joined by "and"), so we will use `CompoundCondition`. It has one required argument named `subconditions`, which should be a list of `ElementaryCondition` objects.

In [4]:
rule_2 = ClassificationRule(
    premise=CompoundCondition(
        subconditions=[
            ElementaryCondition(
                column_index=X.columns.get_loc('petallength'),
                left = 2.5,
                left_closed=True,
            ),
            ElementaryCondition(
                column_index=X.columns.get_loc('petalwidth'),
                right = 1.65,
            ),
        ]
    ),
    conclusion=ClassificationConclusion(
        value='Iris-versicolor',
        column_name='class',
    ),
    column_names=X.columns,
)
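
Conceptually, a conjunction covers an example only when every subcondition does. The sketch below shows that idea with plain Python predicates; the helper name `covers_rule_2` and the dict-based examples are assumptions made for illustration and say nothing about how `CompoundCondition` is implemented internally.

```python
def covers_rule_2(example: dict) -> bool:
    """Return True when both parts of the rule 2 premise hold (illustrative)."""
    subconditions = [
        lambda e: e['petallength'] >= 2.5,  # left = 2.5, left_closed=True
        lambda e: e['petalwidth'] < 1.65,   # right = 1.65, open interval
    ]
    # A conjunction is satisfied only if all subconditions are satisfied.
    return all(condition(example) for condition in subconditions)


print(covers_rule_2({'petallength': 4.5, 'petalwidth': 1.5}))
print(covers_rule_2({'petallength': 1.4, 'petalwidth': 0.2}))
print(covers_rule_2({'petallength': 5.2, 'petalwidth': 2.3}))
```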

Now that we have both explicit rules, we can create the rule set. To do this, we use the `ClassificationRuleSet` class. It has one mandatory argument, `rules`, which is a list of `ClassificationRule` objects.

In [5]:
from decision_rules.classification.ruleset import ClassificationRuleSet

ruleset = ClassificationRuleSet(rules=[rule_1, rule_2])

We still need to add the 3rd rule (otherwise, it's Iris-virginica). We implement such a rule using the `default_conclusion` property of a ruleset.

In [6]:
ruleset.default_conclusion = ClassificationConclusion(
    value='Iris-virginica',
    column_name='class',
)

Now the rule set is almost ready. After defining the rules, you should call the `update` method. It accepts three arguments:

- a pd.DataFrame of predictors (X)
- a pd.Series of target values (y)
- a measure of rule quality

This method computes the coverage matrix, which the ruleset object can later use for prediction.
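
A coverage matrix can be pictured as a boolean table with one row per example and one column per rule, where an entry is true when the rule's premise holds for that example. The sketch below builds such a table by hand for three made-up examples; the layout (rows as examples, columns as rules) is an assumption based on the two-column output shown in this tutorial, and the code does not call the library.

```python
# Three illustrative examples, one for each expected class.
examples = [
    {'petallength': 1.4, 'petalwidth': 0.2},
    {'petallength': 4.5, 'petalwidth': 1.5},
    {'petallength': 5.2, 'petalwidth': 2.3},
]

# The premises of rule 1 and rule 2 written as plain predicates.
premises = [
    lambda e: e['petallength'] < 2.5,
    lambda e: e['petallength'] >= 2.5 and e['petalwidth'] < 1.65,
]

# coverage[i][j] is True when premise j holds for example i.
coverage = [[premise(e) for premise in premises] for e in examples]
for row in coverage:
    print(row)
```

The last example is covered by neither rule, which is the situation the default conclusion is meant to handle.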

In [7]:
from decision_rules.measures import accuracy
coverage_matrix = ruleset.update(X, y, accuracy)
display(coverage_matrix)

array([[ True, False],
       [ True, False],
       [ True, False],
       ...

The ruleset is now ready. Let's see what it predicts for the data stored in the X variable.

In [8]:
# The predictions for each row in X will be stored in the y_pred array.
y_pred = ruleset.predict(X)
# The compare_df will show us the true class and the prediction for each example.
compare_df = iris_df.copy()
compare_df['predictions'] = y_pred
with pd.option_context('display.max_rows', 150):
    display(compare_df)

Unnamed: 0,sepallength,sepalwidth,petallength,petalwidth,class,predictions
0,5.1,3.5,1.4,0.2,Iris-setosa,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa,Iris-setosa


We will now check how well our rules perform on the data using various metrics. The `calculate_for_classification` function computes typical classification metrics, such as accuracy, F1 or Cohen's kappa.

In [11]:
from decision_rules.classification.prediction_indicators import calculate_for_classification
metrics = calculate_for_classification(y, y_pred)
display(metrics)

{'type_of_problem': 'classification',
 'general': {'Balanced_accuracy': 0.96,
  'Accuracy': 0.96,
  'Cohen_kappa': 0.94,
  'F1_micro': 0.96,
  'F1_macro': 0.9599839935974389,
  'F1_weighted': 0.9599839935974391,
  'G_mean_micro': 0.9699484522385713,
  'G_mean_macro': 0.9699484522385713,
  'G_mean_weighted': 0.9699484522385713,
  'Recall_micro': 0.96,
  'Recall_macro': 0.96,
  'Recall_weighted': 0.96,
  'Specificity': 1.0,
  'Confusion_matrix': {'classes': ['Iris-setosa',
    'Iris-versicolor',
    'Iris-virginica'],
   'Iris-setosa': [50, 0, 0],
   'Iris-versicolor': [0, 48, 2],
   'Iris-virginica': [0, 4, 46]}},
 'for_classes': {'Iris-setosa': {'TP': 50,
   'FP': 0,
   'TN': 100,
   'FN': 0,
   'Recall': 1.0,
   'Specificity': 1.0,
   'F1_score': 1.0,
   'G_mean': 1.0,
   'MCC': 1.0,
   'PPV': 1.0,
   'NPV': 1.0,
   'LR_plus': 0,
   'LR_minus': 0.0,
   'Odd_ratio': 0,
   'Relative_risk': 0,
   'Confusion_matrix': {'classes': ['Iris-setosa', 'other'],
    'Iris-setosa': [50, 0],
    'o
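
The headline figures in the output above can be reproduced by hand from its confusion matrix. The sketch below uses the textbook definitions of accuracy and Cohen's kappa and does not call the library; it assumes that each row of the matrix corresponds to a true class, although both quantities come out the same if rows and columns are swapped.

```python
# Confusion matrix copied from the output above
# (class order: setosa, versicolor, virginica).
confusion = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 4, 46],
]

total = sum(sum(row) for row in confusion)

# Observed agreement: the share of examples on the diagonal (accuracy).
observed = sum(confusion[i][i] for i in range(len(confusion))) / total

# Agreement expected by chance, from the row and column totals.
row_totals = [sum(row) for row in confusion]
col_totals = [sum(col) for col in zip(*confusion)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

kappa = (observed - expected) / (1 - expected)
print('accuracy:', round(observed, 2))
print('kappa:', round(kappa, 2))
```

Both values agree with the `Accuracy` and `Cohen_kappa` entries reported above (0.96 and 0.94).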

The `calculate_rules_metrics` function of a ruleset object computes metrics describing each of the rules in the rule set.

TODO: What do all these values mean?

In [9]:
rules_metrics = ruleset.calculate_rules_metrics(X, y)
# Use a separate loop variable so the dict being iterated is not shadowed.
for rule_id, rule_metrics in rules_metrics.items():
    print('Rule', rule_id)
    print(rule_metrics)

Rule 75e15631-c422-46eb-8ca3-cbd1350ec01c
{'p': 50, 'n': 0, 'P': 50, 'N': 100, 'p_unique': 50, 'n_unique': 50, 'support': 50, 'conditions_count': 1, 'precision': 1.0, 'coverage': 1.0, 'C2': 1.0, 'RSS': 1.0, 'correlation': 1.0, 'lift': 3.0, 'p_value': 4.968040370318492e-41, 'TP': 50, 'FP': 0, 'TN': 100, 'FN': 0, 'sensitivity': 1.0, 'specificity': 1.0, 'negative_predictive_value': 1.0, 'odds_ratio': inf, 'relative_risk': inf, 'lr+': inf, 'lr-': 0.0}
Rule 956b1919-c417-45d8-b124-48a91caf6a2c
{'p': 48, 'n': 4, 'P': 50, 'N': 100, 'p_unique': 48, 'n_unique': 48, 'support': 52, 'conditions_count': 2, 'precision': 0.9230769230769231, 'coverage': 0.96, 'C2': 0.8669230769230769, 'RSS': 0.9199999999999999, 'correlation': 0.9112931795128765, 'lift': 2.769230769230769, 'p_value': 6.403421751602081e-32, 'TP': 48, 'FP': 4, 'TN': 96, 'FN': 2, 'sensitivity': 0.96, 'specificity': 0.96, 'negative_predictive_value': 0.9795918367346939, 'odds_ratio': 576.0, 'relative_risk': 45.23076923076923, 'lr+': 23.999
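
Several of the printed values follow from four counts: `p` and `n` (positive and negative examples covered by the rule) and `P` and `N` (all positive and negative examples for the rule's class). The sketch below recomputes a few of them using the usual definitions; this reading of the counts is an assumption that happens to match the numbers above, and `basic_rule_metrics` is a hypothetical helper, not a library function.

```python
def basic_rule_metrics(p: int, n: int, P: int, N: int) -> dict:
    """Recompute a few rule metrics from coverage counts (illustrative)."""
    precision = p / (p + n)   # share of covered examples that are positive
    coverage = p / P          # share of all positives that the rule covers
    prior = P / (P + N)       # share of positives in the whole dataset
    return {
        'support': p + n,
        'precision': precision,
        'coverage': coverage,
        'lift': precision / prior,
    }


# Counts taken from the output above for the two rules.
print(basic_rule_metrics(p=50, n=0, P=50, N=100))
print(basic_rule_metrics(p=48, n=4, P=50, N=100))
```

For the second rule this gives a support of 52, a precision of about 0.923, a coverage of 0.96 and a lift of about 2.77, in line with the printed values.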

The `calculate_ruleset_stats` function returns some general stats regarding the rules present in the rule set.

In [13]:
general_stats = ruleset.calculate_ruleset_stats()
print(general_stats)

{'rules_count': 2, 'avg_conditions_count': 1.0, 'avg_precision': 0.96, 'avg_coverage': 0.98, 'total_conditions_count': 3}


We can also calculate the condition importances and attribute importances.

In [17]:
condition_importances = ruleset.calculate_condition_importances(X, y, accuracy)
print(condition_importances)
attribute_importances = ruleset.calculate_attribute_importances(condition_importances)
print(attribute_importances)

{'Iris-setosa': [], 'Iris-versicolor': [{'condition': 'petallength >= 2.50', 'attributes': ['petallength'], 'importance': 25.0}, {'condition': 'petalwidth < 1.65', 'attributes': ['petalwidth'], 'importance': 19.0}]}
{'Iris-setosa': {}, 'Iris-versicolor': {'petallength': 25.0, 'petalwidth': 19.0}}


We can serialize the ruleset into a dict using `JSONSerializer.serialize`. The dict can be later stored in a string or in a text file.
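
Storing such a dict in a string and reading it back is a plain `json` round trip. The sketch below uses a small hand-written dict that imitates one serialized premise rather than a real ruleset, so it runs without the library; note that Python's `None` is written as JSON `null` and restored as `None`.

```python
import json

# A hand-written fragment shaped like a serialized premise (illustrative only).
premise = {
    'type': 'elementary_numerical',
    'left': None,
    'right': 2.5,
    'left_closed': False,
    'right_closed': False,
}

as_text = json.dumps(premise)   # dict -> JSON string
restored = json.loads(as_text)  # JSON string -> dict

print(as_text)
print(restored == premise)
```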

In [18]:
import os
import json
from decision_rules.serialization import JSONSerializer

OUTPUT_DIR = 'output'
RULESET_FILENAME = 'manual_iris.json'
os.makedirs(OUTPUT_DIR, exist_ok=True)
ruleset_path = os.path.join(OUTPUT_DIR, RULESET_FILENAME)
# Serialize the ruleset
ruleset_dict = JSONSerializer.serialize(ruleset)
display(ruleset_dict)
# Save to JSON
with open(ruleset_path, 'w') as fp:
    json.dump(ruleset_dict, fp)

{'meta': {'attributes': ['sepallength',
   'sepalwidth',
   'petallength',
   'petalwidth'],
  'decision_attribute': 'class',
  'decision_attribute_distribution': {'Iris-setosa': 50,
   'Iris-versicolor': 50,
   'Iris-virginica': 50}},
 'rules': [{'uuid': '75e15631-c422-46eb-8ca3-cbd1350ec01c',
   'string': 'IF petallength < 2.50 THEN class = Iris-setosa',
   'premise': {'type': 'elementary_numerical',
    'left': None,
    'right': 2.5,
    'left_closed': False,
    'right_closed': False},
   'conclusion': {'value': 'Iris-setosa'},
   'coverage': {'p': 50, 'n': 0, 'P': 50, 'N': 100}},
  {'uuid': '956b1919-c417-45d8-b124-48a91caf6a2c',
   'string': 'IF petallength >= 2.50 AND petalwidth < 1.65 THEN class = Iris-versicolor',
   'premise': {'type': 'compound',
    'operator': 'CONJUNCTION',
    'subconditions': [{'type': 'elementary_numerical',
      'attributes': [2],
      'negated': False,
      'left': 2.5,
      'right': None,
      'left_closed': True,
      'right_closed': False},

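The ruleset can be loaded using `JSONSerializer.deserialize`. Note that in the run recorded below the call fails with a pydantic `ValidationError`: the serialized premise of the first rule has no `attributes` field, which the deserializer appears to require.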

In [19]:
with open(ruleset_path) as fp:
    reloaded_json = json.load(fp)
reloaded_ruleset = JSONSerializer.deserialize(reloaded_json, ClassificationRuleSet)
assert reloaded_ruleset == ruleset

ValidationError: 1 validation error for _Model
attributes
  Field required [type=missing, input_value={'type': 'elementary_nume..., 'right_closed': False}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing