Quantitative CBA: Small and Comprehensible Association Rule Classification Models

Quantitative CBA (QCBA) is a postprocessing algorithm intended for some machine learning models generating rule-based classifiers based on pre-discretized data. QCBA implements a number of optimization steps to improve handling of quantitative (numerical) attributes. Depending on input dataset and used rule classifier, the postprocessed models may become smaller and/or more accurate. The QCBA approach is described in:

Tomas Kliegr and Ebroul Izquierdo. "Quantitative CBA: Small and Comprehensible Association Rule Classification Models." Applied Intelligence https://link.springer.com/article/10.1007/s10489-022-04370-x (2023).

Inputs for QCBA

Rule model (rule list/set) learnt by a supported rule learning algorithm on prediscretized data
Raw dataset (before discretization)

Output of QCBA

Rule list: rules are sorted. The first rule matching a given instances is used to classify it.

Which postprocessing steps are performed by QCBA?

Refitting rules Literals originally aligned to borders of the discretized regions are refit to finer grid.
Attribute pruning Remove redundant attributes from rules.
Trimming Literals in discovered rules are trimmed so that they do not contain regions not covered by data.
Extension Ranges of literals in the body of each rule are extended, escaping from the coarse hypercubic created by discretization.
Data coverage pruning Remove some of the newly redundant rules
Default rule overlap pruning Some rules that classify into the same class as the default rule in the end of the classifier can be removed.

These algorithms are explained in the article and in an interactive tutorial(sources here).

Installation

First, you need to have correctly installed rJava package. For instructions on how to setup rJava please refer to rJava documentation. Note a common issue with installing and configuring rJava can be resolved according to these instructions.

The latest version can be installed from the R environment with:

devtools::install_github("kliegr/QCBA")

Example

This example shows how to learn the standard association-rule based classification model using the CBA algorithm (as implemented by the arc package) and then how to postprocess it with QCBA.

Baseline CBA model

Learn a CBA classifier.

library(arc)
set.seed(25)
allData <- datasets::iris[sample(nrow(datasets::iris)),]
trainFold <- allData[1:100,]
testFold <- allData[101:nrow(datasets::iris),]
classAtt<-"Species"
y_true <-testFold[[classAtt]]
rmCBA <- cba(trainFold, classAtt=classAtt)
predictionBASE <- predict(rmCBA,testFold)
inspect(rmCBA@rules)

The model:

lhs                                                   rhs                  support confidence coverage lift     count lhs_length orderedConf orderedSupp cumulativeConf
[1] {Petal.Length=(5.05; Inf]}                         => {Species=virginica}  0.34    1.00       0.34     2.702703 34    1          1.0        34          1.00      
[2] {Petal.Length=[-Inf;2.45]}                         => {Species=setosa}     0.32    1.00       0.32     3.125000 32    1          1.0        32          1.00        
[3] {Petal.Length=(2.45;5.05], Petal.Width=(0.8;1.55]} => {Species=versicolor} 0.29    1.00       0.29     3.225806 29    2          1.0        29          1.00         
[4] {}                                                 => {Species=virginica}  0.37    0.37       1.00     1.000000 37    0          0.6        3         0.98

The statistics:

print(paste0(
  "Number of rules: ",length(rmCBA@rules), ", ",
  "Total conditions: ",sum(rmCBA@rules@lhs@data), ", ", 
  "Accuracy: ", round(CBARuleModelAccuracy(predictionBASE, y_true),2)))

Returns:

  Number of rules: 4, Total conditions: 4, Accuracy: 0.92

Postprocessing CBA model with QCBA

Learn a QCBA model.

library(qCBA)
rmQCBA <- qcba(cbaRuleModel=rmCBA,datadf=trainFold)
predictionQCBA <- predict(rmQCBA,testFold)
print(rmQCBA@rules)

The model:

                                                              rules support confidence condition_count orderedConf orderedSupp
1                         {Petal.Length=[-Inf;1.9]} => {Species=setosa}    0.32       1.00               1   1.0000000          32
2 {Petal.Length=[-Inf;5.5],Petal.Width=[1;1.6]} => {Species=versicolor}    0.29       1.00               2   1.0000000          29
3                                             {} => {Species=virginica}    0.37       0.37               0   0.9487179          37

The statistics:

 print(paste0(
 "Number of rules: ",nrow(rmQCBA$rules), ", ",
 "Total conditions: ",sum(rmQCBA@rules$condition_count), ", ", 
 "Accuracy: ", round(CBARuleModelAccuracy(predictionQCBA, y_true),2)))

Returns:

Number of rules: 3, Total conditions: 3, Accuracy:  0.96

Effect of QCBA:

Improved accuracy from 0.92 to 0.96
Reduced number of rules from 4 to 3
Reduced total conditions in the rules from 4 to 3

Postprocessing other rule models with QCBA

The QCBA package is compatible also with other software generating rule models. This example shows how it can be used to postprocess models generated by arulesCBA package, which offers a range of rule models including CPAR, CMAR, FOIL2, and PRM.

When using arulesCBA, it is required to perform discretization of data externally and pass the generated cutpoints to QCBA:

set.seed(54)
library(arulesCBA)
allData <- datasets::iris[sample(nrow(datasets::iris)),]
trainFold <- allData[1:100,]
testFold <- allData[101:nrow(datasets::iris),]
classAtt <- "Species"
discrModel <- discrNumeric(trainFold, classAtt)
train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
cutPoints <- discrModel$cutp
test_disc <- applyCuts(testFold, cutPoints, infinite_bounds=TRUE, labels=TRUE)
y_true <-testFold[[classAtt]]

Learn and evaluate CPAR model:

rmBASE <- CPAR(train_disc, formula=as.formula(paste(classAtt,"~ .")))
predictionBASE <- predict(rmBASE,test_disc) # CPAR (arulesCBA) predict function

Learn and evaluate QCBA model:

# Convert CPAR model to QCBA ecosystem datastructure
baseModel_arc <- arulesCBA2arcCBAModel(rmBASE, cutPoints,  trainFold, classAtt)

rmQCBA <- qcba(cbaRuleModel=baseModel_arc,datadf=trainFold)
predictionQCBA <- predict(rmQCBA,testFold)

Compare CPAR and CPAR+QCBA:

print(paste("CPAR: Number of rules: ",length(rmBASE$rules),", Total conditions:",sum(rmBASE$rules@lhs@data), ", Accuracy: ",round(CBARuleModelAccuracy(predictionBASE, y_true),2)))

print(paste("QCBA+CPAR: Number of rules: ",nrow(rmQCBA@rules),", Total conditions:",sum(rmQCBA@rules$condition_count), ", Accuracy: ",round(CBARuleModelAccuracy(predictionQCBA, y_true),2)))

Returns:

CPAR: Number of rules:  8, total conditions: 10, Accuracy:  0.98
QCBA+CPAR: Number of rules:  3, total conditions: 2, Accuracy:  1

Effect of QCBA:

Improved accuracy from 0.98 to 1.00
Reduced number of rules from 8 to 3
Reduced total conditions in the rules from 10 to 2

Note that QCBA is not generally guaranteed to improve the accuracy of input model or reduce it size.

Documentation

Additional resources

arc package sister project implementing the Classification-based on association rules (CBA) classification algorithm
arcbench sister project for benchmarking QCBA and arc against a number of other rule learning algorithms on a battery of datasets

Name		Name	Last commit message	Last commit date
Latest commit History 103 Commits
R		R
inst/java		inst/java
java		java
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
.travis.yml		.travis.yml
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
NEWS		NEWS
QCBA.Rproj		QCBA.Rproj
README.md		README.md

kliegr/QCBA

Folders and files

Latest commit

History

Repository files navigation

Quantitative CBA: Small and Comprehensible Association Rule Classification Models

Inputs for QCBA

Output of QCBA

Which postprocessing steps are performed by QCBA?

Installation

Example

Baseline CBA model

Postprocessing CBA model with QCBA

Postprocessing other rule models with QCBA

Documentation

Additional resources

About

Resources

Stars

Watchers

Forks

Languages