Association Rule Classification (arc)
This R package implements the Classification Based on Associations (CBA) algorithm:
Liu, B., Hsu, W. and Ma, Y. (1998). Integrating Classification and Association Rule Mining. Proceedings of KDD-98, New York, 27-31 August. AAAI, pp. 80-86.
The arules package is used for the rule generation step.
The package is also available from the CRAN repository as the Association Rule Classification (arc) package.
Features
- Pure R implementation*
- Supports numerical predictor attributes (via parameter-free supervised discretization)
- Supports a numerical target attribute (via k-means discretization)
- No meta parameters, with optional automatic tuning of support and confidence thresholds**

NOTES: * Requires the arules package for the rule generation step. ** There are some metaparameters for automatic tuning, but in most cases the default values work well.
Other use cases
The `prune` function can be used to reduce the size of a rule set learnt by the `apriori` function from the arules package. The `topRules` function can be used as a wrapper for `apriori`, allowing the user to mine for a specified number of rules. Both are demonstrated in the examples below.
Installation
The package can be installed directly from CRAN using the following command executed from the R environment:
install.packages("arc")Development version can be installed from github from the R environment using the devtools package.
devtools::install_github("kliegr/arc")Examples
Complete classification workflow
```r
library(arc)
# note that the iris dataset contains numerical features
data(iris)
train <- iris[1:100, ]
test <- iris[101:nrow(iris), ]
classatt <- "Species"
# learn the classifier
rm <- cba(train, classatt)
# apply the classifier to the test set
prediction <- predict(rm, test)
# compute accuracy against the ground-truth labels
acc <- CBARuleModelAccuracy(prediction, test[[classatt]])
print(acc)
```
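The feature list above also mentions support for a numerical target attribute. The following is a minimal sketch of that use case, assuming the same `cba` call pattern as above; per the feature list, the numeric class attribute should be discretized with k-means before rule learning:

```r
library(arc)
data(iris)
# Sepal.Length is numeric; per the feature list, cba is expected to
# discretize a numeric class attribute with k-means before rule learning
rm_num <- cba(iris, "Sepal.Length")
prediction <- predict(rm_num, iris)
```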
Prune rules
This example shows how to apply the data coverage pruning implemented in arc to reduce the size of a rule set. A prerequisite is a rule learning task with a single attribute on the right-hand side.
```r
library(arc)
data(Adult)
classitems <- c("income=small", "income=large")
rules <- apriori(Adult, parameter = list(supp = 0.05, conf = 0.5, target = "rules"), appearance = list(rhs = classitems, default = "lhs"))
# now we have 1266 rules
pr_rules <- prune(rules, Adult, classitems)
# only 174 rules remain after pruning
```

An additional reduction of the rule set size can be achieved by setting `greedy_pruning=TRUE`:

```r
pr_rules <- prune(rules, Adult, classitems, greedy_pruning = TRUE)
# produces 141 rules
```

By default, pruning consists of two steps: data coverage pruning and default rule pruning, which replaces some of the rules surviving data coverage pruning with a new default rule (a rule with an empty LHS). Default rule pruning can be turned off:

```r
pr_rules <- prune(rules, Adult, classitems, default_rule_pruning = FALSE)
# produces 198 rules
```

Mine a predefined number of rules with apriori
The arules documentation gives the following example:
data("Adult")
## Mine association rules.
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)This returns 52 rules. The default value for the minlen and maxlen parameters unspecified by the user was 1 and 10. Assuming that the user wishes to obtain 100 rules, this can be achieved with the arc package as follows:
data("Adult")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.5, init_conf = 0.9, minlen = 1, init_maxlen = 10)
summary(rules)This will return 100 rules. The mechanics behind are iterative step-wise changes to the initial values of the provided thresholds. In this case, there will be nine iterations, the minimum confidence threshold will be lowered to 0.65 and the final rule set will be trimmed.
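The following is a simplified sketch of this iterative strategy, not the actual `topRules` implementation; the step size, the choice to relax only the confidence threshold, and the stopping condition are illustrative assumptions:

```r
library(arules)
data("Adult")

target <- 100
conf <- 0.9  # corresponds to init_conf
supp <- 0.5  # corresponds to init_support

# mine repeatedly, relaxing the confidence threshold until enough rules
# are found (the real implementation also adjusts maxlen and support)
repeat {
  rules <- apriori(Adult, parameter = list(supp = supp, conf = conf,
                                           minlen = 1, maxlen = 10,
                                           target = "rules"))
  if (length(rules) >= target || conf <= 0.5) break
  conf <- conf - 0.05  # illustrative step size
}
# trim the oversized result down to the requested count
rules <- head(sort(rules, by = "confidence"), n = target)
summary(rules)
```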
Performance optimization
Rule learning
- To keep the number of iterations, and thus the run time, low, it is a good idea to set the `init_maxlen` parameter to a low value:
data("Adult")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.5, init_conf = 0.9, minlen = 1, init_maxlen = 2)
summary(rules)Rule pruning
- Experiment with the value of the `rule_window` parameter (see the sketch after this list). This has no effect on the quality of the classifier.
- Set `greedy_pruning` to TRUE. This will generally have an adverse impact on the quality of the classifier, but it decreases the size of the rule set and reduces the time required for pruning. Greedy pruning is not part of the CBA algorithm as published by Liu et al.
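A minimal sketch of experimenting with `rule_window`, continuing from the rule set mined in the Prune rules example above; the value 100 is an illustrative assumption, not a recommended setting:

```r
# continuing from the "Prune rules" example above;
# per the note above, rule_window does not affect classifier quality,
# so it can be tuned purely for pruning speed
pr_rules <- prune(rules, Adult, classitems, rule_window = 100)
```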