FOLD-RM

This README describes the implementation of the FOLD-RM algorithm and how to use it. FOLD-RM learns an answer set program for a classification task. Answer set programs are logic programs that permit negation of predicates and are interpreted under the stable model semantics. The generated rules are essentially default rules, and default rules (with exceptions) closely model human thinking.

Installation

Library only:


python3 -m pip install foldrm

With the dataset examples:


git clone https://github.com/hwd404/FOLD-RM.git

Prerequisites

The FOLD-RM algorithm is implemented in pure Python 3. NumPy is the only dependency:


python3 -m pip install numpy

Instructions

Data preparation

The FOLD-RM algorithm takes tabular data as input. The first line of the table should contain the feature name of each column. The data does not need to be encoded for training: FOLD-RM handles numerical, categorical, and even mixed-type features (a column containing both categorical and numerical values) directly. However, the numerical features must be identified before loading the data; otherwise they are treated as categorical features, and only literals with = and != are generated for them.
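The effect of declaring numerical features can be illustrated with a small sketch. This is not the library's loader; it is a minimal illustration using only the standard library, and the table contents, the `load_rows` helper, and the column names are all hypothetical. It shows a header line, a mixed-type column, and how a value stays a string (categorical) unless its column is declared numeric.

```python
import csv
import io

# Hypothetical table: the first line holds the column names.
# "age" is a mixed-type column (numbers plus the categorical value "unknown").
CSV_TEXT = """age,color,label
25,red,yes
unknown,blue,no
"""


def load_rows(text, numeric):
    """Parse CSV text into dicts, converting only the declared numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if name in numeric:
                try:
                    row[name] = float(value)   # numerical value: comparable with =< and >
                except ValueError:
                    row[name] = value          # categorical value inside a mixed column
            else:
                row[name] = value              # undeclared column: always categorical
        rows.append(row)
    return rows


rows = load_rows(CSV_TEXT, numeric={"age"})
print(rows)
```

If `age` were left out of `numeric`, the value `25` would remain the string `'25'`, which mirrors why undeclared numerical columns can only appear in equality literals.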

The data directory is pre-populated with many UCI example datasets, and the code for preparing them is already in datasets.py.

For example, the UCI wine dataset can be loaded with the following code:


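import numpy as np
from foldrm import Classifier  # assumed import path: Classifier lives in foldrm.py

def wine():
    # All 13 features of the wine dataset are listed, and all are numerical.
    attrs = ['alcohol','malic_acid','ash','alcalinity_of_ash','magnesium','tot_phenols','flavanoids',
             'nonflavanoid_phenols','proanthocyanins','color_intensity','hue','OD_of_diluted','proline']
    model = Classifier(attrs=attrs, numeric=attrs, label='label')
    data = model.load_data('data/wine/wine.csv')
    print('\n% wine dataset', np.shape(data))
    return model, data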

attrs lists all the features needed, numeric lists all the numerical features, and label is the name of the output classification label. model is an initialized classifier object configured for the wine dataset.

Training

The FOLD-RM algorithm generates an explainable model, represented as an answer set program, for classification tasks. Here is a training example for the wine dataset:


model.fit(X_train, Y_train, ratio=0.9)

The hyperparameter ratio of the fit function can be set by the user and ranges between 0 and 1; the default value is 0.5. It represents the ratio of training examples that are part of the exception to the examples implied by only the default conclusion part of the rule. We recommend experimenting with different values to find the rule set with the best F1 score; values between 0.5 and 0.9 are a good range to try.
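One way to run that experiment is a small sweep over candidate ratios. The sketch below is only an outline under stated assumptions: `macro_f1` and `best_ratio` are helper names introduced here, macro averaging is one possible choice of F1 for multi-class labels, and `evaluate` is a hypothetical callable that trains with a given ratio and returns a score on held-out data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_ratio(evaluate, ratios=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return (ratio, score) for the candidate ratio with the highest score."""
    scored = [(evaluate(r), r) for r in ratios]
    score, ratio = max(scored)
    return ratio, score


# Hypothetical wiring to the calls shown in this README (not executed here):
#
# def evaluate(r):
#     model.fit(X_train, Y_train, ratio=r)
#     return macro_f1(Y_val, model.predict(X_val))
#
# ratio, score = best_ratio(evaluate)
```

Scoring on a validation split that is separate from the final test data keeps the reported test accuracy honest.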

The rules generated by FOLD-RM are stored in the model object, organized in a nested intermediate representation. Calling the print_asp function automatically flattens the nested rules and decodes them to conform to the syntax of answer set programs:


model.print_asp()

An answer set program compatible with the s(CASP) answer set programming system is printed, as shown below. s(CASP) is a system for directly executing predicate answer set programs in a query-driven manner.


% wine dataset (178, 14)
label(X,'1') :- rule2(X), not rule1(X). 
label(X,'2') :- rule1(X). 
label(X,'2') :- rule4(X), not rule1(X), not rule2(X), not rule3(X). 
label(X,'2') :- rule5(X), not rule1(X), not rule2(X), not rule3(X), not rule4(X). 
label(X,'3') :- rule3(X), not rule1(X), not rule2(X). 
rule1(X) :- color_intensity(X,N9), N9=<3.4. 
rule2(X) :- flavanoids(X,N6), N6>2.03, not ab1(X). 
rule3(X) :- flavanoids(X,N6), N6=<1.57, not ab2(X). 
rule4(X) :- alcohol(X,N0), N0>11.56. 
rule5(X) :- alcohol(X,N0), N0=<11.56. 
ab1(X) :- proline(X,N12), N12=<678.0. 
ab2(X) :- hue(X,N10), N10>0.96. 
% acc 0.9722
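To make the default-rule structure concrete, the program above can be read as an ordered decision list, because each label rule negates the rules that come before it. The sketch below is a hand translation into plain Python, not output of FOLD-RM: the function names mirror the predicates, while `DECISION_LIST` and `label` are names introduced here. The sample uses the three feature values that appear in the Explanation section of this README.

```python
# Exceptions (abnormality predicates): they defeat the default they are attached to.
def ab1(x): return x["proline"] <= 678.0
def ab2(x): return x["hue"] > 0.96

# Default rules, each with an optional exception.
def rule1(x): return x["color_intensity"] <= 3.4
def rule2(x): return x["flavanoids"] > 2.03 and not ab1(x)
def rule3(x): return x["flavanoids"] <= 1.57 and not ab2(x)
def rule4(x): return x["alcohol"] > 11.56
def rule5(x): return x["alcohol"] <= 11.56

# "label(X,'1') :- rule2(X), not rule1(X)." and the other label rules mean:
# try rule1 first, then rule2, and so on, and the first rule that fires decides.
DECISION_LIST = [(rule1, "2"), (rule2, "1"), (rule3, "3"), (rule4, "2"), (rule5, "2")]


def label(x):
    """Return the label of the first rule that fires, or None if none does."""
    for rule, lab in DECISION_LIST:
        if rule(x):
            return lab
    return None


sample = {"color_intensity": 6.38, "proline": 970.0, "flavanoids": 3.0}
print(label(sample))
```

Because evaluation stops at the first rule that fires, the sample does not need values for `alcohol` or `hue`.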

Testing in Python

Given X_test, a list of test data samples, the Python predict function will predict the classification outcome for each of these data samples.


Y_test_hat = model.predict(X_test)

The classify function can also be used to classify a single data sample.


y_test_hat = model.classify(x_test)
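Once predictions are available, they can be compared against the true labels. The following sketch computes plain accuracy; the `accuracy` helper and the toy label lists are illustrative assumptions and are not part of FOLD-RM.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels standing in for Y_test and model.predict(X_test).
Y_test = ["1", "2", "3", "2"]
Y_test_hat = ["1", "2", "3", "1"]
print(accuracy(Y_test, Y_test_hat))
```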

Explanation

FOLD-RM provides a justification and a rebuttal for each prediction, in a simple format, through the explain function.


model.explain(X_test[i])

Here is an example for an instance from the wine dataset. The generated answer set program is:


% wine dataset (178, 14)
label(X,'1') :- rule2(X), not rule1(X). 
label(X,'2') :- rule1(X). 
label(X,'2') :- rule4(X), not rule1(X), not rule2(X), not rule3(X). 
label(X,'2') :- rule5(X), not rule1(X), not rule2(X), not rule3(X), not rule4(X). 
label(X,'3') :- rule3(X), not rule1(X), not rule2(X). 
rule1(X) :- color_intensity(X,N9), N9=<3.4. 
rule2(X) :- flavanoids(X,N6), N6>2.03, not ab1(X). 
rule3(X) :- flavanoids(X,N6), N6=<1.57, not ab2(X). 
rule4(X) :- alcohol(X,N0), N0>11.56. 
rule5(X) :- alcohol(X,N0), N0=<11.56. 
ab1(X) :- proline(X,N12), N12=<678.0. 
ab2(X) :- hue(X,N10), N10>0.96. 
% acc 0.9722

The generated justification for the prediction on this instance is:


Explanation for example number 1 :
[F]ab1(X) :- [F]proline(X,N12), N12=<678.0. 
[T]rule2(X) :- [T]flavanoids(X,N6), N6>2.03, not [F]ab1(X). 
[F]rule1(X) :- [F]color_intensity(X,N9), N9=<3.4. 
[T]label(X,'1') :- [T]rule2(X), not [F]rule1(X). 
{'color_intensity: 6.38', 'proline: 970.0', 'flavanoids: 3.0'}

In the generated answers, each literal is tagged with a label: [T] means true, [F] means false, and [U] means the literal did not need to be evaluated. The smallest set of features of the instance is also listed for each answer.
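One way to see where such a feature set can come from is to record which features are read while the rules are evaluated. The sketch below is an illustration only and does not reflect how explain is implemented: `TracingSample` is a hypothetical helper, and the rules are hand-copied from the justification above.

```python
class TracingSample(dict):
    """A dict that remembers which keys were read through indexing."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.used = set()

    def __getitem__(self, key):
        self.used.add(key)
        return super().__getitem__(key)


# Rules that appear in the justification for label '1'.
def ab1(x): return x["proline"] <= 678.0
def rule1(x): return x["color_intensity"] <= 3.4
def rule2(x): return x["flavanoids"] > 2.03 and not ab1(x)


x = TracingSample({
    "color_intensity": 6.38,
    "proline": 970.0,
    "flavanoids": 3.0,
})

# label(X,'1') :- rule2(X), not rule1(X).
is_label_1 = rule2(x) and not rule1(x)
print(is_label_1, sorted(x.used))
```

The recorded keys are the same three features listed at the end of the justification.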

Citation


@misc{wang2022foldrm,
      title={FOLD-RM: A Scalable and Efficient Inductive Learning Algorithm for Multi-Category Classification of Mixed Data}, 
      author={Huaduo Wang and Farhad Shakerin and Gopal Gupta},
      year={2022},
      eprint={2202.06913},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}


@misc{wang2021foldr,
      title={FOLD-R++: A Scalable Toolset for Automated Inductive Learning of Default Theories from Mixed Data}, 
      author={Huaduo Wang and Gopal Gupta},
      year={2021},
      eprint={2110.07843},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
