# How to use the R package `arules` from Python using `arulespy`

This document is also available as an IPython notebook: https://github.com/mhahsler/arulespy/blob/main/examples/arules.ipynb


## Installation 

The package can be installed using pip.

```
pip install arulespy
```

The following may be necessary on Windows to set the 'R_HOME' for `rpy2` correctly:

In [1]:
# from rpy2 import situation
# import os
#
# os.environ['R_HOME'] = situation.r_home_from_registry()
# situation.get_r_home()
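
The commented-out cell above can be made safe to run on any platform by guarding it. The following is a minimal sketch, assuming that `rpy2.situation.r_home_from_registry()` is only meaningful on Windows; the helper name `ensure_r_home` is hypothetical and not part of `arulespy` or `rpy2`.

```python
import os
import platform


def ensure_r_home():
    """Set R_HOME from the Windows registry if it is not already set.

    Hypothetical helper for illustration; on other platforms it only
    reports the current value of R_HOME (possibly None).
    """
    if platform.system() == "Windows" and not os.environ.get("R_HOME"):
        # Imported lazily so that the sketch also runs where rpy2 is absent.
        from rpy2 import situation

        r_home = situation.r_home_from_registry()
        if r_home:
            os.environ["R_HOME"] = r_home
    return os.environ.get("R_HOME")


r_home = ensure_r_home()
print(r_home)
```

Calling the helper before importing `arulespy` keeps the notebook portable between Windows and other systems.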

## Examples

Import the `arules` module from package `arulespy`.

In [14]:
from arulespy.arules import Transactions, apriori, parameters, concat

### Creating transaction data

The data need to be prepared as a pandas dataframe. Here we have 10 transactions with three items called A, B and C. True means that a transaction contains the item.

In [3]:
import pandas as pd

df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))

df

Unnamed: 0,A,B,C
0,True,True,True
1,True,False,False
2,True,True,True
3,True,False,False
4,True,True,True
5,True,False,True
6,True,True,True
7,False,False,True
8,False,True,True
9,True,False,True


Convert the pandas dataframe into a sparse transactions object.

In [6]:
trans = Transactions.from_df(df)
print(trans)

trans.as_df()

transactions in sparse format with
 10 transactions (rows) and
 3 items (columns)



Unnamed: 0,items,transactionID
1,"{A,B,C}",0
2,{A},1
3,"{A,B,C}",2
4,{A},3
5,"{A,B,C}",4
6,"{A,C}",5
7,"{A,B,C}",6
8,{C},7
9,"{B,C}",8
10,"{A,C}",9


We can calculate item frequencies, sample transactions or remove duplicate transactions. All available functions can be found at the end of this document.

In [7]:
trans.itemFrequency(type = 'relative')

array([0.8, 0.5, 0.8])
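
For intuition, the relative item frequency is the fraction of transactions that contain each item. The following plain-Python sketch does not use `arulespy`; the names `rows` and `item_frequency` are illustrative only. It recomputes the values above from the example data.

```python
# Boolean rows of the example dataframe; the columns are the items A, B, C.
rows = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]
items = ["A", "B", "C"]


def item_frequency(rows):
    """Return, per column, the fraction of rows in which the value is True."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


freq = dict(zip(items, item_frequency(rows)))
print(freq)  # {'A': 0.8, 'B': 0.5, 'C': 0.8}
```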

In [8]:
trans.sample(3).as_df()

Unnamed: 0,items,transactionID
10,"{A,C}",9
5,"{A,B,C}",4
3,"{A,B,C}",2


In [9]:
trans.unique().as_df()

Unnamed: 0,items,transactionID
1,"{A,B,C}",0
2,{A},1
6,"{A,C}",5
8,{C},7
9,"{B,C}",8
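
Conceptually, removing duplicates keeps the first occurrence of each distinct itemset. The sketch below illustrates this in plain Python under that assumption; `unique_transactions` is a hypothetical helper, not the `arulespy` implementation.

```python
# The ten example transactions as Python sets, indexed by transaction ID.
transactions = [
    {"A", "B", "C"}, {"A"}, {"A", "B", "C"}, {"A"}, {"A", "B", "C"},
    {"A", "C"}, {"A", "B", "C"}, {"C"}, {"B", "C"}, {"A", "C"},
]


def unique_transactions(transactions):
    """Keep the first occurrence of each distinct itemset, in original order."""
    seen = set()
    result = []
    for tid, itemset in enumerate(transactions):
        key = frozenset(itemset)  # hashable, order-independent identity
        if key not in seen:
            seen.add(key)
            result.append((tid, sorted(itemset)))
    return result


unique = unique_transactions(transactions)
for tid, itemset in unique:
    print(tid, itemset)
```

The surviving transaction IDs (0, 1, 5, 7, 8) agree with the `transactionID` column in the output above.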


Create new data that uses the same encoding as an existing transaction set. Note that the following dataframe
has the columns (items) in reverse order and `Transactions.from_df()` copies the item coding from `trans`.

In [10]:
df2 = pd.DataFrame(
    [
        [True, True, False]
    ],
    columns=list('CBA'))

df2

Unnamed: 0,C,B,A
0,True,True,False


In [11]:
Transactions.from_df(df2, trans).as_df()

Unnamed: 0,items,transactionID
1,"{B,C}",0
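
The idea of reusing an item coding can be sketched without `arulespy`: the order of the item labels is fixed by the existing data, and each new row is looked up by label rather than by column position. The names `item_labels` and `encode_row` below are illustrative assumptions.

```python
# Item order fixed by the existing transactions.
item_labels = ["A", "B", "C"]


def encode_row(row, item_labels):
    """Return the items present in `row` (a label -> bool mapping) in coding order."""
    return [label for label in item_labels if row.get(label, False)]


# The new row arrives with its columns in reverse order (C, B, A).
new_row = {"C": True, "B": True, "A": False}
encoded = encode_row(new_row, item_labels)
print(encoded)  # ['B', 'C']
```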


Add the new transaction to the existing transactions.

In [15]:
concat([trans, Transactions.from_df(df2, trans)]).as_df()

Unnamed: 0,items,transactionID
1,"{A,B,C}",0
2,{A},1
3,"{A,B,C}",2
4,{A},3
5,"{A,B,C}",4
6,"{A,C}",5
7,"{A,B,C}",6
8,{C},7
9,"{B,C}",8
10,"{A,C}",9


Dataframes with nominal and numeric variables can also be converted. Nominal variables are converted into items of the form `variable=value`, and
numeric variables are first discretized (see `arules.discretizeDF()`).

In [16]:
df2 = pd.DataFrame(
    [
        ['red',  12, True],
        ['blue', 10, False],
        ['red',  18, True],
        ['green', 18, False],
        ['red',  16, True],
        ['blue',  9, False]
    ],
    columns=['color', 'size', 'class'])

trans2 = Transactions.from_df(df2)
trans2.as_df()

Unnamed: 0,items,transactionID
1,"{color=red,size=[11.3,16.7),class}",0
2,"{color=blue,size=[9,11.3)}",1
3,"{color=red,size=[16.7,18],class}",2
4,"{color=green,size=[16.7,18]}",3
5,"{color=red,size=[11.3,16.7),class}",4
6,"{color=blue,size=[9,11.3)}",5
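
The interval labels for `size` can be reproduced with a small sketch. It assumes equal-frequency discretization into three bins with linearly interpolated quantiles; this assumption is consistent with the cut points 11.3 and 16.7 shown above but is not a statement about the package internals. The helper `bin_label` is hypothetical.

```python
import statistics

sizes = [12, 10, 18, 18, 16, 9]

# Two cut points split the value range into three bins of similar counts.
cuts = statistics.quantiles(sizes, n=3, method="inclusive")


def bin_label(value, cuts, lo, hi):
    """Return the interval label for `value`; only the last bin is right-closed."""
    edges = [lo, *cuts, hi]
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        if left <= value < right or (last and value == right):
            closing = "]" if last else ")"
            return f"[{left:.3g},{right:.3g}{closing}"
    raise ValueError(f"{value} is outside [{lo}, {hi}]")


labels = [bin_label(v, cuts, min(sizes), max(sizes)) for v in sizes]
print(labels)
```

The resulting labels match the `size=...` items of the six transactions above, row by row.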


Details on item label creation can be retrieved using `arules.itemInfo()`.

In [17]:
trans2.itemInfo()

Unnamed: 0,labels,variables,levels
1,color=blue,color,blue
2,color=green,color,green
3,color=red,color,red
4,"size=[9,11.3)",size,"[9,11.3)"
5,"size=[11.3,16.7)",size,"[11.3,16.7)"
6,"size=[16.7,18]",size,"[16.7,18]"
7,class,class,TRUE


## Mine association rules

`arules.apriori()` calls the apriori algorithm and converts the results into a Python `arulespy.arules.Rules` object. Parameters for the algorithm
are specified as a `dict` inside the `arules.parameters()` function.

In [18]:
rules = apriori(trans,
                    parameter = parameters({"supp": 0.1, "conf": 0.8}), 
                    control = parameters({"verbose": False}))  


rules.as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
4,{B},{C},0.5,1.0,0.5,1.25,5
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4
6,"{B,C}",{A},0.4,0.8,0.5,1.0,4
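
To make the quality columns concrete, the following plain-Python sketch recomputes support, confidence, coverage and lift for the rule `{B} => {C}` from the example transactions using the standard definitions. The function names are illustrative and not part of `arulespy`.

```python
# The ten example transactions as Python sets.
transactions = [
    {"A", "B", "C"}, {"A"}, {"A", "B", "C"}, {"A"}, {"A", "B", "C"},
    {"A", "C"}, {"A", "B", "C"}, {"C"}, {"B", "C"}, {"A", "C"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(lhs, rhs, transactions):
    """Support, confidence, coverage and lift for the rule lhs => rhs."""
    supp = support(set(lhs) | set(rhs), transactions)
    coverage = support(lhs, transactions)           # support of the LHS alone
    confidence = supp / coverage                    # estimate of P(RHS | LHS)
    lift = confidence / support(rhs, transactions)  # confidence relative to P(RHS)
    return {"support": supp, "confidence": confidence,
            "coverage": coverage, "lift": lift}


m = rule_measures({"B"}, {"C"}, transactions)
print(m)
```

Up to floating-point rounding, this gives support 0.5, confidence 1.0, coverage 0.5 and lift 1.25, as in row 4 of the table.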


In [19]:
rules.quality()


Unnamed: 0,support,confidence,coverage,lift,count
1,0.8,0.8,1.0,1.0,8
2,0.8,0.8,1.0,1.0,8
3,0.4,0.8,0.5,1.0,4
4,0.5,1.0,0.5,1.25,5
5,0.4,1.0,0.4,1.25,4
6,0.4,0.8,0.5,1.0,4


Python-style `len()` and slicing are available.

In [20]:
len(rules)

6

In [21]:
rules[0:3].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4


In [22]:
rules[[True, False, True, False, True, False]].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4


## Accessing Rules

Rules can be converted into various Python data structures.

In [23]:
rules.labels()

['{} => {A}',
 '{} => {C}',
 '{B} => {A}',
 '{B} => {C}',
 '{A,B} => {C}',
 '{B,C} => {A}']

In [24]:
rules.items().as_df()

Unnamed: 0,items
1,{A}
2,{C}
3,"{A,B}"
4,"{B,C}"
5,"{A,B,C}"
6,"{A,B,C}"


In [25]:
rules.lhs().as_df()

Unnamed: 0,items
1,{}
2,{}
3,{B}
4,{B}
5,"{A,B}"
6,"{B,C}"


In [30]:
rules.lhs().as_list()

[[], [], ['B'], ['B'], ['A', 'B'], ['B', 'C']]
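
Because `as_list()` returns ordinary nested lists, the result can be processed with standard Python. As a small sketch, the literals below are copied from the outputs above (no `arulespy` call is made) and used to build a boolean mask for rules whose left-hand side contains item `B`.

```python
# Copied from the outputs of rules.lhs().as_list() and rules.labels() above.
lhs_items = [[], [], ["B"], ["B"], ["A", "B"], ["B", "C"]]
labels = ["{} => {A}", "{} => {C}", "{B} => {A}",
          "{B} => {C}", "{A,B} => {C}", "{B,C} => {A}"]

# One boolean per rule: does the left-hand side contain item B?
mask = ["B" in lhs for lhs in lhs_items]
selected = [label for label, keep in zip(labels, mask) if keep]

print(mask)      # [False, False, True, True, True, True]
print(selected)
```

A mask of this shape has the same form as the boolean list used for indexing `rules` earlier in this document.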

In [29]:
rules.rhs().as_df()

Unnamed: 0,items
1,{A}
2,{C}
3,{A}
4,{C}
5,{C}
6,{A}


In [31]:
rules.sort(by = 'lift').as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
4,{B},{C},0.5,1.0,0.5,1.25,5
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
6,"{B,C}",{A},0.4,0.8,0.5,1.0,4


## Work With Interest Measures

Interest measures are stored as the quality attribute in rules and itemsets.

In [32]:
rules.quality()

Unnamed: 0,support,confidence,coverage,lift,count
1,0.8,0.8,1.0,1.0,8
2,0.8,0.8,1.0,1.0,8
3,0.4,0.8,0.5,1.0,4
4,0.5,1.0,0.5,1.25,5
5,0.4,1.0,0.4,1.25,4
6,0.4,0.8,0.5,1.0,4


Additional interest measures can be calculated with `interestMeasure()` and added to rules or itemsets using `addQuality()`. See all [available measures](https://mhahsler.github.io/arules/docs/measures). To calculate some measures, transactions need to
be specified.

In [33]:
im = rules.interestMeasure(["phi", 'support'])
im

Unnamed: 0,phi,support
1,,0.8
2,,0.8
3,0.0,0.4
4,0.5,0.5
5,0.408248,0.4
6,0.0,0.4
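
The `phi` column can be checked by hand. The sketch below assumes the standard definition of the phi correlation coefficient between the left-hand side X and the right-hand side Y of a rule, computed from their supports; the function name `phi` is illustrative. The measure is undefined when X or Y occurs in every transaction, which is consistent with the missing values for the rules with an empty left-hand side.

```python
import math


def phi(p_x, p_y, p_xy):
    """Phi coefficient from the supports of X, Y and the rule X => Y."""
    denom = math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    if denom == 0:
        return math.nan  # undefined if X or Y occurs in all or no transactions
    return (p_xy - p_x * p_y) / denom


# Arguments: (coverage, support of the RHS item, rule support) from the tables above.
phi_b_c = phi(0.5, 0.8, 0.5)      # {B} => {C}
phi_ab_c = phi(0.4, 0.8, 0.4)     # {A,B} => {C}
phi_empty_a = phi(1.0, 0.8, 0.8)  # {} => {A}

print(round(phi_b_c, 6), round(phi_ab_c, 6), phi_empty_a)
```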


In [34]:
rules.addQuality(im)
rules.as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count,phi
1,{},{A},0.8,0.8,1.0,1.0,8,
2,{},{C},0.8,0.8,1.0,1.0,8,
3,{B},{A},0.4,0.8,0.5,1.0,4,0.0
4,{B},{C},0.5,1.0,0.5,1.25,5,0.5
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4,0.408248
6,"{B,C}",{A},0.4,0.8,0.5,1.0,4,0.0


## Filter Redundant Rules

In [35]:
rules[[not x for x in rules.is_redundant()]].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count,phi
1,{},{A},0.8,0.8,1.0,1.0,8,
2,{},{C},0.8,0.8,1.0,1.0,8,
4,{B},{C},0.5,1.0,0.5,1.25,5,0.5


In [36]:
rules.is_redundant()


[False, False, True, False, True, True]
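
The flags above can be reproduced with a short sketch. It assumes the common definition that a rule is redundant if a more general rule (same right-hand side, proper subset of the left-hand side) has the same or a higher confidence. The rule tuples are copied from the table above; the function is illustrative and not the `arulespy` implementation.

```python
# (LHS, RHS item, confidence) for the six mined rules, in table order.
rules = [
    (frozenset(), "A", 0.8),
    (frozenset(), "C", 0.8),
    (frozenset({"B"}), "A", 0.8),
    (frozenset({"B"}), "C", 1.0),
    (frozenset({"A", "B"}), "C", 1.0),
    (frozenset({"B", "C"}), "A", 0.8),
]


def is_redundant(rules):
    """Flag rules for which a more general rule is at least as confident."""
    flags = []
    for lhs, rhs, conf in rules:
        flags.append(any(
            other_rhs == rhs and other_lhs < lhs and other_conf >= conf
            for other_lhs, other_rhs, other_conf in rules
        ))
    return flags


redundant = is_redundant(rules)
print(redundant)  # [False, False, True, False, True, True]
```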

Find maximal rules.

In [None]:
rules.is_maximal()

[False, False, False, False, True, True]
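
As a sketch of what "maximal" means here, assume that a rule is maximal if the itemset that generates it (left-hand side plus right-hand side) is not a proper subset of the itemset of any other rule in the set. The itemsets below are copied from the output of `rules.items()` above; `is_maximal` is an illustrative helper.

```python
# Itemsets generating the six rules, in table order.
itemsets = [
    frozenset({"A"}),
    frozenset({"C"}),
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B", "C"}),
]


def is_maximal(itemsets):
    """True where no other itemset in the collection is a proper superset."""
    return [not any(s < other for other in itemsets) for s in itemsets]


maximal = is_maximal(itemsets)
print(maximal)  # [False, False, False, False, True, True]
```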

## Help for Functions Available via arulespy

In [None]:
help(apriori)

Help on function wrapper in module arulespy.arules:

wrapper(*args, **kwargs)
    Wrapper around an R function.
    
    The docstring below is built from the R documentation.
    
    description
    -----------
    
    
     Mine frequent itemsets, association rules or association hyperedges using
     the Apriori algorithm.
     
    
    
    apriori(
        data,
        parameter = rinterface.NULL,
        appearance = rinterface.NULL,
        control = rinterface.NULL,
        ___ = (was "..."). R ellipsis (any number of parameters),
    )
    
    Args:
       data :  object of class transactions. Any data structure which can be
      coerced into transactions (e.g., a binary matrix, a
      data.frame or a tibble) can also be specified and will be
      internally coerced to transactions.
    
       parameter :  object of class APparameter or named list.  The default
      behavior is to mine rules with minimum support of 0.1,
      minimum confidence of 0.8, maximum of 10 

## Low-level R arules interface

Functions from `arules` can also be called directly using
`R.<arules R function>()`. The result will be an `rpy2` data type.
Transactions, itemsets and rules can be converted manually to Python
classes using the corresponding class constructor (e.g., `Itemsets()`).

In [43]:
from arulespy.arules import R, Itemsets

In [40]:
help(R.random_patterns)

Help on DocumentedSTFunction in module rpy2.robjects.functions:

<rpy2.robjects.functions.DocumentedSTFunction ob...31607c0> [RTYPES.CLOSXP]
R classes: ('function',)
    Wrapper around an R function.
    
    The docstring below is built from the R documentation.
    
    description
    -----------
    
    
     Simulate random  transactions  using different methods.
     
    
    
    random.patterns(
        nItems,
        nPats = 2000.0,
        method = rinterface.NULL,
        lPats = 4.0,
        corr = 0.5,
        cmean = 0.5,
        cvar = 0.1,
        iWeight = rinterface.NULL,
        verbose = False,
    )
    
    Args:
       nItems :  an integer. Number of items to simulate
    
       nTrans :  an integer. Number of transactions to simulate
    
       method :  name of the simulation method used (see Details Section).
    
       ... :  further arguments used for the specific simulation method
      (see details).
    
       verbose :  report progress?
    
     

In [41]:
its_r = R.random_patterns(100, 10)
its_r

<rpy2.robjects.methods.RS4 object at 0x7f69c3099380> [RTYPES.S4SXP]
R classes: ('itemsets',)

Since we directly called an R function, we need to manually wrap the R object as a Python object before we use it in Python.

In [47]:
its_p = Itemsets(its_r)
its_p.as_df()

Unnamed: 0,items,pWeights,pCorrupts
1,"{item13,item19,item23,item48,item73}",0.054325,0.402363
2,"{item8,item25,item35,item73}",0.034892,0.122777
3,"{item23,item24,item28,item48,item66}",0.104815,0.402232
4,"{item46,item86}",0.083704,0.170334
5,"{item47,item55,item63,item98}",0.211702,0.230507
6,"{item63,item72}",0.066091,0.0
7,"{item15,item37,item60,item63,item72,item75}",0.369357,0.562903
8,"{item15,item37,item63,item72,item75}",0.018943,0.071589
9,"{item15,item37,item64,item72,item97}",0.041531,0.806645
10,{item72},0.01464,0.627891


## Create Rules Objects

To import rules from other tools or to create rules manually, rules for `arules` can be created from lists
of sets of items. The item labels (i.e., the sparse representation) are
taken from the transactions `trans`.

In [48]:
import rpy2.robjects as ro
from arulespy.arules import Rules, set, encode

new_rule_lhs = [
    set(['hair', 'milk', 'predator']),
    set(['hair', 'tail', 'predator']),
    set(['fins'])
]
new_rule_rhs = [
    set(['type=mammal']),
    set(['type=mammal']),
    set(['type=fish'])
]
                          
lhs = encode(new_rule_lhs, itemLabels = trans)
rhs = encode(new_rule_rhs, itemLabels = trans)

r = Rules.new(lhs, rhs)
r.as_df()

Unnamed: 0,LHS,RHS
1,{},{}
2,{},{}
3,{},{}


Next, we add interest measures calculated on the transactions.

In [46]:
r.addQuality(r.interestMeasure(['support', 'confidence', 'lift'], trans))
r.as_df()

Unnamed: 0,LHS,RHS,support,confidence,lift
1,{},{},1.0,1.0,1.0
2,{},{},1.0,1.0,1.0
3,{},{},1.0,1.0,1.0
