# How to use the R package `arules` from Python using `arulespy`

## Installation 

The package can be installed using pip.

```
pip install arulespy
```

## Examples

Import the `arules` module from package `arulespy`.

In [34]:
from arulespy import arules

### Creating transaction data

The data need to be prepared as a Pandas dataframe. Here we have 10 transactions with three items called A, B and C. True means that a transaction contains the item.

In [35]:
import pandas as pd

df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))

df

Unnamed: 0,A,B,C
0,True,True,True
1,True,False,False
2,True,True,True
3,True,False,False
4,True,True,True
5,True,False,True
6,True,True,True
7,False,False,True
8,False,True,True
9,True,False,True


Convert the pandas dataframe into a sparse transactions object.

In [36]:
trans = arules.transactions(df)
print(trans)

trans.as_df()

transactions in sparse format with
 10 transactions (rows) and
 3 items (columns)



Unnamed: 0,items,transactionID
1,"{A,B,C}",0
2,{A},1
3,"{A,B,C}",2
4,{A},3
5,"{A,B,C}",4
6,"{A,C}",5
7,"{A,B,C}",6
8,{C},7
9,"{B,C}",8
10,"{A,C}",9
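
To make the sparse representation concrete, here is a minimal sketch in plain Python (no `arulespy` calls) that turns the boolean rows of the example into the item labels shown above. The names `columns`, `rows`, `itemsets` and `labels` are illustrative and not part of the package.

```python
# Boolean rows copied from the example dataframe (columns A, B, C).
columns = ["A", "B", "C"]
rows = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]

# Sparse view: keep only the labels of the items each transaction contains.
itemsets = [
    [item for item, present in zip(columns, row) if present]
    for row in rows
]

# Render each transaction in the "{A,B,C}" notation used in the output above.
labels = ["{" + ",".join(items) + "}" for items in itemsets]
print(labels)
```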


We can calculate item frequencies, sample transactions or remove duplicate transactions. All available functions can be found at the end of this document.

In [37]:
arules.itemFrequency(trans)

[0.8, 0.5, 0.8]
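
The item frequency of an item is the fraction of transactions that contain it. As a cross-check of the values above, the following sketch recomputes them from the raw boolean rows using only plain Python; it is not how `arulespy` computes them internally.

```python
# Boolean rows copied from the example dataframe (columns A, B, C).
rows = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]

# zip(*rows) iterates over columns; summing booleans counts the True entries.
item_frequency = [sum(column) / len(rows) for column in zip(*rows)]
print(item_frequency)
```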

In [38]:
arules.sample(trans, 3).as_df()

Unnamed: 0,items,transactionID
10,"{A,C}",9
8,{C},7
4,{A},3


In [39]:
arules.unique(trans).as_df()

Unnamed: 0,items,transactionID
1,"{A,B,C}",0
2,{A},1
6,"{A,C}",5
8,{C},7
9,"{B,C}",8
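
The result above keeps the first occurrence of each distinct itemset. A minimal plain-Python sketch of that idea, assuming first-occurrence semantics (which is what the output suggests), looks like this. The variable names are illustrative.

```python
# Transactions from the example, written as strings of item labels.
transactions = ["ABC", "A", "ABC", "A", "ABC", "AC", "ABC", "C", "BC", "AC"]

seen = set()
unique_ids = []
for transaction_id, items in enumerate(transactions):
    key = frozenset(items)  # order of items inside a transaction is irrelevant
    if key not in seen:
        seen.add(key)
        unique_ids.append(transaction_id)

print(unique_ids)  # transactionID values of the retained transactions
```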


A dataframe with nominal and numeric variables can also be converted. Nominal variables are converted into items of the form `variable=value`, and
numeric variables are first discretized (see `arules.discretizeDF()`).

In [40]:
df2 = pd.DataFrame(
    [
        ['red',  12, True],
        ['blue', 10, False],
        ['red',  18, True],
        ['green',18, False],
        ['red',  16, True],
        ['blue',  9, False]
    ],
    columns=['color', 'size', 'class'])

trans2 = arules.transactions(df2)
trans2.as_df()

Unnamed: 0,items,transactionID
1,"{color=red,size=[11.3,16.7),class}",0
2,"{color=blue,size=[9,11.3)}",1
3,"{color=red,size=[16.7,18],class}",2
4,"{color=green,size=[16.7,18]}",3
5,"{color=red,size=[11.3,16.7),class}",4
6,"{color=blue,size=[9,11.3)}",5


## Mine association rules

`arules.apriori()` calls the apriori algorithm and converts the results into a Python `arulespy.arules.Rules` object. Parameters for the algorithm
are specified as a `dict` inside the `arules.parameters()` function.

In [41]:
rules = arules.apriori(trans,
                    parameter = arules.parameters({"supp": 0.1, "conf": 0.8}), 
                    control = arules.parameters({"verbose": False}))  

print(rules)

print(type(rules))

rules.as_df()

set of 6 rules 

<class 'arulespy.arules.Rules'>


Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
4,{B},{C},0.5,1.0,0.5,1.25,5
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4
6,"{B,C}",{A},0.4,0.8,0.5,1.0,4
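
To see where these six rules come from, the following sketch enumerates all rules with a single-item right-hand side by brute force and applies the same thresholds (support 0.1, confidence 0.8). This is not the Apriori algorithm, which prunes the search space; it is only a check that is feasible for three items. Exact fractions are used to avoid floating-point issues at the thresholds.

```python
from fractions import Fraction
from itertools import combinations

# Transactions from the example, written as strings of item labels.
transactions = [frozenset(t) for t in
                ["ABC", "A", "ABC", "A", "ABC", "AC", "ABC", "C", "BC", "AC"]]
items = sorted(set().union(*transactions))
n = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return Fraction(sum(itemset <= t for t in transactions), n)


min_supp, min_conf = Fraction(1, 10), Fraction(8, 10)

rules = []
for size in range(len(items)):  # LHS sizes 0, 1, 2
    for lhs in combinations(items, size):
        lhs = frozenset(lhs)
        for rhs in items:
            if rhs in lhs:
                continue
            supp = support(lhs | {rhs})
            if supp < min_supp:
                continue
            conf = supp / support(lhs)  # support(lhs) > 0 because supp > 0
            if conf >= min_conf:
                rules.append((sorted(lhs), rhs, float(supp), float(conf)))

for lhs, rhs, supp, conf in rules:
    print(lhs, "=>", rhs, "support:", supp, "confidence:", conf)
```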


Python-style `len()` and slicing are available.

In [42]:
len(rules)

6

In [43]:
rules[0:3].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4


In [44]:
rules[[True, False, True, False, True, False]].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4


## Accessing Rules

Rules can be converted into various Python data structures.

In [45]:
arules.labels(rules)

['{} => {A}',
 '{} => {C}',
 '{B} => {A}',
 '{B} => {C}',
 '{A,B} => {C}',
 '{B,C} => {A}']

In [46]:
arules.quality(rules)

Unnamed: 0,support,confidence,coverage,lift,count
1,0.8,0.8,1.0,1.0,8
2,0.8,0.8,1.0,1.0,8
3,0.4,0.8,0.5,1.0,4
4,0.5,1.0,0.5,1.25,5
5,0.4,1.0,0.4,1.25,4
6,0.4,0.8,0.5,1.0,4


In [47]:
arules.items(rules).as_df()

Unnamed: 0,items
1,{A}
2,{C}
3,"{A,B}"
4,"{B,C}"
5,"{A,B,C}"
6,"{A,B,C}"


In [48]:
arules.lhs(rules).as_df()

Unnamed: 0,items
1,{}
2,{}
3,{B}
4,{B}
5,"{A,B}"
6,"{B,C}"


In [49]:
arules.rhs(rules).as_df()

Unnamed: 0,items
1,{A}
2,{C}
3,{A}
4,{C}
5,{C}
6,{A}


In [50]:
arules.sort(rules, by = 'lift').as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
4,{B},{C},0.5,1.0,0.5,1.25,5
5,"{A,B}",{C},0.4,1.0,0.4,1.25,4
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
3,{B},{A},0.4,0.8,0.5,1.0,4
6,"{B,C}",{A},0.4,0.8,0.5,1.0,4
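
The ordering above matches a stable sort in decreasing order of lift, where tied rules keep their original relative order. A plain-Python sketch of that ordering on the rule labels and lift values from the table follows; whether `arules` always breaks ties this way is an assumption.

```python
# (label, lift) pairs in the original rule order.
rules = [
    ("{} => {A}", 1.0),
    ("{} => {C}", 1.0),
    ("{B} => {A}", 1.0),
    ("{B} => {C}", 1.25),
    ("{A,B} => {C}", 1.25),
    ("{B,C} => {A}", 1.0),
]

# sorted() is stable, also with reverse=True, so ties keep their input order.
by_lift = sorted(rules, key=lambda rule: rule[1], reverse=True)
print([label for label, _ in by_lift])
```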


## Calculate Interest Measures 

Additional interest measures can be calculated. See all [available measures](https://mhahsler.github.io/arules/docs/measures).

In [51]:
arules.interestMeasure(rules, ["lift", "confidence"], transactions = trans)

Unnamed: 0,lift,confidence
1,1.0,0.8
2,1.0,0.8
3,1.0,0.8
4,1.25,1.0
5,1.25,1.0
6,1.0,0.8
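
For reference, confidence is count(LHS and RHS) / count(LHS), and lift is confidence divided by the relative frequency of the RHS. The sketch below recomputes both columns from the raw transactions in plain Python. It uses the standard definitions and is not the package's implementation.

```python
# Transactions from the example, written as strings of item labels.
transactions = [frozenset(t) for t in
                ["ABC", "A", "ABC", "A", "ABC", "AC", "ABC", "C", "BC", "AC"]]
n = len(transactions)

# The six mined rules as (LHS, RHS) pairs; "" is the empty LHS.
rules = [("", "A"), ("", "C"), ("B", "A"), ("B", "C"), ("AB", "C"), ("BC", "A")]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(frozenset(itemset) <= t for t in transactions)


confidences = [count(lhs + rhs) / count(lhs) for lhs, rhs in rules]
lifts = [n * count(lhs + rhs) / (count(lhs) * count(rhs)) for lhs, rhs in rules]

print(lifts)
print(confidences)
```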


Redundant rules can be filtered out, and maximal rules can be identified.

In [52]:
rules[[not x for x in arules.is_redundant(rules)]].as_df()

Unnamed: 0,LHS,RHS,support,confidence,coverage,lift,count
1,{},{A},0.8,0.8,1.0,1.0,8
2,{},{C},0.8,0.8,1.0,1.0,8
4,{B},{C},0.5,1.0,0.5,1.25,5


In [53]:
arules.is_maximal(rules)

[False, False, False, False, True, True]
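
A sketch of the idea behind this result, assuming that a rule counts as maximal when no other rule in the set is built from a proper superset of its items (LHS and RHS combined). The itemsets are copied from the `arules.items(rules)` output shown earlier.

```python
# Items (LHS and RHS combined) of the six rules, in their original order.
rule_items = [frozenset(s) for s in ["A", "C", "AB", "BC", "ABC", "ABC"]]

# A rule is flagged maximal if no rule in the set has a proper superset of its items.
maximal = [
    not any(items < other for other in rule_items)
    for items in rule_items
]
print(maximal)
```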

## Help for Functions Available via arulespy

In [54]:
help(arules)

Help on module arulespy.arules in arulespy:

NAME
    arulespy.arules - The arules module provides an interface to R's arules package.

CLASSES
    rpy2.robjects.methods.RS4(rpy2.robjects.robject.RObjectMixin, rpy2.rinterface.SexpS4)
        ItemMatrix
            Itemsets
            Rules
            Transactions
    
    class ItemMatrix(rpy2.robjects.methods.RS4)
     |  ItemMatrix(sexp: Union[rpy2.rinterface_lib._rinterface_capi.SupportsSEXP, ForwardRef('_rinterface.SexpCapsule'), ForwardRef('_rinterface.UninitializedRCapsule')])
     |  
     |  Class for arules itemMatrix object
     |  
     |  Method resolution order:
     |      ItemMatrix
     |      rpy2.robjects.methods.RS4
     |      rpy2.robjects.robject.RObjectMixin
     |      abc.ABC
     |      rpy2.rinterface.SexpS4
     |      rpy2.rinterface_lib.sexp.Sexp
     |      rpy2.rinterface_lib._rinterface_capi.SupportsSEXP
     |      builtins.object
     |  
     |  Methods defined here:
     |  
     |  __getitem__(se



## Low-level R arules interface

arules functions can also be called directly using
`arules.r.<arules R function>()`. The result will be an `rpy2` data type.
Transactions, itemsets and rules can be converted manually to Python
classes by wrapping the R object in the corresponding class, e.g., `arules.Itemsets()`.

In [55]:
help(arules.r.random_patterns)

Help on DocumentedSTFunction in module rpy2.robjects.functions:

<rpy2.robjects.functions.DocumentedSTFunction ob...e5a0600> [RTYPES.CLOSXP]
R classes: ('function',)
    Wrapper around an R function.
    
    The docstring below is built from the R documentation.
    
    description
    -----------
    
    
     Simulate random  transactions  using different methods.
     
    
    
    random.patterns(
        nItems,
        nPats = 2000.0,
        method = rinterface.NULL,
        lPats = 4.0,
        corr = 0.5,
        cmean = 0.5,
        cvar = 0.1,
        iWeight = rinterface.NULL,
        verbose = False,
    )
    
    Args:
       nItems :  an integer. Number of items to simulate
    
       nTrans :  an integer. Number of transactions to simulate
    
       method :  name of the simulation method used (see Details Section).
    
       ... :  further arguments used for the specific simulation method
      (see details).
    
       verbose :  report progress?
    
     

In [56]:
its_r = arules.r.random_patterns(100, 10)
its_r

<rpy2.robjects.methods.RS4 object at 0x7fb519fb8ec0> [RTYPES.S4SXP]
R classes: ('itemsets',)

Since we called an R function directly, we need to manually wrap the R object as a Python object before we use it in Python.

In [57]:
its_p = arules.Itemsets(its_r)
its_p.as_df()

Unnamed: 0,items,pWeights,pCorrupts
1,"{item6,item23,item53,item85,item93}",0.007103,0.193144
2,"{item32,item53,item56}",0.025078,1.0
3,{item32},0.302056,0.31288
4,"{item32,item33,item83,item87}",0.080564,0.751472
5,"{item32,item33,item83,item87}",0.069465,0.640368
6,"{item2,item52,item87}",0.135161,0.540517
7,"{item2,item52}",0.070268,0.404173
8,"{item2,item6,item52,item53}",0.037563,0.387126
9,"{item6,item20,item21,item27,item37,item53}",0.154839,0.685114
10,"{item42,item87,item99}",0.117904,0.820294
