#### Recommendation Engine

Given a large set of X rules derived from existing reconciliations, build a recommendation system for Y rules based on the following:

- A Y rule is configured using one or more X attributes.
- Given many X rules, find which combinations of X keys occur frequently.
- Also determine which of these frequent combinations are unexpected.

**Input**: Rec Type and schema of X attributes

**Output**: Probable list of Y rules
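
One way to read the first two bullets in association-rule terms is to treat each existing X rule as a "transaction" whose items are the X attribute keys it uses. The sketch below counts how often each pair of keys co-occurs across rules. The attribute names (`account`, `amount`, `trade_date`, `currency`) and the `frequent_key_combinations` helper are hypothetical placeholders, not part of any real reconciliation schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical X rules: each rule is reduced to the set of X attribute keys it uses.
X_RULES = [
    {"account", "amount", "trade_date"},
    {"account", "amount"},
    {"account", "amount", "currency"},
    {"amount", "currency"},
    {"account", "trade_date"},
]


def frequent_key_combinations(rules, size, min_count):
    """Count every `size`-key combination across the rules and keep those that
    appear in at least `min_count` rules (hypothetical helper for illustration)."""
    counts = Counter()
    for keys in rules:
        # Sort the keys so the same combination always yields the same tuple.
        for combo in combinations(sorted(keys), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}


pairs = frequent_key_combinations(X_RULES, size=2, min_count=2)
for combo, n in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(combo, n)
```

Frequent pairs are candidate key combinations for Y rules; judging which combinations are *unexpected* needs a measure that compares observed and chance co-occurrence, such as lift.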


---

#### Association rules with antecedent and consequent

To identify and evaluate association rules, three measures are key:

- How frequently the antecedent and consequent (an itemset) are purchased together: **Support (prevalence)**
- Given a purchase of the antecedent, how likely the consequent is to be purchased: **Confidence (predictability)**
- How much more likely this association is than would be expected by chance: **Lift (interest)**

**Note**
- Association rules do not imply causality.
- Type 1 errors: accepting false rules.
- Type 2 errors: missing significant rules.

---

{A(i)}=> {C(i)}

This rule indicates that, based on the history of all transactions, when Item A is found in a transaction (or basket), there is a strong propensity for Item C to occur in the same transaction.

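The set of items on the left-hand side is the **antecedent** of the rule, while the set on the right-hand side is the **consequent**.

- Both the antecedent and the consequent can contain more than one item, e.g. {News, Finance} → {Sports}.

The running example below records which sections of a website were visited in each of six user sessions (1 = visited, 0 = not visited).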

|Session ID|News|Finance|Entertainment|Sports|Arts|
|----------|----|-------|-------------|------|----|
|1|1|1|0|0|0|
|2|1|1|0|0|0|
|3|1|1|0|1|0|
|4|0|0|0|0|1|
|5|1|1|0|1|0|
|6|1|0|1|0|1|

---
#### Support
The **support** of an itemset is the relative frequency with which it appears in transactions. The support of a rule {A} → {C} is the support of the combined itemset {A, C}, i.e. the probability that the antecedent and the consequent occur together.

**Support(itemset) = # txn containing the itemset / total # txn**

From the above example:

- Support({News})=5/6=0.83
- Support({News, Finance})=4/6 =0.67
- Support({Sports})=2/6=0.33

The support of a rule indicates whether the rule is worth considering. Because support favors itemsets that occur often, it surfaces the patterns that are common enough to be worth investigating and acting on.

In association analysis, a threshold of support is specified to filter out infrequent rules. Any rule that exceeds the support threshold is then considered for further analysis.
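
A minimal sketch of the support calculation, with the six sessions from the table encoded as sets of visited sections. `Fraction` keeps the arithmetic exact, and `MIN_SUPPORT = 1/2` is an arbitrary threshold chosen only to illustrate the filtering step.

```python
from fractions import Fraction

# The six sessions from the table above, one set of visited sections per session.
SESSIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Arts"},
    {"News", "Finance", "Sports"},
    {"News", "Entertainment", "Arts"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for txn in transactions if set(itemset) <= txn)
    return Fraction(hits, len(transactions))


print(support({"News"}, SESSIONS))             # 5/6
print(support({"News", "Finance"}, SESSIONS))  # 2/3
print(support({"Sports"}, SESSIONS))           # 1/3

# Keep only single items whose support meets the (illustrative) threshold.
MIN_SUPPORT = Fraction(1, 2)
all_items = set().union(*SESSIONS)
frequent_items = sorted(i for i in all_items if support({i}, SESSIONS) >= MIN_SUPPORT)
print(frequent_items)  # ['Finance', 'News']
```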

---
#### Confidence
Confidence is the probability that a transaction containing the items on the left-hand side of the rule also contains the items on the right-hand side. The higher the confidence, the greater the likelihood that the right-hand side appears whenever the left-hand side does.

**Confidence({A} → {C}) = # txn with both A and C / # txn with A**

In the case of the rule {News, Finance}→{Sports}, the question that the confidence measure answers is, if a transaction has both News and Finance, what is the likelihood of seeing Sports in it?

Confidence({News, Finance} → {Sports}) = Support({News, Finance, Sports}) / Support({News, Finance})

Support({News, Finance, Sports}) = 2/6 = 0.33

Support({News, Finance}) = 4/6 = 0.67

Confidence({News, Finance} → {Sports}) = 0.33 / 0.67 = 0.5

##### What does confidence mean in this case?

Half of the sessions that contain News and Finance also contain Sports: 50% of the users who visit the News and Finance pages also visit the Sports page.
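
The same calculation as a short sketch, reusing an exact-fraction `support` helper over the session table. The function names are illustrative, not taken from any library.

```python
from fractions import Fraction

# The six sessions from the table above.
SESSIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Arts"},
    {"News", "Finance", "Sports"},
    {"News", "Entertainment", "Arts"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for txn in transactions if set(itemset) <= txn)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Support(antecedent ∪ consequent) / Support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(confidence({"News", "Finance"}, {"Sports"}, SESSIONS))  # 1/2
print(confidence({"Sports"}, {"News", "Finance"}, SESSIONS))  # 1
```

The second call shows that confidence is directional: every session with Sports also contains News and Finance, so the reverse rule has confidence 1.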

---
#### Lift

**Need for Lift:**

Although confidence is widely used, it ignores how frequently the rule consequent (conclusion) occurs on its own. In some transaction sets this produces spurious rules: a consequent that appears in most transactions will score high confidence with almost any antecedent, whether or not the two are related.

To correct for this, the support of the consequent is placed in the denominator of the confidence calculation. This measure is called the **lift** of the rule.

Lift is the ratio of the observed support of {News, Finance, Sports} to the support that would be expected if {News, Finance} and {Sports} occurred completely independently.


**Lift({A} → {C}) = Support({A, C}) / (Support({A}) × Support({C})) = Confidence({A} → {C}) / Support({C})**

Lift({News, Finance} → {Sports}) = Support({News, Finance, Sports}) / (Support({News, Finance}) × Support({Sports}))

Lift({News, Finance} → {Sports}) = 0.33 / (0.67 × 0.33) = 1.5


**A few things to note about lift:**

- A lift greater than 1 suggests that the presence of the antecedent increases the chance that the consequent occurs in the same transaction.

- A lift below 1 indicates that purchasing the antecedent reduces the chance of purchasing the consequent in the same transaction.
  **Note**: This could indicate that customers see the items as alternatives to each other.

- When the lift is 1, purchasing the antecedent makes no difference to the chance of purchasing the consequent.
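
A short sketch of lift on the session table, showing one rule above 1 and one below 1. The `support` and `lift` helpers are illustrative names, and exact fractions are used so the results match the hand calculation without rounding.

```python
from fractions import Fraction

# The six sessions from the table above.
SESSIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Arts"},
    {"News", "Finance", "Sports"},
    {"News", "Entertainment", "Arts"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for txn in transactions if set(itemset) <= txn)
    return Fraction(hits, len(transactions))


def lift(antecedent, consequent, transactions):
    """Observed support of the combined itemset divided by the support
    expected if antecedent and consequent occurred independently."""
    both = set(antecedent) | set(consequent)
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return support(both, transactions) / expected


print(lift({"News", "Finance"}, {"Sports"}, SESSIONS))  # 3/2, lift > 1
print(lift({"News"}, {"Arts"}, SESSIONS))               # 3/5, lift < 1
```

Because the formula is symmetric in A and C, swapping antecedent and consequent gives the same lift, unlike confidence.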

---

#### Conviction
The conviction of the rule X → Y is the ratio of the expected frequency with which X occurs without Y (if X and Y were independent) to the observed frequency of incorrect predictions. Conviction takes the direction of the rule into account: the conviction of X → Y is not the same as the conviction of Y → X. The conviction of a rule X → Y can be calculated as:

**Conviction(X → Y) = (1 - Support(Y)) / (1 - Confidence(X → Y))**

Conviction({News, Finance} → {Sports}) = (1 - 1/3) / (1 - 1/2) = (2/3) / (1/2) ≈ 1.33

A conviction of 1.33 means that the rule {News, Finance} → {Sports} would be incorrect 33% more often (1.33 times as often) if the relationship between {News, Finance} and {Sports} were purely random.
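
A sketch of conviction using exact fractions, which gives 4/3 ≈ 1.33 for the example rule. Returning `math.inf` when confidence is 1 (a rule never contradicted by the data) is a common convention assumed here, since the formula would otherwise divide by zero.

```python
import math
from fractions import Fraction

# The six sessions from the table above.
SESSIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Arts"},
    {"News", "Finance", "Sports"},
    {"News", "Entertainment", "Arts"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for txn in transactions if set(itemset) <= txn)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Support(antecedent ∪ consequent) / Support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def conviction(antecedent, consequent, transactions):
    """(1 - Support(consequent)) / (1 - Confidence(antecedent -> consequent)).

    A rule that is never wrong in the data (confidence 1) gets infinite conviction.
    """
    conf = confidence(antecedent, consequent, transactions)
    if conf == 1:
        return math.inf
    return (1 - support(consequent, transactions)) / (1 - conf)


print(conviction({"News", "Finance"}, {"Sports"}, SESSIONS))  # 4/3
print(conviction({"Sports"}, {"News", "Finance"}, SESSIONS))  # inf
```

The two calls illustrate the directionality of conviction: the same pair of itemsets gives different values depending on which side is the antecedent.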

---
#### Mining Association Rules

- Step 1: Prepare the data in transaction format. An association algorithm needs its input formatted as transactions, e.g. tx = {i1, i2, i3}.

- Step 2: Short-list frequently occurring itemsets. Itemsets are combinations of items. An association algorithm limits the analysis to the most frequently occurring itemsets, so that the rule set extracted in the next step is more meaningful.

- Step 3: Generate relevant association rules from the itemsets. Finally, the algorithm generates rules and filters them based on an interest measure such as confidence or lift.
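
The three steps can be sketched end to end on the session table. This is a deliberately brute-force version: it enumerates every possible itemset with `itertools.combinations`, which is fine for five items but not for real data. The thresholds `MIN_SUPPORT = 1/3` and `MIN_CONFIDENCE = 1/2` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Step 1: convert the one-hot session table above into transactions (sets of items).
COLUMNS = ["News", "Finance", "Entertainment", "Sports", "Arts"]
ROWS = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
]
transactions = [frozenset(c for c, flag in zip(COLUMNS, row) if flag) for row in ROWS]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


# Step 2: keep every itemset whose support meets the (illustrative) threshold.
MIN_SUPPORT = Fraction(1, 3)
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Step 3: split each frequent itemset into antecedent => consequent and keep
# the rules whose confidence meets the (illustrative) threshold.
MIN_CONFIDENCE = Fraction(1, 2)
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), r):
            ante = frozenset(ante)
            # Every subset of a frequent itemset is itself frequent, so this lookup succeeds.
            conf = s / frequent[ante]
            if conf >= MIN_CONFIDENCE:
                rules.append((ante, itemset - ante, conf))

for ante, cons, conf in sorted(rules, key=lambda rule: (-rule[2], sorted(rule[0]))):
    print(sorted(ante), "=>", sorted(cons), f"confidence={float(conf):.2f}")
```

Filtering on lift instead of confidence in Step 3 would only change the `conf >= MIN_CONFIDENCE` test.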

---
#### Apriori Algorithm
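The Apriori principle states that "if an itemset is frequent, then all of its subsets are also frequent." An itemset is "frequent" if its support exceeds the support threshold. Conversely, if an itemset is infrequent, every superset of it is also infrequent, so those supersets can be pruned without counting their support.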
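
A minimal level-wise Apriori sketch. It follows the classic join-and-prune scheme: candidates of size k are formed from frequent (k-1)-itemsets, and a candidate is discarded without being counted if any of its (k-1)-subsets is infrequent, which is the contrapositive of the principle above. The function name `apriori` and its `min_count` parameter (an absolute transaction count, treated as "at least" rather than "more than") are choices made for this illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: count} for every itemset found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count support only for the candidates that survived pruning.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


SESSIONS = [
    {"News", "Finance"},
    {"News", "Finance"},
    {"News", "Finance", "Sports"},
    {"Arts"},
    {"News", "Finance", "Sports"},
    {"News", "Entertainment", "Arts"},
]
result = apriori(SESSIONS, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the session table, Entertainment is dropped at level 1, so no candidate containing it is ever generated or counted at later levels.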

---
#### References
- https://blogs.gartner.com/martin-kihn/how-to-build-a-recommender-system-in-python/

- Vijay Kotu and Bala Deshpande, *Data Science*, 2nd Edition, Chapter 6
