# Apriori Association

*"Apriori is an algorithm for frequent item set mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database"* [Wikipedia](https://en.wikipedia.org/wiki/Apriori_algorithm)
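
The level-wise search described in the quote can be sketched in plain Python. This is a minimal illustration, not the implementation used by `apyori`; the function name `frequent_itemsets` and the toy transactions are made up for this example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items
    all_items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in all_items}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, only for illustration
toy = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what keeps the search tractable: an item set is only counted if all of its smaller subsets were already found to be frequent.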

The Apriori algorithm is simple to implement, and the associations it produces are easy to explain. It relies on three main measures:

**Support**

```python
support(B) = (transactions containing B) / (total transactions)
```

**Confidence**

```python
confidence(A→B) = (transactions containing both A and B) / (transactions containing A)
```

**Lift**
```python
lift(A→B) = confidence(A→B) / support(B)
```
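
To make the three measures above concrete, here is a small worked example on made-up transactions. The helper names (`support`, `confidence`, `lift`) and the data are assumptions for illustration only; they are not part of `apyori`.

```python
# Hypothetical toy transactions, each a set of items
transactions = [
    {"burgers", "ketchup"},
    {"burgers", "ketchup", "eggs"},
    {"burgers"},
    {"eggs"},
    {"ketchup", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Of the transactions containing A, the fraction that also contain B."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """How much more often B appears with A than B appears overall."""
    return confidence(a, b, transactions) / support(b, transactions)

A, B = {"burgers"}, {"ketchup"}
print("support(B):     ", support(B, transactions))
print("confidence(A→B):", confidence(A, B, transactions))
print("lift(A→B):      ", lift(A, B, transactions))
```

Here ketchup appears in 3 of 5 transactions, and in 2 of the 3 transactions that contain burgers, so the confidence is about 0.67 and the lift is about 1.11. A lift above 1 means the two items occur together more often than they would if they were independent.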

## Getting started

In [1]:
# installing apyori
# ! pip install apyori

In [2]:
# importing packages & libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

**Exploring the Data**

In [3]:
# reading data 
# data source: https://drive.google.com/file/d/1y5DYn0dGoSbC22xowBq2d4po6h1JxcTQ/view
data = pd.read_csv('../data/store_data.csv', header=None)

In [4]:
# checking first rows
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
 1   1       5747 non-null   object
 2   2       4389 non-null   object
 3   3       3345 non-null   object
 4   4       2529 non-null   object
 5   5       1864 non-null   object
 6   6       1369 non-null   object
 7   7       981 non-null    object
 8   8       654 non-null    object
 9   9       395 non-null    object
 10  10      256 non-null    object
 11  11      154 non-null    object
 12  12      87 non-null     object
 13  13      47 non-null     object
 14  14      25 non-null     object
 15  15      8 non-null      object
 16  16      4 non-null      object
 17  17      4 non-null      object
 18  18      3 non-null      object
 19  19      1 non-null      object
dtypes: object(20)
memory usage: 1.1+ MB


**Data Processing**

In [6]:
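# converting df to a list of lists (one list of 20 strings per transaction)
# note: str() turns missing cells (NaN) into the string 'nan', which apriori then treats as an item
n_rows, n_cols = data.shape  # 7501 rows, 20 columns
values = data.values
records = []
for i in range(n_rows):
    records.append([str(values[i, j]) for j in range(n_cols)])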
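
Because `str()` is applied to every cell, missing values end up in the transactions as the string `'nan'`, which is why a `nan` item shows up in one of the rules further down. One possible way to avoid this is to drop missing cells while building each transaction. The sketch below is an assumption, not part of the original notebook: `clean_row` is a hypothetical helper, and `sample_rows` stands in for `data.values.tolist()`, where pandas represents empty cells as float NaN.

```python
import math

def clean_row(row):
    """Return the real item names in one row, skipping missing cells."""
    items = []
    for cell in row:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue  # empty CSV cell read by pandas
        items.append(str(cell))
    return items

# Stand-in rows shaped like data.values.tolist()
sample_rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]
clean_records = [clean_row(row) for row in sample_rows]
print(clean_records)
```

With the real DataFrame, the same idea would be `records = [clean_row(row) for row in data.values.tolist()]`. The rules and statistics shown in the rest of this notebook were produced without this filtering.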

In [7]:
# checking list
records[:1]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil']]

**Modelling**

In [8]:
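# creating model
# note: min_length does not appear to be an option that apyori reads (it has max_length), so it likely has no effect here
rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)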
results = list(rules)
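
To get a feel for the `min_support` threshold, it can help to translate it into a number of transactions. The sketch below uses the 7501 rows reported by `data.info()`; the helper `support_to_count` is a hypothetical name introduced for this example.

```python
import math

n_transactions = 7501   # rows reported by data.info()
min_support = 0.0045    # threshold passed to apriori

# Smallest number of transactions an item set must appear in to pass the threshold
min_count = math.ceil(min_support * n_transactions)
print("minimum transactions:", min_count)

def support_to_count(support, n=n_transactions):
    """Convert a support fraction back to an absolute transaction count."""
    return round(support * n)

# Support of the first rule printed in the results
print("chicken & light cream:", support_to_count(0.004532728969470737))
```

With these numbers, an item set has to occur in at least 34 of the 7501 transactions, and the first rule in the results sits exactly at that limit.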

In [9]:
# printing results
results[:1]

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])]

In [10]:
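# making it more readable
for item in results:
    assoc = item[0]  # frozenset with all items in the rule
    items = [x for x in assoc]
    # only the first two items are printed, so larger item sets appear truncated
    print('Rule: ' + items[0] + ' & ' + items[1])
    print('Support: ' + str(item[1]))
    # item[2][0] is the first OrderedStatistic: index 2 is confidence, index 3 is lift
    print('Confidence: ' + str(item[2][0][2]))
    print('Lift: ' + str(item[2][0][3]),'\n')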

Rule: chicken & light cream
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395 

Rule: mushroom cream sauce & escalope
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049 

Rule: pasta & escalope
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794 

Rule: herb & pepper & ground beef
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285 

Rule: ground beef & tomato sauce
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083 

Rule: whole wheat pasta & olive oil
Support: 0.007998933475536596
Confidence: 0.2714932126696833
Lift: 4.122410097642296 

Rule: pasta & shrimp
Support: 0.005065991201173177
Confidence: 0.3220338983050847
Lift: 4.506672147735896 

Rule: nan & chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395 

Rule: chocolate & shrimp
Support: 0.005332622317024397
Confidence: 0.2
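
The loop above prints only the first two items of each item set and does not show which side of the rule each item is on. A possible alternative is to read the named fields visible in the printed `RelationRecord` (`items_base`, `items_add`, `confidence`, `lift`). The sketch below is an assumption rather than the original notebook code: it defines stand-in namedtuples with the same field names so that it runs without `apyori`, and `format_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed RelationRecord
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def format_rules(records):
    """Return one 'antecedent -> consequent' line per ordered statistic."""
    lines = []
    for record in records:
        for stat in record.ordered_statistics:
            lhs = ", ".join(sorted(stat.items_base))
            rhs = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{lhs} -> {rhs} | support={record.support:.4f} "
                f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
            )
    return lines

# Values copied from the first record printed earlier
example = [RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395,
    )],
)]
for line in format_rules(example):
    print(line)
```

Passing the real `results` list to `format_rules` should work the same way, assuming the records expose these field names as the printed output suggests.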

Learned from the [Stack Abuse](https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/) tutorial on association rule mining with the Apriori algorithm in Python.