Apriori is an algorithm for frequent item set mining and association rule learning over relational databases. It first identifies the individual items that occur frequently in the database, then extends them to larger and larger item sets for as long as those item sets still appear sufficiently often. The frequent item sets found by Apriori can then be used to derive association rules that highlight general trends in the database. A typical application is market basket analysis.
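
The level-wise idea described above can be sketched in plain Python. This is only an illustration under simple assumptions, not how apyori is implemented: the function name `frequent_itemsets` and the toy baskets are made up for this example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow frequent k-item sets into (k+1)-item candidates."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # fraction of baskets that contain every item of the item set
        return sum(itemset <= b for b in baskets) / n

    # level 1: frequent individual items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # join step: unions of two frequent k-item sets that have k+1 items
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # prune step: every k-item subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

# hypothetical toy baskets, not taken from the groceries dataset
toy = [['milk', 'bread'], ['milk', 'bread', 'butter'], ['bread'], ['milk', 'butter']]
found = frequent_itemsets(toy, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what gives the algorithm its name: an item set can only be frequent if all of its subsets are frequent, so candidates with an infrequent subset are discarded before their support is counted.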

In [None]:
#you might need to install apyori
!pip install apyori

In [None]:
#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#importing dataset
ds = pd.read_csv('../input/groceries-dataset/Groceries_dataset.csv')
ds.head()

In [None]:
#dataset has 38765 rows and 3 columns
ds.shape

In [None]:
#setting index as Date
ds.set_index('Date',inplace = True)

In [None]:
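#converting the index to datetime
#the dates in this file appear to be day-first (DD-MM-YYYY), so dayfirst=True avoids swapping day and month
ds.index=pd.to_datetime(ds.index, dayfirst=True)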

In [None]:
ds.head()

In [None]:
#checking for missing values
ds.isnull().sum()

**NO MISSING VALUES**

In [None]:
#gathering information about products
total_item = len(ds)
total_days = len(np.unique(ds.index.date))
#count distinct (year, month) pairs; ds.index.year alone would count years, not months
total_months = ds.index.to_period('M').nunique()
print(total_item,total_days,total_months)

**A total of 38765 items were sold on 728 distinct days across 24 months**

In [None]:
#counting the 20 most frequently purchased items once and reusing the result
top_items = ds.itemDescription.value_counts().head(20)
plt.figure(figsize=(15,5))
sns.barplot(x = top_items.index, y = top_items.values, palette = 'gnuplot')
plt.xlabel('itemDescription', size = 15)
plt.xticks(rotation=45)
plt.ylabel('Count of Items', size = 15)
plt.title('Top 20 Items purchased by customers', color = 'green', size = 20)
plt.show()

In [None]:
ds['itemDescription'].value_counts()

In [None]:
#grouping the dataset to form a list of products bought by the same customer on the same date
ds=ds.groupby(['Member_number','Date'])['itemDescription'].apply(list)

In [None]:
ds.head(10)
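
To make the grouping step concrete, here is a pure-Python equivalent on a few made-up rows. The rows and the names `rows`, `baskets` and `toy_transactions` are hypothetical and only mirror the three columns of the dataset. Each (member, date) pair becomes one transaction.

```python
from collections import defaultdict

# hypothetical rows: (Member_number, Date, itemDescription)
rows = [
    (1000, '2015-03-15', 'whole milk'),
    (1000, '2015-03-15', 'yogurt'),
    (1000, '2014-06-24', 'pastry'),
    (2000, '2015-03-15', 'sausage'),
]

# collect the items of each (member, date) pair into one basket
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

# one list of items per transaction, in order of first appearance
toy_transactions = list(baskets.values())
print(toy_transactions)
```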

In [None]:
ds.shape

**NOW THE DATASET HAS 14963 ROWS**

In [None]:
#apriori takes a list of transactions as input, hence converting the dataset to a list
transactions = ds.values.tolist()
transactions[:10]

In [None]:
#applying apriori
#note: apyori appears to ignore min_length (the length option it reads is max_length), so it is dropped here
from apyori import apriori
rules = apriori(transactions, min_support=0.00030, min_confidence=0.05, min_lift=2)
results = list(rules)
results

In [None]:
len(results)
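
The thresholds passed to `apriori` refer to three standard rule metrics. As a rough sketch, they can be computed by hand for a single rule on toy data. The helper `rule_metrics` and the baskets below are made up for illustration and are not part of apyori.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) of the rule lhs -> rhs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    lhs, rhs = set(lhs), set(rhs)
    support_lhs = sum(lhs <= b for b in baskets) / n           # P(lhs)
    support_rhs = sum(rhs <= b for b in baskets) / n           # P(rhs)
    support_both = sum((lhs | rhs) <= b for b in baskets) / n  # P(lhs and rhs)
    confidence = support_both / support_lhs                    # P(rhs | lhs)
    lift = confidence / support_rhs                            # P(rhs | lhs) / P(rhs)
    return support_both, confidence, lift

# hypothetical toy baskets, not taken from the groceries dataset
toy_baskets = [['milk', 'bread'], ['milk', 'bread'], ['butter'], ['bread', 'butter']]
support, confidence, lift = rule_metrics(toy_baskets, ['milk'], ['bread'])
print(support, confidence, round(lift, 3))
```

A lift above 1 means the two sides occur together more often than they would if they were independent, which is why `min_lift = 2` keeps only rules whose right hand side is at least twice as likely given the left hand side.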

In [None]:
def inspect(results):
    #one row per rule, keeping every item on each side instead of only the first one
    rows = []
    for result in results:
        for stat in result.ordered_statistics:
            lhs = ', '.join(sorted(stat.items_base))
            rhs = ', '.join(sorted(stat.items_add))
            rows.append((lhs, rhs, result.support, stat.confidence, stat.lift))
    return rows
ordered_results = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])

In [None]:
ordered_results