# *FP-Growth* Recommender Algorithm


Frequent Pattern Mining (FP-Growth)

* Based on user-item interaction matrix

* The item-based model (offline): we "assume that users will mostly be interested by items similar to the one they have interacted with in the past". Here, the similarity between two items is based on how many users have bought both in the past.

## Loading and preparing Data

This step loads the sales data. The product names are loaded as well so that they can be displayed. The only relevant fields are the ticket, which identifies a single sale of a group of products, and the product id. The amount, quantity and family are not taken into account.

The details and a step-by-step explanation are given in the Data Exploration notebook. This part is copied from that notebook.

In [1]:
import pandas as pd
import numpy as np
import random
from google.colab import drive
drive.mount('/content/drive')
path_owner = "/content/drive/MyDrive/owner.txt"

# Run only one time with your user to personalize your path
# print("USER_NAME", file=open(path_owner, "w"), end='')

with open(path_owner) as f:
    current_user = f.readline()

print("current user:", current_user)

# add your own path in an elif branch
if current_user == "sergi":
  base_path = '/content/drive/MyDrive/Master/Project Big Data/Project Big Data/'
else:
  base_path = '/content/drive/MyDrive/'

print("your base path: ", base_path)

file_products = base_path + 'data/ARTICLES.csv'
file_families = base_path + 'data/AECOC.csv'
file_sells_2020 = base_path + 'data/s20.csv'
file_sells_2019 = base_path + 'data/s19.csv'

#fix seed only for development 
random_seed = 11
random.seed(random_seed)

Mounted at /content/drive
current user: sergi
your base path:  /content/drive/MyDrive/Master/Project Big Data/Project Big Data/


In [2]:
products = pd.read_csv(file_products,sep=';',encoding='ISO-8859-1', skiprows=1,\
                    names=['product_id','product_desc','x1','x2','x3','family_id'])\
                    .drop(['x1','x2','x3'],axis=1).set_index('product_id').dropna()

In [3]:
sells_2020 = pd.read_csv(file_sells_2020,sep=';',encoding='ISO-8859-1',\
                    names=['invoice_id','product_id','units','amount','checkout','date','hour'])



The Data Exploration notebook identified some products that should be omitted:

|product_id|units|description|
|-----|---------|---------------------|
|9117|26066.0|BOLSAS CAMISETA GALG|
|8055|23754.0|BOLSAS CAMISETA CON.|
|8419|439.0|BOLSAS RAFIA COLOR N|



In [4]:
products_to_avoid = [
                     8055, # plastic bag
                     9117, # plastic bag
                     8419, # raffia bag
                     ]
sells_2020 = sells_2020[~sells_2020['product_id'].isin(products_to_avoid)]

## Preparing Data to apply the algorithm

To apply this algorithm, only the presence of a product in the cart is considered, so the units, price, family and other information are omitted.

The mlxtend library needs to be upgraded to get the required functionality.

In [5]:
%%capture
!pip install mlxtend --upgrade --no-deps

First, the products of each invoice are grouped into a list, as a step towards the one-hot format.

In [6]:
sells_2020_grouped_productlist = sells_2020[['invoice_id','product_id']].groupby(by='invoice_id').agg(list)


Sales with only one product are omitted because they don't provide any information about associations.

In [7]:
# Count the product lines of each sale (repeated products are counted each time)
sells_2020_grouped_productlist['n_products'] = sells_2020_grouped_productlist['product_id'].apply(len)
# Show the sales with more than one product line (the result is displayed, not assigned back)
sells_2020_grouped_productlist[sells_2020_grouped_productlist['n_products']>1]

|invoice_id|product_id|n_products|
|---|---|---|
|2027-T0132C01-000001|[6252, 4465, 4465, 4465, 4465, 6631, 6628, 818...|24|
|2027-T0132C01-000002|[3066, 3066]|2|
|2027-T0132C01-000003|[2412, 2402, 6183, 2408, 3227, 7587]|6|
|2027-T0132C01-000004|[8286, 6170, 5880, 6631, 3635, 4488, 8348, 163...|13|
|2027-T0132C01-000005|[7755, 1481, 5583]|3|
|...|...|...|
|2027-T0132C03-034305|[7692, 7630, 5916, 6253, 6253]|5|
|2027-T0132C03-034307|[7038, 3227, 6120, 1628]|4|
|2027-T0132C03-034308|[9049, 8170]|2|
|2027-T0132C03-034309|[7770, 7770, 7759, 7771, 6330, 7771, 2760]|7|
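
The grouping and filtering step can be illustrated without pandas on a few made-up rows. This is only a sketch: the `rows` data and the helper name `group_by_invoice` are assumptions for illustration and are not part of the notebook. It also shows that a basket such as `[3066, 3066]` has two lines but only one distinct product, so counting lines and counting distinct products give different filters.

```python
from collections import defaultdict

# Hypothetical (invoice_id, product_id) rows; the values are made up.
rows = [
    ("T-001", 6252), ("T-001", 4465), ("T-001", 4465),
    ("T-002", 3066), ("T-002", 3066),
    ("T-003", 2412),
]

def group_by_invoice(rows):
    """Collect the product ids of each invoice into a list, keeping duplicates."""
    baskets = defaultdict(list)
    for invoice_id, product_id in rows:
        baskets[invoice_id].append(product_id)
    return dict(baskets)

baskets = group_by_invoice(rows)

# Keep baskets with more than one line (a count of list entries).
multi_line = {k: v for k, v in baskets.items() if len(v) > 1}

# Stricter alternative: keep baskets with more than one distinct product.
multi_product = {k: v for k, v in baskets.items() if len(set(v)) > 1}

print(sorted(multi_line))
print(sorted(multi_product))
```

Which of the two filters is preferable depends on whether repeated lines of the same product should count as an association.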


The mlxtend `TransactionEncoder` expects a plain Python list of transactions rather than a pandas object, so the data must be converted to a simple Python list.

In [8]:
from sklearn.model_selection import train_test_split
training_sells, test_sells = train_test_split(sells_2020_grouped_productlist, test_size=0.2, random_state=random_seed)

In [9]:
products_2020_to_fit = training_sells['product_id'].tolist()

To apply the *FP-Growth* algorithm, the data must be transformed from lists of products to one-hot format. This format is a flat matrix with one column per product and one row per sale; a cell is True if the product was sold in that sale.

In [10]:
from mlxtend.preprocessing import TransactionEncoder
#One-hot transformer
te = TransactionEncoder()
te_ary = te.fit(products_2020_to_fit).transform(products_2020_to_fit)
# convert result to Pandas Dataframe
one_hot_df = pd.DataFrame(te_ary, columns=te.columns_)
one_hot_df

Unnamed: 0,200,206,207,208,209,211,212,213,216,217,218,221,223,224,225,226,227,231,232,234,235,236,237,240,242,244,245,247,249,302,304,305,312,313,314,315,317,318,319,320,...,8596,8598,8920,8921,8922,8923,8929,8933,8939,8947,8951,8957,8966,8983,8993,8995,9044,9048,9049,9052,9053,9064,9066,9067,9110,9119,9127,9129,9136,9137,9138,9552,9771,85253,85254,85487,85488,86153,86790,91534
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129288,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
129289,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
129290,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
129291,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
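
The one-hot transformation can be illustrated on a toy example in plain Python. This is a sketch of the same idea, not mlxtend's implementation; the basket contents are made up.

```python
# Toy baskets with made-up product ids, standing in for products_2020_to_fit.
baskets = [[207, 245], [8983], [207, 8983, 8983]]

# Columns are the sorted distinct product ids seen in any basket.
columns = sorted({p for basket in baskets for p in basket})

# One row per basket; a cell is True when the product appears in that basket.
one_hot = []
for basket in baskets:
    present = set(basket)  # duplicates collapse: only presence matters
    one_hot.append([col in present for col in columns])

print(columns)
for row in one_hot:
    print(row)
```

Repeated products in a basket do not change the row, which matches the decision to consider only the presence of a product.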


## Apply FP-Growth Recommender Algorithm



In [11]:
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import association_rules

The **support** is very important in the *FP-Growth* algorithm. It is the fraction of sales in which a product, or a set of products, appears. The `min_support` parameter is therefore a cutoff that discards all itemsets that are not frequent enough. The value of `min_support` used here is quite low: the number of sales is very high and there is a huge number of combinations, so the cutoff only needs to exclude very rare combinations that happen just a few times.
The result is a list of product sets with their support.

In [12]:
frequent_itemsets = fpgrowth(one_hot_df, min_support=0.001, use_colnames=True)
frequent_itemsets.nlargest(10, 'support')

| |support|itemsets|
|---|---|---|
|43|0.057343|(6253)|
|245|0.052725|(6252)|
|15|0.048525|(2111)|
|192|0.031974|(7550)|
|93|0.030713|(6251)|
|127|0.030497|(6255)|
|135|0.030056|(2567)|
|48|0.026096|(5690)|
|383|0.025732|(4490)|
|20|0.025175|(5954)|
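
To make the meaning of support and of the `min_support` cutoff concrete, here is a small sketch on made-up transactions. It enumerates itemsets by brute force, which is not how FP-Growth works internally; it only reproduces the definition. The function names `support` and `frequent_itemsets` are illustrative assumptions.

```python
from itertools import combinations

# Toy transactions with made-up product ids.
transactions = [
    {6253, 6252, 2111},
    {6253, 2111},
    {6252},
    {6253, 2111, 7550},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration (not FP-Growth) of itemsets up to max_size."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

freq = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in freq.items():
    print(itemset, s)
```

With a cutoff of 0.5, product 7550 and every pair that appears only once are discarded, while the pair (2111, 6253) survives because it appears in three of the four transactions.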


Applying `association_rules` yields several metrics. These are the definitions of each one:
- **support**: The support metric is defined for itemsets, not association rules. The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' computes the proportion of transactions that contain the antecedent A, and 'consequent support' computes the support for the itemset of the consequent C. The 'support' metric then computes the support of the combined itemset A ∪ C. Note that 'support' is bounded by 'antecedent support' and 'consequent support': it can never exceed min('antecedent support', 'consequent support'). <br><br>Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a "frequent itemset" if its support is larger than a specified minimum-support threshold. Note that in general, due to the downward closure property, all subsets of a frequent itemset are also frequent.

\begin{align}
 \text{support}(A \rightarrow C) = \text{support}(A \cup C), \quad \text{range: } [0, 1]
\end{align}

- **confidence**: The confidence of a rule A$\rightarrow$C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is not symmetric; it is directed: for instance, the confidence for A$\rightarrow$C is different from the confidence for C$\rightarrow$A. The confidence is 1 (maximal) for a rule A$\rightarrow$C if the consequent occurs in every transaction that contains the antecedent.

\begin{align}
 \text{confidence}(A \rightarrow C) = \dfrac{\text{support}(A \rightarrow C)}{\text{support}(A)}, \quad \text{range: } [0, 1]
\end{align}

- **lift**: The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule A$\rightarrow$C occur together than we would expect if they were statistically independent. If A and C are independent, the Lift score will be exactly 1.

\begin{align}
 \text{lift}(A \rightarrow C) = \dfrac{\text{confidence}(A \rightarrow C)}{\text{support}(C)}, \quad \text{range: } [0, \infty]
\end{align}

- **leverage**: Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence.

\begin{align}
 \text{leverage}(A \rightarrow C) = \text{support}(A \rightarrow C) - \text{support}(A) \times \text{support}(C), \quad \text{range: } [-1, 1]
\end{align}

- **conviction**: A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), for which the conviction score is defined as 'inf'. Similar to lift, if items are independent, the conviction is 1.

\begin{align}
 \text{conviction}(A \rightarrow C) = \dfrac{1 - \text{support}(C)}{1 - \text{confidence}(A \rightarrow C)}, \quad \text{range: } [0, \infty]
\end{align}

source: http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
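
The definitions above can be checked with a short calculation. The sketch below computes the metrics of a rule from three support values; the function name `rule_metrics` and the toy numbers are assumptions for illustration, not part of mlxtend.

```python
import math

def rule_metrics(support_a, support_c, support_ac):
    """Compute the metrics of a rule A -> C from three support values.

    support_a:  support of the antecedent A
    support_c:  support of the consequent C
    support_ac: support of the combined itemset A ∪ C
    """
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # A rule that never fails has confidence 1, so the denominator is 0
    # and conviction is reported as infinity.
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Toy numbers (made up): A appears in 40% of sales, C in 50%, both in 30%.
toy = rule_metrics(0.4, 0.5, 0.3)

# A rule that always holds: every sale with A also contains C.
perfect = rule_metrics(0.2, 0.5, 0.2)

print(toy)
print(perfect)
```

In the toy case the lift is above 1 and the leverage is positive, so A and C appear together more often than independence would predict. In the second case the confidence is 1 and the conviction is infinite.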

From the list below it is possible to derive the product associations. For our requirements the most important value is the confidence, which measures the quality of a rule: the number of times the rule is correct relative to the number of times it can be applied.

It is quite strange to see a confidence of 100%, and it is easy to see that product 9771 is always present. This product is in fact a tax. All drinks sold in Catalonia that contain more than 5 g of sugar per 100 ml are taxed, and above 8 g per 100 ml the tax is higher. Customers usually don't notice this tax, but because this data is extracted from the data prepared for the tax authorities, this kind of information can appear.

More information about the sugar tax: https://www.hacienda.gob.es/Documentacion/Publico/PortalVarios/FinanciacionTerritorial/Autonomica/TributosPropios/Normativa/2018/15.%20Impuesto%20bebidas%20azucaradas%20envasadas%20CATALU%C3%91A.pdf

In [13]:
ass_rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.01)
print ('The product 9771 is :',products.loc[9771]['product_desc'])
ass_rules.nlargest(10, 'confidence')

The product 9771 is : IMPOST SOBRE BEGUDES


| |antecedents|consequents|antecedent support|consequent support|support|confidence|lift|leverage|conviction|
|---|---|---|---|---|---|---|---|---|---|
|268|(8122)|(9771)|0.007479|0.016397|0.007479|1.0|60.987264|0.007357|inf|
|932|(8393)|(9771)|0.002606|0.016397|0.002606|1.0|60.987264|0.002564|inf|
|984|(7746)|(9771)|0.003914|0.016397|0.003914|1.0|60.987264|0.003849|inf|
|1173|(8350)|(9771)|0.00123|0.016397|0.00123|1.0|60.987264|0.00121|inf|
|1179|(8351)|(9771)|0.001145|0.016397|0.001145|1.0|60.987264|0.001126|inf|
|1033|(8445)|(6298)|0.003473|0.004648|0.00215|0.619154|133.198396|0.002134|2.613526|
|1132|(8403, 8229)|(3091)|0.005197|0.020365|0.002676|0.514881|25.283138|0.00257|2.019371|
|1032|(6298)|(8445)|0.004648|0.003473|0.00215|0.462562|133.198396|0.002134|1.854219|
|195|(2454)|(2453)|0.002328|0.004316|0.001067|0.458472|106.231522|0.001057|1.838656|
|269|(9771)|(8122)|0.016397|0.007479|0.007479|0.456132|60.987264|0.007357|1.82493|


It would be better to add this product to the `products_to_avoid` list. Instead, to avoid repeating the whole process, all rules that contain product id 9771 are removed.



In [14]:
ass_rules = ass_rules[ass_rules['antecedents'].apply(lambda a : 9771 not in a)]
ass_rules = ass_rules[ass_rules['consequents'].apply(lambda c : 9771 not in c)]
ass_rules.nlargest(10, 'confidence')

| |antecedents|consequents|antecedent support|consequent support|support|confidence|lift|leverage|conviction|
|---|---|---|---|---|---|---|---|---|---|
|1033|(8445)|(6298)|0.003473|0.004648|0.00215|0.619154|133.198396|0.002134|2.613526|
|1132|(8403, 8229)|(3091)|0.005197|0.020365|0.002676|0.514881|25.283138|0.00257|2.019371|
|1032|(6298)|(8445)|0.004648|0.003473|0.00215|0.462562|133.198396|0.002134|1.854219|
|195|(2454)|(2453)|0.002328|0.004316|0.001067|0.458472|106.231522|0.001057|1.838656|
|1131|(3091, 8403)|(8229)|0.006226|0.017456|0.002676|0.429814|24.62202|0.002567|1.723197|
|507|(8229)|(3091)|0.017456|0.020365|0.007123|0.408064|20.037901|0.006768|1.654968|
|390|(7042)|(7043)|0.005228|0.017163|0.002119|0.405325|23.616829|0.002029|1.652732|
|1119|(8403)|(3091)|0.015747|0.020365|0.006226|0.395383|19.415218|0.005905|1.620258|
|978|(3776)|(5583)|0.005515|0.010194|0.002096|0.380084|37.285448|0.00204|1.596678|
|1130|(3091, 8229)|(8403)|0.007123|0.015747|0.002676|0.375679|23.856883|0.002564|1.576516|


A search for other taxes shows that this is the only tax managed as a product.

In [15]:
products[products['product_desc'].str.match('.*IMPOST.*')]

|product_id|product_desc|family_id|
|---|---|---|
|9771|IMPOST SOBRE BEGUDES|`00*00*00`|


## Making a Recommender

Building on the previous work, the strategy is to treat confidence as points. The products recommended by the rules that match the cart are collected and their points are summed, which gives an ordered list of products to recommend. If only one recommendation is wanted, the top of this list is taken.

In [16]:
product_rules = ass_rules[['antecedents','consequents','confidence']];
product_rules

| |antecedents|consequents|confidence|
|---|---|---|---|
|0|(5939)|(7550)|0.064641|
|1|(7550)|(5939)|0.032414|
|2|(5939)|(6251)|0.062711|
|3|(6251)|(5939)|0.032737|
|4|(5939)|(2111)|0.099855|
|...|...|...|...|
|1171|(6300)|(2880)|0.147084|
|1174|(7824)|(8138)|0.140411|
|1175|(8138)|(7824)|0.335378|
|1176|(3586)|(6555)|0.153767|


Getting a recommendation from these products:

In [17]:
sample_products = [
                   2567, # Tomato spread for bread (TOMATE UNTAR PAN)
                   5690 # Serrano ham (JAMON SERRANO)
                   ]

You can view the products using the resources from the company's web page. This function makes it easier to display these images.

In [18]:
from IPython.display import Image, HTML, display

def display_product_images(products, width=300):
  images = ''
  for p in products:
    html = f'<img src="https://www.bonarea.com/tendaprova/fotos/13_{p}_g.png" width="{width}"/>'
    images+=html
  display(HTML(images))

View the images of the chosen sample products.

In [19]:
display_product_images(sample_products)

Filter the rules that have the sample products as antecedents. Since the antecedents are a set, a rule is kept when its antecedent is only one of the sample products or the set of both. Antecedents that contain these products together with any other product are discarded.

In [20]:
affected_rules = product_rules[product_rules['antecedents'].apply(lambda r : all(elem in sample_products for elem in r))]
affected_rules

| |antecedents|consequents|confidence|
|---|---|---|---|
|15|(2567)|(1624)|0.038857|
|17|(5690)|(1624)|0.045347|
|23|(2567)|(1623)|0.033453|
|43|(2567)|(5954)|0.057643|
|49|(5690)|(5954)|0.040605|
|...|...|...|...|
|889|(2567)|(2545)|0.038085|
|909|(2567)|(5861)|0.034483|
|910|(5690)|(5861)|0.040012|
|940|(2567)|(2007)|0.040401|
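
The antecedent filter is a subset test: a rule applies when its whole antecedent is contained in the cart. A minimal sketch with made-up rules, assuming each rule is stored as an `(antecedent, consequent, confidence)` tuple (a simplification of the DataFrame used in the notebook):

```python
# Hypothetical rules as (antecedent, consequent, confidence); values are made up.
rules = [
    (frozenset({2567}), frozenset({5954}), 0.06),
    (frozenset({5690}), frozenset({5954}), 0.04),
    (frozenset({2567, 5690}), frozenset({1624}), 0.10),
    (frozenset({2567, 9999}), frozenset({1624}), 0.20),  # needs a product not in the cart
]

cart = {2567, 5690}

# A rule applies when every product of its antecedent is in the cart.
applicable = [rule for rule in rules if rule[0] <= cart]

for antecedent, consequent, confidence in applicable:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

The last rule is discarded because its antecedent contains product 9999, which is not in the cart.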


The score is obtained from the confidence of the rules: the recommendation score of a product is the sum of the confidence values of all rules that recommend it. So if a product is recommended several times, its score is the sum of all those confidence values.

In [21]:
recomendations = {}
for index, row in affected_rules.iterrows():
    for c in row['consequents']:
      # accumulate the confidence per recommended product
      recomendations[c] = recomendations.get(c, 0) + row['confidence']

# remove products that also are in the cart (can be recommended from each other)
for p in sample_products:
  if p in recomendations:
    del recomendations[p]
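
The scoring strategy can also be sketched in plain Python. The rules, their confidence values and the function name `score_consequents` are made up for illustration; the logic follows the description above: sum the confidence per recommended product and skip products already in the cart.

```python
from collections import defaultdict

def score_consequents(rules, cart):
    """Sum the confidence of every applicable rule per consequent product."""
    cart = set(cart)
    scores = defaultdict(float)
    for antecedent, consequent, confidence in rules:
        if antecedent <= cart:  # the rule applies to this cart
            for product in consequent:
                if product not in cart:  # never recommend what is already there
                    scores[product] += confidence
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical rules as (antecedent, consequent, confidence).
rules = [
    (frozenset({2567}), frozenset({5954}), 0.06),
    (frozenset({5690}), frozenset({5954}), 0.04),
    (frozenset({2567}), frozenset({2111}), 0.09),
    (frozenset({5690}), frozenset({2567}), 0.05),  # consequent already in the cart
]

ranking = score_consequents(rules, cart=[2567, 5690])
print(ranking)
```

Product 5954 ranks first because two rules recommend it, even though each of them has a lower confidence than the single rule recommending 2111.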




The result is a list of recommended products.

One might expect bread to be recommended, but it does not appear.

An important observation is that the recommended products are mostly the top sellers.

In [22]:
products_desc_dict = products['product_desc'].to_dict()
html =  '<p>For the products:'
html += '<table class="dataframe">'
for p in sample_products:
  html += f'<tr><td>{p}</td><td>{products_desc_dict[p]}</td></tr>'
html += '</table></p><br>'

html += '<p>The top 10 recommended products in sorted order are:'
html += '<table class="dataframe">'
print("")
for p in sorted(recomendations, key=recomendations.get, reverse=True)[0:10]:
  html += f'<tr><td>{p}</td><td>{products_desc_dict[p]}</td></tr>'
html += '</table></p>'

display(HTML(html))




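For the products:

|product_id|product_desc|
|---|---|
|2567|TOMATE UNTAR PAN|
|5690|JAMON SERRANO|

The top 10 recommended products in sorted order are:

|product_id|product_desc|
|---|---|
|2111|PLATANOS CANARIAS BO|
|6252|HUEVOS L RUBIO BONAR|
|2760|PATATA MALLA BONAREA|
|5856|FINISIMO JAMON COCID|
|6251|HUEVOS XL RUBIO BONA|
|6253|HUEVOS M RUBIO BONAR|
|5954|BUTIFARRA FRESCA CON|
|4493|BUTIFARRA FRESCA DE|
|6899|FUET EXTRA|
|4487|LOMO CAÑA ENTERO CER|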


Viewing the images of all the recommended products helps to make some observations:

- Some products are related to a snack.
- Other products are related to a barbecue.


In [23]:
display_product_images(sorted(recomendations, key=recomendations.get, reverse=True),width=100)

## Test the algorithm

From a set of sales with more than two products, one product is removed at random, the recommender is applied to the remaining products, and the recommendation is compared with the removed product.

The function below returns a recommendation from a list of products. It is the same code as in the cells above, gathered into a single function.

In [24]:
def recommender(list_products):
  # keep the rules whose whole antecedent is contained in the given products
  affected_rules = product_rules[product_rules['antecedents'].apply(lambda r : all(elem in list_products for elem in r))]
  recomendations = {}
  for index, row in affected_rules.iterrows():
      for c in row['consequents']:
        # accumulate the confidence per recommended product
        recomendations[c] = recomendations.get(c, 0) + row['confidence']
  # ensure an already sold product is not recommended
  for p in list_products:
    if p in recomendations:
      del recomendations[p]
  recomendations_sorted = sorted(recomendations, key=recomendations.get, reverse=True)
  # return the best product, or -1 when no rule applies
  return recomendations_sorted[0] if recomendations_sorted else -1

Because running this process on all sales takes a long time, it was done on a sample of 10,000 sales.

In [25]:
sells_test = test_sells[test_sells['n_products']>2].copy()
sells_test['product_expected'] = sells_test['product_id'].apply(lambda p: random.choice(p))
sells_test['product_without_expected'] = sells_test.apply(lambda a : [x for x in a['product_id'] if x != a['product_expected']],axis=1)
sells_test


|invoice_id|product_id|n_products|product_expected|product_without_expected|
|---|---|---|---|---|
|2027-T0132C01-084649|[2408, 8215, 6252]|3|8215|[2408, 6252]|
|2027-T0132C01-012079|[3383, 3383, 6312]|3|6312|[3383, 3383]|
|2027-T0132C03-009310|[5936, 353, 350]|3|353|[5936, 350]|
|2027-T0132C01-099297|[8413, 8180, 7620, 8983, 3042]|5|8983|[8413, 8180, 7620, 3042]|
|2027-T0132C01-008080|[6120, 6304, 6412, 8200, 8198, 8198, 6295, 811...|22|1623|[6120, 6304, 6412, 8200, 8198, 8198, 6295, 811...|
|...|...|...|...|...|
|2027-T0132C03-022771|[8199, 5499, 5499, 7740, 1618, 1618, 4486, 7753]|8|7753|[8199, 5499, 5499, 7740, 1618, 1618, 4486]|
|2027-T0132C01-091102|[223, 3482, 6253, 207, 212]|5|223|[3482, 6253, 207, 212]|
|2027-T0132C01-111007|[7579, 4181, 4355, 5116, 5116]|5|5116|[7579, 4181, 4355]|
|2027-T0132C03-028305|[5982, 7673, 7742, 3480, 3480, 8111]|6|7742|[5982, 7673, 3480, 3480, 8111]|


Apply the recommender to the set of test sales.

In [26]:
sells_test['product_recomended'] = sells_test['product_without_expected'].apply(recommender)
sells_test


|invoice_id|product_id|n_products|product_expected|product_without_expected|product_recomended|
|---|---|---|---|---|---|
|2027-T0132C01-084649|[2408, 8215, 6252]|3|8215|[2408, 6252]|2111|
|2027-T0132C01-012079|[3383, 3383, 6312]|3|6312|[3383, 3383]|4474|
|2027-T0132C03-009310|[5936, 353, 350]|3|353|[5936, 350]|-1|
|2027-T0132C01-099297|[8413, 8180, 7620, 8983, 3042]|5|8983|[8413, 8180, 7620, 3042]|7550|
|2027-T0132C01-008080|[6120, 6304, 6412, 8200, 8198, 8198, 6295, 811...|22|1623|[6120, 6304, 6412, 8200, 8198, 8198, 6295, 811...|6407|
|...|...|...|...|...|...|
|2027-T0132C03-022771|[8199, 5499, 5499, 7740, 1618, 1618, 4486, 7753]|8|7753|[8199, 5499, 5499, 7740, 1618, 1618, 4486]|5376|
|2027-T0132C01-091102|[223, 3482, 6253, 207, 212]|5|223|[3482, 6253, 207, 212]|7550|
|2027-T0132C01-111007|[7579, 4181, 4355, 5116, 5116]|5|5116|[7579, 4181, 4355]|4490|
|2027-T0132C03-028305|[5982, 7673, 7742, 3480, 3480, 8111]|6|7742|[5982, 7673, 3480, 3480, 8111]|2111|


Next, the recommended product is compared with the expected one and the resulting precision is calculated.

In [27]:
sells_test['success_recomended'] = sells_test.apply(lambda x : x['product_recomended'] == x['product_expected'],axis=1)
sells_test[sells_test['success_recomended']].size

ok = sells_test[sells_test['success_recomended']].index.size
total = sells_test.index.size
percent = 100 * ok / total
display(HTML(f'<p>The success ratio is: {ok} over {total:,}. So, the precision is <big><b>{percent:.2f}%</b></big></p>') )
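
The test procedure (hide one product, recommend from the rest, count the hits) can be summarized in a small function. This is a sketch under stated assumptions: `hit_rate` and `toy_recommender` are hypothetical names, and the toy recommender stands in for the `recommender` function of the notebook.

```python
import random

def hit_rate(baskets, recommend, seed=11):
    """Hide one random product per basket and check whether it is recommended."""
    rng = random.Random(seed)  # local generator, so the result is reproducible
    hits = 0
    for basket in baskets:
        hidden = rng.choice(basket)
        # Remove every occurrence of the hidden product, as in the notebook.
        remaining = [p for p in basket if p != hidden]
        if recommend(remaining) == hidden:
            hits += 1
    return hits / len(baskets) if baskets else 0.0

def toy_recommender(remaining):
    """Made-up recommender: products 1 and 2 always go together."""
    pairs = {1: 2, 2: 1}
    return pairs.get(remaining[0], -1) if remaining else -1

baskets = [[1, 2], [1, 2], [2, 1]]
always_right = hit_rate(baskets, toy_recommender)
always_wrong = hit_rate(baskets, lambda remaining: -1)
print(always_right, always_wrong)
```

Returning -1 when no rule applies counts as a miss, so baskets without any matching rule lower the measured precision.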

## Observing the rules

The rules are a good tool for observing customer habits. The most common pattern is that a customer who buys a product is likely to buy a similar product as well, for example one with a different flavor. For example, customers who buy yogurt usually take two or more packs with different flavors, and customers who buy fried potatoes often take two types of them.


In [28]:
products_desc_serie = products['product_desc'].to_dict()

ass_rules['antecedents_prod'] = ass_rules['antecedents'].apply(lambda s: [products_desc_serie[p] for p in s])
ass_rules['consequents_prod'] = ass_rules['consequents'].apply(lambda s: [products_desc_serie[p] for p in s])
ass_rules[['antecedents_prod','consequents_prod','confidence']].sort_values(by="confidence",ascending=False).head(25)

| |antecedents_prod|consequents_prod|confidence|
|---|---|---|---|
|1033|[POSTRE DE MANZANA Y]|[POSTRE DE MANZANA BO]|0.619154|
|1132|[YOGUR CON PERA Y KIW, YOGUR CON FRESA BONA]|[YOGUR CON MELOCOTÓN]|0.514881|
|1032|[POSTRE DE MANZANA BO]|[POSTRE DE MANZANA Y]|0.462562|
|195|[GRANILLO ALMENDRA 25]|[HARINA ALMENDRA 250]|0.458472|
|1131|[YOGUR CON MELOCOTÓN, YOGUR CON PERA Y KIW]|[YOGUR CON FRESA BONA]|0.429814|
|507|[YOGUR CON FRESA BONA]|[YOGUR CON MELOCOTÓN]|0.408064|
|390|[BULL NEGRE]|[DONEGAL -BUTIFARRA D]|0.405325|
|1119|[YOGUR CON PERA Y KIW]|[YOGUR CON MELOCOTÓN]|0.395383|
|978|[CERDO HUESOS BLANCOS]|[CODILLOS DE JAMON CU]|0.380084|
|1130|[YOGUR CON MELOCOTÓN, YOGUR CON FRESA BONA]|[YOGUR CON PERA Y KIW]|0.375679|


## Conclusions

It's important to think about the target of the recommendation. This target can change with the business rules. Some rules that managers might set:
- Sell products with more profit
- Recommend products that are on the shelves that the customer has not visited.
- Any other commercial rule

As for the main target from the start, which is to recommend a product that the customer may have forgotten: this recommender system recommends products that the customer has very probably already seen, and if he didn't take them it is possible that he doesn't want them.

To improve the recommender system, an interesting idea is to use another dataset as the source of the rules, for example cooking recipes. Starting from the products in the customer's cart, try to guess which recipes the customer wants to make, and if an ingredient is missing, recommend it.
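
As a rough illustration of the recipe idea, the sketch below suggests the missing ingredients of recipes that overlap enough with the cart. Everything here is hypothetical: the recipe names, the product ids used as ingredients, the `min_overlap` threshold and the function name `missing_ingredients` are assumptions, since no recipe dataset is part of this notebook.

```python
# Hypothetical recipes mapping a name to its ingredient product ids (all made up).
recipes = {
    "bread with tomato and ham": {2567, 5690, 1001},
    "potato omelette": {6252, 2760, 3003},
}

def missing_ingredients(cart, recipes, min_overlap=2):
    """Suggest the missing ingredients of recipes that share enough products with the cart."""
    cart = set(cart)
    suggestions = {}
    for name, ingredients in recipes.items():
        overlap = len(ingredients & cart)
        missing = ingredients - cart
        # Only guess a recipe when enough of its ingredients are already in the cart.
        if overlap >= min_overlap and missing:
            suggestions[name] = sorted(missing)
    return suggestions

suggestions = missing_ingredients([2567, 5690, 6252], recipes)
print(suggestions)
```

Here the cart shares two ingredients with the first recipe, so its missing ingredient is suggested, while the second recipe shares only one and is ignored.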

The FP-Growth algorithm makes it possible to inspect the rules and the logic behind them, unlike neural networks, whose rules are not readable.