# Creating A Recommender System

In [2]:
#import modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (16,6)

import datetime
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth

import my_funcs

In this notebook we're going to look at patterns in purchaser behaviour.

In [5]:
merged = pd.read_csv('outputs/merged.csv')
df = merged

# Support Table Function

## Using fpgrowth to generate a support table


Make a binary 'recipe' for each basket in the data, using the `COMMODITY_DESC` labels:

- 306 columns, one per `COMMODITY_DESC` label;
- one row for each basket given in `product_lists`.
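
To make the 'recipe' idea concrete, here is a minimal plain-Python sketch of the same encoding on three made-up baskets. It is not what `TransactionEncoder` does internally; the basket ids and contents are hypothetical and only illustrate the shape of the result.

```python
# Toy baskets (hypothetical data); an item may repeat within a basket.
baskets = {
    "b1": ["APPLES", "WATER", "APPLES"],
    "b2": ["COLD CEREAL", "FROZEN PIZZA", "COLD CEREAL"],
    "b3": ["WATER", "COLD CEREAL"],
}

# One column per distinct label, sorted for a stable column order.
labels = sorted({item for items in baskets.values() for item in items})

# One row per basket: 1 if the label appears at least once, else 0.
recipes = {
    basket_id: [int(label in set(items)) for label in labels]
    for basket_id, items in baskets.items()
}

print(labels)
for basket_id, row in recipes.items():
    print(basket_id, row)
```

Note that repeated items collapse to a single `1`, and the order of items inside a basket plays no role.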




### Product Lists for each `BASKET_ID`

In [6]:
product_lists = df.groupby('BASKET_ID')['COMMODITY_DESC'].apply(list)

In [7]:
product_lists[:3]

BASKET_ID
28179463886    [APPLES, STONE FRUIT, WATER - CARBONATED/FLVRD...
28179603462             [COLD CEREAL, FROZEN PIZZA, COLD CEREAL]
28179654108    [CANDY - CHECKLANE, CANDY - CHECKLANE, CANDY -...
Name: COMMODITY_DESC, dtype: object

In [8]:
len(product_lists), merged.shape[0] # there are about 10 products in each basket.

(233794, 2380950)

### Transaction Encoding -- Basket 'Recipes' as Binary Columns

In [9]:
# transaction encoding...
te = TransactionEncoder()
te_fit = te.fit_transform(product_lists.values, sparse=True) # encode each basket as a row of a sparse boolean matrix
te_df = pd.DataFrame.sparse.from_spmatrix(te_fit, columns=[str(i) for i in te.columns_])

In [10]:
te_df.head()

Unnamed: 0,(CORP USE ONLY),ADULT INCONTINENCE,AIR CARE,ANALGESICS,ANTACIDS,APPAREL,APPLES,AUDIO/VIDEO PRODUCTS,AUTOMOTIVE PRODUCTS,BABY FOODS,...,VEAL,VEGETABLES - ALL OTHERS,VEGETABLES - SHELF STABLE,VEGETABLES SALAD,VITAMINS,WAREHOUSE SNACKS,WATCHES/CALCULATORS/LOBBY,WATER,WATER - CARBONATED/FLVRD DRINK,YOGURT
0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,1,1
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Notes about the model

We can see that this technique could be used for any groupings (baskets) of sequences (items) for which **the order of the sequences in the basket does not matter**. 

In the next step, we'll be calculating the frequency of co-occurrence of these `tokens`. That step ends up being very similar to TermFrequency-InverseDocumentFrequency (TF-IDF) encoding, and the purpose is very much the same -- to get a general idea of the **relevance of tokens** based on their occurrence in baskets overall (from the sample set).

This means that, for our model to make sense, we have to compare the **token `support` across the appropriate baskets**.

### Interpretation and HyperParameters

- Now that we have those sequences, we can compare them against one another iteratively and build a total co-occurrence frequency chart: how often each item appears alongside other items as members of the same basket.

    - The time and space cost of this model can be adjusted using hyperparameters.

    - This ends up being a fairly expensive operation in terms of computing power, especially for the second step, and shouldn't necessarily be run on a streamlit app...maybe the cloud ;).

    - However, in exchange, we get a wonderfully insightful look into our customer's behaviour.
    
    
- The `support` metric indicates the fraction of our baskets in which a certain item or sequence of items occurs; e.g. 0.005 indicates a 1 in 200 occurrence.

    - By adjusting the `max_len` parameter, we can control how long these sequences can each be. 

    - The 'hyperparameter' `min_support` indicates how deep the model has to go, looking for patterns. A lot of combinations are cut off almost at once, if the relationship must occur in more than 1/10 baskets!


- **For the next step**, the `support` metric will refer to the **co-occurrence of these `antecedent` and `consequent` sequences**. Here, however, it refers simply to the support of an item on its own -- its ratio of occurrence across all baskets.
    - These initial common `tokens` will make up the basis for our future analysis; they are `frequent item(set)s`, and are stored in a `support table`.
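
As a rough illustration of how `min_support` and `max_len` shape the support table, here is a brute-force sketch in plain Python. It is not how `fpgrowth` works internally (the real algorithm avoids enumerating every combination); `support_table` and the toy baskets are hypothetical names and data used only for this example.

```python
from collections import Counter
from itertools import combinations

def support_table(baskets, min_support=0.5, max_len=2):
    """Brute-force support counts; illustrative only, not the fpgrowth algorithm."""
    counts = Counter()
    for items in baskets:
        unique = sorted(set(items))  # duplicates and order do not matter
        for size in range(1, max_len + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    n = len(baskets)
    # keep only itemsets that appear in at least `min_support` of all baskets
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

toy_baskets = [
    ["MILK", "BREAD"],
    ["MILK", "BREAD", "CHEESE"],
    ["MILK", "SOFT DRINKS"],
    ["BREAD"],
]
table = support_table(toy_baskets, min_support=0.5, max_len=2)
for itemset, support in table.items():
    print(sorted(itemset), support)
```

With these four baskets, only `MILK`, `BREAD`, and the pair `{MILK, BREAD}` survive the 0.5 threshold; lowering `min_support` to 0.25 would admit every single item and several more pairs.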



### Final 'Model'

In [21]:
# fpgrowth table
fp_table = fpgrowth(te_df,
                    min_support=0.1,
                    use_colnames=True,
                    # verbose=True,
                    max_len=1,  # can alter max_len here.
                    # low_memory=True,
                    )
# adding a length column for posterity and filtering
fp_table['size'] = fp_table['itemsets'].apply(len)

In [22]:
fp_table.sort_values('support', ascending=False)

Unnamed: 0,support,itemsets,size
1,0.28357,(SOFT DRINKS),1
2,0.276906,(FLUID MILK PRODUCTS),1
3,0.240652,(BAKED BREAD/BUNS/ROLLS),1
4,0.187982,(CHEESE),1
5,0.167246,(BAG SNACKS),1
6,0.146005,(BEEF),1
9,0.129071,(TROPICAL FRUIT),1
7,0.111838,(EGGS),1
8,0.102646,(REFRGRATD JUICES/DRNKS),1
0,0.100306,(COLD CEREAL),1


**These are all of the labels of *individual items* which exist in more than 1/10 baskets**; those having "sequence length" 1.

We only see the **individual** category labels because we set the parameter `max_len` = `1` in the code above. We also used a `min_support` threshold of 0.1 -- that's why it was so fast to complete. 

mlxtend offers both the `fpgrowth` and `apriori` algorithms for building this kind of table.

As we increase the `max_len` parameter, reduce the `min_support`, or increase the number of labels (represented in our recipes by the binary columns), we increase the computation power and time necessary to complete the algorithm. 
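
To get a feel for why `max_len` matters so much, the sketch below counts how many distinct itemsets could exist among 306 labels for a given maximum length. This is only an upper bound on the search space: thanks to the `min_support` cut-off, the algorithms never visit most of these candidates. The helper name `candidate_itemsets` is made up for this example.

```python
from math import comb

n_labels = 306  # number of COMMODITY_DESC columns reported above

def candidate_itemsets(n_labels, k_max):
    """Count every possible itemset of size 1..k_max drawn from n_labels labels."""
    return sum(comb(n_labels, k) for k in range(1, k_max + 1))

for k_max in (1, 2, 3):
    print(k_max, candidate_itemsets(n_labels, k_max))
```

Going from single items to triples already multiplies the candidate space from a few hundred to several million, which is why a high `min_support` is what keeps a large `max_len` affordable.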

We can see the surface very quickly; but to go deeper and see more defined or unique relationships, it will take more time and computation power. Moreover, it might be difficult or irrelevant to distinguish those 'patterns' from random chance among all users or transactions. My assumption here is that **distinct customer clusters** will purchase differently -- and if they do, that we might be able to recommend purchases for one household which another similar household also purchased (given the same `antecedent` chain...).

It might be interesting to look at which products a customer purchased *the next time they visited the store*. 

---

### Generating Support Tables for Distinct Customer Segments

**Based on the above, I want to have a 'support' table for independent and distinct *customer segments*.**
- Since we want our recommendations to be offered on streamlit; we should complete the calculations locally and save the results.

These segments should be easy to interpret, and provide a meaningful distinction from the other groups.

Using something like `RFM score` to denote similar customers might work, but is perhaps not as meaningful as other labelling strategies could be. The **RFM score only takes into account those three attributes**; recency, frequency, and monetary ranks -- not the types of purchases, and certainly not **the motivation behind this behaviour**.

- **What are some alternatives** instead of recommending items that 'similar' customers have purchased? 
    - Generating a support table for all 5 metrics for each individual household might not make sense, as it is computationally intensive. Moreover, for customers who rarely vary their purchases, we might require different thresholds (and, incidentally, have a more difficult time swaying their behaviour). This is an optimization step we would prefer to avoid (especially when implementing on streamlit).

- With that being said; instead of producing a support table for our distinct clusters, can we **reverse-engineer our customer labels based on purchase behaviour**?
    - Since we have demographic information for some households, a complicated thought occurs to me:
        - By taking the transactions of only the households for which we have demographic information and converting that into an aggregate table; then splitting it into training and test sets; we could try to predict the aggregate 'demographic' columns for our remaining households. These labels would be much easier to interpret; for example to distinguish between single-member households, and families of 5+. Doing so would be introducing significant bias and potential for error (as our model will never be perfect). 

- More easily, we might use **RFM score window-thresholds** (i.e. RFM [0-5), RFM [5-10), RFM [10-15]).

- One more option would be to assign cluster labels using **unsupervised learning, like K-Means**.
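
As a sketch of the simplest option, RFM score windows, the helper below maps a combined RFM score onto a segment label. It assumes the combined score runs from 0 to 15 in windows of width 5, as the windows listed above suggest; `rfm_window` and its defaults are hypothetical and would need adjusting to the real score range.

```python
def rfm_window(score, width=5, max_score=15):
    """Label a combined RFM score with its window; the top window includes max_score."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score must lie in [0, {max_score}], got {score}")
    # clamp so that score == max_score falls into the last window
    lower = min(int(score // width) * width, max_score - width)
    upper = lower + width
    closing = "]" if upper == max_score else ")"
    return f"RFM[{lower}-{upper}{closing}"

for score in (0, 4.9, 5, 12.5, 15):
    print(score, rfm_window(score))
```

A support table could then be generated once per label, for example by filtering the transactions to the households in each window before calling the support-table step.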

## `get_support_table`

Let's put it in a function, so we can decide later.

In [13]:
max_len=10 # baskets are around 10 items each, on average
min_support=0.05 # which sequences exist in at least 1/20 transactions?

In [None]:
def get_support_table(df, 
                      max_len=10,
                      min_support=0.05,
                      baskets='BASKET_ID',
                      column='COMMODITY_DESC'): 
    '''Build a support table of frequent itemsets from the transactions in `df`.

    Steps:
        1. create a product list for each basket (pandas groupby);
        2. encode the lists with mlxtend's TransactionEncoder;
        3. run mlxtend's fpgrowth on the encoded baskets.

    returns:
        a dataframe of the itemsets, grouped by `baskets`, that are no longer
        than `max_len` items and have a support of at least `min_support`,
        plus a `size` column holding the length of each itemset.
    '''                            
    # group by the `baskets` column (previously hard-coded to 'BASKET_ID')
    product_lists = df.groupby(baskets)[column].apply(list) # one list of labels per basket

    # transaction encoding...
    te = TransactionEncoder()
    te_fit = te.fit_transform(product_lists.values, sparse=True) # one sparse boolean row per basket
    te_df = pd.DataFrame.sparse.from_spmatrix(te_fit, columns=[str(i) for i in te.columns_])

    # fpgrowth table
    frequent_itemsets = fpgrowth(te_df, 
                                 min_support=min_support,
                                 use_colnames=True, 
                                 # verbose=True, 
                                 max_len=max_len,
                                 )
    # adding a length column for posterity and filtering
    frequent_itemsets['size'] = frequent_itemsets['itemsets'].apply(len)

    return frequent_itemsets

In [24]:
# save variable for reference...
fp_table = get_support_table(merged)

In [25]:
# give me the top-supported sequences
print(f'{len(fp_table)} sequences had a support greater than {min_support}')
fp_table.sort_values('support', ascending=False).head()

# 0.05 is 1/20 occurrence.
# 0.25 is 1/4 occurrence.

74 sequences had a support greater than 0.05


Unnamed: 0,support,itemsets,size
6,0.28357,(SOFT DRINKS),1
7,0.276906,(FLUID MILK PRODUCTS),1
8,0.240652,(BAKED BREAD/BUNS/ROLLS),1
9,0.187982,(CHEESE),1
10,0.167246,(BAG SNACKS),1


We chose a large `max_len` (increasing computation time), but a relatively high `min_support` threshold, which made things easier; we aren't 'seeding' our table with a very large number of relationships between (`antecedent` and `consequent`) items (or sequences).

- You can see that the operation took significantly longer than a webpage takes to load.

The sequences below are the only ones of length 3 or longer which occurred in more than 1/20 baskets:

In [26]:
print(f"The longest chain of items occurring in more than {min_support} baskets was of length {fp_table['size'].max()}, up to a possible of {max_len}.\nThey're listed below:")
fp_table[fp_table['size']>2]

The longest chain of items occurring in more than 0.05 baskets was of length 3, up to a possible of 10.
They're listed below:


Unnamed: 0,support,itemsets,size
52,0.056768,"(SOFT DRINKS, BAKED BREAD/BUNS/ROLLS, FLUID MI...",3
56,0.061263,"(CHEESE, BAKED BREAD/BUNS/ROLLS, FLUID MILK PR...",3


Now that we have a support table, we can derive some more meaningful insights using the *association rules algorithm*.

Since we ran the support table on such a large set of transactions, above, our `support` values are pretty low. 

- We'll talk more about other metrics in a minute... but they answer some of the questions you might want to ask, such as: 
    - which sequences co-occur most often?
    - which items are 'trigger' purchases and -only- purchased in the presence of another sequence?

For now let's call the `association_rules` function from mlxtend:

- the `min_threshold` parameter sets the minimum value of the chosen `metric` that a rule must reach to be kept; lower thresholds return more rules.

In [27]:
association_rules(fp_table, metric='support', min_threshold=0.1)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(SOFT DRINKS),(FLUID MILK PRODUCTS),0.28357,0.276906,0.101551,0.358116,1.293275,0.023029,1.126517
1,(FLUID MILK PRODUCTS),(SOFT DRINKS),0.276906,0.28357,0.101551,0.366734,1.293275,0.023029,1.131326
2,(BAKED BREAD/BUNS/ROLLS),(FLUID MILK PRODUCTS),0.240652,0.276906,0.124263,0.516361,1.86475,0.057625,1.49511
3,(FLUID MILK PRODUCTS),(BAKED BREAD/BUNS/ROLLS),0.276906,0.240652,0.124263,0.448756,1.86475,0.057625,1.377516


We get this table back. Notice that the `support` metric in **this** table refers to the co-occurrence of the distinct sequences of items, while the original `support` value for each sequence is listed under the `antecedent support` or `consequent support` column, respectively.

Moreover, it should become obvious that we have redundant/mirrored values in the top 4.

We parsed this table using the 'support' metrics, and as you can see, the sequences share a `support` value -- a normalized occurrence value. In any places where that shared `support` value is above the threshold, there will be two rows; one each for the `antecedent` and `consequent` relationship between these items.

# Association Rules


Generating insights using the previously-generated co-occurrence table

In [28]:
def assoc_table(support_table, 
                metric='lift',
                min_threshold=1):
    '''thin wrapper for mlxtend's association_rules call
    
    returns:
        the rules table, with the consequent and antecedent chain lengths
        added on as columns, for filtering.
    '''
    
    rules = association_rules(support_table,
                              metric=metric, # e.g. 'lift' or 'confidence' 
                              min_threshold=min_threshold) 
    rules["consequent_len"] = rules["consequents"].apply(len)
    rules["antecedent_len"] = rules["antecedents"].apply(len)
    
    return rules

Let's take a look at the different metrics by example, but to do so, we'll take only the support table for one household.

A lift greater than 1 indicates that the consequent was purchased more frequently in the presence of the antecedent chain than we would expect if the two were independent.

In [32]:
merged[merged['household_key'] == 1].shape

(1632, 19)

In [None]:
%%time
df = merged[merged['household_key'] == 1]
sp_tbl = get_support_table(df, min_support=0.05) # 1 in 20 occurrence

In [42]:
%%time
rules = assoc_table(sp_tbl, metric='lift', min_threshold=1) 
rules.head()

Wall time: 2 ms


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len,antecedent_len


In [35]:
rules.shape # holy. 10 million relationships.

(10052334, 11)

In [36]:
rules.sort_values('lift', ascending=False).head(3)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,consequent_len,antecedent_len
2288769,"(DRY NOODLES/PASTA, COOKIES/CONES, REFRGRATD J...","(VEGETABLES - SHELF STABLE, BAKED BREAD/BUNS/R...",0.061728,0.061728,0.061728,1.0,16.2,0.057918,inf,5,3
3638273,"(BAKED BREAD/BUNS/ROLLS, FRUIT - SHELF STABLE,...","(LUNCHMEAT, HEAT/SERVE, SOUP, COOKIES/CONES, D...",0.061728,0.061728,0.061728,1.0,16.2,0.057918,inf,5,5
2320966,"(FLUID MILK PRODUCTS, LUNCHMEAT, MARGARINES, C...","(VEGETABLES - SHELF STABLE, REFRGRATD JUICES/D...",0.061728,0.061728,0.061728,1.0,16.2,0.057918,inf,5,5


In [37]:
all(list(rules['antecedents'].value_counts()) == (rules['consequents'].value_counts()))

True

## Understanding Metrics

 From the mlxtend association_rules() docs:
    
Metric to evaluate if a rule is of interest.
Automatically set to 'support' if `support_only=True`.

**Otherwise, supported metrics are 'support', 'confidence', 'lift',
'leverage', and 'conviction'**

These metrics are computed as follows:

- support(A->C) = support(A+C) [aka 'support'], range: [0, 1]

- confidence(A->C) = support(A+C) / support(A), range: [0, 1]

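We see that confidence is simply the joint support divided by the support of the `antecedent` in the dataset, i.e. an estimate of the probability of seeing C given that A is in the basket. The normalization is loosely reminiscent of tf-idf, although it is not the same calculation.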

The support of A->C and C->A will be the same.

The remaining three metrics are derived from the `support` and `confidence`;

- lift(A->C) = confidence(A->C) / support(C), range: [0, inf]

- leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1]

- conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf]
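
As a sanity check on these formulas, the sketch below recomputes the four derived metrics for the `(SOFT DRINKS) -> (FLUID MILK PRODUCTS)` rule, using the rounded support values printed in the rules table earlier. `rule_metrics` is a hypothetical helper written for this example, not part of mlxtend.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Derive confidence, lift, leverage and conviction from three support values."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # conviction is infinite when the rule never fails (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# antecedent support, consequent support, joint support (rounded, from the table)
m = rule_metrics(0.28357, 0.276906, 0.101551)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```

Up to rounding of the inputs, the results should agree with the first row of that table. Swapping the antecedent and consequent supports changes `confidence` and `conviction` but leaves `lift` and `leverage` untouched, which is why the mirrored rows share those two values.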

In [100]:
metrics = ['lift', 'conviction', 'confidence', 'support', 'leverage']
min_thresholds = [1, 1, 0.5, 0.1, 0.05] # we'll trim the outputs down a bit anyways.

def make_tests(support_table, metrics, thresholds):
    
    container = dict()
    
    for metric, threshold in zip(metrics, thresholds):
        container[f'{metric}'] = assoc_table(support_table, 
                                             metric=metric, 
                                             min_threshold=threshold)
    return container

rules = make_tests(sp_tbl, metrics, min_thresholds)

In [107]:
rules['lift']['antecedents'].value_counts()

(PAPER TOWELS)                                                          881
(EGGS)                                                                  723
(TROPICAL FRUIT)                                                        629
(MARGARINES)                                                            579
(BEEF)                                                                  535
                                                                       ... 
(CHEESE, EGGS, CARROTS)                                                   1
(LAXATIVES)                                                               1
(EGGS, TROPICAL FRUIT, PAPER TOWELS, FRZN MEAT/MEAT DINNERS, CHEESE)      1
(EGGS, TROPICAL FRUIT, FRZN MEAT/MEAT DINNERS, CHEESE, BEEF)              1
(FRZN JCE CONC/DRNKS, BEEF, FRZN MEAT/MEAT DINNERS, TROPICAL FRUIT)       1
Name: antecedents, Length: 1580, dtype: int64

This is a critical juncture in the creation of our class. 

From what should the Recommender base draw its 'support'? How do we decide which `antecedent` to use?

Weeelllll, we have these product lists from campaigns...

In [141]:
campaigns = pd.read_csv('outputs/campaign_summary.csv', index_col=0).T

In [142]:
campaigns

Unnamed: 0,First Day,Last Day,Duration,Listed Products,Section Label Counts,Listed Products Total Sales,Listed Products Sales Before,Listed Products Sales During,Listed Products Sales After,First Date,Last Date,timedelta
1,346,383,38,"[28929, 29096, 32387, 32805, 33198, 34180, 343...","{'produce': 158, 'dairy': 150, 'meat': 44, 'be...",57095.46000000001,21854.53,4145.12,31095.81,2005-03-03 00:00:00,2005-04-09 00:00:00,37 days 00:00:00
2,351,383,33,"[49910, 61481, 61509, 67573, 80730, 82937, 857...","{'produce': 97, 'junk_food': 61, 'grocery': 56...",47662.55,19708.03,2713.86,25240.660000000003,2005-03-08 00:00:00,2005-04-09 00:00:00,32 days 00:00:00
3,356,412,57,"[34214, 70714, 70714, 71794, 72290, 72290, 723...","{'produce': 172, 'home_family': 151, 'misc': 1...",43282.66,18003.9,4418.59,20860.17,2005-03-13 00:00:00,2005-05-08 00:00:00,56 days 00:00:00
4,372,404,33,"[27160, 29977, 31349, 32491, 64543, 68122, 711...","{'misc': 91, 'grain_goods': 78, 'meat': 12, 'd...",35229.93,16032.94,1858.32,17338.67,2005-03-29 00:00:00,2005-04-30 00:00:00,32 days 00:00:00
5,377,411,35,"[65969, 66323, 67208, 67481, 67676, 69613, 697...","{'home_family': 425, 'misc': 12, 'grocery': 6}",29514.660000000003,15481.51,1827.91,12205.24,2005-04-03 00:00:00,2005-05-07 00:00:00,34 days 00:00:00
6,393,425,33,"[13005962, 13007355, 13007356, 13007435, 13007...",{'dairy': 18},1999.56,320.93999999999994,155.71999999999997,1522.9,2005-04-19 00:00:00,2005-05-21 00:00:00,32 days 00:00:00
7,398,432,35,"[73428, 74892, 80493, 80553, 110801, 558298, 8...","{'home_family': 199, 'drug': 37, 'meat': 16, '...",14246.100000000002,6429.740000000001,761.44,7054.92,2005-04-24 00:00:00,2005-05-28 00:00:00,34 days 00:00:00
8,412,460,49,"[1062425, 5581193, 903261, 2081690, 1096556, 9...","{'meat': 6130, 'produce': 2603, 'dairy': 1698,...",2501917.54,1256429.13,208834.62,1036653.7899999996,2005-05-08 00:00:00,2005-06-25 00:00:00,48 days 00:00:00
9,435,467,33,"[27754, 28929, 29096, 29340, 30699, 31999, 341...","{'grain_goods': 202, 'drug': 131, 'beverages':...",107146.88,54826.12,6071.83,46248.93,2005-05-31 00:00:00,2005-07-02 00:00:00,32 days 00:00:00
10,463,495,33,"[33555, 55021, 59433, 60997, 61750, 64322, 128...","{'home_family': 367, 'misc': 14, 'drug': 7, 'g...",37958.89,24395.78,1777.0300000000002,11786.08,2005-06-28 00:00:00,2005-07-30 00:00:00,32 days 00:00:00


In [133]:
campaign_products = campaigns.loc[:,'13']['Listed Products'].split(',') # a list has no .strip(); elements keep their brackets and leading spaces

In [134]:
campaign_products

['[1085765',
 ' 12673308',
 ' 316011',
 ' 1049788',
 ' 1164410',
 ' 1913500',
 ' 2701357',
 ' 977014',
 ' 1025176',
 ' 1807637',
 ' 1013615',
 ' 944172',
 ' 490823',
 ' 1127758',
 ' 388636',
 ' 2056950',
 ' 9837539',
 ' 9837565',
 ' 13653496',
 ' 10121829',
 ' 5996007',
 ' 1885849',
 ' 979551',
 ' 642584',
 ' 27978',
 ' 892923',
 ' 29431',
 ' 1129805',
 ' 821553',
 ' 1096332',
 ' 1017782',
 ' 1005149',
 ' 79954',
 ' 1060831',
 ' 1038743',
 ' 28889',
 ' 1653930',
 ' 76249',
 ' 1070748',
 ' 1058115',
 ' 897733',
 ' 1038515',
 ' 847966',
 ' 834061',
 ' 6391149',
 ' 6391176',
 ' 6391203',
 ' 100505',
 ' 1016360',
 ' 955937',
 ' 95924',
 ' 13512387',
 ' 901039',
 ' 6979723',
 ' 6961684',
 ' 6944606',
 ' 9337208',
 ' 9337171',
 ' 9337170',
 ' 9296440',
 ' 935200',
 ' 983784',
 ' 10285149',
 ' 10285187',
 ' 1101771',
 ' 13513135',
 ' 13512682',
 ' 12523563',
 ' 12527821',
 ' 12523993',
 ' 12538907',
 ' 12524960',
 ' 12526314',
 ' 12523992',
 ' 12541857',
 ' 12524009',
 ' 1014107',
 ' 9368403'

In [126]:
campaign_products.merge(merged, on='PRODUCT_ID')

AttributeError: 'str' object has no attribute 'merge'
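
The merge fails because `campaign_products` is not a DataFrame: the 'Listed Products' cell is stored in the CSV as the text of a Python list. One possible fix, sketched below on a shortened and hypothetical copy of such a cell, is to parse the text back into real integers with `ast.literal_eval` from the standard library.

```python
import ast

# Shortened, hypothetical copy of one 'Listed Products' cell as read from the CSV.
raw = "[1085765, 12673308, 316011]"

# literal_eval safely turns the text back into a Python list of ints.
product_ids = ast.literal_eval(raw)

print(product_ids)
print(type(product_ids[0]).__name__)
```

From there, the ids could be wrapped as `pd.DataFrame({'PRODUCT_ID': product_ids})` and joined with `.merge(merged, on='PRODUCT_ID')`, assuming `PRODUCT_ID` in `merged` is stored as integers.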

# The Recommend Function

- Items that might make us the most money alongside other products?
    - To do this, we could calculate an 'average item cost' using the mean REAL SHELF PRICE for all transactions with that PRODUCT_ID, i.e. how much revenue a transaction for that product brings into our stores, on average.

- Items that similar customers have been interested in
    - by conviction
    - by lift

- Items that this household has purchased before?
    - Which categories of products are most influenced by advertising campaigns?
    - Which are less influenced? Perhaps advertising is better suited to other categories.

- Items that this household has recently purchased?
    - that are out of the usual?



What is necessary for each to function?

In [None]:
df = merged[merged['household_key'] == 1]
sp_tbl = get_support_table(df, min_support=0.001) # 1 in 1000 occurrence
rules = assoc_table(sp_tbl, metric='lift', min_threshold=1) 
rules.head()

In [None]:
metric = 'lift'
rules_table = rules  # separate name, so we don't shadow the assoc_table() function
prev_purchases = ['APPLES']

def recommend(prev_purchases: list, howmany=5):
    '''meat and bones of the recommender system...
    accepts:
        prev_purchases: a list of previously purchased items
        howmany: (int) how many recommendations you want

    returns:
        a series consisting of the top `howmany` consequents, ranked by the
        global `metric` value.
        '''
    search_terms = set(prev_purchases) # also handles frozensets
    print(f'Searching for {sorted(search_terms)}...')

    # keep each rule whose antecedent chain contains at least one searched item
    # (a boolean mask avoids duplicate rows and the removed Series.iteritems())
    mask = rules_table['antecedents'].apply(lambda chain: not search_terms.isdisjoint(chain))
    matches = rules_table[mask]

    # RETURN TOP `howmany` CONSEQUENTS BY THE CHOSEN METRIC
    return matches.sort_values(metric, ascending=False)[:howmany]['consequents']
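
To show the lookup logic without a full rules table, here is a plain-Python sketch of the same idea over a handful of made-up rules. The rules, their lift values, and the name `recommend_from_rules` are all hypothetical; the real function works on the DataFrame returned by `association_rules`.

```python
# Hypothetical rules, shaped like rows of the association rules table.
toy_rules = [
    {"antecedents": frozenset({"APPLES"}),
     "consequents": frozenset({"TROPICAL FRUIT"}), "lift": 2.1},
    {"antecedents": frozenset({"APPLES", "CHEESE"}),
     "consequents": frozenset({"FLUID MILK PRODUCTS"}), "lift": 3.4},
    {"antecedents": frozenset({"BEEF"}),
     "consequents": frozenset({"SOFT DRINKS"}), "lift": 1.2},
]

def recommend_from_rules(rules, prev_purchases, metric="lift", howmany=5):
    """Return the consequents of the best-scoring rules that mention a past purchase."""
    wanted = set(prev_purchases)
    # a rule matches if its antecedent chain shares at least one item with `wanted`
    matches = [rule for rule in rules if wanted & rule["antecedents"]]
    matches.sort(key=lambda rule: rule[metric], reverse=True)
    return [rule["consequents"] for rule in matches[:howmany]]

picks = recommend_from_rules(toy_rules, ["APPLES"], howmany=2)
for consequent in picks:
    print(sorted(consequent))
```

Matching on any shared item is a loose criterion; a stricter variant could require the whole antecedent chain to be contained in the previous purchases.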



In [None]:
def recommend(self, antecedents:list, metric='lift'):
    '''meat and bones of the recommender system (method version)...
    accepts:
        antecedents: a list of previously purchased items
        metric: (str) the rules column used to rank the results

    returns:
        a series consisting of the top results given the metric value.
        '''
    search_terms = list(antecedents) # handles frozensets?
    
    # apply list to 'antecedent' in self.assoc_table
    search_series = pd.Series(self.assoc_table['antecedents'].apply(list)) 

    print(f'Searching for {search_terms}...')
    indexes_of_matches = []

    # for each searched item...
    for item in search_terms:
        # iterate through the list of "antecedent" rows **search_series**
        for idx, val in search_series.items(): # .iteritems() was removed in pandas 2.0
            # if the item is in the row (list of antecedents), record the row once
            if item in val and idx not in indexes_of_matches:
                indexes_of_matches.append(idx)

    rules = self.assoc_table.loc[indexes_of_matches]
