# In-class review of Notebook 2: Association rule mining

In [1]:
import sys
print(f"=== Python version ===\n{sys.version}")

**Duck typing:** The `make_itemsets()` function relies on it implicitly. It never checks types; it only assumes that `words` is iterable and that each `w` in `words` is iterable too.

In [2]:
def make_itemsets(words):
    return [set(w) for w in words]
    
make_itemsets(['sed', 'ut', 'perspiciatis', 'unde', 'omnis'])

In [3]:
x = {'sed', 'ut', 'perspiciatis', 'unde', 'omnis'}
make_itemsets(x)
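As a small sketch of what duck typing buys us here, the same function accepts any iterable of iterables, and it fails only at the point where it tries to iterate over something that is not iterable. The inputs below are made up for illustration.

```python
def make_itemsets(words):
    return [set(w) for w in words]

# A tuple of strings works, and so does a generator that yields lists.
from_tuple = make_itemsets(('ab', 'cd'))
from_generator = make_itemsets(w for w in [['x', 'y'], ['z']])
print(from_tuple)
print(from_generator)

# Integers are not iterable, so `set(w)` raises a TypeError for them.
try:
    make_itemsets([1, 2, 3])
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)
```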

**Problem:** Although this implementation of `make_itemsets()` is perfectly fine and meets the specifications given, we cannot easily reuse it for the "actual baskets" version of the problem (Exercise 11) without preprocessing.

Recall the input to Exercise 11. Here is an example:

In [4]:
csv_input = """citrus fruit,semi-finished bread,margarine,ready soups
tropical fruit,yogurt,coffee
whole milk
pip fruit,yogurt,cream cheese ,meat spreads
other vegetables,whole milk,condensed milk,long life bakery product"""

It's a string where newlines separate basket-strings:

In [5]:
baskets = csv_input.split('\n')
baskets

We can't reuse `make_itemsets()` directly on `baskets`:

In [6]:
make_itemsets(baskets)

Why does that happen? Applying `set(b)` to any `b` in `baskets` just produces a set of the individual characters in `b`, not a set of items. For instance:

In [7]:
set('abc')

That's because `set(b)` expects an _iterable_ `b`. Strings are iterable, and each iteration of a loop over a string produces one character from it. Example:

In [8]:
for e in 'abc':
    print(e)
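To make "iterable" a bit more concrete, here is a minimal sketch of what the `for` loop does under the hood, using the built-in `iter()` and `next()` functions. The variable names are only for illustration.

```python
it = iter('abc')       # ask the string for an iterator
first = next(it)       # each call to next() yields one character
second = next(it)
third = next(it)
print(first, second, third)

# Once the characters run out, next() raises StopIteration,
# which is the signal a `for` loop uses to stop.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```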

Conceptually, you can imagine that the implementation of `set(x)` is just like the function, `make_set(x)`, below:

In [9]:
# set(x) # what is `x`? it's something that you can iterate over
def make_set(x): # emulates `set(x)`
    S = set()
    for e in x:
        S.add(e)
    return S

make_set('abc')

Therefore, if `s` is a basket-string, then `set(s)` or `make_set(s)` just returns the set of its characters:

In [10]:
first_basket_string = baskets[0]
first_basket_string

'citrus fruit,semi-finished bread,margarine,ready soups'

In [11]:
set(first_basket_string)  # ... or `make_set(first_basket_string)`

{' ',
 ',',
 '-',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'y'}

If instead we have a list, `L`, then `set(L)` or `make_set(L)` will convert `L` to a set whose members are the elements of `L` (with any duplicates collapsed):

In [12]:
set(['cat', 'dog', 'mouse'])

{'cat', 'dog', 'mouse'}
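One detail worth keeping in mind is that a set holds each distinct element only once. The short sketch below, with a made-up list, shows repeated elements collapsing when a list is converted to a set.

```python
animals = ['cat', 'dog', 'cat', 'mouse', 'dog']
unique_animals = set(animals)

# The list has 5 entries, but only 3 distinct values survive in the set.
print(len(animals), len(unique_animals))
```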

Thus, we should take the basket-string, split it on the comma-character to get a list of items, and then turn that into a set for use with our association miner:

In [13]:
first_basket_list = first_basket_string.split(',')
first_basket_list

['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups']

In [14]:
set(first_basket_list) # ... or `make_set(first_basket_list)`

{'citrus fruit', 'margarine', 'ready soups', 'semi-finished bread'}

**Generalizing `make_itemsets`.** Recall the original `make_itemsets` solution:

In [15]:
def make_itemsets(baskets):
    return [set(b) for b in baskets]

It "hard-codes" the use of the `set` constructor to get a set from each element `b` of `baskets`. So, one step toward generalizing it is to allow the caller to supply a different function for that purpose. To preserve the same interface to `make_itemsets`, we can make that a new but _optional_ parameter whose default value is `set`:

In [16]:
def make_itemsets2(baskets, to_set=set):
    return [to_set(b) for b in baskets]

If the caller _omits_ the `to_set` argument, this function behaves identically to `make_itemsets`:

In [17]:
make_itemsets(['abc', 'xyz'])

[{'a', 'b', 'c'}, {'x', 'y', 'z'}]

In [18]:
make_itemsets2(['abc', 'xyz'])

[{'a', 'b', 'c'}, {'x', 'y', 'z'}]
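Since `to_set` is just a parameter, any callable that accepts one element of `baskets` can be supplied. As an illustrative sketch, the built-in `frozenset` is used here only to show the mechanism; it is not something the exercise asks for.

```python
def make_itemsets2(baskets, to_set=set):
    return [to_set(b) for b in baskets]

# Pass a different constructor by keyword; each element is now a frozenset.
frozen = make_itemsets2(['abc', 'xyz'], to_set=frozenset)
print(frozen)
```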

But for our grocery-baskets input, we can provide a different way to convert each basket-string into an itemset.

In [19]:
def basket_string_to_itemset(s):
    return set(s.split(','))

basket_string_to_itemset('other vegetables,whole milk,condensed milk,long life bakery product')

{'condensed milk',
 'long life bakery product',
 'other vegetables',
 'whole milk'}

Recall the `baskets` input:

In [20]:
baskets

['citrus fruit,semi-finished bread,margarine,ready soups',
 'tropical fruit,yogurt,coffee',
 'whole milk',
 'pip fruit,yogurt,cream cheese ,meat spreads',
 'other vegetables,whole milk,condensed milk,long life bakery product']

Applying our new function with the auxiliary itemset-creator yields a nice solution:

In [21]:
make_itemsets2(baskets, to_set=basket_string_to_itemset)

[{'citrus fruit', 'margarine', 'ready soups', 'semi-finished bread'},
 {'coffee', 'tropical fruit', 'yogurt'},
 {'whole milk'},
 {'cream cheese ', 'meat spreads', 'pip fruit', 'yogurt'},
 {'condensed milk',
  'long life bakery product',
  'other vegetables',
  'whole milk'}]

This example relies on Python's ability to pass functions as arguments to other functions, which lets us generalize the original `make_itemsets` solution without changing its interface.
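Note that the fourth itemset in the output contains `'cream cheese '` with a trailing space, because the input string has a space before the comma. Since the itemset-creator is pluggable, one possible fix is to pass a variant that strips whitespace from each item. The helper name `basket_string_to_clean_itemset` below is hypothetical, and this is a sketch rather than the notebook's reference solution.

```python
def make_itemsets2(baskets, to_set=set):
    return [to_set(b) for b in baskets]

def basket_string_to_clean_itemset(s):
    # Hypothetical helper: split on commas, then trim surrounding
    # whitespace so that 'cream cheese ' becomes 'cream cheese'.
    return {item.strip() for item in s.split(',')}

itemsets = make_itemsets2(
    ['pip fruit,yogurt,cream cheese ,meat spreads'],
    to_set=basket_string_to_clean_itemset,
)
print(itemsets)
```

Whether stripping is appropriate depends on the data; here it assumes that leading or trailing spaces are never a meaningful part of an item name.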