In [152]:
import pandas
%matplotlib inline
salesDF = pandas.read_excel("SampleSalesData.xlsx")

# Affinity Analysis

Affinity analysis is the search for patterns of co-occurrence in a dataset. <br>
Detecting which products a customer has also bought within a span of time yields a simple market basket analysis. <br>
Note that the example below is a simple algorithm for market basket analysis; it can be tested more rigorously with the apriori algorithm to get more refined results.<br>
The purpose of the example below is to illustrate how a quick analysis can be done.<br>


In [153]:
weeklySalesGroup = salesDF.groupby(['Customer Name', pandas.Grouper(key='Order Date', freq='W-MON')])["Product Name"]
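To see what `pandas.Grouper(key='Order Date', freq='W-MON')` does, here is a minimal sketch on a tiny made-up order table (the customers, dates and products are hypothetical). With `'W-MON'`, each week runs from Tuesday through Monday and is labelled by its closing Monday, and only customer/week combinations that actually contain orders become groups. The repeated "pen" purchase in customer A's first week collapses into one entry once the products are turned into a set.

```python
import pandas

# Hypothetical toy orders; the column names mirror the notebook's spreadsheet
orders = pandas.DataFrame({
    "Customer Name": ["A", "A", "A", "A", "B"],
    "Order Date": pandas.to_datetime(
        ["2024-01-02", "2024-01-05", "2024-01-08", "2024-01-09", "2024-01-03"]
    ),
    "Product Name": ["pen", "paper", "pen", "ink", "pen"],
})

# Group by customer and by week; 'W-MON' weeks close (and are labelled) on Monday
weekly = orders.groupby(
    ["Customer Name", pandas.Grouper(key="Order Date", freq="W-MON")]
)["Product Name"].apply(set)

# Keys are (customer, week-ending Monday); values are the distinct products bought
baskets = weekly.to_dict()
for key, products in baskets.items():
    print(key[0], key[1].date(), sorted(products))
# A 2024-01-08 ['paper', 'pen']
# A 2024-01-15 ['ink']
# B 2024-01-08 ['pen']
```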

In [154]:
weeklySalesGroup.value_counts()

Customer Name          Order Date  Product Name                                                                         
Aaron Davies Bruce     2012-02-06  Eldon Radial Chair Mat for Low to Medium Pile Carpets                                    1
                                   Eldon Shelf Savers™ Cubes and Bins                                                       1
                                   Electrix Halogen Magnifier Lamp                                                          1
                                   Storex Dura Pro™ Binders                                                                 1
Aaron Day              2010-11-29  Trav-L-File Heavy-Duty Shuttle II, Black                                                 1
                       2013-08-26  i270                                                                                     1
Aaron Dillon           2010-08-16  Avery 491                                                                               

In [155]:
# we are interested in what the customer has bought within the same week; set() drops repeat purchases within that week
weeklySalesGroupProductSets = weeklySalesGroup.apply(set)

In [156]:
weeklySalesGroupProductSets.values

array([ {'Eldon Radial Chair Mat for Low to Medium Pile Carpets', 'Electrix Halogen Magnifier Lamp', 'Eldon Shelf Savers™ Cubes and Bins', 'Storex Dura Pro™ Binders'},
       {'Trav-L-File Heavy-Duty Shuttle II, Black'}, {'i270'}, ...,
       {'Hon GuestStacker Chair'},
       {'GBC Poly Designer Binding Covers', 'Global High-Back Leather Tilter, Burgundy'},
       {'Bevis Round Conference Table Top, X-Base'}], dtype=object)

In [157]:
numOfGroupedWeeklyTransactions = len(weeklySalesGroupProductSets.values) 
numOfGroupedWeeklyTransactions

6538

In [158]:
# if the next line looks complicated to you, please check out list comprehensions
# https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
# note: `> 2` keeps baskets with three or more distinct items; `>= 2` would keep every basket with at least two
weeklyBasketWithMultipleItems = [aSet for aSet in weeklySalesGroupProductSets.values if len(aSet) > 2]
numOfMultipleItemBaskets = len(weeklyBasketWithMultipleItems)
numOfMultipleItemBaskets

286

In [159]:
numOfMultipleItemBaskets/numOfGroupedWeeklyTransactions

0.043744264301009486
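A basket needs only two items to contribute a co-occurring pair, while the filter above keeps baskets with more than two (`len(aSet) > 2`). The sketch below uses hypothetical baskets to show how far the two filters can diverge; rerunning the cell above with `>= 2` on the real data would give the share that matters for pairwise analysis (that figure is not shown here).

```python
# Hypothetical weekly baskets, for illustration only
baskets = [{"pen"}, {"pen", "paper"}, {"pen", "paper", "ink"}, {"stapler"}]

atLeastTwo = sum(1 for b in baskets if len(b) >= 2)   # baskets that can form at least one pair
moreThanTwo = sum(1 for b in baskets if len(b) > 2)   # the notebook's filter
print(atLeastTwo / len(baskets), moreThanTwo / len(baskets))  # 0.5 0.25
```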

Note: at this point it should be clear to the analyst that a market basket analysis is unlikely to yield significant results, because only about 4.4% of weekly baskets (286 of 6,538) contain more than two items.<br>
The ratio also puts an upper bound on how much impact or benefit such an analysis can deliver.

The following section steps through more of the market basket analysis to find out whether there are co-occurrences and how strong they are (their support).
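Support, as used in market basket analysis, is the fraction of baskets that contain every item of an itemset: support(X) = (number of baskets containing X) / (total number of baskets). A minimal sketch, with a hypothetical `support` helper and made-up baskets:

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

baskets = [{"pen", "paper"}, {"pen", "ink"}, {"pen", "paper", "ink"}, {"stapler"}]
print(support({"pen"}, baskets))           # 0.75
print(support({"pen", "paper"}, baskets))  # 0.5
```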

In [174]:
# itertools is a Python standard-library module of building blocks for working with iterables
from itertools import combinations, groupby, chain
# Counter is a frequency counter: it updates counts automatically as new elements are added
from collections import Counter
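A quick sketch of the `Counter` behaviour described above, on hypothetical product names: counts accumulate across `update()` calls, and a missing key reads as 0 instead of raising `KeyError`.

```python
from collections import Counter

counts = Counter(["pen", "paper", "pen"])   # pen: 2, paper: 1
counts.update(["pen", "ink"])               # adds to existing counts: pen 3, paper 1, ink 1
print(counts["pen"], counts["paper"], counts["ink"], counts["stapler"])  # 3 1 1 0
print(counts.most_common(1))                # [('pen', 3)]
```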

First, let's check how many unique items appear frequently. <br>
We will also set a minimum count threshold for an item to be considered "frequent".

In [176]:
weeklySalesGroupProductSets.values

array([ {'Eldon Radial Chair Mat for Low to Medium Pile Carpets', 'Electrix Halogen Magnifier Lamp', 'Eldon Shelf Savers™ Cubes and Bins', 'Storex Dura Pro™ Binders'},
       {'Trav-L-File Heavy-Duty Shuttle II, Black'}, {'i270'}, ...,
       {'Hon GuestStacker Chair'},
       {'GBC Poly Designer Binding Covers', 'Global High-Back Leather Tilter, Burgundy'},
       {'Bevis Round Conference Table Top, X-Base'}], dtype=object)

In [194]:
# chain flattens one level of nesting, e.g. a list of lists (here, an array of sets), into a single sequence
# it returns an iterator: an object that produces items one at a time through __next__()
# instead of building the whole flattened collection in memory
allBasketProducts = chain(*weeklySalesGroupProductSets.values)
allBasketProducts

<itertools.chain at 0x11b979940>
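One detail about the cell above: `chain(*...)` has to unpack every basket into a separate argument before iteration starts, so only the output is lazy. `chain.from_iterable` performs the same flattening but pulls one basket at a time, which may suit very long or lazily generated inputs better. A small sketch with hypothetical baskets:

```python
from itertools import chain

baskets = [{"pen", "paper"}, {"ink"}, {"pen"}]

# chain(*baskets) passes every basket as its own argument up front
flatUnpacked = list(chain(*baskets))
# chain.from_iterable(baskets) reads the outer sequence lazily, one basket at a time
flatLazy = list(chain.from_iterable(baskets))

print(sorted(flatUnpacked))  # ['ink', 'paper', 'pen', 'pen']
```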

In [195]:
# we can also force the iterator into a list for our convenience
allBasketProducts = list(allBasketProducts)
allBasketProducts

['Eldon Radial Chair Mat for Low to Medium Pile Carpets',
 'Electrix Halogen Magnifier Lamp',
 'Eldon Shelf Savers™ Cubes and Bins',
 'Storex Dura Pro™ Binders',
 'Trav-L-File Heavy-Duty Shuttle II, Black',
 'i270',
 'Avery White Multi-Purpose Labels',
 'Avery 491',
 'Executive Impressions 8-1/2" Career Panel/Partition Cubicle Clock',
 'Peel & Seel® Recycled Catalog Envelopes, Brown',
 'Hon 61000 Series Interactive Training Tables',
 'Wirebound Message Book, 4 per Page',
 'Canon MP41DH Printing Calculator',
 'Xerox 213',
 'Safco Industrial Wire Shelving',
 'Crayola Colored Pencils',
 '3M Organizer Strips',
 'Large Capacity Hanging Post Binders',
 'LX 677',
 'Eldon® Wave Desk Accessories',
 'Boston 1799 Powerhouse™ Electric Pencil Sharpener',
 'Eldon Radial Chair Mat for Low to Medium Pile Carpets',
 'Rediform Wirebound "Phone Memo" Message Book, 11 x 5-3/4',
 '2180',
 'Holmes Replacement Filter for HEPA Air Cleaner, Large Room',
 'Belkin ErgoBoard™ Keyboard',
 'Executive Impressions 14

In [198]:
# the counter data object is a convenient frequency counter
# each product is counted at most once per weekly basket, because each basket is a set
allBasketProductsCounter = Counter(allBasketProducts)
allBasketProductsCounter

Counter({'Xerox 1924': 4,
         'Micro Innovations Micro 3000 Keyboard, Black': 8,
         'Holmes Cool Mist Humidifier for the Whole House with 8-Gallon Output per Day, Extended Life Filter': 1,
         'TDK 4.7GB DVD-R Spindle, 15/Pack': 4,
         'Xerox 1973': 2,
         'Career Cubicle Clock, 8 1/4", Black': 8,
         'Avery 520': 7,
         'Xerox 1998': 2,
         'Safco Contoured Stacking Chairs': 2,
         'Kensington 6 Outlet Guardian Standard Surge Protector': 4,
         'Fellowes Premier Superior Surge Suppressor, 10-Outlet, With Phone and Remote': 2,
         'Accessory31': 8,
         'Computer Room Manger, 14"': 1,
         'Xerox 1937': 8,
         'iDENi80s': 3,
         'Fellowes Internet Keyboard, Platinum': 8,
         'Telephone Message Books with Fax/Mobile Section, 5 1/2" x 3 3/16"': 7,
         'Staples Wirebound Steno Books, 6" x 9", 12/Pack': 8,
         'Avery 501': 12,
         'Panasonic KX-P1131 Dot Matrix Printer': 6,
         'Xerox 20': 3,

In [199]:
allBasketProductsCounter.most_common()

[('Global High-Back Leather Tilter, Burgundy', 27),
 ('Bevis 36 x 72 Conference Tables', 25),
 ('Master Giant Foot® Doorstop, Safety Yellow', 24),
 ('BoxOffice By Design Rectangular and Half-Moon Meeting Room Tables', 24),
 ('Wilson Jones Hanging View Binder, White, 1"', 23),
 ('Fiskars® Softgrip Scissors', 22),
 ('Peel & Seel® Recycled Catalog Envelopes, Brown', 22),
 ('Xerox 210', 21),
 ('80 Minute CD-R Spindle, 100/Pack - Staples', 20),
 ('Staples® General Use 3-Ring Binders', 20),
 ('Staples 6 Outlet Surge', 20),
 ('Bush Westfield Collection Bookcases, Fully Assembled', 20),
 ('Avery Flip-Chart Easel Binder, Black', 20),
 ('Computer Printout Paper with Letter-Trim Perforations', 20),
 ('Avery 494', 19),
 ('Belkin 6 Outlet Metallic Surge Strip', 19),
 ('US Robotics 56K V.92 External Faxmodem', 19),
 ("O'Sullivan 3-Shelf Heavy-Duty Bookcases", 19),
 ('Storex DuraTech Recycled Plastic Frosted Binders', 19),
 ('Boston 1730 StandUp Electric Pencil Sharpener', 19),
 ('Avery 508', 18),
 (

In [203]:
# extract the set of frequently bought items
# the threshold is arbitrary: keep products that appear in more than 10 weekly baskets
frequentItemSets = set()
for productName, count in allBasketProductsCounter.most_common():
    if count > 10:
        frequentItemSets.add(productName)
frequentItemSets

# for the programmers: try rewriting this loop as a set comprehension

{'#10 White Business Envelopes,4 1/8 x 9 1/2',
 '#10- 4 1/8" x 9 1/2" Recycled Envelopes',
 '*Staples* Highlighting Markers',
 '12 Colored Short Pencils',
 '12-1/2 Diameter Round Wall Clock',
 '1726 Digital Answering Machine',
 '2160i',
 '2180',
 '3M Organizer Strips',
 '80 Minute CD-R Spindle, 100/Pack - Staples',
 '80 Minute Slim Jewel Case CD-R , 10/Pack - Staples',
 'Accessory21',
 'Accessory34',
 'Accessory35',
 'Accessory36',
 'Acco® Hot Clips™ Clips to Go',
 'Acme® 8" Straight Scissors',
 'Acme® Box Cutter Scissors',
 'Adams Phone Message Book, 200 Message Capacity, 8 1/16” x 11”',
 'Advantus Push Pins, Aluminum Head',
 'Aluminum Document Frame',
 'Anderson Hickey Conga Table Tops & Accessories',
 'Angle-D Binders with Locking Rings, Label Holders',
 'Array® Memo Cubes',
 'Array® Parchment Paper, Assorted Colors',
 'Avery 474',
 'Avery 479',
 'Avery 481',
 'Avery 487',
 'Avery 491',
 'Avery 493',
 'Avery 494',
 'Avery 498',
 'Avery 501',
 'Avery 506',
 'Avery 508',
 'Avery 51',
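One possible answer to the exercise in the cell above: since the result is a set, a set comprehension is the natural form, and `.items()` is enough because ordering by frequency does not matter for membership. The counts below are hypothetical stand-ins for `allBasketProductsCounter`.

```python
from collections import Counter

# Hypothetical product counts standing in for allBasketProductsCounter
productCounts = Counter({"pen": 14, "paper": 11, "ink": 10, "stapler": 3})

# Same rule as the loop: keep products seen in more than 10 weekly baskets
frequentItems = {name for name, count in productCounts.items() if count > 10}
print(sorted(frequentItems))  # ['paper', 'pen']
```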


In [205]:
len(frequentItemSets), len(allBasketProductsCounter)

(209, 1255)

Here we see that only 209 of the 1,255 distinct products are bought frequently, so any frequent pair must be built from these 209 products across all weekly purchases. <br>
In other words, only about one in six products (209/1255 ≈ 17%) appears often enough to have a realistic chance of turning up in the same basket as another.<br>
Let's continue with the analysis to further illustrate this point.

In [166]:
# iterate through all basket sets, keeping only the frequent items

In [207]:
exampleWeeklyBasket = weeklySalesGroupProductSets.values[0]
exampleWeeklyBasket

{'Eldon Radial Chair Mat for Low to Medium Pile Carpets',
 'Eldon Shelf Savers™ Cubes and Bins',
 'Electrix Halogen Magnifier Lamp',
 'Storex Dura Pro™ Binders'}

In [211]:
# if we are interested in only the frequent items, we can filter the weekly basket down to those items
# with a set operation
# set operators: & for intersection, | for union, - for difference, ^ for symmetric difference
exampleWeeklyBasket & frequentItemSets

set()
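A small sketch of the four set operators listed in the comments above, on two hypothetical sets. Sets have no fixed iteration order, so the results are sorted before printing.

```python
basket = {"pen", "paper", "ink"}
frequent = {"pen", "paper", "stapler"}

common = basket & frequent      # intersection: pen, paper
either = basket | frequent      # union: pen, paper, ink, stapler
onlyBasket = basket - frequent  # difference: ink
exactlyOne = basket ^ frequent  # symmetric difference: ink, stapler

print(sorted(common), sorted(either), sorted(onlyBasket), sorted(exactlyOne))
```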

In [214]:
# show the frequent items of each weekly basket, but only for baskets with at least 2 frequent items
for weeklyBasket in weeklySalesGroupProductSets.values:
    frequentItemsWeeklyBasket = weeklyBasket & frequentItemSets
    if len(frequentItemsWeeklyBasket) >= 2:
        print(frequentItemsWeeklyBasket)

{'Avery White Multi-Purpose Labels', 'Avery 491'}
{'White GlueTop Scratch Pads', 'Black Print Carbonless Snap-Off® Rapid Letter, 8 1/2" x 7"'}
{'Newell 312', 'Bretford CR8500 Series Meeting Room Furniture'}
{'Imation Neon Mac Format Diskettes, 10/Pack', 'Newell 310'}
{'Durable Pressboard Binders', 'Park Ridge™ Embossed Executive Business Envelopes'}
{'Rediform S.O.S. Phone Message Books', 'GBC Standard Therm-A-Bind Covers'}
{'Avery Poly Binder Pockets', 'Aluminum Document Frame'}
{'Newell 312', 'Accessory35'}
{'Eureka Disposable Bags for Sanitaire® Vibra Groomer I® Upright Vac', 'Avery 510'}
{'Global Adaptabilities™ Conference Tables', 'Accessory35'}
{'1726 Digital Answering Machine', 'Avery Flip-Chart Easel Binder, Black'}
{"O'Sullivan Elevations Bookcase, Cherry Finish", 'BoxOffice By Design Rectangular and Half-Moon Meeting Room Tables'}
{'Master Giant Foot® Doorstop, Safety Yellow', 'Black Print Carbonless Snap-Off® Rapid Letter, 8 1/2" x 7"'}
{'*Staples* Highlighting Markers', 'Pa

In [217]:
exampleFrequentItemsWeeklyBasket = {'80 Minute CD-R Spindle, 100/Pack - Staples', 'Executive Impressions 12" Wall Clock', 'Global Adaptabilities™ Conference Tables'}
exampleFrequentItemsWeeklyBasket

{'80 Minute CD-R Spindle, 100/Pack - Staples',
 'Executive Impressions 12" Wall Clock',
 'Global Adaptabilities™ Conference Tables'}

In [None]:
# we want to create all pairwise combinations of items in this example basket

In [219]:
pairCombinations = combinations(exampleFrequentItemsWeeklyBasket, 2)
list(pairCombinations)

[('Global Adaptabilities™ Conference Tables',
  'Executive Impressions 12" Wall Clock'),
 ('Global Adaptabilities™ Conference Tables',
  '80 Minute CD-R Spindle, 100/Pack - Staples'),
 ('Executive Impressions 12" Wall Clock',
  '80 Minute CD-R Spindle, 100/Pack - Staples')]
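A basket with n frequent items yields n(n-1)/2 unordered pairs, so the number of pairs grows quadratically with basket size. A quick check on a hypothetical four-item basket:

```python
from itertools import combinations
from math import comb

basket = ["pen", "paper", "ink", "stapler"]
pairs = list(combinations(basket, 2))
print(len(pairs), comb(len(basket), 2))  # 6 6
```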

In [172]:
# collect all pairwise combinations of frequent items from every weekly basket into basketCombinations
basketCombinations = []
for weeklyBasket in weeklySalesGroupProductSets.values:
    frequentItemsWeeklyBasket = weeklyBasket & frequentItemSets
    if len(frequentItemsWeeklyBasket) >= 2:
        basketCombinations.extend(combinations(frequentItemsWeeklyBasket, 2))
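One caveat when counting pairs this way: `combinations` emits items in the order it receives them, and a Python set's iteration order is not guaranteed to be consistent between two sets that share the same two products. The same pair could therefore be recorded once as `(A, B)` and once as `(B, A)`, splitting its count. A minimal sketch that sorts each filtered basket first so every pair gets one canonical order (`count_pairs` is a hypothetical helper, and the baskets are made up):

```python
from collections import Counter
from itertools import combinations

def count_pairs(baskets, keep):
    """Count unordered pairs of `keep` items that co-occur in the same basket."""
    pairCounts = Counter()
    for basket in baskets:
        items = sorted(basket & keep)  # sorting gives every pair one canonical order
        pairCounts.update(combinations(items, 2))  # baskets with < 2 items add nothing
    return pairCounts

baskets = [{"pen", "paper"}, {"paper", "pen", "ink"}, {"ink"}]
keep = {"pen", "paper", "ink"}
print(count_pairs(baskets, keep).most_common(1))  # [(('paper', 'pen'), 2)]
```

Using `frozenset(pair)` as the counter key would be an equivalent way to make pairs order-independent.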

In [220]:
# we use the frequency counter on these combinations to see how often they appear
Counter(basketCombinations).most_common()

[(('Xerox 23',
   'Belkin 8 Outlet SurgeMaster II Gold Surge Protector with Phone Protection'),
  2),
 (('T60', 'Bush Mission Pointe Library'), 2),
 (('Wirebound Message Books, 2 7/8" x 5", 3 Forms per Page',
   'Tenex 46" x 60" Computer Anti-Static Chairmat, Rectangular Shaped'),
  2),
 (('Eureka Disposable Bags for Sanitaire® Vibra Groomer I® Upright Vac',
   'Recycled Premium Regency Composition Covers'),
  2),
 (('Document Clip Frames', 'Newell 337'), 2),
 (('Computer Printout Paper with Letter-Trim Perforations',
   'Carina Double Wide Media Storage Towers in Natural & Black'),
  2),
 (('Avery Hi-Liter Pen Style Six-Color Fluorescent Set', 'Avery 501'), 2),
 (('Newell 337', 'Hon 94000 Series Round Tables'), 2),
 (('Hoover Portapower™ Portable Vacuum',
   'Office Star - Contemporary Task Swivel chair with 2-way adjustable arms, Plum'),
  2),
 (('Cardinal Poly Pocket Divider Pockets for Ring Binders',
   'Holmes Replacement Filter for HEPA Air Cleaner, Medium Room'),
  2),
 (('Xerox

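In the output above, even the most common pairs co-occur in only 2 of the 6,538 weekly baskets (a support of about 0.03%). This confirms that affinity analysis is not suitable for this dataset, and we do not need to go further.<br>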

# Additional Exercises:
1. Discussion Exercise: Should we keep "bad" results? 
2. What other ways could we bin or basket the data?

Extra References:<br>
http://pbpython.com/market-basket-analysis.html <br>
A more involved implementation of market basket analysis that uses the apriori algorithm to extract the frequent itemsets; this workbook uses a simple count threshold instead.
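For readers curious about what apriori adds over the single count threshold used here, below is a minimal pure-Python sketch of the idea. It is not the referenced article's code, and `apriori` is a hypothetical helper name. It alternates a count step with a join step (merge frequent itemsets into larger candidates) and a prune step (drop any candidate that has an infrequent subset). Unlike the notebook's `count > 10`, it keeps itemsets whose count is at least `minCount`.

```python
from itertools import combinations

def apriori(baskets, minCount):
    """Return {frozenset(itemset): count} for every itemset found in at least minCount baskets."""
    baskets = [frozenset(b) for b in baskets]
    candidates = {frozenset([item]) for b in baskets for item in b}  # level 1: single items
    frequent = {}
    k = 1
    while candidates:
        # count step: how many baskets contain each candidate itemset
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minCount}
        frequent.update(level)
        k += 1
        # join step: merge frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

weeklyBaskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(weeklyBaskets, 3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 3, every single item and every pair survives, while the triple `{a, b, c}` appears in only 2 baskets and is dropped. Counting every candidate against every basket keeps the sketch short but would be slow on large data.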

