# Chapter 14: Association Rules and Collaborative Filtering


> (c) 2019-2020 Galit Shmueli, Peter C. Bruce, Peter Gedeck 
>
> _Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python_ (First Edition) 
> Galit Shmueli, Peter C. Bruce, Peter Gedeck, and Nitin R. Patel. 2019.
>
> Date: 2020-03-08
>
> Python Version: 3.8.2
> Jupyter Notebook Version: 5.6.1
>
> Packages:
>   - mlxtend: 0.17.2
>   - numpy: 1.18.1
>   - pandas: 1.0.1
>   - scipy: 1.4.1
>   - scikit-learn: 0.22.2
>   - scikit-surprise: 1.1.0
>
> The assistance from Mr. Kuber Deokar and Ms. Anuja Kulkarni in preparing these solutions is gratefully acknowledged.


In [1]:
# Import required packages for this chapter
from pathlib import Path

import pandas as pd
import numpy as np
from scipy.spatial.distance import cosine
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from sklearn.metrics.pairwise import cosine_similarity

from surprise import Dataset
from surprise import Reader
from surprise import KNNBasic

%matplotlib inline

In [2]:
# Working directory:
#
# We assume that data are kept in the same directory as the notebook. If you keep your 
# data in a different folder, replace the argument of the `Path`
DATA = Path('.')
# and then load data using 
#
# pd.read_csv(DATA / 'filename.csv')

# Problem 14.1: Satellite Radio Customers
An analyst at a subscription-based
satellite radio company has been given a sample of data from their
customer database, with the goal of finding groups of customers who
are associated with one another. The data consist of company data,
together with purchased demographic data that are mapped to the
company data (see Table). The analyst
decides to apply association rules to learn more about the
associations between customers. Comment on this approach.

## Solution 14.1
Association rules are not the correct approach. They find associations among the items listed in the columns (the demographic and other descriptor variables in this database), not associations between rows (the customers in this database). Cluster analysis would be more appropriate for finding groups of similar customers.

# Problem 14.2: Identifying Course Combinations
The Institute for Statistics Education at Statistics.com offers online courses in statistics and analytics, and is seeking information that will help in packaging and sequencing courses.  Consider the data in the file _CourseTopics.csv_, the first few rows of which are shown in the Table. These data are for purchases of online statistics courses at Statistics.com. Each row represents the courses attended by a single customer.
The firm wishes to assess alternative sequencings and bundling of courses. Use association rules to analyze these data, and interpret several of the resulting rules.

In [3]:
df = pd.read_csv(DATA / 'CourseTopics.csv')
df.head()

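|   | Intro | DataMining | Survey | Cat Data | Regression | Forecast | DOE | SW |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |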


## Solution 14.2


In [4]:
# create frequent itemsets
itemsets = apriori(df, min_support=0.01, use_colnames=True)

# and convert into rules
rules = association_rules(itemsets, metric='confidence', min_threshold=0.1)

print(rules.sort_values(by=['lift'], ascending=False)
      .drop(columns=['antecedent support', 'consequent support', 'conviction'])
      .head(6))

                antecedents             consequents   support  confidence  \
316            (Intro, DOE)        (SW, Regression)  0.019178    0.411765   
321        (SW, Regression)            (Intro, DOE)  0.019178    0.350000   
319       (Regression, DOE)             (Intro, SW)  0.019178    0.636364   
318             (Intro, SW)       (Regression, DOE)  0.019178    0.200000   
249     (Intro, DataMining)  (Forecast, Regression)  0.013699    0.250000   
248  (Forecast, Regression)     (Intro, DataMining)  0.013699    0.357143   

         lift  leverage  
316  7.514706  0.016626  
321  7.514706  0.016626  
319  6.636364  0.016288  
318  6.636364  0.016288  
249  6.517857  0.011597  
248  6.517857  0.011597  


In [5]:
# filter rules to have only one consequent
single_rules = rules[[len(c) == 1 for c in rules.consequents]]
(single_rules.sort_values(by=['lift'], ascending=False)
      .drop(columns=['antecedent support', 'consequent support', 'conviction'])
      .head(6))

|     | antecedents | consequents | support | confidence | lift | leverage |
|---|---|---|---|---|---|---|
| 243 | (Forecast, Intro, Regression) | (DataMining) | 0.013699 | 0.714286 | 4.010989 | 0.010283 |
| 262 | (Intro, Survey, DOE) | (Cat Data) | 0.010959 | 0.8 | 3.842105 | 0.008107 |
| 233 | (Intro, DataMining, Cat Data) | (Regression) | 0.016438 | 0.75 | 3.601974 | 0.011875 |
| 255 | (Intro, Survey, Cat Data) | (Forecast) | 0.013699 | 0.5 | 3.578431 | 0.009871 |
| 245 | (Intro, DataMining, Regression) | (Forecast) | 0.013699 | 0.5 | 3.578431 | 0.009871 |
| 312 | (Intro, Regression, DOE) | (SW) | 0.019178 | 0.777778 | 3.504801 | 0.013706 |


Interpreting some rules:

- The first rule is "If Intro, Regression and Forecasting are taken, Data Mining is also taken." It has confidence of 71.4%, and a lift of 4.
- The second rule is "If Intro, Survey and DOE are taken, Categorical Data is also taken." It has confidence of 80% and lift of 3.84.

The support of the combined itemset (antecedents and consequent together) is very 
low for all of these rules, under 2%. This means that the rules apply to only a 
small fraction of customers, and that there is a greater chance that they reflect 
random noise rather than true associations that will persist into the future.

# Problem 14.3: Recommending Courses
We again consider the data in _CourseTopics.csv_ describing course purchases at Statistics.com (see Problem 14.2 and data sample in Table). We want to provide a course recommendation to a student who purchased the Regression and Forecast courses. Apply user-based collaborative filtering to the data. All recommendations will be 1. Explain why this happens.

## Solution 14.3

In [6]:
course_df = pd.read_csv(DATA / 'CourseTopics.csv')
course_df.head()

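|   | Intro | DataMining | Survey | Cat Data | Regression | Forecast | DOE | SW |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |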


In [7]:
ratings = []
for customer, row in course_df.iterrows():
    for course, value in row.items():
        if value==0: continue
        ratings.append([customer, course, value])
ratings = pd.DataFrame(ratings, columns=['customer', 'course', 'rating'])

reader = Reader(rating_scale=(1, 1))
data = Dataset.load_from_df(ratings, reader)
trainset = data.build_full_trainset()
sim_options = {'name': 'cosine', 'user_based': True}  # compute cosine similarities between users
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)

predictions = []
for user in course_df.index:
    predictions.append([algo.predict(user, course).est for course in course_df])
predictions = pd.DataFrame(predictions, columns=course_df.columns)
predictions.head()

Computing the cosine similarity matrix...
Done computing similarity matrix.


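|   | Intro | DataMining | Survey | Cat Data | Regression | Forecast | DOE | SW |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |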


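The resulting predictions are all 1. This is because the input is not a rating matrix but a binary one: only purchases are recorded, so every rating that enters the algorithm is 1 (the zeros were dropped, and the rating scale is (1, 1)). KNNBasic predicts a similarity-weighted average of the neighbors' ratings, and any weighted average of 1s is 1. When no neighbor is available, the fallback prediction is the global mean rating, which is also 1.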
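The argument can be checked numerically. The sketch below assumes a KNNBasic-style estimate, sum(sim * rating) / sum(sim), and uses made-up similarity weights for hypothetical neighbors (any positive values would do); with every rating equal to 1, the weighted average is 1 regardless of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up positive similarities for 10 hypothetical neighbors
similarities = rng.uniform(0.1, 1.0, size=10)
# In the binary purchase data, every rating that reaches the algorithm is 1
neighbor_ratings = np.ones_like(similarities)

# KNNBasic-style estimate: similarity-weighted average of neighbor ratings
estimate = np.sum(similarities * neighbor_ratings) / np.sum(similarities)
print(estimate)
```

Changing the seed or the number of neighbors changes the weights but not the estimate.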

# Problem 14.4: Cosmetics Purchases
The data and rules shown in the book are based on a subset of a dataset on cosmetic purchases (_Cosmetics.csv_) at a large chain drugstore. The store wants to analyze associations among purchases of these items for purposes of point-of-sale display, guidance to
sales personnel in promoting cross-sales, and guidance for piloting an
eventual time-of-purchase electronic recommender system to boost
cross-sales. Consider first only the data shown in Table, given in binary matrix form.

In [8]:
cosmetics_df = pd.read_csv(DATA / 'Cosmetics.csv', index_col='Trans. ')
cosmetics_df.head()

| Trans. | Bag | Blush | Nail Polish | Brushes | Concealer | Eyebrow Pencils | Bronzer | Lip liner | Mascara | Eye shadow | Foundation | Lip Gloss | Lipstick | Eyeliner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 5 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |


## Solution 14.4.a
Select several values in the matrix and explain their meaning.

The "0" in the first row, first column under "bag" indicates that, in the 
first transaction (i.e. the first row), no bag was purchased. The "1" to its 
right indicates that blush was purchased in that first transaction.


## Solution 14.4.b
Consider the results of the association rules analysis shown in the book.

### Solution 14.4.b.i
For the first row, explain the `confidence` output and how it is calculated.

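If Blush, Concealer, Mascara, Eye.shadow, and Lipstick were purchased, then 
Eyebrow.Pencils were also purchased 30% of the time. Confidence is the number of 
transactions containing both the antecedent and the consequent, divided by the 
number of transactions containing the antecedent:

> 100 * (# transactions with Blush + Concealer + Mascara + Eye.shadow + Lipstick + Eyebrow.Pencils) / (# transactions with Blush + Concealer + Mascara + Eye.shadow + Lipstick)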


### Solution 14.4.b.ii
For the first row, explain the `support` output and how it is calculated.

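The support of 0.013 means that all six items

> Blush + Concealer + Mascara + Eye.shadow + Lipstick + Eyebrow.Pencils

were purchased together in 1.3% of all transactions, i.e., in 13 of the 1,000 
transactions. The calculation is:

> (# transactions with Blush + Concealer + Mascara + Eye.shadow + Lipstick + Eyebrow.Pencils) / (# all transactions)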


### Solution 14.4.b.iii
For the first row, explain the `lift` and how it is calculated.

A lift ratio of 7.19 means that we are 7.19 times as likely to find a transaction 
with Eyebrow.Pencils if we look only at the transactions in which

> Blush + Concealer + Mascara + Eye.shadow + Lipstick

were purchased, compared with searching randomly among all transactions. In other 
words, lift is the confidence divided by the proportion of all transactions that 
contain the consequent:

> ((# trans with Eyebrow.Pencils + Blush + Concealer + Mascara + Eye.shadow + Lipstick) / (# trans with Blush + Concealer + Mascara + Eye.shadow + Lipstick)) / ((# trans with Eyebrow.Pencils) / (# all transactions))
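The three formulas in (i) to (iii) can be checked on a small scale. The sketch below applies them to the first five transactions of _Cosmetics.csv_ shown earlier, keeping only the six items of the book's first rule. Five rows are far too few to reproduce the book's values (0.013, 30%, 7.19); the point is only the arithmetic. `rule_metrics` is a hypothetical helper, not an mlxtend function.

```python
import pandas as pd

# First five transactions from the Cosmetics.csv excerpt above,
# restricted to the six items in the book's first rule
toy = pd.DataFrame({
    "Blush":           [1, 0, 1, 0, 1],
    "Concealer":       [1, 1, 1, 1, 1],
    "Mascara":         [1, 0, 1, 0, 1],
    "Eye shadow":      [0, 0, 1, 0, 1],
    "Lipstick":        [0, 0, 1, 0, 1],
    "Eyebrow Pencils": [0, 0, 1, 0, 0],
}, index=pd.Index([1, 2, 3, 4, 5], name="Trans."))

def rule_metrics(df, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    has_ante = df[antecedent].all(axis=1)   # rows containing every antecedent item
    has_cons = df[consequent].all(axis=1)   # rows containing every consequent item
    n_total = len(df)
    n_ante = has_ante.sum()
    n_cons = has_cons.sum()
    n_both = (has_ante & has_cons).sum()
    support = n_both / n_total              # share of rows with all items
    confidence = n_both / n_ante            # share of antecedent rows with consequent
    lift = confidence / (n_cons / n_total)  # confidence / consequent support
    return {"support": support, "confidence": confidence, "lift": lift}

metrics = rule_metrics(
    toy,
    ["Blush", "Concealer", "Mascara", "Eye shadow", "Lipstick"],
    ["Eyebrow Pencils"],
)
print(metrics)
```

On these five rows the antecedent appears in transactions 3 and 5, and all six items appear together only in transaction 3, so support = 1/5 = 0.2, confidence = 1/2 = 0.5, and lift = 0.5 / (1/5) = 2.5.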


### Solution 14.4.b.iv
For the first row, explain the rule that is represented there in words.

The rule is "If a transaction includes Blush + Concealer + Mascara + Eye.shadow + Lipstick, 
it will also include Eyebrow.Pencils."  

If we are searching for transactions with Eyebrow.Pencils, limiting our
search to transactions with Blush + Concealer + Mascara + Eye.shadow + Lipstick will 
increase our probability of success by a factor of 7.19.

## Solution 14.4.c
Now, use the complete dataset on the cosmetics purchases (in the file _Cosmetics.csv_). Using Python, apply association rules to these data (for _apriori_ use `min_support=0.1` and `use_colnames=True`, for _association_rules_ use default parameters).

In [9]:
frequentItemsets = apriori(cosmetics_df, min_support=0.1, use_colnames=True)

rules = association_rules(frequentItemsets)

(rules.sort_values(by=['lift'], ascending=False).head(20)
     .drop(columns=['antecedent support', 'consequent support', 'conviction']))

|    | antecedents | consequents | support | confidence | lift | leverage |
|---|---|---|---|---|---|---|
| 0 | (Brushes) | (Nail Polish) | 0.149 | 1.0 | 3.571429 | 0.10728 |
| 21 | (Concealer, Eye shadow, Blush) | (Mascara) | 0.119 | 0.959677 | 2.688172 | 0.074732 |
| 4 | (Eye shadow, Blush) | (Mascara) | 0.169 | 0.928571 | 2.60104 | 0.104026 |
| 6 | (Eye shadow, Nail Polish) | (Mascara) | 0.119 | 0.908397 | 2.544529 | 0.072233 |
| 11 | (Eye shadow, Concealer) | (Mascara) | 0.179 | 0.890547 | 2.49453 | 0.107243 |
| 13 | (Eye shadow, Bronzer) | (Mascara) | 0.124 | 0.879433 | 2.463397 | 0.073663 |
| 23 | (Eye shadow, Concealer, Eyeliner) | (Mascara) | 0.114 | 0.876923 | 2.456367 | 0.06759 |
| 5 | (Mascara, Blush) | (Eye shadow) | 0.169 | 0.918478 | 2.410704 | 0.098896 |
| 17 | (Lipstick, Eye shadow) | (Mascara) | 0.11 | 0.852713 | 2.388552 | 0.063947 |
| 18 | (Lipstick, Mascara) | (Eye shadow) | 0.11 | 0.909091 | 2.386065 | 0.063899 |


### Solution 14.4.c.i
Interpret the first three rules in the output in words.

In [10]:
(rules.sort_values(by=['lift'], ascending=False).head(3)
     .drop(columns=['antecedent support', 'consequent support', 'conviction']))

|    | antecedents | consequents | support | confidence | lift | leverage |
|---|---|---|---|---|---|---|
| 0 | (Brushes) | (Nail Polish) | 0.149 | 1.0 | 3.571429 | 0.10728 |
| 21 | (Concealer, Eye shadow, Blush) | (Mascara) | 0.119 | 0.959677 | 2.688172 | 0.074732 |
| 4 | (Eye shadow, Blush) | (Mascara) | 0.169 | 0.928571 | 2.60104 | 0.104026 |


- First row: If brushes are purchased, nail polish is also purchased. This rule has 100% confidence: in these data, every transaction that included a brush also included nail polish. It has a lift of 3.6 and a support of about 15% (149 of the 1,000 transactions contain both items).

- Second row: If Blush, Concealer, and Eye.shadow are purchased, Mascara is also purchased. This rule has a confidence of 96% (when Blush, Concealer, and Eye.shadow are purchased, Mascara is purchased as well 96% of the time), a lift of 2.69, and a support of about 12%.

- Third row: If Blush and Eye.shadow are purchased, Mascara is also purchased. This rule has a confidence of 93%, a lift of 2.6, and a support of about 17%.


### Solution 14.4.c.ii
Reviewing the first couple of dozen rules, comment on their redundancy and how you would assess their utility.

First, a note about utility. From a static retail-display perspective 
(buy X together with Y), a shopper's attention can probably handle only a 
couple of rules. Coupon- and web-offer-generating systems have no such limit: 
although only one or two offers are presented to a given customer at a given 
time, other customers (and the same customer at a different time) may receive 
different offers.

Many rules come in pairs that are mirror images of one another, 
so we can tackle them that way.

The first rule has 100% confidence (every brush purchase in the data already includes nail polish), so there is no need to make an offer based on it.

All remaining rules involve mascara, mostly as a consequent. Mascara is a good 
bet as a companion product in general -- say for a retail display. 

Rules 2-10 could be consolidated into a general offer covering the 5 products 
that keep reappearing in these "multi-item" rules: eyeliner, mascara, concealer,
eyeshadow, and blush. These seem to be the favorites of big spenders, so a 
"buy 3, 50% off on two others" or something similar might work.

# Problem 14.5: Course ratings
The Institute for Statistics Education at Statistics.com asks students to rate a variety of aspects of a course as soon as the student completes it. The Institute is contemplating instituting a recommendation system that would provide students with recommendations for additional courses as soon as they submit their rating for a completed course.  Consider the excerpt from student ratings of online statistics courses shown in Table 14.7, and the problem of what to recommend to student E.N.

In [11]:
rating_df = pd.read_csv(DATA / 'courserating.csv')
rating_df = rating_df.set_index('Unnamed: 0')
rating_df

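|    | SQL | Spatial | PA1 | DM in R | Python | Forecast | R Prog | Hadoop | Regression |
|---|---|---|---|---|---|---|---|---|---|
| LN | 4.0 | NaN | NaN | NaN | 3.0 | 2.0 | 4.0 | NaN | 2.0 |
| MH | 3.0 | 4.0 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN |
| JH | 2.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| EN | 4.0 | NaN | NaN | 4.0 | NaN | NaN | 4.0 | NaN | 3.0 |
| DU | 4.0 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| FL | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GL | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| AH | NaN | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SA | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| RW | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | 4.0 | NaN |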


## Solution 14.5.a
First consider a user-based collaborative filter.  This requires computing correlations between all student pairs. 
For which students is it possible to compute correlations with E.N.? Compute them.

We need to identify the users that share ratings with E.N. These are: L.N., M.H., J.H., D.U., and D.S. However, only L.N. and D.S. share more than one rating with E.N. 

To compute this correlation, we first compute average rating by each of these 
students.  Note that the average is computed over a different number of 
courses for each of these students, because they each rated a different set 
of courses.

Average ratings:

- LN: (4 + 3 + 2 + 4 + 2) / 5 = 3
- EN: (4 + 4 + 4 + 3) / 4 = 3.75
- DS: (4 + 2 + 4) / 3 = 3.33

Co-rated courses for users EN and LN: SQL, R Prog, Regression.

- Denominator LN: sqrt((4-3)^2 + (4-3)^2 + (2-3)^2) = 1.732051
- Denominator EN: sqrt((4-3.75)^2 + (4-3.75)^2 + (3-3.75)^2) = 0.8291562

**Corr(LN, EN) = ((4-3)*(4-3.75) + (4-3)*(4-3.75) + (2-3)*(3-3.75)) / (1.732051 * 0.8291562) = 0.8703882**

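Co-rated courses for users EN and DS: SQL, DM in R, R Prog.

- Denominator EN: sqrt((4-3.75)^2 + (4-3.75)^2 + (4-3.75)^2) = 0.4330127
- Denominator DS: sqrt((4-3.33)^2 + (2-3.33)^2 + (4-3.33)^2) = 1.633003

**Corr(EN, DS) = ((4-3.75)*(4-3.33) + (4-3.75)*(2-3.33) + (4-3.75)*(4-3.33)) / (0.4330127 * 1.633003) = 0.003535513**

The small nonzero value comes from rounding the DS average to 3.33. With the exact average 10/3, the numerator is 0.25 * (2/3 - 4/3 + 2/3) = 0, so Corr(EN, DS) = 0.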
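The hand calculation can be reproduced with a short function. Following the calculation in this solution, each student's average is taken over all of that student's ratings, while the sums run only over the co-rated courses. The ratings are copied from the worked calculations above; `corr_corated` and `student_ratings` are hypothetical names.

```python
from math import sqrt

# Ratings copied from the worked calculations above
student_ratings = {
    "LN": {"SQL": 4, "Python": 3, "Forecast": 2, "R Prog": 4, "Regression": 2},
    "EN": {"SQL": 4, "DM in R": 4, "R Prog": 4, "Regression": 3},
    "DS": {"SQL": 4, "DM in R": 2, "R Prog": 4},
}

def corr_corated(r_u, r_v):
    """Correlation as computed above: each mean uses ALL of that user's ratings,
    the sums use only the co-rated courses."""
    mean_u = sum(r_u.values()) / len(r_u)
    mean_v = sum(r_v.values()) / len(r_v)
    common = r_u.keys() & r_v.keys()          # co-rated courses
    num = sum((r_u[c] - mean_u) * (r_v[c] - mean_v) for c in common)
    den_u = sqrt(sum((r_u[c] - mean_u) ** 2 for c in common))
    den_v = sqrt(sum((r_v[c] - mean_v) ** 2 for c in common))
    return num / (den_u * den_v)

print(corr_corated(student_ratings["LN"], student_ratings["EN"]))
print(corr_corated(student_ratings["EN"], student_ratings["DS"]))
```

This gives about 0.8704 for (LN, EN), and for (EN, DS) a value that is zero up to floating-point error, confirming that the 0.0035 above is a rounding artifact.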

## Solution 14.5.b
Based on the single nearest student to E.N., which single course should we recommend to E.N.? Explain why. 

From the correlations computed in (a), student LN is nearest to EN. Among the courses that LN has rated but EN has not taken (Python and Forecast), LN rated Python highest (3, versus 2 for Forecast). So Python should be recommended to EN.


## Solution 14.5.c
Use _scikit-learn_ function `sklearn.metrics.pairwise.cosine_similarity` to compute the cosine similarity between users. 

Co-rated courses for users EN and LN: SQL, R Prog, Regression.

- Denominator LN: sqrt(4^2 + 4^2 + 2^2) = 6
- Denominator EN: sqrt(4^2 + 4^2 + 3^2) = 6.403124

**Cosine(LN, EN) = (4*4 + 4*4 + 2*3) / (6 * 6.403124) = 0.9891005**

Co-rated courses for users EN and DS: SQL, DM in R, R Prog.

- Denominator EN: sqrt(4^2 + 4^2 + 4^2) = 6.928203
- Denominator DS: sqrt(4^2 + 2^2 + 4^2) = 6

**Cosine(EN, DS) = (4*4 + 4*2 + 4*4) / (6.928203 * 6) = 0.9622505**

In [12]:
print('cosine(LN, EN) = ',cosine_similarity(rating_df.loc[['LN', 'EN'], ['SQL', 'R Prog', 'Regression']])[0, 1])
print('cosine(EN, DS) = ',cosine_similarity(rating_df.loc[['EN', 'DS'], ['SQL', 'DM in R', 'R Prog']])[0, 1])

cosine(LN, EN) =  0.9891004919611718
cosine(EN, DS) =  0.9622504486493764


Here is an implementation of cosine similarity that handles NaN data. 

In [13]:
def cosine_similarity_NA(data):
    """Cosine similarity between all pairs of rows, computed only over the
    columns where both rows have a value (NaN entries are ignored)."""
    m = data.shape[0]
    # Initialize the similarity matrix to np.nan
    result = np.full((m, m), np.nan)
    # Iterate over all pairs of rows
    for i in range(m):
        # maski is True where row i has a value
        maski = ~np.isnan(data.iloc[i])
        for j in range(i, m):
            # maskij is True where both row i and row j have a value
            maskij = maski & ~np.isnan(data.iloc[j])
            if np.any(maskij):
                # if the rows share at least one value, calculate the cosine
                # similarity over the shared entries only
                result[i, j] = 1 - cosine(data.iloc[i][maskij], data.iloc[j][maskij])
                result[j, i] = result[i, j]
    return pd.DataFrame(result, columns=data.index, index=data.index)

cosineSimilarity = cosine_similarity_NA(rating_df)
cosineSimilarity

|    | LN | MH | JH | EN | DU | FL | GL | AH | SA | RW | BA | MG | AF | KG | DS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LN | 1.0 | 0.96 | 1.0 | 0.9891 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 |
| MH | 0.96 | 1.0 | 0.989949 | 1.0 | 0.989949 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| JH | 1.0 | 0.989949 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| EN | 0.9891 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.96225 |
| DU | 1.0 | 0.989949 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| FL | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GL | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| AH | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SA | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN |
| RW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN |


We can also calculate the cosine similarity after converting the rating matrix into binary form (course taken or not). In this case, we get:

In [14]:
binary_df = rating_df.notna().astype(int)  # 1 = course rated (taken), 0 = not
binary_sim = pd.DataFrame(cosine_similarity(binary_df),
                          index=binary_df.index, columns=binary_df.index)
print('cosine(LN, EN) = ', binary_sim.loc['LN', 'EN'])
print('cosine(EN, DS) = ', binary_sim.loc['EN', 'DS'])

cosine(LN, EN) =  0.6708203932499369
cosine(EN, DS) =  0.8660254037844388
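For 0/1 vectors, the dot product counts the courses two students have both taken, and each vector's norm is the square root of the number of courses taken, so binary cosine similarity equals |A ∩ B| / sqrt(|A| |B|). The sketch below checks this against the two values above using plain Python sets; the course sets are read from the rating table and the calculations in (a) and (c), and `binary_cosine` is a hypothetical helper.

```python
from math import sqrt

# Courses each student has rated, from the rating table and calculations above
taken = {
    "LN": {"SQL", "Python", "Forecast", "R Prog", "Regression"},
    "EN": {"SQL", "DM in R", "R Prog", "Regression"},
    "DS": {"SQL", "DM in R", "R Prog"},
}

def binary_cosine(a, b):
    """Cosine similarity of two 0/1 vectors, written in terms of the sets of 1s."""
    return len(a & b) / sqrt(len(a) * len(b))

print(binary_cosine(taken["LN"], taken["EN"]))  # 3 / sqrt(5 * 4)
print(binary_cosine(taken["EN"], taken["DS"]))  # 3 / sqrt(4 * 3)
```

Both students share three courses with EN, but DS has taken fewer courses overall, which is why DS comes out more similar in the binary version.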


## Solution 14.5.d
Based on the cosine similarities of the nearest students to E.N., which course should be recommended to E.N.?

From the cosine similarities based on course ratings, student LN is nearest to EN. Among the courses 
that LN has taken (but not taken by EN), Python is highly preferred by LN. 
So Python should be recommended to EN.

If we use the binary matrix, student DS is more similar to EN based on courses taken. However, as DS hasn't taken any courses other than the ones EN already took, we cannot make a recommendation in this case.

## Solution 14.5.e
What is the conceptual difference between using the correlation as opposed to cosine similarities?
\[_Hint_: how are the missing values in the matrix handled in each case?\]

If we consider the rating matrix, both methods essentially consider only co-rated items. Correlation, however, also uses the ratings of items that are not co-rated when it computes each user's average, and these averages affect the correlation.

If we calculate the cosine similarity after converting to binary form, we use all items in the similarity calculation. Using the actual ratings on co-rated items only ignores the items that are not co-rated, which may carry useful information.

The binary form is more useful when most users have rated only some of the items. On the other hand, if most items are rated by most users, using the actual ratings adds power to the analysis compared with using binary data alone.



## Solution 14.5.f
With large datasets, it is computationally difficult to compute user-based recommendations in real time, and an item-based approach is used instead. Returning to the rating data (not the binary matrix), let's now take that approach.

### Solution 14.5.f.i
If the goal is still to find a recommendation for E.N., for which course pairs is it possible and useful to calculate correlations?  


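There is enough data to find correlations for the following pairs:

- SQL - Spatial
- SQL - DM in R
- SQL - Python
- DM in R - R Prog
- Spatial - Python

However, EN has already taken SQL, DM in R, R Prog, and Regression. The useful pairs are those that link a course EN has taken to one EN has not taken, so only the correlations involving Spatial and Python (SQL - Spatial and SQL - Python) are useful.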

### Solution 14.5.f.ii
Just looking at the data, and without yet calculating course pair correlations, which course would you recommend to E.N., relying on item-based filtering?  Calculate two course pair correlations involving your guess and report the results. 

The SQL and Spatial ratings match most closely, and more students rated both of 
these courses (three) than rated both SQL and Python (two), so Spatial would be the best guess.

In [15]:
print(cosine_similarity(rating_df.loc[['MH', 'JH', 'DU'], ['SQL', 'Spatial']].transpose()))
cosine_similarity_NA(rating_df.transpose())

[[1.         0.99037514]
 [0.99037514 1.        ]]


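|    | SQL | Spatial | PA1 | DM in R | Python | Forecast | R Prog | Hadoop | Regression |
|---|---|---|---|---|---|---|---|---|---|
| SQL | 1.0 | 0.990375 | NaN | 0.948683 | 0.96 | 1.0 | 1.0 | NaN | 0.980581 |
| Spatial | 0.990375 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN |
| PA1 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 | NaN |
| DM in R | 0.948683 | NaN | NaN | 1.0 | NaN | NaN | 0.948683 | NaN | 1.0 |
| Python | 0.96 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 |
| Forecast | 1.0 | NaN | 1.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 |
| R Prog | 1.0 | NaN | NaN | 0.948683 | 1.0 | 1.0 | 1.0 | NaN | 0.980581 |
| Hadoop | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN |
| Regression | 0.980581 | NaN | NaN | 1.0 | 1.0 | 1.0 | 0.980581 | NaN | 1.0 |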


## Solution 14.5.g
Apply item-based collaborative filtering to this dataset (using Python) and based on the results, recommend a course to E.N. 

In [16]:
# convert the rating_df dataframe into a format suitable for the Surprise package
ratings = []
for customer, row in rating_df.iterrows():
    for course, value in row.items():
        if np.isnan(value): continue
        ratings.append([customer, course, value])
ratings = pd.DataFrame(ratings, columns=['customer', 'course', 'rating'])

reader = Reader(rating_scale=(1, 4))
data = Dataset.load_from_df(ratings, reader)
trainset = data.build_full_trainset()
# compute cosine similarities between items
sim_options = {'name': 'cosine', 'user_based': False}  
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)

courses = rating_df.columns
for course in courses: 
    print(course, algo.predict('EN', course).est)

Computing the cosine similarity matrix...
Done computing similarity matrix.
SQL 3.7504416393899813
Spatial 4
PA1 3.433333333333333
DM in R 3.743416490252569
Python 3.6621621621621623
Forecast 3.6666666666666665
R Prog 3.7504416393899813
Hadoop 3.433333333333333
Regression 3.747548783981962


The item-based collaborative filter gives the **Spatial** course the highest predicted rating for E.N. (4), so it recommends **Spatial** to E.N.
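To see where these numbers come from: in item-based mode, KNNBasic (per the Surprise documentation) estimates a rating as the similarity-weighted average of the user's own ratings on the k most similar items the user has rated, counting only positive similarities (k = 40 by default, so here all of E.N.'s courses qualify). The sketch below recomputes three of E.N.'s predictions by hand from the item-item cosine similarities in the table of 14.5.f.ii, treating course pairs that no student rated together as similarity 0. It is a simplified re-implementation for illustration, not Surprise's code, and `predict_item_based` is a hypothetical name.

```python
# E.N.'s own ratings, from the rating table
en_ratings = {"SQL": 4, "DM in R": 4, "R Prog": 4, "Regression": 3}

# Item-item cosine similarities to each target course, from the table in
# 14.5.f.ii; pairs with no co-rating students are absent (similarity 0)
item_sim = {
    "Spatial":  {"SQL": 0.990375},
    "Python":   {"SQL": 0.96, "R Prog": 1.0, "Regression": 1.0},
    "Forecast": {"SQL": 1.0, "R Prog": 1.0, "Regression": 1.0},
}

def predict_item_based(user_ratings, sims_to_target):
    """Similarity-weighted average of the user's ratings on neighboring items."""
    num = den = 0.0
    for item, rating in user_ratings.items():
        sim = sims_to_target.get(item, 0.0)
        if sim > 0:                        # only positively similar items contribute
            num += sim * rating
            den += sim
    return num / den if den > 0 else None  # None: no usable neighbors

for course, sims in item_sim.items():
    print(course, predict_item_based(en_ratings, sims))
```

This gives 4 for Spatial (its only positively similar course among those E.N. rated is SQL, which E.N. rated 4), 11/3 ≈ 3.667 for Forecast, and about 3.662 for Python, matching the Surprise output above. Spatial's estimate simply equals E.N.'s SQL rating, which is why it comes out on top.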