# Content-Based Recommender Systems Pt. 1

A content-based recommender system recommends items based on their characteristics and on how similar they are to items a user has already rated, purchased, or liked.

In this notebook, we'll look at a data set containing the attributes of different scotches.

In [1]:
# import packages we'll be using
import pandas as pd
from scipy.spatial.distance import cdist

In [4]:
# read the data into a DataFrame
scotch_data = pd.read_csv('../data/scotch.csv')
scotch_data.set_index('Distillery', inplace=True)
scotch_data.head()

Distillery,RowID,Body,Sweetness,Smoky,Medicinal,Tobacco,Honey,Spicy,Winey,Nutty,Malty,Fruity,Floral,Postcode,Latitude,Longitude
Aberfeldy,1,2,2,2,0,0,2,1,2,2,2,2,2,\tPH15 2EB,286580,749680
Aberlour,2,3,3,1,0,0,4,3,2,2,3,3,2,\tAB38 9PJ,326340,842570
AnCnoc,3,1,3,2,0,0,2,0,0,2,2,3,2,\tAB5 5LI,352960,839320
Ardbeg,4,4,1,4,4,0,0,2,0,1,2,1,0,\tPA42 7EB,141560,646220
Ardmore,5,2,2,2,0,0,1,1,1,2,3,1,1,\tAB54 4NH,355350,829140


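Now we need to determine which item properties will be used to calculate similarity. Here we keep the twelve flavor attributes and leave out the identifier and location columns (RowID, Postcode, Latitude, Longitude).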

In [5]:
properties = ['Body',
              'Sweetness',
              'Smoky',
              'Medicinal',
              'Tobacco',
              'Honey',
              'Spicy',
              'Winey',
              'Nutty',
              'Malty',
              'Fruity',
              'Floral']

scotch_props = scotch_data[properties]

In [6]:
scotch_props.head()

Distillery,Body,Sweetness,Smoky,Medicinal,Tobacco,Honey,Spicy,Winey,Nutty,Malty,Fruity,Floral
Aberfeldy,2,2,2,0,0,2,1,2,2,2,2,2
Aberlour,3,3,1,0,0,4,3,2,2,3,3,2
AnCnoc,1,3,2,0,0,2,0,0,2,2,3,2
Ardbeg,4,1,4,4,0,0,2,0,1,2,1,0
Ardmore,2,2,2,0,0,1,1,1,2,3,1,1


Content recommenders work by finding items that are similar to items the user likes. So first, we need to determine which scotches the user likes and then summarize their preferences as a single profile. One way to do this is to average the properties of the scotches they've liked.

In [7]:
user_scotches = ['Macallan',
                 'Glenlivet']

# average the liked scotches' properties into a single preference vector
user_preferences = scotch_props[scotch_props.index.isin(user_scotches)].mean(axis=0)

user_preferences

Body         3.0
Sweetness    3.0
Smoky        1.0
Medicinal    0.0
Tobacco      0.0
Honey        2.0
Spicy        1.5
Winey        3.0
Nutty        1.5
Malty        2.0
Fruity       2.5
Floral       2.0
dtype: float64
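
One thing to watch for: the `isin` filter above silently ignores any name that is not in the index, so a misspelled distillery would simply drop out of the average. A minimal sketch of a stricter version is shown below. The helper name `build_user_profile` and the toy data are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def build_user_profile(item_props, liked_items):
    """Average the property rows of the liked items (hypothetical helper)."""
    liked = pd.Index(liked_items)
    # names the user supplied that do not appear in the data
    missing = liked.difference(item_props.index)
    if len(missing) > 0:
        raise KeyError(f"Items not found in the data: {list(missing)}")
    return item_props.loc[liked].mean(axis=0)


# Toy data, invented for illustration (not taken from scotch.csv)
toy_props = pd.DataFrame(
    {"Body": [2, 4, 3], "Smoky": [1, 4, 2]},
    index=["A", "B", "C"],
)

profile = build_user_profile(toy_props, ["A", "C"])
print(profile)
```

Raising an error is only one possible design choice; depending on the application, logging a warning and continuing with the names that were found may be preferable.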

We now need to calculate the similarity (or dissimilarity) between each scotch in the data set and the user's preferences.
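
For reference, with the Euclidean metric the dissimilarity between a scotch's property vector $x$ and the user's preference vector $u$, taken over the $n = 12$ selected properties, is:

$$
d(x, u) = \sqrt{\sum_{i=1}^{n} (x_i - u_i)^2}
$$

A smaller $d(x, u)$ means the scotch is closer to what the user likes.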

In [9]:
metric = 'euclidean' # what other distance metrics can we use?
distances = cdist(scotch_props, user_preferences.to_frame().T, metric=metric).squeeze()
distances = pd.Series(data=distances,
                      index=scotch_props.index)

distances = distances[~distances.index.isin(user_scotches)] # remove scotches user likes
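
As one possible answer to the question in the comment above, `cdist` accepts other metric names such as `'cityblock'` (Manhattan) and `'cosine'`. The sketch below compares three metrics on toy vectors that were invented for illustration; they are not taken from the scotch data.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Toy property vectors, invented for illustration
items = pd.DataFrame(
    {"Body": [2, 4, 1], "Smoky": [1, 4, 2]},
    index=["A", "B", "C"],
)
# A single hypothetical user preference vector (one row, same columns)
user = pd.DataFrame({"Body": [3.0], "Smoky": [1.0]})

results = {}
for metric in ["euclidean", "cityblock", "cosine"]:
    # cdist returns a (3, 1) array; ravel flattens it to length 3
    d = cdist(items.to_numpy(dtype=float),
              user.to_numpy(dtype=float),
              metric=metric).ravel()
    results[metric] = pd.Series(d, index=items.index)

# one column per metric, one row per item
comparison = pd.DataFrame(results)
print(comparison)
```

On this toy data, Euclidean and Manhattan distance both rank B as the least similar item, while cosine distance ranks C last, because cosine compares the direction of the vectors and ignores their magnitude. Which behavior is preferable depends on whether the overall intensity of the ratings should matter.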

Now we need to find the scotches that are most similar to the user's preferences, i.e. the ones with the smallest distances.

In [10]:
distances.sort_values()[:5]

Distillery
Glendullan        1.658312
Auchroisk         1.936492
Aberfeldy         2.179449
RoyalLochnagar    2.179449
Glenfarclas       2.179449
dtype: float64
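
As a sanity check, the distance reported for Aberfeldy can be reproduced by hand. The sketch below copies Aberfeldy's row from the `scotch_props.head()` output and the averaged values from the `user_preferences` output, then applies the Euclidean distance directly with NumPy.

```python
import numpy as np

# Aberfeldy's properties, copied from the scotch_props.head() output
aberfeldy = np.array([2, 2, 2, 0, 0, 2, 1, 2, 2, 2, 2, 2], dtype=float)
# Averaged user preferences, copied from the user_preferences output
user = np.array([3.0, 3.0, 1.0, 0.0, 0.0, 2.0, 1.5, 3.0, 1.5, 2.0, 2.5, 2.0])

# square each per-property difference, sum, then take the square root
squared_diffs = (aberfeldy - user) ** 2
distance = np.sqrt(squared_diffs.sum())

print(squared_diffs.sum())  # 4.75
print(round(distance, 6))   # 2.179449
```

The result matches the value shown for Aberfeldy in the sorted output above.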

Let's create a function that takes the scotches a user likes and returns recommended scotches.

In [11]:
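def scotch_recommender(user_scotches, n_recs=5):
    """Return the n_recs scotches closest to the user's average preferences."""
    # average the properties of the scotches the user likes
    user_preferences = (scotch_props.loc[scotch_props.index.isin(user_scotches)]
                                    .mean(axis=0))
    # Euclidean distance (cdist's default) from every scotch to that average
    distances = cdist(scotch_props, user_preferences.to_frame().T).squeeze()
    distances = pd.Series(data=distances,
                          index=scotch_props.index)
    # remove scotches the user already likes
    distances = distances[~distances.index.isin(user_scotches)]

    return distances.sort_values()[:n_recs].index.tolist()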

In [13]:
scotch_recommender(['Glenlivet', 'Glenfarclas'], n_recs=3)

['Glenturret', 'BlairAthol', 'Auchroisk']
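
The function above reads `scotch_props` from the notebook's global scope and always uses the default Euclidean metric. A minimal sketch of a more reusable variant is shown below, assuming we want to pass the property table and the metric in as arguments. The name `recommend` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd
from scipy.spatial.distance import cdist


def recommend(item_props, liked_items, n_recs=5, metric="euclidean"):
    """Return the n_recs items closest to the mean profile of liked_items."""
    liked_mask = item_props.index.isin(liked_items)
    # average the liked items' properties into one preference vector
    profile = item_props.loc[liked_mask].mean(axis=0)
    # ravel() always yields a 1-D array, even if item_props has a single row
    d = cdist(item_props.to_numpy(dtype=float),
              profile.to_numpy(dtype=float).reshape(1, -1),
              metric=metric).ravel()
    distances = pd.Series(d, index=item_props.index)
    # drop items the user already likes, then keep the closest n_recs
    return distances[~liked_mask].sort_values()[:n_recs].index.tolist()


# Toy data, invented for illustration (not taken from scotch.csv)
toy_props = pd.DataFrame(
    {"Body": [2, 4, 3, 0], "Smoky": [1, 4, 2, 0]},
    index=["A", "B", "C", "D"],
)

recs = recommend(toy_props, ["A"], n_recs=2)
print(recs)
```

Passing the data in explicitly makes the function easier to test on small tables like this one and to reuse with a different item catalog.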