## Building a Recommendation Engine using Collaborative Filtering

This exercise is based on the Real Python tutorial: https://realpython.com/build-recommendation-engine-collaborative-filtering/

### Memory-based

Memory-based algorithms apply statistical techniques directly to the stored ratings data to compute predictions, rather than training a separate model.

#### 1. Find similar users on the basis of ratings

In [2]:
from scipy import spatial
a = [1, 2]
b = [2, 4]
c = [2.5, 4]
d = [4.5, 5]

# the Euclidean distance between two rating vectors
# is used to assess how similar the ratings are
print(spatial.distance.euclidean(c,a))
print(spatial.distance.euclidean(c,b))
print(spatial.distance.euclidean(c,d))

2.5
0.5
2.23606797749979


In [5]:
# spatial.distance.cosine returns the cosine distance (1 - cosine similarity),
# which gives another perspective on similarity:
# the lower the value, the greater the similarity
print(spatial.distance.cosine(c,a))
print(spatial.distance.cosine(c,b))
print(spatial.distance.cosine(c,d))
print(spatial.distance.cosine(a,b))

0.004504527406047898
0.004504527406047898
0.015137225946083022
0.0


More information about cosine similarity: https://en.wikipedia.org/wiki/Cosine_similarity

<b>Note</b>: Euclidean distance and cosine similarity are two of several approaches for finding users similar to one another, and even items similar to one another.
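
To make the cosine values above easier to interpret, here is a minimal sketch that recomputes them with the standard library only. The helper names `cosine_similarity` and `cosine_distance` are illustrative and not part of scipy; the sketch assumes that the value printed by `spatial.distance.cosine` is 1 minus the cosine of the angle between the two vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """1 - cosine similarity: 0 means the vectors point the same way."""
    return 1.0 - cosine_similarity(u, v)

a = [1, 2]
b = [2, 4]
c = [2.5, 4]
d = [4.5, 5]

print(cosine_distance(c, a))  # roughly 0.0045
print(cosine_distance(c, b))  # same as above, since b = 2 * a
print(cosine_distance(c, d))  # roughly 0.0151
print(cosine_distance(a, b))  # zero up to floating-point rounding
```

Because the cosine depends only on direction, a and b are at distance 0 from each other even though their Euclidean distance is not 0. This is why the two measures can rank the same users differently.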

#### 2. Calculate the ratings

Once you have found a set of users similar to a user <b>U</b>, the next step is to calculate the rating <b>R</b> that <b>U</b> would give to a certain item <b>I</b>.

One way is to average the ratings given by the top n users most similar to <b>U</b>. Another is to use a weighted average, because some of these top n users will be much more similar to <b>U</b> than the rest and their ratings should count for more. In this second approach, the weight (or similarity factor) is the inverse of the distance discussed in the previous step: if d is the distance between points A and B, then 1/d is the similarity factor, so a user at distance 10 gets a smaller weight than a user at distance 3 (1/10 < 1/3). In other words, less distance means higher similarity.
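
The two approaches can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function `predict_rating`, the ratings assigned to item I, and the small `eps` added to each distance (to avoid dividing by zero when two users are identical) are all hypothetical. The rating vectors are the points a, b, c and d from step 1, with c playing the role of <b>U</b>.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def predict_rating(target, neighbours, n=2, eps=1e-9):
    """Return (simple_average, weighted_average) over the n nearest users.

    neighbours is a list of (rating_vector, rating_for_item_I) pairs.
    eps is an assumption of this sketch: it keeps 1/d finite when d == 0.
    """
    # Rank the other users by their distance to the target user
    ranked = sorted(
        ((euclidean(target, vec), rating) for vec, rating in neighbours),
        key=lambda pair: pair[0],
    )
    top = ranked[:n]

    # Approach 1: plain average of the top n ratings
    simple = sum(rating for _, rating in top) / len(top)

    # Approach 2: weight each rating by the inverse of the distance
    weights = [1.0 / (dist + eps) for dist, _ in top]
    weighted = sum(w * rating for w, (_, rating) in zip(weights, top)) / sum(weights)
    return simple, weighted

# Hypothetical ratings that users a, b and d gave to item I
neighbours = [
    ([1, 2], 3.0),    # user a
    ([2, 4], 5.0),    # user b
    ([4.5, 5], 4.0),  # user d
]
u = [2.5, 4]  # user c from step 1, acting as U

simple, weighted = predict_rating(u, neighbours, n=2)
print(simple)
print(weighted)
```

With these made-up ratings, the two nearest users are b and d. The simple average treats them equally, while the weighted average is pulled toward b's rating because b is much closer to U.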