## Introduction to Data Science

### Data Science Tasks: Recommender Systems

Based on [this](https://www.datacamp.com/community/tutorials/recommender-systems-python), [this](https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/) and [this](http://www.data-mania.com/blog/recommendation-system-python/) blog post.

In [1]:
import os
import sys
import re
import math
import time
import string
import datetime

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.neighbors import NearestNeighbors

Specify the path to the output files:

In [11]:
outputs = "../outputs/"


There are three main classes of recommendation systems:

+ Collaborative filtering systems – Collaborative systems generate recommendations from crowd-sourced input. They recommend items based on user behavior and on similarities between users. (An example is Google PageRank, which recommends web pages based on their backlinks.)
+ Content-based filtering systems – Content-based systems generate recommendations based on the items themselves and the similarities between them. (Pandora uses content-based filtering to make its music recommendations.)
+ Hybrid recommendation systems – Hybrid systems combine the collaborative and content-based approaches. They help improve recommendations derived from sparse datasets. (Netflix is a prime example of a hybrid recommender.)

Collaborative systems often use a nearest-neighbor method or an item-based collaborative filtering system, a simple system that makes recommendations based on simple regression or a weighted-sum approach. The end goal of collaborative systems is to make recommendations based on customers' behavior, purchasing patterns, and preferences, as well as product attributes, price ranges, and product categories. Content-based systems can use methods as simple as averaging, or more advanced machine learning approaches such as Naive Bayes classifiers, clustering algorithms, or artificial neural networks.
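
To make the weighted-sum idea concrete, here is a minimal sketch of item-based collaborative filtering. The ratings matrix, the function names `item_similarity` and `predict_rating`, and the choice of cosine similarity are assumptions made for illustration; they are not taken from the blog posts above. Unrated entries are stored as 0 and treated as zeros when computing similarity, which is a simplification.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def item_similarity(r):
    """Cosine similarity between the item columns of r."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody rated
    normalized = r / norms
    return normalized.T @ normalized


def predict_rating(r, sim, user, item):
    """Weighted sum of the user's known ratings, weighted by item similarity."""
    rated = r[user] > 0
    rated[item] = False  # never use the target item to predict itself
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0  # no similar rated items, so no basis for a prediction
    return float(weights @ r[user, rated] / weights.sum())


sim = item_similarity(ratings)
pred_like = predict_rating(ratings, sim, user=0, item=2)
print(round(pred_like, 2))
```

In this toy data, user 0 rated items 0 and 1 highly and item 3 poorly. Item 2 is most similar to item 3, so the predicted rating for item 2 is pulled toward the low rating.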

#### First example: a quick and dirty similarity system

First, let's create a dataset called X with 6 records of 2 features each.

In [4]:
X = np.array([[-1, 2], [4, -4], [-2, 1], [-1, 3], [-3, 2], [-1, 4]])
print(X)

[[-1  2]
 [ 4 -4]
 [-2  1]
 [-1  3]
 [-3  2]
 [-1  4]]


Next, we instantiate a NearestNeighbors object that returns 3 neighbors per query, fit it to dataset X, and call the result nbrs.

In [10]:
nbrs = NearestNeighbors(n_neighbors=3, algorithm='ball_tree').fit(X)

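Let's find the k nearest neighbors of each point in X. To do that, we call the kneighbors() method of nbrs and pass it X. It returns two arrays: the distances from each point to its nearest neighbors, and the row indices of those neighbors in X.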

In [None]:
distances, indices = nbrs.kneighbors(X)

In [6]:
print(indices)

[[0 3 2]
 [1 2 0]
 [2 4 0]
 [3 5 0]
 [4 2 0]
 [5 3 0]]


In [7]:
print(distances)

[[0.         1.         1.41421356]
 [0.         7.81024968 7.81024968]
 [0.         1.41421356 1.41421356]
 [0.         1.         1.        ]
 [0.         1.41421356 2.        ]
 [0.         1.         2.        ]]
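
As a sanity check, the distances above can be reproduced with NumPy alone. NearestNeighbors uses the Euclidean distance by default (Minkowski with p=2), so a brute-force computation over all pairs should give the same values. The variable names below are illustrative. Because the query points are the fitted points themselves, the first column is always the point itself at distance 0; dropping it leaves only the true neighbors. When two neighbors are equally far away, the order of their indices may differ from what scikit-learn returns.

```python
import numpy as np

X = np.array([[-1, 2], [4, -4], [-2, 1], [-1, 3], [-3, 2], [-1, 4]])

# Pairwise Euclidean distances: pairwise[i, j] is the distance from X[i] to X[j].
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))

# Keep the k smallest distances per row, along with their indices.
k = 3
order = np.argsort(pairwise, axis=1, kind='stable')[:, :k]
brute_distances = np.take_along_axis(pairwise, order, axis=1)

# Column 0 is each point matched with itself; drop it to keep real neighbors only.
neighbors_without_self = order[:, 1:]

print(brute_distances)
print(neighbors_without_self)
```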


Imagine you have a new incoming data point with the values -2 and 4. To search X for the most similar records, all you need to do is call the kneighbors() method of nbrs on the new incoming data point:

In [9]:
dist, idx = nbrs.kneighbors([[-2, 4]])
print('The closest are {}'.format(idx))
print('The distances are {}'.format(dist))

The closest are [[5 3 0]]
The distances are [[1.         1.41421356 2.23606798]]


The results list the three records in X closest to the new data point, ordered by increasing distance: the records at indices 5, 3 and 0. The most similar one is at index 5, which is the last record in X, [-1, 4], at a distance of 1. A quick glance confirms it: [-1, 4] differs from the new data point [-2, 4] in only one coordinate, and by only one unit.

In this way, you can use nearest-neighbor search to quickly find the existing records most similar to a new data point and then make recommendations based on that similarity.
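
As a rough sketch of that last step, the neighbor indices can be mapped back to whatever each record represents. The `item_names` labels and the `recommend` helper below are hypothetical and only illustrate the lookup; the original data has no such labels.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, 2], [4, -4], [-2, 1], [-1, 3], [-3, 2], [-1, 4]])

# Hypothetical labels, one per record in X (not part of the original data).
item_names = np.array(['item_a', 'item_b', 'item_c', 'item_d', 'item_e', 'item_f'])

nbrs = NearestNeighbors(n_neighbors=3, algorithm='ball_tree').fit(X)


def recommend(query, k=2):
    """Return (name, distance) pairs for the k records closest to `query`."""
    dist, idx = nbrs.kneighbors([query], n_neighbors=k)
    return list(zip(item_names[idx[0]].tolist(), dist[0].tolist()))


recs = recommend([-2, 4])
for name, distance in recs:
    print('{}: {:.3f}'.format(name, distance))
```

Passing `n_neighbors` to kneighbors() overrides the value given to the constructor for that call, so the same fitted object can serve requests for different numbers of recommendations.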