# Simple recommender system 1

In this example we recommend to data science users which topic they might be interested in next.


Objective: create a general recommender based on what is popular.
The example is taken from the book [Data Science from Scratch](https://github.com/joelgrus/data-science-from-scratch).

## Libraries

In [12]:
from collections import Counter

## Data 

In [13]:
users_interests = [
    ["Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"],
    ["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"],
    ["Python", "scikit-learn", "scipy", "numpy", "statsmodels", "pandas"],
    ["R", "Python", "statistics", "regression", "probability"],
    ["machine learning", "regression", "decision trees", "libsvm"],
    ["Python", "R", "Java", "C++", "Haskell", "programming languages"],
    ["statistics", "probability", "mathematics", "theory"],
    ["machine learning", "scikit-learn", "Mahout", "neural networks"],
    ["neural networks", "deep learning", "Big Data", "artificial intelligence"],
    ["Hadoop", "Java", "MapReduce", "Big Data"],
    ["statistics", "R", "statsmodels"],
    ["C++", "deep learning", "artificial intelligence", "probability"],
    ["pandas", "R", "Python"],
    ["databases", "HBase", "Postgres", "MySQL", "MongoDB"],
    ["libsvm", "regression", "support vector machines"]
]

## Most popular

One easy approach is to recommend what is popular: we suggest to each user the most popular interests that he or she does not have yet. The first step is to count how often each interest is mentioned across all users.

In [14]:
popular_interests = Counter(interest
                            for user_interests in users_interests
                            for interest in user_interests).most_common()

In [15]:
print(popular_interests)

[('Python', 4), ('R', 4), ('Big Data', 3), ('HBase', 3), ('Java', 3), ('statistics', 3), ('regression', 3), ('probability', 3), ('Hadoop', 2), ('Cassandra', 2), ('MongoDB', 2), ('Postgres', 2), ('scikit-learn', 2), ('statsmodels', 2), ('pandas', 2), ('machine learning', 2), ('libsvm', 2), ('C++', 2), ('neural networks', 2), ('deep learning', 2), ('artificial intelligence', 2), ('Spark', 1), ('Storm', 1), ('NoSQL', 1), ('scipy', 1), ('numpy', 1), ('decision trees', 1), ('Haskell', 1), ('programming languages', 1), ('mathematics', 1), ('theory', 1), ('Mahout', 1), ('MapReduce', 1), ('databases', 1), ('MySQL', 1), ('support vector machines', 1)]
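The ordering of ties in the output above (for example `'Python'` before `'R'`) follows from how `Counter.most_common()` behaves: in Python 3.7 and later, elements with equal counts keep the order in which they were first encountered. A minimal sketch on toy data (the names `toy_interests`, `toy_counts` and `toy_ranked` are made up for illustration and are not part of the notebook):

```python
from collections import Counter

# Hypothetical toy data, much smaller than users_interests
toy_interests = [
    ["Python", "R"],
    ["R", "Python", "Java"],
    ["Spark"],
]

# Flatten all interest lists and count each mention
toy_counts = Counter(interest
                     for user_interests in toy_interests
                     for interest in user_interests)

# Sort by descending count; ties keep first-encountered order
toy_ranked = toy_counts.most_common()
print(toy_ranked)
# [('Python', 2), ('R', 2), ('Java', 1), ('Spark', 1)]
```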


## Recommender function

Now we define a function that returns up to 5 suggestions (the default value of `max_results`) that are popular and not yet among the user's interests.

In [16]:
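def most_popular_new_interests(user_interests, max_results=5):
    """Return up to max_results (interest, frequency) pairs the user does not have yet."""
    # popular_interests is already sorted by descending frequency
    suggestions = [(interest, frequency)
                   for interest, frequency in popular_interests
                   if interest not in user_interests]
    return suggestions[:max_results]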
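The function above reads the global `popular_interests`. As a possible variation, the ranked list can be passed in explicitly, which makes the function easier to test in isolation. The sketch below is one way to do that, not the book's version; `recommend_popular`, `ranked_interests`, `sample_ranked` and `sample_result` are hypothetical names. It also uses a set for membership tests and stops as soon as enough suggestions are found.

```python
from collections import Counter
from itertools import islice


def recommend_popular(user_interests, ranked_interests, max_results=5):
    """Return up to max_results (interest, frequency) pairs not in user_interests.

    ranked_interests is assumed to be sorted by descending frequency,
    e.g. the output of Counter.most_common().
    """
    known = set(user_interests)  # set membership is O(1) on average
    candidates = ((interest, frequency)
                  for interest, frequency in ranked_interests
                  if interest not in known)
    # islice stops consuming the generator after max_results items
    return list(islice(candidates, max_results))


# Hypothetical toy ranking: Python and R appear twice, Java once
sample_ranked = Counter(["Python", "Python", "R", "R", "Java"]).most_common()
sample_result = recommend_popular(["Python"], sample_ranked, max_results=2)
print(sample_result)
# [('R', 2), ('Java', 1)]
```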

## Evaluation

Let's test the function with an imaginary user.

In [17]:
print(most_popular_new_interests(["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"]))

[('Python', 4), ('R', 4), ('Big Data', 3), ('Java', 3), ('statistics', 3)]


In [18]:
print(most_popular_new_interests(["R", "Python", "statistics", "regression", "probability"]))

[('Big Data', 3), ('HBase', 3), ('Java', 3), ('Hadoop', 2), ('Cassandra', 2)]
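Beyond eyeballing individual outputs, two properties of a popularity recommender can be checked mechanically: no suggestion should already be among the user's interests, and suggestions should come back in descending order of frequency. The sketch below checks both on hypothetical toy data; `toy_users`, `toy_popular` and `toy_recommend` are stand-ins for the notebook's `users_interests`, `popular_interests` and `most_popular_new_interests`.

```python
from collections import Counter

# Hypothetical toy data, not the notebook's users_interests
toy_users = [
    ["Python", "R", "pandas"],
    ["Python", "Java"],
    ["R", "statistics"],
]
toy_popular = Counter(interest
                      for user in toy_users
                      for interest in user).most_common()


def toy_recommend(user_interests, max_results=5):
    """Same logic as the notebook's recommender, applied to the toy ranking."""
    return [(interest, frequency)
            for interest, frequency in toy_popular
            if interest not in user_interests][:max_results]


checks_passed = True
for user in toy_users:
    recs = toy_recommend(user, max_results=2)
    frequencies = [frequency for _, frequency in recs]
    # Property 1: nothing the user already has is suggested
    no_overlap = all(interest not in user for interest, _ in recs)
    # Property 2: suggestions are ordered by descending frequency
    sorted_desc = frequencies == sorted(frequencies, reverse=True)
    checks_passed = checks_passed and no_overlap and sorted_desc

print(checks_passed)
# True
```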
