# Cosine Similarity and Recommendation Engine
### Josue Antonio
### DS2500: Intermediate Programming with Data

Implement a recommendation algorithm that suggests new interests for each user (A, B, C, ...). For each user, the algorithm should use cosine similarity to find the most similar other user who has at least one interest the first user lacks, and then recommend those missing interests. You may use any code we developed in class.
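
Because each user's interests form a set, the corresponding word vector contains only 0s and 1s. In that case the dot product equals the number of shared interests and each magnitude equals the square root of the set size, so cosine similarity can be computed directly from the sets. The sketch below illustrates this shortcut; `set_cosine` is a hypothetical helper name that does not appear in the class code, and returning 0 for an empty set is an assumption.

```python
def set_cosine(a, b):
    """Cosine similarity of two sets, treating each as a 0/1 vector."""
    if not a or not b:
        return 0.0  # assumption: similarity with an empty set is defined as 0
    # dot product = number of shared items; magnitude = sqrt(set size)
    return len(a & b) / (len(a) * len(b)) ** 0.5


a = {"Hadoop", "Java", "Big Data"}
b = {"Java", "Big Data", "MapReduce", "Spark"}
print(set_cosine(a, b))  # 2 shared interests / sqrt(3 * 4)
```

This should give the same value as vectorizing both sets over a shared vocabulary and applying the usual formula.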

In [1]:
# Convert a set into a list (if needed)

s = {1,2,3}
print(list(s))

# Declare an empty set (if needed); note that {} would create an empty dict
s = set()
s

[1, 2, 3]


set()

In [2]:
# A dictionary of users and their SET of interests

data = {
    "A": {"Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"},
    "B": {"NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"},
    "C": {"Python", "scikit-learn", "scipy", "numpy", "statsmodels", "pandas"},
    "D": {"R", "Python", "statistics", "regression", "probability"},
    "E": {"machine learning", "regression", "decision trees", "libsvm"},
    "F": {"Python", "R", "Java", "C++", "Haskell", "programming languages"},
    "G": {"statistics", "probability", "mathematics", "theory"},
    "H": {"machine learning", "scikit-learn", "Mahout", "neural networks"},
    "I": {"neural networks", "deep learning", "Big Data", "artificial intelligence"},
    "J": {"Hadoop", "Java", "MapReduce", "Big Data"},
    "K": {"statistics", "R", "statsmodels"},
    "L": {"C++", "deep learning", "artificial intelligence", "probability"},
    "M": {"pandas", "R", "Python"},
    "N": {"databases", "HBase", "Postgres", "MySQL", "MongoDB"},
    "O": {"libsvm", "regression", "support vector machines"}
}

In [3]:
# Convert a collection of words (here, a set of interests) to a binary word vector (code from class)

def vectorize(words, unique):
    return [1 if word in words else 0
              for word in unique]

# Vector functions

def mag(v):
    """ magnitude of a vector """
    return sum([i **2 for i in v]) ** 0.5


def dot(u,v):
    """ dot product of two vectors """
    return sum([ui * vi for ui, vi in zip(u,v)])
    

def cosine_similarity(u, v):
    """ cosine of the angle between two vectors (0 if either has zero magnitude) """
    denom = mag(u) * mag(v)
    return dot(u, v) / denom if denom else 0.0
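
As a small worked example of the vectorization step, the sketch below builds word vectors for two made-up interest sets. The vocabulary is sorted here only to make the vector positions reproducible; that is an assumption for illustration, since any order works as long as both vectors use the same one. `vectorize` is repeated so the snippet runs on its own.

```python
def vectorize(words, unique):
    """Return a 0/1 vector with one position per word in `unique`."""
    return [1 if word in words else 0 for word in unique]


# Sorted vocabulary: ['Hadoop', 'Java', 'MapReduce', 'Spark']
unique = sorted({"Java", "Spark", "Hadoop", "MapReduce"})

u = vectorize({"Hadoop", "Java", "Spark"}, unique)
v = vectorize({"Hadoop", "Java", "MapReduce"}, unique)
print(u)  # [1, 1, 0, 1]
print(v)  # [1, 1, 1, 0]

# The dot product counts the interests the two users share.
shared = sum(ui * vi for ui, vi in zip(u, v))
print(shared)  # 2
```

Each vector has magnitude sqrt(3), so the cosine similarity of these two users is 2/3.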

In [16]:
# Exercise: Print recommendations for every user in data

def recommend(data):
    
    # find unique interests across all users
    unique = {interest for interests in data.values() for interest in interests}
    
    # For each user print suggestions
    for user in data: 
        best_suggestions = set()  # {} would create a dict, not a set
        most_sim = 0
        user_vec = vectorize(data[user], unique)
        for other_user in data:
            # interests the other user has that this user lacks
            new_interests = data[other_user] - data[user]
            if other_user != user and new_interests:
                cos_sim = cosine_similarity(user_vec, vectorize(data[other_user], unique))
                if cos_sim > most_sim:
                    most_sim = cos_sim
                    best_suggestions = new_interests
        print(user, best_suggestions)
        
recommend(data)

A {'MapReduce'}
B {'databases', 'MySQL'}
C {'R'}
D {'statsmodels'}
E {'support vector machines'}
F {'pandas'}
G {'regression', 'R', 'Python'}
H {'regression', 'decision trees', 'libsvm'}
I {'C++', 'probability'}
J {'Cassandra', 'Spark', 'Storm', 'HBase'}
K {'regression', 'probability', 'Python'}
L {'neural networks', 'Big Data'}
M {'regression', 'statistics', 'probability'}
N {'Cassandra', 'NoSQL'}
O {'decision trees', 'machine learning'}
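
If the suggestions need to be reused rather than only printed, one option is to return them as a dictionary. The sketch below is a variation under stated assumptions, not the class solution: `recommend_dict` is a hypothetical name, each user's vector is computed once up front, and only three users from the data above are included to keep the example short.

```python
def recommend_dict(data):
    """Return {user: set of suggested interests} instead of printing."""
    unique = sorted(set().union(*data.values()))
    # Vectorize every user once rather than inside the nested loop.
    vecs = {u: [1 if w in s else 0 for w in unique] for u, s in data.items()}

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
        return dot / norm if norm else 0.0

    result = {}
    for user, interests in data.items():
        best, best_sim = set(), 0
        for other, other_interests in data.items():
            new = other_interests - interests  # what `other` could suggest
            if other != user and new:
                sim = cos(vecs[user], vecs[other])
                if sim > best_sim:
                    best, best_sim = new, sim
        result[user] = best
    return result


sample = {
    "A": {"Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"},
    "B": {"NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"},
    "J": {"Hadoop", "Java", "MapReduce", "Big Data"},
}
recs = recommend_dict(sample)
print(recs["A"])  # {'MapReduce'}
```

Returning a dictionary keeps the computation separate from the display, so the caller can print, filter, or test the results as needed.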
