# Content-Based Recommender System
Written by: Niko Escanilla

- This demo builds a content-based recommender system that suggests NFL articles whose titles are most similar to a user's text input.

# Data

- Data was extracted with the BeautifulSoup package in Python.
- Variables: id, article title, link to article
- Number of articles: 1421

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# read in dataset
df = pd.read_csv("nfl_article_title_data.csv")
df.head(10)

Unnamed: 0,ID,Article Title,Link
0,1,Uber driver: Glad to see discipline for Winston,http://www.espn.com/nfl/story/_/id/23948463/ub...
1,2,Eagles' Bradham banned 1 game for '16 assault,http://www.espn.com/nfl/story/_/id/23948484/ni...
2,3,Colts RB Turbin suspended for PED violation,http://www.espn.com/nfl/story/_/id/23947740/in...
3,4,McMahon expects to spend $500M on XFL,http://www.espn.com/nfl/story/_/id/23947732/vi...
4,5,Buccaneers release starting RG Sweezy,http://www.espn.com/nfl/story/_/id/23947723/ta...
5,6,NFL warns of training camp concussions,http://www.espn.com/nfl/story/_/id/23947051/nf...
6,7,Giants RB Barkley: I'll invest like Beast Mode,http://www.espn.com/nfl/story/_/id/23939975/ne...
7,8,Bucs QB Winston suspended 3 games by NFL,http://www.espn.com/nfl/story/_/id/23936785/ta...
8,9,"Complaint: Jenkins' brother, victim had dispute",http://www.espn.com/nfl/story/_/id/23938976/br...
9,10,NFL fines Richardson for workplace misconduct,http://www.espn.com/nfl/story/_/id/23936331/nf...


In [3]:
# get input
user_input = "Chicago Bears Khalil Mack trade"
new_input = [df.shape[0]+1, user_input, '']
df.loc[df.shape[0]] = new_input
df.tail()

Unnamed: 0,ID,Article Title,Link
1417,1418,Source: Seattle not planning to trade Thomas,http://www.espn.com/nfl/story/_/id/24547049/se...
1418,1419,"Bengals cut DE Johnson, put QB Barkley on IR",http://www.espn.com/nfl/story/_/id/24547346/ci...
1419,1420,Panthers acquire Robinson from Lions to help OL,http://www.espn.com/nfl/story/_/id/24544622/ca...
1420,1421,"Saints cut slew of vets, including WRs Floyd, ...",http://www.espn.com/nfl/story/_/id/24538571/ne...
1421,1422,Chicago Bears Khalil Mack trade,


In [4]:
# create object to convert a collection of raw text docs to a matrix of TF-IDF features
# min_df = 1 keeps every term; recent scikit-learn versions reject an integer min_df of 0
tf = TfidfVectorizer(analyzer = 'word', ngram_range=(1,3),
                    min_df = 1, stop_words = 'english')
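
To make the effect of `ngram_range=(1,3)` and `stop_words='english'` concrete, here is a minimal sketch that fits a vectorizer on a single title from the preview above and lists the features it learns. The variable names (`titles`, `vectorizer`, `features`) are illustrative and not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# One title borrowed from the dataset preview; the list itself is illustrative.
titles = ["Bucs QB Winston suspended 3 games by NFL"]

vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 3), stop_words="english")
vectorizer.fit(titles)

# Each feature is a 1-, 2-, or 3-word sequence built from the lowercased tokens
# that remain after stop words ("by") and single-character tokens ("3") are dropped.
features = sorted(vectorizer.vocabulary_)
print(features)
```

Because stop words are removed before the n-grams are formed, words that were not adjacent in the title (such as "games" and "nfl") can end up in the same bigram.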

In [5]:
# learn vocab and idf, return term-doc matrix
tfidf_matrix = tf.fit_transform(df['Article Title'])

# compute similarities
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
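
As a rough illustration of what `cos_sim` holds, the sketch below computes the same kind of matrix for three made-up titles. The titles and the names `toy_titles`, `toy_matrix`, and `toy_sim` are assumptions for demonstration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical titles: the first two share words, the third shares none.
toy_titles = [
    "Bears acquire Mack in trade",
    "Mack trade shocks Raiders",
    "Giants sign rookie kicker",
]

toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_titles)

# Entry [i, j] is the cosine of the angle between the TF-IDF vectors of titles i and j.
toy_sim = cosine_similarity(toy_matrix)
print(toy_sim.round(2))
```

The matrix is symmetric, each title has similarity 1 with itself, and titles with no terms in common score 0. This is why the next cell skips the top-ranked entry for each row: it is normally the article itself.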

In [6]:
# save results as a dictionary
results = {}

# go through each row of df
for idx, row in df.iterrows():
    # rank articles by cosine similarity, highest first, and keep the top 19
    similar_indices = cos_sim[idx].argsort(kind='quicksort')[:-20:-1]

    # pair each score with its article ID, then drop the first entry (normally the article itself)
    similar_items = [(cos_sim[idx][i], df['ID'][i]) for i in similar_indices]
    results[row['ID']] = similar_items[1:]
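
The slice `[:-20:-1]` is easy to misread, so here is a small NumPy sketch of the same pattern with a shorter slice. The `scores` array is a made-up similarity row, not data from the notebook.

```python
import numpy as np

# Hypothetical similarity row for the item at position 1 (its self-similarity is 1.0).
scores = np.array([0.10, 1.00, 0.00, 0.37, 0.28])

order = scores.argsort()   # positions sorted by ascending score
top3 = order[:-4:-1]       # walk backwards from the end: the 3 highest scores, best first
neighbours = top3[1:]      # drop the first entry, which is the item itself

print(order.tolist(), top3.tolist(), neighbours.tolist())
```

By the same logic, `[:-20:-1]` yields 19 positions, so each entry in `results` holds up to 18 neighbours once the article itself is removed.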

In [7]:
# return the title of the article that matches the id
def item(id):
    return df.loc[df['ID'] == id]['Article Title'].tolist()[0]

# return the link of the article that matches the id
def itemLink(id):
    return df.loc[df['ID'] == id]['Link'].tolist()[0]


# this function prints the most similar articles
# input: id = id of the article, num = number of similar articles to show
# output: titles, similarity scores, and links of the most similar articles
def recommend(id, num):
    if (num == 0):
        print("You haven't said anything, dawg! I can't recommend anything if you didn't say something.")
    elif (num==1):
        print("Here is " + str(num) + " article related to your input: " + item(id))
    else :
        print("Here are " + str(num) + " articles related to your input: " + item(id))

    print("----------------------------------------------------------")
    records = results[id][:num]
    for record in records:
        print(item(record[1]) + " (score:" + str(record[0]) + ")")
        print("Link: " + str(itemLink(record[1])) + "\n")

In [8]:
# recommend 
recommend(df.shape[0], 5)

Here are 5 articles related to your input: Chicago Bears Khalil Mack trade
----------------------------------------------------------
Barnwell: Answering biggest questions on Khalil Mack trade (score:0.36973961256054655)
Link: http://www.espn.com/nfl/story/_/id/24544109/answering-biggest-questions-khalil-mack-trade-chicago-bears-oakland-raiders

Chicago Bears depth chart (score:0.281174157381211)
Link: http://www.espn.com/nfl/story/_/id/24492615/chicago-bears-depth-chart

Carr: Raiders players over shock of Mack trade (score:0.161985614033348)
Link: http://www.espn.com/nfl/story/_/id/24569314/derek-carr-oakland-raiders-players-shock-khalil-mack-trade

Source: Bears give Mack record deal after trade (score:0.12802558232017894)
Link: http://www.espn.com/nfl/story/_/id/24543080/chicago-bears-reach-agreement-trade-khalil-mack-oakland-raiders

Bears will be 'smart' about Mack for Week 1 (score:0.10978968322631111)
Link: http://www.espn.com/nfl/story/_/id/24559428/chicago-bears-aggressive-sm

In [9]:
# remove the user-input row (the last row of df)
df = df.drop(df.index[-1])
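
The notebook appends the query to `df` and removes it afterwards. One possible alternative, sketched below under stated assumptions, is to fit the vectorizer on the article titles only and call `transform` on the query, so the table is never modified. The function `recommend_for_query` and the three-row `articles` frame are hypothetical stand-ins, not part of the notebook, and the scores will differ somewhat from those above because the query no longer contributes to the vocabulary or the IDF weights.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical three-row stand-in for the article table.
articles = pd.DataFrame({
    "ID": [1, 2, 3],
    "Article Title": [
        "Source: Bears give Mack record deal after trade",
        "Chicago Bears depth chart",
        "NFL warns of training camp concussions",
    ],
})

def recommend_for_query(query, frame, num=5):
    """Return (title, score) pairs for the `num` titles most similar to `query`."""
    vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 3), stop_words="english")
    # Learn the vocabulary and IDF weights from the article titles only.
    title_matrix = vectorizer.fit_transform(frame["Article Title"])
    # Project the query into the same feature space without refitting.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, title_matrix).ravel()
    top = scores.argsort()[::-1][:num]  # positions of the highest scores, best first
    return [(frame["Article Title"].iloc[i], float(scores[i])) for i in top]

hits = recommend_for_query("Chicago Bears Khalil Mack trade", articles, num=2)
for title, score in hits:
    print(title, "(score:", round(score, 3), ")")
```

With this approach there is no self-match to skip and no cleanup cell is needed, at the cost of ignoring query words that never appear in any article title.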