# Final Project - Research Replication
- INFO 5871
- Spring 2019
- Falcon (Yu Li and Akshit Arora)

## Project Description 

Replicate the experiments from a paper [1] that proposes an efficient top-N recommendation method for very large-scale binary-rated (implicit feedback) datasets. The paper extends an algorithm that won the MSD Challenge [2]. It introduces a new adaptive similarity function, the asymmetric cosine similarity, and studies how tuning its parameters affects both user-based and item-based collaborative filtering. It also explores locality, calibration and aggregation (described below) to further improve performance. Finally, it repeats the same experiments on the MovieLens dataset.

[1] Aiolli, Fabio. "Efficient top-n recommendation for very large scale binary rated datasets." Proceedings of the 7th ACM <br>
[2] Kaggle Competition: https://www.kaggle.com/c/msdchallenge 
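
As we read [1], for binary data the asymmetric cosine similarity between two songs, the item-based score of song $i$ for user $u$, and the locality function take the form below. Here $U(i)$ is the set of users who played song $i$ and $I(u)$ is the set of songs played by user $u$; this notation is ours and may differ from the paper's.

$$
S_\alpha(i, j) = \frac{|U(i) \cap U(j)|}{|U(i)|^{\alpha}\,|U(j)|^{1-\alpha}}, \qquad \alpha \in [0, 1]
$$

$$
h_u(i) = \sum_{j \in I(u)} f\big(S_\alpha(i, j)\big), \qquad f(w) = w^{q}
$$

The user-based variant exchanges the roles of users and items: $S_\alpha(u, v)$ is computed from $I(u)$ and $I(v)$, and $h_u(i) = \sum_{v \in U(i)} f\big(S_\alpha(u, v)\big)$. Setting $\alpha = 1/2$ recovers the ordinary cosine similarity, and $q > 1$ makes small similarities contribute relatively less than large ones.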

In [13]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from importlib import reload
%matplotlib inline

In [14]:
from lenskit import batch, topn, util
from lenskit.algorithms import Predictor, Recommender

In [15]:
%load_ext autoreload
%autoreload 2
from popularity_recommender import *
from us_recommender import *
from eval_tmap import *

## Loading and Preparing the Million Songs Dataset (MSD)

In [48]:
# Read the train and test sets from CSV
DATA_DIR = '/Users/mms/Documents/GitHub/recsys_data'
train_set = pd.read_csv(f'{DATA_DIR}/train_set_10.csv', encoding='latin-1').drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1)
test_set = pd.read_csv(f'{DATA_DIR}/test_set_10.csv', encoding='latin-1').drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1)
In [49]:
train_set.head()

| | Unnamed: 0.1.1 | user | song | play_count | artist | title |
|---|---|---|---|---|---|---|
| 0 | 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | Jack Johnson | The Cove |
| 1 | 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAPDEY12A81C210A9 | 1 | Billy Preston | Nothing from Nothing |
| 2 | 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 1 | Paco De Lucia | Entre Dos Aguas |
| 3 | 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBFNSP12AF72A0E22 | 1 | Josh Rouse | Under Cold Blue Stars |
| 4 | 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBFOVM12A58A7D494 | 1 | The Dead 60s | Riot Radio (Soundtrack Version) |


In [50]:
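# Move the first half of each test user's history into the training data ("visible"),
# keeping the rest of that user's history hidden from training.
test_set_users = list(test_set['user'].drop_duplicates())
test_set_candidates = {}  # user -> [candidate songs, visible songs]
visible_parts = []

for user in test_set_users:
    df = test_set[test_set['user'] == user]
    df = df.head(len(df) // 2)
    visible_parts.append(df)
    test_set_candidates[user] = [None, list(df['song'].values)]

# concatenate once instead of growing the frame inside the loop
combined = pd.concat([train_set] + visible_parts)
songs = combined['song'].drop_duplicates()  # every song visible during training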

In [51]:
# Build the candidate cache: test_set_candidates[user] = [candidate songs, visible songs]
for user in test_set_users:

    # songs in this user's visible half, already added to the training data
    seen_lst = test_set_candidates[user][1]

    # candidates: every song visible during training that the user has not already played
    candidates = songs[~songs.isin(seen_lst)]
    test_set_candidates[user][0] = list(candidates.values)
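
The loops above build the visible split one user at a time. The same split can be sketched in vectorized pandas, assuming each user's rows appear in the order that `head()` should keep. `split_visible` is a hypothetical helper, not part of the project modules.

```python
from typing import Tuple

import pandas as pd


def split_visible(test_df: pd.DataFrame, user_col: str = "user") -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split each user's rows into a visible first half and a hidden remainder.

    Mirrors df.head(len(df) // 2) per user: the first floor(n / 2) rows, in file order.
    """
    position = test_df.groupby(user_col).cumcount()                 # 0-based row index within each user
    n_rows = test_df.groupby(user_col)[user_col].transform("count")  # rows per user, aligned to test_df
    visible_mask = position < n_rows // 2
    return test_df[visible_mask], test_df[~visible_mask]


# Toy example: user "a" has 4 plays (2 visible), user "b" has 3 plays (1 visible)
toy = pd.DataFrame({"user": ["a", "a", "a", "a", "b", "b", "b"],
                    "song": ["s1", "s2", "s3", "s4", "s1", "s5", "s6"]})
visible, hidden = split_visible(toy)
print(visible["song"].tolist())  # ['s1', 's2', 's1']
print(hidden["song"].tolist())   # ['s3', 's4', 's5', 's6']
```

Under that assumption, `pd.concat([train_set, visible])` would hold the same rows as `combined`, possibly in a different row order.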

In [20]:
# # Song metadata examination
# meta_song = pd.read_csv('../../../../recsys_data/unique_tracks.txt', sep=r'<SEP>', names=['trackId', 'songId', 'artist', 'title'], engine='python')
# # sample one row per song ID, so that the data and the metadata have the same number of song IDs
# song_meta = meta_song.groupby('songId', group_keys=False).apply(lambda df: df.sample(1))
# song_meta = song_meta.drop(['trackId'], axis=1)
# song_meta.head()

## Utility Functions

In [52]:
def eval_recs(algo, train, test, N, candidateFunction=None):
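    '''
    Fits a copy of the algorithm on the training set and generates top-N
    recommendations for every user in the test set.
    Params:
        algo - algorithm instance; wrapped by Recommender.adapt if it is not already a Recommender
        train - training set
        test - testing set; only its user IDs are used here
        N - number of recommendations per user (top-N)
        candidateFunction - optional function that selects the candidate items for a given user
    Returns:
        the recommendations DataFrame produced by batch.recommend
    '''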
    fittable = util.clone(algo)
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    recs = batch.recommend(fittable, users, N, candidateFunction)
    return recs
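
`eval_recs` forwards `candidateFunction` to `batch.recommend`. A minimal sketch of such a function built from the `test_set_candidates` cache, assuming `batch.recommend` calls it with one user ID and expects a list of item IDs back. `make_candidate_function` is a hypothetical helper, not part of LensKit or the project modules.

```python
from typing import Callable, Dict, List


def make_candidate_function(cache: Dict[str, list]) -> Callable[[str], List[str]]:
    """Return a function mapping a user ID to that user's candidate songs.

    Assumes the cache layout used in this notebook: cache[user] = [candidates, visible songs].
    Users missing from the cache get an empty candidate list.
    """
    def candidates_for(user: str) -> List[str]:
        entry = cache.get(user)
        return list(entry[0]) if entry is not None else []
    return candidates_for


cache = {"u1": [["s3", "s4"], ["s1"]]}
candidates_for = make_candidate_function(cache)
print(candidates_for("u1"))  # ['s3', 's4']
```

Under that assumption it could be passed as the last argument, for example `eval_recs(algo, train, test, 500, make_candidate_function(test_set_candidates))`.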

## Experiment 1

#### Baseline: Recommendation by Popularity
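
The baseline ranks each user's candidate songs by global popularity. One common way to do that, counting distinct listeners per song, is sketched below; the project's `PopularityRecommender` may measure popularity differently (for instance by total play count), so treat this as an illustration only. `popularity_top_n` is a hypothetical helper.

```python
import pandas as pd


def popularity_top_n(ratings: pd.DataFrame, candidates: list, n: int,
                     item_col: str = "song", user_col: str = "user") -> pd.DataFrame:
    """Rank candidate items by the number of distinct users who played them."""
    counts = ratings.groupby(item_col)[user_col].nunique()
    scored = counts.reindex(candidates).fillna(0)  # candidates never played in training score 0
    top = scored.sort_values(ascending=False, kind="mergesort").head(n)
    return pd.DataFrame({"item": top.index, "score": top.values, "rank": range(1, len(top) + 1)})


ratings = pd.DataFrame({"user": ["a", "b", "c", "a", "b", "c"],
                        "song": ["s1", "s1", "s1", "s2", "s2", "s3"]})
print(popularity_top_n(ratings, ["s3", "s2", "s4"], n=2))
```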

In [22]:
# Instantiate the popularity baseline with the per-user candidate cache
pop_rec_algo = PopularityRecommender(test_set_candidates)

In [23]:
# Fit and generate recommendations
pop_recs = eval_recs(pop_rec_algo, combined[['user', 'song', 'play_count']], test_set[['user', 'song', 'play_count']], 500)
pop_recs.head()



| | item | score | user | rank |
|---|---|---|---|---|
| 0 | SOFRQTD12A81C233C0 | 1121.0 | bd4c6e843f00bd476847fb75c47b4fb430a06856 | 1 |
| 1 | SOAXGDH12A8C13F8A1 | 924.0 | bd4c6e843f00bd476847fb75c47b4fb430a06856 | 2 |
| 2 | SOAUWYT12A81C206F1 | 893.0 | bd4c6e843f00bd476847fb75c47b4fb430a06856 | 3 |
| 3 | SOBONKR12A58A7A7E0 | 791.0 | bd4c6e843f00bd476847fb75c47b4fb430a06856 | 4 |
| 4 | SONYKOW12AB01849C9 | 718.0 | bd4c6e843f00bd476847fb75c47b4fb430a06856 | 5 |


In [24]:
# Reshape the recommendations into the users_data format expected by eval_tmap.tmap:
# one row per user with the ranked recommended songs and the songs the user played in the test set
users_data = pd.DataFrame(pop_recs.sort_values(by=['user', 'rank']).groupby('user')['item'].apply(list)).rename(index=str, columns={'item': 'SongsRecommended'})
users_data['SongsPlayed'] = test_set.groupby('user')['song'].apply(list)
users_data = users_data.reset_index().rename(index=str, columns={'user': 'UserId'})

In [25]:
# Evaluate the recommendations using truncated mAP
tmap(users_data, N=500)

0.026560754937484193
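
For reference, a plain-Python sketch of truncated mAP as we understand the MSD Challenge [2] definition: for each user, sum the precision at every cutoff k ≤ τ where a relevant song appears, divide by min(τ, number of relevant songs), then average over users. `truncated_average_precision` and `truncated_map` are hypothetical helpers; the project's `eval_tmap.tmap` may differ in details such as how users without recommendations are handled.

```python
from typing import Dict, Sequence


def truncated_average_precision(recommended: Sequence[str], relevant: Sequence[str], tau: int) -> float:
    """AP truncated at tau, normalized by min(tau, number of relevant items)."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(list(recommended)[:tau], start=1):
        if item in relevant_set:
            hits += 1
            total += hits / k  # precision at cutoff k, counted only where a relevant item appears
    return total / min(tau, len(relevant_set))


def truncated_map(recs: Dict[str, Sequence[str]], truth: Dict[str, Sequence[str]], tau: int = 500) -> float:
    """Mean truncated AP over every user in `truth`; users with no recommendations score 0."""
    if not truth:
        return 0.0
    return sum(truncated_average_precision(recs.get(u, []), rel, tau) for u, rel in truth.items()) / len(truth)


recs = {"u": ["a", "x", "b"], "v": ["a"]}
truth = {"u": ["a", "b"], "v": ["z"]}
print(truncated_average_precision(recs["u"], truth["u"], tau=3))  # (1/1 + 2/3) / 2
print(truncated_map(recs, truth, tau=3))                            # (that value + 0) / 2
```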

## Experiment 2 

#### Effect of parameter 'q'
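
To see what q does, a minimal user-based scorer under our reading of [1]: each neighbour's songs are weighted by the neighbour's asymmetric cosine similarity to the target user, raised to the power q. `user_based_scores` is hypothetical and is not the project's `UserBasedRecommender`, which also takes a `beta` parameter that this sketch leaves out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def user_based_scores(user_items: Dict[Hashable, Set[Hashable]], target: Hashable,
                      alpha: float = 0.5, q: float = 1.0) -> Dict[Hashable, float]:
    """Score items for `target` as the sum over neighbours v of S_alpha(target, v) ** q."""
    mine = user_items[target]
    scores: Dict[Hashable, float] = defaultdict(float)
    for other, theirs in user_items.items():
        if other == target:
            continue
        common = len(mine & theirs)
        if common == 0:
            continue  # zero similarity adds nothing when q > 0
        # asymmetric cosine between the target and this neighbour
        sim = common / (len(mine) ** alpha * len(theirs) ** (1 - alpha))
        weight = sim ** q  # locality: larger q shrinks weak similarities faster than strong ones
        for item in theirs - mine:  # only score songs the target has not played
            scores[item] += weight
    return dict(scores)


user_items = {"u": {"a", "b"}, "v": {"a", "b", "c"}, "w": {"a", "d"}}
for q in (1, 3):
    print(q, user_based_scores(user_items, "u", alpha=0.5, q=q))
```

On this toy data, raising q from 1 to 3 widens the gap between song c (from the close neighbour v) and song d (from the weaker neighbour w), which is the locality effect this experiment varies.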

In [53]:
import us_recommender  # `from us_recommender import *` does not bind the module name itself
expt2_user_based = us_recommender.UserBasedRecommender(alpha=0, q=1, beta=1)

In [20]:
expt2_sample_pop_recs = eval_recs(expt2_user_based, combined[['user', 'song', 'play_count']], test_set[['user', 'song', 'play_count']], 500)
expt2_sample_pop_recs.head()



KeyboardInterrupt: 

## Item-based Recommendation
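
The item-based variant swaps roles: each unplayed song is scored against the songs the user has already played. A minimal sketch under the same reading of [1]; `item_based_scores` is hypothetical and is not `is_recommender.ItemBasedRecommender`.

```python
from typing import Dict, Hashable, Set


def item_based_scores(item_users: Dict[Hashable, Set[Hashable]], played: Set[Hashable],
                      alpha: float = 0.5, q: float = 1.0) -> Dict[Hashable, float]:
    """Score each unplayed item i as the sum over played items j of S_alpha(i, j) ** q."""
    scores: Dict[Hashable, float] = {}
    for item, users_i in item_users.items():
        if item in played or not users_i:
            continue  # skip songs the user already played and songs with no listeners
        total = 0.0
        for j in played:
            users_j = item_users.get(j, set())
            common = len(users_i & users_j)
            if common:
                # asymmetric cosine between candidate i and played song j, sharpened by q
                total += (common / (len(users_i) ** alpha * len(users_j) ** (1 - alpha))) ** q
        scores[item] = total
    return scores


item_users = {"a": {"u1", "u2"}, "b": {"u1"}, "c": {"u2", "u3"}}
print(item_based_scores(item_users, played={"a"}, alpha=0.5, q=1))
```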

In [58]:
import is_recommender

In [59]:
combined['user'].nunique()

10

In [60]:
expt2_item_based = is_recommender.ItemBasedRecommender(alpha=0, q=1, beta=1)

In [61]:
expt2_sample_pop_recs_item = eval_recs(expt2_item_based, combined[['user', 'song', 'play_count']], test_set[['user', 'song', 'play_count']], 500)
expt2_sample_pop_recs_item.head()




TypeError: 'NoneType' object is not iterable