## Item-to-item recommendation algorithm

This recommendation algorithm uses an item-to-item approach to find:

1. Artists that are similar to a given artist.
2. Artists that a user is likely to enjoy, given the set of artists they have already listened to.


### Data

1. Artists - 17.5K records
2. Users - 1892 records
3. Interactions (how many times a user has listened to an artist) - 92K records


### Approach
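Build a sparse user-artist matrix of play counts, scale each artist column to unit length, and use matrix products to compute the cosine similarity between artist vectors and to score artists for each user.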
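
For reference, the cosine similarity used here can be written as follows. The notation is an assumption made for this note: $p_i$ denotes the column of the user-artist matrix that holds every user's play count for artist $i$.

$$
\mathrm{sim}(i, j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert_2 \, \lVert p_j \rVert_2}
$$

If every column is first scaled to unit length, the denominator equals 1 and the whole artist-artist similarity matrix reduces to a single matrix product of the normalized matrix with its own transpose.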

#### Credits

* [Random Walks in Recommender Systems](https://nms.kcl.ac.uk/colin.cooper/papers/recommender-rw.pdf)



In [1]:
import pandas as pd
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

In [2]:
interactions_df = pd.read_csv(
    "data/lastfm_user_scrobbles.csv")
titles_df = pd.read_csv(
    "data/lastfm_artist_list.csv")

In [4]:
# Map artist_id -> artist_name
titles_df.index = titles_df["artist_id"]
titles_dict = titles_df["artist_name"].to_dict()

In [4]:
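# Map raw user IDs (column 0) and artist IDs (column 1) to contiguous
# row and column positions in the interaction matrix
rows, r_pos = np.unique(interactions_df.values[:, 0], return_inverse=True)
cols, c_pos = np.unique(interactions_df.values[:, 1], return_inverse=True)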

In [5]:
# Sparse users x artists matrix of play counts
interactions_sparse = sparse.csr_matrix(
    (interactions_df.values[:, 2], (r_pos, c_pos)))
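
The indexing step can be easier to follow on a tiny example. The sketch below uses made-up IDs and column names (`user_id`, `artist_id`, `plays` are assumptions, not the real file's schema) to show how `np.unique(..., return_inverse=True)` turns arbitrary IDs into row and column positions:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy data; the IDs are deliberately non-contiguous.
toy_df = pd.DataFrame({
    "user_id":   [10, 10, 42, 42, 77],
    "artist_id": [5, 9, 5, 30, 9],
    "plays":     [3, 1, 2, 7, 4],
})

# np.unique returns the sorted unique IDs and, for every record,
# the position of its ID in that sorted array (used as matrix index).
user_ids, user_pos = np.unique(toy_df["user_id"].to_numpy(), return_inverse=True)
artist_ids, artist_pos = np.unique(toy_df["artist_id"].to_numpy(), return_inverse=True)

toy_matrix = sparse.csr_matrix(
    (toy_df["plays"].to_numpy(dtype=float), (user_pos, artist_pos)),
    shape=(len(user_ids), len(artist_ids)),
)

# Row i belongs to user_ids[i], column j to artist_ids[j].
dense = toy_matrix.toarray()
print(dense)
```

Column `j` of the result corresponds to `artist_ids[j]`, so that array is the general way to map a column position back to an artist ID.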


In [6]:
# Scale each artist column (axis=0) to unit L2 norm
Pui = normalize(interactions_sparse, norm='l2', axis=0)


In [8]:
# Artist-artist cosine similarity: dot products of unit-length columns
sim = Pui.T @ Pui
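
As a sanity check on the idea, the following sketch (with a made-up 3 x 3 play-count matrix) compares the normalize-then-multiply result with scikit-learn's `cosine_similarity`. It illustrates the technique only and is not part of the original notebook:

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Hypothetical 3 users x 3 artists play-count matrix.
toy = sparse.csr_matrix(np.array([
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 7.0],
    [0.0, 4.0, 0.0],
]))

# Scale each artist column to unit L2 length.
toy_norm = normalize(toy, norm="l2", axis=0)

# Dot products of unit-length columns are cosine similarities.
toy_sim = (toy_norm.T @ toy_norm).toarray()

# cosine_similarity compares rows, so pass the transpose (artists as rows).
reference = cosine_similarity(toy.T)
matches = np.allclose(toy_sim, reference)
print(matches)
```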

In [9]:
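# Pick an artist (column 13303) and list its 20 nearest artists, most similar last.
# titles_dict[i+1] assumes artist IDs are contiguous and start at 1;
# cols[i] gives the artist ID for column i in the general case.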
sim_artists = [titles_dict[i+1]
               for i in sim[13303].toarray().argsort()[0][-20:]]
sim_artists

['Anthrax',
 'Oceano',
 'Kreator',
 'Dysphoria',
 'Sodom',
 'Exodus',
 'The Acacia Strain',
 'Ion Dissonance',
 'Паровоз На Резиновом Ходу',
 'La Vida Cuesta Libertades',
 'Silent Decay',
 'Smaxone',
 'Фактор Страха',
 'Grade 8',
 'Немоляев-Селезнев',
 'Skorbut',
 'The Last Of Lucy',
 'Hatebreed',
 'Pro-Pain',
 'Slayer']
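
The list above is in ascending order of similarity and normally ends with the query artist itself, since an artist has similarity 1 with itself. A possible helper that returns neighbours best first and drops the query is sketched below; `top_similar` and the toy data are hypothetical names introduced for illustration:

```python
import numpy as np
from scipy import sparse

def top_similar(sim_matrix, query_idx, column_ids, names, k=3):
    """Return the k most similar item names, best first, excluding the query."""
    scores = sim_matrix[query_idx].toarray().ravel().astype(float)
    scores[query_idx] = -np.inf            # never return the query itself
    order = np.argsort(scores)[::-1][:k]   # indices by descending similarity
    return [names[int(column_ids[j])] for j in order]

# Hypothetical similarity matrix and ID-to-name mapping.
toy_sim = sparse.csr_matrix(np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.9],
    [0.0, 0.2, 0.9, 1.0],
]))
toy_ids = np.array([5, 9, 30, 41])
toy_names = {5: "A", 9: "B", 30: "C", 41: "D"}

neighbours = top_similar(toy_sim, 0, toy_ids, toy_names, k=2)
print(neighbours)
```

Looking names up through `column_ids` avoids relying on artist IDs being contiguous.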

In [10]:
# Use matrix multiplication to score every artist for every user:
# each user's profile times the artist-artist similarity
fit = Pui @ Pui.T @ Pui
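
Because matrix multiplication is associative, this triple product can be read in two ways: as user-to-user overlap applied to the interactions, or as each user's profile multiplied by the artist-artist similarity matrix. The sketch below, on a made-up 3 x 4 matrix, checks that both groupings give the same users x artists score matrix:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

# Hypothetical 3 users x 4 artists play-count matrix.
toy = sparse.csr_matrix(np.array([
    [3.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 7.0, 1.0],
    [0.0, 4.0, 0.0, 2.0],
]))
toy_norm = normalize(toy, norm="l2", axis=0)

# Grouping 1: user-user overlap applied to the interactions.
via_users = (toy_norm @ toy_norm.T) @ toy_norm
# Grouping 2: user profiles multiplied by artist-artist similarity.
via_items = toy_norm @ (toy_norm.T @ toy_norm)

same_scores = np.allclose(via_users.toarray(), via_items.toarray())
score_shape = via_items.shape  # one score per (user, artist) pair
print(same_scores, score_shape)
```

The grouping does not change the result, but it can change the cost: with far fewer users than artists, forming the users x users product first keeps the intermediate matrix small.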

In [11]:
# Artists that the user in row 2 has already listened to
initial_set = {titles_dict[i+1]
               for i in interactions_sparse[2].nonzero()[1].tolist()}

In [14]:
# The 70 highest-scoring artists for the same user
predicted_set = {titles_dict[i+1]
                 for i in fit[2].toarray().argsort()[0][-70:].tolist()}

In [17]:
# Recommendations: high-scoring artists the user has not listened to yet
predicted_set - initial_set

{'A.R. Rahman',
 'Ai',
 'Alive Inside',
 'Armin Van Buuren',
 'Bedford',
 'Big Dismal',
 'Calexico',
 'Closure',
 'Css',
 'Dil Se',
 'Doping Panda',
 'Doubledrive',
 'Fighting Instinct',
 'Globe',
 'Green River Ordinance',
 'Honey Is Cool',
 'K.C. And The Sunshine',
 'Kco',
 'Love Psychedelico',
 'Old Man Shattered',
 'Piano Magic',
 'Raised By Swans',
 'Ravex',
 'Salyu',
 'Sega',
 'Silvergun',
 'Sneaky Sound System',
 'The Music',
 'Tiromancino',
 'W & Whale',
 'Wire Daisies',
 'Wise',
 'Ya-Kyim',
 '青山テルマ'}
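
Taking the top 70 and then subtracting the listened artists yields a variable number of recommendations (34 here) and loses the ranking. One alternative, sketched below with a hypothetical `recommend` helper and toy matrices, is to mask the listened artists before ranking so that up to `k` new artists come back in score order:

```python
import numpy as np
from scipy import sparse

def recommend(scores, interactions, user_idx, k=2):
    """Return column indices of up to k best-scoring artists the user has not played."""
    user_scores = scores[user_idx].toarray().ravel().astype(float)
    listened = interactions[user_idx].nonzero()[1]
    user_scores[listened] = -np.inf          # exclude already-played artists
    order = np.argsort(user_scores)[::-1][:k]
    # Drop masked entries in case k exceeds the number of unplayed artists.
    return [int(j) for j in order if np.isfinite(user_scores[j])]

# Hypothetical play counts and precomputed scores (3 users x 4 artists).
toy_interactions = sparse.csr_matrix(np.array([
    [3.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 7.0, 1.0],
    [0.0, 4.0, 0.0, 2.0],
]))
toy_scores = sparse.csr_matrix(np.array([
    [0.9, 0.5, 0.4, 0.6],
    [0.7, 0.2, 0.8, 0.3],
    [0.1, 0.9, 0.5, 0.6],
]))

recs = recommend(toy_scores, toy_interactions, user_idx=0, k=2)
print(recs)
```

The returned values are column positions; they would still need to be mapped back to artist IDs and names.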