# Importing the required libraries

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

from file_functions import load_dataset

# Downloading datasets

**The first run can take a long time, because the function has to download two fairly large datasets.**

The function checks whether the data files have already been downloaded. If they have not, it downloads them and saves the result to a file; if they have, it simply reads that file.

We use the https://static.turi.com/datasets/millionsong/10000.txt file, which is a subset of the Million Song Dataset. The full Million Song Dataset takes up hundreds of gigabytes, while this subset is far smaller. To get more information about the songs, we also use https://static.turi.com/datasets/millionsong/song_data.csv. It gives us each song's title, album (release), artist name and year. We don't need any other attributes, since we don't analyse the songs themselves in depth.

In [2]:
songs = load_dataset('./data', 'https://static.turi.com/datasets/millionsong/10000.txt', 'https://static.turi.com/datasets/millionsong/song_data.csv', 'song.csv')
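`file_functions.load_dataset` is a local helper whose code is not shown here. Below is a minimal sketch of what such a helper plausibly does. It is not its actual code, and the name `load_dataset_sketch` is hypothetical. The sketch rests on two assumptions:

- `10000.txt` is a tab-separated file of (user_id, song_id, listen_count) triplets with no header row.
- `song_data.csv` has `song_id`, `title`, `release`, `artist_name` and `year` columns.

The sketch reuses the cached merged file if it exists. Otherwise it reads both sources, left-joins the song metadata onto the triplets and caches the result.

```python
import os

import pandas as pd


def load_dataset_sketch(data_dir, triplets_url, metadata_url, file_name):
    """Hypothetical stand-in for file_functions.load_dataset (not its real code)."""
    cached_path = os.path.join(data_dir, file_name)
    # Reuse the merged file if an earlier run already saved it
    if os.path.exists(cached_path):
        return pd.read_csv(cached_path)

    # Assumed layout: tab-separated user_id, song_id, listen_count, no header row
    triplets = pd.read_csv(triplets_url, sep="\t", header=None,
                           names=["user_id", "song_id", "listen_count"])
    # Assumed layout: one row per song with title, release, artist_name, year
    metadata = pd.read_csv(metadata_url)

    # Attach metadata to every listening event; dropping duplicate song_ids
    # first keeps the left join from multiplying rows
    merged = pd.merge(triplets, metadata.drop_duplicates(["song_id"]),
                      on="song_id", how="left")

    os.makedirs(data_dir, exist_ok=True)
    merged.to_csv(cached_path, index=False)
    return merged
```

`pd.read_csv` accepts both URLs and local paths, so the same call shape works with the two URLs above.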

# Data analysis and simple data preprocessing

## Basic analysis

In [3]:
songs.head()

| | user_id | song_id | listen_count | title | release | artist_name | year |
|---|---|---|---|---|---|---|---|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOAKIMP12A8C130995 | 1 | The Cove | Thicker Than Water | Jack Johnson | 0 |
| 1 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBBMDR12A8C13253B | 2 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976 |
| 2 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBXHDL12A81C204C0 | 1 | Stronger | Graduation | Kanye West | 2007 |
| 3 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SOBYHAJ12A6701BF1D | 1 | Constellations | In Between Dreams | Jack Johnson | 2005 |
| 4 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | SODACBL12A8C13C273 | 1 | Learn To Fly | There Is Nothing Left To Lose | Foo Fighters | 1999 |


We drop all rows that contain missing (NaN/None) values.

We print the number of missing values per column before dropping them. If we printed them afterwards, every count would be zero.

In [4]:
print(songs.isnull().sum())
songs.dropna(inplace=True)

user_id         0
song_id         0
listen_count    0
title           0
release         0
artist_name     0
year            0
dtype: int64


In [5]:
songs.describe()

| | listen_count | year |
|---|---|---|
| count | 2000000.0 | 2000000.0 |
| mean | 3.045485 | 1628.645 |
| std | 6.57972 | 778.7283 |
| min | 1.0 | 0.0 |
| 25% | 1.0 | 1984.0 |
| 50% | 1.0 | 2002.0 |
| 75% | 3.0 | 2007.0 |
| max | 2213.0 | 2010.0 |


In [6]:
songs.columns

Index(['user_id', 'song_id', 'listen_count', 'title', 'release', 'artist_name',
       'year'],
      dtype='object')

# Basic data preprocessing

We add a couple of new columns to the DataFrame rather than overwriting the existing ones. We want to keep the original text columns so that we can list the song names at the end.

The new columns hold the label-encoded artist_name and release (album name). The user_id and song_id columns, by contrast, are replaced in place with their integer encodings. In the source dataset they are opaque hash strings that carry no information of their own, so nothing useful is lost.

In [7]:
le = LabelEncoder()
songs['user_id'] = le.fit_transform(songs['user_id'])
songs['year'] = pd.to_numeric(songs['year'])
songs['song_id'] = le.fit_transform(songs['song_id'])
songs['encoded_artist_name'] = le.fit_transform(songs['artist_name'])
songs['encoded_release'] = le.fit_transform(songs['release'])
songs.head()

| | user_id | song_id | listen_count | title | release | artist_name | year | encoded_artist_name | encoded_release |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 54961 | 153 | 1 | The Cove | Thicker Than Water | Jack Johnson | 0 | 1370 | 4810 |
| 1 | 54961 | 413 | 2 | Entre Dos Aguas | Flamenco Para Niños | Paco De Lucia | 1976 | 2239 | 1548 |
| 2 | 54961 | 736 | 1 | Stronger | Graduation | Kanye West | 2007 | 1577 | 1753 |
| 3 | 54961 | 750 | 1 | Constellations | In Between Dreams | Jack Johnson | 2005 | 1370 | 2113 |
| 4 | 54961 | 1188 | 1 | Learn To Fly | There Is Nothing Left To Lose | Foo Fighters | 1999 | 1115 | 4794 |
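The cell above reuses one `LabelEncoder` (`le`) for every column. Each `fit_transform` call refits it, so afterwards `le` only remembers the mapping of the last column it saw (release). It can no longer decode user_id, song_id or artist_name.

A minimal sketch of one alternative keeps one encoder per column in a dict. It runs on a toy frame whose values are only illustrative:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the songs DataFrame (values are illustrative)
toy_songs = pd.DataFrame({
    "artist_name": ["Jack Johnson", "Kanye West", "Jack Johnson"],
    "release": ["Thicker Than Water", "Graduation", "In Between Dreams"],
})

# One encoder per column, so every fitted mapping stays available afterwards
encoders = {}
for column in ["artist_name", "release"]:
    encoders[column] = LabelEncoder()
    toy_songs["encoded_" + column] = encoders[column].fit_transform(toy_songs[column])

# Decode the integer codes back into the original labels
decoded_artists = encoders["artist_name"].inverse_transform(
    toy_songs["encoded_artist_name"]).tolist()
print(decoded_artists)  # → ['Jack Johnson', 'Kanye West', 'Jack Johnson']
```

`LabelEncoder` assigns codes in sorted order of the labels, so "Jack Johnson" becomes 0 and "Kanye West" becomes 1.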


## Advanced data analysis

### Top 10 most popular songs

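First, we group the dataset by song_id and sum each group, which gives the total listen count of every song. Then we sort the result in descending order and take the first 10 rows. Only the listen_count column is meaningful here; the sums of user_id, year and the encoded columns carry no information.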

In [8]:
songs.groupby("song_id").sum(numeric_only=True).sort_values("listen_count", ascending=False).head(10)

| song_id | user_id | listen_count | year | encoded_artist_name | encoded_release |
|---|---|---|---|---|---|
| 614 | 244725879 | 54136 | 12759880 | 5860568 | 13445964 |
| 317 | 267383552 | 49253 | 14071032 | 2426040 | 35842104 |
| 7416 | 236072986 | 41418 | 12339160 | 10145395 | 19565680 |
| 1664 | 207999124 | 31153 | 0 | 1389330 | 15492645 |
| 2220 | 315627938 | 31036 | 0 | 10520067 | 24201948 |
| 352 | 266230118 | 26663 | 0 | 7692543 | 21437665 |
| 5531 | 222122778 | 22100 | 11734569 | 12879405 | 30063627 |
| 6246 | 133923369 | 21019 | 0 | 3860970 | 15651914 |
| 7913 | 111995376 | 19645 | 5779774 | 9203756 | 4350709 |
| 8252 | 110773078 | 18309 | 0 | 7216588 | 11787858 |
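A sketch of the same top-10 query that sums only `listen_count` and keeps the title and artist next to each song_id. This makes the result readable without decoding ids. The helper name `top_songs` and the toy rows are illustrative:

```python
import pandas as pd


def top_songs(listens, n=10):
    """Total listens per song, keeping title and artist for display."""
    return (
        listens.groupby(["song_id", "title", "artist_name"])["listen_count"]
        .sum()                        # sum only the play counts
        .sort_values(ascending=False)
        .head(n)
        .reset_index()
    )


# Toy rows in the same layout as the songs DataFrame (values are made up)
toy_listens = pd.DataFrame({
    "song_id": [1, 1, 2, 3],
    "title": ["The Cove", "The Cove", "Stronger", "Learn To Fly"],
    "artist_name": ["Jack Johnson", "Jack Johnson", "Kanye West", "Foo Fighters"],
    "listen_count": [1, 4, 2, 3],
})
print(top_songs(toy_listens, n=2))
```

On the real data the call would be `top_songs(songs)`.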



## Create a subset of the dataset

In [None]:
# Work on the first 10,000 rows; .copy() avoids pandas' SettingWithCopyWarning
song_df = songs.head(10000).copy()

# Combine the title and artist_name columns into a single "Title - Artist" label
song_df['song'] = song_df['title'].map(str) + " - " + song_df['artist_name']

## Showing the most popular songs in the dataset

In [None]:
# Count the rows (user-song pairs) per song, i.e. how many users listened to it
song_grouped = song_df.groupby(['song']).agg({'listen_count': 'count'}).reset_index()
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = song_grouped['listen_count'].div(grouped_sum) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])
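The `'count'` aggregation above counts rows, not plays. Assuming each (user, song) pair appears only once in the triplets, it measures how many users listened to a song. A `'sum'` would instead give total plays, and the two rankings can differ.

A toy sketch contrasting both, using pandas named aggregation (the data is made up):

```python
import pandas as pd

# Toy listening log: one row per (user, song) pair, as assumed for the triplets
listens = pd.DataFrame({
    "user_id": [0, 1, 2, 0],
    "song": ["Stronger - Kanye West"] * 3 + ["The Cove - Jack Johnson"],
    "listen_count": [1, 1, 1, 20],
})

stats = listens.groupby("song").agg(
    listeners=("listen_count", "count"),  # number of rows = distinct listeners
    total_plays=("listen_count", "sum"),  # sum of the play counts
).reset_index()
stats["percentage"] = stats["listeners"] / stats["listeners"].sum() * 100
print(stats.sort_values(["listeners", "song"], ascending=[False, True]))
```

Here "Stronger" wins by listeners (3 vs 1), while "The Cove" wins by total plays (20 vs 3).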

## Count the number of unique users in the dataset

In [None]:
users = song_df['user_id'].unique()

In [None]:
len(users)

## Quiz 1. Count the number of unique songs in the dataset

In [None]:
### Fill in the code here
# Use a new name so that the full `songs` DataFrame is not overwritten
unique_songs = song_df['song'].unique()
len(unique_songs)

# Create a song recommender

In [None]:
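from sklearn.model_selection import train_test_split

# Hold out 20% of the listening events for evaluation
train_data, test_data = train_test_split(song_df, test_size=0.20, random_state=0)
print(train_data.head(5))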

## Simple popularity-based recommender class (Can be used as a black box)

In [None]:
# The class lives in the local module Recommenders.py as Recommenders.popularity_recommender_py
import Recommenders
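For readers who want to look inside the black box, here is a minimal sketch of a popularity recommender with the same `create`/`recommend` shape. It is not the code of `Recommenders.py`. It assumes a song's score is the number of distinct users who listened to it. The class name `PopularityRecommenderSketch` and the `top_n` parameter are hypothetical.

```python
import pandas as pd


class PopularityRecommenderSketch:
    """Hypothetical popularity recommender; not the code of Recommenders.py."""

    def create(self, train_data, user_id, item_id, top_n=10):
        self.user_id = user_id
        self.item_id = item_id
        # Score each item by how many distinct users interacted with it
        scores = (train_data.groupby(item_id)[user_id].nunique()
                  .rename("score").reset_index())
        # Highest score first; ties broken alphabetically by item for stability
        scores = scores.sort_values(["score", item_id], ascending=[False, True])
        scores["rank"] = range(1, len(scores) + 1)
        self.recommendations = scores.head(top_n).reset_index(drop=True)

    def recommend(self, user_id):
        # Every user receives the same list; only the user_id column differs
        result = self.recommendations.copy()
        result.insert(0, self.user_id, user_id)
        return result
```

Because the ranking ignores who is asking, this sketch returns the same songs for `users[5]` and `users[8]`. That is the limitation the personalized model below addresses.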

### Create an instance of popularity based recommender class

In [None]:
pm = Recommenders.popularity_recommender_py()
pm.create(train_data, 'user_id', 'song')

### Use the popularity model to make some predictions

In [None]:
user_id = users[5]
pm.recommend(user_id)

### Quiz 2: Use the popularity based model to make predictions for the following user id (Note the difference in recommendations from the first user id).

In [None]:
###Fill in the code here
user_id = users[8]
pm.recommend(user_id)


## Build a song recommender with personalization

We now create an item similarity based collaborative filtering model that allows us to make personalized recommendations to each user. 

## Class for an item similarity based personalized recommender system (Can be used as a black box)

In [None]:
# The class lives in the local module Recommenders.py as Recommenders.item_similarity_recommender_py
import Recommenders
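As a rough illustration of item-similarity collaborative filtering, here is a minimal sketch of one common variant. It is not necessarily what `Recommenders.item_similarity_recommender_py` does. It rests on two assumptions:

- The similarity of two songs is the Jaccard index of their listener sets.
- A candidate song's score is its average similarity to the songs the user already knows.

The function names are hypothetical.

```python
import pandas as pd


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_item_similarity(train_data, user, user_col="user_id",
                                 item_col="song", top_n=10):
    """Hypothetical item-similarity recommender (not Recommenders.py code)."""
    # Map every item to the set of users who listened to it
    item_users = train_data.groupby(item_col)[user_col].apply(set).to_dict()
    user_items = set(train_data.loc[train_data[user_col] == user, item_col])
    if not user_items:
        return []

    scores = {}
    for candidate, candidate_users in item_users.items():
        if candidate in user_items:
            continue  # skip songs the user already knows
        # Average similarity between the candidate and each of the user's songs
        sims = [jaccard(candidate_users, item_users[item]) for item in user_items]
        scores[candidate] = sum(sims) / len(sims)

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]
```

Unlike the popularity sketch, the result depends on the user's own listening history, so two users generally get different lists.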

### Create an instance of item similarity based recommender class

In [None]:
is_model = Recommenders.item_similarity_recommender_py()
is_model.create(train_data, 'user_id', 'song')

### Use the personalized model to make some song recommendations

In [None]:
# Print the songs this user listened to in the training data
user_id = users[5]
user_items = is_model.get_user_items(user_id)

print("------------------------------------------------------------------------------------")
print("Training data songs for user_id %s:" % user_id)
print("------------------------------------------------------------------------------------")

for user_item in user_items:
    print(user_item)

print("----------------------------------------------------------------------")
print("Recommendation process going on:")
print("----------------------------------------------------------------------")

#Recommend songs for the user using personalized model
is_model.recommend(user_id)

### Quiz 3. Use the personalized model to make recommendations for the following user id. (Note the difference in recommendations from the first user id.)

In [None]:
user_id = users[7]
### Fill in the code here
user_items = is_model.get_user_items(user_id)

print("------------------------------------------------------------------------------------")
print("Training data songs for user_id %s:" % user_id)
print("------------------------------------------------------------------------------------")

for user_item in user_items:
    print(user_item)

print("----------------------------------------------------------------------")
print("Recommendation process going on:")
print("----------------------------------------------------------------------")

#Recommend songs for the user using personalized model
is_model.recommend(user_id)


### We can also apply the model to find similar songs to any song in the dataset

In [None]:
is_model.get_similar_items(['U Smile - Justin Bieber'])

### Quiz 4. Use the personalized recommender model to get similar songs for the following song.

In [None]:
song = 'Yellow - Coldplay'
###Fill in the code here
is_model.get_similar_items([song])

# Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves. 

## Class to calculate precision and recall (This can be used as a black box)

In [None]:
# The class lives in the local module Evaluation.py as Evaluation.precision_recall_calculator
import Evaluation
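Precision-recall curves like these are typically built from precision@k and recall@k, averaged over users for a range of cut-offs k. Below is a minimal sketch of those two measures. It is not the code of `Evaluation.precision_recall_calculator`.

The sketch makes three assumptions:

- A user's "relevant" songs are the ones they have in the test set.
- Precision@k divides the hits by k.
- Recall@k divides the hits by the number of relevant songs.

The function names are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_curve(per_user, cutoffs):
    """Average precision and recall over users at each cutoff k (one curve point per k).

    per_user is a list of (ranked recommendations, relevant items) pairs.
    """
    precisions, recalls = [], []
    for k in cutoffs:
        pairs = [precision_recall_at_k(rec, rel, k) for rec, rel in per_user]
        precisions.append(sum(p for p, _ in pairs) / len(pairs))
        recalls.append(sum(r for _, r in pairs) / len(pairs))
    return precisions, recalls
```

Plotting `recalls` on the x-axis against `precisions` on the y-axis gives one curve per model.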

## Use the above precision recall calculator class to calculate the evaluation measures

In [None]:
import time

start = time.time()

#Define what percentage of users to use for precision recall calculation
user_sample = 0.05

#Instantiate the precision_recall_calculator class
pr = Evaluation.precision_recall_calculator(test_data, train_data, pm, is_model)

#Call method to calculate precision and recall values
(pm_avg_precision_list, pm_avg_recall_list, ism_avg_precision_list, ism_avg_recall_list) = pr.calculate_measures(user_sample)

end = time.time()
print(end - start)

## Code to plot precision recall curve

In [None]:
import matplotlib.pyplot as plt

# Plot the precision-recall curves of two models on the same axes
def plot_precision_recall(m1_precision_list, m1_recall_list, m1_label, m2_precision_list, m2_recall_list, m2_label):
    plt.clf()
    plt.plot(m1_recall_list, m1_precision_list, label=m1_label)
    plt.plot(m2_recall_list, m2_precision_list, label=m2_label)
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 0.20])
    plt.xlim([0.0, 0.20])
    plt.title('Precision-Recall curve')
    # Place the legend centred below the axes ('upper center' is location code 9)
    plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.2))
    plt.show()


In [None]:
print("Plotting precision recall curves.")

plot_precision_recall(pm_avg_precision_list, pm_avg_recall_list, "popularity_model",
                      ism_avg_precision_list, ism_avg_recall_list, "item_similarity_model")


### Generate Precision Recall curve using pickled results on a larger data subset (Python 3)

In [None]:
import joblib

print("Plotting precision recall curves for a larger subset of data (100,000 rows) (user sample = 0.005).")

# Read the persisted results
pm_avg_precision_list = joblib.load('pm_avg_precision_list_3.pkl')
pm_avg_recall_list = joblib.load('pm_avg_recall_list_3.pkl')
ism_avg_precision_list = joblib.load('ism_avg_precision_list_3.pkl')
ism_avg_recall_list = joblib.load('ism_avg_recall_list_3.pkl')

print("Plotting precision recall curves.")
plot_precision_recall(pm_avg_precision_list, pm_avg_recall_list, "popularity_model",
                      ism_avg_precision_list, ism_avg_recall_list, "item_similarity_model")

### Generate Precision Recall curve using pickled results on a larger data subset (Python 2.7)

In [None]:
import joblib

print("Plotting precision recall curves for a larger subset of data (100,000 rows) (user sample = 0.005).")

pm_avg_precision_list = joblib.load('pm_avg_precision_list_2.pkl')
pm_avg_recall_list = joblib.load('pm_avg_recall_list_2.pkl')
ism_avg_precision_list = joblib.load('ism_avg_precision_list_2.pkl')
ism_avg_recall_list = joblib.load('ism_avg_recall_list_2.pkl')

print("Plotting precision recall curves.")
plot_precision_recall(pm_avg_precision_list, pm_avg_recall_list, "popularity_model",
                      ism_avg_precision_list, ism_avg_recall_list, "item_similarity_model")

The curves show that the personalized model performs much better than the popularity model.

# Matrix Factorization based Recommender System

Using SVD matrix factorization based collaborative filtering recommender system
--------------------------------------------------------------------------------

The following code implements a Singular Value Decomposition (SVD) based matrix factorization collaborative filtering recommender system. The user ratings matrix used is a small matrix as follows:

| | Item0 | Item1 | Item2 | Item3 |
|---|---|---|---|---|
| User0 | 3 | 1 | 2 | 3 |
| User1 | 4 | 3 | 4 | 3 |
| User2 | 3 | 2 | 1 | 5 |
| User3 | 1 | 6 | 5 | 2 |
| User4 | 0 | 0 | 5 | 0 |

As the matrix shows, every user except User4 has rated all items; User4 has rated only Item2 (a 0 means "not rated"). The code predicts ratings for User4 and ranks the items accordingly.

### Import the required libraries

In [None]:
# Code written with help from:
# http://antoinevastel.github.io/machine%20learning/python/2016/02/14/svd-recommender-system.html

import math as mt
import numpy as np
from sparsesvd import sparsesvd  # truncated SVD of a sparse matrix
from scipy.sparse import csc_matrix  # compressed sparse column matrix format

# Note: you may need to install the sparsesvd library (pip install sparsesvd).
# Documentation for the sparsesvd method can be found here:
# https://pypi.python.org/pypi/sparsesvd/

### Methods to compute SVD and recommendations

In [None]:
#constants defining the dimensions of our User Rating Matrix (URM)
MAX_PID = 4
MAX_UID = 5

#Compute SVD of the user ratings matrix
def computeSVD(urm, K):
    U, s, Vt = sparsesvd(urm, K)

    dim = (len(s), len(s))
    S = np.zeros(dim, dtype=np.float32)
    for i in range(0, len(s)):
        S[i,i] = mt.sqrt(s[i])

    U = csc_matrix(np.transpose(U), dtype=np.float32)
    S = csc_matrix(S, dtype=np.float32)
    Vt = csc_matrix(Vt, dtype=np.float32)
    
    return U, S, Vt

# Estimate ratings for the test users and return the recommended item indices
# (for the last user in uTest), best estimated rating first
def computeEstimatedRatings(urm, U, S, Vt, uTest, K, test):
    rightTerm = S @ Vt

    estimatedRatings = np.zeros(shape=(MAX_UID, MAX_PID), dtype=np.float32)
    recom = None
    for userTest in uTest:
        prod = U[userTest, :] @ rightTerm
        # Convert the sparse 1 x MAX_PID product to a dense vector so that we can
        # sort it and get the indices of the items with the best estimated ratings
        estimatedRatings[userTest, :] = prod.toarray().ravel()
        recom = (-estimatedRatings[userTest, :]).argsort()[:250]
    return recom


### Use SVD to make predictions for a test user id, say 4

In [None]:
#Used in SVD calculation (number of latent factors)
K=2

#Initialize a sample user rating matrix
urm = np.array([[3, 1, 2, 3], [4, 3, 4, 3], [3, 2, 1, 5], [1, 6, 5, 2], [0, 0, 5, 0]])
urm = csc_matrix(urm, dtype=np.float32)

#Compute SVD of the input user ratings matrix
U, S, Vt = computeSVD(urm, K)

#Test user set as user_id 4 with ratings [0, 0, 5, 0]
uTest = [4]
print("User id for whom recommendations are needed: %d" % uTest[0])

# Get the recommended item indices for the test user, best estimated rating first
print("Recommended items (highest predicted rating first):")
uTest_recommended_items = computeEstimatedRatings(urm, U, S, Vt, uTest, K, True)
print(uTest_recommended_items)
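For comparison, here is a sketch of the textbook rank-K reconstruction on the same matrix (with User4's row as [0, 0, 5, 0]), using NumPy's dense `np.linalg.svd`. `computeSVD` above puts the square root of each singular value on the diagonal of S. Its estimated ratings are therefore not the same as this reconstruction's, and the two item rankings need not agree. The helper name `rank_k_predictions` is hypothetical. The variable names are chosen so they do not overwrite `urm`, `U` or `Vt`.

```python
import numpy as np


def rank_k_predictions(ratings, k):
    """Textbook rank-k SVD reconstruction: ratings ≈ U_k diag(s_k) Vt_k."""
    u, s, vt = np.linalg.svd(ratings, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


ratings = np.array([[3, 1, 2, 3],
                    [4, 3, 4, 3],
                    [3, 2, 1, 5],
                    [1, 6, 5, 2],
                    [0, 0, 5, 0]], dtype=float)

approx = rank_k_predictions(ratings, k=2)
user4_predicted = approx[4]
# Item indices for user 4, highest predicted rating first
user4_ranking = np.argsort(-user4_predicted)
print(np.round(user4_predicted, 2), user4_ranking)
```

With k equal to the full rank (4 here), the reconstruction reproduces the original matrix. Smaller k smooths the ratings, which is what produces non-zero predictions for the items user 4 has not rated.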

### Quiz 5

a.) Change the row for test user 4 in the user ratings matrix to the following value, and note how the predicted recommendations change.

i.) [5 0 0 0]


(Note: the predicted ratings produced by the code also include the items the test user has already rated. This was left in on purpose to make the SVD output easier to understand.)

SVD tutorial: http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

## Understanding the intuition behind SVD

The SVD produces three matrices: U, S and Vt (the T in Vt stands for transpose). Matrix U holds the user vectors and matrix Vt holds the item vectors. In simple terms, U represents each user as a point in the latent vector space, and Vt represents each item as a point in the same space. Because we chose K = 2 latent factors, these points are two-dimensional.
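The same idea in formula form, for the textbook truncated SVD with $K$ latent factors. This is the standard definition; the notebook's computeSVD puts $\sqrt{\sigma_k}$ rather than $\sigma_k$ on the diagonal of S.

$$
R \approx U_K \Sigma_K V_K^{\top}, \qquad \hat{r}_{ui} = \sum_{k=1}^{K} U_{uk}\,\sigma_k\,V_{ik}
$$

Row $u$ of $U_K$ is user $u$'s point. Row $i$ of $V_K$ (column $i$ of $V_K^{\top}$, i.e. of Vt) is item $i$'s point. With $K = 2$, both are points in the plane.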


Next, we inspect U and Vt and try to interpret them. Think about what the user and item points would look like on a two-dimensional plot. The following code plots all user vectors from the matrix U in 2-D space, together with all item vectors from the matrix Vt in the same plot.


In [None]:
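%matplotlib inline
import matplotlib.pyplot as plt

# Dense copies of the factor matrices: rows of U_dense are users, rows of V_dense are items
U_dense = U.toarray()
V_dense = Vt.T.toarray()

print("Matrix dimensions for U")
print(U_dense.shape)

# Plot every user (stars) and every item (diamonds) in the 2-D latent space
for i in range(U_dense.shape[0]):
    plt.plot(U_dense[i, 0], U_dense[i, 1], marker="*", label="user" + str(i))

for j in range(V_dense.shape[0]):
    plt.plot(V_dense[j, 0], V_dense[j, 1], marker="d", label="item" + str(j))

plt.legend(loc="upper right")
plt.title("User and item vectors in the latent semantic space")
plt.ylim([-0.7, 0.7])
plt.xlim([-0.7, 0])
plt.show()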