# Song Recommender Assignment

In this module, we focused on building recommender systems to find products, music and movies that interest users. We also built an exciting iPython notebook for recommending songs, which compared the simple popularity-based recommendation with a personalized model, and showed the significant improvement provided by personalization.

In this assignment, we are going to explore the song data and the recommendations made by our model. In the process, you are going to learn how to use one of the most important data manipulation primitives, groupby. These techniques will be important to building the intelligent application in your capstone project.

In [None]:
# Carried over from the lecture session

import graphlab

song_data = graphlab.SFrame('song_data.gl/')

train_data,test_data = song_data.random_split(.8,seed=0)

personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

In [3]:
song_data.head(2)

user_id,song_id,listen_count,title,artist,song
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOAKIMP12A8C130995,1,The Cove,Jack Johnson,The Cove - Jack Johnson
b80344d063b5ccb3212f76538 f3d9e43d87dca9e ...,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Paco De Lucia,Entre Dos Aguas - Paco De Lucia ...


## 1. Counting unique users

The .unique() method selects the unique elements in a column of data. In this question, you will compute the number of unique users who have listened to songs by various artists. For example, to find the number of unique users who listened to songs by 'Kanye West', select the rows of the song data where the artist is 'Kanye West', and then count the number of unique entries in the 'user_id' column. Compute the number of unique users for each of these artists: 'Kanye West', 'Foo Fighters', 'Taylor Swift' and 'Lady GaGa'.
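
The filter-then-count pattern described above can be sketched in plain Python on a few made-up rows, without GraphLab. The rows and the helper name `count_unique_users` are hypothetical and only illustrate the logic; the actual answer is computed on `song_data`.

```python
# Toy listening records (hypothetical data, not taken from song_data).
rows = [
    {'user_id': 'u1', 'artist': 'Kanye West'},
    {'user_id': 'u1', 'artist': 'Kanye West'},    # same user, second song
    {'user_id': 'u2', 'artist': 'Kanye West'},
    {'user_id': 'u2', 'artist': 'Taylor Swift'},
    {'user_id': 'u3', 'artist': 'Taylor Swift'},
]

def count_unique_users(rows, artist):
    """Count distinct user_id values among the rows for one artist."""
    # Keep only the rows for this artist (the boolean filter), then
    # collect user_ids into a set, which drops duplicates (the .unique() step).
    return len({row['user_id'] for row in rows if row['artist'] == artist})

kanye_users = count_unique_users(rows, 'Kanye West')
swift_users = count_unique_users(rows, 'Taylor Swift')
```

Note that a user who listened to several songs by the same artist is counted once, which is why the row count alone would overestimate the audience.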

In [9]:
# Count the unique listeners of each artist
for artist in ['Kanye West', 'Foo Fighters', 'Taylor Swift', 'Lady GaGa']:
    print artist + ":"
    print len(song_data[song_data['artist'] == artist]['user_id'].unique())

Kanye West:
2522
Foo Fighters:
2055
Taylor Swift:
3246
Lady GaGa:
2928


## 2. Using groupby-aggregate to find the most popular and least popular artist

In [11]:
gb_artist = song_data.groupby(key_columns='artist', 
                  operations={'total_count': graphlab.aggregate.SUM('listen_count')})
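
The groupby call above groups the rows by artist and sums `listen_count` within each group. As a rough illustration of that idea, here is a plain-Python sketch on made-up rows; the artist names and counts are hypothetical, and GraphLab's own implementation is not shown.

```python
from collections import defaultdict

# Toy (artist, listen_count) records; hypothetical values for illustration only.
rows = [
    ('Artist A', 3),
    ('Artist B', 1),
    ('Artist A', 4),
    ('Artist C', 10),
    ('Artist B', 1),
]

# Group by artist and sum listen_count within each group,
# mirroring key_columns='artist' with a SUM aggregate.
total_count = defaultdict(int)
for artist, listen_count in rows:
    total_count[artist] += listen_count

# Sorting the totals in ascending order puts the least popular artist first
# and the most popular artist last.
ranked = sorted(total_count.items(), key=lambda pair: pair[1])
least_popular = ranked[0]
most_popular = ranked[-1]
```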

In [13]:
# Least listened-to artist
gb_artist.sort('total_count').head(1)

artist,total_count
William Tabbert,14


In [14]:
# Most listened-to artist
gb_artist.sort('total_count', ascending=False).head(1)

artist,total_count
Kings Of Leon,43218


## 3. Using groupby-aggregate to find the most recommended songs

Now that we have learned how to use .groupby() to compute aggregates for each value in a column, let’s use it to find the song that is most recommended by the personalized_model we learned in the iPython notebook above. Follow these steps to find the most recommended song:

1. Split the data into 80% training and 20% testing, using seed=0, as was done in the iPython notebook above.
2. Train an item_similarity_recommender on the training data, as done in the iPython notebook.
3. Next, we are going to make recommendations for the users in the test data. The test set contains over 200,000 user entries (58,628 unique users), and computing recommendations for this many users can be slow on some computers, so this question uses only the first 10,000 users. Use this command to select this subset of users:

In [16]:
subset_test_users = test_data['user_id'].unique()[0:10000]
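
The split-then-subset steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper `random_split` here is a hypothetical stand-in that sends each row to the first list with a fixed probability, the rows are made up, and GraphLab's own split and `.unique()` may order results differently.

```python
import random

def random_split(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

# Toy (user_id, song) rows; hypothetical values.
rows = [('u%d' % (i % 7), 'song%d' % i) for i in range(100)]
train, test = random_split(rows, 0.8, seed=0)

# Unique users in the test rows, in order of first appearance,
# then only the first few of them (10,000 in the assignment, 3 here).
seen = set()
unique_test_users = []
for user_id, _song in test:
    if user_id not in seen:
        seen.add(user_id)
        unique_test_users.append(user_id)
subset = unique_test_users[:3]
```

Because the seed is fixed, rerunning the split gives the same training and test rows, which is what makes results comparable across runs.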

Let’s compute one recommended song for each of these test users. Use this command to compute these recommendations:

In [17]:
personalized_model.recommend(subset_test_users,k=1)

user_id,song,score,rank
b048033af070b5dbb18d5d0e5 f334c9390611b04 ...,Fantasy - The xx,0.0377200168112,1
c66c10a9567f0d82ff31441a9 fd5063e5cd9dfe8 ...,Cuando Pase El Temblor - Soda Stereo ...,0.0194504536115,1
ed04954d5b6001c7945c6ac71 686c3bd4ecdacb3 ...,Coming Your Way - Iration,0.0313142140706,1
b1e6e9563da324641e644c769 b7edf202186de47 ...,Pimpa's Paradise - Damian Marley / Stephen Marl ...,0.0694444378217,1
02f015d32ac2cd1e52d26e3ec 36048711dd5711b ...,Where The Boat Leaves From (Album) - Zac Brown ...,0.0615360885859,1
91b986eeb5d81eec60dc4b136 f04c0cfd662d658 ...,Jezebel - Sade,0.0588785750525,1
f933855d675606737fdc191e9 edff7625d08aae8 ...,Schießt die Deutschen raus - Mario Lang ...,0.0349425778669,1
4867d5516a280db13695b9b9c 7ce6b574f34c6b4 ...,Two Steps_ Twice - Foals,0.0104654913857,1
968f1baebc490d3c6999ee6c8 5c5cab8b726b347 ...,Me_ Myself And I - Beyoncé ...,0.0166700103066,1
c067c22072a17d33310d7223d 7b79f819e48cf42 ...,Grind With Me (Explicit Version) - Pretty Ricky ...,0.0459424376488,1


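Finally, we can use .groupby() to find the most recommended song! In the cells below, gb_song is the result of grouping the recommendations by song, with a count column holding the number of times each song was recommended (the cell that builds it is not shown here).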
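
Counting how often each song is recommended is a groupby with a count aggregate rather than a sum. The sketch below shows the idea in plain Python on made-up recommendations; the user and song names are hypothetical. In GraphLab the equivalent would presumably group the recommendation output by 'song' with `graphlab.aggregate.COUNT()`, which is an assumption here because that cell does not appear in the notebook.

```python
from collections import Counter

# Toy top-1 recommendations as (user_id, song) pairs; hypothetical values.
recommendations = [
    ('u1', 'Song X'),
    ('u2', 'Song Y'),
    ('u3', 'Song X'),
    ('u4', 'Song Z'),
    ('u5', 'Song X'),
    ('u6', 'Song Y'),
]

# Group by song and count the rows in each group.
song_counts = Counter(song for _user, song in recommendations)

# most_common() sorts by count in descending order.
ranked = song_counts.most_common()
most_recommended = ranked[0]
least_recommended = ranked[-1]
```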

In [23]:
# Least recommended song
gb_song.sort('count').head(1)

song,count
Hubcap - Sleater-kinney,12


In [24]:
# Most recommended song
gb_song.sort('count', ascending=False).head(1)

song,count
Sehr kosmisch - Harmonia,5970
