# Abel Ykalo & Josette Grinslade's AI Term Project
This is our workspace for generating music suggestions with collaborative filtering and content-based filtering.
For the Spotify API, see this page for [setup](https://medium.com/@benfenison/using-spotify-web-api-w-python-6b31a328b26e) and this page for [documentation](https://github.com/plamere/spotipy). Data for the content-based filtering was collected [here](https://grinslade.pythonanywhere.com).

## The Problem
Our project focused on a practical, everyday issue: music listeners struggle to find new songs that they would *actually* listen to. There are many ways to solve this problem, but the most common one in the industry is collaborative filtering, which Spotify, Netflix, and many others use to generate recommendations for their users. Collaborative filtering determines which users are similar and then suggests content that those similar users enjoy. Another method commonly used in practice is content-based filtering, which decides whether content is similar to what a user already likes, based on its metadata or attributes, and then suggests that content.

## Our Approach
Our approach to this problem is similar, but uses our own implementation. More specifically, we created two ways of generating suggested songs: content-based filtering and collaborative filtering. We implemented these algorithms in our own way and compared them with established recommenders such as Spotify's. We decided to use each listener's data to generate five songs that the listener would truly enjoy.

### Requesting user information from Spotify
The authentication step below requires you to log in to a Spotify account and grant permission. In case you do not have access to a Spotify account, we left the results from Josette's account below.

```python
import os
import spotipy
import spotipy.util as util

sc = 'user-top-read'  # scope: read the user's top tracks

username = input("Enter username or none: ")
if username == "none":
    username = "jojogrin"  # fall back to Josette's account
# The app's client credentials are read from environment variables instead of being hard-coded
token = util.prompt_for_user_token(username=username, scope=sc,
                                   client_id=os.environ.get('SPOTIPY_CLIENT_ID'),
                                   client_secret=os.environ.get('SPOTIPY_CLIENT_SECRET'),
                                   redirect_uri='https://grinslade.pythonanywhere.com/api_callback')
```

```
Enter username or none: jojogrin
```

```python
sp = spotipy.Spotify(auth=token)
```

# User's Top Tracks

```python
import pandas as pd

results = sp.current_user_top_tracks()  # defaults to the user's top 20 tracks
songs = []  # full track objects
ids = []    # Spotify track IDs, used to request audio features
print(username + "'s Top 20 songs")
print("----------------------------------")
for count, track in enumerate(results['items'], start=1):
    songs.append(track)
    ids.append(track['id'])
    print(str(count) + '. ' + track['name'] + ' - ' + track['artists'][0]['name'])

# Fetch the audio features of the top tracks and tag each row with the user
features = sp.audio_features(ids)
for item in features:
    item.update({"user": username})
user = pd.DataFrame(features)
```

```
jojogrin's Top 20 songs
----------------------------------
1. Every Single Thing - HOMESHAKE
2. Look What You Made Me Do - Taylor Swift
3. thank u, next - Ariana Grande
4. How Long - Charlie Puth
5. God is a woman - Ariana Grande
6. Bonkers - Dizzee Rascal
7. Confident - Demi Lovato
8. Truth Hurts - Lizzo
9. 7 rings - Ariana Grande
10. Can't Fight The Moonlight - LeAnn Rimes
11. J'veux sortir avec un rappeur - Alice et Moi
12. Fitness - Lizzo
13. Bitch Better Have My Money - Rihanna
14. 6 Inch (feat. The Weeknd) - Beyoncé
15. G.O.M.D. - J. Cole
16. 2ON - Bree Runway
17. Havana - Remix - Camila Cabello
18. Burn It Up - HIGH HØØPS
19. break up with your girlfriend, i'm bored - Ariana Grande
20. Talk (feat. Disclosure) - Khalid
```


# Spotify Suggestions
These suggestions are generated by Spotify from the current user's top track. The results differ on every request.

```python
# Ask Spotify for recommendations seeded with the user's top track
print('Spotify Suggestions from %s\'s Top Song %s - %s' % (username, songs[0]['name'], songs[0]['artists'][0]['name']))
print("----------------------------------")
spot_results = sp.recommendations(seed_tracks=[ids[0]])
for track in spot_results['tracks']:
    print(track['name'] + ' - ' + track['artists'][0]['name'])
```

```
Spotify Suggestions from jojogrin's Top Song Every Single Thing - HOMESHAKE
----------------------------------
Czech One - King Krule
All over You - LEISURE
Orpheus Under the Influence - The Buttertones
Suede - NxWorries
How Can I Love You? - Yellow Days
drink i'm sippin on - Yaeji
Candy Wrappers - Summer Salt
Shake - Yeek
Hard To Say Goodbye - Washed Out
Bones - Crumb
Honey Moon - Mac DeMarco
passionfruit - Yaeji
Girl Like You - Toro y Moi
Let It Pass - Jakob Ogawa
Tailwhip - Men I Trust
Panama - Sports
1 4 2 - Inner Wave
Free Room (feat. Appleby) - Ravyn Lenae
Salad Days - Mac DeMarco
You Say I'm in Love - Banes World
```


# Content-based Suggestions
This is our implementation of content-based filtering. It goes through every song in our database (~615 songs) and compares that song's audio features with those of each of the user's top tracks. Whenever the difference in a feature falls within the standard deviation of that feature across the user's top tracks, we add a "point" to the song's like-ability. The audio features are danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. Finally, we normalize the scores and suggest the five songs with the highest like-ability (among those above a normalized threshold of 0.70). We use the standard deviation of each audio feature because each feature is on its own scale, and we only want to add a point when a song is similar to what the user listens to. If a user listens to a wide variety of music, this is reflected in a wide standard deviation for each audio feature.

```python
import time
t0 = time.time()

# user.to_csv(username + "_top.csv", index=True)
data = pd.read_csv('mock_spot_data.csv')

# The eleven audio features we compare
feature_cols = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
                'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']
possible = len(feature_cols)
# Standard deviation of each feature across the user's top tracks (computed once, not per comparison)
user_std = user[feature_cols].std()

scores = []
song_index = []
for topindex, toprow in data.iterrows():
    if toprow['user'] == username:
        continue  # skip the current user's own songs
    score = 0
    for _, row in user.iterrows():
        for f in feature_cols:
            # One point per feature whose difference lies within the user's standard deviation
            if abs(row[f] - toprow[f]) <= user_std[f]:
                score += 1
    scores.append(score / possible)
    song_index.append(topindex)

# Min-max normalize the scores to [0, 1]
min_x = min(scores)
max_x = max(scores)
span = (max_x - min_x) or 1  # avoid dividing by zero when all scores are equal

suggested_songs = []
for index, s in zip(song_index, scores):
    x = (s - min_x) / span
    if x >= 0.70:  # this threshold can be changed!
        suggested_songs.append((index, x))

# Sort by score (ascending), then keep the five best, highest first.
# Slicing also works when fewer than five songs pass the threshold.
suggested_songs.sort(key=lambda pair: pair[1])
top_five = suggested_songs[-5:][::-1]

print('Content-based Suggestions')
print("----------------------------------")
for song_i, _score in top_five:
    track = sp.track(data.iloc[song_i]['uri'])  # one API call per suggested song
    print(track['name'], '-', track['artists'][0]['name'])
total = time.time() - t0
print("Runtime:", total)
```

```
Content-based Suggestions
----------------------------------
Freaking Out - A R I Z O N A
Strings - iann dior
100 Letters - Halsey
Glass House (feat. Naomi Wild) - Machine Gun Kelly
Truth Hurts - Lizzo
Runtime: 101.86498212814331
```
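
The scoring rule used above can be isolated from Spotify and pandas so that it can be checked on toy data. This is a minimal sketch, not our notebook code: `likeability` and `FEATURES` are illustrative names, each track is assumed to be a dict of feature values, and the toy values are made up. `statistics.stdev` computes the sample standard deviation, which matches the default of pandas' `.std()`.

```python
from statistics import stdev

# The eleven Spotify audio features compared by our content-based filter
FEATURES = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
            'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']


def likeability(candidate, top_tracks, features=FEATURES):
    """Score one candidate song against a user's top tracks.

    Each track is a dict mapping feature name -> value. A point is awarded for
    every (top track, feature) pair whose absolute difference is within the
    sample standard deviation of that feature across the top tracks. The point
    total is divided by the number of features, as in the notebook code.
    Requires at least two top tracks (stdev needs two data points).
    """
    spread = {f: stdev(t[f] for t in top_tracks) for f in features}
    points = sum(
        1
        for top in top_tracks
        for f in features
        if abs(top[f] - candidate[f]) <= spread[f]
    )
    return points / len(features)


# Toy data with made-up values, using two features to keep the arithmetic visible
top = [{'energy': 0.8, 'tempo': 120.0},
       {'energy': 0.6, 'tempo': 100.0},
       {'energy': 0.7, 'tempo': 110.0}]
candidate = {'energy': 0.75, 'tempo': 125.0}
print(likeability(candidate, top, features=['energy', 'tempo']))  # 1.5
```

Here the standard deviation is about 0.1 for energy and exactly 10 for tempo, so the candidate earns two energy points and one tempo point: 3 points / 2 features = 1.5.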


## Results

PARTICIPANT # 1

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1| Every Single Thing - HOMESHAKE| Them Changes - Thundercat | yes| Audio (feat. Sia, Diplo, and Labrinth) - Sia | yes |
| #2| Look What You Made Me Do - Taylor Swift | Sticky - Ravyn Lenae| yes| Freaking Out - A R I Z O N A | yes |
| #3| thank u, next - Ariana Grande| Cariño - The Marías | yes | Strings - iann dior | yes |
| #4| How Long - Charlie Puth | Holy Toledo - Vundabar| no| Peer Pressure - James Bay | no|
| #5| God is a woman - Ariana Grande| Darling - Real Estate | no | 100 Letters - Halsey | no |
| Total |  |  | 3/5 |  | 3/5 |

PARTICIPANT # 2

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1| MIDDLE CHILD - J. Cole| Gold Roses (feat. Drake) - Rick Ross | yes| I Got It - Manus | yes |
| #2| Blessings - Big Sean | Paramedic! - SOB X RBE| yes| Calma - Alan Walker Remix - Pedro Capó | no |
| #3| Isis - Joyner Lucas| Last Time That I Checc'd (feat. YG) - Nipsey Hussle | no | Wake Up - Iamjakehill | yes |
| #4| Under The Sun (with J. Cole & Lute feat. DaBaby) - Dreamville | REEL IT IN - Aminé| yes| Ladders - Mac Miller | no|
| #5| Planez - Jeremih| GOKU - Jaden | no | Just Friends - Virginia To Vegas | yes |
| Total |  |  | 3/5 |  | 3/5 |


PARTICIPANT # 3

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1| Every Single Thing - HOMESHAKE| Ugotme - Omar Apollo | no| Stronger Than You - Steven Universe | no |
| #2| Never Would Have Made It - Teyana Taylor| Cool with You - Her's| no| Mean It - Lauv | yes |
| #3| Habit - Blood Cultures Remix - Cool Company| CHANCES - KAYTRANADA | yes | Bambi - Hippo Campus | no |
| #4| Paranoid (feat. B.o.B) - Ty Dolla $ign | Hard To Say Goodbye - Washed Out| yes| All Love - FLETCHER | no|
| #5| Issues/Hold On - Teyana Taylor| Shalala - Moses Gunn Collective | no | Stacy - Quinn XCII | no |
| Total |  |  | 2/5 |  | 1/5 |

PARTICIPANT # 4

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1| All Love - FLETCHER| Another Summer Night Without You - Alexander 23 | no| Hurt Feelings - Mac Miller | no |
| #2| Just Friends - Virginia to Vegas| whywhywhy - MisterWives| no| Stronger Than You - Steven Universe | no |
| #3| Diamond Days - Elephante| Wasted - MKTO| no | Strings - iann dior | no |
| #4| Undrunk - FLETCHER | Champion - Elina| no | Calma - Alan Walker Remix - Pedro Capó | no|
| #5| You've Changed (feat. Angeline) - Vaance| Outnumbered - Dermot Kennedy | no | Stacy - Quinn XCII | yes |
| Total |  |  | 0/5 |  | 1/5 |

PARTICIPANT # 5

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1| Going Bad (feat. Drake) - Meek Mill| Baby (Lil Baby feat. DaBaby) - Quality Control | yes| Stronger Than You - Steven Universe | no |
| #2| Take It All - Adele| Swap Meet - Tyga| yes | Mean It - Lauv | no |
| #3| Amor Eterno - Vicente Fernandez| Die Young - Roddy Rich| yes | Stacy - Quinn XCII | no |
| #4| Nama Look At Me Now - Galantis| Fukk Sleep - A$AP Rocky| yes | IDGAF - Dua Lipa | no|
| #5| No One Mourns The Wicked - Kristin Chenoweth| Creep On Me (feat. French Montana & DJ Snake) - GASHI | yes | All Love - FLETCHER | yes |
| Total |  |  | 5/5 |  | 1/5 |

PARTICIPANT # 6

| Songs |User's Top Tracks|Spotify Generation|Spotify - Would add?|Content-based Generation|Content-based - Would add?|
|-------|-----------------|------------------|--------------------|------------------------|--------------------------|
| #1|IV. Sweatpants - Childish Gambino| Blood On The Leaves - Kanye West  | no| Rise (Sing It Loud) - Caroline Jones  | yes |
| #2|Show Yourself - Idina Menzel| Knock Knock - Mac Miller  | no| Ways to Fake It - CRX   | no |
| #3|Into the Unknown - Idina Menzel| Long Night (feat. Chance the Rapper) - Hoodie Allen | no |Bitch Better Have My Money - Rihanna| yes |
| #4|The Greatest Show - Hugh Jackman |The Show Goes On - Lupe Fiasco| yes | Circles - Post Malone  | yes |
| #5|Penny Rabbit and Summer Bear - Kishi Bashi|Who That Be - Rich Brian| no | All Love - FLETCHER  | no |
| Total |  |  | 1/5 |  | 3/5 |

![Participant results chart](https://github.com/jhgrins/music_generation_ai/blob/master/chart.png?raw=true)

To evaluate our implementation of content-based filtering, we compared its results with Spotify's own recommendations, retrieved through spotipy. In this case, Spotify's suggestions are based on a single seed: the user's top song.

Our content-based filtering compared each song in the database with each of the current user's top 20 songs. For each of the eleven audio features, if the difference fell within the standard deviation of that feature across the user's top songs, one point was awarded. Finally, after normalizing the scores, the five songs with the most points (among those above a normalized threshold of 0.70) were suggested to the user. Spotify's suggestion algorithm is not public; for these experiments, we used Spotify's API to generate five suggested songs from the user's top song.
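
The normalization and selection step can be sketched on its own. In this minimal sketch, `select_suggestions` is a hypothetical helper and the song keys and raw scores are toy values; the 0.70 threshold and the five-song cap follow the notebook code. Min-max normalization maps the lowest raw score to 0 and the highest to 1, so the threshold is relative to the range of scores in the database rather than an absolute similarity.

```python
def select_suggestions(scores, threshold=0.70, k=5):
    """Min-max normalize {song: score} to [0, 1] and keep up to k songs at or above threshold.

    Returns (song, normalized score) pairs, best first.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1  # all scores equal: avoid division by zero (everything normalizes to 0)
    normalized = {song: (s - lo) / span for song, s in scores.items()}
    kept = [(song, x) for song, x in normalized.items() if x >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up raw scores: after normalization a=0, b=1, c=0.875, d=0.5, e=0.75
print(select_suggestions({'a': 2, 'b': 10, 'c': 9, 'd': 6, 'e': 8}))
# [('b', 1.0), ('c', 0.875), ('e', 0.75)]
```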

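For each participant, we generated two suggestion lists: one from Spotify and one from our content-based algorithm. We then had the participants listen to each suggested song and decide whether it was a song they liked ("Would add?" in the tables above). The data we collected did not show a well-defined trend: results varied widely between participants (for example, Participant 5 would add 5/5 of Spotify's suggestions but only 1/5 of ours, while Participant 6 would add 1/5 of Spotify's and 3/5 of ours). Overall, however, Spotify's suggestions tended to be more successful than our implementation's.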

## If we had more time

If we had more time for this project, we would have refined our algorithms and requested access from Spotify to a bigger data set, instead of generating our own database and relying on an additional data set. It would have been interesting to see how small adjustments affect the success of our content-based algorithm. For example, would our algorithm be more successful if it suggested less similar songs, so that users get a more diverse set of suggestions? What if we ranked database songs against each of the user's top tracks separately, instead of summing a single score over all 20 top tracks, so that the user gets a variety of suggestions instead of a generalization?

# Collaborative Filter

Following is our implementation of collaborative filtering. Our design starts from the Million Song Dataset files hosted at static.turi.com: a file of (user, song, listen count) triplets and a file of song metadata. We merge the two into a single DataFrame so the data is easy to manage, then build a large dictionary that stores a profile for each user in the dataset. Each profile is a list of tuples containing a song title and the number of times that user listened to it. This portion of the code is directly below and may take a few minutes to run.

Then, given a specific user to recommend to, we score every other user (a potential partner) in relation to that user. Every time a partner has also listened to one of the user's songs, we increase the partner's score by the number of times the user listened to that song. We weight matches this way because not all songs are created equal: we like some more than others. We then take the highest-scoring partner, look at their five most-played songs, drop any songs the user has already listened to, and fill the gaps with the partner's next most-played songs. Finally, we recommend these five unique songs to our user.

```python
import pandas

triplets_file = 'https://static.turi.com/datasets/millionsong/10000.txt'
songs_metadata_file = 'https://static.turi.com/datasets/millionsong/song_data.csv'

# Listening history: one row per (user, song, listen count) triplet
song_df_1 = pandas.read_csv(triplets_file, header=None, sep='\t')
song_df_1.columns = ['user_id', 'song_id', 'listen_count']
# Song metadata, including each song's title
song_df_2 = pandas.read_csv(songs_metadata_file)

# Attach metadata to each triplet; dropping duplicate song IDs keeps the merge from duplicating rows
song_df = pandas.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on="song_id", how="left")

users = song_df['user_id'].unique()

# Profile per user: list of (song title, listen count) tuples
userDataBanks = {}
for user_id, title, count in zip(song_df['user_id'], song_df['title'], song_df['listen_count']):
    userDataBanks.setdefault(user_id, []).append((title, count))

print("Data pool has been created")
```

The code below performs the collaborative filtering. A user ID is passed into `collaborative_filter`, which scores every other user (the potential partners). We then recommend the top five songs of the highest-scoring partner that are not already in the user's listening history.

```python
def recommend_me(partner, userList, n=5):
    """Return the partner's n most-played songs that are not in userList."""
    # Sort a copy so the partner's stored profile is not mutated
    ranked = sorted(userDataBanks[partner], key=lambda tup: tup[1], reverse=True)
    heard = {title for title, _ in userList}
    # Skip songs the user already knows; the next most-played songs fill the gaps
    return [song for song in ranked if song[0] not in heard][:n]

def collaborative_filter(user):
    """Score every potential partner by shared songs and recommend from the best one."""
    maximumScore = 0
    maximumPartner = user
    for partner in users:
        if partner == user:
            continue
        partnerTitles = {title for title, _ in userDataBanks[partner]}
        # Each shared song adds the number of times *user* listened to it
        score = sum(count for title, count in userDataBanks[user] if title in partnerTitles)
        if score > maximumScore:
            maximumScore = score
            maximumPartner = partner
    return recommend_me(maximumPartner, userDataBanks[user])
```
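
The partner-scoring rule can be tried on toy data without downloading the dataset. This minimal sketch mirrors the logic of `collaborative_filter` and `recommend_me`, but `best_partner`, `recommend`, and the user profiles are hypothetical. It also shows why weighting by play count matters: u3 shares two songs with u1 while u2 shares only one, yet u2 wins because u1 played that shared song five times.

```python
def best_partner(user, profiles):
    """Return (partner, score) for the other user with the highest weighted overlap."""
    best, best_score = None, 0
    for partner, songs in profiles.items():
        if partner == user:
            continue
        partner_titles = {title for title, _ in songs}
        # Each shared song adds the number of times *user* played it
        score = sum(count for title, count in profiles[user] if title in partner_titles)
        if score > best_score:
            best, best_score = partner, score
    return best, best_score


def recommend(user, profiles, n=5):
    """Recommend the partner's most-played songs that `user` has not heard yet."""
    partner, _ = best_partner(user, profiles)
    if partner is None:
        return []  # nobody shares a song with this user
    heard = {title for title, _ in profiles[user]}
    ranked = sorted(profiles[partner], key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in ranked if title not in heard][:n]


# Hypothetical profiles: user -> list of (song title, listen count)
profiles = {
    'u1': [('A', 5), ('B', 1), ('C', 2)],
    'u2': [('A', 3), ('D', 7), ('E', 4)],
    'u3': [('B', 9), ('C', 1), ('F', 2)],
}
print(best_partner('u1', profiles))  # ('u2', 5)
print(recommend('u1', profiles))     # ['D', 'E']
```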

We ran our program on five users from the dataset and tabulated the five songs recommended to each of them. Immediately under our table we have code that displays the users (shortened) and an input box for choosing which user to recommend to; after the code runs (a few seconds), you will get two lists. The first is a list of all the songs the given user has listened to, with the number of times each song was played and the artist name. We determined accuracy by creating a Spotify playlist identical to our user's listening history, then checking whether any of our recommendations appeared among the recommendations Spotify gave for that playlist. We refreshed Spotify's recommendation list two more times, for a total of three recommendation lists.

| # | User ID | Song 1 | Song 2 | Song 3 | Song 4 | Song 5 | Accuracy |
|---|---------|--------|--------|--------|--------|--------|----------|
| 0 | b80344d063b5ccb3212f76538f3d9e43d87dca9e | Save Room          |             Heaven |                                             It Don't Have To Change |                                                      Lesson Learned | Jamaica Roots II(Agora E Sempre) | 1/5      |
| 1 | 85c1f87fea955d09b4bec2e36aee110927aedf9a | Puppets            |  The Best of Times |                                                     I Need A Dollar |                                               Guerrilla Monsoon Rap | Kiss (LP Version)                | 0/5      |
| 2 | a955513fb89fdb0d5e8437a5cf8a9b3a0abad4d5 | Master Of Puppets  |            Majesty | Horn Concerto No. 4 in E flat K495: II. Romance (Andante cantabile) | I Need A Girl (Part One) (Featuring Usher & Loon) (Amended Version) | Piggy                            | 0/5      |
| 3 | 276e43ad698705e5011e5091f367d951b21246f5 | Make Her Say       | Ride The Lightning |                                                        You Gotta Be |                                                 Oh_ The Devastation | Let's Go [from 'Salvation']      | 2/5      |
| 4 | d8bfd4ec88f0f3773a9e022e3c1a0f1d3b7b6a92 | Beautiful Stranger |       Day 'N' Nite |                                                           Alejandro |                                      Dog Days Are Over (Radio Edit) | Around The World (Radio Edit)    | 3/5      |
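
The accuracy column can be computed as a set-membership count. This is a minimal sketch under our reading of the procedure above: a recommendation counts as a match if it appears in any of the three Spotify recommendation lists, and songs are compared by exact title. `playlist_accuracy` and the song titles are hypothetical.

```python
def playlist_accuracy(our_recs, spotify_lists):
    """Count how many of our recommendations appear in at least one Spotify list.

    Songs are compared by exact title, so spelling differences count as misses.
    """
    suggested_by_spotify = set().union(*spotify_lists)
    matches = sum(1 for song in our_recs if song in suggested_by_spotify)
    return matches, len(our_recs)


# Hypothetical titles: five of our recommendations vs. three refreshed Spotify lists
ours = ['Song A', 'Song B', 'Song C', 'Song D', 'Song E']
spotify = [['Song B', 'Song X'], ['Song Y', 'Song E'], ['Song Z']]
matches, total = playlist_accuracy(ours, spotify)
print(f"{matches}/{total}")  # 2/5
```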