# alexrainbirdMusic Spotify Recommender System (Part 3 of 3) 

The goal of this project is to build a recommender system that recommends songs similar to those found on user alexrainbirdMusic's playlists.

Knowing which songs to add to alexrainbirdMusic's playlists is extremely important, since his profitability is tied to the number of views his playlist videos get on YouTube. alexrainbirdMusic has over 1.2M subscribers on YouTube and over 116k followers on Spotify. The more playlists he creates that fit his goal of bringing the "finest independent pop, folk and rock music" to his followers, the more views and profit he will get.

Currently, artists submit their songs on alexrainbirdMusic's website, and he reviews each song to decide whether it should be added to a playlist. This is a manual and tedious way to select songs.

In the previous 2 notebooks: 
- **Get Data**
- **Exploratory Data Analysis**

I gathered playlist, artist, and track data from alexrainbirdMusic's playlists, cleaned it, and conducted exploratory data analysis to find patterns within his playlists.

This notebook details the preparation of the data and the creation of the **recommender system using content-based filtering and cosine similarity.** The result is a recommender system that recommends songs similar to those found on alexrainbirdMusic's playlists.

Using this recommender system, alexrainbirdMusic can generate recommendations for songs to incorporate onto his playlists. 

## Structure: 
**0. Download & Import Packages**: Download and import relevant packages. 

**1. Recommender Model Prep**: Create functions used to prep the data for modeling. 

**2. Create Feature Sets**: Bring all of the functions together to create the features used in the recommender system, and concatenate them into a single DataFrame.

**3. Convert Features to Vector**: Sum the features of the songs that also appear on alexrainbirdMusic's playlists into a single summary vector, and set aside the remaining songs as recommendation candidates.

**4. Calculate Cosine Similarity & Generate Recommendations:** Calculate the cosine similarity between the summary vector (built from the songs that appear on both the non-alexrainbirdMusic playlists and the alexrainbirdMusic playlists) and each remaining non-alexrainbirdMusic track, then recommend the most similar tracks.


# Credits
This notebook is built on top of Eric Chang's: https://github.com/enjuichang/PracticalDataScience-ENCA/tree/main/notebooks.

# 0. Download & Import Packages


In [1]:
import pandas as pd
from textblob import TextBlob, Word, Blobber
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# 1. Recommender Model Prep
- Remove Duplicates
- Create Genres List from Artist Genres
- Get Relevant Columns
- Get Subjectivity and Polarity of Track Names
- Perform One-Hot Encoding for Categorical Features
- Normalize Data
- Combine Preprocessing Functions

In [2]:
model_data_arb = pd.read_csv('model_data_arb.csv')
model_data_not_arb = pd.read_csv('model_data_not_arb.csv')

In [3]:
model_data_arb.head()

Unnamed: 0,artist,artist_id,artist_pop,artist_genres,album,album_id,track_name,track_id,track_pop,danceability,...,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist,user,artists_song
0,SYML,6AyATGg7mDgBlZ4N5uNog0,74.0,[pop],Chariot,3R9K985Qq28VqqzeV65bZ0,Chariot,1ZHboJHdP97xyCaQWuP9h6,50.0,0.516,...,0.00328,0.0202,0.0937,0.556,168.027,200693.0,4.0,Indie/Rock/Alt Compilation - February 2023 (al...,alexrainbirdMusic,SYMLChariot
1,Various Artists,0LyfQWJT6nXafLPZqxe9Of,0.0,[unknown],Indie / Rock / Alt Compilation: March 2023,2jYAyFnU1j39P7dQ5vhE7u,Embrace It,6bGS1VVUZk5MyQ9pYw2a8Q,11.0,0.572,...,0.000924,0.0,0.0504,0.444,128.988,206047.0,4.0,Indie/Rock/Alt Compilation - February 2023 (al...,alexrainbirdMusic,Various ArtistsEmbrace It
2,Boo Seeka,1SFz3S9eSUTc49ysstadiO,53.0,"[aussietronica, australian indie, australian pop]",Stories,5C6jaEd7bW1pHkpEsXmAO7,Stories,0CWDijxzzTvMlp4c4U2LIS,46.0,0.815,...,0.00962,8.8e-05,0.115,0.949,127.981,164074.0,4.0,Indie/Rock/Alt Compilation - February 2023 (al...,alexrainbirdMusic,Boo SeekaStories
3,Quiet Houses,6oeIyvCenamQzsTMYnuZTC,31.0,[unknown],Hot and Clumsy,4mVZsDCMoexn0hrJxFy9f6,Hot and Clumsy,50aFetaKwC3pKTLyUWh7UZ,34.0,0.48,...,0.007,0.0133,0.31,0.782,160.036,271600.0,4.0,Indie/Rock/Alt Compilation - February 2023 (al...,alexrainbirdMusic,Quiet HousesHot and Clumsy
4,Juliana Madrid,6RhkgeqhRai3jy4ULSlxFx,31.0,[unknown],Madonna,2uNNQws3ix4ZyVxueGd2un,Madonna,1eCdsyHZVqHSi5jhwc4TLs,35.0,0.618,...,0.529,0.0,0.0695,0.475,104.983,192133.0,4.0,Indie/Rock/Alt Compilation - February 2023 (al...,alexrainbirdMusic,Juliana MadridMadonna


In [4]:
model_data_not_arb.head()

Unnamed: 0,artist,artist_id,artist_pop,artist_genres,album,album_id,track_name,track_id,track_pop,danceability,...,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist,user,artists_song
0,Tigercub,6ekYAO2D1JkI58CF4uRRqw,47,"[brighton indie, modern alternative rock, mode...",Play My Favourite Song,48kUWLzmZrvwHqYrpOkBat,Play My Favourite Song,4uBs8miGwBykRYjrbAO5kV,42,0.509,...,0.00197,0.00633,0.265,0.523,95.961,155603,4,New Alt-Rock Mixtape,Spotify,TigercubPlay My Favourite Song
1,Crawlers,2xtmoxSauQs0TQFUoHmbfy,52,"[indie pop, warrington indie]",Crawlers - EP,4wePwIhGnXxJ3tRFAusMAE,Come Over (Again),4PDJDIdWxNN1AlnbrKkoPf,60,0.498,...,0.00393,0.0,0.0675,0.235,93.018,254419,4,New Alt-Rock Mixtape,Spotify,CrawlersCome Over (Again)
2,The Luka State,6DaXEbr3LdLNcui8pZf6AF,44,"[english indie rock, modern alternative rock, ...",More Than This,4xENjLbcy7IdEHA3JQzYRx,More Than This,1F3VhVtaMqUqKhXdpA3itF,36,0.443,...,2.6e-05,0.00332,0.298,0.333,112.307,182100,4,New Alt-Rock Mixtape,Spotify,The Luka StateMore Than This
3,The Backseat Lovers,6p2HnfM955TI1bX34dkLnI,70,"[indie pop, modern rock, slc indie]",When We Were Friends,3TSMSh5dai7WEnEGOoMXBZ,Kilby Girl,1170VohRSx6GwE6QDCHPPH,73,0.329,...,0.0578,0.0352,0.113,0.225,162.279,282206,4,New Alt-Rock Mixtape,Spotify,The Backseat LoversKilby Girl
4,Fleshwater,6P5ccCJCe8A4s9tDSTNFzF,52,"[dreamo, grungegaze]",We're Not Here to Be Loved,0hm7PiBu72tRliLqLfiKy1,Kiss the Ladder,41QBT1Al5RQ9u9UIHOuXnj,56,0.122,...,9e-06,0.000324,0.403,0.25,174.248,77467,4,New Alt-Rock Mixtape,Spotify,FleshwaterKiss the Ladder


In [5]:
# convert artist genres to list
#model_data_arb['artist_genres_list'] = model_data_arb.artist_genres.str[1:-1].str.split(',').tolist()
#model_data_not_arb['artist_genres_list'] = model_data_not_arb.artist_genres.str[1:-1].str.split(',').tolist()

## 1a. Remove Duplicates

Drop duplicates based on the `artists_song` column, which concatenates the "artist" and "track_name" strings.

In [6]:
model_data_arb = model_data_arb.drop_duplicates('artists_song')
model_data_not_arb = model_data_not_arb.drop_duplicates('artists_song')
songDF = model_data_not_arb
songDF.head()

Unnamed: 0,artist,artist_id,artist_pop,artist_genres,album,album_id,track_name,track_id,track_pop,danceability,...,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist,user,artists_song
0,Tigercub,6ekYAO2D1JkI58CF4uRRqw,47,"[brighton indie, modern alternative rock, mode...",Play My Favourite Song,48kUWLzmZrvwHqYrpOkBat,Play My Favourite Song,4uBs8miGwBykRYjrbAO5kV,42,0.509,...,0.00197,0.00633,0.265,0.523,95.961,155603,4,New Alt-Rock Mixtape,Spotify,TigercubPlay My Favourite Song
1,Crawlers,2xtmoxSauQs0TQFUoHmbfy,52,"[indie pop, warrington indie]",Crawlers - EP,4wePwIhGnXxJ3tRFAusMAE,Come Over (Again),4PDJDIdWxNN1AlnbrKkoPf,60,0.498,...,0.00393,0.0,0.0675,0.235,93.018,254419,4,New Alt-Rock Mixtape,Spotify,CrawlersCome Over (Again)
2,The Luka State,6DaXEbr3LdLNcui8pZf6AF,44,"[english indie rock, modern alternative rock, ...",More Than This,4xENjLbcy7IdEHA3JQzYRx,More Than This,1F3VhVtaMqUqKhXdpA3itF,36,0.443,...,2.6e-05,0.00332,0.298,0.333,112.307,182100,4,New Alt-Rock Mixtape,Spotify,The Luka StateMore Than This
3,The Backseat Lovers,6p2HnfM955TI1bX34dkLnI,70,"[indie pop, modern rock, slc indie]",When We Were Friends,3TSMSh5dai7WEnEGOoMXBZ,Kilby Girl,1170VohRSx6GwE6QDCHPPH,73,0.329,...,0.0578,0.0352,0.113,0.225,162.279,282206,4,New Alt-Rock Mixtape,Spotify,The Backseat LoversKilby Girl
4,Fleshwater,6P5ccCJCe8A4s9tDSTNFzF,52,"[dreamo, grungegaze]",We're Not Here to Be Loved,0hm7PiBu72tRliLqLfiKy1,Kiss the Ladder,41QBT1Al5RQ9u9UIHOuXnj,56,0.122,...,9e-06,0.000324,0.403,0.25,174.248,77467,4,New Alt-Rock Mixtape,Spotify,FleshwaterKiss the Ladder


## 1b. Create Genres List from Artist Genres

Convert "artist_genres" to a list.
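
The `artist_genres` column is stored as a string such as "[indie pop, warrington indie]". Splitting it on a space, as `genre_preprocess` does, breaks multi-word genres apart and keeps the brackets and commas, which is what the `artist_genres_list` output further down shows. A minimal sketch of a parser that keeps each genre whole, assuming the unquoted, comma-separated format seen in these CSV files (`parse_genres` is a hypothetical helper, not part of the notebook):

```python
def parse_genres(raw):
    """Parse "[indie pop, warrington indie]" into ["indie pop", "warrington indie"].

    Hypothetical helper; assumes square brackets around comma-separated,
    unquoted genre names, as in the CSV files used here.
    """
    inner = raw.strip().strip("[]")
    return [genre.strip() for genre in inner.split(",") if genre.strip()]


raw = "[indie pop, warrington indie]"
print(raw.split(" "))             # ['[indie', 'pop,', 'warrington', 'indie]']
print(parse_genres(raw))          # ['indie pop', 'warrington indie']
print(parse_genres("[unknown]"))  # ['unknown']
```

With pandas, this could be applied as `df['artist_genres'].apply(parse_genres)`. Note that `artist_genres_list` is not used by `create_feature_set`, which works on the raw `artist_genres` string.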

In [7]:
def genre_preprocess(df):
    '''
    Preprocess the genre data
    '''
    df['artist_genres_list'] = df['artist_genres'].apply(lambda x: x.split(" "))
    return df


## 1c. Get Relevant Columns 
Select the columns for the recommender system.

In [8]:
# Select useful columns
def select_cols(df):
    '''
    Select the columns used by the recommender system.
    Returns a copy so that later column assignments (e.g. in sentiment_analysis)
    do not trigger pandas' SettingWithCopyWarning.
    '''
    return df[['artist', 'artist_id', 'artist_pop', 'artist_genres', 'artist_genres_list', 'album',
               'track_name', 'track_id', 'track_pop', 'danceability', 'energy', 'key',
               'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness',
               'liveness', 'valence', 'tempo']].copy()

## 1d. Get Subjectivity and Polarity of Track Names 

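Using the TextBlob package, perform sentiment analysis on the track names. TextBlob returns a subjectivity score in [0, 1] and a polarity score in [-1, 1], which `getAnalysis` buckets as follows:

- **Subjectivity**
    - Score < 1/3 = Low
    - Score > 1/3 = High
    - Score exactly 1/3 = Medium
- **Polarity**
    - Score < 0 = Negative
    - Score > 0 = Positive
    - Score = 0 = Neutral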
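
In `getAnalysis` below, Medium only covers the single value 1/3, so a track with a subjectivity of 0.5 is labelled High. The sketch below reproduces the notebook's subjectivity thresholds in plain Python and, for comparison, a hypothetical alternative that splits TextBlob's [0, 1] range into three equal-width bins. The alternative is only an illustration; it is not what the notebook uses.

```python
def notebook_subjectivity(score):
    # Same thresholds as getAnalysis(score, "subjectivity") in this notebook.
    if score < 1/3:
        return "low"
    elif score > 1/3:
        return "high"
    return "medium"


def thirds_subjectivity(score):
    # Hypothetical alternative: three equal-width bins over [0, 1].
    if score < 1/3:
        return "low"
    elif score <= 2/3:
        return "medium"
    return "high"


scores = [0.0, 0.25, 0.5, 0.9]
print([notebook_subjectivity(s) for s in scores])  # ['low', 'low', 'high', 'high']
print([thirds_subjectivity(s) for s in scores])    # ['low', 'low', 'medium', 'high']
```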

In [9]:
from textblob import TextBlob, Word, Blobber

def getSubjectivity(text):
    '''
    Getting the Subjectivity using TextBlob
    '''
    return TextBlob(text).sentiment.subjectivity 

def getPolarity(text):
    '''
    Getting the Polarity  using TextBlob
    '''
    return TextBlob(text).sentiment.polarity

def getAnalysis(score, task="polarity"):
    '''
      Categorizing the Polarity & Subjectivity score
    '''
    if task == "subjectivity":
        if score < 1/3:
            return "low"
        elif score >1/3:
            return "high"
        else:
            return "medium"
    else:
        if score <0:
            return "negative"
        elif score == 0:
            return "neutral"
        else:
            return "positive"

        
def sentiment_analysis(df, text_col):
    '''
    Perform sentiment analysis on text
    ---
    Input:
    df (pandas dataframe): Dataframe of interest
    text_col (str): column of interest
    '''
    df['subjectivity'] = df[text_col].apply(getSubjectivity).apply(lambda x: getAnalysis(x,"subjectivity"))
    df['polarity'] = df[text_col].apply(getPolarity).apply(getAnalysis)
    return df

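## 1e. Perform One-Hot Encoding for Categorical Features

The categorical columns (subjectivity, polarity, key, and mode) are one-hot encoded into dummy variables, one per category, named `column|value` (e.g. `key|8`). Artist genres are handled separately with TF-IDF in Section 2, which produces columns such as `genre|indie`.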
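
As a dependency-free sketch of what `ohe_prep(df, 'key', 'key') * 0.5` produces (the `one_hot` helper and its inputs are hypothetical), each category becomes a `prefix|value` column that holds the weight for matching rows and 0 elsewhere. This is consistent with the first row of `complete_feature_set` further down, where a track in key 8 has `key|8` = 0.5.

```python
def one_hot(values, prefix, weight=1.0):
    """Weighted one-hot encoding with 'prefix|value' column names.

    Hypothetical helper mirroring ohe_prep(df, column, prefix) * weight.
    """
    categories = sorted(set(values))
    columns = [f"{prefix}|{c}" for c in categories]
    rows = [[weight if v == c else 0.0 for c in categories] for v in values]
    return columns, rows


columns, rows = one_hot([8, 2, 0, 8], "key", weight=0.5)
print(columns)  # ['key|0', 'key|2', 'key|8']
print(rows[0])  # [0.0, 0.0, 0.5]
```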

In [10]:
def ohe_prep(df, column, new_name): 
    ''' 
    Create One Hot Encoded features of a specific column
    ---
    Input: 
    df (pandas dataframe): Spotify Dataframe
    column (str): Column to be processed
    new_name (str): new column name to be used
        
    Output: 
    tf_df: One-hot encoded features 
    '''
    
    tf_df = pd.get_dummies(df[column])
    feature_names = tf_df.columns
    tf_df.columns = [new_name + "|" + str(i) for i in feature_names]
    tf_df.reset_index(drop = True, inplace = True)    
    return tf_df


## 1f. Normalize 

Since some of the columns are on different scales, use `MinMaxScaler` to rescale each column to the range [0, 1]. Key features to scale are:
- Artist Popularity
- Track Popularity
- Key

In [11]:
songDF.artist_pop.describe()

count    807.000000
mean      44.371747
std       21.216492
min        0.000000
25%       30.000000
50%       47.000000
75%       60.000000
max       99.000000
Name: artist_pop, dtype: float64

In [12]:
songDF.track_pop.describe()

count    807.000000
mean      33.479554
std       19.582345
min        0.000000
25%       19.000000
50%       36.000000
75%       47.000000
max       86.000000
Name: track_pop, dtype: float64

In [13]:
songDF.key.describe()

count    807.000000
mean       5.350682
std        3.644867
min        0.000000
25%        2.000000
50%        6.000000
75%        9.000000
max       11.000000
Name: key, dtype: float64

In [14]:
from sklearn.preprocessing import MinMaxScaler

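def scale(df, column_to_scale):
    '''
    Min-max scale a single column to [0, 1] and return it as a one-column DataFrame
    with a fresh 0..n-1 index, so it can be concatenated with the other features.
    '''
    col = df[[column_to_scale]].reset_index(drop = True)
    scaler = MinMaxScaler()
    col_scaled = pd.DataFrame(scaler.fit_transform(col), columns = col.columns)
    return col_scaled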

## 1g. Combine Preprocessing Functions

In [15]:
def playlist_preprocess(df):
    '''
    Preprocess imported playlist
    '''
    df = genre_preprocess(df)
    df = select_cols(df)

    return df

### Playlist (Non-alexrainbirdMusic)

In [16]:
songDF = playlist_preprocess(songDF)
songDF.head()

Unnamed: 0,artist,artist_id,artist_pop,artist_genres,artist_genres_list,album,track_name,track_id,track_pop,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,Tigercub,6ekYAO2D1JkI58CF4uRRqw,47,"[brighton indie, modern alternative rock, mode...","[[brighton, indie,, modern, alternative, rock,...",Play My Favourite Song,Play My Favourite Song,4uBs8miGwBykRYjrbAO5kV,42,0.509,0.767,8,-5.686,1,0.0579,0.00197,0.00633,0.265,0.523,95.961
1,Crawlers,2xtmoxSauQs0TQFUoHmbfy,52,"[indie pop, warrington indie]","[[indie, pop,, warrington, indie]]",Crawlers - EP,Come Over (Again),4PDJDIdWxNN1AlnbrKkoPf,60,0.498,0.713,2,-7.738,1,0.0278,0.00393,0.0,0.0675,0.235,93.018
2,The Luka State,6DaXEbr3LdLNcui8pZf6AF,44,"[english indie rock, modern alternative rock, ...","[[english, indie, rock,, modern, alternative, ...",More Than This,More Than This,1F3VhVtaMqUqKhXdpA3itF,36,0.443,0.979,0,-1.341,0,0.13,2.6e-05,0.00332,0.298,0.333,112.307
3,The Backseat Lovers,6p2HnfM955TI1bX34dkLnI,70,"[indie pop, modern rock, slc indie]","[[indie, pop,, modern, rock,, slc, indie]]",When We Were Friends,Kilby Girl,1170VohRSx6GwE6QDCHPPH,73,0.329,0.444,1,-9.973,1,0.0417,0.0578,0.0352,0.113,0.225,162.279
4,Fleshwater,6P5ccCJCe8A4s9tDSTNFzF,52,"[dreamo, grungegaze]","[[dreamo,, grungegaze]]",We're Not Here to Be Loved,Kiss the Ladder,41QBT1Al5RQ9u9UIHOuXnj,56,0.122,0.988,4,-3.41,1,0.135,9e-06,0.000324,0.403,0.25,174.248


### Playlist Test (alexrainbirdMusic)

In [17]:
model_data_arb['artist_genres'] = model_data_arb['artist_genres'].astype(str)
model_data_arb['track_name'] = model_data_arb['track_name'].astype(str)

model_data_arb = playlist_preprocess(model_data_arb)
model_data_arb.head()

Unnamed: 0,artist,artist_id,artist_pop,artist_genres,artist_genres_list,album,track_name,track_id,track_pop,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,SYML,6AyATGg7mDgBlZ4N5uNog0,74.0,[pop],[[pop]],Chariot,Chariot,1ZHboJHdP97xyCaQWuP9h6,50.0,0.516,0.915,2.0,-5.846,1.0,0.0545,0.00328,0.0202,0.0937,0.556,168.027
1,Various Artists,0LyfQWJT6nXafLPZqxe9Of,0.0,[unknown],[[unknown]],Indie / Rock / Alt Compilation: March 2023,Embrace It,6bGS1VVUZk5MyQ9pYw2a8Q,11.0,0.572,0.885,2.0,-5.913,1.0,0.0473,0.000924,0.0,0.0504,0.444,128.988
2,Boo Seeka,1SFz3S9eSUTc49ysstadiO,53.0,"[aussietronica, australian indie, australian pop]","[[aussietronica,, australian, indie,, australi...",Stories,Stories,0CWDijxzzTvMlp4c4U2LIS,46.0,0.815,0.704,6.0,-5.061,0.0,0.0453,0.00962,8.8e-05,0.115,0.949,127.981
3,Quiet Houses,6oeIyvCenamQzsTMYnuZTC,31.0,[unknown],[[unknown]],Hot and Clumsy,Hot and Clumsy,50aFetaKwC3pKTLyUWh7UZ,34.0,0.48,0.706,3.0,-9.73,1.0,0.0395,0.007,0.0133,0.31,0.782,160.036
4,Juliana Madrid,6RhkgeqhRai3jy4ULSlxFx,31.0,[unknown],[[unknown]],Madonna,Madonna,1eCdsyHZVqHSi5jhwc4TLs,35.0,0.618,0.763,5.0,-6.412,0.0,0.0347,0.529,0.0,0.0695,0.475,104.983


# 2. Create Feature Sets

Bring all of the functions together to create the features used in the recommender system, and concatenate them into a single DataFrame.

In [18]:
from sklearn.feature_extraction.text import TfidfVectorizer

def create_feature_set(df, float_cols):
    '''
    Process spotify df to create a final set of features that will be used to generate recommendations
    ---
    Input: 
    df (pandas dataframe): Spotify Dataframe
    float_cols (list(str)): List of float columns that will be scaled
            
    Output: 
    final (pandas dataframe): Final set of features 
    '''
    
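    # TF-IDF weights for the words in the genre strings
    tfidf = TfidfVectorizer()
    tfidf_matrix = tfidf.fit_transform(df['artist_genres'])
    genre_df = pd.DataFrame(tfidf_matrix.toarray())
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    genre_df.columns = ['genre' + "|" + i for i in tfidf.get_feature_names_out()]
    # NOTE: drop() returns a new DataFrame and the result is not assigned, so 'genre|unknown'
    # is NOT actually removed; to remove it, use genre_df = genre_df.drop(columns='genre|unknown')
    genre_df.drop(columns='genre|unknown')
    genre_df.reset_index(drop = True, inplace=True)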
    
    # Sentiment analysis
    df = sentiment_analysis(df, "track_name")

    # One-hot Encoding
    subject_ohe = ohe_prep(df, 'subjectivity','subject') * 0.3
    polar_ohe = ohe_prep(df, 'polarity','polar') * 0.5
    key_ohe = ohe_prep(df, 'key','key') * 0.5
    mode_ohe = ohe_prep(df, 'mode','mode') * 0.5

    # Normalization
    # Scale popularity columns
    #pop = df[["artist_pop","track_pop"]].reset_index(drop = True)
    #scaler = MinMaxScaler()
    #pop_scaled = pd.DataFrame(scaler.fit_transform(pop), columns = pop.columns) * 0.2 
    artist_pop_scaled = scale(df, "artist_pop")
    track_pop_scaled = scale(df, "track_pop")
    key_scaled = scale(df, "key")                                   
    

    # Scale audio columns
    floats = df[float_cols].reset_index(drop = True)
    scaler = MinMaxScaler()
    floats_scaled = pd.DataFrame(scaler.fit_transform(floats), columns = floats.columns) * 0.2

    # Concatenate all features
    final = pd.concat([genre_df, floats_scaled, artist_pop_scaled, track_pop_scaled, key_scaled, subject_ohe, polar_ohe, key_ohe, mode_ohe], axis = 1)
    
    # Add song id
    final['id']=df['track_id'].values
    #final['track_name']=df['track_name'].values

    return final
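
`TfidfVectorizer()` with default settings lowercases the text and keeps only tokens of two or more word characters (the default `token_pattern` is `r"(?u)\b\w\w+\b"`). Multi-word genre names are therefore split into individual words and the brackets and commas are discarded, which is why the vocabulary contains columns such as `genre|ann` and `genre|arbor` rather than whole genre names. A small sketch reproducing only that default tokenization step with the standard `re` module (it does not compute the TF-IDF weights):

```python
import re

# Default TfidfVectorizer tokenization: lowercase, then keep runs of 2+ word characters.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def default_tokens(text):
    """Return the tokens TfidfVectorizer() would extract from one genre string by default."""
    return DEFAULT_TOKEN_PATTERN.findall(text.lower())


print(default_tokens("[indie pop, warrington indie]"))
# ['indie', 'pop', 'warrington', 'indie']
print(default_tokens("[brighton indie, modern alternative rock]"))
# ['brighton', 'indie', 'modern', 'alternative', 'rock']
```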

### Playlist Features (Non-alexrainbirdMusic)

In [19]:
# Use the float64 columns (the continuous audio features) as the columns to scale
float_cols = songDF.dtypes[songDF.dtypes == 'float64'].index.values
#songDF.to_csv("../data/allsong_data.csv", index = False)

# Generate features
complete_feature_set = create_feature_set(songDF, float_cols=float_cols)
#complete_feature_set.to_csv("../data/complete_feature.csv", index = False)
complete_feature_set.head()



Unnamed: 0,genre|5th,genre|acoustic,genre|action,genre|aesthetic,genre|african,genre|alaska,genre|alt,genre|alternative,genre|ambient,genre|american,...,key|5,key|6,key|7,key|8,key|9,key|10,key|11,mode|0,mode|1,id
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272293,0.0,0.0,...,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,4uBs8miGwBykRYjrbAO5kV
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,4PDJDIdWxNN1AlnbrKkoPf
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.237504,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,1F3VhVtaMqUqKhXdpA3itF
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1170VohRSx6GwE6QDCHPPH
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,41QBT1Al5RQ9u9UIHOuXnj


### Playlist Features (alexrainbirdMusic)

In [20]:
playlistDF_test  = create_feature_set(model_data_arb, float_cols=float_cols)
playlistDF_test.head()



Unnamed: 0,genre|5th,genre|aarhus,genre|aberdeen,genre|acoriana,genre|acoustic,genre|action,genre|adelaide,genre|aesthetic,genre|african,genre|alabama,...,key|5.0,key|6.0,key|7.0,key|8.0,key|9.0,key|10.0,key|11.0,mode|0.0,mode|1.0,id
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1ZHboJHdP97xyCaQWuP9h6
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,6bGS1VVUZk5MyQ9pYw2a8Q
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0CWDijxzzTvMlp4c4U2LIS
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,50aFetaKwC3pKTLyUWh7UZ
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,1eCdsyHZVqHSi5jhwc4TLs


In [21]:
playlistDF_test.columns.tolist()

['genre|5th',
 'genre|aarhus',
 'genre|aberdeen',
 'genre|acoriana',
 'genre|acoustic',
 'genre|action',
 'genre|adelaide',
 'genre|aesthetic',
 'genre|african',
 'genre|alabama',
 'genre|alaska',
 'genre|albany',
 'genre|alt',
 'genre|alternative',
 'genre|ambient',
 'genre|american',
 'genre|americana',
 'genre|and',
 'genre|anglia',
 'genre|ann',
 'genre|anthem',
 'genre|anti',
 'genre|arbor',
 'genre|area',
 'genre|arkansas',
 'genre|art',
 'genre|asbury',
 'genre|asheville',
 'genre|athens',
 'genre|atlanta',
 'genre|aussie',
 'genre|aussietronica',
 'genre|austin',
 'genre|austindie',
 'genre|australian',
 'genre|austrian',
 'genre|bahai',
 'genre|baltimore',
 'genre|band',
 'genre|barbadian',
 'genre|bass',
 'genre|bath',
 'genre|bay',
 'genre|bc',
 'genre|beatlesque',
 'genre|bedroom',
 'genre|belfast',
 'genre|belgian',
 'genre|bergen',
 'genre|birmingham',
 'genre|black',
 'genre|blues',
 'genre|bossbeat',
 'genre|boston',
 'genre|brighton',
 'genre|brisbane',
 'genre|bristol

# 3. Convert Features to Vector

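- Sum the feature vectors of the songs that appear on both the non-alexrainbirdMusic playlists and the alexrainbirdMusic playlists into one summary vector
- Set aside the remaining songs as the candidates whose cosine similarity to the summary vector is computed in Section 4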
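
The summary vector is a column-wise sum, and the playlist membership test mirrors `isin`. A plain-Python sketch with hypothetical track ids and three-dimensional features (the real vectors have 332 columns):

```python
def summarize_playlist(features_by_id, playlist_ids):
    """Sum the vectors of tracks in playlist_ids; return (summary, remaining candidates)."""
    wanted = set(playlist_ids)
    dim = len(next(iter(features_by_id.values())))
    summary = [0.0] * dim
    candidates = {}
    for track_id, vector in features_by_id.items():
        if track_id in wanted:
            # Column-wise sum, like complete_feature_set_playlist.sum(axis=0)
            summary = [s + x for s, x in zip(summary, vector)]
        else:
            candidates[track_id] = vector
    return summary, candidates


# Hypothetical features: [genre|indie, valence, mode|1]
features = {
    "a": [1.0, 0.25, 0.5],
    "b": [0.0, 0.5, 0.5],
    "c": [1.0, 0.125, 0.0],
}
summary, candidates = summarize_playlist(features, ["a", "b", "not-in-features"])
print(summary)           # [1.0, 0.75, 1.0]
print(list(candidates))  # ['c']
```

Because every song contributes 0.5 to exactly one of `mode|0` or `mode|1`, the totals printed below (35.0 + 75.5 = 110.5) imply that 221 songs were summed into the playlist vector.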

In [22]:
def generate_playlist_feature(complete_feature_set, playlist_df):
    '''
    Summarize a user's playlist into a single vector
    ---
    Input: 
    complete_feature_set (pandas dataframe): Dataframe which includes all of the features for the spotify songs
    playlist_df (pandas dataframe): playlist dataframe
        
    Output: 
    complete_feature_set_playlist_final (pandas series): single vector (sum of the song features) that summarizes the playlist
    complete_feature_set_nonplaylist (pandas dataframe): features of the songs that are not in the playlist
    '''
    
    # Find song features in the playlist
    complete_feature_set_playlist = complete_feature_set[complete_feature_set['id'].isin(playlist_df['id'].values)]
    # Find all non-playlist song features
    complete_feature_set_nonplaylist = complete_feature_set[~complete_feature_set['id'].isin(playlist_df['id'].values)]
    complete_feature_set_playlist_final = complete_feature_set_playlist.drop(columns = "id")
    return complete_feature_set_playlist_final.sum(axis = 0), complete_feature_set_nonplaylist



In [23]:
# Generate the features
complete_feature_set_playlist_vector, complete_feature_set_nonplaylist = generate_playlist_feature(complete_feature_set, playlistDF_test)

In [24]:
complete_feature_set_playlist_vector

genre|5th           0.975451
genre|acoustic      1.398599
genre|action        0.764885
genre|aesthetic     0.281772
genre|african       0.000000
                     ...    
key|9              12.500000
key|10              5.000000
key|11              7.500000
mode|0             35.000000
mode|1             75.500000
Length: 332, dtype: float64

In [25]:
complete_feature_set_nonplaylist.head()

Unnamed: 0,genre|5th,genre|acoustic,genre|action,genre|aesthetic,genre|african,genre|alaska,genre|alt,genre|alternative,genre|ambient,genre|american,...,key|5,key|6,key|7,key|8,key|9,key|10,key|11,mode|0,mode|1,id
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272293,0.0,0.0,...,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,4uBs8miGwBykRYjrbAO5kV
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,4PDJDIdWxNN1AlnbrKkoPf
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.237504,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,1F3VhVtaMqUqKhXdpA3itF
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1170VohRSx6GwE6QDCHPPH
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,41QBT1Al5RQ9u9UIHOuXnj


# 4. Calculate Cosine Similarity & Generate Recommendations

Calculate the cosine similarity between the summary vector (built from the songs that appear on both the non-alexrainbirdMusic playlists and the alexrainbirdMusic playlists) and each remaining non-alexrainbirdMusic track, then recommend the most similar tracks.
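
Cosine similarity compares the direction of two vectors, cos(u, v) = u·v / (‖u‖ ‖v‖), so scaling the playlist vector (for example, summing instead of averaging the songs) does not change the ranking. A plain-Python sketch of the scoring and ranking step in `generate_playlist_recos`, using hypothetical candidate ids and vectors; this sketch returns a score of 0 when either vector is all zeros:

```python
import math


def cosine(u, v):
    """cos(u, v) = u.v / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(summary, candidates, k):
    """Rank candidate vectors by cosine similarity to the summary vector."""
    scored = [(track_id, cosine(vec, summary)) for track_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


summary = [2.0, 1.0, 0.0]  # hypothetical playlist vector
candidates = {"x": [1.0, 0.5, 0.0], "y": [0.0, 0.0, 1.0], "z": [1.0, 1.0, 0.0]}
print([track_id for track_id, _ in top_k(summary, candidates, 2)])  # ['x', 'z']

# Scaling the summary vector leaves every score unchanged.
print(math.isclose(cosine([1.0, 1.0, 0.0], summary),
                   cosine([1.0, 1.0, 0.0], [20.0, 10.0, 0.0])))  # True
```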

In [26]:
from sklearn.metrics.pairwise import cosine_similarity
def generate_playlist_recos(df, features, nonplaylist_features):
    '''
    Generate recommendations based on the songs in a specific playlist.
    ---
    Input: 
    df (pandas dataframe): spotify dataframe
    features (pandas series): summarized playlist feature (single vector)
    nonplaylist_features (pandas dataframe): feature set of songs that are not in the selected playlist
        
    Output: 
    non_playlist_df_top_40: Top 40 recommendations for that playlist
    '''
    
    # copy() avoids pandas' SettingWithCopyWarning when the 'sim' column is added
    non_playlist_df = df[df['track_id'].isin(nonplaylist_features['id'].values)].copy()
    # Find cosine similarity between the playlist vector and each non-playlist song
    non_playlist_df['sim'] = cosine_similarity(nonplaylist_features.drop('id', axis = 1).values, features.values.reshape(1, -1))[:,0]
    non_playlist_df_top_40 = non_playlist_df.sort_values('sim',ascending = False).head(40)
    
    return non_playlist_df_top_40

In [27]:
# Generate recommendations (the function returns the top 40) and show the top 10
recommend = generate_playlist_recos(songDF, complete_feature_set_playlist_vector, complete_feature_set_nonplaylist)
recommend.head(10)



Unnamed: 0,artist,artist_id,artist_pop,artist_genres,artist_genres_list,album,track_name,track_id,track_pop,danceability,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,subjectivity,polarity,sim
189,ThxSoMch,4MvZhE1iuzttcoyepkpfdF,76,[unknown],[[unknown]],SPIT IN MY FACE!,SPIT IN MY FACE!,1N8TTK1Uoy7UvQNUazfUt5,86,0.73,...,1,0.0554,0.0464,0.0216,0.111,0.649,94.094,low,neutral,0.882796
142,Cannons,7FtCyCJCJaxabYO7Uyda5B,63,[unknown],[[unknown]],Fever Dream,Tunnel of You,1KX30hHDvpBRvm3JXxtYEY,50,0.657,...,1,0.0302,0.00108,0.147,0.214,0.633,97.993,low,neutral,0.882668
497,Elliot Greer,6EFGjOozwPlW4PxLu8SoXD,52,[unknown],[[unknown]],The End,The End,7qkX2mHStdrHVC2NFWqwpo,53,0.586,...,1,0.0328,0.809,0.00159,0.0908,0.348,122.844,low,neutral,0.853054
708,Pretoria,3dK49BK2fasxOmbCkUXhGc,31,[unknown],[[unknown]],Keep Two-Stepping,Keep Two-Stepping,273QLNes7AQUsoFWKSNlPQ,40,0.73,...,1,0.0346,0.0124,0.000105,0.28,0.579,127.044,low,neutral,0.842553
35,Zero 9:36,1V599H9vfq6hWe2hGzyzI0,58,[unknown],[[unknown]],Come Thru,Come Thru,4o48EHVTIDb3PS0EIQ5l7A,41,0.613,...,1,0.0425,0.0247,0.0,0.438,0.638,93.042,low,neutral,0.839712
132,Cannons,7FtCyCJCJaxabYO7Uyda5B,63,[unknown],[[unknown]],Purple Sun,Purple Sun,3Av5sPAsNXVW2tmbz0LA6j,49,0.648,...,0,0.0351,0.0657,0.0149,0.148,0.893,150.026,low,neutral,0.838781
310,Ruby Red,2f0NSj1t2L6JowHINXCFb6,34,[unknown],[[unknown]],Martina Soleil (Living At The Same Time),Martina Soleil (Living At The Same Time),1BEjhtSuXdCf2NVofV71DD,32,0.628,...,1,0.0531,0.000915,0.00718,0.443,0.773,134.973,low,neutral,0.837117
350,Elliott Fullam,2Qxud5LpPRMREqYpOX3DQw,44,[unknown],[[unknown]],A Hopeful Ending,A Hopeful Ending,6G3Rl9I1mqrok2EyPeQ0PX,39,0.401,...,1,0.0387,0.961,0.628,0.0861,0.254,94.655,low,neutral,0.835067
730,Michl,0qG3lxHmrUeKzL1BJJ7IBN,49,[unknown],[[unknown]],Michl,Die Trying,5O06nbk5wDRr1WR3Tyo0Af,32,0.417,...,1,0.0806,0.791,0.0,0.163,0.0751,125.178,low,neutral,0.834346
471,The JMC,3h2V8ihzcPvM91ZAr7JibH,30,[unknown],[[unknown]],I'll Write Your Constellation,I'll Write Your Constellation,126mLC9wFwxS541dfKiEX3,34,0.581,...,1,0.0273,0.00105,0.0676,0.0819,0.309,111.976,low,neutral,0.834053


One challenge with this recommender system is that there is no ground truth to measure its accuracy against. However, it is a much faster way to find similar songs than listening to every song and making a judgement.

In addition to the recommender system, alexrainbirdMusic could use A/B testing to determine which recommended songs are more popular and relevant for his playlists.

# END