# Recommender System based on Artist Collaboration
- This system suggests new music and artists by analyzing data on past artist collaborations. Given an artist the user already likes, it identifies that artist's collaborators and recommends them as likely matches for the user's tastes.

***Objective:*** To introduce the user to new favorite artists, uncover hidden musical gems, and foster a more interconnected listening experience💎

***Dataset Used:*** https://www.kaggle.com/datasets/ambaliyagati/spotify-dataset-for-playing-around-with-sql/code

***Input:*** User's artist👩‍🎤

***Output:*** A DataFrame containing the names of the artist's collaborators👥, how many times they collaborated🖐️, and on which albums and songs💿

## _________________________________________________________________________________________

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv(r"C:\Users\leeso\Downloads\Spotify_kaggle_dataset.csv")
# df = pd.read_csv("spotify_tracks.csv")
df

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit
0,7kr3xZk4yb3YSZ4VFtg2Qt,Acoustic,acoustic,Billy Raffoul,1975,58,172199,False
1,1kJygfS4eoVziBBI93MSYp,Acoustic,acoustic,Billy Raffoul,A Few More Hours at YYZ,57,172202,False
2,6lynns69p4zTCRxmmiSY1x,Here Comes the Sun - Acoustic,acoustic,"Molly Hocking, Bailey Rushlow",Here Comes the Sun (Acoustic),42,144786,False
3,1RC9slv335IfLce5vt9KTW,Acoustic #3,acoustic,The Goo Goo Dolls,Dizzy up the Girl,46,116573,False
4,5o9L8xBuILoVjLECSBi7Vo,My Love Mine All Mine - Acoustic Instrumental,acoustic,"Guus Dielissen, Casper Esmann",My Love Mine All Mine (Acoustic Instrumental),33,133922,False
...,...,...,...,...,...,...,...,...
6295,4uveHSzaz8YEbTF9j6QlCI,Voyage to Atlantis,world-music,Future World Music,Reign of Vengeance,25,180001,False
6296,4u15cjyziW2Ewn5Ek3082l,L'Oiseau,world-music,"Putumayo, Marianne Perrudin, Thomas Artaud",Global Relaxation by Putumayo,25,276776,False
6297,56pHPaTeX2O9aVmTFYS8hV,The Daintree,world-music,Joseph Tawadros,World Music,12,69533,False
6298,6Ldyc5TsR4kaUsuHKcB2AD,The Sorcerers Symphony,world-music,Future World Music,Behold,26,90001,False


## Initial Processing ------------------------

In [6]:
print(df['id'].value_counts())
print(df['name'].value_counts())

id
1OG1NoKpZZLrMqMYCk9m84    3
4uOBL4DDWWVx4RhYKlPbPC    3
2cqxvn34ihH7BSv9XbkOgq    3
190l7oYBQe6JBsWPJM2uNN    2
28zSrc22pN7CSv01aKVxvg    2
                         ..
7qpbYhyFEv6e6dDhPyDKYZ    1
63FWYiuggXZPsiNae0L9cd    1
5ct1TZKCnSxyDzUxhgLcAq    1
0fW7wX2goqLrMCjkxP7873    1
3ry0f8ybk8upUBIk8unvmF    1
Name: count, Length: 6187, dtype: int64
name
Kids                           26
Movies                         26
Chill                          23
Romance                        23
Road Trip                      23
                               ..
The Grave Awaits                1
Punishment                      1
We Hate Grindcore               1
The Protocols Of Anti-Sound     1
Fiore d'inverno                 1
Name: count, Length: 4518, dtype: int64


In [7]:
# Both the "id" and "name" columns contain non-unique values.
# A name repeats up to 26 times, while an id repeats at most 3 times,
# so "id" is the more reliable song reference.

In [8]:
# Inspect one repeated id: the same track is listed once per genre
df[df['id'] == '1OG1NoKpZZLrMqMYCk9m84']

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit
3221,1OG1NoKpZZLrMqMYCk9m84,LALALALA,j-pop,Stray Kids,ROCK-STAR,73,182224,False
3360,1OG1NoKpZZLrMqMYCk9m84,LALALALA,k-pop,Stray Kids,ROCK-STAR,73,182224,False
3449,1OG1NoKpZZLrMqMYCk9m84,LALALALA,kids,Stray Kids,ROCK-STAR,73,182224,False


In [9]:
# Create a new DataFrame, all with unique id values

unique_id_df = df.copy()
unique_id_df.drop_duplicates(subset='id', keep = 'first', inplace=True, ignore_index=True)
unique_id_df.head()

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit
0,7kr3xZk4yb3YSZ4VFtg2Qt,Acoustic,acoustic,Billy Raffoul,1975,58,172199,False
1,1kJygfS4eoVziBBI93MSYp,Acoustic,acoustic,Billy Raffoul,A Few More Hours at YYZ,57,172202,False
2,6lynns69p4zTCRxmmiSY1x,Here Comes the Sun - Acoustic,acoustic,"Molly Hocking, Bailey Rushlow",Here Comes the Sun (Acoustic),42,144786,False
3,1RC9slv335IfLce5vt9KTW,Acoustic #3,acoustic,The Goo Goo Dolls,Dizzy up the Girl,46,116573,False
4,5o9L8xBuILoVjLECSBi7Vo,My Love Mine All Mine - Acoustic Instrumental,acoustic,"Guus Dielissen, Casper Esmann",My Love Mine All Mine (Acoustic Instrumental),33,133922,False


In [10]:
# Compare row counts before and after dropping duplicate ids

print(len(df))
print(len(unique_id_df))

6300
6187
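
Keeping only the first row per id also discards the other genre labels of a track (the repeated id inspected above appears under j-pop, k-pop, and kids). If those labels matter later, one option is to collect them per id before dropping duplicates. The following is a minimal sketch on a small made-up frame; `toy`, `genres_per_id`, and the `genres` column are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for df: one track listed under three genres, one under a single genre
toy = pd.DataFrame({
    "id": ["a1", "a1", "a1", "b2"],
    "name": ["Song A", "Song A", "Song A", "Song B"],
    "genre": ["j-pop", "k-pop", "kids", "acoustic"],
})

# Collect every genre seen for each id, in order of first appearance
genres_per_id = toy.groupby("id", sort=False)["genre"].apply(list)

# Drop duplicate ids as in the notebook, then attach the collected genres.
# assign() returns a new DataFrame instead of modifying a filtered slice.
toy_unique = (
    toy.drop_duplicates(subset="id", keep="first", ignore_index=True)
       .assign(genres=lambda d: d["id"].map(genres_per_id))
)

print(toy_unique[["id", "name", "genres"]])
```

The single `genre` column still holds only the first label; the `genres` list column carries the rest.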


## Data Manipulation ------------------------

In [12]:
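# Create a list containing every artist, in order of first appearance.
# Note: splitting on "," keeps the space that follows each comma, so every name
# after the first in an entry is stored with a leading space (e.g. ' Bailey Rushlow').

unique_artist_list = []
for i in range(len(unique_id_df)):
    # Read from unique_id_df (not df), so the index matches the de-duplicated rows
    entry = unique_id_df.loc[i]["artists"]
    # str.split returns a one-element list when the entry contains no comma
    for artist in entry.split(","):
        if artist not in unique_artist_list:
            unique_artist_list.append(artist)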

unique_artist_list

['Billy Raffoul',
 'Molly Hocking',
 ' Bailey Rushlow',
 'The Goo Goo Dolls',
 'Guus Dielissen',
 ' Casper Esmann',
 'Ling tosite sigure',
 'Benson Boone',
 ' Andrew Gialanella',
 'Sonido de Agua en Bambu',
 'Healing Solfeggio Frequencies',
 ' Harmony Touch',
 'Roses & Frey',
 'The Re-Stoned',
 'Sam Smith',
 'The Moon Loungers',
 'Lesfm',
 ' Olexy',
 'SLANDER',
 ' Dylan Matthew',
 'Cody Fry',
 'Mortal Treason',
 'Frank Ocean',
 'Girl in the Distance',
 'Simon & Garfunkel',
 'Jonah Baker',
 'Puppe Music',
 'SYML',
 'Acoustic Levitation',
 'Trainman',
 'Hillside Recording',
 ' Diana Trout',
 'Acoustics',
 ' Tyler Swatie',
 'Kondor',
 'Jason Derulo',
 'Acoustic Quarter',
 'Helios Jazz Club',
 'AJR',
 'Devin The Dude',
 'Ben Weighill',
 'David AI',
 'SZA',
 ' Justin Bieber',
 'Jay Filson',
 'U Know & The Drill',
 'Matt Johnson',
 ' John Adams',
 'XtraVert',
 'Bailey Zimmerman',
 'Thomas Daniel',
 'Mark S.D.Ray',
 'Noah Kahan',
 'Acoustic Guitar Collective',
 'Dj funkybee',
 'Lofi Afrobeats

In [13]:
pd.Series(unique_artist_list).value_counts()

Billy Raffoul        1
charlieonnafriday    1
Luther White         1
Oliverquint          1
Gunna                1
                    ..
 Burniss Travis      1
 Common              1
 Robert Glasper      1
 ?uestlove           1
 C3N6                1
Name: count, Length: 5851, dtype: int64
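
In the outputs above, names that follow a comma keep a leading space (for example `' Bailey Rushlow'`), so the same artist can be stored under two spellings. A possible alternative, sketched here on made-up data, is to split, explode, and strip the column with pandas string methods. `toy_artists` is an illustrative stand-in for `unique_id_df['artists']`.

```python
import pandas as pd

# Toy stand-in for the "artists" column (comma-separated names per track)
toy_artists = pd.Series([
    "Billy Raffoul",
    "Molly Hocking, Bailey Rushlow",
    "Guus Dielissen, Casper Esmann",
    "Bailey Rushlow",
])

# Split each entry into a list, give every name its own row, then trim whitespace
exploded = toy_artists.str.split(",").explode().str.strip()

# unique() keeps the order of first appearance
unique_names = exploded.unique().tolist()
print(unique_names)
# ['Billy Raffoul', 'Molly Hocking', 'Bailey Rushlow', 'Guus Dielissen', 'Casper Esmann']
```

Here "Bailey Rushlow" appears once, whether the name was listed first or after a comma.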

In [14]:
# Create a column called "Collaborations": 1 if the track lists more than one artist, else 0

def find_collaborations(x):
    # A comma in the "artists" string means at least two artists are listed
    return 1 if "," in x else 0

unique_id_df["Collaborations"] = unique_id_df['artists'].apply(find_collaborations)

In [15]:
unique_id_df.head(3)

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit,Collaborations
0,7kr3xZk4yb3YSZ4VFtg2Qt,Acoustic,acoustic,Billy Raffoul,1975,58,172199,False,0
1,1kJygfS4eoVziBBI93MSYp,Acoustic,acoustic,Billy Raffoul,A Few More Hours at YYZ,57,172202,False,0
2,6lynns69p4zTCRxmmiSY1x,Here Comes the Sun - Acoustic,acoustic,"Molly Hocking, Bailey Rushlow",Here Comes the Sun (Acoustic),42,144786,False,1
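
The same flag can also be computed without a Python-level function by counting commas with pandas string methods. This is a small sketch on made-up rows; `n_artists` is an illustrative extra column. It shares the notebook's assumption that a comma always separates two artists, so an artist whose name itself contains a comma would be miscounted.

```python
import pandas as pd

# Toy stand-in for unique_id_df with solo, duo, and trio entries
toy = pd.DataFrame({
    "artists": [
        "Billy Raffoul",
        "Molly Hocking, Bailey Rushlow",
        "Travis Scott, Bad Bunny, The Weeknd",
    ]
})

# Number of artists per track: number of commas plus one
toy["n_artists"] = toy["artists"].str.count(",") + 1

# Same 0/1 flag as the "Collaborations" column
toy["Collaborations"] = (toy["n_artists"] > 1).astype(int)

print(toy)
```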


In [16]:
# Isolate data with collaborations in DataFrame, collab_df.
# The original index labels are kept; filtered_df is re-indexed later.

collab_df = unique_id_df[unique_id_df["Collaborations"] == 1]

collab_df.head(3)

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit,Collaborations
2,6lynns69p4zTCRxmmiSY1x,Here Comes the Sun - Acoustic,acoustic,"Molly Hocking, Bailey Rushlow",Here Comes the Sun (Acoustic),42,144786,False,1
4,5o9L8xBuILoVjLECSBi7Vo,My Love Mine All Mine - Acoustic Instrumental,acoustic,"Guus Dielissen, Casper Esmann",My Love Mine All Mine (Acoustic Instrumental),33,133922,False,1
7,42qGA2116mkpSAaxzQfjEf,Landslide,acoustic,"Guus Dielissen, Andrew Gialanella",Landslide,29,199222,False,1


In [17]:
collab_df['artists'].value_counts()

artists
Johann Sebastian Bach, Angela Hewitt                            15
Designer Disguise, Wow That's What I Call Metalcore             13
New Age, New Age Instrumental Music, New Age 2021                9
Claude Debussy, Steven Osborne                                   7
Travis Scott, Bad Bunny, The Weeknd                              6
                                                                ..
Tom Petty, Jeff Lynne, Steve Winwood, Dhani Harrison, Prince     1
Alana Springsteen, Chris Stapleton                               1
Guitar Mage, Peacefulness, Harmoniac                             1
Blake Shelton, Gwen Stefani                                      1
Putumayo, Giacomo Lariccia                                       1
Name: count, Length: 1207, dtype: int64

In [18]:
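# Ask for user input until the name matches a known artist

# Compare against stripped names: entries stored after a comma carry a leading space
known_artists = {name.strip() for name in unique_artist_list}
artist1 = input("Who is your favorite artist? ").strip()
while artist1 not in known_artists:
    print("Artist not found in database. Please enter another artist.")
    artist1 = input("Who is your favorite artist? ").strip()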

Who is your favorite artist?  Taylor Swift
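
An exact string comparison rejects inputs that differ only in letter case or surrounding spaces. One way to make the lookup more forgiving is sketched below. `find_artist` is a hypothetical helper, not part of the notebook, and the `known` list is made up for illustration.

```python
def find_artist(query, known_artists):
    """Return the stored spelling that matches query, ignoring case and
    surrounding whitespace, or None if there is no match."""
    # Map a normalized key (trimmed, case-folded) to the trimmed stored name
    lookup = {name.strip().casefold(): name.strip() for name in known_artists}
    return lookup.get(query.strip().casefold())


# Made-up sample that includes a name with a leading space
known = ["Billy Raffoul", " Bailey Rushlow", "SZA"]

print(find_artist("  bailey rushlow", known))  # Bailey Rushlow
print(find_artist("sza", known))               # SZA
print(find_artist("Unknown Artist", known))    # None
```

The prompt loop could call such a helper and repeat while it returns `None`.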


In [19]:
# Create a new column called "Collaborated with " + user-inputted artist

def get_collaborators(x, artist):
    # Strip each name before comparing: names after a comma carry a leading space
    names = [name.strip() for name in x.split(",")]
    return 1 if artist in names else 0

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
# when collab_df is a filtered slice of unique_id_df
collab_col = "Collaborated with " + artist1
collab_df = collab_df.assign(**{collab_col: collab_df["artists"].apply(lambda x: get_collaborators(x, artist1))})


In [20]:
# Create DataFrame, filtered_df, containing only the tracks on which the user-inputted artist collaborated

filtered_df = collab_df[collab_df["Collaborated with " + artist1] == 1].reset_index(drop=True)
filtered_df

Unnamed: 0,id,name,genre,artists,album,popularity,duration_ms,explicit,Collaborations,Collaborated with Taylor Swift


In [21]:
# Create final DataFrame, output_df, with one row per collaborator and columns for:
# (1) the collaborator's name, (2) no. of collabs with the user-inputted artist,
# (3) collaboration albums and songs

artists = []   # one entry per (track, collaborator) pair
albums = {}    # collaborator -> list of distinct albums
names = {}     # collaborator -> list of distinct song names
for i in range(len(filtered_df)):
    row = filtered_df.loc[i]
    for artist in row['artists'].split(","):
        # Strip before comparing, so the user's artist is skipped wherever it is listed
        artist = artist.strip()
        if artist == artist1:
            continue
        artists.append(artist)
        if row['album'] not in albums.setdefault(artist, []):
            albums[artist].append(row['album'])
        if row['name'] not in names.setdefault(artist, []):
            names[artist].append(row['name'])

collab_counts = pd.Series(artists).value_counts()
output_df = pd.DataFrame()
output_df['artist'] = collab_counts.index
output_df['no. of collabs with ' + artist1] = collab_counts.values
output_df['collab albums'] = output_df['artist'].apply(lambda x: albums[x])
output_df['collab (song) names'] = output_df['artist'].apply(lambda x: names[x])

In [22]:
output_df

Unnamed: 0,artist,no. of collabs with Taylor Swift,collab albums,collab (song) names
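
The table above is empty for this input, so here is a small made-up example of what a populated result can look like. It is a sketch of the same idea (count collaborators and collect their albums and songs) written with `explode` and `groupby` instead of explicit loops. `collaborators_of`, `toy`, and the column names `n_collabs`, `albums`, and `songs` are illustrative and not taken from the notebook.

```python
import pandas as pd

def collaborators_of(tracks, artist):
    """Summarize who appears alongside `artist` in a comma-separated "artists" column."""
    # One row per (track, artist) pair, with whitespace trimmed from each name
    pairs = (
        tracks.assign(artist=tracks["artists"].str.split(","))
              .explode("artist", ignore_index=True)
              .assign(artist=lambda d: d["artist"].str.strip())
    )
    # Keep tracks that feature the chosen artist, then drop the artist's own rows
    track_ids = pairs.loc[pairs["artist"] == artist, "id"]
    others = pairs[pairs["id"].isin(track_ids) & (pairs["artist"] != artist)]
    # Per collaborator: number of shared tracks, distinct albums, distinct song names
    grouped = others.groupby("artist", sort=False)
    summary = pd.DataFrame({
        "n_collabs": grouped["id"].nunique(),
        "albums": grouped["album"].unique().apply(list),
        "songs": grouped["name"].unique().apply(list),
    })
    return (
        summary.rename_axis("artist")
               .sort_values("n_collabs", ascending=False)
               .reset_index()
    )


# Made-up tracks: Artist A shares two tracks with Artist B and one with Artist C
toy = pd.DataFrame({
    "id": ["t1", "t2", "t3", "t4"],
    "name": ["Song 1", "Song 2", "Song 3", "Song 4"],
    "album": ["Album X", "Album X", "Album Y", "Album Z"],
    "artists": [
        "Artist A, Artist B",
        "Artist B, Artist A, Artist C",
        "Artist A",
        "Artist C, Artist B",
    ],
})

result = collaborators_of(toy, "Artist A")
print(result)
```

Because every name is stripped before comparison, the chosen artist is matched in any position of the `artists` string, as in track `t2`.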


# 🎉💻🎉