In this notebook, I use the Spotify dataset from Kaggle to identify the most popular and influential artists in collaborations. To do this, I transform the data into a network of artists, where edges connect artists who collaborated on a song and are weighted by song popularity.

"Collaboration is an artist's best friend in today's music economy. From widening an artist's audience to prolonging success, a rising tide really does lift all boats." This is known as the "collaboration phenomenon."

Song collaborations have become a major trend in the music industry because they add excitement, marketability and versatility while combining the fanbases of the collaborating artists. With the vast amount of music available through streaming, a feature or collaboration is often a key way for a track to stand out.

Many artists also gain or give recognition through collaborations and features, so ranking artists by the combined popularity of their collaborations is one indication of their influence in the collaboration space.

Check out https://blog.chartmetric.com/the-evolving-role-of-music-artist-collaborations/ for more info on the history and importance of collaborations.

In [2]:
import pandas as pd
import numpy as np
import networkx as nx
import itertools
import operator

In [3]:
# Read the data
raw_data = pd.read_csv("Data/data.csv")

In [4]:
# We are only looking at the artists and popularity columns
spotify_data = raw_data[["artists", "popularity"]]
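
Since only two columns are used, `pandas.read_csv` can parse just those through its `usecols` parameter, which saves memory on a large file. A minimal sketch, using a small in-memory CSV with hypothetical rows (and an extra, hypothetical `year` column) in place of `Data/data.csv`:

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for Data/data.csv
csv_text = (
    "artists,popularity,year\n"
    "\"['Artist A', 'Artist B']\",50,2020\n"
    "\"['Artist C']\",10,2019\n"
)

# Parse only the columns the analysis needs; "year" is never loaded
spotify_data = pd.read_csv(io.StringIO(csv_text), usecols=["artists", "popularity"])
print(spotify_data)
```
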

In [5]:
# Keep only songs credited to more than one artist, i.e. collaborations.
# Rows are collected in a plain list, which is simpler and faster than growing a DataFrame row by row.
collab_artists = []
for row in spotify_data.values:
    if len(row[0].split(',')) > 1:
        collab_artists.append(row)

In [6]:
# The artists column holds list-like strings (e.g. "['A', 'B']") rather than real lists,
# so strip the brackets, quotes and surrounding whitespace from each name
clean_data = []
for row in collab_artists:
    temp = []
    for artist in row[0].split(','):
        temp.append(artist.replace('[', '').replace("'", '').replace(']', '').strip())
    clean_data.append([temp, row[1]])
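
Splitting on `','` also splits artist names that contain a comma (for example "Tyler, The Creator"), which turns a solo song into a false collaboration, and removing every `'` strips apostrophes out of names. Assuming every value in the `artists` column is a Python-style list literal, as the raw strings suggest, `ast.literal_eval` can parse it into a real list instead. A sketch with a hypothetical helper `parse_collaborations` and hypothetical sample rows:

```python
import ast

import pandas as pd


def parse_collaborations(df: pd.DataFrame) -> list:
    """Return [[artist, ...], popularity] for every song with two or more artists.

    Assumes each df["artists"] value is a list literal such as "['A', 'B']".
    """
    rows = []
    for artists_str, popularity in df[["artists", "popularity"]].itertuples(index=False):
        # Safely evaluate the literal string into a Python list of names
        artists = [name.strip() for name in ast.literal_eval(artists_str)]
        if len(artists) > 1:  # keep collaborations only
            rows.append([artists, int(popularity)])
    return rows


# Hypothetical rows: a solo song, a name with a comma, a name with an apostrophe
sample = pd.DataFrame({
    "artists": [
        "['Drake']",
        "['Tyler, The Creator', 'Artist B']",
        "[\"Guns N' Roses\", 'Artist C']",
    ],
    "popularity": [80, 70, 60],
})
print(parse_collaborations(sample))
```

The solo "Tyler, The Creator" name stays intact and the apostrophe survives, whereas the comma-split approach would treat the name as two artists.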

In [7]:
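# Build a list of unique artists, since we only want one node per artist.
# dict.fromkeys drops duplicates while keeping first-seen order, and avoids the
# quadratic cost of checking `artist not in unique_artists` against a growing list.
unique_artists = list(dict.fromkeys(artist for row in clean_data for artist in row[0]))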

In [8]:
# Create the graph and add one node per artist
G = nx.Graph()
G.add_nodes_from(unique_artists)

In [9]:
# Create list of edges where each edge is a list containing [Node1, Node2, weight]
edge_list = []
for row in clean_data:
    combinations = itertools.combinations(row[0], 2)
    for pair in combinations:
        edge_list.append([pair[0], pair[1], row[1]])
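
`itertools.combinations(row[0], 2)` yields every unordered pair of artists on a song, so a song with n credited artists contributes n(n-1)/2 edges, each carrying the song's full popularity. A small illustration with a hypothetical three-artist song:

```python
import itertools

# Hypothetical song credited to three artists, with popularity 70
song = [["Artist A", "Artist B", "Artist C"], 70]

# One [Node1, Node2, weight] entry per unordered pair of artists
edges = [[a, b, song[1]] for a, b in itertools.combinations(song[0], 2)]
for edge in edges:
    print(edge)
# ['Artist A', 'Artist B', 70]
# ['Artist A', 'Artist C', 70]
# ['Artist B', 'Artist C', 70]
```

Because each artist here sits on two of the three edges, the later per-artist aggregation credits each of them with 2 × 70 = 140 from this single song, so songs with larger line-ups weigh more heavily.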

In [10]:
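# Add the edges including the weights.
# Note: add_edge on a pair that already exists updates its attributes, so when
# two artists share several songs only the last song's popularity is kept.
for edge in edge_list:
    G.add_edge(edge[0], edge[1], weight=edge[2])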
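
In networkx, re-adding an existing edge overwrites its `weight` rather than adding to it. If every shared song should count toward an edge, in line with the view stated in the discussion that the volume of collab songs should be a factor, the weight can be accumulated instead. A minimal sketch with a hypothetical helper `build_collab_graph` and toy songs in the same `[[artists], popularity]` shape as `clean_data`:

```python
import itertools

import networkx as nx


def build_collab_graph(songs):
    """Build an artist graph whose edge weight is the SUM of popularity over
    every song the two artists share (not just the last song added)."""
    G = nx.Graph()
    for artists, popularity in songs:
        for a, b in itertools.combinations(artists, 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += popularity  # repeat pairing: accumulate
            else:
                G.add_edge(a, b, weight=popularity)  # first pairing: create edge
    return G


# Hypothetical songs: A and B collaborate twice
songs = [[["A", "B"], 50], [["A", "B"], 30], [["A", "C"], 10]]
G = build_collab_graph(songs)
print(G["A"]["B"]["weight"])  # 80 (overwriting would keep only 30)
```

Using this variant would likely change the scores, so the ranking below would need to be recomputed.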

In [11]:
# Aggregate the weights (popularity) by node (artist)
popular_artists = []
for artist in unique_artists:
    popularity = 0
    for node in G.neighbors(artist):
        popularity += G.get_edge_data(artist, node)['weight'] 
    artist_info = [artist, popularity]
    popular_artists.append(artist_info)
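
The loop above computes what networkx calls the weighted degree: the sum of the weights of all edges touching a node. `G.degree(weight="weight")` returns the same totals directly, with one small difference: a self-loop, if present, is counted twice by `degree` but once by the loop. A sketch on a hypothetical toy graph, followed by the same sort used in the next cell:

```python
import operator

import networkx as nx

# Hypothetical toy graph of three artists
G = nx.Graph()
G.add_edge("A", "B", weight=50)
G.add_edge("A", "C", weight=10)
G.add_edge("B", "C", weight=5)

# Weighted degree: sum of incident edge weights for each node
popular_artists = [[artist, score] for artist, score in G.degree(weight="weight")]
results = sorted(popular_artists, key=operator.itemgetter(1), reverse=True)
print(results)  # [['A', 60], ['B', 55], ['C', 15]]
```
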

In [13]:
# Output the most popular/influential artists based on collaborations
results = sorted(popular_artists, key=operator.itemgetter(1), reverse=True)
results[:20]

[['Wolfgang Amadeus Mozart', 9885],
 ['Lil Wayne', 5449],
 ['Kanye West', 5185],
 ['Snoop Dogg', 5165],
 ['Johann Sebastian Bach', 4963],
 ['Drake', 4714],
 ['JAY-Z', 3914],
 ['Nicki Minaj', 3834],
 ['2Pac', 3695],
 ['Eminem', 3505],
 ['Chris Brown', 3474],
 ['Future', 3335],
 ['Ludwig van Beethoven', 3311],
 ['Kendrick Lamar', 3293],
 ['Ty Dolla $ign', 3232],
 ['Travis Scott', 3128],
 ['Bad Bunny', 3086],
 ['Ludacris', 3050],
 ['A$AP Rocky', 2989],
 ['Justin Bieber', 2979]]

The top 10 most popular collab artists are:
1. Wolfgang Amadeus Mozart
2. Lil Wayne
3. Kanye West
4. Snoop Dogg
5. Johann Sebastian Bach
6. Drake
7. JAY-Z
8. Nicki Minaj
9. 2Pac
10. Eminem

Results and Discussion: Most of the artists in the top 10 are known for hip-hop/rap, which is to be expected, as it has been one of the most popular genres in recent years and is a genre that thrives on collaborations.

Surprisingly, however, a few classical composers also appear, with Wolfgang Amadeus Mozart ranked as the most popular collab artist, ahead of artists such as Lil Wayne and Kanye West, who are known for hit songs with collaborations and for using features to give recognition to smaller artists.

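However, this analysis is not foolproof. It involves very limited EDA, and it does not adjust popularity for the volume of songs; this was a deliberate decision on my part, as I believe the number of collab songs an artist appears on should be a factor. Furthermore, it does not account for how the definition of a "collaboration" differs between generations, which could explain why Mozart is ranked #1. For example, classical recordings typically credit the composer alongside the performers (orchestra, conductor or soloist), so many Mozart tracks count as collaborations even though they are performances of his work.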
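
To gauge how much the volume decision matters, total and per-song (volume-adjusted) popularity can be computed side by side. A minimal sketch with a hypothetical helper `popularity_by_artist` on hypothetical rows in the same `[[artists], popularity]` shape as `clean_data`. Note that it counts each song once per artist, whereas the graph score above sums edge weights, so there a song with more collaborators counts more than once:

```python
from collections import defaultdict


def popularity_by_artist(songs):
    """Return {artist: (total, song_count, mean)} over collaboration songs.

    Each song counts once per credited artist (duplicates within a song ignored).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for artists, popularity in songs:
        for artist in set(artists):
            totals[artist] += popularity
            counts[artist] += 1
    return {a: (totals[a], counts[a], totals[a] / counts[a]) for a in totals}


# Hypothetical rows: A and B each appear on two songs, C and D on one
songs = [[["A", "B"], 80], [["A", "C"], 20], [["B", "D"], 60]]
stats = popularity_by_artist(songs)

by_total = sorted(stats, key=lambda a: stats[a][0], reverse=True)
by_mean = sorted(stats, key=lambda a: stats[a][2], reverse=True)
print(by_total)  # ['B', 'A', 'D', 'C']
print(by_mean)   # ['B', 'D', 'A', 'C']
```

In this toy data, A falls below D once volume is factored out, which is the kind of shift that separates prolific collaborators from artists with fewer but more popular features.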