### 1. Introduction

In this notebook, I visualize music track data from the Kaggle Spotify datasets, which contain more than 170 thousand tracks from 30,000 artists over the last 100 years. The notebook covers essential exploratory data analysis and also tries to find out how music has evolved across the years.<br>
One limitation of this analysis is the scope of Spotify's track collection: as the distribution of tracks shows later, it has many more records from the most recent 3 years and only a few from the 1920s-50s. Another limitation is that the popularity score only reflects the preferences of Spotify users (likely a relatively young group). The popularity is also scored at the current time, so it shows how much people like old music nowadays, not necessarily how popular a classic track was back in its time (a better feature for that would be the record sales of the album).

The dataset has the following features and definitions:<br>
- __duration_ms:__<br>
    The length of the track in milliseconds (ms)
- __artists:__<br>
    The list of artists credited for production of the track
- __year:__<br>
    The release year of track
- __key:__<br>
    The primary key of the track encoded as integers in between 0 and 11
- __mode:__<br>
    The binary value representing whether the track starts with a major (1) chord progression or not (0)
- __release_date:__<br>
    The date of release of the track in yyyy-mm-dd, yyyy-mm, or even yyyy format
- __acousticness:__<br>
    The relative metric of the track being acoustic
- __danceability:__<br>
    The relative measurement of the track being danceable
- __energy:__<br>
    The energy of the track
- __instrumentalness:__<br>
    The relative ratio of the track being instrumental
- __liveness:__<br>
    The relative duration of the track sounding as a live performance
- __loudness:__<br>
    Relative loudness of the track in the typical range [-60, 0] in decibel (dB)
- __speechiness:__<br>
    The relative length of the track containing any kind of human voice
- __valence:__<br>
    The positiveness of the track
- __tempo:__<br>
    The tempo of the track in Beats Per Minute (BPM)
- __name:__<br>
    The title of the track
- __popularity:__<br>
    The popularity of the song lately, default country = US
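
Because `release_date` comes in three precisions, any code that needs a full date has to handle each format. A minimal sketch, assuming the three formats listed above are the only ones present; `parse_release_date` is a hypothetical helper, not part of the notebook:

```python
from datetime import date

def parse_release_date(text):
    """Parse 'yyyy', 'yyyy-mm' or 'yyyy-mm-dd' into a date.

    Missing month or day parts default to 1 (an assumption made
    for illustration only).
    """
    parts = [int(p) for p in text.strip().split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected release_date format: {text!r}")
    # Pad with 1s so a bare year or year-month still yields a valid date
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)

print(parse_release_date("1975"))        # 1975-01-01
print(parse_release_date("1975-06"))     # 1975-06-01
print(parse_release_date("1975-06-21"))  # 1975-06-21
```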

#### Import libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from math import pi
import warnings
warnings.filterwarnings('ignore')
from IPython.display import YouTubeVideo
from IPython.display import Audio
from sklearn.preprocessing import MinMaxScaler
from matplotlib.colors import ListedColormap
import matplotlib.animation as ani
from IPython.display import HTML
from matplotlib import animation, rc
%matplotlib inline
from matplotlib.animation import FuncAnimation, PillowWriter
import matplotlib
matplotlib.rcParams['animation.embed_limit'] = 2**128
import matplotlib.patches as mpatches

#### Import datasets

Kaggle provides 4 datasets:
- data: the main dataset, with one row per track
- data_by_artist: the average of the features grouped by artist
- data_by_genres: the average of the features grouped by genre; this is the only dataset that has genre information
- data_by_year: the average of the features grouped by year

In [None]:
data = pd.read_csv('data.csv')
data.shape

In [None]:
data.head()

Clean up the format of the artists column:

In [None]:
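#regex=False treats '[' and ']' as literal characters instead of regex syntax
data.artists = data['artists'].str.replace('[','',regex=False)
data.artists = data['artists'].str.replace(']','',regex=False)
data.artists = data['artists'].str.replace("'",'',regex=False)
data.artists = data['artists'].str.replace('"','',regex=False)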
data.artists
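
The `artists` column stores each list of artists as a string such as `"['A', 'B']"`. Stripping brackets and quotes character by character also removes apostrophes that belong to a name. An alternative sketch, assuming every value is a valid Python list literal, parses the string instead; `parse_artists` is a hypothetical helper:

```python
import ast

def parse_artists(raw):
    """Turn a stringified list such as "['A', 'B']" into 'A, B'."""
    names = ast.literal_eval(raw)  # safely evaluates the list literal
    return ", ".join(names)

print(parse_artists("['Earth, Wind & Fire']"))
print(parse_artists('["Guns N\' Roses", \'Queen\']'))
```

On the DataFrame this could be applied with `data['artists'].apply(parse_artists)`, provided the column has not already been cleaned.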

In [None]:
artist_df = pd.read_csv('data_by_artist.csv')
artist_df.shape

In [None]:
artist_df.head()

In [None]:
genres_df = pd.read_csv('data_by_genres.csv')
genres_df.shape

In [None]:
genres_df.head()

In [None]:
year_df = pd.read_csv('data_by_year.csv')
year_df.shape

In [None]:
year_df.head()

### 2. EDA

In [None]:
data.info()

In [None]:
#Describe those numeric features
data[['acousticness','danceability','energy','liveness','loudness','mode','popularity','speechiness','tempo','valence']].describe()

We can see that some features have their own range: loudness goes from -60 to 3.85 and tempo from 0 to 243, while the rest seem to range only from 0 to 1.

#### What do the maximum and minimum tracks sound like? It is a music dataset, so let's hear them!

The loudest Track:

In [None]:
df_loud = data.sort_values('loudness',ascending=False).head().reset_index()
df_loud[['artists','name','loudness']]

One of the loudest tracks: 'Your Pretty Face Is Going to Hell' (turn down your volume)

In [None]:
audio_loud = Audio(filename='Your Pretty Face Is Going to Hell.mp3')
audio_loud

The most energetic track:

In [None]:
df_energy = data.sort_values(['energy','popularity'],ascending=False).head().reset_index()
df_energy[['artists','name','energy']]

Surprisingly, the top 4 energetic tracks are all rain sound recordings; this might be due to how Spotify's algorithm calculates the energy score.

In [None]:
audio_energy = Audio(filename='Forest Rain.mp3')
audio_energy

The most danceable tracks:

In [None]:
df_dance = data.sort_values(['danceability','popularity'],ascending=False).head().reset_index()
df_dance[['artists','name','danceability']]

One of the top five tracks:

In [None]:
audio_dance = Audio(filename='Tone-Loc Funky Cold Medina.mp3')
audio_dance

The happiest tracks:

In [None]:
df_happy = data.sort_values(['valence','popularity'],ascending=False).head().reset_index()
df_happy[['artists','name','valence']]

In [None]:
audio_happy = Audio(filename='Pasito Duranguense.mp3')
audio_happy

The calmest tracks:

In [None]:
df_sad = data.sort_values(['valence','popularity'],ascending=(True,False)).head().reset_index()
df_sad[['artists','name','valence']]

It seems the least positive tracks are white noise recordings that help people sleep:

In [None]:
audio_calm = Audio(filename='Clean White Noise - Loopable With No Fade.mp3')
audio_calm

Most popular artists:

In [None]:
artist_pop = pd.DataFrame(data.groupby('artists')['popularity'].mean().reset_index())
artist_pop = artist_pop.sort_values(['popularity'],ascending=(False)).head(20).reset_index().drop('index',axis=1)
plt.figure(figsize=(10,6))
# make barplot
sns.barplot(x=artist_pop.popularity, y=artist_pop.artists, data=artist_pop)
plt.show()

#### Feature Distributions

In [None]:
f, axes = plt.subplots(3, 3, figsize=(13, 14), sharex=False)
sns.distplot( data["acousticness"] , color="skyblue", ax=axes[0, 0])
sns.distplot( data["danceability"] , color="olive", ax=axes[0, 1])
sns.distplot( data["energy"] , color="gold", ax=axes[0, 2])
sns.distplot( data["liveness"] , color="yellowgreen",ax=axes[1, 0])
sns.distplot( data["loudness"] , color="wheat", ax=axes[1, 1])
sns.distplot( data["popularity"] , color="teal", ax=axes[1, 2])
sns.distplot( data["speechiness"] , color="palegreen", ax=axes[2, 0])
sns.distplot( data["tempo"] , color="orchid", ax=axes[2, 1])
sns.distplot( data["valence"] , color="lightblue", ax=axes[2, 2])

From those charts we can see:<br>
1. Most of the tracks are not live and have little or no speech in them (rap, for example, has a certain 'speech' level)<br>
2. Danceability has a roughly normal shape, which suggests it is randomly distributed<br>
3. Valence (sentiment) is balanced in general, but with a large number of tracks near 1.0, i.e. very positive and happy songs. Overall there are more positive songs than negative ones<br>
4. Acousticness is polarized: most of the tracks are either almost fully acoustic or hardly acoustic at all, which makes sense<br>
5. Many tracks have '0' popularity, and most of the popularity scores sit between 20 and 60

Let's examine the distribution of the years individually. The chart below shows that before 1945 there are fewer tracks available and the counts are volatile between years; on the other hand, there is a very large collection from 2019 to the present.

In [None]:
sns.distplot( data["year"] , color="olive")
plt.axvspan(1920, 1945, facecolor='r', alpha=0.2)

#### Popularity - What are those 0 scores?

In the histogram matrix above, the popularity panel shows a high number of records with a score of 0. What are those tracks with '0 popularity'? Do they share some characteristics? Let's take a look and see if there are some patterns.

In [None]:
#some 0 popularity tracks
mask = data.popularity==0
data[mask].tail()

In [None]:
#describe 0s
data[mask].describe()

In [None]:
#describe non-0s
data[data.popularity>0].describe()

If we compare the averages of the 0-score records versus all data, the 0-popularity tracks are older (year 1960 vs 1982), less energetic (0.37 vs 0.51), more instrumental (0.34 vs 0.15), and have more speechiness (0.19 vs 0.08).

In [None]:
#scale the feature values to 0-1 so we can place them into one chart
scaler = MinMaxScaler()
df_scale = data[['year','instrumentalness','speechiness','energy','danceability','acousticness','duration_ms','liveness','loudness','tempo','valence']]
scaler.fit(df_scale)
df_scale = pd.DataFrame(scaler.transform(df_scale))
df_scale.columns=['year','instrumentalness','speechiness','energy','danceability','acousticness','duration_ms','liveness','loudness','tempo','valence']

#transpose columns to row
df_violin = pd.DataFrame(df_scale.T.unstack())
df_violin.index.rename(['ind1','ind2'],inplace=True)

#Add the popularity from main dataset
df_violin=df_violin.join(data['popularity'],on='ind1',how='inner').reset_index()
df_violin.drop('ind1',axis=1,inplace=True)
df_violin.columns=['feature','value','popularity']

#create another column for 0 or non-0
df_violin['pop']=df_violin.popularity.apply(lambda x:'0'if x ==0 else 'Non-0')
df_violin.drop('popularity',axis=1,inplace=True)

#draw violinplot
plt.figure(figsize=(20,8))
sns.violinplot(x="feature", y="value", hue="pop", data=df_violin, palette="Pastel1")
plt.show()

The paired violin plots show similar results: tracks with less energy and more acousticness are more likely to have a 0 score. The most obvious difference is the year: 0-popularity tracks are either older or newer than average.
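
The violin plot relies on every feature being rescaled to the 0-1 range first. The idea behind min-max scaling can be sketched in plain Python; `min_max_scale` is a hypothetical helper written for illustration, and the sample loudness values are made up:

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

loudness = [-60.0, -30.0, -15.0, 0.0]
print(min_max_scale(loudness))  # [0.0, 0.5, 0.75, 1.0]
```

Because the scaling is linear per feature, the shape of each distribution is preserved; only the axis changes, which is what allows year, tempo and loudness to share one chart.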

In [None]:
#Count the 0s and non-0s for donut chart
pop_df = pd.DataFrame(data.popularity>0)
pop_df = pop_df.value_counts().tolist()

#prepare data grouped by year for the bar charts
#share of non-0 tracks per year; computing it per year keeps the rows aligned even if a year has no non-0 track
non_0 = (data.popularity>0).groupby(data.year).mean().mul(100).reset_index()
#the grey background bars always reach 100%
total = non_0.assign(popularity=100.0)


#Draw a donut chart
names = ['Total Non-0s', 'Total 0s']
plt.figure(figsize=(4,4))

#ax1 = plt.subplot2grid((1,2),(0,0))
my_circle = plt.Circle( (0,0), 0.6, color='white')
plt.pie(pop_df, labels=names, colors=['palegreen','grey'],autopct='%1.2f%%',startangle=90, pctdistance=0.4)
p = plt.gcf()
p.gca().add_artist(my_circle)
#use the current figure (p); `fig` is not defined at this point
p.suptitle('Overall 0s Popularity Percentage', fontsize=10)

#Draw a bar chart
# bar chart 1 -> top bars (group of 'Popularity=0')
plt.figure(figsize=(7,7))
#ax2 = plt.subplot2grid((1,2),(0,1))
bar1 = sns.barplot(x="year",y="popularity", data=total, color='grey')
plt.xticks(rotation=90)
plt.locator_params(nbins=30)

# bar chart 2 -> bottom bars (group of 'Popularity>0')
bar2 = sns.barplot(x="year",y="popularity", data=non_0, color='palegreen')
plt.xticks(rotation=60)
plt.locator_params(nbins=30)
# add legend
top_bar = mpatches.Patch(color='grey', label='0s')
bottom_bar = mpatches.Patch(color='palegreen', label='Non-0s')
plt.legend(handles=[top_bar, bottom_bar])

#fig.suptitle('Overall 0s Popularity Percentage', fontsize=14)
# Show the graph
plt.show()

In total, about a quarter of the records (23.46%) have 0 popularity. The bar chart above shows that tracks before 1955 make up the majority of those 0 scores; they might be too old for people to listen to. In the 'Golden Age', from 1960 to 2000, nearly every record has some popularity. On the other hand, the latest tracks, from 2021, might be too new to have a popularity score assigned.
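
The per-year share of 0 scores can be computed directly from a boolean mask, since the mean of a boolean Series is the fraction of `True` values. A small sketch on made-up toy data (the values are not from the Spotify dataset):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "year":       [1930, 1930, 1930, 1980, 1980, 2021],
    "popularity": [0,    0,    12,   45,   60,   0],
})

# Fraction of zero-popularity tracks per year, expressed in percent
zero_share = (toy["popularity"] == 0).groupby(toy["year"]).mean() * 100
print(zero_share.round(1).to_dict())
```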

#### Feature Correlations

In [None]:
plt.figure(figsize=(16, 8))
sns.set(style="whitegrid")
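#restrict to numeric columns; text columns such as artists or name cannot be correlated
corr = data.select_dtypes(include='number').corr()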
sns.heatmap(corr,annot=True,cmap='coolwarm')

If we take a look at the popularity column, some features are highly correlated with it. The more energetic and louder a track is, the more popular it tends to be. On the other hand, less acoustic tracks are preferred by Spotify users. There is also a high correlation of 0.51 with the feature 'year', suggesting that users are more likely to listen to current music.<br>
The year column also reveals some trends: music tracks have become less acoustic, more energetic, and louder across the years.<br>
Some other correlated pairs simply make sense: for example, higher-energy tracks are louder, and less acoustic tracks (R&R, electronic etc.) are more energetic and louder.

Based on the heatmap, I picked some feature pairs that have a high correlation and plotted them as 2D histograms to examine their relationships (which gives a better visual result given the number of points in the dataset). The darker a bin in the charts, the more points lie at the same position. As we can see, Danceability/Valence and Energy/Loudness show a strong positive linear relationship, while the other correlations don't appear very linear.
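
The heatmap values come from `DataFrame.corr()`, whose default method is the Pearson coefficient, which only measures linear association. A minimal pure-Python sketch of that formula; `pearson_r` is a hypothetical helper and the sample values are made up:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# A perfectly linear pair
print(pearson_r([1, 2, 3], [2, 4, 6]))               # 1.0
# A perfectly dependent but non-linear pair (y = x**2)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

The second call shows why a pair can look clearly related in a 2D histogram while its coefficient stays modest: the coefficient only captures the straight-line part of the relationship.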

In [None]:
fig, [(axes, axes1), (axes2, axes3),(axes4, axes5)] = plt.subplots(nrows = 3, ncols = 2, sharex = False, figsize=(12,12))
axes.hist2d(x = data['danceability'],y = data['valence'], bins=(80, 80), cmap=plt.cm.Blues)
axes.set_xlabel('danceability', fontsize=12)
axes.set_ylabel('valence', fontsize=12)
axes1.hist2d(x = data['energy'],y = data['loudness'], bins=(80, 80), cmap=plt.cm.Blues)
axes1.set_xlabel('energy', fontsize=12)
axes1.set_ylabel('loudness', fontsize=12)
axes2.hist2d(x = data['energy'],y = data['popularity'], bins=(80, 80), cmap=plt.cm.Blues)
axes2.set_xlabel('energy', fontsize=12)
axes2.set_ylabel('popularity', fontsize=12)
axes3.hist2d(x = data['acousticness'],y = data['popularity'], bins=(80, 80), cmap=plt.cm.Blues)
axes3.set_xlabel('acousticness', fontsize=12)
axes3.set_ylabel('popularity', fontsize=12)
axes4.hist2d(x = data['energy'],y = data['acousticness'], bins=(80, 80), cmap=plt.cm.Blues)
axes4.set_xlabel('energy', fontsize=12)
axes4.set_ylabel('acousticness', fontsize=12)
axes5.hist2d(x = data['loudness'],y = data['acousticness'], bins=(80, 80), cmap=plt.cm.Blues)
axes5.set_xlabel('loudness', fontsize=12)
axes5.set_ylabel('acousticness', fontsize=12)
plt.show()

#### Music Trend Analysis

In [None]:
#Extract Numeric features and set them on the same scale
year_num = year_df[['acousticness','danceability','energy','liveness','loudness','popularity','speechiness','tempo','valence']]
scaler = MinMaxScaler()
scaler.fit(year_num)
year_scale = pd.DataFrame(scaler.transform(year_num))
year_scale.columns = ['acousticness','danceability','energy','liveness','loudness','popularity','speechiness','tempo','valence']
year_scale['year'] = year_df['year']

#Create line charts for each numeric feature across the years
# Initialize the figure style
plt.style.use('seaborn-darkgrid')
 
# create a color palette
palette = plt.get_cmap('tab10')

f, axes = plt.subplots(3, 3, figsize=(13, 13), sharex=True)

# multiple line plot

num=0
for column in year_scale.drop('year', axis=1):
    num+=1
    # Find the right spot on the plot
    plt.subplot(3,3, num)
 
    # plot every group, but discrete
    for v in year_scale.drop('year', axis=1):
        plt.plot(year_scale['year'], year_scale[v], marker='', color='grey', linewidth=0.6, alpha=0.3)
        
    # Plot the lineplot
    plt.plot(year_scale['year'], year_scale[column], marker='', color=palette(num), linewidth=2.4, alpha=0.9, label=column)
    
    # Same limits for every chart
    #plt.xlim(0,10)
    #plt.ylim(-2,22)
 
    # No tick labels everywhere (booleans are required; the string 'off' is not treated as False)
    if num in range(7) :
        plt.tick_params(labelbottom=False)
    if num not in [1,4,7] :
        plt.tick_params(labelleft=False)
 
    # Add title
    plt.title(column, loc='left', fontsize=12, fontweight=0, color=palette(num) )

# general title
plt.suptitle("Numeric Attributes Trend Over The Years", fontsize=13, fontweight=0, color='black', style='italic')
 
# Show the graph
plt.show()

Convert years to decades:

In [None]:
def convert_year (year):
    if year >=1920 and year < 1930:
        return '20s'
    elif year >=1930 and year < 1940:
        return '30s'
    elif year >=1940 and year < 1950:
        return '40s'
    elif year >=1950 and year < 1960:
        return '50s'
    elif year >=1960 and year < 1970:
        return '60s'
    elif year >=1970 and year < 1980:
        return '70s'
    elif year >=1980 and year < 1990:
        return '80s'
    elif year >=1990 and year < 2000:
        return '90s'
    elif year >=2000 and year < 2010:
        return '2000s'
    elif year >=2010:
        return '2010 & newer'
data['decades']=data.year.apply(convert_year)

In [None]:
data[['artists','name','year','decades']]
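
The `if`/`elif` chain in `convert_year` can also be expressed with integer division. A compact sketch that produces the same labels for years from 1920 onwards; `decade_label` is a hypothetical alternative, and unlike `convert_year` it also returns a label for years before 1920 instead of `None`:

```python
def decade_label(year):
    """Map a release year to the decade labels used in this notebook."""
    if year >= 2010:
        return "2010 & newer"
    decade_start = (year // 10) * 10   # e.g. 1987 -> 1980
    if decade_start >= 2000:
        return f"{decade_start}s"      # '2000s'
    return f"{decade_start % 100:02d}s"  # 1980 -> '80s'

for y in (1925, 1969, 1999, 2005, 2021):
    print(y, decade_label(y))
```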

The most popular artists of their time

In [None]:
artist_group = pd.DataFrame(data.groupby(['decades','artists'])['popularity'].mean()).sort_values(['popularity'],ascending=False)
artist_group.reset_index(inplace=True)
artist_group['rank'] = artist_group.groupby('decades')['popularity'].rank(method='first',ascending=False)
artist_group[artist_group['rank']==1.0]

In [None]:
#Get the numeric features of the most popular artists from the 60s onwards
#.copy() avoids modifying a slice of artist_df in the next lines
artist_g_top6 = artist_df[artist_df.artists.isin(['24kGoldn','Frank Sinatra','Grover Washington, Jr.','Shakira','The Beach Boys','Earth, Wind & Fire'])].copy()
artist_g_top6.reset_index(inplace=True)
artist_g_top6['decades'] = ['2010 & newer','70s','90s','80s','2000s','60s']
artist_g_top6_num = artist_g_top6[['acousticness','danceability','energy','loudness','tempo','valence']]
artist_g_top6 = artist_g_top6.reset_index()

#Scale those numeric features so they can be plotted on the same charts
#(the scaler is fitted on artist_df, assumed to be the dataset the undefined name `artist` referred to)
scaler = MinMaxScaler()
scaler.fit(artist_df[['acousticness','danceability','energy','loudness','tempo','valence']])
art6_scale = pd.DataFrame(scaler.transform(artist_g_top6_num))
art6_scale.columns = ['acousticness','danceability','energy','loudness','tempo','valence']
art6_scale['artists'] = artist_g_top6['artists']
art6_scale.columns = ['acousticness','danceability', 'energy', 'loudness', 'tempo','valence', 'artists']
art6_scale['decades']=artist_g_top6['decades']

#sort the data by decades and make places for the average value for decades
art6_scale=art6_scale.sort_values(by='decades').reset_index().drop('index',axis=1)
art6_scale=art6_scale.reindex([2,3,4,5,0,1]).reset_index().drop('index',axis=1)
art6_scale.index = [0,2,4,6,8,10]

#Calculate the average of the numeric values for different decades (a list is needed to select several columns)
avg_df = data.groupby('decades')[['acousticness','danceability','energy','loudness','tempo','valence']].mean().reset_index()
avg_df=avg_df.drop(avg_df.index[2:6])
avg_df['artists'] = 'average'
avg_df = avg_df[['acousticness','danceability','energy','loudness','tempo','valence','artists','decades']].reset_index().drop('index',axis=1)

#Reindex the average and combine them into the artist df
avg_df = avg_df.reindex([2,3,4,5,0,1]).reset_index().drop('index',axis=1)
scaler.fit(artist_df[['acousticness','danceability','energy','loudness','tempo','valence']])
avg_df_scale = pd.DataFrame(scaler.transform(avg_df[['acousticness','danceability','energy','loudness','tempo','valence']]))
avg_df_scale.columns = ['acousticness','danceability','energy','loudness','tempo','valence']
avg_df_scale[['artists','decades']]=avg_df[['artists','decades']]
avg_df_scale.index = [1,3,5,7,9,11]
art6_scale=pd.concat([art6_scale,avg_df_scale],axis=0)
art6_scale.sort_index(inplace=True)

# ------- PART 1: Define a function that draws the plot for one row of the dataset!
 
def make_spider( row, title, color):

    # number of variable
    categories=list(art6_scale)[:-2]
    N = len(categories)

    # What will be the angle of each axis in the plot? (we divide the plot / number of variable)
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]

    # Initialise the spider plot
    ax = plt.subplot(2,3,ind, polar=True, )

    # If you want the first axis to be on top:
    ax.set_theta_offset(pi / 2)
    ax.set_theta_direction(-1)

    # Draw one axis per variable + add labels
    plt.xticks(angles[:-1], categories, color='grey', size=8)

    # Draw ylabels
    ax.set_rlabel_position(0)
    plt.yticks([0.2,0.4,0.6,0.8], ["0.2","0.4","0.6","0.8"], color="grey", size=7)
    plt.ylim(0,1)

    # Ind1
    values=art6_scale.loc[row].drop(['artists','decades']).values.flatten().tolist()
    values += values[:1]
    avg_values = art6_scale.loc[row+1].drop(['artists','decades']).values.flatten().tolist()
    avg_values+=avg_values[:1]
    ax.plot(angles, avg_values, color='grey', linewidth=0, linestyle='solid')
    ax.plot(angles, values, color=color, linewidth=2, linestyle='solid')
    ax.fill(angles, values, color=color, alpha=0.8)
    ax.fill(angles, avg_values, color='grey', alpha=0.6)

    # Add a title
    plt.title(title, size=11, color=color, y=1.1)

    
# ------- PART 2: Apply the function to all individuals
# initialize the figure
my_dpi=96
plt.figure(figsize=(1200/my_dpi, 1200/my_dpi), dpi=my_dpi)
 
# Create a color palette:
my_palette = plt.get_cmap("Set1")
 
# Loop to plot
ind = 0
for row in range(0, len(art6_scale.index),2):
    ind +=1
    make_spider( row=row, title=art6_scale['decades'][row]+':'+'\n'+art6_scale['artists'][row], color=my_palette(ind))
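
Inside `make_spider`, both the angles and the values get their first element appended again so that the outline drawn on the polar axes closes on itself. A minimal sketch of that step in isolation; `closed_polygon` is a hypothetical helper and the feature values are made up:

```python
from math import pi

def closed_polygon(values):
    """Return (angles, values) for a radar outline that ends where it starts."""
    n = len(values)
    # One evenly spaced angle per feature around the full circle
    angles = [i / n * 2 * pi for i in range(n)]
    # Repeat the first point at the end to close the shape
    return angles + angles[:1], list(values) + list(values[:1])

angles, vals = closed_polygon([0.2, 0.5, 0.9, 0.4])
print(len(angles), len(vals))  # 5 5
```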

Map and convert the key and mode

In [None]:
key_mapping = {0:"C",1:"C♯",2:"D",3:"D♯",4:"E",5:"F",6:"F♯",7:"G",8:"G♯",9:"A",10:"A♯",11:"B"}
mode_mapping = {0:"Minor",1:"Major"}
key_name = data.key.map(key_mapping)
mode_name= data['mode'].map(mode_mapping)
data['key_mode'] = key_name+' '+mode_name
data[['name','key','mode','key_mode']]

In [None]:
#Count the number of Key used
key_count = pd.DataFrame(data.key_mode.value_counts()).reset_index()
key_count.columns=['key_name','counts']

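#reorder and reformat the key counts by year (fill_value=0 keeps the cumulative sums free of NaN for years in which a key never appears)
column_list = key_count['key_name'].values
ani_bar_df = data.groupby('year')['key_mode'].value_counts().unstack(fill_value=0).reset_index()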
ani_bar_df_cum = ani_bar_df.cumsum()
ani_bar_df_cum['year']=ani_bar_df['year']
ani_bar_df_cum.set_index('year',inplace=True)
ani_bar_df_cum=ani_bar_df_cum[column_list]

#prepare the data for animation
def prepare_data(df, steps=5):
    df = df.reset_index()
    df.index = df.index * steps
    last_idx = df.index[-1] + 1
    df_expanded = df.reindex(range(last_idx))
    df_expanded['year'] = df_expanded['year'].ffill()
    df_expanded = df_expanded.set_index('year')
    df_rank_expanded = df_expanded.rank(axis=1, method='first')
    df_expanded = df_expanded.interpolate()
    df_rank_expanded = df_rank_expanded.interpolate()
    return df_expanded, df_rank_expanded

df_expanded, df_rank_expanded = prepare_data(ani_bar_df_cum)

#draw a race chart
def nice_axes(ax):
    ax.set_facecolor('.8')
    ax.tick_params(labelsize=20, length=0)
    ax.grid(True, axis='x', color='white')
    ax.set_axisbelow(True)
    [spine.set_visible(False) for spine in ax.spines.values()]
    
def init():
    ax.clear()
    nice_axes(ax)
    # show every ranked bar, not only the six lowest ranks
    ax.set_ylim(.2, len(labels) + .8)
colors = plt.cm.Paired(range(12))
labels = df_expanded.columns
def update(i):
    for bar in ax.containers:
        bar.remove()
    y = df_rank_expanded.iloc[i]
    width = df_expanded.iloc[i]
    ax.barh(y=y, width=width, color=colors, tick_label=labels)
    date_str = round(df_expanded.index[i])
    ax.set_title(date_str,x=0.7, y=0.3, fontsize=100)
    
fig = plt.Figure(figsize=(20, 10), dpi=144)
ax = fig.add_subplot()
anim = FuncAnimation(fig=fig, func=update, init_func=init, frames=len(df_expanded), 
                     interval=100, repeat=False)
HTML(anim.to_jshtml())
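
The `prepare_data` step stretches each year into several animation frames and fills the gaps by linear interpolation, which is what makes the bars glide instead of jump. A small sketch of the same idea on made-up counts for two keys and two years:

```python
import pandas as pd

# Toy cumulative counts, invented for illustration only
counts = pd.DataFrame(
    {"C Major": [10.0, 30.0], "G Major": [20.0, 25.0]},
    index=pd.Index([1990, 1991], name="year"),
)

steps = 5
expanded = counts.reset_index()
expanded.index = expanded.index * steps                      # rows now sit at positions 0 and 5
expanded = expanded.reindex(range(expanded.index[-1] + 1))   # positions 1-4 are empty (NaN)
expanded["year"] = expanded["year"].ffill()                  # carry the year label forward
expanded = expanded.set_index("year").interpolate()          # linear fill between the two years

print(expanded["C Major"].round(2).tolist())  # [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]
```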

In [None]:
#anim.save('animation.mp4', writer='imagemagick')

In [None]:
matplotlib.matplotlib_fname()

In [None]:
#get the most popular track for each key
data_key_rank = data.copy()
data_key_rank['rank']=data.groupby('key_mode')['popularity'].rank(method='first',ascending=False)
data_key_rank[data_key_rank['rank']==1]
data_key_rank['song_artist'] =data_key_rank.name + ' '+'-'+' '+ data_key_rank.artists.str.strip('[\'\"]')
data_key_rank = data_key_rank[data_key_rank['rank']==1]

#merge with key counts to show number of tracks with each key
bar_df = pd.merge(key_count,data_key_rank, left_on='key_name',right_on='key_mode')
bar_df = bar_df[['key_name','counts','song_artist']]

#reduce the length of artist and track names
bar_df.loc[bar_df.key_name=='C♯ Major','song_artist'] = 'WAP - Cardi B'
bar_df.loc[bar_df.key_name=='G Major','song_artist'] = 'LA NOCHE DE ANOCHE - Bad Bunny'
bar_df.loc[bar_df.key_name=='G♯ Major','song_artist'] = 'Head & Heart - Joel Corry'
bar_df.loc[bar_df.key_name=='F♯ Major','song_artist'] = 'Holy - Justin Bieber'
bar_df.loc[bar_df.key_name=='B Major','song_artist'] = 'ROCKSTAR - DaBaby'
bar_df.loc[bar_df.key_name=='F♯ Minor','song_artist'] = 'For The Night - Lil Baby'
bar_df.loc[bar_df.key_name=='D Minor','song_artist'] = 'Monster - Shawn Mendes'
bar_df.loc[bar_df.key_name=='G Minor','song_artist'] = 'Mood - 24kGoldn'
bar_df.loc[bar_df.key_name=='C Minor','song_artist'] = 'Chica Ideal - Sebastian Yatra'

# Reorder the dataframe
df = bar_df.sort_values(by=['counts'])

# initialize the figure
plt.figure(figsize=(20,10))
ax = plt.subplot(111, polar=True)
plt.axis('off')

# Constants = parameters controling the plot layout:
upperLimit = 100
lowerLimit = 30
labelPadding = 4

# Compute the max in the dataset (named max_count to avoid shadowing the built-in max)
max_count = df['counts'].max()

# Let's compute heights: they are a conversion of each item value in those new coordinates
# In our example, 0 in the dataset will be converted to the lowerLimit (30)
# The maximum keeps its own value; upperLimit is not used by this formula
slope = (max_count - lowerLimit) / max_count
heights = slope * df.counts + lowerLimit

# Compute the width of each bar. In total we have 2*Pi = 360°
width = 2*np.pi / len(df.index)

# Compute the angle each bar is centered on:
indexes = list(range(1, len(df.index)+1))
angles = [element * width for element in indexes]
angles

# Draw bars
bars = ax.bar(
    x=angles, 
    height=heights, 
    width=width, 
    bottom=lowerLimit,
    linewidth=2, 
    edgecolor="white",
    color="#61a4b2"
)

# Add labels
for bar, angle, height, label in zip(bars,angles, heights, df["key_name"]+ ':'+' ' +df['song_artist']):

    # Labels are rotated. Rotation must be specified in degrees :(
    rotation = np.rad2deg(angle)

    # Flip some labels upside down
    alignment = ""
    if angle >= np.pi/2 and angle < 3*np.pi/2:
        alignment = "right"
        rotation = rotation + 180
    else: 
        alignment = "left"

    # Finally add the labels
    ax.text(
        x=angle, 
        y=lowerLimit + bar.get_height() + labelPadding, 
        s=label, 
        ha=alignment, 
        va='center', 
        rotation=rotation, 
        rotation_mode="anchor") 
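
The label loop rotates each label to follow its bar and flips the ones on the left half of the circle so they stay readable. That rule can be sketched on its own; `label_orientation` is a hypothetical helper that mirrors the logic of the loop:

```python
from math import degrees, pi

def label_orientation(angle):
    """Return (rotation in degrees, horizontal alignment) for a bar at `angle` radians."""
    rotation = degrees(angle)
    if pi / 2 <= angle < 3 * pi / 2:
        # Left half of the circle: flip the text so it is not upside down
        return rotation + 180, "right"
    return rotation, "left"

print(label_orientation(0))   # (0.0, 'left')
print(label_orientation(pi))  # (360.0, 'right')
```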