# Objective: 
    Compare popular songs (Billboard) and award-winning songs (Grammy) from different aspects.
# Introduction:
    We were curious about what kind of music people like and what kind of music is likely to win awards. We initially thought that popular music and award-winning music would differ greatly. However, after analyzing and visualizing the data, we noticed that the two kinds of music actually overlap considerably.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,HoverTool
from bokeh.io import show, output_notebook
from scipy.ndimage import gaussian_gradient_magnitude
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import seaborn as sns
from matplotlib_venn import venn2
import scipy as sp
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import random

# Data cleaning
    The three datasets we used are: Spotify Music Database 1921-2020, Grammy Awards 1999-2018, and Billboard Top 100 1999-2019. The Spotify dataset, with 160,000+ entries, acts as a control group that covers roughly all kinds of available music. The Billboard dataset represents popular songs, while the Grammy dataset represents award-winning songs. To make the best use of the datasets, we filtered out the information we did not need and merged the datasets for comparison, as follows:

In [None]:
# import datasets
spotify = pd.read_csv('/kaggle/input/spotify-dataset-19212020-160k-tracks/data.csv')
grammy = pd.read_csv('/kaggle/input/data-on-songs-from-billboard-19992019/BillboardFromLast20/grammySongs_1999-2019.csv')
billboard = pd.read_csv('/kaggle/input/data-on-songs-from-billboard-19992019/BillboardFromLast20/billboardHot100_1999-2019.csv')

In [None]:
len(spotify)

In [None]:
len(grammy)

In [None]:
len(billboard)

In [None]:
spotify.head()

In [None]:
grammy.head()

In [None]:
billboard.head()

    To assign audio features to the Grammy and Billboard songs, we renamed columns so that they match across datasets, reformatted the Artist column (removing square brackets and quotation marks and changing "&" to ","), and combined repeated songs, keeping the columns (audio features, weekly rank, weeks on chart, etc.) needed for each cleaned dataset.
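    The artist reformatting described above can be sketched as one small helper. The function name `normalize_artists` and the sample strings are made up for illustration; the string operations are the same ones the cleaning cell applies.

```python
import pandas as pd

def normalize_artists(col: pd.Series) -> pd.Series:
    """Strip list brackets and quotes, turn ' &' into ',', and lowercase."""
    return (
        col.str.strip("[]")
           .str.replace("'", "", regex=False)
           .str.replace(" &", ",", regex=False)
           .str.lower()
    )

# Made-up sample values: a list-style string and an '&'-style string
sample = pd.Series(["['Simon', 'Garfunkel']", "Simon & Garfunkel"])
normalized = normalize_artists(sample)
print(normalized.tolist())
```

    Both spellings end up as the same key, which is what lets the later merges on `Name` and `Artist` match rows across datasets.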

In [None]:
# put spotify into the same format
spotify['artists'] = spotify['artists'].str.strip("[]")
spotify['artists'] = spotify['artists'].str.replace("'", "").str.replace(" &", ",")
spotify.rename(columns = {'name':'Name', 'artists':'Artist'}, inplace = True)
spotify['Name'] = spotify['Name'].str.lower()
spotify['Artist'] = spotify['Artist'].str.lower()

# put grammy into the same format
grammy['Artist'] = grammy['Artist'].str.replace(" &", ",")
grammy['Name'] = grammy['Name'].str.lower()
grammy['Artist'] = grammy['Artist'].str.lower()

# put billboard into the same format
billboard.rename(columns = {'Artists':'Artist'}, inplace = True)
billboard['Artist'] = billboard['Artist'].str.replace(" &", ",")
billboard['Name'] = billboard['Name'].str.lower()
billboard['Artist'] = billboard['Artist'].str.lower()

In [None]:
# clean spotify dataset (in case it contains the same songs by the same artists)
songs = spotify.groupby(['Name', 'Artist'], as_index = False).agg({'acousticness' : 'mean', 'danceability' : 'mean', 'duration_ms' : 'mean', 'energy' : 'mean', 'explicit' : 'max', 'instrumentalness' : 'mean', 'key' : 'median',  'liveness' : 'mean',  'loudness' : 'mean',  'mode' : 'max', 'popularity' : 'sum', 'speechiness' : 'mean', 'tempo' : 'mean', 'valence' : 'mean', 'year' : 'min'})

# clean grammy songs, merge it with songs(spotify) to get more info, drop useless columns
gr = grammy.merge(songs, on = ['Name', 'Artist'])
gr = gr.drop(columns = 'Unnamed: 0').drop(columns = 'X')

# clean billboard
bb1 = billboard.groupby(['Name', 'Artist', 'Week', 'Weekly.rank'], as_index = False).agg({'Weeks.on.chart' : 'max', 'Peak.position' : 'min', 'Genre' : 'first', 'Date':'first'})
bb1 = bb1.merge(songs, on = ['Name', 'Artist'])
bb2 = bb1.groupby(['Name','Artist'], as_index = False).agg({'Weeks.on.chart' : 'max', 'Peak.position' : 'min'})
bb2 = bb2.dropna(subset = ['Peak.position', 'Weeks.on.chart'])
bb3 = bb1.groupby(['Name','Artist'], as_index = False).agg({'acousticness' : 'mean', 'danceability' : 'mean', 'duration_ms' : 'mean', 'energy' : 'mean', 'explicit' : 'max', 'instrumentalness' : 'mean', 'key' : 'median',  'liveness' : 'mean',  'loudness' : 'mean',  'mode' : 'max', 'speechiness' : 'mean', 'tempo' : 'mean', 'valence' : 'mean', 'year' : 'min'})

# capitalize each word (reformatting)
songs['Name'] = songs['Name'].str.title()
songs['Artist'] = songs['Artist'].str.title()
gr['Name'] = gr['Name'].str.title()
gr['Artist'] = gr['Artist'].str.title()
bb1['Name'] = bb1['Name'].str.title()
bb1['Artist'] = bb1['Artist'].str.title()
bb2['Name'] = bb2['Name'].str.title()
bb2['Artist'] = bb2['Artist'].str.title()
bb3['Name'] = bb3['Name'].str.title()
bb3['Artist'] = bb3['Artist'].str.title()
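
# rescale loudness so that -60 dB maps to 0 and 0 dB maps to 1, comparable to the other 0-1 features
bb3['loudness'] = bb3['loudness']/60 + 1
songs['loudness'] = songs['loudness']/60 + 1
gr['loudness'] = gr['loudness']/60 + 1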

In [None]:
songs.head()

In [None]:
gr.head()

In [None]:
bb1.head()

In [None]:
bb2.head()

In [None]:
bb3.head()

    In the end we have five cleaned datasets:
    1. songs: all Spotify songs with their audio features
    2. gr: Grammy-winning songs with their awards and audio features
    3. bb1: all Billboard entries with audio features (a song repeats once for each week on the chart)
    4. bb2: one row per song with its peak position and weeks on chart (no repeats)
    5. bb3: one row per song with its audio features (no repeats)

# Visualizations 

# 1. Boxplot:
    To compare average, popular, and award-winning music, we used the audio features that the Spotify dataset assigns to each song and plotted every feature for the three datasets. The graph below shows that, compared with the large Spotify catalogue, Billboard and Grammy songs tend to be more energetic, more danceable, less acoustic, less positive (lower valence), and less instrumental. The Billboard data suggests that people nowadays favor danceable, energetic, and loud music, perhaps because this kind of music elicits positive emotions and is catchier. The Grammy songs tend to be studio recordings (low liveness), probably because of sound quality. One especially interesting feature is 'loudness': no song with a rescaled loudness below 0.6 appears on Billboard or wins a Grammy. Hovering over the boxplot shows that the outliers of the Spotify control group are mostly music without vocals. We could probably improve the scaling by removing those tracks and rescaling the loudness.
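    To read the 0.6 loudness threshold in the original units, the rescaling from the cleaning step (`loudness/60 + 1`) can be inverted. This is a small worked sketch; the helper names are made up, and it assumes the raw Spotify loudness is in dB and mostly falls between -60 and 0.

```python
def scale_loudness(db: float) -> float:
    """Map loudness in dB (roughly -60..0) onto 0..1, as in the cleaning step."""
    return db / 60 + 1

def unscale_loudness(scaled: float) -> float:
    """Invert the scaling to get back to dB."""
    return (scaled - 1) * 60

threshold_db = unscale_loudness(0.6)
print(round(threshold_db, 6))  # -24.0
```

    Under that assumption, a rescaled loudness of 0.6 corresponds to about -24 dB.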

In [None]:
fig = go.Figure()
features = ['acousticness', 'danceability', 'energy', 'instrumentalness',
            'liveness', 'loudness', 'valence']
groups = [('all', songs), ('grammy', gr), ('popular', bb3)]

# one box per feature and dataset, in the order all / grammy / popular
for feature in features:
    for label, df in groups:
        fig.add_trace(go.Box(y=df[feature], name=f'{feature} - {label}',
                             hovertext=df['Name'], hoverinfo="y+text"))

fig.update_layout(
    title='Audio Profile Comparison between popular songs, award-winning songs, and all songs',
    yaxis=dict(
        zerolinecolor='rgb(0, 0, 0)',
        zerolinewidth=2,
    ),
    paper_bgcolor='rgb(250, 250, 250)',
    plot_bgcolor='rgb(230, 230, 240)',
    showlegend=False, 
)
fig.show()

# 2. Correlation heat graph: 

    The heat graph takes a closer look at the audio profile of the Spotify songs and focuses on the connections between song features. The colors show which properties of songs are more correlated with each other.

    According to the scale on the right, the redder a cell is, the more strongly the two properties are positively correlated. We can see high correlation between energy & mode, energy & loudness, danceability & valence... The year column is particularly interesting: over the years songs have become more danceable, more energetic, and louder, and newer songs are more popular. This column shows how song features have developed over time.
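    The same reading can be done numerically by taking one column of the correlation matrix and sorting it by strength. The sketch below uses a made-up toy table, not the Spotify data, and the helper name `correlations_with` is hypothetical.

```python
import pandas as pd

def correlations_with(df: pd.DataFrame, column: str) -> pd.Series:
    """Pearson correlation of every numeric column with `column`, strongest first."""
    corr = df.select_dtypes(include="number").corr()[column].drop(column)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Made-up toy data: 'loudness' rises with 'year', 'acousticness' falls
toy = pd.DataFrame({
    "year": [1950, 1970, 1990, 2010],
    "loudness": [0.5, 0.6, 0.8, 0.9],
    "acousticness": [0.9, 0.7, 0.4, 0.2],
    "key": [0, 7, 2, 5],
})
ranked = correlations_with(toy, "year")
print(ranked)
```

    Applied to the real table, a call such as `correlations_with(spotify, 'year')` would list the features that changed most consistently over time.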

In [None]:
f = plt.figure(figsize=(10, 10))
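# restrict to numeric columns; newer pandas versions raise an error on text columns
corr = spotify.select_dtypes(include='number').corr()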
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0,  cmap = sns.diverging_palette(220, 10, n=100),  square=True)
_ = ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')

# 3. Wordcloud:
    In the two graphs above, we analyzed the audio features of the songs. In the wordclouds below, we analyze the overlap and differences between artists who often appear on Billboard and artists who win many Grammy Awards. The wordclouds appear in this order: Grammy, Billboard, and Spotify.

In [None]:
STOPWORDS.add("featuring")
STOPWORDS.add("songwriter")
STOPWORDS.add("nan")
STOPWORDS.add("artist")
STOPWORDS.add("the")
stopwords = set(STOPWORDS)

In [None]:
def green_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    return "hsl(140, 25%%, %d%%)" % random.randint(1, 60)
_words = '' 
  
# iterate through the csv file 
for val in gr.Artist: 
      
    # cast each val to string
    val = str(val)
  
    # split the value 
    tokens = val.split()
      
    _words += " ".join(tokens)+" "
  
wordcloud = WordCloud(width = 800, height = 800,
                background_color ='white',
                stopwords = stopwords,
                min_font_size = 10,random_state=1).generate(_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8))
plt.imshow(wordcloud.recolor(color_func=green_color_func, random_state=3),
           interpolation="bilinear")
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()

In [None]:
def blue_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    # saturation capped at 100%, the maximum valid HSL value
    return "hsl(200, 100%%, %d%%)" % random.randint(1, 60)
_words = ''  
  
# iterate through the csv file 
for val in billboard.Artist: 
      
    # cast each val to string
    val = str(val).title()
  
    # split the value 
    tokens = val.split()
      
    _words += " ".join(tokens)+" "
  
wordcloud = WordCloud(width = 800, height = 800,
                background_color ='white',
                stopwords = stopwords,
                min_font_size = 10,random_state=1).generate(_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8))
plt.imshow(wordcloud.recolor(color_func=blue_color_func, random_state=3),
           interpolation="bilinear")
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()

In [None]:
def purple_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    return "hsl(267, 100%%, %d%%)" % random.randint(1, 60)
d = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()
_words_2 = '' 

  
# iterate through the csv file 
for val in songs.Artist: 
      
    # cast each val to string
    val = str(val) 
  
    # split the value 
    tokens = val.split()
      
    _words_2 += " ".join(tokens)+" "
    
music_color = np.array(Image.open(os.path.join(d, "../input/images/cd1.png"))) # a cd image found online

STOPWORDS.add("featuring")
STOPWORDS.add("songwriter")
STOPWORDS.add("nan")
STOPWORDS.add("artist")
stopwords = set(STOPWORDS)  # build the set after the additions so they are included

wc = WordCloud(width = 400, height = 400,background_color="white", max_words=2000, mask=music_color,
               stopwords=stopwords, max_font_size=40, random_state=42)

wc.generate(_words_2)

# create coloring from image
image_colors = ImageColorGenerator(music_color)
# show
plt.figure(figsize = (15, 15))
plt.imshow(wc.recolor(color_func=purple_color_func), interpolation="bilinear")
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()

    We noticed that many artists, like Taylor Swift and Coldplay, appear in both the Billboard and the Grammy wordclouds, suggesting that high-quality songs tend to be popular and popular songs tend to win awards.

    However, we also noticed some differences between the two wordclouds. For example, Beyoncé is one of the biggest names in the Grammy wordcloud, yet she can barely be seen in the Billboard wordcloud. We found this surprising. Her near absence from the Billboard wordcloud suggests that people are more likely to listen to freshly published songs: even though Beyoncé is very famous, she does not appear on Billboard very often, likely because of a lack of new songs.

    We did not find the Spotify wordcloud particularly useful, because the artists with the most published music are actually classical composers such as Beethoven, Bach, and Chopin. Since the other two datasets mainly include songs with lyrics, the Spotify wordcloud does not provide much information.

# 4. Scatter Plot1: 

    For the scatterplots below we focused on the Billboard dataset. We plot each song's peak position against its weeks on chart using bb2 (one row per song, with only the needed columns).

In [None]:
coeffs = np.polyfit(bb2["Peak.position"], bb2["Weeks.on.chart"], 1)
plt.scatter(bb2["Peak.position"], bb2["Weeks.on.chart"])
plt.xlabel("Peak Position")
plt.ylabel("Weeks on Chart")
plt.plot(bb2["Peak.position"], coeffs[0] * bb2["Peak.position"] + coeffs[1], color = "black")

    This graph displays the peak position and weeks on chart of each song in the Billboard dataset. We can roughly see a correlation between the two.

    We therefore plotted another graph that groups the songs with the same peak position and shows their mean weeks on chart:

In [None]:
bb2_mean = bb2.groupby('Peak.position',as_index = False).agg({'Weeks.on.chart' : 'mean'})
plt.scatter(bb2_mean["Peak.position"], bb2_mean["Weeks.on.chart"])
plt.xlabel("Peak Position")
plt.ylabel("Mean weeks on chart of songs with the same peak position")
plt.plot(bb2_mean["Peak.position"], coeffs[0] * bb2_mean["Peak.position"] + coeffs[1], color = "black")

    This graph shows more clearly that songs with better peak positions (numerically closer to 1) tend to stay on the chart longer, suggesting that popularity produces popularity: people tend to listen to songs that are already popular, which adds further to their popularity.
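    One way to put a number on this relationship is a Spearman rank correlation, which only asks whether one quantity tends to fall as the other rises. The sketch below computes it as the Pearson correlation of the ranks, on made-up toy values rather than on bb2; the helper name `spearman` is hypothetical.

```python
import pandas as pd

def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())

# Made-up toy chart data; a numerically smaller peak position is a better rank
toy = pd.DataFrame({
    "Peak.position": [1, 5, 20, 60, 95],
    "Weeks.on.chart": [40, 30, 18, 6, 2],
})
rho = spearman(toy["Peak.position"], toy["Weeks.on.chart"])
print(rho)
```

    A negative value means that songs peaking nearer to position 1 stay on the chart longer. The toy data is perfectly monotonic, so it gives the extreme value; real chart data would be expected to give something weaker.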

# 5. Scatter Plot2
    Songs with a high peak position in Scatter Plot 1 show a much wider range of weeks on chart than songs with a low peak position. We therefore made an interactive chart of the same data that gives information about each point for further analysis.

    As the graph shows when hovering, the top-left point on the chart is Radioactive by Imagine Dragons (peak: 3, weeks: 87!), and the bottom-left point is You Need To Calm Down by Taylor Swift (peak: 2, weeks: 2). Both songs have a very high peak position, but their weeks on chart differ drastically.

In [None]:
output_notebook()

In [None]:
data2 = ColumnDataSource(bb2)

In [None]:
TOOLTIPS = [("(Name, Artist, Peak, Weeks)", "(@Name, @Artist, @{Peak.position}, @{Weeks.on.chart})")]

# width/height replace plot_width/plot_height, which newer Bokeh releases no longer accept
p = figure(title = 'Weeks On Chart vs. Peak Position', width=500, height=400, tooltips = TOOLTIPS)

# add a circle renderer with a size and color
p.circle("Peak.position", "Weeks.on.chart", size = 3, color="blue", source = data2)
p.xaxis.axis_label = 'Peak Position'
p.yaxis.axis_label = 'Weeks On Chart'
# show the results
show(p)

# 6. Line Chart
    We picked some representative points from the scatter plot above (many weeks with a high peak, and few weeks with a high peak) and graphed the weekly position of those songs throughout their presence on Billboard. The line graph below compares long-lasting classics with short-lived fireworks on Billboard.
    
    Most of the long-lived songs first appear in the bottom part of the chart, climb to the top, and disappear when they fall out of the Top 50. The short-lived ones, on the other hand, usually show up at the top of the chart but drop to the bottom or simply disappear after a short period of time. Examples include You Need To Calm Down and Killshot. We hypothesize that the long-lived songs climbed because of the quality or catchiness of the song itself, while the short-lived songs flashed at the very top of Billboard because of either the artist's reputation or the promotion of the song.
    
    Noticing the strange invisible 'dead' line at rank 50 that makes all the long-lasting songs disappear from Billboard, we looked into this phenomenon and found Billboard's recurrent rule: a song that has been on the chart for more than 20 weeks is removed from Billboard and placed on the recurrent list once it falls below rank 50. This rule also explains the trendline of Shape Of You by Ed Sheeran, which fell below rank 50 and left the chart, then appeared on Billboard a second time.
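    The recurrent rule as described above can be written as a single condition. This sketch encodes only the rule as stated in this notebook; Billboard's actual policy may include further conditions, and the function name is made up.

```python
def becomes_recurrent(weeks_on_chart: int, rank: int) -> bool:
    """Rule as described above: more than 20 weeks on chart and below rank 50."""
    return weeks_on_chart > 20 and rank > 50

print(becomes_recurrent(25, 51))  # True: long run, just fell below rank 50
print(becomes_recurrent(10, 80))  # False: still within its first 20 weeks
print(becomes_recurrent(87, 48))  # False: long run, but still in the Top 50
```

    This is why the long-lived lines in the chart stop abruptly around rank 50 instead of drifting down to rank 100.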

In [None]:
rad = bb1[(bb1['Name'] == 'Radioactive') & (bb1['Artist'] == 'Imagine Dragons')]
costar = bb1[(bb1['Name'] == 'Counting Stars') & (bb1['Artist'] == 'Onerepublic')]
roll = bb1[(bb1['Name'] == 'Rolling In The Deep') & (bb1['Artist'] == 'Adele')]
imy = bb1[(bb1['Name'] == 'I\'M Yours') & (bb1['Artist'] == 'Jason Mraz')]
soy = bb1[(bb1['Name'] == 'Shape Of You') & (bb1['Artist'] == 'Ed Sheeran')]

ynt = bb1[(bb1['Name'] == 'You Need To Calm Down') & (bb1['Artist'] == 'Taylor Swift')]
ibe = bb1[(bb1['Name'] == 'I Believe') & (bb1['Artist'] == 'Fantasia')]
badg = bb1[(bb1['Name'] == 'Bad Guy') & (bb1['Artist'] == 'Billie Eilish')]
ks = bb1[(bb1['Name'] == 'Killshot') & (bb1['Artist'] == 'Eminem')]
atm = bb1[(bb1['Name'] == 'Atm') & (bb1['Artist'] == 'J. Cole')]

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x = rad['Weeks.on.chart'], y = rad['Weekly.rank'], mode = 'lines', name = 'Radioactive - Imagine Dragons'))
fig.add_trace(go.Scatter(x = costar['Weeks.on.chart'], y = costar['Weekly.rank'], mode = 'lines', name = 'Counting Stars - OneRepublic'))
fig.add_trace(go.Scatter(x = roll['Weeks.on.chart'], y = roll['Weekly.rank'], mode = 'lines', name = 'Rolling In The Deep - Adele'))
fig.add_trace(go.Scatter(x = imy['Weeks.on.chart'], y = imy['Weekly.rank'], mode = 'lines', name = 'I\'m Yours - Jason Mraz'))
fig.add_trace(go.Scatter(x = soy['Weeks.on.chart'], y = soy['Weekly.rank'], mode = 'lines', name = 'Shape Of You - Ed Sheeran'))

fig.add_trace(go.Scatter(x = ynt['Weeks.on.chart'], y = ynt['Weekly.rank'], mode = 'lines', name = 'You Need To Calm Down - Taylor Swift'))
fig.add_trace(go.Scatter(x = ibe['Weeks.on.chart'], y = ibe['Weekly.rank'], mode = 'lines', name = 'I Believe - Fantasia'))
fig.add_trace(go.Scatter(x = badg['Weeks.on.chart'], y = badg['Weekly.rank'], mode = 'lines', name = 'Bad Guy - Billie Eilish'))
fig.add_trace(go.Scatter(x = ks['Weeks.on.chart'], y = ks['Weekly.rank'], mode = 'lines', name = 'Killshot - Eminem'))
fig.add_trace(go.Scatter(x = atm['Weeks.on.chart'], y = atm['Weekly.rank'], mode = 'lines', name = 'Atm - J. Cole'))

fig.update_layout(
    title='Weekly Rank vs. Weeks on Billboard',
    xaxis_title="Weeks On Chart",
    yaxis_title="Weekly Rank",
    legend_title="Song - Artist",
    paper_bgcolor='rgb(250, 250, 250)',
    plot_bgcolor='rgb(230, 230, 240)'
)
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()

In [None]:
gr[gr['Name'] == 'Rolling In The Deep']

In [None]:
gr[gr['Name'] == 'Shape Of You']

In [None]:
gr[gr['Name'] == 'Radioactive']

# 7. Venn Diagram
    Having shown the overlapping trends in the features and artists of popular and award-winning songs, we use the Venn diagram below to show the intersection of popular and award-winning songs in the most straightforward way. According to the Venn diagram, 131 of the 180 songs that won a Grammy Award (songs that won multiple awards count once) have appeared on Billboard at some time.
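    As a quick check of what the quoted counts mean as a share, the sketch below does the arithmetic and shows the set operation behind it. The helper `overlap_share` and the toy sets are made up for illustration.

```python
def overlap_share(winners: set, charted: set) -> float:
    """Fraction of award winners that also appear on the chart."""
    return len(winners & charted) / len(winners)

# Arithmetic for the counts quoted above
share = 131 / 180
print(f"{share:.1%}")  # 72.8%

# Made-up toy sets to show the set-based version
toy_share = overlap_share({"a", "b", "c", "d"}, {"b", "c", "d", "e"})
print(toy_share)  # 0.75
```

    Note that the Venn diagram cell builds its sets from song titles only, so two different songs with the same title would count as a match; using (Name, Artist) pairs would be a stricter comparison.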

In [None]:
plt.figure(figsize=(6,6))
v = venn2([set(gr['Name']), set(bb1['Name'])], 
          set_labels = ('Songs got Grammy', 'Songs on Billboard'), 
          set_colors=('darkblue', 'lightblue'), 
          )
v.get_label_by_id('A').set_size(20)
v.get_label_by_id('A').set_color('darkblue')
v.get_label_by_id('B').set_size(15)
v.get_label_by_id('B').set_color('darkblue')

# Conclusion:
    From the analysis and visualizations above, we conclude that popularity and awards are correlated. The reason is still unclear.

    One possibility is that the distribution of the Grammy Awards contributes to the intersection. Grammys are awarded by genre, and a large proportion of the awards go to popular genres. As the pie chart below shows, almost 50% of the awards are given to Country, Pop, and Rap. In terms of specific songs, people might start listening to a song after it wins an award, or a song might be nominated for an award because of its popularity. The exact reason behind this trend needs further research.

In [None]:
grammy_for_pie = gr.groupby("Genre", as_index = False).agg({"Name": "count"})

In [None]:
my_colors = ['palegreen', 'paleturquoise', 'lightpink', 'lightsteelblue', 
           'khaki', 'tomato', 'aqua', 'lightseagreen', 'lightsalmon', 'lightskyblue',"plum"]

In [None]:
plt.figure(figsize = (10,15))
_ = plt.pie(grammy_for_pie["Name"],labels= grammy_for_pie["Genre"],autopct="%1.2f%%",colors= my_colors)

# Further analysis or things to improve:
    The other aspect to focus on in further research is the data cleaning. The Grammy dataset initially has over 400 entries, but merging it with the Spotify dataset cut it to about 220 entries. Some of the awarded songs may simply be missing from Spotify, but given the size of the Spotify dataset (160,000+ tracks), that seems unlikely. If we had more time, we would look into what happened in the merging process and check whether formatting issues are eliminating those entries.
    Also, the Spotify dataset includes much music without lyrics, such as classical music, piano compositions, and instrumental electronic music. Since our project focuses on songs with lyrics, we could improve the accuracy of our analysis by finding a way to remove those pieces.
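
    A possible starting point for the merge investigation is a left merge with pandas' `indicator=True` option, which labels the rows that found no partner. The sketch below runs on made-up toy frames standing in for `grammy` and `songs`; the helper name `unmatched_rows` is hypothetical.

```python
import pandas as pd

def unmatched_rows(left: pd.DataFrame, right: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Rows of `left` that have no partner in `right` on `keys`."""
    merged = left.merge(right[keys].drop_duplicates(), on=keys,
                        how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", left.columns]

# Made-up toy frames standing in for `grammy` and `songs`
toy_grammy = pd.DataFrame({
    "Name": ["song a", "song b", "song c"],
    "Artist": ["artist x", "artist y, artist z", "artist w"],
})
toy_songs = pd.DataFrame({
    "Name": ["song a", "song b"],
    "Artist": ["artist x", "artist y"],  # featured artist missing for 'song b'
})
missing = unmatched_rows(toy_grammy, toy_songs, ["Name", "Artist"])
print(missing)
```

    Inspecting the unmatched rows would show whether the lost entries are truly absent from Spotify or only differ in formatting, for example in how featured artists or punctuation are written.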