# Historic Spotify Chart Analysis
In this notebook, we analyze the historic Spotify charts to identify recurring Christmas songs and artists. The data originates from the [Spotifycharts](https://spotifycharts.com/home/) Top 200 weekly chart. Since the charts API has been deprecated and scraping is against their Terms of Service, we'll be using the [historic Spotify charts dataset from Kaggle](https://www.kaggle.com/bartomiejczyewski/spotify-top-200-weekly-global-20172021).

First, let's load the data into a pandas DataFrame and do some basic data cleaning.

In [212]:
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
%matplotlib inline

# read csv into pandas dataframe
df = pd.read_csv('archive/historicCharts.csv')

# read the date strings into pandas as datetime objects
df["date"] = pd.to_datetime(df["Week"])
df = df.set_index("date")
df.drop(["Week"], axis=1, inplace=True)

# drop other dataframe columns that aren't of interest
df.drop(["Explicit", "Track_Number_on_Album", "Artist_Followers", "Duration_MS", "Link", "Album_Name"], axis=1, inplace=True)

df["Artist"] = df["Artist"].astype('string')
df["Track"] = df["Track"].astype('string').str.lower()

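# convert the Artist_Genres column from list-like strings to real lists
# (ast.literal_eval only parses literals, so unlike eval it cannot execute arbitrary code)
import ast
df["Artist_Genres"] = df["Artist_Genres"].apply(ast.literal_eval)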

df.plot(x="Rank", y="Streams", kind="scatter", marker=".")
df.head()

date,Rank,Track,Artist,Streams,Artist_Genres
2017-01-06,1,starboy,The Weeknd,25734078,"[canadian contemporary r&b, canadian pop, pop]"
2017-01-06,2,closer,The Chainsmokers,23519705,"[dance pop, edm, electropop, pop, pop dance, t..."
2017-01-06,3,rockabye (feat. sean paul & anne-marie),Clean Bandit,21216399,"[dance pop, edm, pop, pop dance, post-teen pop..."
2017-01-06,4,let me love you,DJ Snake,19852704,"[dance pop, edm, electronic trap, pop, pop dance]"
2017-01-06,5,i don’t wanna live forever (fifty shades darke...,ZAYN,18316326,"[dance pop, pop, post-teen pop, uk pop]"
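
The genre conversion in the cell above can be tried in isolation. The following is a minimal sketch on made-up rows, assuming the CSV stores `Artist_Genres` as Python-style list literals (which is what the conversion step implies); the `raw` frame and its values are illustrative only.

```python
import ast

import pandas as pd

# Made-up rows that mimic how the genres are assumed to be stored in the CSV.
raw = pd.DataFrame(
    {
        "Artist": ["The Weeknd", "ZAYN"],
        "Artist_Genres": ["['canadian pop', 'pop']", "['dance pop', 'pop', 'uk pop']"],
    }
)

# Parse each string into a real Python list without executing arbitrary code.
raw["Artist_Genres"] = raw["Artist_Genres"].apply(ast.literal_eval)

# One row per (artist, genre) pair, then count how often each genre occurs.
genre_counts = raw["Artist_Genres"].explode().value_counts()
print(genre_counts)
```

Once the column holds real lists, `explode()` makes it straightforward to count or filter by genre.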


In [None]:
# Filter by month. Christmas songs chart in November and December; only December is used here.
# A possible per-song score (higher is better) would be
# (250 - rank) * (weeks the song was in the charts).
december_df = df[df.index.month == 12]

# Candidate Christmas songs: the tracks that appear most often in the December charts.
# Counting chart weeks across several years means one-hit wonders and songs that just
# happen to be in the charts around Christmas are included less often.
# A more sophisticated scoring function is possible, but this simple count produced good results in my testing.
december_df['Track'].value_counts().index[:60].values
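
The `(250 - rank) * weeks` score mentioned in the comments could be sketched as follows. This is one possible reading, not the method used above: it sums `250 - Rank` over every chart week in the chosen months, which equals `(250 - mean rank) * weeks`. The helper name `seasonal_scores`, its parameters, and the toy data are all hypothetical.

```python
import pandas as pd

# Toy stand-in for the cleaned chart data (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Track": ["last christmas", "last christmas", "starboy", "last christmas", "jingle bells"],
        "Rank": [10, 5, 1, 20, 100],
    },
    index=pd.to_datetime(
        ["2017-12-01", "2017-12-08", "2017-12-08", "2018-12-07", "2018-11-30"]
    ),
)


def seasonal_scores(charts, months=(11, 12), base=250):
    """Sum (base - Rank) per track over all chart weeks in the given months."""
    seasonal = charts[charts.index.month.isin(months)]
    scored = seasonal.assign(Score=base - seasonal["Rank"])
    return scored.groupby("Track")["Score"].sum().sort_values(ascending=False)


scores = seasonal_scores(toy)
print(scores)
```

On the toy data, "last christmas" scores 240 + 245 + 230 = 715 and ranks first. A track that charts once at rank 1 scores only 249, so long-running seasonal tracks are favoured over brief hits.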



# Now let's see how one of these songs ranks throughout the years
# track = "last christmas"
track = "santa claus is coming to town"
song_df = df[df["Track"] == track]
# simple per-week score, higher is better (rank 1 gives 249, rank 200 gives 50)
song_df = song_df.assign(Score=250 - song_df["Rank"])
song_df.plot(kind="line", subplots=True)
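
Alongside the line plots, a per-year table can make the recurrence easier to read. The sketch below is an assumption-laden illustration rather than part of the original analysis: `yearly_summary` is a hypothetical helper, and the toy frame only mimics the `Track`, `Rank` and `Streams` columns with made-up values.

```python
import pandas as pd

# Made-up weekly chart rows with a datetime index, mimicking the cleaned data.
toy = pd.DataFrame(
    {
        "Track": ["last christmas"] * 5 + ["starboy"],
        "Rank": [30, 12, 25, 9, 3, 1],
        "Streams": [100, 200, 150, 300, 400, 500],
    },
    index=pd.to_datetime(
        ["2017-12-08", "2017-12-15", "2018-12-07", "2018-12-14", "2018-12-21", "2017-12-08"]
    ),
)


def yearly_summary(charts, track):
    """Best rank, number of chart weeks and total streams per year for one track."""
    song = charts[charts["Track"] == track]
    summary = song.groupby(song.index.year).agg(
        best_rank=("Rank", "min"),
        weeks_in_chart=("Rank", "size"),
        total_streams=("Streams", "sum"),
    )
    return summary.rename_axis("year")


summary = yearly_summary(toy, "last christmas")
print(summary)
```

A track that returns every year shows up as one row per year with a similar number of chart weeks.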

