ANALYSIS </br>

Billboard Exploration: </br>
1. How many songs reached the number 1 position during the sample period? </br>
   Will looking at only #1 songs be useful for analysis, or should we look at songs that entered the top 5/10? </br>
2. How many weeks did each of those songs appear on the charts? </br>

Spotify Exploration: </br>
1. Of 1,000,000 playlists, how many were updated three or fewer times? </br>

Preparation: </br>
1. Artist Name, Track Title of all #1 songs </br>
2. Match Artist Names and Track Titles to Spotify IDs </br>
3. Iterate over Spotify data to identify playlists that have one or more #1 tracks in the playlist </br>
4. Isolate relevant song lines and write to a new Spotify dataframe/file </br>
5. Of x playlists with 3 or fewer edits, how many had a Billboard Charting song on it? </br>

Analysis: </br>
1. Which #1 songs were most present in playlists? Are there any #1 songs that were not on any playlists? </br>
2. Do the top 10 performing billboard songs appear most on the user playlists? </br>
3. For the 10 number one songs that have the most playlist adds, what was the playlist activity in relation to chart activity?</br>

In [73]:
import json
import pandas as pd
import os
import numpy as np

### Billboard Exploration

In [102]:
# How many songs reached the number 1 position during the sample period
# Will looking at only #1 songs be useful for analysis, or should we look at songs that entered the top 5/10? all?

billboard = pd.read_csv('../data/Billboard/billboard_chart_data.csv')

In [103]:
billboard.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41700 entries, 0 to 41699
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   week_of            41700 non-null  object
 1   rank_current_week  41700 non-null  int64 
 2   title              41700 non-null  object
 3   artist             41700 non-null  object
 4   rank_prior_week    41700 non-null  object
 5   peak_pos           41700 non-null  int64 
 6   weeks_on_chart     41700 non-null  int64 
dtypes: int64(3), object(4)
memory usage: 2.2+ MB


In [104]:
billboard['artist_title'] = billboard.artist + '-' + billboard.title

In [105]:
# total number of songs that entered the billboard top 100 during the sample period
billboard.artist_title.nunique()

3460

In [106]:
# all weekly number 1 entries during the sample period
no1 = billboard[billboard.rank_current_week == 1]
no1.to_csv('../data/Billboard/number_1_by_week.csv', index=False)

In [122]:
no1_unique = no1[['week_of','artist','title']].drop_duplicates(subset=['artist','title'], keep='first').reset_index()

no1_unique[['base_artist','expand']] = no1_unique['artist'].str.split(' Featuring ', expand=True)
#no1_unique[['base_artist','expand']] = no1_unique['artist'].str.split(' &', expand=True)
no1_unique['base_artist'] = no1_unique['base_artist'].str.lower()
no1_unique['base_artist'] = no1_unique['base_artist'].replace({'ke$ha':'kesha', 'far*east movement':'far east movement','luis fonsi & daddy yankee':'luis fonsi'})
no1_unique['title'] = no1_unique['title'].str.lower()
no1_unique['title'] = no1_unique['title'].replace('uptown funk!','uptown funk')
no1_unique['key'] = no1_unique['base_artist'] + '_' + no1_unique['title'].str.split(r" \(").str[0]
artist_list = no1_unique.base_artist.unique().tolist()
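
The key-building steps above can also be read as one small function. This is a minimal sketch, not the notebook's own code: `make_key`, `ARTIST_FIXES` and `TITLE_FIXES` are hypothetical names, and the replacement tables hold only the spellings already handled in the cell above.

```python
# Hypothetical helper (not part of the notebook): builds the same
# "artist_title" matching key from one raw Billboard artist/title pair.
ARTIST_FIXES = {
    "ke$ha": "kesha",
    "far*east movement": "far east movement",
    "luis fonsi & daddy yankee": "luis fonsi",
}
TITLE_FIXES = {"uptown funk!": "uptown funk"}


def make_key(artist: str, title: str) -> str:
    # keep only the lead artist, lower-cased
    base_artist = artist.split(" Featuring ")[0].lower()
    base_artist = ARTIST_FIXES.get(base_artist, base_artist)
    # lower-case the title, apply known fixes, drop any " (...)" suffix
    clean_title = title.lower()
    clean_title = TITLE_FIXES.get(clean_title, clean_title)
    clean_title = clean_title.split(" (")[0]
    return f"{base_artist}_{clean_title}"


print(make_key("Ke$ha", "TiK ToK"))
print(make_key("Rihanna", "Only Girl (In The World)"))
```

Keeping the rules in one function would let the same normalization be reused for the full Billboard frame later on, instead of repeating the column operations.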

In [167]:
no1.title.unique().tolist()

['TiK ToK',
 'Imma Be',
 'Break Your Heart',
 'Rude Boy',
 "Nothin' On You",
 'OMG',
 'Not Afraid',
 'California Gurls',
 'Love The Way You Lie',
 'Teenage Dream',
 'Just The Way You Are',
 'Like A G6',
 'We R Who We R',
 "What's My Name?",
 'Only Girl (In The World)',
 'Raise Your Glass',
 'Firework',
 'Grenade',
 'Hold It Against Me',
 'Black And Yellow',
 'Born This Way',
 'E.T.',
 'S&M',
 'Rolling In The Deep',
 'Give Me Everything',
 'Party Rock Anthem',
 'Last Friday Night (T.G.I.F.)',
 'Moves Like Jagger',
 'Someone Like You',
 'We Found Love',
 'Sexy And I Know It',
 'Set Fire To The Rain',
 "Stronger (What Doesn't Kill You)",
 'Part Of Me',
 'We Are Young',
 'Somebody That I Used To Know',
 'Call Me Maybe',
 'Whistle',
 'We Are Never Ever Getting Back Together',
 'One More Night',
 'Diamonds',
 'Locked Out Of Heaven',
 'Thrift Shop',
 'Harlem Shake',
 'When I Was Your Man',
 'Just Give Me A Reason',
 "Can't Hold Us",
 'Blurred Lines',
 'Roar',
 'Wrecking Ball',
 'Royals',
 'Th

In [123]:
#number of songs that went number one
len(no1_unique)

90

In [125]:
billboard_key = no1_unique['key'].str.lower().tolist()
billboard_key

['kesha_tik tok',
 'the black eyed peas_imma be',
 'taio cruz_break your heart',
 'rihanna_rude boy',
 "b.o.b_nothin' on you",
 'usher_omg',
 'eminem_not afraid',
 'katy perry_california gurls',
 'eminem_love the way you lie',
 'katy perry_teenage dream',
 'bruno mars_just the way you are',
 'far east movement_like a g6',
 'kesha_we r who we r',
 "rihanna_what's my name?",
 'rihanna_only girl',
 'p!nk_raise your glass',
 'katy perry_firework',
 'bruno mars_grenade',
 'britney spears_hold it against me',
 'wiz khalifa_black and yellow',
 'lady gaga_born this way',
 'katy perry_e.t.',
 'rihanna_s&m',
 'adele_rolling in the deep',
 'pitbull_give me everything',
 'lmfao_party rock anthem',
 'katy perry_last friday night',
 'maroon 5_moves like jagger',
 'adele_someone like you',
 'rihanna_we found love',
 'lmfao_sexy and i know it',
 'adele_set fire to the rain',
 'kelly clarkson_stronger',
 'katy perry_part of me',
 'fun._we are young',
 'gotye_somebody that i used to know',
 'carly rae j

In [126]:
title_list = []
base_list = no1_unique.title.unique().tolist()

for x in base_list:
    x = x.split(' (')
    x = x[0]
    title_list.append(x)

In [127]:
len(title_list)

90

In [128]:
title_list

['tik tok',
 'imma be',
 'break your heart',
 'rude boy',
 "nothin' on you",
 'omg',
 'not afraid',
 'california gurls',
 'love the way you lie',
 'teenage dream',
 'just the way you are',
 'like a g6',
 'we r who we r',
 "what's my name?",
 'only girl',
 'raise your glass',
 'firework',
 'grenade',
 'hold it against me',
 'black and yellow',
 'born this way',
 'e.t.',
 's&m',
 'rolling in the deep',
 'give me everything',
 'party rock anthem',
 'last friday night',
 'moves like jagger',
 'someone like you',
 'we found love',
 'sexy and i know it',
 'set fire to the rain',
 'stronger',
 'part of me',
 'we are young',
 'somebody that i used to know',
 'call me maybe',
 'whistle',
 'we are never ever getting back together',
 'one more night',
 'diamonds',
 'locked out of heaven',
 'thrift shop',
 'harlem shake',
 'when i was your man',
 'just give me a reason',
 "can't hold us",
 'blurred lines',
 'roar',
 'wrecking ball',
 'royals',
 'the monster',
 'timber',
 'dark horse',
 'happy',
 '

In [90]:
# all songs that reached at least the number 5 position during the sample period
top5 = billboard[billboard.rank_current_week <= 5]
top5.artist_title.nunique()

249

In [91]:
# all songs that reached at least the number 10 position during the sample period
top10 = billboard[billboard.rank_current_week <= 10]
top10.artist_title.nunique()

423

In [92]:
#How many weeks did each of those songs appear on the charts?
week_counts = pd.DataFrame(billboard.groupby(['artist','title'])[['artist','title']].value_counts())
week_counts.sort_values('count', ascending=False).head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,count
artist,title,Unnamed: 2_level_1
Imagine Dragons,Radioactive,87
AWOLNATION,Sail,79
LMFAO Featuring Lauren Bennett & GoonRock,Party Rock Anthem,68
OneRepublic,Counting Stars,68
Adele,Rolling In The Deep,65
The Lumineers,Ho Hey,62
Imagine Dragons,Demons,61
John Legend,All Of Me,59
Gotye Featuring Kimbra,Somebody That I Used To Know,59
Ed Sheeran,Thinking Out Loud,58
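
A possibly simpler way to get the same weeks-on-chart count is `groupby(...).size()`, which counts rows per artist/title pair. The sketch below runs on a tiny made-up frame (`toy` is a hypothetical stand-in, not the real chart data) and assumes one row per song per chart week, as in `billboard`.

```python
import pandas as pd

# Toy stand-in for the Billboard frame: one row per song per chart week.
toy = pd.DataFrame({
    "week_of": ["2013-01-05", "2013-01-12", "2013-01-19", "2013-01-05", "2013-01-12"],
    "artist": ["Imagine Dragons"] * 3 + ["AWOLNATION"] * 2,
    "title": ["Radioactive"] * 3 + ["Sail"] * 2,
})

# count rows (chart weeks) per song, most weeks first
weeks = (
    toy.groupby(["artist", "title"])
       .size()
       .reset_index(name="weeks")
       .sort_values("weeks", ascending=False, ignore_index=True)
)
print(weeks)
```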


### Spotify Exploration

In [93]:
#Of 1,000,000 playlists, how many were updated three or fewer times? 

In [19]:
#file names - create list of file path strings to use in loops
data_dir = r'..\data\Spotify\data'                                #raw string so the backslashes are not read as escapes
file_list = []                                                    #empty list for strings to land
for file in os.listdir(data_dir):                                 #for loop to locate each file in source folder
    file_name = os.path.join(data_dir, file)                      #create a file path string to be read in
    file_list.append(file_name)                                   #add name to list
#file_list                                                         #print resulting list

In [20]:
# get playlist IDs where the playlists were updated three or fewer times
pid_list = []
for file in file_list: #for loop to iterate through files
    with open(file) as data_file:
        d = json.load(data_file)
        playlists = pd.json_normalize(d['playlists'])
        edit_reqs = playlists[playlists.num_edits <= 3]
        pid_list.append(edit_reqs.pid.unique())

In [21]:
#list of pid arrays to single list of pids
pid_list = np.concatenate(pid_list).ravel().tolist()


In [22]:
#count of playlists with three or fewer edits
len(pid_list)

174083

In [23]:
#percent of playlists from Spotify's million dataset
round(len(pid_list) / 1000000 * 100,2)

17.41

### Preparation

In [94]:
# Artist Name, Track Title of all #1 songs
no1_unique[['artist','title']]

Unnamed: 0,artist,title
0,Ke$ha,TiK ToK
1,The Black Eyed Peas,Imma Be
2,Taio Cruz Featuring Ludacris,Break Your Heart
3,Rihanna,Rude Boy
4,B.o.B Featuring Bruno Mars,Nothin' On You
...,...,...
85,Luis Fonsi & Daddy Yankee Featuring Justin Bieber,Despacito
86,Taylor Swift,Look What You Made Me Do
87,Cardi B,Bodak Yellow (Money Moves)
88,Post Malone Featuring 21 Savage,Rockstar


In [60]:
# Iterate over Spotify data to identify playlists that have one or more #1 tracks in the playlist
# Isolate relevant song lines and write to a new Spotify dataframe/file

In [61]:
# SINGLE FILE TRY

# hold=[]
# with open('../data/Spotify/data/mpd.slice.0-999.json') as data_file:
#         d = json.load(data_file)
#         playlists = pd.json_normalize(d['playlists'])
#         playlists = playlists[playlists.pid.isin(pid_list)]
#         tracks = pd.json_normalize(d, record_path=['playlists','tracks'],meta=[['playlists','pid']])
#         tracks = tracks[tracks['playlists.pid'].isin(pid_list)]
#         tracks['artist_name'] = tracks['artist_name'].str.lower().str.split(',').str[0]
#         tracks['artist_name'] = tracks['artist_name'].str.lower().str.split('(').str[0]
#         tracks['track_name'] = tracks['track_name'].str.lower().str.split('(').str[0]
#         tracks['track_name'] = tracks['track_name'].str.lower().str.split('-').str[0]
#         replacements = {'kesha':'ke$ha', 'far east movement':'far*east movement'}
#         tracks = tracks.replace(tracks.artist_name, replacements)
#         tracks['key'] = tracks['artist_name'] + '_' + tracks['track_name'].str.lower()#.str.split().str[0]
#         tracks = tracks[tracks['key'].isin(billboard_key2)]
#         df = tracks.merge(playlists, how='inner', left_on='playlists.pid', right_on='pid')
#         df = df.drop(columns=['tracks','description','playlists.pid'])
#         hold.append(df)
#         print(f'file {data_file} complete')
        
# top_song_playlists = pd.concat(hold)# 

#### Uncomment the cells below for first time use
The first cell creates an empty dataframe. </br>

The second cell is a loop that iterates through the Spotify Million Playlist Dataset. The result: </br>
<li> Returns only playlists with three or fewer edits </li>
<li> Returns playlist and track information for instances in which a track that reached No. 1 on the Billboard charts was added to a user playlist during the sample period </li>
Note: the cells above must be executed first to generate the list of PIDs (playlist identifiers, pid_list) and the keys of the tracks that hit No. 1 (billboard_key).

The third cell checks the number of unique tracks in the results; 90 songs reached No. 1 during the period, so the cell should return 90.

The fourth cell checks the results and prints the title of any track that hit No. 1 on Billboard but had no occurrences in the Spotify dataframe.

The fifth cell writes the results to a .csv file.
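
To see what the loop does to a single file, here is a minimal sketch on an in-memory dictionary shaped like one slice. The field names follow the loop in this notebook; `toy_slice`, `toy_keys` and `keep_pids` are hypothetical, and the playlist values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for one slice file; values are made up for illustration.
toy_slice = {
    "playlists": [
        {"pid": 1, "name": "party", "num_edits": 2, "tracks": [
            {"artist_name": "Rihanna", "track_name": "Only Girl (In The World)"},
            {"artist_name": "Some Artist", "track_name": "Some Song"},
        ]},
        {"pid": 2, "name": "chill", "num_edits": 9, "tracks": [
            {"artist_name": "Adele", "track_name": "Hello"},
        ]},
    ]
}
toy_keys = ["rihanna_only girl", "adele_hello"]  # stand-in for billboard_key

# playlist-level table, then the ids of playlists with three or fewer edits
playlists = pd.json_normalize(toy_slice["playlists"])
keep_pids = playlists.loc[playlists.num_edits <= 3, "pid"].tolist()

# one row per track, carrying its playlist id along as 'playlists.pid'
tracks = pd.json_normalize(toy_slice, record_path=["playlists", "tracks"],
                           meta=[["playlists", "pid"]])
tracks = tracks[tracks["playlists.pid"].isin(keep_pids)].copy()

# same normalization idea as the loop: lower-case, drop " (...)" suffixes
tracks["key"] = (
    tracks["artist_name"].str.lower()
    + "_"
    + tracks["track_name"].str.lower().str.split(r" \(").str[0]
)
matches = tracks[tracks["key"].isin(toy_keys)]
print(matches[["playlists.pid", "key"]])
```

Only the Rihanna track survives here: the Adele track matches a key, but its playlist has more than three edits and is filtered out first.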


In [129]:
#initiate empty dataframe
top_song_playlists = pd.DataFrame()

In [130]:
hold = []
for file in file_list: 
    with open(file) as data_file:
        d = json.load(data_file)
        playlists = pd.json_normalize(d['playlists'])
        playlists = playlists[playlists.pid.isin(pid_list)]
        tracks = pd.json_normalize(d, record_path=['playlists','tracks'],meta=[['playlists','pid']])
        tracks = tracks[tracks['playlists.pid'].isin(pid_list)]
        tracks['artist_name'] = tracks['artist_name'].str.lower()
        tracks['artist_name'] = tracks['artist_name'].str.split(',').str[0]
        tracks['artist_name'] = tracks['artist_name'].str.split(r" \(").str[0]
        tracks['track_name'] = tracks['track_name'].str.lower()
        tracks['track_name'] = tracks['track_name'].str.split(r" \(").str[0]
        tracks['track_name'] = tracks['track_name'].str.split(' -').str[0]
        tracks['key'] = tracks['artist_name'] + '_' + tracks['track_name']
        tracks = tracks[tracks['key'].isin(billboard_key)]
        df = tracks.merge(playlists, how='inner', left_on='playlists.pid', right_on='pid')
        df = df.drop(columns=['tracks','description','playlists.pid'])
        df.modified_at = pd.to_datetime(df.modified_at, unit = 's')
        hold.append(df)
        print(f'file {file} complete')
        
top_song_playlists = pd.concat(hold)

file ..\data\Spotify\data\mpd.slice.0-999.json complete
file ..\data\Spotify\data\mpd.slice.1000-1999.json complete
file ..\data\Spotify\data\mpd.slice.10000-10999.json complete
file ..\data\Spotify\data\mpd.slice.100000-100999.json complete
file ..\data\Spotify\data\mpd.slice.101000-101999.json complete
file ..\data\Spotify\data\mpd.slice.102000-102999.json complete
file ..\data\Spotify\data\mpd.slice.103000-103999.json complete
file ..\data\Spotify\data\mpd.slice.104000-104999.json complete
file ..\data\Spotify\data\mpd.slice.105000-105999.json complete
file ..\data\Spotify\data\mpd.slice.106000-106999.json complete
file ..\data\Spotify\data\mpd.slice.107000-107999.json complete
file ..\data\Spotify\data\mpd.slice.108000-108999.json complete
file ..\data\Spotify\data\mpd.slice.109000-109999.json complete
file ..\data\Spotify\data\mpd.slice.11000-11999.json complete
file ..\data\Spotify\data\mpd.slice.110000-110999.json complete
file ..\data\Spotify\data\mpd.slice.111000-111999.json c

file ..\data\Spotify\data\mpd.slice.214000-214999.json complete
file ..\data\Spotify\data\mpd.slice.215000-215999.json complete
file ..\data\Spotify\data\mpd.slice.216000-216999.json complete
file ..\data\Spotify\data\mpd.slice.217000-217999.json complete
file ..\data\Spotify\data\mpd.slice.218000-218999.json complete
file ..\data\Spotify\data\mpd.slice.219000-219999.json complete
file ..\data\Spotify\data\mpd.slice.22000-22999.json complete
file ..\data\Spotify\data\mpd.slice.220000-220999.json complete
file ..\data\Spotify\data\mpd.slice.221000-221999.json complete
file ..\data\Spotify\data\mpd.slice.222000-222999.json complete
file ..\data\Spotify\data\mpd.slice.223000-223999.json complete
file ..\data\Spotify\data\mpd.slice.224000-224999.json complete
file ..\data\Spotify\data\mpd.slice.225000-225999.json complete
file ..\data\Spotify\data\mpd.slice.226000-226999.json complete
file ..\data\Spotify\data\mpd.slice.227000-227999.json complete
file ..\data\Spotify\data\mpd.slice.228000

file ..\data\Spotify\data\mpd.slice.330000-330999.json complete
file ..\data\Spotify\data\mpd.slice.331000-331999.json complete
file ..\data\Spotify\data\mpd.slice.332000-332999.json complete
file ..\data\Spotify\data\mpd.slice.333000-333999.json complete
file ..\data\Spotify\data\mpd.slice.334000-334999.json complete
file ..\data\Spotify\data\mpd.slice.335000-335999.json complete
file ..\data\Spotify\data\mpd.slice.336000-336999.json complete
file ..\data\Spotify\data\mpd.slice.337000-337999.json complete
file ..\data\Spotify\data\mpd.slice.338000-338999.json complete
file ..\data\Spotify\data\mpd.slice.339000-339999.json complete
file ..\data\Spotify\data\mpd.slice.34000-34999.json complete
file ..\data\Spotify\data\mpd.slice.340000-340999.json complete
file ..\data\Spotify\data\mpd.slice.341000-341999.json complete
file ..\data\Spotify\data\mpd.slice.342000-342999.json complete
file ..\data\Spotify\data\mpd.slice.343000-343999.json complete
file ..\data\Spotify\data\mpd.slice.344000

file ..\data\Spotify\data\mpd.slice.447000-447999.json complete
file ..\data\Spotify\data\mpd.slice.448000-448999.json complete
file ..\data\Spotify\data\mpd.slice.449000-449999.json complete
file ..\data\Spotify\data\mpd.slice.45000-45999.json complete
file ..\data\Spotify\data\mpd.slice.450000-450999.json complete
file ..\data\Spotify\data\mpd.slice.451000-451999.json complete
file ..\data\Spotify\data\mpd.slice.452000-452999.json complete
file ..\data\Spotify\data\mpd.slice.453000-453999.json complete
file ..\data\Spotify\data\mpd.slice.454000-454999.json complete
file ..\data\Spotify\data\mpd.slice.455000-455999.json complete
file ..\data\Spotify\data\mpd.slice.456000-456999.json complete
file ..\data\Spotify\data\mpd.slice.457000-457999.json complete
file ..\data\Spotify\data\mpd.slice.458000-458999.json complete
file ..\data\Spotify\data\mpd.slice.459000-459999.json complete
file ..\data\Spotify\data\mpd.slice.46000-46999.json complete
file ..\data\Spotify\data\mpd.slice.460000-4

file ..\data\Spotify\data\mpd.slice.563000-563999.json complete
file ..\data\Spotify\data\mpd.slice.564000-564999.json complete
file ..\data\Spotify\data\mpd.slice.565000-565999.json complete
file ..\data\Spotify\data\mpd.slice.566000-566999.json complete
file ..\data\Spotify\data\mpd.slice.567000-567999.json complete
file ..\data\Spotify\data\mpd.slice.568000-568999.json complete
file ..\data\Spotify\data\mpd.slice.569000-569999.json complete
file ..\data\Spotify\data\mpd.slice.57000-57999.json complete
file ..\data\Spotify\data\mpd.slice.570000-570999.json complete
file ..\data\Spotify\data\mpd.slice.571000-571999.json complete
file ..\data\Spotify\data\mpd.slice.572000-572999.json complete
file ..\data\Spotify\data\mpd.slice.573000-573999.json complete
file ..\data\Spotify\data\mpd.slice.574000-574999.json complete
file ..\data\Spotify\data\mpd.slice.575000-575999.json complete
file ..\data\Spotify\data\mpd.slice.576000-576999.json complete
file ..\data\Spotify\data\mpd.slice.577000

file ..\data\Spotify\data\mpd.slice.68000-68999.json complete
file ..\data\Spotify\data\mpd.slice.680000-680999.json complete
file ..\data\Spotify\data\mpd.slice.681000-681999.json complete
file ..\data\Spotify\data\mpd.slice.682000-682999.json complete
file ..\data\Spotify\data\mpd.slice.683000-683999.json complete
file ..\data\Spotify\data\mpd.slice.684000-684999.json complete
file ..\data\Spotify\data\mpd.slice.685000-685999.json complete
file ..\data\Spotify\data\mpd.slice.686000-686999.json complete
file ..\data\Spotify\data\mpd.slice.687000-687999.json complete
file ..\data\Spotify\data\mpd.slice.688000-688999.json complete
file ..\data\Spotify\data\mpd.slice.689000-689999.json complete
file ..\data\Spotify\data\mpd.slice.69000-69999.json complete
file ..\data\Spotify\data\mpd.slice.690000-690999.json complete
file ..\data\Spotify\data\mpd.slice.691000-691999.json complete
file ..\data\Spotify\data\mpd.slice.692000-692999.json complete
file ..\data\Spotify\data\mpd.slice.693000-6

file ..\data\Spotify\data\mpd.slice.796000-796999.json complete
file ..\data\Spotify\data\mpd.slice.797000-797999.json complete
file ..\data\Spotify\data\mpd.slice.798000-798999.json complete
file ..\data\Spotify\data\mpd.slice.799000-799999.json complete
file ..\data\Spotify\data\mpd.slice.8000-8999.json complete
file ..\data\Spotify\data\mpd.slice.80000-80999.json complete
file ..\data\Spotify\data\mpd.slice.800000-800999.json complete
file ..\data\Spotify\data\mpd.slice.801000-801999.json complete
file ..\data\Spotify\data\mpd.slice.802000-802999.json complete
file ..\data\Spotify\data\mpd.slice.803000-803999.json complete
file ..\data\Spotify\data\mpd.slice.804000-804999.json complete
file ..\data\Spotify\data\mpd.slice.805000-805999.json complete
file ..\data\Spotify\data\mpd.slice.806000-806999.json complete
file ..\data\Spotify\data\mpd.slice.807000-807999.json complete
file ..\data\Spotify\data\mpd.slice.808000-808999.json complete
file ..\data\Spotify\data\mpd.slice.809000-809

file ..\data\Spotify\data\mpd.slice.911000-911999.json complete
file ..\data\Spotify\data\mpd.slice.912000-912999.json complete
file ..\data\Spotify\data\mpd.slice.913000-913999.json complete
file ..\data\Spotify\data\mpd.slice.914000-914999.json complete
file ..\data\Spotify\data\mpd.slice.915000-915999.json complete
file ..\data\Spotify\data\mpd.slice.916000-916999.json complete
file ..\data\Spotify\data\mpd.slice.917000-917999.json complete
file ..\data\Spotify\data\mpd.slice.918000-918999.json complete
file ..\data\Spotify\data\mpd.slice.919000-919999.json complete
file ..\data\Spotify\data\mpd.slice.92000-92999.json complete
file ..\data\Spotify\data\mpd.slice.920000-920999.json complete
file ..\data\Spotify\data\mpd.slice.921000-921999.json complete
file ..\data\Spotify\data\mpd.slice.922000-922999.json complete
file ..\data\Spotify\data\mpd.slice.923000-923999.json complete
file ..\data\Spotify\data\mpd.slice.924000-924999.json complete
file ..\data\Spotify\data\mpd.slice.925000

In [97]:
# hold = []
# for file in file_list: 
#     with open(file) as data_file:
#         d = json.load(data_file)
#         playlists = pd.json_normalize(d['playlists'])
#         playlists = playlists[playlists.pid.isin(pid_list)]
#         tracks = pd.json_normalize(d, record_path=['playlists','tracks'],meta=[['playlists','pid']])
#         tracks = tracks[tracks['playlists.pid'].isin(pid_list)]
#         tracks['artist_name'] = tracks['artist_name'].str.lower().str.split(',').str[0]
#         tracks['artist_name'] = tracks['artist_name'].str.lower().str.split(" \(").str[0]
#         tracks['track_name'] = tracks['track_name'].str.lower().str.split(" \(").str[0]
#         tracks['track_name'] = tracks['track_name'].str.lower().str.split(' -').str[0]
#         tracks['key'] = tracks['artist_name'] + '_' + tracks['track_name'].str.lower()
#         tracks = tracks[tracks['key'].isin(billboard_key)]
#         df = tracks.merge(playlists, how='inner', left_on='playlists.pid', right_on='pid')
#         df = df.drop(columns=['tracks','description','playlists.pid'])
#         df.modified_at = pd.to_datetime(df.modified_at, unit = 's')
#         hold.append(df)
#         print(f'file {data_file} complete')
        
# top_song_playlists = pd.concat(hold)

In [131]:
#check to see if all #1 songs are present in the result dataframe
spotify_tracks = top_song_playlists.track_name.unique().tolist()
len(spotify_tracks)

90

In [132]:
tracks.key.unique()

array(['fun._we are young', 'kendrick lamar_humble.',
       'meghan trainor_all about that bass', 'mark ronson_uptown funk',
       "macklemore & ryan lewis_can't hold us",
       "justin timberlake_can't stop the feeling!",
       'macklemore & ryan lewis_thrift shop', 'drake_one dance',
       'omi_cheerleader', 'adele_hello', 'justin bieber_love yourself',
       'desiigner_panda', 'zayn_pillowtalk', 'wiz khalifa_see you again',
       'justin bieber_sorry', 'magic!_rude', 'john legend_all of me',
       'migos_bad and boujee', 'rihanna_we found love',
       'the weeknd_starboy', 'katy perry_roar',
       'justin bieber_what do you mean?', 'luis fonsi_despacito',
       'pitbull_timber', "the weeknd_can't feel my face",
       'iggy azalea_fancy', 'katy perry_firework',
       'miley cyrus_wrecking ball', 'cardi b_bodak yellow', 'usher_omg',
       'rihanna_diamonds', "rihanna_what's my name?", 'rihanna_only girl',
       'rihanna_s&m', 'ed sheeran_shape of you', 'kesha_tik tok',


In [118]:
spotify_tracks

['omg',
 'raise your glass',
 'give me everything',
 'party rock anthem',
 'we found love',
 'whistle',
 "can't hold us",
 'timber',
 'fancy',
 'tik tok',
 'happy',
 'one dance',
 'moves like jagger',
 'shape of you',
 'cheap thrills',
 "i'm the one",
 "what's my name?",
 'only girl',
 'see you again',
 'all about that bass',
 "can't feel my face",
 'the hills',
 'all of me',
 'just the way you are',
 'locked out of heaven',
 'grenade',
 'when i was your man',
 "that's what i like",
 'we are young',
 "can't stop the feeling!",
 'dark horse',
 'california gurls',
 'like a g6',
 'born this way',
 'e.t.',
 'closer',
 'teenage dream',
 'rolling in the deep',
 'just give me a reason',
 'somebody that i used to know',
 'one more night',
 'blurred lines',
 'love yourself',
 'roar',
 'sorry',
 'love the way you lie',
 'not afraid',
 'humble.',
 'starboy',
 'despacito',
 'black beatles',
 'bad and boujee',
 'what do you mean?',
 'diamonds',
 'royals',
 'someone like you',
 'last friday night',


In [133]:
#check for songs in the billboard top songs list that are missing from the spotify result set
for x in title_list:
    if x.lower() not in spotify_tracks:
        print(f'{x} not found')
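
The check above compares titles only, so two different songs that share a title could hide a miss. A stricter variant, sketched here under the assumption that both sides carry the `artist_title` key, is a set difference on keys. The two lists below are hypothetical examples, not notebook results.

```python
# Hypothetical key lists for illustration only.
chart_keys = ["kesha_tik tok", "adele_hello", "drake_one dance"]
playlist_keys = ["adele_hello", "drake_one dance"]

# keys that charted at No. 1 but never show up in the playlist results
missing = sorted(set(chart_keys) - set(playlist_keys))
print(missing)
```

In the notebook, the two inputs would presumably be `billboard_key` and `top_song_playlists.key.unique()`.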

In [134]:
#write results to csv
top_song_playlists.to_csv('../data/Spotify/top_song_playlists_tracks.csv')

### Analysis

In [135]:
#read in spotify results csv:
spotify = pd.read_csv('../data/Spotify/top_song_playlists_tracks.csv')

In [136]:
#convert all billboard data to compatible format
billboard_data = billboard.copy()   #copy so the original billboard dataframe is not modified
billboard_data[['base_artist','expand']] = billboard_data['artist'].str.split(' Featuring ', expand=True)
#billboard_data[['base_artist','expand']] = billboard_data['artist'].str.split(' &', expand=True)
billboard_data['base_artist'] = billboard_data['base_artist'].str.lower()
billboard_data['base_artist'] = billboard_data['base_artist'].replace({'ke$ha':'kesha', 'far*east movement':'far east movement','luis fonsi & daddy yankee':'luis fonsi'})
billboard_data['title'] = billboard_data['title'].str.lower()
billboard_data['title'] = billboard_data['title'].replace('uptown funk!','uptown funk')
billboard_data['key'] = billboard_data['base_artist'] + '_' + billboard_data['title'].str.split(r" \(").str[0]

billboard_data = billboard_data[billboard_data.key.isin(billboard_key)]

In [137]:
# Of x playlists with 3 or fewer edits, how many had a Billboard Charting song on it?
print('Total playlist count, original dataset: 1000000')
print('Total playlist count, three or fewer edits: ' + str(len(pid_list)))
print('Total playlists with one or more Billboard No. 1 song(s): ' + str(spotify.pid.nunique()))

Total playlist count, original dataset: 1000000
Total playlist count, three or fewer edits: 174083
Total playlists with one or more Billboard No. 1 song(s): 43751


In [138]:
spotify.columns

Index(['Unnamed: 0', 'pos', 'artist_name', 'track_uri', 'artist_uri',
       'track_name', 'album_uri', 'duration_ms_x', 'album_name', 'key', 'name',
       'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
       'num_followers', 'num_edits', 'duration_ms_y', 'num_artists'],
      dtype='object')

In [139]:
billboard_data.columns

Index(['week_of', 'rank_current_week', 'title', 'artist', 'rank_prior_week',
       'peak_pos', 'weeks_on_chart', 'artist_title', 'base_artist', 'expand',
       'key'],
      dtype='object')

In [163]:
# #Which artists had the most number one songs on the billboard charts?
# billboard_data.groupby(['base_artist'],group_keys=True)[['title']].nunique().sort_values('title', ascending=False) #.head(20)

In [160]:
# #Which artists had the most #1 songs in playlists?
# spotify.groupby(['artist_name'],group_keys=True)[['track_name']].nunique().sort_values('track_name', ascending=False)

In [162]:
#Which #1 songs spent the most weeks on the chart (at any position)?
weeks_on_chart_rank = billboard_data.groupby(['base_artist', 'title', 'key'],group_keys=True)[['week_of']].nunique().sort_values('week_of', ascending=False).reset_index()
weeks_on_chart_rank

Unnamed: 0,base_artist,title,key,week_of
0,lmfao,party rock anthem,lmfao_party rock anthem,68
1,adele,rolling in the deep,adele_rolling in the deep,65
2,gotye,somebody that i used to know,gotye_somebody that i used to know,59
3,john legend,all of me,john legend_all of me,59
4,katy perry,dark horse,katy perry_dark horse,57
...,...,...,...,...
85,baauer,harlem shake,baauer_harlem shake,20
86,taylor swift,look what you made me do,taylor swift_look what you made me do,17
87,britney spears,hold it against me,britney spears_hold it against me,17
88,ed sheeran,perfect,ed sheeran_perfect,16


In [144]:
#Which #1 songs were most present in playlists? 
s_track_count = spotify.groupby(['artist_name', 'track_name'],group_keys=True)[['pid']].count().sort_values('pid', ascending=False).reset_index()
s_track_count
#Are there any #1 songs that were not on any playlists?
#No - our count of 90 unique titles shows this.

Unnamed: 0,artist_name,track_name,pid
0,drake,one dance,3789
1,the chainsmokers,closer,3219
2,kendrick lamar,humble.,2997
3,mark ronson,uptown funk,2732
4,luis fonsi,despacito,2724
...,...,...,...
85,katy perry,part of me,286
86,taylor swift,blank space,252
87,baauer,harlem shake,230
88,taylor swift,bad blood,202


In [145]:
#10 Top Performing Billboard songs
b_top10 = weeks_on_chart_rank[['base_artist','title','key']].head(10)

b_top10

Unnamed: 0,base_artist,title,key
0,lmfao,party rock anthem,lmfao_party rock anthem
1,adele,rolling in the deep,adele_rolling in the deep
2,gotye,somebody that i used to know,gotye_somebody that i used to know
3,john legend,all of me,john legend_all of me
4,katy perry,dark horse,katy perry_dark horse
5,mark ronson,uptown funk,mark ronson_uptown funk
6,justin timberlake,can't stop the feeling!,justin timberlake_can't stop the feeling!
7,sia,cheap thrills,sia_cheap thrills
8,the chainsmokers,closer,the chainsmokers_closer
9,wiz khalifa,see you again,wiz khalifa_see you again


In [149]:
#Do the top 10 performing billboard songs appear most on the user playlists?
#10 Top Performing (most playlisted) songs
s_top10 = spotify.groupby(['artist_name', 'track_name','key'],group_keys=True)[['pid']].count().sort_values('pid', ascending=False).reset_index().head(10)
s_top10

Unnamed: 0,artist_name,track_name,key,pid
0,drake,one dance,drake_one dance,3789
1,the chainsmokers,closer,the chainsmokers_closer,3219
2,kendrick lamar,humble.,kendrick lamar_humble.,2997
3,mark ronson,uptown funk,mark ronson_uptown funk,2732
4,luis fonsi,despacito,luis fonsi_despacito,2724
5,justin bieber,sorry,justin bieber_sorry,2693
6,the weeknd,the hills,the weeknd_the hills,2658
7,the weeknd,can't feel my face,the weeknd_can't feel my face,2602
8,ed sheeran,shape of you,ed sheeran_shape of you,2540
9,rihanna,work,rihanna_work,2458


In [178]:
b_toplist = b_top10.title.str.lower().tolist()
s_toplist = s_top10.track_name.str.lower().tolist()

b_keys = b_top10.key.str.lower().tolist()
s_keys = s_top10.key.str.lower().tolist()

In [179]:
b_keys

['lmfao_party rock anthem',
 'adele_rolling in the deep',
 'gotye_somebody that i used to know',
 'john legend_all of me',
 'katy perry_dark horse',
 'mark ronson_uptown funk',
 "justin timberlake_can't stop the feeling!",
 'sia_cheap thrills',
 'the chainsmokers_closer',
 'wiz khalifa_see you again']

In [180]:
#which songs were top performers on billboard and spotify?

def both(b_list, s_list):
    return [x for x in b_list if x in s_list]

both(b_toplist, s_toplist)

['uptown funk', 'closer']
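
The same overlap can be computed on the `artist_title` keys instead of bare titles, which avoids false matches between different songs with the same name. A minimal sketch, using a subset of the key values shown in the outputs above (`b_keys_demo`, `s_keys_demo` and `s_lookup` are hypothetical names):

```python
# Subsets of the Billboard and Spotify top-10 keys shown above.
b_keys_demo = [
    "lmfao_party rock anthem",
    "adele_rolling in the deep",
    "mark ronson_uptown funk",
    "the chainsmokers_closer",
]
s_keys_demo = [
    "drake_one dance",
    "the chainsmokers_closer",
    "kendrick lamar_humble.",
    "mark ronson_uptown funk",
]

# set lookup for speed; list comprehension keeps the Billboard ordering
s_lookup = set(s_keys_demo)
overlap = [k for k in b_keys_demo if k in s_lookup]
print(overlap)
```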

In [181]:
#what are the titles of the top performing songs across billboard and spotify?

top_song_list = list(set(b_toplist + s_toplist))
top_song_keys = list(set(b_keys + s_keys))

In [182]:
len(top_song_list)

18

In [173]:
#billboard data export - all top performers
billboard_data[billboard_data.key.isin(top_song_keys)].to_csv('../viz exports/all_top_performers_billboard.csv', index=False)

In [174]:
#spotify data export - all top performers
spotify[spotify.key.isin(top_song_keys)].to_csv('../viz exports/all_top_performers_spotify.csv', index=False)

In [175]:
#billboard data export - top 10 on billboard by weeks on chart
billboard_data[billboard_data.key.isin(b_keys)].to_csv('../viz exports/billboard_top_performers.csv', index=False)

In [176]:
#spotify data export - top 10 on spotify by playlist count
spotify[spotify.key.isin(s_keys)].to_csv('../viz exports/spotify_top_perfomrers.csv', index=False)

In [157]:
#spotify data read in - all top performers
topspotify = pd.read_csv('../viz exports/all_top_performers_spotify.csv')

In [155]:
#billboard data read in - all top performers
topbillboard = pd.read_csv('../viz exports/all_top_performers_billboard.csv')

In [None]:
# For the 10 number one songs that have the most playlist adds, what was the playlist activity in relation to chart activity?

In [None]:
# For the top 10 charting songs on billboard, what was the playlist activity in relation to chart activity?