ANALYSIS </br>
Billboard Exploration: </br>
1. How many songs reached the number 1 position during the sample period? </br>
   Will looking at only #1 songs be useful for analysis, or should we look at songs that entered the top 5/10? </br>
2. How many weeks did each of those songs appear on the charts? </br>

Spotify Exploration: </br>
1. Of 1,000,000 playlists, how many were updated three or fewer times? </br>

Preparation: </br>
1. Artist Name, Track Title of all #1 songs </br>
2. Match Artist Names and Track Titles to Spotify IDs </br>
3. Iterate over Spotify data to identify playlists that have one or more #1 tracks in the playlist </br>
4. Isolate relevant song lines and write to a new Spotify dataframe/file </br>
5. Of the x playlists with 3 or fewer edits, how many contained a Billboard-charting song? </br>

In [67]:
import json
import pandas as pd
import os
import numpy as np

### Billboard Exploration

In [9]:
# How many songs reached the number 1 position during the sample period
# Will looking at only #1 songs be useful for analysis, or should we look at songs that entered the top 5/10? all?

billboard = pd.read_csv('../data/Billboard/billboard_chart_data.csv')

In [12]:
billboard.columns

Index(['Unnamed: 0', 'week_of', 'rank_current_week', 'title', 'artist',
       'rank_prior_week', 'peak_pos', 'weeks_on_chart'],
      dtype='object')

In [60]:
billboard.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36500 entries, 0 to 36499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Unnamed: 0         36500 non-null  int64 
 1   week_of            36500 non-null  object
 2   rank_current_week  36500 non-null  int64 
 3   title              36500 non-null  object
 4   artist             36500 non-null  object
 5   rank_prior_week    36500 non-null  object
 6   peak_pos           36500 non-null  int64 
 7   weeks_on_chart     36500 non-null  int64 
 8   artist_title       36500 non-null  object
dtypes: int64(4), object(5)
memory usage: 2.5+ MB


In [17]:
billboard['artist_title'] = billboard.artist+ '-' + billboard.title

In [20]:
# total number of songs that entered the billboard top 100 during the sample period
billboard.artist_title.nunique()

3003

In [21]:
# all songs that reached the number 1 position during the sample period
no1 = billboard[billboard.peak_pos == 1]
no1.artist_title.nunique()

94

In [23]:
# all songs that reached at least the number 5 position during the sample period
top5 = billboard[billboard.peak_pos <= 5]
top5.artist_title.nunique()

236

In [24]:
# all songs that reached at least the number 10 position during the sample period
top10 = billboard[billboard.peak_pos <= 10]
top10.artist_title.nunique()

394
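
One caveat on the three filters above: if `peak_pos` records a song's best rank to date (an assumption about this column), then `peak_pos == 1` can also pick up songs that peaked before the sample window and only re-charted during it. Titles such as 'I Will Always Love You' in the #1 list further down look like older hits, which would be consistent with that. A minimal sketch on made-up rows, comparing the peak-based filter with one on `rank_current_week`:

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the billboard frame above.
toy = pd.DataFrame({
    "artist_title": ["A-Old Hit", "B-New Hit", "B-New Hit", "C-Other"],
    "rank_current_week": [40, 1, 2, 7],
    "peak_pos": [1, 1, 1, 7],
})

# Peak-based filter: counts any song whose recorded best rank is 1.
by_peak = toy[toy.peak_pos == 1].artist_title.nunique()

# Rank-based filter: counts only songs ranked #1 in at least one sampled week.
by_rank = toy[toy.rank_current_week == 1].artist_title.nunique()

print(by_peak, by_rank)
```

Which filter is appropriate depends on whether re-charting older hits should count as "top songs" for the playlist comparison.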

In [49]:
#How many weeks did each of those songs appear on the charts?
week_counts = pd.DataFrame(billboard.groupby('artist_title')['artist_title'].value_counts())
week_counts.sort_values('count', ascending=False).head(10)

artist_title,count
Imagine Dragons-Radioactive,87
AWOLNATION-Sail,79
OneRepublic-Counting Stars,68
LMFAO Featuring Lauren Bennett & GoonRock-Party Rock Anthem,68
Adele-Rolling In The Deep,65
The Lumineers-Ho Hey,62
Imagine Dragons-Demons,61
Gotye Featuring Kimbra-Somebody That I Used To Know,59
John Legend-All Of Me,59
Ed Sheeran-Thinking Out Loud,58
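
The counts above are rows per song, that is, weeks charted inside the sample. If `weeks_on_chart` is a running total (an assumption about this column), its maximum per song would also include any weeks before the sample began. A small sketch on made-up rows that puts the two side by side:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "artist_title": ["A-Song", "A-Song", "B-Song"],
    "weeks_on_chart": [30, 31, 1],
})

# One row per song: weeks seen in the sample vs. the largest running total.
summary = toy.groupby("artist_title").agg(
    weeks_in_sample=("weeks_on_chart", "size"),
    max_weeks_on_chart=("weeks_on_chart", "max"),
)
print(summary.sort_values("weeks_in_sample", ascending=False))
```

A large gap between the two columns would flag songs that were already on the chart when the sample started.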


### Spotify Exploration

In [74]:
#Of 1,000,000 playlists, how many were updated three or fewer times? 

In [75]:
#file names - create list of file path strings to use in loops
data_dir = os.path.join('..', 'data', 'Spotify', 'data')          #source folder, built without backslash escapes
file_list = []                                                    #empty list for strings to land
for file in os.listdir(data_dir):                                 #for loop to locate each file in source folder
    file_name = os.path.join(data_dir, file)                      #create a file path string to be read in
    file_list.append(file_name)                                   #add name to list
#file_list                                                         #print resulting list

In [76]:
# get playlist IDs where the playlists were updated three or fewer times
pid_list = []
for file in file_list: #for loop to iterate through files
    with open(file) as data_file:
        d = json.load(data_file)
        playlists = pd.json_normalize(d['playlists'])
        edit_reqs = playlists[playlists.num_edits <= 3]
        pid_list.append(edit_reqs.pid.unique())

In [77]:
#list of pid arrays to single list of pids
pid_list = np.concatenate(pid_list).ravel().tolist()


In [80]:
#count of playlists with three or fewer edits
len(pid_list)

174083

In [86]:
#percent of playlists from Spotify's million dataset
round(len(pid_list) / 1000000 * 100,2)

17.41
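
The same count can be taken without building a DataFrame per file. A minimal sketch, assuming each file is a JSON object with a `playlists` list whose entries carry `pid` and `num_edits` (as the loop above relies on) and that the files end in `.json`; `low_edit_pids` is a hypothetical helper name, and the demo data is made up:

```python
import json
import tempfile
from pathlib import Path

def low_edit_pids(data_dir, max_edits=3):
    """Collect pids of playlists edited at most `max_edits` times."""
    pids = set()
    for path in sorted(Path(data_dir).glob("*.json")):
        with open(path, encoding="utf-8") as data_file:
            d = json.load(data_file)
        for playlist in d["playlists"]:
            if playlist["num_edits"] <= max_edits:
                pids.add(playlist["pid"])
    return pids

# Demo on a throwaway file with two made-up playlists.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"playlists": [{"pid": 0, "num_edits": 2}, {"pid": 1, "num_edits": 9}]}
    Path(tmp, "slice.json").write_text(json.dumps(sample), encoding="utf-8")
    demo_pids = low_edit_pids(tmp)
print(demo_pids)
```

Returning a set also keeps later membership checks against the pids fast, which matters with roughly 174,000 of them.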

### Preparation

In [142]:
# Artist Name, Track Title of all #1 songs
no1_df = no1[['artist','title']].drop_duplicates().reset_index(drop=True)
no1_df

no1_artist_list = no1_df.artist.tolist()
no1_track_list = no1_df.title.tolist()

In [143]:
no1_track_list

['TiK ToK',
 'Fireflies',
 'Empire State Of Mind',
 'Whatcha Say',
 'Down',
 '3',
 'I Gotta Feeling',
 'Imma Be',
 'Break Your Heart',
 'Rude Boy',
 "Nothin' On You",
 'OMG',
 'Not Afraid',
 'California Gurls',
 'Love The Way You Lie',
 'Teenage Dream',
 'Just The Way You Are',
 'Like A G6',
 'We R Who We R',
 "What's My Name?",
 'Only Girl (In The World)',
 'Raise Your Glass',
 'Firework',
 'Grenade',
 'Hold It Against Me',
 'Black And Yellow',
 'Born This Way',
 'E.T.',
 'S&M',
 'Rolling In The Deep',
 'Give Me Everything',
 'Party Rock Anthem',
 'Last Friday Night (T.G.I.F.)',
 'Moves Like Jagger',
 'Someone Like You',
 'We Found Love',
 'Sexy And I Know It',
 'Set Fire To The Rain',
 "Stronger (What Doesn't Kill You)",
 'I Will Always Love You',
 'I Wanna Dance With Somebody (Who Loves Me)',
 'Greatest Love Of All',
 'Part Of Me',
 'How Will I Know',
 'We Are Young',
 'Somebody That I Used To Know',
 'Call Me Maybe',
 'Whistle',
 'We Are Never Ever Getting Back Together',
 'One Mor

In [144]:
top_song_playlists = pd.DataFrame()

In [151]:
# Match Billboard Track Titles to Spotify IDs

for file in file_list: #for loop to iterate through files
    with open(file) as data_file:
        d = json.load(data_file)
        playlists = pd.json_normalize(d['playlists'])
        while playlists.pid in pid_list:
            tracks = pd.json_normalize(d, record_path=['playlists','tracks'],meta=[['playlists','pid']])
            while tracks.title.lower() in no1_track_list.lower():
                playlists = playlists.explode('tracks')
                df = tracks.merge(playlists, how='right', left_on='playlists.pid', right_on='pid').reset_index()
                df = df.drop(columns=['tracks','playlists.pid','description'])
                top_song_playlists = pd.concat([top_song_playlists, df], index=False)
            else:
                pass
        else:
            pass

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
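
The `ValueError` above comes from `playlists.pid in pid_list`: the `in` test compares a whole Series against each list element and then asks for a single truth value. Two further lines would fail once that is fixed: a list has no `.lower()` method, and `pd.concat` takes `ignore_index`, not `index`. A vectorized sketch of the intended match follows. It assumes track records expose a `track_name` field, as in the Million Playlist Dataset files; the cell above uses `title`, so adjust to whatever the files actually contain. `match_top_tracks` is a hypothetical helper name and the sample data is made up:

```python
import pandas as pd

def match_top_tracks(d, pid_set, titles):
    """Return track rows from the given playlists whose title is in `titles`."""
    wanted = {t.lower() for t in titles}
    # One row per track, carrying the parent playlist's pid and edit count.
    tracks = pd.json_normalize(
        d["playlists"], record_path="tracks", meta=["pid", "num_edits"]
    )
    in_low_edit = tracks["pid"].isin(pid_set)
    is_top = tracks["track_name"].str.lower().isin(wanted)
    return tracks[in_low_edit & is_top].reset_index(drop=True)

# Made-up file contents for illustration.
sample = {"playlists": [
    {"pid": 0, "num_edits": 2, "tracks": [
        {"track_name": "Firework", "artist_name": "Katy Perry"},
        {"track_name": "Other", "artist_name": "X"}]},
    {"pid": 1, "num_edits": 10, "tracks": [
        {"track_name": "Grenade", "artist_name": "Bruno Mars"}]},
]}
matched = match_top_tracks(sample, {0}, ["Firework", "Grenade"])
print(matched)

# Across files, per-file results could be stacked with
# pd.concat(list_of_frames, ignore_index=True).
```

Matching on title alone will over-match short titles such as 'Down' or '3'. Comparing artist names as well would tighten it, though Billboard artist strings with 'Featuring ...' credits would need normalizing first.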

In [3]:
# Iterate over Spotify data to identify playlists that have one or more #1 tracks in the playlist

In [4]:
# Isolate relevant song lines and write to a new Spotify dataframe/file

In [5]:
# Of the x playlists with 3 or fewer edits, how many contained a Billboard-charting song?
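
Once matched rows exist, this last question reduces to a set intersection. A minimal sketch, assuming a collection of low-edit pids and a collection of pids taken from the matched rows; `share_with_chart_song` is a hypothetical helper name and the inputs are made up:

```python
def share_with_chart_song(low_edit_pids, matched_pids):
    """Count low-edit playlists that contain a charting song, with the percentage."""
    low = set(low_edit_pids)
    hits = low & set(matched_pids)  # duplicates and unrelated pids drop out
    share = round(len(hits) / len(low) * 100, 2) if low else 0.0
    return len(hits), share

count, pct = share_with_chart_song([0, 1, 2, 3], [1, 3, 3, 7])
print(count, pct)
```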