In [1]:
import urllib
import requests
import pandas as pd
import json
import base64

# Analyzing the Impact of Release Date on Streaming Success
## Chris Grant

The hypothesis I will be testing is that albums released during the summer months receive a higher average number of streams than albums released in other seasons. The theoretical reasoning behind this hypothesis is that people are typically out doing more activities during the summer. People are out of school and have a lot more free time. Additionally, pleasant weather can lead to things like parties, where you'd be playing music. The practical side of my hypothesis is that artists and labels strategically pick what time of the year to release an album. I theorize that at times labels may want their artist to release an album during an off-season. This would be an effort to have less competition when the album releases, because other labels may be following the same seasonal reasoning.

The two albums I am using to test my hypothesis are Drake's "Her Loss" and Kendrick Lamar's "Mr. Morale & The Big Steppers". I am choosing these two because both albums were released in 2022: "Mr. Morale & The Big Steppers" on May 13th, 2022, and "Her Loss" on November 4th, 2022. I also chose these two artists because they are from the same genre and have been roughly equal in popularity for years. This should make the comparison more reliable. It could still be unreliable because of bias: some people viewing the data may prefer one artist over another or judge their success differently. I do not believe there will be any limitations in testing this hypothesis.

In [2]:
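def get_session_token(SessionID, SessionKey):
    # Request an access token with the client credentials grant
    url = 'https://accounts.spotify.com/api/token'
    data = {'grant_type': 'client_credentials'}
    # HTTP Basic auth expects base64("client_id:client_secret")
    credentials = "{}:{}".format(SessionID, SessionKey)
    encoded_key = base64.b64encode(credentials.encode("ascii"))
    header = {'Authorization': 'Basic {}'.format(encoded_key.decode("ascii"))}
    response = requests.post(url, data=data, headers=header)
    # Print the status code so a failed request is visible in the notebook
    print(response.status_code)
    return response.json()['access_token']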

In [3]:
keys = pd.read_csv("client.txt")

In [4]:
access_token = get_session_token(keys['Client_ID'].iloc[0], keys['Client_Secret'].iloc[0])

200


In [5]:
ab_tracks_ep = 'https://api.spotify.com/v1/albums/{}/tracks'
tracks_ep = 'https://api.spotify.com/v1/tracks'

In [6]:
def api_call(endpoint_url, api_header):
    # Send a GET request, print the status code, and return the parsed JSON body
    response = requests.get(endpoint_url, headers=api_header)
    print(response.status_code)
    return response.json()

In [7]:
session_header = {'Authorization': 'Bearer {}'.format(access_token)}
drake_album_id = '5MS3MvWHJ3lOZPLiMxzOU6'
kendrick_album_id = '79ONNoS4M9tfIA1mYLBYVX'  

In [8]:
drake_album_response = api_call(ab_tracks_ep.format(drake_album_id), session_header)
kendrick_album_response = api_call(ab_tracks_ep.format(kendrick_album_id), session_header)

200
200


In [9]:
drake_album_df = pd.DataFrame(drake_album_response['items'])
kendrick_album_df = pd.DataFrame(kendrick_album_response['items'])

In [10]:
drake_track_ids = ','.join(drake_album_df['id'].to_list())
kendrick_track_ids = ','.join(kendrick_album_df['id'].to_list())

In [11]:
drake_track_info = api_call(tracks_ep + '?market=US&ids={}'.format(drake_track_ids), session_header)
kendrick_track_info = api_call(tracks_ep + '?market=US&ids={}'.format(kendrick_track_ids), session_header)

200
200


#
The chart below shows information on Drake's album "Her Loss". Specifically, I chose to focus on the names of the tracks on the album, the href (the API link to each track), and how popular each track is. I sorted the chart so the most popular track appears first, which makes the two albums easier to compare. In this case, the first song on the album, "Rich Flex", is also the most popular. Drake's album was released in November 2022.

#

In [30]:
drake_tracks_df = pd.DataFrame(drake_track_info['tracks'])
# Keep only the columns of interest, then sort so the most popular track comes first
drake_data = drake_tracks_df[["name", "href", "popularity"]]
drakes_data_df = drake_data.sort_values(by="popularity", ascending=False)
drakes_data_df

index,name,href,popularity
0,Rich Flex,https://api.spotify.com/v1/tracks/1bDbXMyjaUIo...,75
5,Spin Bout U,https://api.spotify.com/v1/tracks/2ZL7WZcjuYKi...,74
6,Hours In Silence,https://api.spotify.com/v1/tracks/0sSRLXxknVTQ...,70
9,Pussy & Millions (feat. Travis Scott),https://api.spotify.com/v1/tracks/2KLwPaRDOB87...,70
2,On BS,https://api.spotify.com/v1/tracks/34tz0eDhGuFE...,69
4,Privileged Rappers,https://api.spotify.com/v1/tracks/7l2nxyx7IkBX...,68
8,Circo Loco,https://api.spotify.com/v1/tracks/7GeTsDIc5ykN...,67
1,Major Distribution,https://api.spotify.com/v1/tracks/46s57QULU02V...,65
12,Jumbotron Shit Poppin,https://api.spotify.com/v1/tracks/2ZlABhxMMa43...,65
10,Broke Boys,https://api.spotify.com/v1/tracks/45pUWUeEWGoS...,64


#
The chart below shows the same columns for Kendrick Lamar's album "Mr. Morale & The Big Steppers", again sorted from most to least popular. Kendrick's album was released in May 2022.

#

In [27]:
kendrick_tracks_df = pd.DataFrame(kendrick_track_info['tracks'])
# Keep only the columns of interest, then sort so the most popular track comes first
kendrick_data = kendrick_tracks_df[["name", "href", "popularity"]]
kendricks_data_df = kendrick_data.sort_values(by="popularity", ascending=False)
kendricks_data_df

index,name,href,popularity
1,N95,https://api.spotify.com/v1/tracks/0fX4oNGBWO3d...,76
9,Count Me Out,https://api.spotify.com/v1/tracks/6BU1RZexmvJc...,75
6,Rich Spirit,https://api.spotify.com/v1/tracks/1QPreu0BNOrU...,74
0,United In Grief,https://api.spotify.com/v1/tracks/5Gt9bxniM1Sx...,73
3,Die Hard,https://api.spotify.com/v1/tracks/2g6tReTlM2Ak...,72
4,Father Time (feat. Sampha),https://api.spotify.com/v1/tracks/28qA8y1sz0FT...,69
11,Silent Hill,https://api.spotify.com/v1/tracks/3lzUeaCbcCDB...,66
13,Savior,https://api.spotify.com/v1/tracks/3drdWsJKiVCS...,64
8,Purple Hearts,https://api.spotify.com/v1/tracks/1REVvAphiSTJ...,63
7,We Cry Together,https://api.spotify.com/v1/tracks/67XC51nlZncN...,62


#
After wrangling all of the data and trying to get it into a digestible form, I ran into some issues. I would have liked to add another column, artists. I had trouble using iloc to pull the specific artists' names alongside the track titles. This would have been relevant if there was a feature on any of the tracks, because that could have added to a song's streaming success. Another issue is that viewing the two charts I created is not enough on its own to determine which season would be better for releasing an album. I feel I needed more information than the Spotify API itself could provide.
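
One possible way to build the artists column is sketched below. It assumes each row's `artists` value is a list of dictionaries with a `name` key, which is how the track objects returned by the tracks endpoint appear to be shaped. The helper `join_artist_names` and the sample rows are hypothetical and only illustrate the idea; they are not part of the analysis above.

```python
import pandas as pd

def join_artist_names(artists):
    """Turn a list of artist dicts into one comma-separated string of names."""
    return ", ".join(artist["name"] for artist in artists)

# Hypothetical sample rows shaped like the entries in drake_track_info['tracks']
sample_tracks_df = pd.DataFrame([
    {"name": "Track A", "popularity": 70,
     "artists": [{"name": "Artist One"}, {"name": "Artist Two"}]},
    {"name": "Track B", "popularity": 75,
     "artists": [{"name": "Artist One"}]},
])

# apply() runs the helper on each row's list, so no iloc indexing is needed
sample_tracks_df["artist_names"] = sample_tracks_df["artists"].apply(join_artist_names)

sample_with_artists = sample_tracks_df[["name", "artist_names", "popularity"]].sort_values(
    by="popularity", ascending=False)
print(sample_with_artists)
```

If the assumption about the column shape holds, the same `apply` call should work on `drake_tracks_df["artists"]` and `kendrick_tracks_df["artists"]`, and tracks with a feature would show more than one name.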

To further support my hypothesis, I found the overall average popularity for each album by adding up each individual track's popularity and dividing that total by the number of tracks, which gives a final popularity score out of 100.

Kendrick's album has 16 tracks for a total of 1223 points, so its final popularity score is 76 (1223 / 16 ≈ 76.4).
Drake's album has 15 tracks for a total of 1046 points, so its final popularity score is 70, rounded up (1046 / 15 ≈ 69.7).
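
The arithmetic behind those two scores can be checked with a short sketch. The totals and track counts are the ones reported above; the function name `average_popularity` is made up for illustration.

```python
def average_popularity(total_points, track_count):
    """Mean popularity per track, on the same 0-100 scale as the track scores."""
    return total_points / track_count

kendrick_avg = average_popularity(1223, 16)  # about 76.4
drake_avg = average_popularity(1046, 15)     # about 69.7

print(round(kendrick_avg), round(drake_avg))

# With the DataFrames built earlier, the same figure could be computed directly,
# for example: kendrick_tracks_df['popularity'].mean()
```

Taking the mean per track rather than comparing the raw totals keeps the comparison fair when the two albums have a different number of tracks.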

To conclude my report, I would have done some things differently and incorporated different methods to wrangle and tidy my data. Even so, my hypothesis is somewhat supported by the data I did find. Kendrick's album is more popular than Drake's, but this supports my hypothesis only partly, for a few reasons. Kendrick's album has been out longer than Drake's, having released in May 2022 while Drake's released that November, so there has been more time for people to stream Kendrick's music. Additionally, there was a "beef" between Kendrick and Drake earlier this year. Kendrick seemingly won the feud, so his streaming numbers are likely inflated by that recent victory. Overall, the data I collected from the Spotify API supports my hypothesis, with these caveats.
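
To put a number on how much longer Kendrick's album has been available, the gap between the two release dates stated earlier can be worked out with the standard library. This is only a rough sketch of the size of the head start, not a correction for it; the variable names are illustrative.

```python
from datetime import date

# Release dates as stated earlier in this report
kendrick_release = date(2022, 5, 13)
drake_release = date(2022, 11, 4)

# Number of days Kendrick's album was available before Drake's came out
head_start_days = (drake_release - kendrick_release).days
print(head_start_days)  # 175
```

A head start of roughly six months is one reason the popularity gap cannot be attributed to release season alone.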