### Part 3: Data Analytics - Spotify

This project will attempt to examine how music has changed over time.

To achieve this, data will be obtained from the Spotify web API via the Spotipy Python library.

To obtain the data, a Spotify Developer account was connected to my Spotify account, and a client ID was created.

Using the client ID and client secret to authenticate, data can be queried from the Spotify API's endpoints.

In [6]:
# Install or upgrade to the latest version of Spotipy
# (a bare `pip install` is not valid Python in a notebook cell, so the %pip magic is used)

%pip install spotipy --upgrade

In [7]:
# Import needed libraries

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
#import time 
import pandas as pd

In [8]:
# Authenticate with the client credentials flow to allow access to data.
# The credentials are read from environment variables (assumed to be set beforehand)
# so that they are not stored in the notebook.

import os

SPOTIPY_CLIENT_ID = os.environ['SPOTIPY_CLIENT_ID']
SPOTIPY_CLIENT_SECRET = os.environ['SPOTIPY_CLIENT_SECRET']

client_credentials_manager = SpotifyClientCredentials(SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

**Data Collection**

Having authenticated the credentials, the next step was to retrieve the data.

To do this, a function was created that uses the search endpoint of the Spotify API. The endpoint takes a query value to search for, which can take many forms, such as an artist or album name, a song title, or a year of release. A type can also be specified to restrict the kind of item returned, along with a limit on the number of results per request, which is capped at 50. As the primary aim of the project was to examine changes in music over time, the query was configured to return songs by year of release. A range of years such as 1900-1909 can be given in place of a single year, which is how each decade is queried.

The function uses a for loop and the offset argument of the search endpoint to work around the 50-result limit, requesting 20 pages of 50 for up to 1000 results per query. The query results were then reduced to just the required variables, which are summarised as follows:


- Artist - The first listed artist who released the song.
- Song - The name of the song.
- Track ID - The Spotify ID of the song.
- Popularity - The popularity of a song, given a value between 0 and 100, where 100 is the most popular.
- Release Date - The date of the first release of the album containing the song.
- Song Length - The song length in milliseconds.
- Explicit - Whether the song has explicit lyrics. False means it does not or it is unknown.
- Decade - The range of years used in the query.

Finally, the function combines the required variables into a pandas data frame, sorts the rows by popularity, and keeps the top 50 songs.
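The paging pattern described above can be tried without calling the API. The following is a minimal sketch, not the project's function: `collect_pages` and `fake_search` are hypothetical names, and `fake_search` is a made-up stand-in for the search endpoint that serves 120 tracks with invented popularity values.

```python
from typing import Callable, Dict, List


def collect_pages(fetch_page: Callable[[int, int], List[Dict]],
                  total: int = 1000,
                  page_size: int = 50) -> List[Dict]:
    """Gather up to `total` items by requesting `page_size` items per call."""
    items: List[Dict] = []
    for offset in range(0, total, page_size):
        page = fetch_page(page_size, offset)
        if not page:  # stop early when the source runs out of results
            break
        items.extend(page)
    return items


def fake_search(limit: int, offset: int) -> List[Dict]:
    """Hypothetical stand-in for the search endpoint: 120 made-up tracks."""
    catalogue = [{"name": f"track {n}", "popularity": n % 101} for n in range(120)]
    return catalogue[offset:offset + limit]


tracks = collect_pages(fake_search)

# Sort by popularity (highest first) and keep the top 50
top_50 = sorted(tracks, key=lambda t: t["popularity"], reverse=True)[:50]

print(len(tracks), len(top_50))  # prints "120 50"
```

The early `break` is a design choice of this sketch: it avoids further requests once a page comes back empty.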



In [9]:
def get_years_top_songs(years):
    """Return the 50 most popular tracks for a year or range of years, e.g. '1900-1909'."""

    artist = []
    track_name = []
    track_id = []
    release_date = []
    track_length = []
    explicit = []
    popularity = []

    # Page through the search results 50 at a time (offsets 0, 50, ..., 950)
    for offset in range(0, 1000, 50):
        year_tracks = sp.search(q='year:' + years, type='track', limit=50, offset=offset)

        # Keep only the required fields from each track
        for track in year_tracks['tracks']['items']:
            artist.append(track['artists'][0]['name'])  # first listed artist only
            track_name.append(track['name'])
            track_id.append(track['id'])
            release_date.append(track['album']['release_date'])
            track_length.append(track['duration_ms'])
            explicit.append(track['explicit'])
            popularity.append(track['popularity'])

    year_df = pd.DataFrame({'Artist' : artist,
                            'Song' : track_name,
                            'track id' : track_id,
                            'Release Date' : release_date,
                            'Song Length (ms)' : track_length,
                            'Explicit' : explicit,
                            'Popularity' : popularity,
                            'Decade' : years})

    # Sort by popularity and keep the top 50 songs
    year_df_sorted = year_df.sort_values('Popularity', ascending=False).head(50)

    return year_df_sorted

In [11]:
year_df1 = get_years_top_songs('1900-1909')

year_df2 = get_years_top_songs('1910-1919')

year_dfs = [year_df1, year_df2]

result = pd.concat(year_dfs)

#result


In [18]:
decades = ["1900-1909",
           "1910-1919",
           "1920-1929",
           "1930-1939",
           "1940-1949",
           "1950-1959",
           "1960-1969",
           "1970-1979",
           "1980-1989",
           "1990-1999",
           "2000-2009",
           "2010-2019",
           "2020-2029"]

decades

['1900-1909',
 '1910-1919',
 '1920-1929',
 '1930-1939',
 '1940-1949',
 '1950-1959',
 '1960-1969',
 '1970-1979',
 '1980-1989',
 '1990-1999',
 '2000-2009',
 '2010-2019',
 '2020-2029']

The top 50 songs for each decade are then combined into a single data frame.

In [43]:
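def final_df():
    """Combine the top 50 songs of every decade into one data frame."""

    # Collect one data frame per decade
    decade_frames = []

    for decade in decades:
        top_50_decade_songs = get_years_top_songs(decade)
        decade_frames.append(top_50_decade_songs)

    final = pd.concat(decade_frames)

    return final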

In [44]:
final1 = final_df()

final1

| | Artist | Song | track id | Release Date | Song Length (ms) | Explicit | Popularity | Decade |
|---|---|---|---|---|---|---|---|---|
| 5 | Frédéric Chopin | Chopin: Piano Sonata No. 2 in B-Flat Minor, Op... | 4tDa5P1so01pdVc5Ywl6Or | 1900 | 597960 | False | 45 | 1900-1909 |
| 0 | Peter Gabriel | Sledgehammer | 3wLZ69kr5J2sb934Kpv02c | 1900 | 295653 | False | 44 | 1900-1909 |
| 18 | Henry Purcell | Purcell / Arr. Pluhar: Oedipus, Z. 583: No. 2,... | 2UK3kMSQc8fMTCjEykifQ7 | 1900 | 354333 | False | 44 | 1900-1909 |
| 2 | Roy Brown | Mighty Mighty Man | 5tBDBrsPypLVJ9Rbpy3MNm | 1900-01-30 | 143230 | False | 43 | 1900-1909 |
| 124 | Horkýže Slíže | A ja sprostá | 1tRUTDC96dxQv8truEHmmM | 1900 | 177960 | True | 43 | 1900-1909 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 274 | Tiësto | Don't Be Shy | 0bI7K9Becu2dtXK1Q3cZNB | 2021-08-12 | 140500 | True | 91 | 2020-2029 |
| 48 | Lil Nas X | MONTERO (Call Me By Your Name) | 1SC5rEoYDGUK4NfG82494W | 2021-09-17 | 137704 | True | 91 | 2020-2029 |
| 71 | Emmy Meli | I AM WOMAN | 3nOz1U41SZZ0N3fuUWr9nb | 2021-11-19 | 232813 | False | 90 | 2020-2029 |
| 585 | Blessd | Medallo | 6lX6l7OuA3qrnIRfdsr0dw | 2021-10-27 | 233453 | True | 90 | 2020-2029 |


In [45]:
print('number of elements in the year data frame:', len(final1))

number of elements in the year data frame: 650
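
The row labels in the combined data frame repeat across decades, because each per-decade frame keeps the labels it had before sorting (5, 0, 18 and so on in the output above). The following is a minimal sketch, using two small made-up frames rather than the real results, of how passing `ignore_index=True` to `pd.concat` gives every row a unique label:

```python
import pandas as pd

# Made-up stand-ins for two per-decade frames that kept their own row labels
first = pd.DataFrame({"Song": ["a", "b"], "Decade": "1900-1909"}, index=[5, 0])
second = pd.DataFrame({"Song": ["c", "d"], "Decade": "1910-1919"}, index=[0, 3])

# Default behaviour: the original labels are kept, so 0 appears twice
kept = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.is_unique, renumbered.index.is_unique)
```

Unique labels matter mainly when rows are later selected by label with `.loc`, where a repeated label returns several rows.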


The next steps are to:

- Explore the number of rows and columns, the ranges of values, etc.
- Handle missing data, if any.
- Perform any additional steps (parsing dates, creating additional columns, etc.).
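
As a rough illustration of two of these steps, the sketch below counts missing values and derives a release year. It uses a small made-up frame rather than the real results, the `Release Year` column name is hypothetical, and it assumes every release date starts with a four-digit year, as in the results shown above (`1900` and `1900-01-30`).

```python
import pandas as pd

# Made-up rows that mimic the mixed date precision seen in the results
sample = pd.DataFrame({
    "Song": ["x", "y", "z"],
    "Release Date": ["1900", "1900-01-30", "2021-08-12"],
})

# Count missing values in each column
missing = sample.isna().sum()

# Take the first four characters as the year, whatever the precision
sample["Release Year"] = sample["Release Date"].str[:4].astype(int)

print(missing)
print(sample["Release Year"].tolist())
```

Slicing the string avoids having to parse dates that are sometimes only a year, at the cost of discarding the month and day.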

In [46]:
# Overview: DataFrame.info and DataFrame.describe
# Column values: DataFrame.value_counts, Series.unique

In [47]:
final1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 650 entries, 5 to 127
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Artist            650 non-null    object
 1   Song              650 non-null    object
 2   track id          650 non-null    object
 3   Release Date      650 non-null    object
 4   Song Length (ms)  650 non-null    int64 
 5   Explicit          650 non-null    bool  
 6   Popularity        650 non-null    int64 
 7   Decade            650 non-null    object
dtypes: bool(1), int64(2), object(5)
memory usage: 41.3+ KB


In [48]:
final1.describe()

| | Song Length (ms) | Popularity |
|---|---|---|
| count | 650.0 | 650.0 |
| mean | 213202.2 | 66.18 |
| std | 80515.21 | 25.238277 |
| min | 49573.0 | 6.0 |
| 25% | 169336.2 | 43.25 |
| 50% | 196963.5 | 79.0 |
| 75% | 240086.2 | 84.0 |
| max | 1239400.0 | 100.0 |


In [49]:
final1.value_counts()

Artist         Song                                         track id                Release Date  Song Length (ms)  Explicit  Popularity  Decade   
*NSYNC         Merry Christmas, Happy Holidays              4v9WbaxW8HdjqfUiWYWsII  1998-10-30    255306            False     80          1990-1999    1
Nat King Cole  The Christmas Song (Merry Christmas To You)  4PS1e8f2LvuTFgUs1Cn3ON  1962          192160            False     90          1960-1969    1
Marvin Gaye    Ain't No Mountain High Enough                7tqhbajSfrz2F7E1Z75ASX  1967-08-29    151666            False     83          1960-1969    1
Michael Bublé  Holly Jolly Christmas                        5PuKlCjfEVIXl0ZBp5ZW9g  2012-11-09    119786            False     90          2010-2019    1
               It's Beginning to Look a Lot like Christmas  5a1iz510sv2W9Dt1MvFd5R  2012-11-09    206639            False     94          2010-2019    1
                                                                                       

In [32]:
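# final1.unique() is not available on a DataFrame; unique() applies to a single column,
# e.g. final1['Decade'].unique()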

Song length in milliseconds is hard to interpret, so it is converted to seconds.

In [59]:
# Convert milliseconds to seconds (run once; re-running this cell divides by 1000 again)
final1['Song Length (ms)'] = final1['Song Length (ms)']/1000

In [60]:
final1

| | Artist | Song | track id | Release Date | Song Length (ms) | Explicit | Popularity | Decade |
|---|---|---|---|---|---|---|---|---|
| 5 | Frédéric Chopin | Chopin: Piano Sonata No. 2 in B-Flat Minor, Op... | 4tDa5P1so01pdVc5Ywl6Or | 1900 | 597.960 | False | 45 | 1900-1909 |
| 0 | Peter Gabriel | Sledgehammer | 3wLZ69kr5J2sb934Kpv02c | 1900 | 295.653 | False | 44 | 1900-1909 |
| 18 | Henry Purcell | Purcell / Arr. Pluhar: Oedipus, Z. 583: No. 2,... | 2UK3kMSQc8fMTCjEykifQ7 | 1900 | 354.333 | False | 44 | 1900-1909 |
| 2 | Roy Brown | Mighty Mighty Man | 5tBDBrsPypLVJ9Rbpy3MNm | 1900-01-30 | 143.230 | False | 43 | 1900-1909 |
| 124 | Horkýže Slíže | A ja sprostá | 1tRUTDC96dxQv8truEHmmM | 1900 | 177.960 | True | 43 | 1900-1909 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 274 | Tiësto | Don't Be Shy | 0bI7K9Becu2dtXK1Q3cZNB | 2021-08-12 | 140.500 | True | 91 | 2020-2029 |
| 48 | Lil Nas X | MONTERO (Call Me By Your Name) | 1SC5rEoYDGUK4NfG82494W | 2021-09-17 | 137.704 | True | 91 | 2020-2029 |
| 71 | Emmy Meli | I AM WOMAN | 3nOz1U41SZZ0N3fuUWr9nb | 2021-11-19 | 232.813 | False | 90 | 2020-2029 |
| 585 | Blessd | Medallo | 6lX6l7OuA3qrnIRfdsr0dw | 2021-10-27 | 233.453 | True | 90 | 2020-2029 |


In [67]:
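# rename() returns a new data frame, so the result is assigned back to keep the new column name
final1 = final1.rename(
    columns={
        'Song Length (ms)' : 'Song Length (s)'
    })

final1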

| | Artist | Song | track id | Release Date | Song Length (s) | Explicit | Popularity | Decade |
|---|---|---|---|---|---|---|---|---|
| 5 | Frédéric Chopin | Chopin: Piano Sonata No. 2 in B-Flat Minor, Op... | 4tDa5P1so01pdVc5Ywl6Or | 1900 | 597.960 | False | 45 | 1900-1909 |
| 0 | Peter Gabriel | Sledgehammer | 3wLZ69kr5J2sb934Kpv02c | 1900 | 295.653 | False | 44 | 1900-1909 |
| 18 | Henry Purcell | Purcell / Arr. Pluhar: Oedipus, Z. 583: No. 2,... | 2UK3kMSQc8fMTCjEykifQ7 | 1900 | 354.333 | False | 44 | 1900-1909 |
| 2 | Roy Brown | Mighty Mighty Man | 5tBDBrsPypLVJ9Rbpy3MNm | 1900-01-30 | 143.230 | False | 43 | 1900-1909 |
| 124 | Horkýže Slíže | A ja sprostá | 1tRUTDC96dxQv8truEHmmM | 1900 | 177.960 | True | 43 | 1900-1909 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 274 | Tiësto | Don't Be Shy | 0bI7K9Becu2dtXK1Q3cZNB | 2021-08-12 | 140.500 | True | 91 | 2020-2029 |
| 48 | Lil Nas X | MONTERO (Call Me By Your Name) | 1SC5rEoYDGUK4NfG82494W | 2021-09-17 | 137.704 | True | 91 | 2020-2029 |
| 71 | Emmy Meli | I AM WOMAN | 3nOz1U41SZZ0N3fuUWr9nb | 2021-11-19 | 232.813 | False | 90 | 2020-2029 |
| 585 | Blessd | Medallo | 6lX6l7OuA3qrnIRfdsr0dw | 2021-10-27 | 233.453 | True | 90 | 2020-2029 |
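
When reading individual durations, a minutes-and-seconds string may be easier to take in than a raw count. The helper below is a small sketch: `format_duration` is a hypothetical name, and it takes the original duration in milliseconds as returned by the API, not the converted seconds column.

```python
def format_duration(ms: int) -> str:
    """Format a duration in milliseconds as M:SS, dropping the fractional second."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


# Durations in milliseconds taken from the first results table
for ms in (597960, 295653, 140500):
    print(ms, format_duration(ms))
```

A string like this suits display only; the numeric seconds column remains the better choice for statistics such as `describe()`.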


In [72]:
final1['Popularity'].describe() 

count    650.000000
mean      66.180000
std       25.238277
min        6.000000
25%       43.250000
50%       79.000000
75%       84.000000
max      100.000000
Name: Popularity, dtype: float64

In [33]:
# artist = []
# track_name = []
# track_id = []
# release_date = []
# popularity = []
# #for i in range(0, 1000, 50):
# year_tracks = sp.search(q='year:' + '2018', type='track', limit = 1)
    
# year_tracks['tracks']['items']
