# Purpose

As a newer musical artist, I spend a lot of time writing lyrics. I want to explore data from my favorite artists to understand what makes their music so appealing to me, so I can take those insights and use them in my own work. One of my core hypotheses is that good music successfully evokes emotion. While I can speak to this on a personal level, I wanted to see whether exploring the topic from a data perspective could bring additional insights.

The point of this project is to compile lyrics and audio-feature information for songs by an artist I'm currently drawing inspiration from (Stand Atlantic), and then analyze that information to see what commonalities there may be between records.

In [30]:
#Start with importing all relevant modules
#Modules to work with the data and clean it
import os  #needed to set and read the API credentials as environment variables
import pandas as pd

#Modules to get Genius data (Artist Name, Song Name, and Lyrics)
import lyricsgenius as genius

#Modules to get Spotify data using the API (Album Name, Song Features)
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [31]:
#Setting environment variables for the Genius and Spotify APIs
#Replace the placeholders with your own credentials, and avoid saving real keys in a notebook you share
os.environ["GENIUS_ACCESS_TOKEN"] = "your-genius-access-token"
os.environ["SPOTIPY_CLIENT_ID"] = "your-spotify-client-id"
os.environ["SPOTIPY_CLIENT_SECRET"] = "your-spotify-client-secret"
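Hard-coding tokens in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the three variables are exported in the shell (or loaded from a `.env` file) before Jupyter starts: the hypothetical helper `missing_credentials` only reports which names are unset, so the notebook can stop early with a clear message instead of failing inside an API call.

```python
import os

# Environment variable names the notebook expects (same names as above).
REQUIRED_VARS = ["GENIUS_ACCESS_TOKEN", "SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]


def missing_credentials(required, env=None):
    """Return the names in `required` that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to the real
    process environment (os.environ) but any mapping can be passed in.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_credentials(REQUIRED_VARS)
if missing:
    print("Set these environment variables before running:", ", ".join(missing))
```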

## Step 1: Gather Data

This project starts by pulling the needed data for a single artist. I've selected Stand Atlantic for this analysis because I listen to them very frequently (at least daily), so they're the artist I have the most exposure to right now.

In [32]:
#To start the analysis, let's define the artist whose discography I'd like to examine
artist = "Stand Atlantic"

#Set up the Genius API with the access token 
genius_api = genius.Genius(os.getenv("GENIUS_ACCESS_TOKEN"))

#Create lists for each column for the dataframe
list_artist = []
list_title = []
list_lyrics = []

#Note: I limited this to 29 songs because some of the lesser-known songs are not on Spotify,
# and that created issues when joining the datasets
genius_data = genius_api.search_artist(artist, max_songs = 29)
songs = genius_data.songs
for song in songs:
    list_lyrics.append(song.lyrics)
    list_title.append(song.title)
    list_artist.append(song.artist)

df = pd.DataFrame({'artist':list_artist,'title':list_title,'lyric':list_lyrics})

Searching for songs by Stand Atlantic...

Song 1: "Lavender Bones"
Song 2: "Skinny Dipping"
Song 3: "Coffee At Midnight"
Song 4: "Lost My Cool"
Song 5: "Jurassic Park"
Song 6: "Toothpick"
Song 7: "Hate Me (Sometimes)"
Song 8: "Sidewinder"
Song 9: "Chemicals"
Song 10: "Shh!"
Song 11: "Drink To Drown"
Song 12: "Blurry"
Song 13: "Wavelength"
Song 14: "Eviligo"
Song 15: "Bullfrog"
Song 16: "Your Graduation"
Song 17: "Cigarette Kiss"
Song 18: "Like That"
Song 19: "Push"
Song 20: "‎deathwish"
Song 21: "Burn in the Afterthought"
Song 22: "Mess I Made"
Song 23: "DWYW"
Song 24: "Clay"
Song 25: "Speak Slow"
Song 26: "Roses"
Song 27: "Soap"
Song 28: "MakeDamnSure"
Song 29: "Silk & Satin"

Reached user-specified song limit (29).
Done. Found 29 songs.
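In the output above, the title of song 20 appears to begin with an invisible character (most likely a left-to-right mark, U+200E). Characters like this can make an exact-match search or join fail even though two titles look identical. A minimal standard-library sketch of a hypothetical `normalize_title` helper that could be applied to the titles before querying Spotify:

```python
import unicodedata


def normalize_title(title):
    """Return a cleaned copy of a song title for use in a search query.

    NFKC normalization folds compatibility characters (for example a
    non-breaking space becomes a plain space). Characters in Unicode
    category "Cf" (invisible format marks such as U+200E LEFT-TO-RIGHT MARK)
    are then dropped, and surrounding whitespace is trimmed.
    """
    normalized = unicodedata.normalize("NFKC", title)
    visible = "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
    return visible.strip()


print(normalize_title("\u200edeathwish"))    # deathwish
print(normalize_title("Silk\u00a0& Satin"))  # Silk & Satin
```

With the dataframe above, this could be applied as `df["title"] = df["title"].map(normalize_title)` before the Spotify search.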


In [33]:
#Now that I have my basic dataframe, I'm going to get additional information from Spotify's API using spotipy
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

#Loop through the song titles in the dataframe and look up each song's album name and Spotify track id
# (the track id is needed to request the audio features)
list_album = []
list_song_id = []

for song in df["title"]:
    spotify_song_info = spotify.search(q='artist:' + artist + ' track:' + song, type='track')
    list_album.append(spotify_song_info["tracks"]["items"][0]["album"]["name"])
    list_song_id.append(spotify_song_info["tracks"]["items"][0]["id"])

df['album'] = list_album
df["song_id"] = list_song_id

#Audio features I want to pull from Spotify for each song
feature_names = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
                 "acousticness", "instrumentalness", "liveness", "valence", "tempo",
                 "duration_ms", "time_signature"]
feature_values = {name: [] for name in feature_names}

#Using the song id to get the audio features and putting them in the dataframe
for spotify_id in df["song_id"]:
    spotify_song_features = spotify.audio_features(tracks = spotify_id)
    for name in feature_names:
        feature_values[name].append(spotify_song_features[0][name])

for name in feature_names:
    df[name] = feature_values[name]

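df2 = df.copy()  #keep an unmodified copy of the raw data; plain "df2 = df" would only create a second name for the same dataframe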
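The loops above assume every search returns at least one track, and they request audio features one song at a time. A minimal sketch of two hypothetical helpers that relax both assumptions: `first_track` returns `(None, None)` instead of raising an `IndexError` when a search has no hits, and `chunked` splits the ids into groups of at most 100, the documented per-call maximum for spotipy's `audio_features`. The dictionary shape is assumed to match what `spotify.search(..., type='track')` returns. Spotify also announced in late 2024 that newly registered Web API apps would no longer have access to the audio-features endpoint, so these calls may fail with new credentials.

```python
def first_track(search_result):
    """Return (album_name, track_id) for the top hit, or (None, None) if there is none.

    `search_result` is assumed to have the shape returned by spotipy's
    Spotify.search(..., type="track"): {"tracks": {"items": [...]}}.
    """
    items = search_result.get("tracks", {}).get("items", [])
    if not items:
        return None, None
    top = items[0]
    return top["album"]["name"], top["id"]


def chunked(values, size=100):
    """Yield consecutive slices of `values` containing at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


# Hypothetical usage with the `spotify` client and `df` defined above:
# ids = [song_id for song_id in df["song_id"] if song_id is not None]
# features = []
# for batch in chunked(ids):
#     features.extend(spotify.audio_features(tracks=batch))  # entries may be None for unknown ids
```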

## Step 2: Data Cleaning

The lyric data includes some non-lyric text, such as section markers (e.g. [Verse 1] or [Chorus]) that label which part of the song each line belongs to. These need to be removed before the analysis. After cleaning the data, I will export the results as a CSV file to complete the rest of the analysis in R.
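To see what the cleaning steps do to a single lyric, here is a minimal standard-library sketch. It assumes, following the Genius convention, that section markers are enclosed in square brackets, and it removes everything inside brackets rather than matching a fixed list of words. Matching brackets avoids stripping real lyrics that happen to contain words like "solo" or the digits 1 to 3. The helper name `clean_lyric` and the sample text are illustrative only.

```python
import re


def clean_lyric(text):
    """Apply the lyric-cleaning steps to one string (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", "", text)    # drop bracketed section markers like [verse 1]
    text = text.replace("\n", " ")         # newlines become spaces
    text = re.sub(r"[^\w'\s]+", "", text)  # drop punctuation except apostrophes
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip()


sample = "[Verse 1]\nHello (hello), world!\n[Chorus]\nIt's 2 a.m."
print(clean_lyric(sample))  # hello hello world it's 2 am
```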

In [34]:
#Clean up the lyrics using some basic string manipulation as suggested by the example project
df["lyric"] = df["lyric"].str.lower() #make lower case
df["lyric"] = df["lyric"].str.replace(r"\[.*?\]", "", regex=True) #remove bracketed song part identifiers such as [verse 1] or [chorus] without touching the lyrics themselves
df["lyric"] = df["lyric"].str.replace("\n", " ", regex=False).str.replace(r"[^\w'\s]+", "", regex=True) #replace newline characters with spaces and remove other odd characters like parentheses
df["lyric"] = df["lyric"].str.replace(r"\s+", " ", regex=True).str.strip() #collapse repeated spaces and trim the ends

In [35]:
#Final step for the Python portion is to export the data to a csv file for analysis in R
df.to_csv('music_data.csv',index=False, encoding='utf-8-sig')
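`encoding='utf-8-sig'` writes a UTF-8 byte-order mark (BOM) at the start of the file, which spreadsheet tools such as Excel use to detect UTF-8. Whatever reads the file next needs to skip it; otherwise the first column name can come through with stray characters. On the R side, `readr::read_csv()` should skip a leading BOM, and base `read.csv()` accepts `fileEncoding = "UTF-8-BOM"`. A small standard-library sketch (illustrative data, temporary file) showing what the BOM looks like and how reading with the same codec removes it:

```python
import csv
import os
import tempfile

rows = [{"title": "Example", "lyric": "some lyric text"}]  # illustrative data only

path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as handle:
    writer = csv.DictWriter(handle, fieldnames=["title", "lyric"])
    writer.writeheader()
    writer.writerows(rows)

# The first three bytes of the file are the UTF-8 byte-order mark.
with open(path, "rb") as handle:
    raw_start = handle.read(3)
print(raw_start)  # b'\xef\xbb\xbf'

# Reading with the same codec strips the BOM, so the first header is a clean "title".
with open(path, newline="", encoding="utf-8-sig") as handle:
    header = next(csv.reader(handle))
print(header)  # ['title', 'lyric']
```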