# Music Dataset

This notebook contains the code used to build a music mood dataset from AllMusic.com. The data collected with this code is used only for the CPEN 291 final project.

Preliminary setup: install and import Selenium and the other libraries needed for scraping.

In [None]:
!apt update
!apt install chromium-chromedriver
!pip install selenium

[33m0% [Working][0m            Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [1 In[0m[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn[0m                                                                               Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn[0m[33m0% [1 InRelease gpgv 3,626 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                               Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
[33m0% [1 InRelease gpgv 3,626 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                          

In [None]:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time

Set up Selenium and the Chrome driver options needed to fetch the tables, then launch headless Chrome.

In [None]:
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

In [None]:
driver = webdriver.Chrome(options=options)

The function below loads a page, reads its first table, and adds two label columns to fit our specifications.

In [None]:
from io import StringIO

def getTable(url, sentiment, orig_sentiment):
  driver.get(url)                                                               # load the page in headless Chrome
  html = driver.page_source
  tables = pd.read_html(StringIO(html))                                         # parse every table on the page (newer pandas deprecates raw HTML strings here)
  dataframe = tables[0]                                                         # the song list is the first table
  dataframe['Orig Sentiment'] = orig_sentiment                                  # new column: mood label given by AllMusic (scalar is broadcast to every row)
  dataframe['Sentiment'] = sentiment                                            # new column: label used by our ML model
  return dataframe
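
The labelling step in getTable can be checked offline, without Selenium or a network connection. The sketch below is only an illustration under stated assumptions: `add_sentiment_columns` is a hypothetical helper that is not part of this notebook, and the two sample rows are a hand-made stand-in for a scraped table. It relies on pandas broadcasting a scalar to every row when a column is assigned.

```python
import pandas as pd

def add_sentiment_columns(table, sentiment, orig_sentiment):
    """Return a copy of `table` with the two label columns appended.

    Hypothetical helper for illustration only; it mirrors the labelling
    that getTable applies to the scraped table.
    """
    # assign() broadcasts each scalar to every row and leaves `table` untouched
    return table.assign(**{"Orig Sentiment": orig_sentiment,
                           "Sentiment": sentiment})

# Hand-made stand-in for a scraped table (two rows from the "aggressive" mood)
sample = pd.DataFrame({
    "Title/Composer": ["Wish Trent Reznor", "Get at Me Dog A. Fields / Earl Simmons"],
    "Performer": ["Nine Inch Nails", "DMX"],
})

labelled = add_sentiment_columns(sample, "anger", "aggressive")
print(labelled)
```

Returning a copy rather than modifying the input is a design choice of this sketch; getTable itself adds the columns to the table it has just parsed.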

Use the getTable function above to collect the data from allmusic.com. The songs are spread across one page per mood, so fetch each mood page and store the resulting dataframes in a single list.

In [None]:
# different links with data we need
categories = ["happy-xa0000001016", "joyous-xa0000001029", "cheerful-xa0000000704",
              "energetic-xa0000000990", "celebratory-xa0000000703", "fun-xa0000001006", 
              "calm-peaceful-xa0000000701", "soft-quiet-xa0000001097", "relaxed-xa0000000755", 
              "sad-xa0000000761", "gloomy-xa0000000714", "somber-xa0000001098", 
              "sexual-xa0000001091", "sensual-xa0000000764", "romantic-xa0000000758",
              "angry-xa0000000695", "aggressive-xa0000000694", "rebellious-xa0000001075"]

# labels given to the music by 'allmusic'
orig_sentiments = ["happy", "joyous", "cheerful", 
                   "energetic", "celebratory", "fun", 
                   "calm-peaceful", "soft-quiet", "relaxed", 
                   "sad", "gloomy", "somber",
                   "sexual", "sensual", "romantic",
                   "angry", "aggressive", "rebellious"]

# sentiments we need to match our ML model output
sentiments = ["happy", "happy", "happy", 
              "surprise", "surprise", "surprise", 
              "fear", "fear", "fear", 
              "sad", "sad", "sad", 
              "love", "love", "love",
              "anger", "anger", "anger"]

# URL template; {} is replaced by each of the categories above
url_link = 'https://www.allmusic.com/mood/{}/songs'

dataframes = []
for category, sentiment, orig_sentiment in zip(categories, sentiments, orig_sentiments):
  url = url_link.format(category)                                               # fill the mood slug into the URL template
  dataframes.append(getTable(url, sentiment, orig_sentiment))                   # collect one table per mood page in a single list
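
The three parallel lists above must stay in the same order, or rows end up with the wrong label. One way to reduce that risk, sketched here as an assumption rather than as the notebook's method, is to derive the AllMusic mood name from each slug and look the model label up in a dictionary. `MOOD_TO_SENTIMENT`, `mood_from_slug`, and `jobs` are hypothetical names, and only four of the eighteen moods are shown.

```python
# Mood slugs copied from the notebook (subset shown for brevity)
categories = ["happy-xa0000001016", "calm-peaceful-xa0000000701",
              "sad-xa0000000761", "angry-xa0000000695"]

# Hypothetical lookup: AllMusic mood -> model label, same pairs as the lists above
MOOD_TO_SENTIMENT = {"happy": "happy", "calm-peaceful": "fear",
                     "sad": "sad", "angry": "anger"}

def mood_from_slug(slug):
    """Drop the trailing id after the last hyphen, keeping hyphens in the mood name."""
    return slug.rsplit("-", 1)[0]

url_link = "https://www.allmusic.com/mood/{}/songs"

jobs = []
for slug in categories:
    mood = mood_from_slug(slug)
    # A missing mood raises KeyError here instead of silently mislabelling rows
    jobs.append((url_link.format(slug), MOOD_TO_SENTIMENT[mood], mood))

for url, sentiment, mood in jobs:
    print(url, sentiment, mood)
```

Each tuple in `jobs` holds the three arguments that getTable expects, in the same order.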

Check that the dataframe outputs are correct

In [None]:
len(dataframes)

18

In [None]:
dataframes[16]

Unnamed: 0,Title/Composer,Performer,Stream,Orig Sentiment,Sentiment
0,Fuck tha Police O'Shea Jackson / Lorenzo Patte...,N.W.A,,aggressive,anger
1,Breaking the Law K.K. Downing / Rob Halford / ...,Judas Priest,Spotify,aggressive,anger
2,Mama Said Knock You Out George Clinton / Willi...,LL Cool J,Spotify,aggressive,anger
3,Fight the Power Carlton Ridenhour / Eric Sadle...,Public Enemy,Spotify,aggressive,anger
4,Wish Trent Reznor,Nine Inch Nails,Spotify,aggressive,anger
5,"Ace of Spades ""Fast"" Eddie Clarke / Lemmy Kilm...",Motörhead,,aggressive,anger
6,Kick Out the Jams Michael Davis / Wayne Kramer...,MC5,Spotify,aggressive,anger
7,Bawitdaba Kid Rock / Jason Krause / Mark Schafer,Kid Rock,,aggressive,anger
8,Get at Me Dog A. Fields / Earl Simmons,DMX,Spotify,aggressive,anger
9,Communication Breakdown John Bonham / John Pau...,Led Zeppelin,,aggressive,anger


Concatenate all the dataframes stored in the list into one large dataframe, then write it to a CSV file.

In [None]:
# stack the per-mood tables in one call; each table keeps its own 0-based row index
df_final = pd.concat(dataframes)

In [None]:
df_final

Unnamed: 0,Title/Composer,Performer,Stream,Orig Sentiment,Sentiment
0,I'm a Believer Neil Diamond,The Monkees,,happy,happy
1,"ABC Berry Gordy, Jr. / Alphonso Mizell / Fredd...",The Jackson 5,,happy,happy
2,Love Shack Kate Pierson / Fred Schneider / Kei...,The B-52s,Spotify,happy,happy
3,"Going to a Go-Go Warren ""Pete"" Moore / Smokey ...",Smokey Robinson & the Miracles,,happy,happy
4,In the Mood Joe Garland / Andy Razaf,Glenn Miller,Spotify,happy,happy
...,...,...,...,...,...
45,Good Hearted Woman,Waylon Jennings / Willie Nelson,Spotify,rebellious,anger
46,Heartbreak Hotel,Elvis Presley,,rebellious,anger
47,That's All Right,Elvis Presley,,rebellious,anger
48,Elected,Alice Cooper,Spotify,rebellious,anger


In [None]:
df_final.to_csv('music_mood.csv')
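
In the preview of df_final the last row has index 48, so the row labels restart for every mood page, and to_csv writes those repeated labels as an unnamed first column. The sketch below shows one possible alternative, using two tiny hand-made tables as an assumed stand-in for the scraped ones: `ignore_index=True` renumbers the rows during concatenation, and `index=False` leaves the row labels out of the CSV. It writes to an in-memory buffer so that it does not overwrite music_mood.csv.

```python
import io
import pandas as pd

# Two tiny stand-ins for per-mood tables; each has its own 0-based index
happy = pd.DataFrame({"Performer": ["The Monkees", "Glenn Miller"],
                      "Sentiment": ["happy", "happy"]})
anger = pd.DataFrame({"Performer": ["Alice Cooper"],
                      "Sentiment": ["anger"]})

stacked = pd.concat([happy, anger])                         # index: 0, 1, 0
renumbered = pd.concat([happy, anger], ignore_index=True)   # index: 0, 1, 2

# Write to an in-memory buffer instead of a file; index=False omits row labels
buffer = io.StringIO()
renumbered.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Whether to keep the per-page index depends on how the CSV is consumed later; it can be useful if the position of a song within its mood page matters.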