
Text classification model trained on the song lyrics of two similar artists, with the corpus built from web scraping and HTML parsing


rahman-rakib/Song_Lyrics_Classification


Song Lyrics Classification

Overview

This project builds a text classification model on song lyrics. The task is to predict the artist from the text of a song. Training such a model first requires collecting our own lyrics dataset. We focus on two artists from the "Heavy Metal" genre: Ronnie James Dio (Dio) and Ozzy Osbourne (Ozzy).

First, we use the website http://www.darklyrics.com to collect the dataset. Through web scraping, we download for each artist an HTML page with links to his albums, from which we extract the album hyperlinks by HTML parsing. We then download the HTML page of each album and extract the song lyrics from it.
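The link-extraction step can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not necessarily the code in this repository. The sample HTML, the `exampleartist` paths, the `/lyrics/` marker, and the names `AlbumLinkParser` and `extract_album_links` are all assumptions made for the example; the real page layout should be checked before relying on any of them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlbumLinkParser(HTMLParser):
    """Collect links from <a> tags whose href contains a given marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Drop the "#n" fragment so each album page is listed once,
            # then resolve the relative path against the artist page URL.
            absolute = urljoin(self.base_url, href.split("#")[0])
            if absolute not in self.links:
                self.links.append(absolute)


def extract_album_links(html, base_url, marker="/lyrics/"):
    """Return the unique album URLs found in an artist page (hypothetical helper)."""
    parser = AlbumLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links


# Made-up HTML standing in for a downloaded artist page (assumed structure).
SAMPLE_HTML = """
<div class="album">
  <a href="../lyrics/exampleartist/firstalbum.html#1">Song One</a>
  <a href="../lyrics/exampleartist/firstalbum.html#2">Song Two</a>
  <a href="../lyrics/exampleartist/secondalbum.html#1">Song Three</a>
  <a href="/about.html">About</a>
</div>
"""

links = extract_album_links(
    SAMPLE_HTML, "http://www.darklyrics.com/e/exampleartist.html"
)
for link in links:
    print(link)
```

In a real run, the `html` argument would come from downloading the artist page (for example with `urllib.request.urlopen`), and the same parse-then-download loop would be repeated on each album URL to pull out the lyrics.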

Then, we train various models on the collected dataset. We select the best model through hyperparameter tuning with k-fold cross-validation.
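To make the model selection step concrete, here is a dependency-free sketch of hyperparameter tuning with k-fold cross-validation. It is an illustration under stated assumptions, not the project's actual pipeline: the classifier (a small multinomial Naive Bayes), the tuned hyperparameter (the smoothing value `alpha`), the toy sentences, and all function names are chosen for the example. The toy sentences are invented placeholders, not real lyrics.

```python
import math
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx); fold i tests every k-th sample starting at i."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx


def train_naive_bayes(texts, labels, alpha):
    """Count words per class; alpha (> 0) is the additive smoothing strength."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab, alpha


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((counts[word] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(texts, labels, alpha, k):
    """Mean accuracy over k folds for one hyperparameter value."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        model = train_naive_bayes(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx], alpha
        )
        hits = sum(predict(model, texts[i]) == labels[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / k


def select_alpha(texts, labels, alphas, k):
    """Pick the alpha with the best cross-validated accuracy."""
    scores = {a: cross_val_accuracy(texts, labels, a, k) for a in alphas}
    return max(scores, key=scores.get), scores


# Invented placeholder sentences, ordered so every fold holds both classes.
texts = [
    "rainbow in the dark night", "crazy train rolling on",
    "dragon and the rainbow fire", "train of madness rolling",
    "going off the crazy rails", "holy fire in the dark",
    "madness on the rails tonight", "the dragon guards the night",
]
labels = ["dio", "ozzy", "dio", "ozzy", "ozzy", "dio", "ozzy", "dio"]

alphas = [0.1, 1.0, 10.0]
best_alpha, scores = select_alpha(texts, labels, alphas, k=4)
print("best alpha:", best_alpha)
print("mean accuracy per alpha:", scores)
```

Each candidate value is scored only on folds the model did not see during training, so the chosen value reflects held-out performance rather than fit to the training data. A library such as scikit-learn packages the same idea (for example in `GridSearchCV`), which is usually the more practical choice on a real corpus.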

Technology Stack
