Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
r
 
 
 
 
 
 

Lyrics, Pt. 1: Genre Classification

In today’s day and age, we’re seeing more crossover than ever between musical artists of different genres. This project builds a model which predicts a song's genre based solely on its lyrical content.

A full description of the project can be found at saisenberg.com.

Getting started

Prerequisite software

  • Python (suggested install through Anaconda)

  • R

Prerequisite libraries

  • Python:

    • bs4, numpy, pandas, re, requests, sklearn, string, warnings (all installed with Anaconda)
    • json (!pip install json)
    • nltk (!pip install nltk)
    • xgboost (!pip install xgboost)
  • R:

lib <- c('dplyr', 'geniusR', 'jsonlite', 'lubridate', 'stringr')
install_packages(lib)

Instructions for use

1. Run the code contained in /python/artist_collection.ipynb

This code scrapes Billboard, Ranker, and TheTopTens for artists of different genres. Any duplicate artists are removed as appropriate.

The output of /python/artist_collection.ipynb can also be found at /data/json_genres.json.

2. Run /r/genius_scraper.R

This program scrapes and cleans lyrics from Genius, categorizing results by genre. Visit Genius to view or obtain a Genius client access token.

The output of /r/genius_scraper.R can also be found at /data/lyrics.csv.

3. Run the code contained in /python/lyrics_classifier.ipynb

This code preprocesses all lyrics for modeling, and runs Naïve Bayes, support vector machine, and gradient boosting models to predict a song's genre from its lyrics.

Author

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgements

About

Predicting a song's genre from its lyrical content

Resources

License

Releases

No releases published

Packages

No packages published