Predict if a song will hit the Billboard Year-End Hot 100 singles

Heat Replay

Slides for the project are available on Google Slides or as a PDF here!

A data science project that attempts to determine whether the lyrical content of a song can predict whether it will hit the Billboard Year-End Hot 100 singles. The project intersects several datasets to create a final dataframe made up of songs that charted and songs that did not, with each group comprising roughly 50% of the set. For each song, the dataframe holds the bag-of-words version of its lyrics, analyses of those lyrics (sentiment analysis, frequency of obscene words, frequency of words pertaining to certain themes, total number of unique words, etc.), and the year it charted. The last column, 'charted', is a binary variable that records the chart status of the song.
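The assembly step described above can be sketched with the standard library alone. This is a minimal sketch, not the project's actual pipeline: it assumes each song is a dict with hypothetical keys `title`, `year` and `lyrics`, and it reaches the roughly 50/50 split by downsampling the larger group, which is one possible balancing strategy among several. The tokenizer is a simple regular expression, not whatever the project uses.

```python
import random
import re
from collections import Counter


def bag_of_words(lyrics: str) -> Counter:
    """Lowercase the lyrics and count each word token (a simple bag of words)."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(tokens)


def build_balanced_rows(charted, not_charted, seed=0):
    """Combine charted and non-charted songs into one balanced, shuffled list of rows.

    Each input is a list of dicts with the hypothetical keys 'title', 'year', 'lyrics'.
    Both groups are downsampled to the size of the smaller one (an assumption).
    """
    rng = random.Random(seed)
    n = min(len(charted), len(not_charted))
    rows = []
    for label, songs in ((1, charted), (0, not_charted)):
        for song in rng.sample(songs, n):
            rows.append({
                "title": song["title"],
                "year": song["year"],
                "bow": bag_of_words(song["lyrics"]),
                "charted": label,  # binary target column: 1 = charted, 0 = did not chart
            })
    rng.shuffle(rows)  # avoid all positives appearing before all negatives
    return rows


hits = [{"title": "A", "year": 1999, "lyrics": "love love you"}]
misses = [
    {"title": "B", "year": 1999, "lyrics": "rain on me"},
    {"title": "C", "year": 2001, "lyrics": "walk away"},
]
rows = build_balanced_rows(hits, misses)
print(len(rows), sum(r["charted"] for r in rows))  # 2 1
```

In practice the list of rows would typically be loaded into a dataframe (for example with pandas), with the bag-of-words counts expanded into one column per term.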

Structure of features

  1. Track information

  2. Year (int)

  3. Decade (int)

  4. Lyrical content

  5. Unique Words, w/o stopwords (int)

  6. Density, w/o stopwords (int)

  7. Unique Words, w/ stopwords (int)

  8. Density, w/ stopwords (int)

  9. Nouns (int)

  10. Verbs (int)

  11. Adjectives (int)

  12. Syllables (int)

  13. Most used term (string)

  14. Most used term frequency (int)

  15. Curses (binary)

  16. Total curses (int)

  17. Reading score (float)

  18. Sentiment (float)

  19. Chart

  20. Charted (binary)
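Several of the count-based features above can be computed from raw lyrics as sketched below. This is an illustrative sketch under stated assumptions, not the project's code: the stopword and curse lists are tiny hypothetical placeholders, the syllable count is a rough vowel-group heuristic, and "decade" is assumed to mean the year rounded down to a multiple of ten. Density is left out because its definition is not given here, and nouns, verbs, adjectives, reading score and sentiment would need an NLP library (such as a part-of-speech tagger and a sentiment model), so they are omitted too.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; the project's real lists are not shown here.
STOPWORDS = {"the", "a", "an", "and", "i", "you", "me", "to", "of", "on", "in", "it"}
CURSES = {"damn", "hell"}

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent trailing 'e'."""
    groups = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)


def lyric_features(lyrics: str, year: int) -> dict:
    """Compute a subset of the listed features for one song."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    content = [t for t in tokens if t not in STOPWORDS]  # tokens w/o stopwords
    counts = Counter(content)
    term, freq = counts.most_common(1)[0] if counts else ("", 0)
    total_curses = sum(1 for t in tokens if t in CURSES)
    return {
        "year": year,
        "decade": year // 10 * 10,  # assumption: 1994 -> 1990
        "unique_words_wo_stopwords": len(set(content)),
        "unique_words_w_stopwords": len(set(tokens)),
        "syllables": sum(count_syllables(t) for t in tokens),
        "most_used_term": term,
        "most_used_frequency": freq,
        "curses": int(total_curses > 0),  # binary flag
        "total_curses": total_curses,
    }


f = lyric_features("Love me, love me, damn the rain", 1994)
print(f["decade"], f["most_used_term"], f["most_used_frequency"])  # 1990 love 2
```

Counting the most used term after removing stopwords keeps words like "the" or "you" from dominating that feature; whether the project does the same is not stated.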

Structure of repository

├── data; the datasets for the project
├── code; scripts to build the datasets
└── assets; static files and docs

23 directories, 60 files