This project analyzes whether a song's performance on the Billboard charts can be predicted from its lyrics.
This is the original CSV from the Kaggle dataset, available at https://www.kaggle.com/datasets/brianblakely/top-100-songs-and-lyrics-from-1959-to-2019
This is the final report of our findings from the project.
This is the cleaned dataset with the feature-engineered variables. It is the dataset that was used for the project.
This is a list of stopwords for the model to ignore when calculating SHAP values.
This is the Jupyter notebook that was used for the project. It includes all the code for data cleaning and feature engineering, exploratory data analysis, and every model that was fit.