Artificial Neural Network (ANN) built from scratch to estimate the Billboard Hot 100 chart positions of popular songs based on their chart histories. It uses one hidden layer and is trained using backpropagation and gradient descent.
These instructions will let you build the dataset, train the network, and use it to predict the Billboard charts locally. Start by cloning the repo:
git clone https://github.com/rsriv/billboard100_predict.git
The main dependencies are NumPy and the unofficial Python Billboard API (billboard.py). Install both with pip:
pip install numpy
pip install billboard.py
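As a rough illustration of how downloaded chart history could be turned into training data, here is a minimal sketch. It assumes billboard.py's `ChartData` object, whose entries expose `rank`, `lastPos`, `peakPos` and `weeks`. The feature choice and the helper names `entry_features` and `fetch_chart_features` are hypothetical and not necessarily what billboard_predict.py does.

```python
from types import SimpleNamespace


def entry_features(entry):
    """Hypothetical feature vector for one chart entry.

    Reads attributes exposed by billboard.py's ChartEntry objects:
    rank, lastPos (0 if the song was not on last week's chart),
    peakPos and weeks (number of weeks on the chart).
    """
    return [entry.rank, entry.lastPos, entry.peakPos, entry.weeks]


def fetch_chart_features(date=None):
    """Download one Hot 100 chart with billboard.py and return (title, features) pairs.

    date=None fetches the most recent chart; otherwise pass 'YYYY-MM-DD'.
    """
    import billboard  # imported lazily so entry_features works without the package

    chart = billboard.ChartData("hot-100", date=date)
    return [(entry.title, entry_features(entry)) for entry in chart.entries]


# Offline check with a stand-in object that mimics a ChartEntry
example_entry = SimpleNamespace(title="Song", rank=3, lastPos=5, peakPos=2, weeks=12)
```

`entry_features(example_entry)` returns `[3, 5, 2, 12]`; calling `fetch_chart_features()` requires network access.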
The script takes command-line options, prefixed by a hyphen (-), to perform one or more of the functions in the table below. Several options can be combined after a single hyphen (see the examples), and their order does not matter.
Option | Function | Description
--- | --- | ---
-v | verbose | output raw predictions after each training iteration
-t | train | train the neural network and write its parameters to a file
-p | predict | predict the current chart's positions next week, and predict this week's chart (to test accuracy)
-d | get data | download up-to-date Billboard chart history data
-help | help | display this help menu
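The script's own argument handling is not shown here; as a hedged sketch, combined single-letter options like those in the table could be parsed as follows. The `OPTIONS` mapping and `parse_options` are hypothetical names chosen for illustration.

```python
# Hypothetical mapping from option letters to actions, mirroring the table above
OPTIONS = {"d": "get data", "t": "train", "p": "predict", "v": "verbose"}


def parse_options(argv):
    """Return the set of requested actions from arguments like ['-dtpv'] or ['-d', '-t'].

    Letters may be combined after one hyphen and given in any order;
    '-help' is treated as a whole word rather than as separate letters.
    """
    actions = set()
    for arg in argv:
        if not arg.startswith("-"):
            raise ValueError(f"unexpected argument: {arg}")
        if arg == "-help":
            actions.add("help")
            continue
        for letter in arg[1:]:
            if letter not in OPTIONS:
                raise ValueError(f"unknown option: -{letter}")
            actions.add(OPTIONS[letter])
    return actions
```

Returning a set makes the order of the letters irrelevant, so `-dtpv` and `-vpdt` request the same actions. In a real script this would be called as `parse_options(sys.argv[1:])`.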
The following example downloads and saves an up-to-date dataset, trains the network, and outputs predictions in verbose mode (recommended after installation):
python billboard_predict.py -dtpv
To display the help menu shown above:
python billboard_predict.py -help
Possible future improvements:
- Create larger feature sets for more accurate predictions, including a ranking of an artist's popularity/chartability
- Implement more advanced neural network architectures
Built with:
- billboard.py - API for collecting chart data
- NumPy - library for fast array and matrix computations
The feature set is fed forward through 3 layers (input, a single hidden layer, and output), whose nodes use a sigmoid activation function, to compute a prediction h. The cost is then computed and backpropagated to obtain the gradient for each parameter matrix Theta. Gradient descent updates the parameters using these gradients, and the process repeats until the cost roughly converges.
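A minimal NumPy sketch of the training loop described above, assuming a cross-entropy (logistic) cost, one-hot targets, a bias unit in each layer, and a fixed learning rate `alpha` and iteration count in place of a convergence test. The function names, initialisation range and hyperparameters are illustrative assumptions, not taken from billboard_predict.py.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(a):
    """Prepend a column of ones (bias units) to an (m, n) activation matrix."""
    return np.hstack([np.ones((a.shape[0], 1)), a])


def cost_and_gradients(X, Y, Theta1, Theta2):
    """Cross-entropy cost and gradients for a 3-layer network.

    X: (m, n_in) features, Y: (m, n_out) one-hot targets,
    Theta1: (n_hidden, n_in + 1), Theta2: (n_out, n_hidden + 1).
    """
    m = X.shape[0]
    # Feedforward
    a1 = add_bias(X)
    a2 = add_bias(sigmoid(a1 @ Theta1.T))
    h = sigmoid(a2 @ Theta2.T)  # prediction, shape (m, n_out)
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Backpropagation
    d3 = h - Y  # output-layer error
    s = a2[:, 1:]  # hidden activations without the bias column
    d2 = (d3 @ Theta2)[:, 1:] * s * (1 - s)  # hidden-layer error (sigmoid derivative)
    grad1 = d2.T @ a1 / m
    grad2 = d3.T @ a2 / m
    return cost, grad1, grad2


def train(X, Y, n_hidden=10, alpha=1.0, iters=2000, seed=0):
    """Plain batch gradient descent; returns both Theta matrices and the cost history."""
    rng = np.random.default_rng(seed)
    Theta1 = rng.uniform(-0.12, 0.12, (n_hidden, X.shape[1] + 1))
    Theta2 = rng.uniform(-0.12, 0.12, (Y.shape[1], n_hidden + 1))
    costs = []
    for _ in range(iters):
        cost, g1, g2 = cost_and_gradients(X, Y, Theta1, Theta2)
        Theta1 -= alpha * g1
        Theta2 -= alpha * g2
        costs.append(cost)
    return Theta1, Theta2, costs


def predict(X, Theta1, Theta2):
    """Index of the most activated output node for each row of X."""
    a2 = add_bias(sigmoid(add_bias(X) @ Theta1.T))
    return np.argmax(sigmoid(a2 @ Theta2.T), axis=1)
```

Watching the recorded `costs` flatten out is one simple way to judge when convergence is "roughly achieved".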
The network achieved ~64% accuracy when classifying Hot 100 positions into 4 bins. The bins (1-10, 10-35, 35-60, 60-100) are non-uniform and narrowest at the top of the chart because, in practice, a change of one position matters more the higher a single climbs. For example, moving from 99th to 98th is far less significant than moving from 2nd to 1st.
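The bin-based accuracy can be illustrated with a short sketch. The listed bins share their boundary positions (10, 35, 60), so this assumes each boundary belongs to the bin nearer the top of the chart, giving 1-10, 11-35, 36-60 and 61-100; the names `rank_to_bin` and `bin_accuracy` are hypothetical, and whether the network outputs positions or bins directly is not stated here.

```python
import numpy as np

# Upper edges of the four bins; assumes boundary positions 10, 35 and 60
# belong to the bin nearer the top of the chart: 1-10, 11-35, 36-60, 61-100
BIN_UPPER_EDGES = np.array([10, 35, 60, 100])


def rank_to_bin(ranks):
    """Map Hot 100 positions (1-100) to bin indices 0-3."""
    ranks = np.asarray(ranks)
    if np.any((ranks < 1) | (ranks > 100)):
        raise ValueError("chart positions must be between 1 and 100")
    # First edge >= rank gives the bin index
    return np.searchsorted(BIN_UPPER_EDGES, ranks, side="left")


def bin_accuracy(predicted_ranks, actual_ranks):
    """Fraction of songs whose predicted and actual positions fall in the same bin."""
    return float(np.mean(rank_to_bin(predicted_ranks) == rank_to_bin(actual_ranks)))
```

Under this scheme, predicting 2nd for a song that finishes 9th counts as correct, while predicting 40th for one that finishes 70th does not.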