In [1]:
from pulp import *
import pandas as pd

# The Spotify Dataset

The goal of this project is to build an optimal subplaylist from a bigger playlist: one that maximizes metrics related to how good the tracks are, subject to a maximum duration and possibly other constraints.

We'll use the Spotify playlist [For Those About to Rock](https://open.spotify.com/playlist/5TUxgTIxzLbLVh7RUf9V8i?si=9c057405f51747b8), created by [Rafaella Ballerini](https://github.com/rafaballerini), which contains more than a thousand rock songs.

Data was collected through the Spotify API, which provides the following metrics for each track (descriptions from the [API documentation](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features)):

- Track Popularity:
    
    The value will be between 0 and 100, with 100 being the most popular.

- Artist Popularity:

    The artist's popularity is calculated from the popularity of all the artist's tracks, and also ranges from 0 to 100.

- Duration:

    The track length (in minutes).

- Danceability:

    Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0 is least danceable and 1 is most danceable.

- Energy:

    A measure from 0 to 1 that represents the perceived intensity and activity of a track. Typically, energetic tracks feel fast, loud, and noisy.

- Acousticness:

    A confidence measure from 0 to 1 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

- Instrumentalness:

    Predicts whether a track contains no vocals. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1, the greater the likelihood that the track contains no vocal content.

- Liveness:

    Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.

- Valence:

    A measure from 0 to 1 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

In [6]:
df = pd.read_csv("../data/for_those_about_to_rock.csv")
df.head()

| Unnamed: 0 | track_name | artist | track_pop | artist_pop | duration | danceability | energy | acousticness | instrumentalness | liveness | valence | link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sweet Child O' Mine | Guns N' Roses | 84 | 82 | 5.908667 | 0.454 | 0.91 | 0.0866 | 0.0996 | 0.116 | 0.629 | open.spotify.com/track/7o2CTH4ctstm8TNelqjb51 |
| 1 | Welcome To The Jungle | Guns N' Roses | 80 | 82 | 4.533767 | 0.447 | 0.954 | 0.0235 | 0.403 | 0.298 | 0.331 | open.spotify.com/track/0bVtevEgtDIeRjCJbK3Lmv |
| 2 | Paradise City | Guns N' Roses | 81 | 82 | 6.760667 | 0.273 | 0.952 | 0.0169 | 0.0111 | 0.142 | 0.472 | open.spotify.com/track/3YBZIN3rekqsKxbJc9FZko |
| 3 | November Rain | Guns N' Roses | 0 | 82 | 8.958433 | 0.197 | 0.629 | 0.0165 | 0.279 | 0.125 | 0.221 | open.spotify.com/track/53968oKecrFxkErocab2Al |
| 4 | Knockin' On Heaven's Door | Guns N' Roses | 0 | 82 | 5.6 | 0.486 | 0.747 | 0.0203 | 0.00607 | 0.0992 | 0.368 | open.spotify.com/track/7gXdAqJLCa5aYUeLVxosOz |


# Problem Definition

The goal is to select an optimal subplaylist out of hundreds of tracks in Rafaella's playlist. A natural constraint is a limited duration since the whole playlist has over 24 hours of rock.

Another natural question is: "What is an optimal playlist?"

We can use any of the audio features described above as the maximization metric, but here we'll define "optimal" as "most popular", using track and artist popularity.

As additional restrictions, we can force the solver to include tracks by selected artists to match the user's tastes. Here we're going to require at least three songs each from Guns N' Roses and AC/DC.
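
To make these ingredients concrete, here is a minimal sketch that enumerates every subset of the five tracks shown in the preview above and keeps the best feasible one. It is an illustration only, not the solver formulation used in this project: exhaustive enumeration only works for a handful of tracks. Several things here are assumptions: the score is taken as the plain sum of `track_pop` and `artist_pop`, the 18-minute cap is arbitrary, and only Guns N' Roses is constrained because the preview contains no AC/DC rows.

```python
from itertools import combinations

# Five preview rows: (track_name, artist, track_pop, artist_pop, duration in minutes)
tracks = [
    ("Sweet Child O' Mine", "Guns N' Roses", 84, 82, 5.908667),
    ("Welcome To The Jungle", "Guns N' Roses", 80, 82, 4.533767),
    ("Paradise City", "Guns N' Roses", 81, 82, 6.760667),
    ("November Rain", "Guns N' Roses", 0, 82, 8.958433),
    ("Knockin' On Heaven's Door", "Guns N' Roses", 0, 82, 5.6),
]

MAX_MINUTES = 18.0                     # assumed duration cap, for illustration
MIN_PER_ARTIST = {"Guns N' Roses": 3}  # AC/DC left out: no AC/DC rows in the preview


def score(selection):
    """Assumed objective: track popularity plus artist popularity, summed."""
    return sum(t[2] + t[3] for t in selection)


def is_feasible(selection):
    """Check the duration cap and the minimum number of tracks per artist."""
    if sum(t[4] for t in selection) > MAX_MINUTES:
        return False
    return all(
        sum(1 for t in selection if t[1] == artist) >= minimum
        for artist, minimum in MIN_PER_ARTIST.items()
    )


# Enumerate all subsets and keep the feasible ones.
candidates = [
    combo
    for size in range(len(tracks) + 1)
    for combo in combinations(tracks, size)
    if is_feasible(combo)
]
best = max(candidates, key=score)
best_names = [t[0] for t in best]
print(best_names, score(best))
```

With over a thousand tracks the number of subsets is far too large to enumerate, which is why an integer programming solver is the practical choice for the full playlist.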

The audio features not used for maximization can be used as constraints in the following way: the selected tracks must have an average value for a given metric above (or below) a given threshold. Here we're going to add minimum energy and valence thresholds to favor energetic and happy songs.
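
An average over the selected tracks is a ratio, so it is not linear in the selection variables as written. One common way around this (an assumption here, not necessarily the formulation used later) is to rewrite "average of metric >= threshold" as "sum of (metric - threshold) over the selected tracks >= 0", which is equivalent whenever at least one track is selected. The sketch below checks that equivalence by brute force on the energy and valence values of the five preview tracks; the thresholds 0.7 and 0.4 are hypothetical.

```python
from itertools import combinations

# Energy and valence of the five tracks in the preview above
energy = [0.91, 0.954, 0.952, 0.629, 0.747]
valence = [0.629, 0.331, 0.472, 0.221, 0.368]


def average_ok(values, chosen, threshold):
    """Ratio form: the mean of the chosen values is at least the threshold."""
    return sum(values[i] for i in chosen) / len(chosen) >= threshold


def linear_ok(values, chosen, threshold):
    """Linear form: the sum of (value - threshold) over chosen tracks is non-negative."""
    return sum(values[i] - threshold for i in chosen) >= 0


def forms_agree(values, threshold):
    """Compare both forms on every non-empty subset of track indices."""
    indices = range(len(values))
    return all(
        average_ok(values, chosen, threshold) == linear_ok(values, chosen, threshold)
        for size in range(1, len(values) + 1)
        for chosen in combinations(indices, size)
    )


print(forms_agree(energy, 0.7))   # hypothetical minimum average energy
print(forms_agree(valence, 0.4))  # hypothetical minimum average valence
```

The linear form is the one a linear or integer programming solver can accept directly, since each track contributes a fixed coefficient (its metric minus the threshold) multiplied by its selection variable.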

