It's Not Rockette Science: Determining if a Classical Piece is from a Ballet

I had the idea to classify classical pieces as ballet or non-ballet pieces a few weeks ago, right before I saw Balanchine's Jewels. Jewels isn't a typical ballet -- it is composed of three disjoint acts (Emeralds, Rubies, and Diamonds) with no real story. Rather than tell a story, it is an exposition of talent and style, and each act is evocative of the style (French, American, and Russian, respectively) it represents. Because this ballet seemed so antithetical to a traditional ballet, it got me thinking: what is a ballet, anyway? How does a ballet sound? Is it just any classical music, pretty costumes and sets, and some pirouettes? In my heart, I thought no, of course not!, and so I set out to prove it (or at least part of it) by creating a classifier of ballet music.

Data
- Source
- Features
EDA and Feature Engineering
- Interesting Findings
- Feature Engineering
Modeling
Closing Thoughts
Next Steps

Data

Source

I began by creating two playlists on Spotify -- one playlist had about 1,000 ballet pieces, and the other had about 1,000 non-ballet, classical pieces. Then I used Spotify's API via the Spotipy library to get the features and analysis for each piece. Later I scraped Last.fm to get genres when trying to find patterns in misclassified data, but more on that later!
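A minimal sketch of what that collection step might look like, not the notebook's actual code. It assumes a Spotipy-style client exposing `audio_features` (called in batches, with 100 IDs per request assumed here) and `audio_analysis` (one track at a time); the helper names `chunked` and `fetch_track_data` are made up for illustration.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_track_data(client: Any, track_ids: List[str]) -> Dict[str, Dict[str, Any]]:
    """Collect audio features plus section/segment analysis for each track ID.

    `client` is assumed to behave like a Spotipy client: it must provide
    `audio_features(list_of_ids)` and `audio_analysis(track_id)`.
    """
    data: Dict[str, Dict[str, Any]] = {}
    # Audio features are requested in batches (batch size of 100 is an assumption).
    for batch in chunked(track_ids, 100):
        for track_id, features in zip(batch, client.audio_features(batch)):
            if features is None:  # no features returned for this track
                continue
            data[track_id] = {"features": features}
    # Audio analysis is requested one track at a time.
    for track_id in data:
        analysis = client.audio_analysis(track_id)
        data[track_id]["sections"] = analysis.get("sections", [])
        data[track_id]["segments"] = analysis.get("segments", [])
    return data
```

With the real library, `client` would be something like `spotipy.Spotify(auth_manager=SpotifyClientCredentials())`, with the API credentials supplied through environment variables; the track IDs would come from the two playlists.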

Features

Features - Spotify gives developers a few features that are ready to use from the get-go.
- Acousticness - A continuous measure from 0 to 1; closer to 1 represents a higher confidence that the track is acoustic.
- Danceability - A continuous measure of how suitable the song is to dancing; closer to 1 represents a higher level of danceability.
- Energy - A continuous measure of intensity and activity throughout the whole song; closer to 1 represents that the track is more intense.
- Instrumentalness - A continuous measure of confidence in whether the song has vocals or not; above .5 represents instrumental music, whereas closer to 0 represents music with voices ("oohs" and "aahs" are not treated as vocals).
- Key - The key the track is in, where 0 is C, 1 is C#, 2 is D, 3 is D#, 4 is E, 5 is F, etc.
- Liveness - A continuous measure of confidence that the song is live; .8 or higher represents strong confidence that the song is live.
- Loudness - The loudness in decibels (typically ranging between -60 and 0).
- Mode - A categorical representation of the modality of the track -- 0 (minor) or 1 (major).
- Speechiness - A measure of words in a song -- .33 and below are music, between .33 and .66 are music with spoken words, like rap, and values above .66 are probably podcasts, audiobooks, or the like.
- Tempo - The pace of the song, in beats per minute.
- Time signature - The meter of the song, measured in beats per bar/measure.
- Valence - The emotion of the song -- closer to 0 is more negative, closer to 1 is more positive.
Analysis - Lists of dictionaries for the tatums, beats, segments, bars, and sections were provided, but I only ended up using the segments and sections, so I'll only discuss those below.
- Sections - Splits the song into chunks that sound distinctly different from the last section. The data I used from each section (there were additional features I did not use) were:
  - Duration
  - Loudness
  - Tempo
  - Key
  - Mode
  - Time signature
  These have the same definitions as those listed above, just specific to the section, rather than the whole song.
- Segments - A further division of sections, given as a list of dictionaries. Each segment sounds roughly the same throughout. The data I used from each segment (again, there were additional features I did not use) were:
  - Duration - Same definition as above.
  - Pitches - A column vector in which each position represents one of the 12 pitches (C, C#, D, etc.). Each value is between 0 and 1, representing how strongly the pitch was present in the segment.
  - Timbre - A vector of coefficients describing the quality of the sound; taken together as a linear combination, they represent the overall timbre of the segment.

EDA & Feature Engineering

Interesting Findings

When initially exploring the data, I really wanted danceability to matter, even though I knew Spotify's measure is probably more suited to disco and salsa danceability. Nevertheless, I plotted it, and thought it might be interesting to others. Below is the graph of the distributions of danceability for ballet pieces and non-ballet pieces. You can see that the distributions are slightly different, with the distribution of ballet pieces shifted more to the right, towards the 'more danceable' end of the spectrum, but not quite enough to emphatically declare that there is a significant difference.

Then, while looking at other distributions, I came across a feature with a difference in distribution that felt exciting -- valence! As you can see, there is a significant difference in both the shape and center of the distributions of valence for ballet and non-ballet pieces. While both are right-skewed, the tail of non-ballet songs is much skinnier, and the peak is much taller. Thus, we conclude that non-ballet songs tend to evoke more negative emotions, whereas the emotion of ballet songs is more evenly spread (even though there are still more negatively evocative ballet songs).
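One hedged way to put a number on "how different are these two distributions" is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical CDFs. This is not something the analysis above did; it is only a sketch of how the visual comparison could be backed up. The valence values below are made up for illustration and are not the project's data.

```python
from bisect import bisect_right
from typing import List, Sequence


def ecdf(sorted_values: List[float], x: float) -> float:
    """Fraction of observations in `sorted_values` that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest vertical gap between the empirical CDFs of samples `a` and `b`."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The gap can only change at observed values, so checking those is enough.
    points = a_sorted + b_sorted
    return max(abs(ecdf(a_sorted, x) - ecdf(b_sorted, x)) for x in points)


# Made-up valence values, for illustration only.
ballet_valence = [0.10, 0.25, 0.40, 0.55, 0.70]
non_ballet_valence = [0.05, 0.08, 0.12, 0.15, 0.30]

gap = ks_statistic(ballet_valence, non_ballet_valence)
print(f"KS statistic: {gap:.2f}")  # prints "KS statistic: 0.60"
```

A value near 0 means the two samples are distributed almost identically, and a value near 1 means they barely overlap. Turning the statistic into a p-value would take a library routine or a lookup table, which is left out here.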

Feature Engineering

I introduced a number of different aggregations of the data on the sections and segments, as the ways of parsing apart and boiling down the songs using this data were endless. I created the following new features:

Using section data:
- duration range
- loudness range
- key range
- tempo range
- mode range (equivalent to seeing if any section was in major)
- time signature range
Using segment data:
- duration range
- number of unique pitches (unique "strengths", you could say? since each is a measure of how strong the presence of that pitch was in the segment)
- number of unique timbre values
- mean pitch

In all honesty, I had ideas about what only some of these features could mean in the context of ballet vs. not a ballet. For some, I figured "hmm, the range of the duration could be useful, since one would expect ballets to have shorter sections as the story moves quickly and action is unfolding"; for others, I just thought "why not see if the key range is any different between the two!"

However, later I did think hard about what these features told me, and if I could do it over again, I would be more thoughtful. This is especially true if I had a lot more data, since including all of those features could really increase the time it takes to run all those models.
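The section and segment aggregations listed above can be sketched roughly as follows. The dictionary keys follow the field names in Spotify's audio analysis (`duration`, `loudness`, `tempo`, `key`, `mode`, `time_signature`, `pitches`, `timbre`). Pooling the pitch and timbre values across all segments before counting unique values and averaging is an assumption; the notebooks may aggregate differently. The function names are illustrative only.

```python
from statistics import mean
from typing import Any, Dict, List


def value_range(values: List[float]) -> float:
    """Spread between the largest and smallest value (0.0 for an empty list)."""
    return max(values) - min(values) if values else 0.0


def engineer_features(sections: List[Dict[str, Any]],
                      segments: List[Dict[str, Any]]) -> Dict[str, float]:
    """Boil one track's sections and segments down to a flat feature dict."""
    features: Dict[str, float] = {}
    # One range feature per section-level attribute.
    for key in ("duration", "loudness", "key", "tempo", "mode", "time_signature"):
        features[f"section_{key}_range"] = value_range([s[key] for s in sections])
    features["segment_duration_range"] = value_range([s["duration"] for s in segments])
    # Pool the pitch and timbre vectors of every segment (an assumption, see above).
    pitches = [p for s in segments for p in s["pitches"]]
    timbres = [t for s in segments for t in s["timbre"]]
    features["unique_pitch_values"] = len(set(pitches))
    features["unique_timbre_values"] = len(set(timbres))
    features["mean_pitch"] = mean(pitches) if pitches else 0.0
    return features


# Toy input: real pitch and timbre vectors have 12 entries each.
sections = [
    {"duration": 20.0, "loudness": -25.0, "key": 0, "tempo": 90.0, "mode": 1, "time_signature": 4},
    {"duration": 35.0, "loudness": -18.0, "key": 7, "tempo": 120.0, "mode": 0, "time_signature": 3},
]
segments = [
    {"duration": 0.5, "pitches": [1.0, 0.5], "timbre": [40.0, -10.0]},
    {"duration": 0.25, "pitches": [0.5, 0.0], "timbre": [35.0, -10.0]},
]
track_features = engineer_features(sections, segments)
```

In this toy input the mode range is 1 because one section is major and the other minor. A range of 0 would mean every section shares the same mode.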

Modeling

Choosing the Model

Logistic regression, KNN, Random Forest, SVM, and XGBoost were all considered, and performed very similarly. After parameter tuning through grid search, the top 3 contenders were Random Forest, SVM, and XGBoost. These were the top three based on accuracy, AUC-ROC, and F1 score, which I chose because I wanted the model to penalize false positives and false negatives roughly equally. In the end, I chose Random Forest because it was a good balance of accuracy and interpretability -- SVM and XGBoost only performed slightly better, while failing to be interpretable. The Random Forest ended up with 82% accuracy, with an F1 score of 82% as well, which were both only 1% lower than SVM and XGBoost's respective metrics. The confusion matrix and ROC curve are shown below.
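For reference, accuracy and F1 both come straight from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are made up for illustration and are not the project's confusion matrix, and the function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts, treating "ballet" as the positive class.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Algebraically, F1 equals 2·TP / (2·TP + FP + FN), so a false positive and a false negative lower the score by the same amount. That symmetry is why it suits a problem where neither kind of error is worse than the other.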

Interpretation of the Model's Feature Importance

When looking at the bar chart of feature importance, below, we can see that the top 5 most important features are:

Duration
Number of sections
Number of segments
Acousticness
Mean pitch

When I see this, it appears to me that the model captures the story element of ballet pieces -- the number of sections and segments increases with ballet pieces, as the story progresses, since you're constantly shifting perspective from one character to the next, watching plot twists unfold, and seeing the introduction of new characters, and these are all revealed both visually and musically. Each of these plot points sounds distinctly different, and they need to -- as an audience, we wouldn't understand what was going on if they didn't! My interpretation of the mean pitch also supports my idea of story: I would imagine that the mean of the pitches would be higher in songs with obvious stories, since a story is often told through a range of pitches. Thus, the mean would be brought up by a higher value in each position in the column vector.

What the Model Got Wrong

In an effort to improve my model, I wanted to see what it was getting wrong. Since Spotify doesn't have genres associated with its songs (only with artists, which wouldn't be helpful, since an artist like "Evergreen Symphony" may play all sorts of classical music, and I wanted just the subgenre of a particular song), I scraped Last.fm to get the user-generated tags for each piece that was misclassified. Then, I plotted the frequency of each tag.

Above, we can see that the top 10 genres (ignoring tags like "composer" that are not indicative of the subgenre) are

Romantic
Russian
Contemporary
Baroque
20th Century
German
Italian
Instrumental
Opera
Avant-Garde

I would've expected the classifier to correctly classify more unique-sounding music, such as Baroque, since Baroque music is really not 'balletic' (which I know from my own domain knowledge). So, I looked at the breakdown of misclassified data by ballet/non-ballet:

However, even still, this wasn't particularly illuminating! It still seems that it slipped up on somewhat obvious songs! I expected orchestral pieces to be the number one genre misclassified as a ballet. I'm thinking that maybe it could be more illuminating if I maintained the groupings (since songs have multiple tags, and maybe they say more together), or if I found another source? An exploration for another day!
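A rough sketch of both tallies: per-tag frequencies split by true label, and the "maintain the groupings" idea of counting whole tag combinations instead of individual tags. The input format (a true label plus a list of Last.fm tags per misclassified piece), the ignored-tag set, and the function names are all assumptions made for illustration, and the example pieces are made up.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Piece = Tuple[str, List[str]]  # (true label, tags scraped for the piece)

# Tags that say nothing about subgenre (a hypothetical list).
IGNORED_TAGS: FrozenSet[str] = frozenset({"composer", "classical"})


def tag_frequencies(pieces: List[Piece],
                    ignored: FrozenSet[str] = IGNORED_TAGS) -> Dict[str, Counter]:
    """Count each tag once per piece, separately for each true label."""
    counts: Dict[str, Counter] = {}
    for label, tags in pieces:
        kept = {tag.lower() for tag in tags} - ignored
        counts.setdefault(label, Counter()).update(kept)
    return counts


def tag_group_frequencies(pieces: List[Piece],
                          ignored: FrozenSet[str] = IGNORED_TAGS) -> Counter:
    """Count whole tag combinations, so tags that travel together stay together."""
    groups: Counter = Counter()
    for _label, tags in pieces:
        kept = {tag.lower() for tag in tags} - ignored
        if kept:
            groups[tuple(sorted(kept))] += 1
    return groups


# Made-up misclassified pieces, for illustration only.
misclassified = [
    ("ballet", ["Romantic", "Russian", "composer"]),
    ("non-ballet", ["Baroque", "German"]),
    ("non-ballet", ["Romantic", "Russian"]),
]
by_label = tag_frequencies(misclassified)
by_group = tag_group_frequencies(misclassified)
```

Here `by_group` would show "romantic" and "russian" appearing together twice, which the per-tag counts alone cannot reveal.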

Making Predictions with the Model

The fun part! I tested a few songs:

A ballet piece (Sleeping Beauty Intro) - classified as a ballet
A classical piece (River Free) - classified as a classical piece
A 'ballet sounding' classical piece (Elgar: Variations on an Original Theme, Op. 36, Enigma IX) - classified as a ballet

Then, for fun, just to see what would happen:

A Beatles song - "Across the Universe"; classified as a ballet
A Rolling Stones song - "Wild Horses"; classified as not a ballet
An Allman Brothers song (rock) - "Whipping Post"; classified as not a ballet

Closing Thoughts

Though my classifier was not completely accurate, I'm not too upset! Through playing around with making predictions, I came to discover that it would misclassify songs that I, myself, misclassified. Is that really a misclassification? If you're looking to always accurately classify whether a song has been used in a ballet, then yes. However, if you're looking to accurately classify songs that are ballet songs or could be ballet songs, then no! Personally, I'm more interested in the latter. I feel as though songs (like the third tested in the Making Predictions section) that sound balletic should be classified as such, and in the end, I'm happy that my classifier was able to "hear" those distinctions.

Next Steps

Examine the distribution of tags across all pieces and incorporate groups of tags (since each song has multiple tags) to hopefully make the charts of what the model got wrong more meaningful
Explore more deeply the distributions of the features that were the most important and try to get a better grasp on why they matter so much and what they might tell us about the characteristics of ballet pieces
- I feel the worst about not spending more time on this! With only a week, I got so focused on creating a strong classifier that I lost sight of my focus -- I definitely will remember this trap in my next project.
Inspect the features that seem most important to PCA and feature regularization (for logistic regression)
- I didn't get to spend nearly as much time as I would've liked on this, which I regret for the same reasons as above.

h-parker/ballet-or-not
