(related repo: MetadataPrediction)
This repo contains the code for my term paper in the module Cultural Analytics of the MSc Digital Humanities at Leipzig University. I explore training classifiers on album covers, or alternatively on text descriptions of the covers generated with BLIP, to classify genres and subgenres in music. For my analysis I use album covers crawled from MusicBrainz along with their metadata on artists, releases, genres and subgenres. The dataset is currently being crawled and will feature over 1 million album covers for over 200 genres with more than 900 subgenres.
In the related repo (linked above) I pursue a similar project on the musical metadata of the album cover dataset. The results of both projects are meant to be comparable, giving insight into the same research question from two different perspectives.
Genre-Defining Features in Album Cover Art: Investigating Common Visual Motifs Across Musical Subgenres Using BLIP-2 Captions and Machine Learning Classifiers
In my research paper, I aim to explore the classification of musical subgenres through their album covers using machine learning algorithms. Music genres typically encompass various subgenres, each possessing unique features alongside subtler ones that tie it to its overarching genre. These connecting features are often nuanced and hard to pinpoint. My study will investigate whether machine learning algorithms can detect statistical patterns in album cover designs, both within individual subgenres and across their broader genre categories. A key method of analysis will be examining the confusion matrix of the classification results. I will argue that a high number of true positives for a subgenre may indicate a statistical pattern shared by the covers of that subgenre. More importantly, the rate of false positives, especially between subgenres of the same genre, could reveal genre-spanning features. For example, I anticipate a higher rate of false positives between subgenres of Metal than between a Metal subgenre and a Hip Hop subgenre. This pattern, if observed, could suggest the presence of distinct, genre-specific characteristics in album cover designs.
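The confusion-matrix argument can be illustrated with a minimal sketch. It assumes a matrix where `confusion[i][j]` counts covers of true subgenre `i` predicted as subgenre `j`, plus a subgenre-to-genre mapping. The function name, the subgenre labels and all numbers are made up for illustration and are not results from this project.

```python
def confusion_by_genre(confusion, subgenres, parent):
    """Split misclassifications into within-genre and cross-genre shares.

    confusion[i][j] = number of covers with true subgenre i predicted as j.
    parent maps each subgenre name to its parent genre.
    """
    within = 0
    cross = 0
    for i, true_sub in enumerate(subgenres):
        for j, pred_sub in enumerate(subgenres):
            if i == j:
                continue  # diagonal cells are correct predictions
            if parent[true_sub] == parent[pred_sub]:
                within += confusion[i][j]
            else:
                cross += confusion[i][j]
    total = within + cross
    if total == 0:
        return 0.0, 0.0
    return within / total, cross / total


# Toy data (hypothetical labels and counts)
subgenres = ["death metal", "black metal", "trap", "boom bap"]
parent = {
    "death metal": "metal",
    "black metal": "metal",
    "trap": "hip hop",
    "boom bap": "hip hop",
}
confusion = [
    [50, 30, 5, 5],
    [25, 55, 4, 6],
    [3, 2, 60, 20],
    [4, 6, 15, 65],
]
within_share, cross_share = confusion_by_genre(confusion, subgenres, parent)
print(f"within-genre: {within_share:.2f}, cross-genre: {cross_share:.2f}")
```

The raw shares should be read against a chance baseline: in this toy setup only 4 of the 12 off-diagonal cells are within-genre, so a within-genre share well above one third would be the interesting signal.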
- album covers are extremely diverse and artistic; lots of noise in the data is to be expected
- data is not evenly distributed across genres and subgenres and careful sampling is needed; maybe sacrifice diversity in favor of consistency and only use the 10 most common genres with their 10 most common subgenres each?
- rate of false positives is not necessarily an indicator of features connecting subgenres to a genre; other factors such as release date or geographical origin could be distributed unevenly between genres, a bias that can't be prevented even through careful sampling
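The idea of restricting the data to the most common genres and their most common subgenres could be sketched as follows. This is only an illustration: `select_top` and the `(genre, subgenre)` record format are assumptions, not the structure of the actual crawl.

```python
from collections import Counter, defaultdict


def select_top(records, n_genres=10, n_subgenres=10):
    """Pick the most frequent genres and, per genre, its most frequent subgenres.

    records: iterable of (genre, subgenre) pairs, one pair per cover.
    Returns a dict mapping each selected genre to its selected subgenres.
    """
    genre_counts = Counter()
    sub_counts = defaultdict(Counter)
    for genre, subgenre in records:
        genre_counts[genre] += 1
        sub_counts[genre][subgenre] += 1

    selection = {}
    for genre, _ in genre_counts.most_common(n_genres):
        top_subs = sub_counts[genre].most_common(n_subgenres)
        selection[genre] = [sub for sub, _ in top_subs]
    return selection


# Toy records (hypothetical)
records = (
    [("metal", "death metal")] * 3
    + [("metal", "black metal")] * 2
    + [("metal", "doom metal")]
    + [("hip hop", "trap")] * 4
    + [("jazz", "bebop")]
)
selection = select_top(records, n_genres=2, n_subgenres=2)
print(selection)
```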
- get list of all genres and subgenres from MusicBrainz
- extract information of artists, their releases and their genres
- get the ids of all releases listed on MusicBrainz
- use these ids to download all front covers
- in 500x500 (crawling currently in progress; ETA end of January)
- in 1200x1200
- map all genres to their respective subgenres
- check resolution of all scraped covers and decide on most useful resolution
- plot distribution of available covers among all (sub)genres
- decide on included subgenres based on that distribution (most smaller subgenres might not have enough data)
- sample (balanced) dataset with at least 20,000 (?!) covers for each genre and 2,000 (?) for each subgenre
- make sure all selected items also have metadata for the other project (using data of the same artists and albums results in optimal comparability)
- study best practices for preprocessing visual data
- train classifiers
- evaluate
- repeat
- ???
- profit
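The balanced sampling step from the list above might look roughly like this. It is a sketch under assumptions: `balanced_sample` and the `(cover_id, subgenre)` item format are hypothetical, and subgenres with too few covers are simply dropped, which is only one of several possible policies.

```python
import random
from collections import defaultdict


def balanced_sample(items, per_class, seed=0):
    """Draw the same number of covers for every subgenre.

    items: list of (cover_id, subgenre) pairs.
    Subgenres with fewer than per_class covers are left out entirely.
    A fixed seed keeps the sample reproducible.
    """
    by_class = defaultdict(list)
    for cover_id, subgenre in items:
        by_class[subgenre].append(cover_id)

    rng = random.Random(seed)
    sample = {}
    for subgenre in sorted(by_class):  # sorted for a deterministic order
        ids = by_class[subgenre]
        if len(ids) >= per_class:
            sample[subgenre] = rng.sample(ids, per_class)
    return sample


# Toy items (hypothetical ids)
items = (
    [(f"a{i}", "death metal") for i in range(5)]
    + [(f"b{i}", "trap") for i in range(3)]
    + [("c0", "bebop")]
)
sample = balanced_sample(items, per_class=3)
print({subgenre: len(ids) for subgenre, ids in sample.items()})
```

Saving the sampled ids would also make it easier to reuse exactly the same releases in the related metadata project.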
- perform a clustering analysis such as k-means (some subgenres might fit better under a different parent genre)
- visualise distance between subgenres or clustering of classes
- possible approaches:
- multidimensional scaling
- t-distributed stochastic neighbor embedding
- network graphs using Three.js (or some other fancy interactive 3D visualization framework)
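Approaches like multidimensional scaling need pairwise distances between subgenres as input. One possible way to get them, sketched here as an assumption rather than a settled method, is to treat frequent mutual confusion as similarity and convert a confusion matrix into a symmetric dissimilarity matrix. The function name and the toy numbers are hypothetical.

```python
def confusion_to_distance(confusion):
    """Turn a confusion matrix into a symmetric dissimilarity matrix.

    confusion[i][j] = number of items of true class i predicted as class j.
    Rows are normalised to rates first, so class size does not dominate.
    Distance between i and j is 1 minus their mean mutual confusion rate.
    """
    n = len(confusion)
    rates = []
    for row in confusion:
        total = sum(row)
        rates.append([count / total if total else 0.0 for count in row])

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            similarity = (rates[i][j] + rates[j][i]) / 2
            dist[i][j] = dist[j][i] = 1.0 - similarity
    return dist


# Toy matrix (made-up counts): classes 0 and 1 are often confused, class 2 never
toy_confusion = [
    [8, 2, 0],
    [4, 6, 0],
    [0, 0, 10],
]
distances = confusion_to_distance(toy_confusion)
for row in distances:
    print([round(value, 2) for value in row])
```

A matrix like this could then be handed to an MDS implementation that accepts precomputed dissimilarities, or used as edge weights in a network graph.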