Dataset
In 2002, Tzanetakis introduced what has become a well-known music dataset: the GTZAN dataset. It consists of 10 groups of 100 song extracts of 30 seconds each, for a total of 1000 extracts. Each group represents one musical genre: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock. We obtained the dataset from Kaggle, where it comes pre-organised into one folder per genre. The dataset is attractive because its classes are already balanced (100 extracts per genre). However, it is quite small, some tracks are mislabelled, and some extracts come from the same song.
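As a quick sanity check on the class balance, the folder layout can be inspected with a few lines of Python. This is only a minimal sketch: the function names, the list of audio extensions, and the `genres_original` path in the usage note are assumptions made for illustration, not part of the original project.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to match the files actually on disk.
AUDIO_EXTENSIONS = {".wav", ".au", ".mp3"}


def count_clips_per_genre(root):
    """Return {genre_folder_name: number_of_audio_files} for a dataset root.

    Assumes one sub-folder per genre, each containing the audio extracts.
    """
    counts = {}
    for genre_dir in sorted(Path(root).iterdir()):
        if not genre_dir.is_dir():
            continue  # skip stray files such as CSVs or READMEs
        counts[genre_dir.name] = sum(
            1
            for f in genre_dir.iterdir()
            if f.is_file() and f.suffix.lower() in AUDIO_EXTENSIONS
        )
    return counts


def is_balanced(counts):
    """True when every genre folder holds the same number of extracts."""
    return len(set(counts.values())) <= 1
```

For example, `count_clips_per_genre("genres_original")` (a hypothetical path) would be expected to report 100 files for each of the 10 genre folders if the download is complete, and `is_balanced` would then return `True`.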
To address these limitations, we decided to extend the dataset with new songs and new genres using youtube-dl. To do so, we chose well-known, labelled playlists on YouTube, both to add songs to existing genres and to add new genres.
Here are examples of four songs from four different genres.
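The download step could look roughly like the sketch below, which uses the Python API of youtube-dl to save the audio of a playlist into a per-genre folder. The helper names, the `dataset` output folder, and the choice of WAV output are assumptions for illustration; the original project may have used different options or the command-line tool instead. Audio extraction also requires ffmpeg to be installed.

```python
def build_audio_options(genre, output_root="dataset"):
    """Build youtube-dl options that store extracted audio under output_root/genre.

    The folder layout mirrors the one-folder-per-genre structure of GTZAN.
    """
    return {
        "format": "bestaudio/best",  # prefer an audio-only stream
        "outtmpl": f"{output_root}/{genre}/%(title)s.%(ext)s",
        "ignoreerrors": True,  # skip unavailable videos instead of aborting
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",  # needs ffmpeg on the PATH
                "preferredcodec": "wav",
            }
        ],
    }


def download_playlist(playlist_url, genre, output_root="dataset"):
    """Download every track of a playlist as audio into the genre folder."""
    import youtube_dl  # third-party package: pip install youtube_dl

    with youtube_dl.YoutubeDL(build_audio_options(genre, output_root)) as ydl:
        ydl.download([playlist_url])
```

Calling `download_playlist(url, "jazz")` with the URL of a labelled playlist would then fill `dataset/jazz/` with one audio file per track. The downloaded tracks are full songs, so they would still need to be cut into 30-second extracts to match the GTZAN format.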

Lilia Ben Baccar, Erwan Rahis, ENSAE Paris (https://github.com/erwanrh/ML_Python-Music_Classification)