Author: Matyáš Sládek <br>
Year: 2020 <br>

This file contains cells that create CSV files with track genre information for the datasets "Extended Ballroom Dataset", "FMA: A Dataset For Music Analysis" and "GTZAN". <br>
When a different dataset is used, such a file must be created for the project to work correctly; for each track it contains two columns in the format [track_id] [track_genre]. <br>
The file must be saved in CSV format in the folder ../metadata/track_genre_lists/ under the name {dataset_name}_track_genre_list.csv. <br>
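
For a dataset that is not covered here, a file of the required shape could be produced roughly as follows. This is a minimal sketch using only the standard library, not part of the original project: the helper `write_track_genre_list`, the dataset name `MyDataset` and the sample tracks are hypothetical, and the `track_id` and `genre` column names and the `_track_genre_list.csv` suffix are taken from the cells below.

```python
import csv
import os
import tempfile


def write_track_genre_list(track_genres, dataset_name, out_dir):
    """Write {dataset_name}_track_genre_list.csv with one (track_id, genre) row per track.

    Hypothetical helper for illustration; track_genres maps track id (str) to genre (str).
    """
    os.makedirs(out_dir, exist_ok=True)   # Create the output folder if it is missing
    out_path = os.path.join(out_dir, '{}_track_genre_list.csv'.format(dataset_name))
    with open(out_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['track_id', 'genre'])   # Header row, as in the GTZAN and EBD cells
        for track_id in sorted(track_genres):   # Sort by track id, as the cells below do
            writer.writerow([track_id, track_genres[track_id]])
    return out_path


# Made-up tracks, written to a temporary folder instead of ../metadata/track_genre_lists/
example_dir = os.path.join(tempfile.mkdtemp(), 'track_genre_lists')
example_path = write_track_genre_list({'track_b': 'rock', 'track_a': 'jazz'}, 'MyDataset', example_dir)
```

In a real run, `out_dir` would presumably be `../metadata/track_genre_lists` and the dictionary would be filled from the dataset's own metadata or file layout.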

In [1]:
import os
import sys
import pandas as pd

This cell creates the CSV for the FMA dataset by extracting information from the CSV file supplied with it.

In [2]:
df = pd.read_csv("../data/FMA/fma_metadata/tracks.csv", index_col=0, header=[0, 1])   # Load the CSV containing track info
subset = df.index[df['set', 'subset'] == 'small']   # Extract track ids for the smallest dataset
df = df.loc[subset]   # Remove information about tracks from other versions of the dataset
df = df['track']   # Keep only the columns describing the tracks themselves (the file also contains other information)
df = df[['genre_top']]   # From the info about tracks, leave only the info about its main genre
df = df.rename(columns={'genre_top':'genre'})   # Rename the genre column to be consistent with other used datasets
df.index = df.index.map(str)   # Transform the index from integer to string format
df.index = df.index.str.pad(6, side='left', fillchar='0')   # Left-pad the ids with zeroes to six characters, so that all indices have the same length
df = df.sort_index()   # Sort the dataset by indices for more efficient searching

# Create the folder for storing track_genre_lists (missing parent folders included)
try:
    os.makedirs('../metadata/track_genre_lists', exist_ok=True)
except Exception as e:
    print('{}: {}'.format('os.makedirs(../metadata/track_genre_lists)', repr(e)), file=sys.stderr)

df.to_csv('../metadata/track_genre_lists/FMA_track_genre_list.csv', header=True)   # Save the dataframe to CSV file
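
The effect of converting the index to zero-padded strings can be checked on a few ids. The following is a minimal sketch in plain Python, assuming made-up ids; `str.rjust` is used here as a stand-in for the pandas `str.pad(6, side='left', fillchar='0')` call in the cell above.

```python
track_ids = [140, 2, 99999]   # Made-up integer track ids

# str.rjust(6, '0') left-pads with zeroes, like str.pad(6, side='left', fillchar='0') in pandas
padded_ids = [str(track_id).rjust(6, '0') for track_id in track_ids]

# With equal-length ids, sorting the strings gives the same order as sorting the integers
sorted_padded = sorted(padded_ids)

# Without padding, string sorting compares character by character and breaks numeric order
sorted_unpadded = sorted(str(track_id) for track_id in track_ids)

print(sorted_padded)     # ['000002', '000140', '099999']
print(sorted_unpadded)   # ['140', '2', '99999']
```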

This cell creates the CSV for the GTZAN dataset by extracting information from the names of its tracks.

In [3]:
data = {}   # Stores each track's genre for later dataframe creation

for _, _, files in os.walk("../data/GTZAN"):   # Go through each file in the dataset
    for file in files:   # For each of the files
        if file.lower().endswith('.au'):   # If the current file is an AU (.au) audio file
            entry = os.path.splitext(file)[0]   # Strip the file extension, leaving only the file name itself
            data[entry] = entry.split('.')[0]   # Map the track id to the genre taken from the start of the file name
            
df = pd.DataFrame.from_dict(data, orient='index', columns=['genre'])   # Create the dataframe from the data dictionary
df.index.name = "track_id"   # Rename the index column to be consistent with other used datasets
df = df.sort_index()   # Sort the dataset by indices for more efficient searching

# Create the folder for storing track_genre_lists (missing parent folders included)
try:
    os.makedirs('../metadata/track_genre_lists', exist_ok=True)
except Exception as e:
    print('{}: {}'.format('os.makedirs(../metadata/track_genre_lists)', repr(e)), file=sys.stderr)

df.to_csv('../metadata/track_genre_lists/GTZAN_track_genre_list.csv', header=True)   # Save the dataframe to CSV file
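
The file name parsing used in the cell above can be tried on a single name. This is a small sketch, assuming file names of the form genre.number.au; the helper `parse_gtzan_file_name` and the example file name are hypothetical and only illustrate the two string operations.

```python
import os


def parse_gtzan_file_name(file_name):
    """Return (track_id, genre) for a file name of the assumed form genre.number.au."""
    track_id = os.path.splitext(file_name)[0]   # Drop the extension
    genre = track_id.split('.')[0]   # Text before the first dot
    return track_id, genre


example = parse_gtzan_file_name('blues.00000.au')   # Assumed example file name
print(example)   # ('blues.00000', 'blues')
```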

This cell creates the CSV for the EBD dataset by extracting information from the names of the tracks and of the folders in which they are stored.

In [4]:
data = {}   # Stores each track's genre for later dataframe creation

for dir_path, _, files in os.walk('../data/EBD'):   # Go through each file in the dataset
    for file in files:   # For each of the files
        if file.lower().endswith('.mp3'):   # If the current file is in the MP3 format
            data[os.path.splitext(file)[0]] = os.path.basename(dir_path)   # Map the track id (file name without extension) to the genre (name of the containing folder)
                    
df = pd.DataFrame.from_dict(data, orient='index', columns=['genre'])   # Create the dataframe from the data dictionary
df.index.name = "track_id"   # Rename the index column to be consistent with other used datasets
df = df.sort_index()   # Sort the dataset by indices for more efficient searching

# Create the folder for storing track_genre_lists (missing parent folders included)
try:
    os.makedirs('../metadata/track_genre_lists', exist_ok=True)
except Exception as e:
    print('{}: {}'.format('os.makedirs(../metadata/track_genre_lists)', repr(e)), file=sys.stderr)

df.to_csv('../metadata/track_genre_lists/EBD_track_genre_list.csv', header=True)   # Save the dataframe to CSV file
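
The folder-based labelling used in the cell above can be tried on a throwaway directory tree. The sketch below assumes one folder per genre, as the cell does; the genre folder names and file names are made up, and a temporary folder stands in for `../data/EBD`.

```python
import os
import tempfile

# Build a tiny made-up dataset: one folder per genre, plus one non-MP3 file
root = tempfile.mkdtemp()
for genre, name in [('Waltz', 'track_one.mp3'), ('Tango', 'track_two.mp3'), ('Tango', 'notes.txt')]:
    os.makedirs(os.path.join(root, genre), exist_ok=True)
    with open(os.path.join(root, genre, name), 'w'):
        pass   # Create an empty placeholder file

data = {}
for dir_path, _, files in os.walk(root):   # Visit every folder under the root
    for file in files:
        if file.lower().endswith('.mp3'):   # Skip anything that is not an MP3 file
            # Track id is the file name without extension, genre is the containing folder's name
            data[os.path.splitext(file)[0]] = os.path.basename(dir_path)

print(sorted(data.items()))   # [('track_one', 'Waltz'), ('track_two', 'Tango')]
```

Because the dictionary is keyed by file name alone, two files with the same name in different genre folders would overwrite each other, so this approach relies on file names being unique across the dataset.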