# 03-extract_pitch_class_distribution

This notebook extracts a pitch class distribution from the predominant melody of each audio recording using [tomato v0.14.0](https://github.com/sertansenturk/tomato/tree/v0.14.0).


In [1]:
import mre


## Stop if the pitch class distribution was already extracted

Extracting the feature may take a long time, so first check whether a run already exists.

In [2]:
run = mre.mlflow_common.get_run_by_name(
    mre.data.PitchClassDistribution.EXPERIMENT_NAME,
    mre.data.PitchClassDistribution.RUN_NAME)

if run is not None:
    raise ValueError(
        "There is already a run for %s:%s. Overwriting is not "
        "permitted. Please delete the run manually if you want "
        "to extract the pitch class distribution again."
        % (mre.data.PitchClassDistribution.RUN_NAME, run.run_id))


No runs with the name pitch_class_distribution in experiment data_processing


## Read predominant melody filepaths from mlflow


In [3]:
melody_paths = mre.data.PredominantMelodyMakam.from_mlflow()
display(melody_paths[:5])

['/data/artifacts/2/c2f9cc075e2648009cc274eb75c6807c/artifacts/006536f8-bf54-4cc0-a510-5a52456d09f8.npy',
 '/data/artifacts/2/c2f9cc075e2648009cc274eb75c6807c/artifacts/009309d2-c260-4808-8f1d-44a5ddc6bc5f.npy',
 '/data/artifacts/2/c2f9cc075e2648009cc274eb75c6807c/artifacts/00a48b5f-a35a-436c-a7a0-4438130f4abf.npy',
 '/data/artifacts/2/c2f9cc075e2648009cc274eb75c6807c/artifacts/00ab81ec-07f8-47ba-9610-47ad56393eb9.npy',
 '/data/artifacts/2/c2f9cc075e2648009cc274eb75c6807c/artifacts/00be9a4b-b85f-4601-a6d5-9ce1f5b3f91c.npy']

## Read tonic annotations from mlflow

In [4]:
annotations = mre.data.Annotation.from_mlflow()
tonic_frequencies = annotations.data.set_index("mbid")["tonic"]
tonic_frequencies.head()


mbid
00f1c6d9-c8ee-45e3-a06f-0882ebcb4e2f    256.0
168f7c75-84fb-4316-99d7-acabadd3b2e6    115.2
24f549dd-3fa4-4e9b-a356-778fbbfd5cad    232.5
407bb0b4-f19b-42ab-8c0a-9f1263126951    233.5
443819eb-6092-420c-bd86-d946a0ad6555    219.6
Name: tonic, dtype: float64
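
The tonic series is indexed by `mbid`, and the melody filenames listed earlier look like `<mbid>.npy`. A minimal sketch of how the two could be paired is given below. It assumes the file stem is the recording's MBID; the helper names `mbid_from_path` and `pair_with_tonic` are hypothetical and are not part of `mre` or tomato. A plain dict stands in for the pandas Series, which supports the same membership test and lookup by index label.

```python
from pathlib import Path


def mbid_from_path(melody_path):
    """Return the file stem, assumed here to be the recording's MBID."""
    return Path(melody_path).stem


def pair_with_tonic(melody_paths, tonic_by_mbid):
    """Pair each melody file with its tonic frequency in Hz.

    Returns (pairs, missing): pairs is a list of (path, tonic_hz) tuples,
    missing lists the MBIDs that have no tonic annotation.
    """
    pairs, missing = [], []
    for path in melody_paths:
        mbid = mbid_from_path(path)
        if mbid in tonic_by_mbid:
            pairs.append((path, float(tonic_by_mbid[mbid])))
        else:
            missing.append(mbid)
    return pairs, missing


# Toy inputs for illustration only; the paths and MBIDs are made up.
example_paths = ["/tmp/melodies/rec-a.npy", "/tmp/melodies/rec-b.npy"]
example_tonics = {"rec-a": 256.0}
example_pairs, example_missing = pair_with_tonic(example_paths, example_tonics)
```

Collecting the unmatched MBIDs separately makes it easy to spot recordings that would otherwise fail during normalization.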

## Extract pitch class distributions from normalized predominant melody features and log to mlflow


In [5]:
pcd = mre.data.PitchClassDistribution()
pcd.transform(melody_paths, tonic_frequencies)
pcd.log()


100%|██████████| 1000/1000 [00:39<00:00, 25.14it/s]
No runs with the name pitch_class_distribution in experiment data_processing
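
To illustrate what a tonic-normalized pitch class distribution is, here is a minimal NumPy sketch. It is not tomato's implementation, which may differ in bin size, smoothing, and normalization. The function name `pitch_class_distribution`, the 25-cent bin width, and the convention that unvoiced frames are stored as 0 Hz are all assumptions made for this example.

```python
import numpy as np


def pitch_class_distribution(pitch_hz, tonic_hz, bin_cents=25.0):
    """Histogram of pitch classes relative to the tonic.

    pitch_hz  : 1-D sequence of pitch values in Hz (assumed 0 for unvoiced frames)
    tonic_hz  : tonic frequency in Hz
    bin_cents : bin width in cents (assumed value, must divide 1200)
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    voiced = pitch_hz[pitch_hz > 0]  # drop unvoiced frames
    if voiced.size == 0:
        raise ValueError("No voiced frames in the melody.")

    # Normalize by the tonic: distance from the tonic in cents.
    cents = 1200.0 * np.log2(voiced / tonic_hz)

    # Fold all octaves into one and assign each frame to the nearest bin,
    # so that bin 0 is centered on the tonic.
    n_bins = int(round(1200.0 / bin_cents))
    bin_idx = np.round(cents / bin_cents).astype(int) % n_bins

    counts = np.bincount(bin_idx, minlength=n_bins).astype(float)
    return counts / counts.sum()  # normalize so the bins sum to 1


# Toy melody: the tonic, its octave, a fifth above, and one unvoiced frame.
example_pcd = pitch_class_distribution([220.0, 440.0, 330.0, 0.0], tonic_hz=220.0)
```

Because the octave is folded, the tonic and its octave fall into the same bin, which is what makes the feature a pitch *class* distribution rather than a plain pitch distribution.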
