## Pearson's correlation coefficient

Pearson's correlation coefficient is the covariance of two variables standardized by the product of their standard deviations.

The population Pearson coefficient is denoted 𝜌.
In practice, one rarely has access to the entire population, so its value is estimated from a sample using the sample Pearson coefficient.
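
For reference, the standard textbook definitions can be written as follows. Here $\sigma_X$ and $\sigma_Y$ are the population standard deviations, $\bar{x}$ and $\bar{y}$ are the sample means, and the symbol $r$ for the sample coefficient is a notational assumption made for this note:

$$
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
\qquad\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$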

The Pearson correlation coefficient measures the linear association between two variables. As a rough guide, its value can be interpreted as follows:

* +1: perfect positive correlation
* +0.8: strong positive correlation
* +0.6: moderate positive correlation
* 0: no linear correlation
* -0.6: moderate negative correlation
* -0.8: strong negative correlation
* -1: perfect negative correlation

### Conditions

The conditions that must be met for the Pearson correlation coefficient to be valid are:

* The relationship under study must be linear; Pearson's coefficient cannot detect other kinds of relationship.

* The two variables must be numerical.

* Normality: both variables should be normally distributed. In practice, the coefficient is usually considered valid even when the distributions depart moderately from normality.

* Homoscedasticity: the variance of 𝑌 must be constant along the variable 𝑋.
This can be checked in a scatterplot: the values of 𝑌 should show the same dispersion across the different zones of 𝑋.
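
As a rough numerical complement to the scatterplot check, the sketch below fits a straight line and compares the spread of the residuals in three zones of 𝑋. The synthetic data, the helper name `residual_spread_by_zone`, and the choice of three zones are illustrative assumptions, not part of the analysis in this notebook.

```python
import numpy as np

def residual_spread_by_zone(x, y, n_zones=3):
    """Std. dev. of straight-line residuals in equal-count zones of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares straight line
    residuals = y - (slope * x + intercept)
    order = np.argsort(x)                       # order the points by x
    zones = np.array_split(residuals[order], n_zones)
    return [float(np.std(zone)) for zone in zones]

# Synthetic data (assumption): one series with constant noise, one whose noise grows with x
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 600)
y_const = 2.0 * x + rng.normal(0.0, 1.0, x.size)
y_fan = 2.0 * x + rng.normal(0.0, 1.0, x.size) * (0.2 + x)

spread_const = residual_spread_by_zone(x, y_const)
spread_fan = residual_spread_by_zone(x, y_fan)
print('constant noise:', spread_const)
print('growing noise: ', spread_fan)
```

Similar values across the zones are compatible with homoscedasticity; a clear trend from one zone to the next, as in the second series, suggests that the assumption does not hold.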



### Characteristics

* It takes values in the range [-1, +1], where +1 is a perfect positive linear correlation and -1 is a perfect negative linear correlation.

* It is independent of the scales on which the variables are measured.

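* It does not vary under linear transformations of the variables (shifting the origin or rescaling by a positive factor). Non-linear transformations generally do change it.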

* It is symmetric: it does not take into consideration which variable is treated as dependent and which as independent.

* Pearson's correlation coefficient is not equivalent to the slope of the regression line.

* It is sensitive to outliers, so it is recommended to exclude them before performing the calculation when their exclusion can be justified.
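
Several of these characteristics can be checked numerically. The sketch below uses synthetic data (an assumption made purely for illustration; the helper `pearson` is a hypothetical wrapper around `np.corrcoef`) to look at symmetry, the effect of linear and non-linear transformations, the difference from the regression slope, and the effect of a single outlier.

```python
import numpy as np

def pearson(a, b):
    """Pearson's r, read from the off-diagonal entry of np.corrcoef's 2x2 matrix."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic, roughly linear data (assumption)
rng = np.random.default_rng(42)
x = rng.uniform(1.0, 10.0, 500)
y = 3.0 * x + rng.normal(0.0, 2.0, 500)

r_xy = pearson(x, y)
r_yx = pearson(y, x)                     # swapping the variables
r_scaled = pearson(100.0 * x + 5.0, y)   # shift plus positive rescaling of x
r_flipped = pearson(-x, y)               # multiplying x by a negative factor
r_nonlinear = pearson(np.exp(x), y)      # non-linear transformation of x
slope = float(np.polyfit(x, y, 1)[0])    # slope of the least-squares line

# Append one extreme point far from the main cloud
r_outlier = pearson(np.append(x, 1000.0), np.append(y, -1000.0))

print(f'r(x, y)            = {r_xy:.4f}')
print(f'r(y, x)            = {r_yx:.4f}')
print(f'r(100x + 5, y)     = {r_scaled:.4f}')
print(f'r(-x, y)           = {r_flipped:.4f}')
print(f'r(exp(x), y)       = {r_nonlinear:.4f}')
print(f'regression slope   = {slope:.4f}')
print(f'r with one outlier = {r_outlier:.4f}')
```

The first three values coincide, the fourth only changes sign, and the non-linear transformation, the slope, and the outlier case all give different numbers.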

The Pearson correlation coefficient can be computed in Python with NumPy's `corrcoef()` function.

In [1]:
!git clone https://github.com/iferco/GMD-alignment

Cloning into 'GMD-alignment'...
remote: Enumerating objects: 842, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 842 (delta 20), reused 35 (delta 13), pack-reused 800
Receiving objects: 100% (842/842), 727.19 MiB | 22.92 MiB/s, done.
Resolving deltas: 100% (315/315), done.
Updating files: 100% (282/282), done.


In [2]:
!pip install tqdm
from tqdm import tqdm
import os

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
import pandas as pd
import numpy as np
align_chroma_fonsagrada = '/content/GMD-alignment/data/Muiñeira da Fonsagrada/alignments/align_chroma.txt'
df = pd.read_csv(align_chroma_fonsagrada, sep = ",", names=['score_time', 'perf_time'], header=0)
display(df)

Unnamed: 0,score_time,perf_time
0,0.000000,0.000000
1,0.011610,0.011610
2,0.023220,0.023220
3,0.034830,0.034830
4,0.046440,0.046440
...,...,...
7080,82.198639,45.325351
7081,82.210249,45.336961
7082,82.221859,45.348571
7083,82.233469,45.360181


In [4]:
my_rho = np.corrcoef(df['score_time'],df['perf_time'])
my_rho

array([[1.        , 0.95449158],
       [0.95449158, 1.        ]])

In [5]:
coef_corr = df["score_time"].corr(df["perf_time"], method="pearson")
coef_corr

0.9544915834001346
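
Both calls above return the same value, and both should agree with the definition of the sample coefficient. As a cross-check, the sketch below computes it directly from centred sums and compares the result with `np.corrcoef`. The two short arrays are made-up values, not data from the repository, and `pearson_from_definition` is a hypothetical helper name.

```python
import numpy as np

def pearson_from_definition(x, y):
    """Sample Pearson coefficient from centred sums (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Made-up example values (assumption)
score_time = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
perf_time = np.array([0.0, 0.4, 1.1, 1.5, 2.2])

r_manual = pearson_from_definition(score_time, perf_time)
r_numpy = float(np.corrcoef(score_time, perf_time)[0, 1])
print(r_manual, r_numpy)
```

`np.corrcoef` returns the full correlation matrix, so the coefficient between the two series is the off-diagonal entry `[0, 1]`.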

In [6]:
data_dir = '/content/GMD-alignment/data'
list_muiñeiras = []
for root, dirs, files in os.walk(data_dir):
    for dirname in dirs:
        # dirname: name of the folder; path: full path to the folder
        path = os.path.join(root, dirname)
        # Skip the score and performance folders
        if 'score' not in path and 'performance' not in path:
            list_muiñeiras.append(path)

In [7]:
list_muiñeiras

['/content/GMD-alignment/data/Muiñeira da Fonsagrada',
 '/content/GMD-alignment/data/Muiñeira de Oliveira',
 '/content/GMD-alignment/data/Muiñeira_2',
 '/content/GMD-alignment/data/Muiñeira do Hío',
 '/content/GMD-alignment/data/Muiñeira de Negros',
 '/content/GMD-alignment/data/Muiñeira de Parada de Achas',
 '/content/GMD-alignment/data/Muiñeira do Molete',
 '/content/GMD-alignment/data/Muiñeira de Vilarmide',
 '/content/GMD-alignment/data/Muiñeira do Condado',
 '/content/GMD-alignment/data/Muiñeira de Pol',
 '/content/GMD-alignment/data/Muiñeira de Ramelle',
 '/content/GMD-alignment/data/Muiñeira de Desiderio',
 '/content/GMD-alignment/data/Muiñeira de Quindous',
 '/content/GMD-alignment/data/Muiñeira de Coia',
 '/content/GMD-alignment/data/Muiñeira do Chao',
 '/content/GMD-alignment/data/Muiñeira de Pastoriza',
 '/content/GMD-alignment/data/Muiñeira de Arabexo',
 '/content/GMD-alignment/data/Muiñeira de Corme',
 '/content/GMD-alignment/data/Muiñeira de Vilar de Conforto',
 '/content

In [8]:
# Keep only the paths that point to an "alignments" folder
list_muiñ_alignments = [path for path in list_muiñeiras if 'alignments' in path]
print(list_muiñ_alignments)

['/content/GMD-alignment/data/Muiñeira da Fonsagrada/alignments', '/content/GMD-alignment/data/Muiñeira de Oliveira/alignments', '/content/GMD-alignment/data/Muiñeira_2/alignments', '/content/GMD-alignment/data/Muiñeira do Hío/alignments', '/content/GMD-alignment/data/Muiñeira de Negros/alignments', '/content/GMD-alignment/data/Muiñeira de Parada de Achas/alignments', '/content/GMD-alignment/data/Muiñeira do Molete/alignments', '/content/GMD-alignment/data/Muiñeira de Vilarmide/alignments', '/content/GMD-alignment/data/Muiñeira do Condado/alignments', '/content/GMD-alignment/data/Muiñeira de Pol/alignments', '/content/GMD-alignment/data/Muiñeira de Ramelle/alignments', '/content/GMD-alignment/data/Muiñeira de Desiderio/alignments', '/content/GMD-alignment/data/Muiñeira de Quindous/alignments', '/content/GMD-alignment/data/Muiñeira de Coia/alignments', '/content/GMD-alignment/data/Muiñeira do Chao/alignments', '/content/GMD-alignment/data/Muiñeira de Pastoriza/alignments', '/content/GMD

In [9]:
# Collect one row per alignment file and build each DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0).
chroma_rows = []
spectra_rows = []

for alignments in list_muiñ_alignments:
  # Only the first two files of each alignments folder are considered
  for fname in os.listdir(alignments)[:2]:
    txt = os.path.join(alignments, fname)
    if 'chroma' in txt:
      df = pd.read_csv(txt, sep = ",", names=['score_time', 'perf_time'], header=0)
      coef_corr = df["score_time"].corr(df["perf_time"], method="pearson")
      chroma_rows.append({'txt': alignments, 'coef_corr_chroma': coef_corr})

    if 'spectra' in txt:
      df = pd.read_csv(txt, sep = ",", names=['score_time', 'perf_time'], header=0)
      coef_corr = df["score_time"].corr(df["perf_time"], method="pearson")
      spectra_rows.append({'txt': alignments, 'coef_corr_spectra': coef_corr})

# corr_total is left empty here and filled in a later cell
coef_corr_chroma = pd.DataFrame(chroma_rows, columns=['txt','coef_corr_chroma', 'corr_total'])
coef_corr_spectra = pd.DataFrame(spectra_rows, columns=['txt','coef_corr_spectra'])

In [10]:
display(coef_corr_chroma)

Unnamed: 0,txt,coef_corr_chroma,corr_total
0,/content/GMD-alignment/data/Muiñeira da Fonsag...,0.954492,
1,/content/GMD-alignment/data/Muiñeira de Olivei...,0.997942,
2,/content/GMD-alignment/data/Muiñeira_2/alignments,0.950076,
3,/content/GMD-alignment/data/Muiñeira do Hío/al...,0.980367,
4,/content/GMD-alignment/data/Muiñeira de Negros...,0.924768,
5,/content/GMD-alignment/data/Muiñeira de Parada...,0.230136,
6,/content/GMD-alignment/data/Muiñeira do Molete...,0.952227,
7,/content/GMD-alignment/data/Muiñeira de Vilarm...,0.961173,
8,/content/GMD-alignment/data/Muiñeira do Condad...,0.94486,
9,/content/GMD-alignment/data/Muiñeira de Pol/al...,0.986251,


In [11]:
pearson_merged = coef_corr_chroma.merge(coef_corr_spectra, on='txt')
pearson_merged

Unnamed: 0,txt,coef_corr_chroma,corr_total,coef_corr_spectra
0,/content/GMD-alignment/data/Muiñeira da Fonsag...,0.954492,,0.955487
1,/content/GMD-alignment/data/Muiñeira de Olivei...,0.997942,,0.993291
2,/content/GMD-alignment/data/Muiñeira_2/alignments,0.950076,,0.848749
3,/content/GMD-alignment/data/Muiñeira do Hío/al...,0.980367,,0.908266
4,/content/GMD-alignment/data/Muiñeira de Negros...,0.924768,,0.949639
5,/content/GMD-alignment/data/Muiñeira de Parada...,0.230136,,0.980152
6,/content/GMD-alignment/data/Muiñeira do Molete...,0.952227,,0.952755
7,/content/GMD-alignment/data/Muiñeira de Vilarm...,0.961173,,0.968285
8,/content/GMD-alignment/data/Muiñeira do Condad...,0.94486,,0.955052
9,/content/GMD-alignment/data/Muiñeira de Pol/al...,0.986251,,0.985886


In [12]:
# Label each tune with the feature that gives the higher Pearson coefficient.
# A single vectorised assignment avoids chained indexing and its SettingWithCopyWarning.
pearson_merged['corr_total'] = np.where(
    pearson_merged['coef_corr_chroma'] > pearson_merged['coef_corr_spectra'],
    'Chroma', 'Spectra')

display(pearson_merged)


Unnamed: 0,txt,coef_corr_chroma,corr_total,coef_corr_spectra
0,/content/GMD-alignment/data/Muiñeira da Fonsag...,0.954492,Spectra,0.955487
1,/content/GMD-alignment/data/Muiñeira de Olivei...,0.997942,Chroma,0.993291
2,/content/GMD-alignment/data/Muiñeira_2/alignments,0.950076,Chroma,0.848749
3,/content/GMD-alignment/data/Muiñeira do Hío/al...,0.980367,Chroma,0.908266
4,/content/GMD-alignment/data/Muiñeira de Negros...,0.924768,Spectra,0.949639
5,/content/GMD-alignment/data/Muiñeira de Parada...,0.230136,Spectra,0.980152
6,/content/GMD-alignment/data/Muiñeira do Molete...,0.952227,Spectra,0.952755
7,/content/GMD-alignment/data/Muiñeira de Vilarm...,0.961173,Spectra,0.968285
8,/content/GMD-alignment/data/Muiñeira do Condad...,0.94486,Spectra,0.955052
9,/content/GMD-alignment/data/Muiñeira de Pol/al...,0.986251,Chroma,0.985886


In [13]:
num_chroma = pearson_merged['corr_total'].value_counts()['Chroma']
print('Chroma percentage',(num_chroma/len(pearson_merged))*100,'%')

num_spectra = pearson_merged['corr_total'].value_counts()['Spectra']
print('Spectra percentage',(num_spectra/len(pearson_merged))*100,'%')

if num_spectra>num_chroma:
  print("Spectra's performance is better for this dataset")
else:
  print("\nChroma's performance is better for this dataset")

Chroma percentage 58.333333333333336 %
Spectra percentage 41.66666666666667 %

Chroma's performance is better for this dataset
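
The counting cell above indexes `value_counts()` by label, which raises a `KeyError` if one of the labels never occurs, and its `else` branch also reports Chroma when the two counts are equal. A slightly more defensive variant is sketched below on made-up labels; the helper name `summarise_winner` and the explicit tie verdict are assumptions, not part of the original analysis.

```python
import pandas as pd

def summarise_winner(labels):
    """Return (chroma_pct, spectra_pct, verdict) for a Series of 'Chroma'/'Spectra' labels."""
    n_chroma = int((labels == 'Chroma').sum())    # 0 when the label is absent
    n_spectra = int((labels == 'Spectra').sum())
    total = n_chroma + n_spectra
    if total == 0:
        return 0.0, 0.0, 'No data'
    if n_chroma > n_spectra:
        verdict = 'Chroma'
    elif n_spectra > n_chroma:
        verdict = 'Spectra'
    else:
        verdict = 'Tie'
    return 100.0 * n_chroma / total, 100.0 * n_spectra / total, verdict

# Made-up labels (assumption): 'Spectra' never occurs here
toy_labels = pd.Series(['Chroma', 'Chroma', 'Chroma'])
chroma_pct, spectra_pct, verdict = summarise_winner(toy_labels)
print(f'Chroma {chroma_pct:.1f} %, Spectra {spectra_pct:.1f} %, verdict: {verdict}')
```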


In [14]:
features_merged = pd.read_csv('/content/drive/Shareddrives/MIR2/features_merged.csv')
display(features_merged)

Unnamed: 0,id,keyMidi,tempoMidi,durationMidi,keyWAV,tempoWAV,durationWav,name,key,time_signature,ratio_negras_corcheas,muiñeira_type,location,lat,lon,classification,province,audio_kind
0,249,C,119.986694,83.75,C,111.792847,95.816193,Muiñeira do Areal,C major,6/8,0.134453781512605,C1,Lugo,430.462.247,-74.739.921,Not classified,LU,REC
1,413,C,80.015915,89.75,C,112.187714,95.816193,Muiñeira de Folgoso,a minor,6/8,0.5135135135135135,C1,"Folgoso do Courel, Lugo",425.882.486,-71.925.449,Courel,LU,REC
2,74,C,129.459579,34.230724,C,122.405617,95.816193,Muiñeira de Monterrei,a minor,6/8,0.2131147540983606,C1,"Monterrei, Ourense",419.467.657,-74.491.474,Portuguese Border,OU,REC
3,9,C,120.005966,108.16655,C,96.897034,102.934059,Muiñeira de Coia,F major,6/8,0.2188841201716738,C1,"Coia, Vigo, Pontevedra",422.182.924,-87.441.073,Rias Baixas,PO,YTB
4,36,C,146.183685,28.12331,C,117.485382,95.816193,Muiñeira de Arabexo,C major,6/8,0.1176470588235294,C1,"Arabexo, Val do Dubra, A Coruña",43.068.020.849.999.900,-8.657.671.483.625.100,Costa do Morte,CO,REC
5,341,C,120.061508,82.25,C,129.108368,62.624218,Muiñeira de Ramelle,c minor,6/8,0.108695652173913,C2,"Ramelle, Friol, Lugo",430.338.813,-78.263.409,Not classified,LU,YTB
6,81,C,135.99913,35.85297,C,123.659935,45.376915,Muiñeira de Pazos de Borbén,C major,6/8,19.047.619.047.619.000,C2,"Pazos de Borbén, Pontevedra",422.849.584,-8.530.047.868.320.790,Rias Baixas,PO,REC
7,135,C,120.011917,50.5,C,112.379189,95.816193,Muiñeira de Ponte Maceira,C major,6/8,0.0,C2,"Ponte Maceira, Negreira, A Coruña",42.905.445,-86.962.309,Costa do Morte,CO,REC
8,78,C,80.001793,82.25,C,117.84095,95.816193,Muiñeira de Eiriz,C major,6/8,0.4680851063829787,C2,"Eiriz, Folgoso do Courel, Lugo",425.894.237,-72.220.799,Courel,LU,REC
9,312,C,120.017487,50.5,C,132.561844,238.480545,Muiñeira do Chao,C major,6/8,0.144927536231884,C1,"O Chao, O Barco de Valdeorras, Ourense",424.189.993,-69.817.358,Not classified,OU,YTB


In [15]:
# Keep only the tune's folder name (the parent of the "alignments" folder)
pearson_merged['txt'] = pearson_merged['txt'].str.split('/').str[-2]


In [16]:
display(pearson_merged)

Unnamed: 0,txt,coef_corr_chroma,corr_total,coef_corr_spectra
0,Muiñeira da Fonsagrada,0.954492,Spectra,0.955487
1,Muiñeira de Oliveira,0.997942,Chroma,0.993291
2,Muiñeira_2,0.950076,Chroma,0.848749
3,Muiñeira do Hío,0.980367,Chroma,0.908266
4,Muiñeira de Negros,0.924768,Spectra,0.949639
5,Muiñeira de Parada de Achas,0.230136,Spectra,0.980152
6,Muiñeira do Molete,0.952227,Spectra,0.952755
7,Muiñeira de Vilarmide,0.961173,Spectra,0.968285
8,Muiñeira do Condado,0.94486,Spectra,0.955052
9,Muiñeira de Pol,0.986251,Chroma,0.985886


In [17]:
full_merged = pearson_merged.merge(features_merged, left_on='txt', right_on='name')
display(full_merged)

Unnamed: 0,txt,coef_corr_chroma,corr_total,coef_corr_spectra,id,keyMidi,tempoMidi,durationMidi,keyWAV,tempoWAV,...,key,time_signature,ratio_negras_corcheas,muiñeira_type,location,lat,lon,classification,province,audio_kind
0,Muiñeira da Fonsagrada,0.954492,Spectra,0.955487,130,C,119.996109,82.25,C,117.204842,...,C major,6/8,0.1195652173913043,C1,"A Fonsagrada, Lugo",431.248.311,-70.676.605,A Fonsagrada-Asturias,LU,REC
1,Muiñeira da Fonsagrada,0.954492,Spectra,0.955487,260,C,120.029724,116.75,C,112.188095,...,C major,6/8,0.0814814814814814,C2,"A Fonsagrada, Lugo",431.248.311,-70.676.605,A Fonsagrada-Asturias,LU,REC
2,Muiñeira de Oliveira,0.997942,Chroma,0.993291,126,C,80.001228,61.0,C,129.5056,...,C major,6/8,0.96,,"Santiago de Oliveira, Ponteareas, Pontevedra",421.822.609,-84.523.004,Rias Baixas,PO,REC
3,Muiñeira do Hío,0.980367,Chroma,0.908266,416,C,119.980469,68.75,C,142.695358,...,C major,6/8,0.1785714285714285,C1,"Hío, Cangas, Pontevedra",422.728.376,-8.841.498,Rias Baixas,PO,YTB
4,Muiñeira de Negros,0.924768,Spectra,0.949639,287,C,119.959625,74.5,C,90.643219,...,C major,6/8,0.3444444444444444,C1,"Negros, Redondela, Pontevedra",422.598.627,-86.216.602,Rias Baixas,PO,REC
5,Muiñeira de Parada de Achas,0.230136,Spectra,0.980152,362,C,119.860115,28.25,C,142.645538,...,c minor,6/8,0.1506849315068493,,"Parada de Achas, A Cañiza, Pontevedra",42.173.403,-830.473.225,Rias Baixas,PO,YTB
6,Muiñeira do Molete,0.952227,Spectra,0.952755,317,C,80.002487,82.25,C,112.439461,...,C major,6/8,0.2985074626865671,C2,Pontevedra,426.075.172,-84.714.942,Rias Baixas,PO,REC
7,Muiñeira de Vilarmide,0.961173,Spectra,0.968285,450,C,119.99527,127.25,C,123.979095,...,C major,6/8,0.0920245398773006,C2,"Vilarmide, A Pontenova, Lugo",432.789.245,-71.913.498,A Fonsagrada-Asturias,LU,YTB
8,Muiñeira do Condado,0.94486,Spectra,0.955052,203,C,120.080956,68.75,C,103.3843,...,C major,6/8,0.1063829787234042,C1,"O Condado, Pontevedra",4.217.671.145,-8.488.781.069.275.160,Rias Baixas,PO,REC
9,Muiñeira de Pol,0.986251,Chroma,0.985886,52,C,132.046097,30.999954,C,122.480614,...,C major,6/8,0.1294117647058823,C1,"Pol, Lugo",4.314.281.665,-7.321.889.132.285.170,A Fonsagrada-Asturias,LU,REC
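
The default inner join keeps only names present in both tables and repeats a row whenever a name occurs more than once in the features table, which is why Muiñeira da Fonsagrada appears twice above and Muiñeira_2 does not appear at all. One way to surface such cases is a left join with pandas' `indicator=True` option, sketched below on small made-up tables (the tune names, ids, and coefficients are placeholders, not values from the dataset).

```python
import pandas as pd

# Made-up stand-ins for pearson_merged and features_merged (assumption)
left = pd.DataFrame({'txt': ['Tune A', 'Tune B', 'Tune_2'],
                     'coef_corr_chroma': [0.95, 0.99, 0.95]})
right = pd.DataFrame({'name': ['Tune A', 'Tune A', 'Tune B'],
                      'id': [1, 2, 3]})

# how='left' keeps every correlation row; indicator=True adds a '_merge' column
checked = left.merge(right, left_on='txt', right_on='name',
                     how='left', indicator=True)

# Rows with no partner in the features table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'txt'].tolist()
# Names that matched more than one features row
duplicated = checked.loc[checked['txt'].duplicated(keep=False), 'txt'].unique().tolist()

print('No match in features table:', unmatched)
print('Matched more than once:', duplicated)
```

Whether duplicated names should be kept, and which of the matching feature rows is the right one, depends on the dataset and is not decided by this sketch.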


In [19]:
full_merged.to_csv('/content/drive/Shareddrives/MIR2/full_merged48.csv', index=False)