**Description**:

This script takes as input the JM label files generated manually by experts and creates several output files for direct use in this dataset.

The input JM label files ('JM_grazing.txt' and 'JM_rumination.txt') are lists specifying the start, end and label of each JM.
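
As a rough illustration of this layout, the sketch below parses a small in-memory sample, assuming tab-separated columns without a header row (which matches the `pd.read_csv` call used later in this notebook). The timestamps and all labels other than 'x' are invented for the example, and the `Duration` column is an illustrative addition that is not part of the dataset files.

```python
import io
import pandas as pd

# Hypothetical file contents: start (s), end (s), label, separated by tabs.
# These values are made up for illustration and do not come from the dataset.
sample_txt = "0.50\t1.10\tc\n1.40\t2.05\tb\n2.30\t2.90\tx\n"

df_labels = pd.read_csv(
    io.StringIO(sample_txt),
    sep="\t",
    header=None,
    names=["Start", "Finish", "Label"],
)

# Duration of each JM in seconds (illustrative extra column)
df_labels["Duration"] = df_labels["Finish"] - df_labels["Start"]
print(df_labels)
```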

<b>The following output files (*.txt and *.csv) are lists specifying the start, end and label of the JMs:</b>

- 'JM_grazing.csv': 'JM_grazing.txt' saved in CSV format. These labels correspond to the 'JM_grazing.wav' file.

- 'JM_rumination.csv': 'JM_rumination.txt' saved in CSV format. These labels correspond to the 'JM_rumination.wav' file.

- 'JM_grazing_adjusted.txt/csv': This file adjusts the expert-generated start and end timestamps closer to the audible bounds of the movements. The labels generated by the experts are not modified. These labels correspond to the 'JM_grazing.wav' file.

- 'JM_rumination_adjusted.txt/csv': This file adjusts the expert-generated start and end timestamps closer to the audible bounds of the JM. The labels generated by the experts are not modified. These labels correspond to the 'JM_rumination.wav' file.

- 'D3Eq4Id2909p3_JM_adjusted.txt/csv': This file contains the timestamps and labels of 'JM_grazing_adjusted.txt/csv' and 'JM_rumination_adjusted.txt/csv', shifted in time for use with the 'D3RS4ID2909P3.mp3' file.
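
The time shift described in the last item can be sketched as follows. The two offsets (2820 s for rumination and 7106 s for grazing) are the values used later in this notebook. The helper `shift_labels`, the sample timestamps and the label 'r' are hypothetical and only serve the illustration.

```python
import pandas as pd

# Offsets in seconds, as defined later in the notebook
BEGIN_RUMINATION = 2820
BEGIN_GRAZING = 7106

def shift_labels(df, offset_s):
    """Return a copy of df with Start and Finish shifted by offset_s seconds."""
    shifted = df.copy()
    shifted[["Start", "Finish"]] = shifted[["Start", "Finish"]] + offset_s
    return shifted

# Invented example labels, relative to the start of each WAV segment
rumination = pd.DataFrame({"Start": [1.0], "Finish": [1.5], "Label": ["r"]})
grazing = pd.DataFrame({"Start": [2.0], "Finish": [2.4], "Label": ["x"]})

# Concatenate both segments on the timeline of the full recording
joint = pd.concat(
    [shift_labels(rumination, BEGIN_RUMINATION), shift_labels(grazing, BEGIN_GRAZING)],
    ignore_index=True,
)
print(joint)
```

Working on copies keeps the segment-relative labels intact, so they can still be saved for the WAV files afterwards.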

<b>The following output files (*.txt and *.csv) are lists specifying the middle position and label of the JMs. The middle position is computed as the average of the start and end timestamps generated by the experts:</b>

- 'JM_grazing_one_mark.txt/csv': These labels correspond to the 'JM_grazing.wav' file.

- 'JM_rumination_one_mark.txt/csv': These labels correspond to the 'JM_rumination.wav' file.

- 'D3Eq4Id2909p3_JM_one_mark.txt/csv': These labels correspond to the 'D3RS4ID2909P3.mp3' file.
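
The one-mark timestamp is simply the midpoint of each labelled interval. A minimal sketch with invented values (only the column names follow the notebook):

```python
import pandas as pd

# Invented start/end timestamps in seconds
labels = pd.DataFrame(
    {"Start": [0.5, 1.4], "Finish": [1.1, 2.0], "Label": ["c", "x"]}
)

# Midpoint of each interval, paired with its unchanged label
one_mark = pd.DataFrame(
    {
        "Average": (labels["Start"] + labels["Finish"]) / 2,
        "Label": labels["Label"],
    }
)
print(one_mark)
```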

-------------------------------------------------------------------------
Author: Luciano Martinez Rau (Mid Sweden University - sinc(<i>i</i>)-CONICET)
------------------------------------------------------------------------

In [1]:
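def preprocessing(x,freq):
  # Local imports so the function also works if this cell runs before the import cell
  import numpy as np
  from scipy import signal
  Flower=175  # lower band-pass edge (Hz)
  Fupper=900  # upper band-pass edge (Hz)
  Fcut=4      # low-pass cutoff used to smooth the energy (Hz)
  # compute energy: zero-phase band-pass filtering followed by squaring
  b, a = signal.butter(1, [Flower/(freq/2),Fupper/(freq/2)],'bandpass')
  audio_filtered = signal.filtfilt(b, a, x)
  signal_energy = np.power(audio_filtered , 2)
  # compute envelope: zero-phase low-pass filtering of the energy
  d, c = signal.butter(2, Fcut/(freq/2) ,'low')
  audio_envelope = signal.filtfilt(d ,c , signal_energy)
  # the low-pass filter can undershoot, so clip negative values to zero
  audio_envelope[audio_envelope<0]=0
  return audio_envelope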
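
To see what this envelope looks like, the sketch below applies the same filter chain to a synthetic signal: silence with a 400 Hz tone burst between 0.8 s and 1.2 s. The function `envelope`, the 8 kHz sampling rate and the test signal are assumptions made for the example. The notebook itself takes the sampling rate from the WAV files.

```python
import numpy as np
from scipy import signal

def envelope(x, fs, f_low=175.0, f_high=900.0, f_cut=4.0):
    """Band-pass, square, then low-pass: same chain as preprocessing()."""
    b, a = signal.butter(1, [f_low / (fs / 2), f_high / (fs / 2)], "bandpass")
    energy = signal.filtfilt(b, a, x) ** 2
    d, c = signal.butter(2, f_cut / (fs / 2), "low")
    env = signal.filtfilt(d, c, energy)
    return np.clip(env, 0, None)  # remove small negative undershoot

fs = 8000  # assumed sampling rate for this synthetic example (Hz)
t = np.arange(0, 2.0, 1 / fs)
# 400 Hz tone burst between 0.8 s and 1.2 s, silence elsewhere
x = np.where((t >= 0.8) & (t < 1.2), np.sin(2 * np.pi * 400 * t), 0.0)

env = envelope(x, fs)
peak_time = np.argmax(env) / fs
print(f"envelope peaks at {peak_time:.2f} s")
```

Because `filtfilt` runs each filter forward and backward, the envelope is not delayed relative to the audio, which matters when it is later used to place timestamps.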

In [2]:
import numpy as np
import pandas as pd
import os
from scipy.io.wavfile import read
import soundfile
from scipy import signal

Specify the path of the JM labels generated by the experts

In [3]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/drive',force_remount=True)
    directory = '/drive/My Drive/Colab Notebooks/DatabaseMichigan/data/'
else:
    directory = 'data/'

Import the timestamps and JM labels

In [11]:
fileWAV_grazing = directory+'JM_grazing.wav'
# soundfile.info accepts a path directly, so no file handle is left open
file_info = soundfile.info(fileWAV_grazing)
# the same sampling rate is used below for both WAV files
freq = file_info.samplerate

fileWAV_rumination = directory+'JM_rumination.wav'

#envelopeComputation
x,_ = soundfile.read(fileWAV_rumination,dtype='float32')
envelopeRumination = preprocessing(x[:,0],freq)
x,_ = soundfile.read(fileWAV_grazing,dtype='float32')
envelopeGrazing = preprocessing(x[:,0],freq)
del x

#Read JM-events labelled by experts
label_grazing = directory + 'JM_grazing.txt'
label_rumination = directory + 'JM_rumination.txt'
dfGrazing = pd.read_csv(label_grazing,sep='\t',engine='python',header=None,index_col=False)
dfRumination = pd.read_csv(label_rumination,sep='\t',engine='python',header=None,index_col=False)

dfGrazing.columns = ['Start', 'Finish', 'Label']
dfRumination.columns = ['Start', 'Finish', 'Label']

if not(os.path.exists(directory + "JM_grazing.csv")):
    dfGrazing.to_csv(directory + "JM_grazing.csv",index=None)
if not(os.path.exists(directory + "JM_rumination.csv")):
    dfRumination.to_csv(directory + "JM_rumination.csv",index=None)
if not(os.path.exists(directory + "D3RS4ID2909P3_JM.csv")):
    df_mp3 = pd.read_csv(directory +"D3RS4ID2909P3_JM.txt",sep='\t',engine='python',header=None,index_col=False)
    df_mp3.columns = ['Start', 'Finish', 'Label']
    df_mp3.to_csv(directory + "D3RS4ID2909P3_JM.csv",index=None)
    del df_mp3

Compute the adjusted timestamps

In [None]:
offset = 0.05

dfRumination['Start'] = dfRumination['Start'] - offset
dfRumination['Finish'] = dfRumination['Finish'] + offset
dfGrazing['Start'] = dfGrazing['Start'] - offset
dfGrazing['Finish'] = dfGrazing['Finish'] + offset

#compute adjusted Labels for rumination
newBeginLabel = []
newEndLabel = []

for idx in dfRumination.index:
  window_envelope = envelopeRumination[round(dfRumination['Start'][idx]*freq) : round(dfRumination['Finish'][idx]*freq)]
  minEnv = min(window_envelope)
  maxEnv = max(window_envelope)
  posmaxEnv = window_envelope.argmax()
  threshold = minEnv + (maxEnv - minEnv)/4
  portion = np.argwhere(window_envelope>threshold)
  newBeginLabel.append(portion[0][0]/freq + dfRumination['Start'][idx])
  newEndLabel.append(portion[-1][0]/freq + dfRumination['Start'][idx])

dfRumiAdjusted = pd.DataFrame(zip(newBeginLabel,newEndLabel,dfRumination['Label']),columns= ['Start', 'Finish', 'Label'])

for idx in dfRumiAdjusted.index:
  # avoid overlap with the previous JM (.loc avoids chained assignment)
  if idx>0 and dfRumiAdjusted.loc[idx,'Start'] < dfRumiAdjusted.loc[idx-1,'Finish']:
    dfRumiAdjusted.loc[idx,'Start'] = dfRumiAdjusted.loc[idx-1,'Finish']
  # keep the adjusted bounds inside the expert bounds widened by offset
  if dfRumiAdjusted.loc[idx,'Start'] < dfRumination.loc[idx,'Start']: dfRumiAdjusted.loc[idx,'Start'] = dfRumination.loc[idx,'Start']
  if dfRumiAdjusted.loc[idx,'Finish'] > dfRumination.loc[idx,'Finish']: dfRumiAdjusted.loc[idx,'Finish'] = dfRumination.loc[idx,'Finish']

#compute adjusted Labels for grazing
newBeginLabel = []
newEndLabel = []

for idx in dfGrazing.index:
  window_envelope = envelopeGrazing[round(dfGrazing['Start'][idx]*freq) : round(dfGrazing['Finish'][idx]*freq)]
  minEnv = min(window_envelope)
  maxEnv = max(window_envelope)
  posmaxEnv = window_envelope.argmax()
  if dfGrazing['Label'][idx]=='x':
    threshold = minEnv + (maxEnv - minEnv)/64
  else:
    threshold = minEnv + (maxEnv - minEnv)/32
  portion = np.argwhere(window_envelope>threshold)
  newBeginLabel.append(portion[0][0]/freq + dfGrazing['Start'][idx])
  newEndLabel.append(portion[-1][0]/freq + dfGrazing['Start'][idx])

dfGrazAdjusted = pd.DataFrame(zip(newBeginLabel,newEndLabel,dfGrazing['Label']),columns= ['Start', 'Finish', 'Label'])

for idx in dfGrazAdjusted.index:
  # avoid overlap with the previous JM (.loc avoids chained assignment)
  if idx>0 and dfGrazAdjusted.loc[idx,'Start'] < dfGrazAdjusted.loc[idx-1,'Finish']:
    dfGrazAdjusted.loc[idx,'Start'] = dfGrazAdjusted.loc[idx-1,'Finish']
  # keep the adjusted bounds inside the expert bounds widened by offset
  if dfGrazAdjusted.loc[idx,'Start'] < dfGrazing.loc[idx,'Start']: dfGrazAdjusted.loc[idx,'Start'] = dfGrazing.loc[idx,'Start']
  if dfGrazAdjusted.loc[idx,'Finish'] > dfGrazing.loc[idx,'Finish']: dfGrazAdjusted.loc[idx,'Finish'] = dfGrazing.loc[idx,'Finish']
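
The thresholding step in the cell above can be illustrated on a toy envelope. The sketch assumes a hypothetical helper `adjust_bounds` and an idealised rectangular envelope sampled at 100 Hz. The fraction 1/4 is the one the notebook uses for rumination, while grazing uses 1/32, or 1/64 for the label 'x'.

```python
import numpy as np

def adjust_bounds(envelope, fs, start_s, end_s, fraction=0.25):
    """Tighten [start_s, end_s] to the samples whose envelope exceeds
    min + fraction * (max - min) inside that window."""
    i0, i1 = round(start_s * fs), round(end_s * fs)
    window = envelope[i0:i1]
    threshold = window.min() + (window.max() - window.min()) * fraction
    above = np.flatnonzero(window > threshold)
    return start_s + above[0] / fs, start_s + above[-1] / fs

fs = 100                 # toy sampling rate (Hz), chosen for readability
env = np.zeros(300)      # 3 s of silence
env[120:180] = 1.0       # envelope is active from 1.20 s to 1.79 s

# Loose bounds of 1.0 s to 2.0 s shrink to the active region
new_start, new_end = adjust_bounds(env, fs, 1.0, 2.0)
print(new_start, new_end)
```

A smaller fraction lowers the threshold, so the adjusted bounds stay closer to the original ones and keep more of the quiet onset and decay of the sound.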

Save the files containing the adjusted timestamps with the JM labels

In [None]:
BeginGrazing, BeginRumination = 7106  , 2820

nameRuminationTXT = directory + 'JM_rumination_adjusted.txt'
nameRuminationCSV = directory + 'JM_rumination_adjusted.csv'
nameGrazingTXT = directory + 'JM_grazing_adjusted.txt'
nameGrazingCSV = directory + 'JM_grazing_adjusted.csv'
dfRumiAdjusted.to_csv(nameRuminationTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfRumiAdjusted.to_csv(nameRuminationCSV,index=None,float_format='%.2f')
dfGrazAdjusted.to_csv(nameGrazingTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfGrazAdjusted.to_csv(nameGrazingCSV,index=None,float_format='%.2f')


dfRumiAdjusted['Start'] = dfRumiAdjusted['Start']+ BeginRumination
dfRumiAdjusted['Finish'] = dfRumiAdjusted['Finish']+ BeginRumination
dfGrazAdjusted['Start'] = dfGrazAdjusted['Start']+ BeginGrazing
dfGrazAdjusted['Finish'] = dfGrazAdjusted['Finish']+ BeginGrazing
dfJoint=pd.concat([dfRumiAdjusted, dfGrazAdjusted], axis=0)
LabelsTXT = directory + 'D3Eq4Id2909p3_JM_adjusted.txt'
LabelsCSV = directory + 'D3Eq4Id2909p3_JM_adjusted.csv'
dfJoint.to_csv(LabelsTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfJoint.to_csv(LabelsCSV,index=None,float_format='%.2f')

Compute the one-mark timestamps with the JM labels

In [8]:
dfGrazing_onemark = pd.DataFrame()
dfRumination_onemark = pd.DataFrame()

dfGrazing_onemark['Average'] = (dfGrazing['Start'] + dfGrazing['Finish']) / 2
dfGrazing_onemark['Label'] = dfGrazing['Label']
dfRumination_onemark['Average'] = (dfRumination['Start'] + dfRumination['Finish']) / 2
dfRumination_onemark['Label'] = dfRumination['Label']

Save the files containing the one-mark timestamps with the JM labels

In [10]:
BeginGrazing, BeginRumination = 7106  , 2820

nameRuminationTXT = directory + 'JM_rumination_one_mark.txt'
nameRuminationCSV = directory + 'JM_rumination_one_mark.csv'
nameGrazingTXT = directory + 'JM_grazing_one_mark.txt'
nameGrazingCSV = directory + 'JM_grazing_one_mark.csv'
dfRumination_onemark.to_csv(nameRuminationTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfRumination_onemark.to_csv(nameRuminationCSV,index=None,float_format='%.2f')
dfGrazing_onemark.to_csv(nameGrazingTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfGrazing_onemark.to_csv(nameGrazingCSV,index=None,float_format='%.2f')


dfRumination_onemark['Average'] = dfRumination_onemark['Average']+ BeginRumination
dfGrazing_onemark['Average'] = dfGrazing_onemark['Average']+ BeginGrazing
dfJoint=pd.concat([dfRumination_onemark, dfGrazing_onemark], axis=0)
LabelsTXT = directory + 'D3Eq4Id2909p3_JM_one_mark.txt'
LabelsCSV = directory + 'D3Eq4Id2909p3_JM_one_mark.csv'
dfJoint.to_csv(LabelsTXT,index=None,sep='\t',header=None,float_format='%.2f')
dfJoint.to_csv(LabelsCSV,index=None,float_format='%.2f')