Data preprocessing codes #1

Closed
katerynaCh opened this issue Dec 11, 2021 · 12 comments

Comments

@katerynaCh

Hi! I am having trouble reproducing your results on the AMIGOS dataset for both binary and multiclass classification, both when training from scratch and when extracting features first with your provided model. My results are about 10% lower than the reported ones. Could you describe more explicitly what preprocessing was done? You report high-pass filtering at 0.8 Hz, and in one of the papers you also report person-specific z-normalization. Was anything else done, and in which order were the steps performed? Did you apply the high-pass filter to the whole sequence or to the 10-second segments? Also, did you use the self-reported labels or the labels from external annotators?

In general, if you could share the preprocessing code, that would be extremely helpful, even if it is messy.

@katerynaCh katerynaCh changed the title About reproducibility Data preprocessing and reproducibility Dec 11, 2021
@pritamqu
Owner

Hi, thanks for your interest.
I was wondering: are you using the uploaded pretrained models, or did you train the model yourself?
Could you please tell me a bit more about your experimental setup? Minor differences in accuracy are to be expected, but surely not as big as 10%!

@katerynaCh
Author

Thanks for the quick reply! I have tried both ways: training from scratch and using your pretrained model. I preprocess the ECGs by applying a high-pass filter at 0.8 Hz, splitting them into 10-second non-overlapping segments to get 2560-length vectors, and z-normalizing the data (per person). With the provided pretrained model, I extract the features, feed them to the supervised model for the AMIGOS dataset, and train as described in the paper for 100 epochs. In the end I get around 72% for binary classification of arousal (where the binary labels are negative for scores < 5 and positive otherwise). So I suspect the issue is somewhere in the preprocessing before feature extraction.
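
For concreteness, the segmentation, per-person z-normalization, and label thresholding described above could look roughly like the sketch below. The function names are hypothetical, the 256 Hz rate is inferred from 2560 samples per 10 seconds, and pooling the statistics over all of one person's segments is only one possible reading of "per person"; none of this is confirmed to match the paper's pipeline.

```python
import numpy as np

FS = 256          # Hz; assumed from 2560 samples per 10-second segment
WINDOW_SEC = 10   # segment length in seconds


def segment_signal(ecg, fs=FS, window_sec=WINDOW_SEC):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    ecg = np.asarray(ecg, dtype=float)
    win = fs * window_sec
    n_windows = len(ecg) // win
    return ecg[: n_windows * win].reshape(n_windows, win)


def zscore_per_person(segments):
    """Z-normalize with mean/std pooled over all segments of one person (assumption)."""
    segments = np.asarray(segments, dtype=float)
    return (segments - segments.mean()) / segments.std()


def binarize(scores, threshold=5):
    """Scores below the threshold become 0 (negative), all others 1 (positive)."""
    return (np.asarray(scores) >= threshold).astype(int)
```

A 25-second recording yields two 2560-sample segments here, with the trailing 5 seconds discarded.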

@pritamqu
Owner

pritamqu commented Dec 11, 2021

Thanks for sharing your setup. However, I am not sure what exactly is going wrong at your end.
I used MATLAB for the filtering part, and here is the filter code; the same filter can also be designed in Python.
Hope this helps!

highpass_filter = designfilt('highpassiir', 'StopbandFrequency', 0.4, 'PassbandFrequency', 0.8, ...
'StopbandAttenuation', 60, 'PassbandRipple', 1, 'SampleRate', 256, 'DesignMethod', 'cheby2');
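
A rough Python counterpart of this design, using SciPy, might look like the sketch below. It passes the same band edges, ripple, and attenuation to `scipy.signal.cheb2ord` and `scipy.signal.cheby2`, but MATLAB and SciPy may choose the order and exact stopband edge differently, so the coefficients are not guaranteed to be identical. The function names and the use of causal single-pass filtering (`sosfilt`) are assumptions; the thread does not say whether the filter was applied in one pass or forward-backward.

```python
import numpy as np
from scipy import signal

FS = 256  # Hz, same as 'SampleRate' in the MATLAB call


def design_highpass(fs=FS):
    """Chebyshev type II high-pass: 0.4 Hz stopband edge, 0.8 Hz passband edge."""
    # Smallest order meeting 1 dB passband ripple and 60 dB stopband attenuation
    order, wn = signal.cheb2ord(wp=0.8, ws=0.4, gpass=1, gstop=60, fs=fs)
    # Second-order sections are numerically safer for such a low cutoff
    return signal.cheby2(order, 60, wn, btype="highpass", output="sos", fs=fs)


def highpass_ecg(ecg, sos=None):
    """Apply the filter causally in a single pass (assumption, see lead-in)."""
    if sos is None:
        sos = design_highpass()
    return signal.sosfilt(sos, np.asarray(ecg, dtype=float))
```

Swapping `signal.sosfilt` for `signal.sosfiltfilt` would give zero-phase filtering instead, which changes the waveform shape and is worth checking when comparing against another implementation.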

@ZaraNaSha

Hi, thanks for your paper and implementation. I also have a problem with the filtered data. Could you share the MATLAB code that you used for your data?
Best regards.

@ZaraNaSha

Hi, I would like to know whether you could prepare the data for training the model. Would it be possible for you to send me your code?
Best regards.

@katerynaCh
Author

Hi, after some more attempts I was still not able to reproduce the results reported in the paper with my preprocessing implementation.

@pritamqu
Owner

@katerynaCh @zara6697
Could you please share the preprocessing code that you're trying? I can quickly check it and let you know if I see any issue. Otherwise, I already shared the filter I used here: #1 (comment)
You may also want to consult the paper, which has a detailed description.

@pritamqu pritamqu changed the title Data preprocessing and reproducibility Data preprocessing codes Apr 20, 2022
@ZaraNaSha

OK, if I manage to reproduce the results, I will send you the code. Thanks.

@ZaraNaSha

Thanks, I used it, but the signal files and the labels are not defined. I do not understand how to save the files.
This is the code I used for the AMIGOS dataset; the expected formats of the text file and the label file are not clear to me.
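data_dir = 'C:\Users\p\Downloads\Compressed\am_dataset';  % renamed from "path", which shadows a MATLAB built-in
zip_files = dir(fullfile(data_dir, '*.zip'));              % fullfile inserts the missing file separator
highpass_filter = designfilt('highpassiir', 'StopbandFrequency', 0.4, 'PassbandFrequency', 0.8, ...
    'StopbandAttenuation', 60, 'PassbandRipple', 1, 'SampleRate', 256, 'DesignMethod', 'cheby2');
for i = 1:length(zip_files)
    % unzip returns the extracted file names; the first one is loaded as the data file
    extracted = unzip(fullfile(data_dir, zip_files(i).name), data_dir);
    subject = load(extracted{1});
    for j = 1:20
        ecg = subject.ECG_DATA{j};
        ecg = ecg(:, 2);                          % keep the second column only
        filtered = filter(highpass_filter, ecg);  % causal, single-pass filtering
        T = table(filtered, 'VariableNames', {'1'});
        out_name = ['filtered' zip_files(i).name(1:end-4) num2str(j) '.txt'];
        writetable(T, fullfile(data_dir, out_name));
    end
end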
Best regards.

@pritamqu
Owner

pritamqu commented Apr 30, 2022

I am adding a piece of preprocessing code here for reference. Hope this helps.

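import numpy as np
from biosppy.signals import tools

def filter_ecg(signal, sampling_rate):
    """Band-pass filter an ECG signal between 3 and 45 Hz with an FIR filter."""
    signal = np.array(signal)
    # FIR filter order scales with the sampling rate (0.3 s worth of samples)
    order = int(0.3 * sampling_rate)
    # filter_signal also returns the sampling rate and filter parameters, unused here
    filtered, _, _ = tools.filter_signal(signal=signal,
                                         ftype='FIR',
                                         band='bandpass',
                                         order=order,
                                         frequency=[3, 45],
                                         sampling_rate=sampling_rate)
    return filtered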
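
For readers without biosppy installed, a SciPy-only approximation of the same 3 to 45 Hz FIR band-pass is sketched below. It assumes a Hamming-windowed design via `scipy.signal.firwin`, an odd number of taps, and zero-phase filtering via `scipy.signal.filtfilt`; whether this reproduces biosppy's output sample for sample is an assumption that should be checked against `filter_ecg` on real data. The names `design_bandpass` and `filter_ecg_scipy` are hypothetical.

```python
import numpy as np
from scipy import signal


def design_bandpass(sampling_rate, low_hz=3.0, high_hz=45.0):
    """Windowed FIR band-pass with roughly 0.3 s worth of taps."""
    numtaps = int(0.3 * sampling_rate)
    if numtaps % 2 == 0:
        numtaps += 1  # use an odd filter length (assumption)
    # pass_zero=False with two cutoffs yields a band-pass response
    return signal.firwin(numtaps, [low_hz, high_hz], pass_zero=False, fs=sampling_rate)


def filter_ecg_scipy(ecg, sampling_rate):
    """Apply the FIR band-pass forward and backward for zero phase shift."""
    taps = design_bandpass(sampling_rate)
    return signal.filtfilt(taps, [1.0], np.asarray(ecg, dtype=float))
```

With only about 77 taps at 256 Hz, the transition bands are several hertz wide, so the 3 Hz lower edge is a gradual roll-off rather than a sharp cut.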

@ZaraNaSha

Thanks for your help. Could you also explain how I should save the files in order to use the function "def extract_amigos_dataset(overlap_pct, window_size_sec, data_save_path, save):"? For example, should there be one file per subject, or all subjects in one file? And how should I save the label file?
Another question about this function: why do you sort the data (line 300, data = np.sort(data))?

Best regards.

@dousocool

Hello, I have also run into difficulties with data processing. I cannot convert the original DREAMER and WESAD datasets into the format required by the model. How is your progress so far? Could you share the format of your datasets after conversion?
