Training on another sleep database #9

PaulMuadDib · 2018-10-03T13:56:24Z

Hi Simon,
I am starting to train the CNN/RNN on other sleep databases, and the 1st one is the SHHS database, but I have a doubt about the way to set the channels that are used (channels) and I did this:

sleep.available_channels
['SaO2', 'H.R.', 'EEG(sec)', 'ECG', 'EMG', 'EOG(L)', 'EOG(R)', 'EEG', 'THOR RES', 'ABDO RES', 'POSITION', 'LIGHT', 'NEW AIR', 'OX stat']
channels = {'EEG':['EEG(sec)','EEG'], 'EMG':'EMG', 'EOG':['EOG(L)','EOG(R)']} # set channels that are used: EEG(sec) = C3/A2 & EEG = C4/A1

Is it the correct way to retain the two available EEG channels 'EEG(sec)' and 'EEG' ? I ask because the script/training is running, so it has recognized/infered the EEG channel(s), but it missed the EOG channels.
Jeff

skjerns · 2018-10-03T14:15:15Z

I do not know the details of this dataset, but in principle, you can use whichever channels and channel combinations you want. I would not be surprised if it would also work with ECG (however be aware of the different sample frequency that needs to be interpolated).

PaulMuadDib · 2018-10-03T14:32:38Z

Thanks Simon. I was just wondering if the square brackets (a list) where syntaxically the correct way to retain several (available) channels like 'EEG(sec)' and 'EEG' (which presumably are extracted from a edf file header).

PaulMuadDib · 2018-10-04T18:05:36Z

I thought the scorer could be used with the two available EEG channels of the PSG (plus EOG & EMG, but data is shaped with a last dim of 3 in the SleepLoader), hence my question. (Resampling it to 100 Hz) Would it be possible to put the other available EEG channel in the 'EMG' field in the channel definition, for instance in this way : channels = { 'EEG' : 'EEG(SEC)', 'EMG' : 'EEG', 'EOG' : 'EOG(L)' } ? (in the SHHS, 'EEG(SEC)' and 'EEG' are C3/A2 and C4/A1).

skjerns · 2018-10-04T20:17:51Z

I don't quite understand what you mean?

The network can be trained with any number of channels, and any configuration of channels, but you will need to adapt the script a bit to adjust the input layer of the network. Once a network is trained, the channels are fixed and you can't input any other channels or combinations.

The already trained weights provided by this repository do necessarily require the existence of EEG/EMG and EOG.

balandongiv · 2018-10-08T08:17:55Z

If you would allow me to rephrase Paul question.
Say we have different data-set with the following setting
EEG channels: C3_A2 and C4_1A. These two channel were refered as EEG(sec) and EEG, respectively in the SHHS dataset
EMG channel: EMG SUBMENTAL
EOG channel: 'EOG HORIZONTAL'

How to set the variable channels as in run_sample

my best guess will

channels = {'EEG': ('EEG(sec)', 'EEG'), 'EMG': 'EMG SUBMENTAL', 'EOG': 'EOG HORIZONTAL'}

Update!
However, by doing so, the program failed at def check_channels(self, header):

PaulMuadDib · 2018-10-08T15:19:16Z

Sorry for bringing confusion here.

I am indeed reusing the script run_sample.py, where one EEG channel (among the two available), one EMG and one EOG channels are used.
My first question was indeed to know whether it is possible to use both EEG channels only modifying the channels definition, and I asked if the following (for 5 channels) was the correct way to do this:

channels = {'EEG':['EEG(sec)','EEG'], 'EMG':'EMG', 'EOG':['EOG(L)','EOG(R)']}

But doing this will not enable to recruit/use the two EEG channels (and the two EOG channels), it is only useful in the case where the name of a channel changes from one edf file to another. And in the SHHS database, this happens to be the case: the channel 'EEG(SEC)' is also named 'EEG2' depending on the files in the SHHS1 repository. In this case,

channels = {'EEG':['EEG(sec)','EEG2'], 'EMG':'EMG', 'EOG':'EOG(L)'}

...did the job: selecting either EEG(SEC) or EEG2 depending on the edf file. Recruiting more channels than the 3 used maybe requires to modify varied scripts (models.py, SleepLoader.py, etc...).

balandongiv · 2018-10-08T16:10:19Z

Pcode1
The modification should work and been tested

In run_sample, change channels to

  ` channels = {'EEG': ('EEG(SEC)', 'EEG'), 'EMG': 'EMG', 'EOG': 'EOG(L)'}  `

Under def check_channels change the following

                if isinstance(ch_pick, tuple):
                    s1 = bool(((set([x.upper() for x in ["%s" % x for x in (self.channels[ch])]])).intersection(set(channels))))
                    if s1 is False: raise ValueError(
                    'ERROR: Channel {} for {} not found in {}\navailable channels: {}'.format(self.channels[ch], ch,
                                                                                              filename, channels))
                    for i in ch_pick:
                        picks.append(channels.index(i.upper()))
                else:                           # Newly add: 
                    picks.append(channels.index(ch_pick.upper()))

balandongiv · 2018-10-08T16:11:27Z

SImon, may I know the following for-loop is for what.

Im unable to modify the following, because I do not know under what circumstances it will triger the for-loop. This syntax under def check_channels

            if type(ch_pick) is list:
                found = False
                for c in ch_pick:
                    if c in channels:
                        found = True
                        ch_pick = c
                        break
                if found == False:   raise ValueError(
                    'ERROR: Channel {} for {} not found in {}\navailable channels: {}'.format(self.channels[ch], ch,
                                                                                              filename,

PaulMuadDib · 2018-10-08T18:09:33Z

I think this for-loop is precisely the one which picks up 'EEG(SEC)' or 'EEG2' (when the 'EEG' dict. item is declared as a list of strings like in: channels = {'EEG':['EEG(sec)','EEG2'],.....}) from the available channels (from header), depending on the SHHS edf files...

balandongiv · 2018-10-09T02:58:21Z

Yes you are right. The for-loop was for class list. But with my proposed workaround, we can omit this for loop, since we declared multiple EEG channels as a set

PaulMuadDib · 2018-10-09T16:32:08Z

Or, if you want to retain the possibility to select ('EEG(SEC)' OR 'EEG2') AND 'EEG' like with:
channels = {'EEG': (['EEG(SEC)','EEG2], 'EEG'), 'EMG': 'EMG', 'EOG': 'EOG(L)'}
(useful for the SHHS database since you will find either 'EEG(SEC)' or 'EEG2' for the same C3-A2 channel in the edf files), maybe something like the following ? (no tested, there might be some other modifications to implement)
`

def check_channels_ext(self, header):
    channels = [c.upper() for c in header.ch_names]
    filename = os.path.basename(header.filenames[0])
    labels = []
    picks  = []
    for ch in self.channels:
        ch_pick= []
        if not self.channels[ch]: continue
        ch_to_pick = self.channels[ch]
        if type(ch_to_pick) is tuple:				# if there are several channels to pick up
            for chs in ch_to_pick:
                if type(chs) is list:
                    for c in chs: 
                        if c in channels: 
                            ch_pick.append(c)
                            break
                elif type(chs) is str:
                    if chs in channels:
                        ch_pick.append(chs)
                else:
                    raise TypeError('ERROR: The channel {} type in the channels {} to select should be a list [ch1,ch2] to select ch1 or ch2, or a string'.format(ch,ch_to_pick))
            if not len(ch_pick)==len(ch_to_pick):
                print('WARNING: Some channels of {} in {} are missing'.format(self.channels[ch], ch))
        elif type(ch_to_pick) is list:				# if there is one channel to pick between several ones
            for chs in ch_to_pick: 
                if chs in channels: 
                    ch_pick = c
                    break
        else:						        # if there only one channel to pick up
            if ch_to_pick.upper() in channels:  ch_pick = ch_to_pick
			
        if len(ch_pick)==0:   raise ValueError('ERROR: None of the channels {} for {} are found in {}\navailable channels: {}'.format(self.channels[ch], ch, filename, channels))

        picks.append(channels.index(ch_pick.upper()))
        labels.append(ch)

`

PaulMuadDib · 2018-10-09T16:54:18Z

Btw, the best model for the 50 males between 38 and 58 yrs drawn from the 200 cases of SHH1 only reaches:
fold 6: val acc/f1: 0.80947/0.73136, test acc/f1: 0.76639/0.68004

using both EEG channels (EEG(SEC) and EEG), no EMG, and EOG(L)-EOG(R)
using only a slim model (models.cnn3adam_filter_morel2_slim)(since the 4 Go of VRAM my of GTX980M GPU are not sufficients on this database, although it ran smoothly for the EDFx database) with these parameters:

`

kernel_length = 100
output_filters= 500
...
model.add(Conv1D (kernel_size = (kernel_length), filters = output_filters, strides=10,...				  
# model.layers[0].set_weights([np.array(kernel).reshape(kernel_shape), bias])      # initialize input layer weights with a dict. of atoms (wavelets waveforms)
model.add(Conv1D (kernel_size = (10), filters = 300, strides=2,...
model.add(Conv1D (kernel_size = (5), filters = 200, strides=1,....
model.add(Dense (1200, activation='relu', kernel_initializer='he_normal',name='fc1'))	
model.add(Dense (1200, activation='relu', kernel_initializer='he_normal',name='fc2'))

`

balandongiv · 2018-10-10T05:14:36Z

Seem not the best performance you have there. Are you using the pre-trained parameter?.

skjerns · 2018-10-11T06:32:00Z

Congratulations on getting it work!

Regarding the accuracy: There are a multitude of things that can be held responsible here: Using a different architecture, different data set, not using all channels. Additionally, I found that each dataset has it's own hyperparameters that achieve optimal performance: Changing the learning rate, stronger/weaker regularization, using a different initialization, etc. Sometimes it turns out that just the random seed was at fault for not reaching a high performance. It's a gamble.
If the scoring within the dataset is consistent (inter-rater reliability), it will certainly be possible to approximate this inter-rater score.

Unfortunately neural networks are still notoriously difficult to debug.

skjerns · 2018-10-12T11:37:31Z

Please be aware, that as far as I remember (correct me if I'm wrong), this data set cuts off all EEG data at 120uV,which is a large part of slow oscillations and arousals etc. For this reason I also did not use it for my study.

PaulMuadDib · 2018-10-13T17:43:56Z

I was not aware that the dynamical range of the EEG channels was limited to 120µV.
Last week, I used Spinky to see whether it could produce useful features for AutoSleepScorer (to input in its rnn part, along with the last cnn layer signals) and did not see clipping in the EEG channels. But clipping should be easily detectable when we load the data, I will check this.

PaulMuadDib · 2018-10-13T17:55:04Z

Using a balanced training, or not, made a difference... would it make sense to you the more I select similar individuals (gender, age, bmi, %rem, etc...) in this database, the harder it is to train on the selected cohort & to produce good scoring stats ? I was really expecting the opposite.

skjerns · 2018-10-15T07:55:24Z

The higher your variance in training data, the higher your generalization will be. If you select only very similar individiuals both for testing and training, you should get high accuracies.

Be aware, that each scorer has a large bias (read this recent study). With a bit of bad luck the scorer of the data set you're using has a more extreme bias and generalization will not work very well, as we're basically learning to copy his scoring style.

PaulMuadDib · 2018-10-27T11:44:03Z

Yes Simon, thank you for this reference, the quality of the human scoring on which the automatic scorer is training maybe a main driver for this... but difficult to assess for the varied sleep databases (edfs, shhs, mass, etc) commonly used for machine learning ? Not being an expert in sleep research, I confess that in a first approximation, I would tend to look in the publications or in simulation how easily a given learning structure trains & reaches good training/testing accuracies on the varied sleep databases... but maybe this is more complex than that.

skjerns · 2018-10-27T12:33:13Z

The biggest problem in my opinion is, that there is no Gold standard and that current sleep staging is just a sub-optimal practice kept alive since the 60s. Probability scores calculated from several raters is the closest we can get to a gold standard, but that's extremely expensive.

PaulMuadDib · 2018-11-16T12:47:50Z

The EDFx sleep database is the one scored with the largest number (4 of them) of human experts ?

skjerns · 2018-11-17T13:12:20Z

No, see here for details: https://physionet.org/pn4/sleep-edfx/

If you find a database with 4 scorers per record, let me know.

PaulMuadDib · 2018-11-20T19:12:42Z

I misunderstood the table 1 p 15 of your thesis.
I just got a very strange result on the same SHHS data with a spiking neural network (based on bindsnet) trained with STDP on the spectrograms (rapidly reaching a 100% accuracy, 99% f1 value on 250x30s batches, and a very nice confusion matrix on the last 2500x30s)... I need to recheck these results and see how they transfer on the test signals but:

skjerns · 2018-11-21T11:49:33Z

100% accuracy is certainly overfitting. With sleep data, if you get past 90% accuracy on your training set, you can be sure that you were overfitting. Also numbers on the training set are meaningless. A neural network of sufficient size will always be able to reach 100% accuracy on the training set, simply by "memorizing" the input. This is why you need to report on the validation or even test set.

What's your reasoning for using bindsnet? Interesting approach.
Nice to see you are working with this, keep on going!

PaulMuadDib · 2018-11-21T12:37:25Z

Even if the same accuracy / f1 value is obtained on testing data, the remaining ~30% (4000 30s frames) of the dataset ? (14 000 30sec frames). Overfitting also popped into my mind but these accuracies were reached very early (at 5-6000 frames) in the training (10 000 30sec frames): the SNN hasn't learnt anything from nearly half of the training dataset and was basically already (during the training phase) in a predictive mode (not learning anything). The 50 individuals drawn from the SHHS are very similar (age, BMI, no to mild apnea,...). With the AutoSleepScorer (with 2D convs and a larger sequence = 12 for the RNN) on the two EEG & EOG, I have never exceeded 86% of accuracy & 76% of F1 value, so reaching above 99% (rounded to 100) in less than 1 hour of CPU looked very suspicious to me, also knowing the nature of the dataset itself.
This SNN is very simple, it is based on Diehl & Cook 2015 implementation on Bindsnet and has been used on the MNIST database originally: there are 3 neurons populations: an input/coding population which encodes each time-frequency bin of the spectrogram (of the sum of EEG channels) into a Poissonian spike train, an excitatory neurons population (1000) and an inhibitory population (1000) which are interacting (mutually connected). Only the connections between the input/coding population (100x100) and the excitatory population (1000) are learnt through STDP. I am going to go through this spiking code to see what I may be missing here, to test it on a more general population, to slim it to see how performance evolves, and to see if I can get interpretable stuff from its weights (how it discriminates the spectrograms parts).

skjerns · 2018-11-21T13:56:42Z

If this is on hold out data I assume there is some error in your setup.
Reaching 100%, especially with sleep data, is close to impossible, due to the limit of the low the inter- and intra-rater-reliability. I'd look at everything above 90% with high suspicion. You will not reach higher than that on AASM sleep staged data (staged by humans).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Training on another sleep database #9

Training on another sleep database #9

PaulMuadDib commented Oct 3, 2018 •

edited

Loading

skjerns commented Oct 3, 2018

PaulMuadDib commented Oct 3, 2018

PaulMuadDib commented Oct 4, 2018

skjerns commented Oct 4, 2018

balandongiv commented Oct 8, 2018 •

edited

Loading

PaulMuadDib commented Oct 8, 2018 •

edited

Loading

balandongiv commented Oct 8, 2018 •

edited

Loading

balandongiv commented Oct 8, 2018

PaulMuadDib commented Oct 8, 2018

balandongiv commented Oct 9, 2018 •

edited

Loading

PaulMuadDib commented Oct 9, 2018

PaulMuadDib commented Oct 9, 2018 •

edited

Loading

balandongiv commented Oct 10, 2018

skjerns commented Oct 11, 2018 •

edited

Loading

skjerns commented Oct 12, 2018

PaulMuadDib commented Oct 13, 2018

PaulMuadDib commented Oct 13, 2018

skjerns commented Oct 15, 2018 •

edited

Loading

PaulMuadDib commented Oct 27, 2018 •

edited

Loading

skjerns commented Oct 27, 2018

PaulMuadDib commented Nov 16, 2018

skjerns commented Nov 17, 2018

PaulMuadDib commented Nov 20, 2018 •

edited

Loading

skjerns commented Nov 21, 2018 •

edited

Loading

PaulMuadDib commented Nov 21, 2018 •

edited

Loading

skjerns commented Nov 21, 2018 •

edited

Loading

Training on another sleep database #9

Training on another sleep database #9

Comments

PaulMuadDib commented Oct 3, 2018 • edited Loading

skjerns commented Oct 3, 2018

PaulMuadDib commented Oct 3, 2018

PaulMuadDib commented Oct 4, 2018

skjerns commented Oct 4, 2018

balandongiv commented Oct 8, 2018 • edited Loading

PaulMuadDib commented Oct 8, 2018 • edited Loading

balandongiv commented Oct 8, 2018 • edited Loading

balandongiv commented Oct 8, 2018

PaulMuadDib commented Oct 8, 2018

balandongiv commented Oct 9, 2018 • edited Loading

PaulMuadDib commented Oct 9, 2018

PaulMuadDib commented Oct 9, 2018 • edited Loading

balandongiv commented Oct 10, 2018

skjerns commented Oct 11, 2018 • edited Loading

skjerns commented Oct 12, 2018

PaulMuadDib commented Oct 13, 2018

PaulMuadDib commented Oct 13, 2018

skjerns commented Oct 15, 2018 • edited Loading

PaulMuadDib commented Oct 27, 2018 • edited Loading

skjerns commented Oct 27, 2018

PaulMuadDib commented Nov 16, 2018

skjerns commented Nov 17, 2018

PaulMuadDib commented Nov 20, 2018 • edited Loading

skjerns commented Nov 21, 2018 • edited Loading

PaulMuadDib commented Nov 21, 2018 • edited Loading

skjerns commented Nov 21, 2018 • edited Loading

PaulMuadDib commented Oct 3, 2018 •

edited

Loading

balandongiv commented Oct 8, 2018 •

edited

Loading

PaulMuadDib commented Oct 8, 2018 •

edited

Loading

balandongiv commented Oct 8, 2018 •

edited

Loading

balandongiv commented Oct 9, 2018 •

edited

Loading

PaulMuadDib commented Oct 9, 2018 •

edited

Loading

skjerns commented Oct 11, 2018 •

edited

Loading

skjerns commented Oct 15, 2018 •

edited

Loading

PaulMuadDib commented Oct 27, 2018 •

edited

Loading

PaulMuadDib commented Nov 20, 2018 •

edited

Loading

skjerns commented Nov 21, 2018 •

edited

Loading

PaulMuadDib commented Nov 21, 2018 •

edited

Loading

skjerns commented Nov 21, 2018 •

edited

Loading