numpy hypnogram format #15

Closed
alexblnn opened this issue Mar 11, 2021 · 7 comments

Comments

@alexblnn

Hi and sorry to bother you again Mathias!

I can't get training to start with numpy hypnograms (in my case, they hold binary labels).

Each of my np arrays contains 3 sub-arrays of equal length corresponding to start, duration and label (I tried to follow the conventions mentioned in utime/hypnogram/formats.py), but I get an error when trying to train:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/utime/dataset/sleep_study.py", line 595, in load
    self._load()
  File "/opt/conda/lib/python3.7/site-packages/utime/dataset/sleep_study.py", line 555, in _load
    sample_rate=header["sample_rate"])
  File "/opt/conda/lib/python3.7/site-packages/utime/io/high_level_file_loaders.py", line 106, in load_hypnogram
    sample_rate=sample_rate)
  File "/opt/conda/lib/python3.7/site-packages/utime/io/hypnogram/hyp_extractors.py", line 144, in extract_hyp_data
    ann_to_class=annotation_dict
  File "/opt/conda/lib/python3.7/site-packages/utime/hypnogram/utils.py", line 280, in sparse_hypnogram_from_ids_format
    ann_class_ints = [ann_to_class[a] for a in annotations]
  File "/opt/conda/lib/python3.7/site-packages/utime/hypnogram/utils.py", line 280, in <listcomp>
    ann_class_ints = [ann_to_class[a] for a in annotations]
KeyError: 0.0
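
For readers following along: the failing line is a plain dictionary lookup, so the error means the label 0.0 is not a key of the annotation mapping. The sketch below reproduces that with a made-up mapping. The keys "W" and "N1" are hypothetical and are not taken from utime or from this dataset.

```python
import numpy as np

# Hypothetical annotation-to-class mapping; the real one comes from the
# dataset configuration and is not shown in this thread.
ann_to_class = {"W": 0, "N1": 1}

# Float labels, as in the arrays described above.
annotations = np.array([0.0, 1.0, 0.0])

missing = None
try:
    # Same expression as the line in the traceback.
    ann_class_ints = [ann_to_class[a] for a in annotations]
except KeyError as err:
    # The first label that is not a key of the mapping ends up here.
    missing = err.args[0]
    print("KeyError:", missing)
```

Any label value that is absent from the mapping fails this way, whether it is a float or an integer.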

Am I formatting my data wrong?

Best regards,

@perslev
Owner

perslev commented Mar 12, 2021

Hi! Sorry I have been busy and completely forgot to return to you. Please keep letting me know if you experience issues.

The numpy format is a bit special, and this is not well documented. It expects a flat, dense array of stages: one integer stage label for each period/segment of your input. For instance, if your input signal is 10 minutes long and your period length is 30 seconds, then the npz file should store just a single array of shape [20,]. Finally, the data type must be integer, not float. Have a look at:

def extract_from_np(file_path, period_length_sec, sample_rate):

and

def ndarray_to_ids_format(array, period_length_sec, sample_rate):
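
As an illustration of the dense format described above, here is a minimal sketch that expands (start, duration, label) events into one integer label per period. The helper `to_dense_stages` is hypothetical and not part of utime. It assumes the events are contiguous, begin at t=0, and have durations that are whole multiples of the period length.

```python
import numpy as np

def to_dense_stages(starts, durations, labels, period_length_sec):
    """Expand (start, duration, label) events into one int label per period.

    Hypothetical helper, not part of utime. Assumes contiguous events that
    begin at t=0 and last a whole number of periods each.
    """
    starts = np.asarray(starts, dtype=float)
    durations = np.asarray(durations, dtype=float)
    order = np.argsort(starts)
    starts, durations = starts[order], durations[order]
    labels = np.asarray(labels)[order]

    # Number of periods covered by each event.
    reps = np.rint(durations / period_length_sec).astype(int)
    if not np.allclose(reps * period_length_sec, durations):
        raise ValueError("durations must be multiples of period_length_sec")

    # Each event must start where the previous one ended.
    expected_starts = np.concatenate([[0.0], np.cumsum(durations)[:-1]])
    if not np.allclose(starts, expected_starts):
        raise ValueError("events must be contiguous and start at 0")

    # Cast to integer, since the dense format expects integer stage labels.
    return np.repeat(labels.astype(np.int64), reps)

# 10 minutes of signal with 30-second periods gives 20 labels.
dense = to_dense_stages(
    starts=[0, 300], durations=[300, 300], labels=[0.0, 1.0],
    period_length_sec=30,
)
print(dense.shape, dense.dtype)
```

The resulting array can then be written to disk with `np.save` or `np.savez`, depending on which file type the dataset expects.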

Let me know if you need further help.

Cheers,
Mathias

@alexblnn
Author

alexblnn commented Mar 12, 2021

EDIT: solved by removing the "sleep_stage_annotations" part in the yaml dataset file. I've finally been able to start training :)

Hi Mathias, thanks for your quick answer!

Each of my input signals is 90 seconds long, and I have a label for each second. I have set "period_length_sec" to 1 second in the yaml dataset file. My PSG data is sampled at 100 Hz.

I modified my .npy files so that each contains a (90,) array of dtype int64, but I still get the same error. Judging by the traceback, the code doesn't call "extract_from_np" as I would expect. Maybe utime is not recognizing that my hypnograms are in the numpy format? The line [ann_to_class[a] for a in annotations] doesn't seem consistent with the numpy format.
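
For reference, the numbers above (90 seconds, 1-second periods, 100 Hz) imply 9000 samples and 90 labels per recording. A rough sanity check along those lines could look like the sketch below. Both function names are made up for illustration and are not utime functions.

```python
import numpy as np

def expected_n_periods(n_samples, sample_rate, period_length_sec):
    # Number of whole periods that fit in the signal.
    samples_per_period = int(round(sample_rate * period_length_sec))
    return n_samples // samples_per_period

def check_dense_hypnogram(hyp, n_samples, sample_rate, period_length_sec):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: checks only shape, dtype and length.
    """
    hyp = np.asarray(hyp)
    problems = []
    if hyp.ndim != 1:
        problems.append(f"expected a 1-D array, got shape {hyp.shape}")
    if not np.issubdtype(hyp.dtype, np.integer):
        problems.append(f"expected an integer dtype, got {hyp.dtype}")
    n_expected = expected_n_periods(n_samples, sample_rate, period_length_sec)
    if hyp.size != n_expected:
        problems.append(f"expected {n_expected} labels, got {hyp.size}")
    return problems

# 90 s at 100 Hz is 9000 samples; 1 s periods give 90 labels.
hyp = np.zeros(90, dtype=np.int64)
problems = check_dense_hypnogram(
    hyp, n_samples=9000, sample_rate=100, period_length_sec=1
)
print(problems or "looks consistent")
```

A check like this only validates the array itself. It says nothing about which loader the library picks for the file.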

Here is the error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/utime/dataset/sleep_study.py", line 595, in load
    self._load()
  File "/opt/conda/lib/python3.7/site-packages/utime/dataset/sleep_study.py", line 555, in _load
    sample_rate=header["sample_rate"])
  File "/opt/conda/lib/python3.7/site-packages/utime/io/high_level_file_loaders.py", line 106, in load_hypnogram
    sample_rate=sample_rate)
  File "/opt/conda/lib/python3.7/site-packages/utime/io/hypnogram/hyp_extractors.py", line 144, in extract_hyp_data
    ann_to_class=annotation_dict
  File "/opt/conda/lib/python3.7/site-packages/utime/hypnogram/utils.py", line 280, in sparse_hypnogram_from_ids_format
    ann_class_ints = [ann_to_class[a] for a in annotations]
  File "/opt/conda/lib/python3.7/site-packages/utime/hypnogram/utils.py", line 280, in <listcomp>
    ann_class_ints = [ann_to_class[a] for a in annotations]
KeyError: 0

Best regards,
Alexandre

@alexblnn
Author

alexblnn commented Mar 12, 2021

I have one last question: it seems that by default utime uses the sparse categorical cross entropy loss. Is it possible to use the sparse generalized dice loss you mentioned in the paper? My dataset is highly imbalanced (the "1" class makes up only 7% of the labels), and I believe it could be useful.

Thanks again for your help!

@perslev
Owner

perslev commented Mar 12, 2021

Great that you made it work! :)

You can use the dice loss by replacing 'SparseCategoricalCrossentropy' with 'SparseDiceLoss' in the hyperparameter file.
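
To give a feel for what a Dice-style loss on sparse integer labels computes, here is a minimal NumPy sketch. It is not utime's 'SparseDiceLoss' implementation: the function name, the `smooth` parameter, and the unweighted mean over classes are assumptions made for illustration.

```python
import numpy as np

def sparse_dice_loss(y_true, y_prob, smooth=1.0):
    """Soft Dice loss averaged over classes (illustrative sketch only).

    y_true: integer class labels, shape (n,)
    y_prob: predicted class probabilities, shape (n, n_classes)
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n_classes = y_prob.shape[-1]
    # One-hot encode the sparse labels.
    one_hot = np.eye(n_classes)[y_true]
    # Per-class overlap and per-class total mass.
    intersection = (one_hot * y_prob).sum(axis=0)
    denom = one_hot.sum(axis=0) + y_prob.sum(axis=0)
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return 1.0 - dice.mean()

y_true = np.array([0, 0, 1])
perfect = np.eye(2)[y_true]                 # predicts every label correctly
always_zero = np.tile([1.0, 0.0], (3, 1))   # always predicts class 0

loss_perfect = sparse_dice_loss(y_true, perfect)
loss_majority = sparse_dice_loss(y_true, always_zero)
print(loss_perfect, loss_majority)
```

Because each class contributes equally to the mean, missing the minority class entirely is penalized more than it would be under a per-sample average.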

@alexblnn
Author

Thanks for the tip! Oddly enough, the dice loss doesn't seem to handle the class imbalance. After a few epochs the model always predicts 0 (the majority class). Do you have an idea of what could cause this?
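
As a side note on diagnosing this: with 7% positives, a model that always predicts 0 still scores about 93% accuracy, so per-class recall is a more telling check than accuracy. The sketch below uses synthetic labels with exactly 7 positives out of 100, not data from this thread.

```python
import numpy as np

# Synthetic labels: 7 positives out of 100, mirroring the 7% imbalance.
y_true = np.zeros(100, dtype=np.int64)
y_true[:7] = 1

# A degenerate model that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = float((y_true == y_pred).mean())

# Recall per class: fraction of each true class that was predicted correctly.
recalls = {}
for cls in np.unique(y_true):
    mask = y_true == cls
    recalls[int(cls)] = float((y_pred[mask] == cls).mean())

print(accuracy, recalls)
```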

@perslev
Owner

perslev commented Mar 15, 2021

Hmm, that is strange indeed. Did you try letting it run for a bit longer to see if it picks up on the minority class? And how does the learning curve look? Is the loss decreasing at all?

@alexblnn
Author

Hi Mathias, sorry for not updating you on this earlier; I forgot about this issue. I had a problem with GCP and lost my UTime notebook, so I wasn't able to do any more testing. Anyway, thanks for your help on this and good luck in the future :)
