Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluation module #130

Merged
merged 45 commits into from
Apr 1, 2024
Merged

evaluation module #130

merged 45 commits into from
Apr 1, 2024

Conversation

remrama
Copy link
Collaborator

@remrama remrama commented Jan 1, 2023

This PR adds a new yasa.evaluation module to YASA. The purpose of the module is to offer convenience methods for evaluating/comparing sleep-staging from multiple devices or raters. See Issue #78 for more detail.

Major additions:

  • yasa.evaluation.EpochByEpochEvaluation class. This class takes in two YASA Hypnograms and returns an instance where agreement scores and comparison plots can be returned. Suggested use is from the evaluate Hypnogram method: ebe = hyp_a.evaluate(hyp_b).
  • yasa.evaluation.SleepStatsEvaluation class. This class takes in a pandas DataFrame that includes sleep statistics calculated from two different raters/devices, for multiple subjects/sessions.

Minor changes:

  • pingouin>=0.5.3 requirement. Pingouin is used for a lot of the underlying statistics, and 0.5.3 is required because SleepStatsEvaluation.plot_blandaltman uses a feature from pingouin.plot_blandaltman that was introduced in 0.5.3.
  • Minor changes to yasa.plot_hypnogram to help accommodate EpochByEpochEvaluation.plot_hypnograms. Most notably, the explicit lw keyword argument was removed and replaced with **kwargs, which can include lw but also other parameters controlling line aesthetics (e.g., ls, alpha). Note that because lw is still accepted, this will not break previous code.

Future additions (things to consider but will not be implemented in the current PR):

  • Keyword argument to jitter the overlaid hypnogram line (to avoid direct overlap when they are the same, which can be confusing). This is tricky now with the way that y-values are derived in yasa.plot_hypnogram, so avoiding for now.
  • Group EpochByEpoch analysis. There are some ways to analyze group-level Epoch-By-Epoch results (beyond getting sleep statistics and using SleepStatsEvaluation). I'm not sure if this should just be a tutorial with examples or an explicit function/class in YASA. It's mostly just aggregating individual Epoch-By-Epoch results.
  • A tutorial or function/method to output a detailed "report" rather than individual results. Currently, users can of course compile their own report from the individual outputs, but if there is truly a standardized reporting that is accepted in the field (e.g., so meta-analyses are easier), then we could spit out this exact output.

@raphaelvallat I'm posting this still in draft state, not ready for review yet. A few questions to address before finalizing:

  • Currently, the plotting code is all baked into the yasa.evaluation classes. Previously, I've been arguing for keeping all plotting code and imports to be isolated in yasa.plotting. So I'm planning to port it over there, but I want to check this is the right approach before I do this. Am I right in thinking they should stay separate or am I overthinking that? And if all the plotting code is isolated in yasa.plotting, it seems like those functions should have leading underscores and be removed from the API docs (the docs would in the class methods instead). This could be how the evaluation plots start, and yasa.plot_hypnogram could be converted to yasa._plot_hypnogram later in v0.8. But again, I'd like a second opinion on whether this is best.
  • Some terminology I'm not totally satisfied with: test vs reference (ref) devices? true vs pred is more consistent with scikit-learn, but "true" is a bit of a misnomer in the sleep staging case. Could also just be a vs b. Probably overthinking it, I just know we're kinda locked in to whatever we start with.

@remrama remrama added the enhancement 🚧 New feature or request label Jan 1, 2023
@remrama remrama linked an issue Jan 1, 2023 that may be closed by this pull request
@remrama remrama self-assigned this Jan 1, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 1, 2023

Codecov Report

Attention: Patch coverage is 18.47291% with 331 lines in your changes are missing coverage. Please review.

Project coverage is 83.90%. Comparing base (776a257) to head (b2e3624).

Files Patch % Lines
yasa/evaluation.py 14.06% 330 Missing ⚠️
yasa/hypno.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   92.53%   83.90%   -8.64%     
==========================================
  Files          23       24       +1     
  Lines        3108     3498     +390     
==========================================
+ Hits         2876     2935      +59     
- Misses        232      563     +331     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@raphaelvallat raphaelvallat self-requested a review January 1, 2023 03:53
@raphaelvallat
Copy link
Owner

Just scrolled through the new files very quickly, this looks incredible!! Can't wait to try it and review in details in the next few days.

To answer your questions:

  1. Hmm, I actually might vote for including the new plotting functions in the evaluation.py file. I worry that creating alias for the functions with leading underscore in plotting.py might be a bit confusing code-wise. But if you feel strongly about moving everything matplotlib-related to plotting.py then let's do it.
  2. You're right — we need to choose carefully now. I think test and reference is a good choice. Menghini et al have used reference and device in their original R pipeline, but I like test better than device.

@remrama
Copy link
Collaborator Author

remrama commented Jan 1, 2023

I actually might vote for including the new plotting functions in the evaluation.py file.

Agreed 👍

I think test and reference is a good choice. Menghini et al have used reference and device in their original R pipeline, but I like test better than device.

Agreed. device is a bit misleading because it assumes wearables, but this module is in no way restricted to that case.


Some more questions, and then I'll just leave any more for the real review:

  • I'm thinking I should convert these to "Results" classes, just to match your yasa.detection module convention. Agreed?
  • I'm adding the more formal @property and __repr__/__str__, is that overkill or appropriate here?
  • The EpochByEpoch class starts by trimming to the shorter hypnogram. It's slightly cumbersome code and there are probably going to be other needs for that. I don't wanna overpopulate yasa.Hypnogram but do you think there should be a .trim method that takes a new n_epochs and returns a new Hypnogram instance that is otherwise identical? Alternatively, there could be a .trim_to that could take another Hypnogram and return two Hypnograms matched in n_epochs (this would handle the "find Hypnogram with minimum length" part, but less generalizable).

@raphaelvallat
Copy link
Owner

  1. I'm thinking I should convert these to "Results" classes, just to match your yasa.detection module convention.

Yep sounds good

  1. I'm adding the more formal @Property and repr/str, is that overkill or appropriate here?

Appropriate! I would only add the @property decorator to attributes that are useful to the users. Decorated attributes will be automatically included in the Sphinx documentation and we don't want to end up with an endless list of attributes.

  1. The EpochByEpoch class starts by trimming to the shorter hypnogram.

I actually wanted to tell you that this should be the responsibility of the users. I wouldn't trim at all, just return an error/warning if the hypnogram length and index do not match, especially if the index of the Hypnogram is a pandas.RangeIndex. I guess we could trim/join if the index of the two hypnograms are pd.DatetimeIndex

  1. Do you think there should be a .trim method that takes a new n_epochs and returns a new Hypnogram instance that is otherwise identical?

Nope. I think it's easy enough for the users to regenerate a new Hypnogram after using hyp.hypno.loc[].

yasa/evaluation.py Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
@raphaelvallat raphaelvallat self-requested a review January 2, 2023 22:41
@remrama
Copy link
Collaborator Author

remrama commented Jan 2, 2023

@raphaelvallat I'm still in draft mode! I've been making major changes the last few days. Please wait until I leave draft mode before a review, otherwise I'm afraid this will be waste of your time. Although if you're in a rush to merge this, then also let me know and I'll take quicker steps. Otherwise I except to have the PR finalized and ready for review by tomorrow end of day.

@raphaelvallat
Copy link
Owner

Oh shoot sorry about that! I'll be travelling for the next two days and unable to review but I can review later this week. No rush at all on the PR though!

@remrama
Copy link
Collaborator Author

remrama commented Jan 3, 2023

No worries! It would just suck if you put in lots of effort on old code. No time wasted so far though, your current comments are still useful and relevant so I'll incorporate them. Thanks! 👍

@remrama
Copy link
Collaborator Author

remrama commented Dec 23, 2023

@raphaelvallat, I just pushed some old (and some new) commits. I'm catching up and getting it ready for review. The EpochByEpoch part is done but still working out the SleepStats class, the latter of which is not functional right now. A few more commits and it should be ready for review. I'm with you in publishing the next YASA version over winter break 👍

@raphaelvallat
Copy link
Owner

Hi @remrama ! Unfortunately I was sick for my entire week of vacation during the winter break and did not have a chance to work on the open-source libraries at all (and work has been crazy ever since I got back) 😞 Are you still planning on adding more commits soon?

@raphaelvallat
Copy link
Owner

Can you please rebase to the latest master (version 0.6.4)? Let me know if you need help resolving conflicts.

@remrama
Copy link
Collaborator Author

remrama commented Jan 16, 2024

Same situation for me @raphaelvallat -- out sick for the end of break and then lots of stuff going on at work upon return. But yes, more commits coming soon. I will try to rebase as well.

docstrings examples

plot_hypnogram lw --> linekwargs

heatmap colorbar label

pass kwargs through to all pingouin calls

docstrings examples update

setting attrs, docstrings, var name changes

SleepStatsEval takes 2 dataframes as input, reshaping is done internally

EpochByEpoch accepts sequences of Hypnograms for group evaluation

EpochByEpoch gets sleep stats

SleepStats move statistical tests to __init__()

better plotting flexibility and baked-in sleepstats_order
use hyp integers instead of strings for big skm agreement speed improvements (~2s)

fmt

quick comment addresses

mad

typo

pd.crosstab --> skm.confusion_matrix
minor

normality bug

black fmt makes my pandas chains lonnggggg

diff_data --> discrepancies

trailing not leading _kwargs
class methods alphabetical order
@remrama remrama marked this pull request as ready for review February 12, 2024 14:22
@remrama
Copy link
Collaborator Author

remrama commented Feb 12, 2024

@raphaelvallat this is available to review at your convenience. There are still some things to do, but the core structure is done so I think we could move forward.

After review, I'll still need to:

  • Add Bland-Altman plots
  • Test accuracy
  • Update pytests
  • Update CHANGELOG

Copy link
Owner

@raphaelvallat raphaelvallat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What an absolute beast of a PR @remrama ! I read through the entire code and was really impressed by its quality, efficiency and extensive documentation. I have not yet tried to run the full pipeline but I trust that it is working as expected. I think that we should merge this PR now for the sake of simplicity, and then you and I can work on finishing the tasks that you listed in the PR description and adding an extensive tutorial in future PRs.

Once these are done, I think you need to publicize this amazing work (and get some recognition for it!), e.g. as a Twitter thread and/or as a blog post on the NSRR website.

I've included minor comments but none of these are blockers for merging the current PR.

Thanks again, really appreciate all this work. This is going to be a huge contribution to the sleep research/data science ecosystem.

PS: Can you please have a look into the conflicts on plotting.py?

PS2: Please run black . -l 100 with black>=24.1.1

yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Outdated Show resolved Hide resolved
yasa/evaluation.py Show resolved Hide resolved
yasa/evaluation.py Show resolved Hide resolved
@remrama
Copy link
Collaborator Author

remrama commented Feb 26, 2024

Merci beaucoup @raphaelvallat, the compliments are really appreciated.

I like your plan of merging and then working on the last bits in separate PR(s). I think it's safest to open up one issue that has all the tasks that must happen before v0.7 (e.g., testing), and then independent issues for things that are ideal but not as necessarily required (e.g., bland-altman plots). Let me know if you would like it implemented a different way.

There's one other thing I'd like to get nailed down before merging this: module name. We started with evaluation, but then I was thinking agreement is nicer because it more implies the evaluation of two scorers against each other. Other options are assessment or, as sklearn handles it, metrics. I assume the classes should match the module name (e.g., evaluate.EpochByEpochEvaluation ), but maybe leaving the module and class names separate (e.g., evaluation.EpochByEpochAgreement) makes sense if you envision other "evaluation" features being part of this module down the road). Let me know your preference and I'll make sure it's consistent before merging.

@raphaelvallat
Copy link
Owner

@remrama I have no strong preference between evaluation.EpochByEpochAgreement or agreement.EpochByEpochAgreement, so if you prefer the latter please go ahead with it!

I think it's safest to open up one issue that has all the tasks that must happen before v0.7 (e.g., testing), and then independent issues for things that are ideal but not as necessarily required (e.g., bland-altman plots).

Sounds great 👍

@remrama
Copy link
Collaborator Author

remrama commented Apr 1, 2024

PS: Can you please have a look into the conflicts on plotting.py?

I'm not sure where the conflict started, but I needed to be able to pass **kwargs into yasa.plot_hypnogram for drawing the two overlaid hypnograms in EpochByEpochAgreement. I think it's resolved now. The only change is that lw is not a specific keyword argument now, but it will pass through to the relevant matplotlib functions just the same as a **kwarg, so there should be no breaks in previous code.

@remrama remrama merged commit c734753 into raphaelvallat:master Apr 1, 2024
13 of 15 checks passed
@remrama remrama deleted the evaluation branch April 1, 2024 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🚧 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add standardized framework for wearable validation
3 participants