
Spindles detection failing for some users #107

Closed
raphaelvallat opened this issue Nov 29, 2022 · 25 comments
Assignees
Labels
bug 💥 Something isn't working URGENT ❗ To fix ASAP

Comments

@raphaelvallat
Owner

As reported in #102 and #104, the yasa.spindles_detect function seems to be suddenly failing for some users. I am not able to reproduce the error.

Minimal (reproducible) example

Data here: data_N2_spindles_15sec_200Hz.txt

import numpy as np
import pandas as pd
import mne
import numba
import scipy
import yasa
import antropy
from platform import python_version

print("Python", python_version())
print(f"YASA {yasa.__version__}")
print(f"MNE {mne.__version__}")
print(f"NumPy {np.__version__}")
print(f"Pandas {pd.__version__}")
print(f"Numba {numba.__version__}")
print(f"SciPy {scipy.__version__}")
print(f"Antropy {antropy.__version__}")

sf = 200
data = np.loadtxt('data_N2_spindles_15sec_200Hz.txt')
sp = yasa.spindles_detect(data, sf)
sp.summary()

My output ✅

[screenshot of output]

@remrama @chensnbetter @meniua @apavlo89 can you please run this minimal example and copy-paste a screenshot of the output here?

If you're getting the error, can you please try to run this notebook and let us know for any useful warnings/error?

Let's solve this! Thanks all

@raphaelvallat raphaelvallat added bug 💥 Something isn't working URGENT ❗ To fix ASAP labels Nov 29, 2022
@raphaelvallat raphaelvallat self-assigned this Nov 29, 2022
@meniua

meniua commented Nov 30, 2022

I ran this example and the errors are as follows:
[screenshot of errors]

@chensnbetter

[screenshots: test1, test2]

@chensnbetter

And I use YASA in a Python(3.10.6) environment.
[screenshot: test3]

@raphaelvallat
Owner Author

🤔 I really don't get it. I have tried updating my Pandas, NumPy, SciPy, and Numba, and re-installing YASA, and it's still working fine on my computer. The only difference I see is that you are running Python 3.9+ and I'm running Python 3.8, but I really don't see how that could affect YASA, to be honest. Has anyone else been able to successfully run the example with Python 3.9+?

@raphaelvallat
Owner Author

Code is still working fine after updating to Python 3.10.6 🤯 And I have the exact same Python package versions as your 3.10.6 environment @chensnbetter.

[screenshot of output]

Are you both using Windows btw? Is this bug specific to Windows?

@remrama
Collaborator

remrama commented Nov 30, 2022

Works for me on Windows 🤷

Very mysterious...

@remrama
Collaborator

remrama commented Nov 30, 2022

I wonder if someone who is getting the error could mess around with some of the detection parameters to see if that helps identify where the detection is failing. Sorry I can't try it myself, I'm just not able to reproduce the initial problem.

@chensnbetter

Thanks. I'm using windows.

@remrama
Collaborator

remrama commented Dec 1, 2022

Ok I think I found the problem. I was able to reproduce it on my dusty Desktop. It has to do with the moving correlation thresholding, and I know this sounds insane but I think it depends on a tiny difference in Windows version.

Before I go into detail can some people help verify something for me? Please show your output from this slightly modified version of the original sample code.

import sys
import platform
import numpy as np
import pandas as pd
import mne
import numba
import scipy
import yasa
import antropy

print("System", platform.platform())
print("Python", sys.version)
print(f"YASA {yasa.__version__}")
print(f"MNE {mne.__version__}")
print(f"NumPy {np.__version__}")
print(f"Pandas {pd.__version__}")
print(f"Numba {numba.__version__}")
print(f"SciPy {scipy.__version__}")
print(f"Antropy {antropy.__version__}")

sf = 200
data = np.loadtxt('data_N2_spindles_15sec_200Hz.txt')

# I expect this to work for everyone.
print("=" * 10, "Run with no moving correlation threshold.", "=" * 10)
sp = yasa.spindles_detect(data, sf, thresh={"corr": None}, verbose=True)

# I expect this to still break (no spindles found) for a subset of users.
print("=" * 10, "Run with moving correlation threshold set to default 0.65.", "=" * 10)
sp = yasa.spindles_detect(data, sf, verbose=True)

[screenshot of output]

@chensnbetter

[screenshots: test1, test2]

@remrama
Collaborator

remrama commented Dec 2, 2022

Alright! Thank you so much @chensnbetter! Sorry about all the requests, but I appreciate you running and sharing all these outputs; it's really helpful. We still need to work out the details of this bug, but for now, setting thresh={"corr": None} inside yasa.spindles_detect should detect spindles for you and get you through the tutorials. Just change this one line:

# Apply the detection using yasa.spindles_detect
# sp = yasa.spindles_detect(data, sf)  # Old
sp = yasa.spindles_detect(data, sf, thresh={"corr": None})  # New

Thanks again for your output message. I'm now fairly sure it was not a Windows versioning issue after all (so I guess I was wrong about that): spindles are detected on my machine, which has the same Windows version as yours (10.0.19044). So what is it?

Moving correlation thresholding is the problem

cc @raphaelvallat

The problem comes from the corr thresholding, probably from the underlying yasa.numba._corr function? Or maybe some of the MNE filtering. Wherever it is, I suspect this has to do with some seriously small differences in floating-point operations or rounding, but I'm not sure (could be something weird af like this obscure thread).

Turning on YASA's verbosity gives us insight. On a working system, keeping the default corr threshold at 0.65, you get the following message, generated from line 885:

01-Dec-22 12:39:53 | INFO | N supra-theshold moving corr = 345

But on a system that detects no spindles, you will get this:

01-Dec-22 12:39:53 | INFO | N supra-theshold moving corr = 0

This message is generated by counting all values of the mcorr array that are greater than or equal to the corr threshold (the count should be 345, but it's 0 on systems that find no spindles). mcorr is returned by yasa.moving_transform, which is likely where the problem is. Note that it's specific to setting method="corr"; the other threshold methods don't give the error, so it might occur within the numba _corr() call. Maybe some floating-point discrepancies across systems occurring in there? I have no numba experience, so I'm not totally sure how to approach this further, but I'm happy to take hints.
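For readers following the thread, the thresholding step described here boils down to a supra-threshold count like this (illustrative numbers only; in YASA, mcorr is produced by yasa.moving_transform):

```python
import numpy as np

# Hypothetical moving-correlation trace; in YASA this array comes
# from yasa.moving_transform(..., method="corr").
mcorr = np.array([0.10, 0.70, 0.66, 0.30, np.nan, 0.90])
thresh_corr = 0.65

# Count samples at or above the threshold. NaN compares as False,
# so an all-NaN mcorr yields a count of 0 -- the failure mode above.
idx_corr = mcorr >= thresh_corr
print("N supra-threshold moving corr =", int(idx_corr.sum()))  # → 3
```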

@remrama
Collaborator

remrama commented Dec 2, 2022

Numba is the problem. If you replace the numba-based _corr() with scipy.stats.pearsonr to calculate the correlation, spindles are found as expected.

Based on print-debugging, the breakdown occurs at r_den. Its value is NaN, so the final returned value, a division by r_den, is also NaN, and so the returned numba list of r values is all NaN.

So why is r_den = NaN on some systems but not others? I HAVE NO IDEA! The biggest mystery -- and I wouldn't believe me if I told me this -- is that the problem goes away if I add a print line right before or after the r_den calculation, and only there. I'm NOT KIDDING. You can find similar experiences on Stack Overflow (eg) where printing inside a numba function causes unexpected behavior, and the prevailing thought is that it has to do with numba's nopython compilation. Maybe using print with nopython=True causes weirdness. But no one seems to have a solution, probably because, as in our situation, the problem doesn't reproduce across systems.
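As a quick sanity check on that swap, a hand-rolled Pearson r (the same formula _corr implements) can be compared against scipy.stats.pearsonr on synthetic data (a standalone sketch, not YASA's code):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
x = rng.standard_normal(200)
y = 0.8 * x + 0.2 * rng.standard_normal(200)

# Hand-rolled Pearson correlation, mirroring what _corr computes.
xm, ym = x - x.mean(), y - y.mean()
r_manual = (xm * ym).sum() / np.sqrt((xm**2).sum() * (ym**2).sum())

# SciPy's implementation should agree to machine precision.
r_scipy, _ = pearsonr(x, y)
print(abs(r_manual - r_scipy) < 1e-9)  # → True
```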

jfc

@remrama
Collaborator

remrama commented Dec 2, 2022

The only immediate solution I can think of is hideous. You could do some kind of check on the return from _corr and, if you don't like it, use pearsonr from scipy instead. Not sure what to do exactly.

@raphaelvallat
Owner Author

Thanks so much @remrama for your in-depth investigation! I think the error was introduced in #86. Interestingly, someone opened a very similar PR on antropy recently, and their solution was not an if..else statement, but instead adding a very small value to the denominator to avoid a division-by-zero error.

Can you please try to run the code with:

import sys

import numpy as np
from numba import jit

eps = sys.float_info.epsilon

@jit("float64(float64[:], float64[:])", nopython=True)
def _corr(x, y):
    """Fast Pearson correlation."""
    n = x.size
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0, 0, 0
    for i in range(n):
        xm = x[i] - mx
        ym = y[i] - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    r_d1 = np.sqrt(xm2s)
    r_d2 = np.sqrt(ym2s)
    r_den = r_d1 * r_d2
    return r_num / (r_den + eps)

If this is not working you can just do:

return r_num / (r_den + 10e-9)
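For context, the epsilon guards against the zero-variance case, where the plain formula divides 0 by 0 and yields NaN (a standalone illustration with made-up data, not YASA code):

```python
import sys

import numpy as np

eps = sys.float_info.epsilon

x = np.full(50, 3.0)            # constant signal: zero variance
y = np.arange(50, dtype=np.float64)

xm, ym = x - x.mean(), y - y.mean()
r_num = (xm * ym).sum()                                  # 0.0
r_den = np.sqrt((xm**2).sum()) * np.sqrt((ym**2).sum())  # 0.0

# Plain formula would compute 0/0 = NaN; the eps-guarded version
# returns a clean 0.0 instead.
print(r_num / (r_den + eps))  # → 0.0
```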

@remrama
Collaborator

remrama commented Dec 6, 2022

@raphaelvallat this doesn't solve the problem :/ Still no spindles detected.

I think this issue is different from the one in Antropy, because when working properly (on most systems, it seems), the r_den values are nowhere near zero! If I understand it correctly, which I'm not sure I do.

The current behavior is still truly bizarre to me. See here: no spindles are found even with the zero-adjustment in place.

[screenshot: no_spindles_found]

But with the SAME code, except now printing r_den, everything works completely as expected... Literally adding a print brings r_den to life 🧟 🤔

[screenshot: spindles_found_when_printing]

@raphaelvallat
Owner Author

It is truly bizarre.. Do you get the same bug when setting nopython=False?

I think we'll need to somehow re-implement this function then. The issue is that the spindles detection is already quite slow, and I fear it will be even worse if we switch to np.corrcoef or scipy.stats.pearsonr... I'm out sick this week, but I can run some benchmarks next week.

@remrama
Collaborator

remrama commented Dec 9, 2022

Setting nopython=False does not help, but I think that makes sense: it only changes the fallback behavior if the initial compilation fails, and compilation isn't failing in our case (that would have raised earlier errors). What we want is to force the use of the fallback object mode. To do that, we can add forceobj=True, and that makes it work.

But the problem is, there are 100 other things that also fix the problem, and they all fix the problem independently. I've been racking my brain to try and figure out what common underlying change is going on in Numba when any of these fixes are used:

Things that fix _corr()

  • Set forceobj=True - I didn't find this too useful once I realized there were other fixes. It basically says the function works when treated as normal Python code, but the other patches show it doesn't have to be treated that way.
  • Set fastmath=True - This felt like it was going to be informative because it changes the underlying computations, so it would've made total sense. But alas, you can revert back to the default fastmath=False and each of the following changes will still fix it.
  • Set the signature to "float64(float64[::1], float64[::1])" or just remove it entirely - We're using 1d arrays that should comply with both C- and F-contiguous expectations (check with ndarray.flags), so I think the current A-contiguous signature specification is supposed to work. And it does on most systems. But not here, though you can fix the OG problem by telling numba to expect C-contiguous, or by not telling it anything at all, in which case it infers C.
  • Set error_model="numpy" - This also seemed promising. The error_model parameter changes how numba handles zero division. The default is "python", which should raise an error when any zero division is attempted, but it isn't doing that even in a simplified example. Despite no errors in the test cases, switching to error_model="numpy" fixes the OG problem.
  • Set boundscheck=True - Works with just this change.
  • Use np.arange instead of range - Yes. This change -- all alone by itself -- fixes the problem.
  • Pass xm2s, ym2s, r_num as keyword arguments - Do this instead of setting them within the function and it works. Between this and the last one, I thought maybe this was a sneaky object-mode thing, but numba tutorials constantly set variables and use range inside jitted functions.

Again, each of these is a complete patch by itself. I have a jupyter notebook with printouts of each separate fix. I didn't link it here bc it's a bit extra, but I could if it would help. While this seems great (easy fix!), it's unsettling because I have no idea what the root cause is. I'm hesitant to implement any of these: it all seems like completely unexpected behavior, with the potential for more downstream disruption. I feel like these are just a ton of red herrings. I also feel like I'm losing my mind.

@raphaelvallat
Owner Author

Wow, you went full-on into the rabbit hole 🤯

From all these solutions, I think I like np.arange best, because it will not affect performance and is straightforward, although I don't understand why it works. boundscheck=True is also good, but it might cost some precious nanoseconds.

To be clear, this is not related to one specific Numba version?

@remrama
Collaborator

remrama commented Dec 9, 2022

Yes the np.arange change seems most reasonable now that you mention it.

I haven't checked with other numba versions. I'll explore that, and if it's all good I'll submit a PR with that simple change.

@remrama
Collaborator

remrama commented Dec 10, 2022

Not a version issue; it seems to be some kind of conda environment issue for me. Though looking at previous screenshots, some users were using system Python, so I suppose that only adds to the confusion.

I'm running this test.py file:

import numpy
import scipy
import numba
import llvmlite
print(f"numpy-{numpy.__version__}")
print(f"scipy-{scipy.__version__}")
print(f"numba-{numba.__version__}")
print(f"llvmlite-{llvmlite.__version__}")

import numpy as np
from numba import jit

@jit("float64(float64[:], float64[:])")
def _corr(x, y):
    """Fast Pearson correlation."""
    n = x.size
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0, 0, 0
    for i in range(n):
        xm = x[i] - mx
        ym = y[i] - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    r_d1 = np.sqrt(xm2s)
    r_d2 = np.sqrt(ym2s)
    r_den = r_d1 * r_d2
    return r_num / r_den

xx = np.arange(1, 100, dtype=np.float64)
yy = np.arange(1, 100, dtype=np.float64)
print("Correlation =", _corr(xx, yy))

[screenshot of output]

@raphaelvallat -- want me to just change to np.arange and call it a day?

@raphaelvallat
Owner Author

@remrama really confusing indeed. Btw I realized that we can even simplify further by using zip:

for xi, yi in zip(x, y):
    xm = xi - mx
    ym = yi - my
    r_num += xm * ym
    xm2s += xm**2
    ym2s += ym**2

It seems to be even a bit faster than the previous implementation. Can you check whether you have the bug when using zip?

@remrama
Collaborator

remrama commented Dec 11, 2022

Works. Great idea. Submitting PR now 🚀

@raphaelvallat
Owner Author

Bugfix implemented in #115

Will need to release a new version of YASA soon

@raphaelvallat
Owner Author

I have just released a new version of YASA that should fix this issue. Please update YASA with pip install -U yasa.

Thanks,
Raphael

@skjerns

skjerns commented Dec 22, 2022

Wow, this was really some awesome detective work @remrama
