kdeplot raising LinAlgError("singular matrix") instead of warning #2762

Closed
Proch92 opened this issue Mar 16, 2022 · 3 comments · Fixed by #2862

@Proch92

Proch92 commented Mar 16, 2022

Example code:

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'a': [1929245168.06679]*18})
sns.kdeplot(data=df, x='a')

seaborn version 0.11.2
python version 3.8.12

Error: numpy.linalg.LinAlgError: singular matrix

Expected: UserWarning: Dataset has 0 variance; skipping density estimate. Pass 'warn_singular=False' to disable this warning.

I tried other kinds of singular input, and the singular warning implemented in 0.11.2 works as expected (for example: pd.DataFrame({'a': [0]*10}) or pd.DataFrame({'a': [0]})).
The problem seems to arise for particular floats. I tried other large floats and changed the value slightly, which resulted in different outcomes.

Another interesting sequence:
pd.DataFrame({'a': [1929245168.06679]*18}) -> error
pd.DataFrame({'a': [1929245160.06679]*18}) -> error
pd.DataFrame({'a': [1929245100.06679]*18}) -> no error and no warning (warn_singular is True)

@mwaskom
Owner

mwaskom commented Mar 16, 2022

Here is what seaborn is doing:

import math
print(observation_variance := df["a"].var(ddof=1))
print(math.isclose(observation_variance, 0))

Output:

5.48132967586363e-13
False

@mwaskom
Owner

mwaskom commented Mar 16, 2022

I don’t recall if there’s a reason this does a proactive check rather than catching the error, but catching the error may be easier.
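For illustration, the "catch the error" approach could look roughly like the sketch below. This is not seaborn's code: `fit_density`, `safe_fit_density`, and `SingularMatrixError` are hypothetical names, the toy estimator only stands in for the real KDE fit, and `SingularMatrixError` stands in for numpy.linalg.LinAlgError so that the example runs with the standard library alone.

```python
import warnings


class SingularMatrixError(ArithmeticError):
    """Hypothetical stand-in for numpy.linalg.LinAlgError."""


def fit_density(values):
    """Toy estimator (an assumption, not seaborn's code); fails on degenerate data."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    try:
        # "Inverting" a 1x1 covariance matrix.
        precision = 1.0 / variance
    except ZeroDivisionError as err:
        raise SingularMatrixError("singular matrix") from err
    return mean, precision


def safe_fit_density(values, warn_singular=True):
    """Catch the failure instead of trying to predict it with a variance check."""
    try:
        return fit_density(values)
    except SingularMatrixError:
        if warn_singular:
            warnings.warn(
                "Dataset has 0 variance; skipping density estimate.",
                UserWarning,
            )
        return None


print(safe_fit_density([1.0, 2.0, 3.0]))
print(safe_fit_density([0] * 10, warn_singular=False))
```

The point of the pattern is that the wrapper does not need to guess which variances are "too small"; it reacts only when the underlying computation actually fails.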

mwaskom added a commit that referenced this issue Jun 15, 2022
mwaskom added a commit that referenced this issue Jun 16, 2022
* Improve robustness to numerical errors in kdeplot

Closes #2762

* Avoid finally block for Python 3.7 compat

* Try to catch another edge case in tests

@markusschlenker

Here is what seaborn is doing

import math
print(observation_variance := df["a"].var(ddof=1))
print(math.isclose(observation_variance, 0) )
5.48132967586363e-13
False

Hi,
I didn't know where to best put this question, so please let me know if it is misplaced here. Sorry if this has been discussed elsewhere already.

I was wondering whether, as written here, math.isclose(observation_variance, 0) will return False even for a very small observation_variance, because by default it uses only a relative tolerance (abs_tol defaults to 0). The documentation suggests using abs_tol for comparisons near zero: https://docs.python.org/3/library/math.html#math.isclose. Was this considered? On the other hand, I guess choosing an abs_tol value is not a straightforward decision either.
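A minimal sketch of the difference, using the variance value reported earlier in this thread. The abs_tol of 1e-9 is an arbitrary value picked for illustration, not a recommendation.

```python
import math

# Variance reported earlier in this thread for the failing example.
observation_variance = 5.48132967586363e-13

# Defaults are rel_tol=1e-09 and abs_tol=0.0, so only an exact zero is "close" to 0.
print(math.isclose(observation_variance, 0))  # False
print(math.isclose(0.0, 0))                   # True

# With an absolute tolerance (1e-9 is an arbitrary illustrative choice).
print(math.isclose(observation_variance, 0, abs_tol=1e-9))  # True
```

With the defaults, the comparison against 0 is effectively an equality test, which is why the tiny nonzero variance slips past the check.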

I stumbled upon this for the following example:
sns.pairplot(pd.DataFrame([0] * 7), diag_kind="kde", diag_kws=dict(warn_singular=True)) --> no error, but warning
sns.pairplot(pd.DataFrame([0.34] * 7), diag_kind="kde", diag_kws=dict(warn_singular=True)) --> error, no warning
sns.pairplot(pd.DataFrame([0.341] * 7), diag_kind="kde", diag_kws=dict(warn_singular=True)) --> no error, no warning

I am using
Python 3.7.13
seaborn 0.11.2
