Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Explicitly ignore SparseEfficiencyWarning in DBSCAN #13539

Merged
merged 1 commit into from
Apr 1, 2019

Conversation

peay
Copy link
Contributor

@peay peay commented Mar 28, 2019

What does this implement/fix? Explain your changes.

py is a commonly used library, which has various utility subpackages, one of them being apipkg which implements lazy loading of Python modules. py is marked as a runtime dependency of more than 9K packages on PyPI, so it isn't too easy to avoid.

Importing py itself directly lazily loads certain packages like py.test. Lazy module loading is implemented by adding to sys.modules an apipkg.AliasModule, which implements __getattribute__ as:

        def __getattribute__(self, name):
            try:
                return getattr(getmod(), name)
            except ImportError:
                return None

When py.test is not installed or cannot be imported, this result in sys.modules["py.test"] (and a few others) being AliasModule instances for which module.__warningregistry__ is None (as are all attributes).

This is causing issues with sklearn. Here is an example I've been hitting although I suspect there could be other occurences:

  • The DBSCAN implementation uses ignore_warnings to silence warnings.
  • ignore_warnings in turns relies on clean_warning_registry.

This makes it impossible to run DBSCAN once py has been imported, because clean_warning_registry is calling clear() on all modules that have a __warningregistry__ attribute:

/opt/venv/python3/lib/python3.6/site-packages/sklearn/cluster/dbscan_.py in dbscan(X, eps, min_samples, metric, metric_params, algorithm, leaf_size, p, sample_weight, n_jobs)
    140 
    141         # set the diagonal to explicit values, as a point is its own neighbor
--> 142         with ignore_warnings():
    143             X.setdiag(X.diagonal())  # XXX: modifies X's internals in-place
    144 

/opt/venv/python3/lib/python3.6/site-packages/sklearn/utils/testing.py in __enter__(self)
    366         self._module.filters = self._filters[:]
    367         self._showwarning = self._module.showwarning
--> 368         clean_warning_registry()
    369         warnings.simplefilter("ignore", self.category)
    370 

/opt/venv/python3/lib/python3.6/site-packages/sklearn/utils/testing.py in clean_warning_registry()
    799             continue
    800         if hasattr(mod, reg):
--> 801             getattr(mod, reg).clear()
    802 
    803 

AttributeError: 'NoneType' object has no attribute 'clear'

This PR adds an additional check that __warningregistry__ is a dictionnary.

Any other comments?

I am not familiar enough with the project to know how we would want this to be fixed. In particular:

  • The issue really stems from py's lazily loaded modules, and one may wonder whether sklearn should be worrying about this at all. Here, the issue could also be fixed by making sure py.test can be imported.

  • However, __warningregistry__ isn't really documented either, and one could argue that sklearn should be checking whether it is a dictionnary before calling clear on it.

I'll let the maintainers decide whether this small additional check makes sense, in spite of this being a fix for a rather specific problem.

@peay peay changed the title [MRG] Check module is dict in clean_warning_registry [MRG] Check __warningregistry__ is dict in clean_warning_registry Mar 28, 2019
@jnothman
Copy link
Member

jnothman commented Mar 28, 2019 via email

@peay
Copy link
Contributor Author

peay commented Mar 29, 2019

Makes sense.

I assume we want to switch to warnings.catch_warnings/warnings.filterwarnings although I am not sure what warning we're actually silencing here. @TomDLT what do you think? Seems like this was added as part of #12105.

@jnothman
Copy link
Member

jnothman commented Mar 29, 2019 via email

@TomDLT
Copy link
Member

TomDLT commented Mar 29, 2019

My mistake. We should definitely replace ignore_warnings with warnings.catch_warnings/warnings.filterwarnings, ignoring only SparseEfficiencyWarning.

I double checked that it is the only use of ignore_warnings outside tests, examples, and benchmarks.

@peay
Copy link
Contributor Author

peay commented Apr 1, 2019

I've commented out the with clause and ran DBSCAN again (with a sparse precomputed metric) to try and figure out what the warning we're trying to silence is, but on scikit-learn==0.20.2, numpy==1.13.1, scipy==1.1.0, I don't get anything. Not sure if the with ignore_warnings() could just be removed or if there are warnings that could appear in other circumstances.

@peay
Copy link
Contributor Author

peay commented Apr 1, 2019

Oops, hadn't seen your answer @TomDLT. I will make the change.

@peay peay changed the title [MRG] Check __warningregistry__ is dict in clean_warning_registry [WIP] Check __warningregistry__ is dict in clean_warning_registry Apr 1, 2019
@peay peay changed the title [WIP] Check __warningregistry__ is dict in clean_warning_registry [MRG] Explicitly ignore SparseEfficiencyWarning in DBSCAN Apr 1, 2019
@peay peay force-pushed the utils-testing-clean-warning-reg branch from cdb1f80 to e7abd0a Compare April 1, 2019 06:07
@peay
Copy link
Contributor Author

peay commented Apr 1, 2019

I was able to reproduce the warning with csr/csc by setting include_self to False in radius_neighbors_graph - it doesn't occur if the input sparse matrix already has all its diagonal entries set since then the operation is efficient. I've pushed the new changes and renamed the PR.

@jnothman
Copy link
Member

jnothman commented Apr 1, 2019

Please add commits rather than force pushing where possible

@jnothman jnothman added this to the 0.20.4 milestone Apr 1, 2019
@TomDLT TomDLT merged commit 6216b24 into scikit-learn:master Apr 1, 2019
@TomDLT
Copy link
Member

TomDLT commented Apr 1, 2019

Thanks @peay !

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Jul 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants