Make scipy an optional dependency (#2398)
* Vendor reduced version of scipy's KDE, adapt existing KDE code

* Adapt violinplot and stripplot to not require scipy

* Remove vendored kde docstring examples

* Fix missing import

* Remove more unhelpful doctests

* Note that cumulative kdeplot requires scipy

* Make scipy optional in matrix module

* Replace scipy iqr calls and adapt old distplot code

* Standardize flag for scipy availability

* Define extra dependencies in setup file

* Rework CI workflow for new dependency strategy

* Pass extras_require to setuptools

* Remove previously deprecated utility functions

* Remove scipy from categorical tests

* Remove scipy from regression module

* Remove scipy from categorical

* Remove scipy from distribution tests

* Protect scipy-using code in matrix tests

* Remove scipy from distributions tests (again)

* Fix docs about scipy requirements

* Update installing docs

* Update release notes

* Cover some missing-scipy-triggered errors in tests

* Update README and Remove requirements file
mwaskom committed Dec 23, 2020
1 parent 0a24478 commit f1852b5
Showing 21 changed files with 618 additions and 280 deletions.
14 changes: 9 additions & 5 deletions .github/workflows/ci.yaml
@@ -25,8 +25,7 @@ jobs:
- name: Install seaborn
run: |
python -m pip install --upgrade pip
pip install `cat ci/deps_latest.txt ci/utils.txt`
pip install .
pip install .[all] -r ci/utils.txt
- name: Install doc tools
run: |
@@ -47,20 +46,24 @@ jobs:

python: [3.7.x, 3.8.x, 3.9.x]
target: [test]
install: [all]
deps: [latest]
backend: [agg]

include:
- python: 3.7.x
target: unittests
install: all
deps: pinned
backend: agg
- python: 3.9.x
target: unittests
deps: minimal
install: light
deps: latest
backend: agg
- python: 3.9.x
target: test
install: all
deps: latest
backend: tkagg

@@ -77,8 +80,9 @@ jobs:
- name: Install seaborn
run: |
python -m pip install --upgrade pip
pip install `cat ci/deps_${{ matrix.deps }}.txt ci/utils.txt`
pip install .
if [[ ${{matrix.install}} == 'all' ]]; then EXTRAS='[all]'; fi
if [[ ${{matrix.deps }} == 'pinned' ]]; then DEPS='-r ci/deps_pinned.txt'; fi
pip install .$EXTRAS $DEPS -r ci/utils.txt
- name: Cache datasets
run: python ci/cache_test_datasets.py
15 changes: 12 additions & 3 deletions README.md
@@ -29,20 +29,29 @@ Dependencies

Seaborn supports Python 3.7+ and no longer supports Python 2.

Installation requires [numpy](https://numpy.org/), [scipy](https://www.scipy.org/), [pandas](https://pandas.pydata.org/), and [matplotlib](https://matplotlib.org/). Some functions will optionally use [statsmodels](https://www.statsmodels.org/) if it is installed.
Installation requires [numpy](https://numpy.org/), [pandas](https://pandas.pydata.org/), and [matplotlib](https://matplotlib.org/). Some functions will optionally use [scipy](https://www.scipy.org/) and/or [statsmodels](https://www.statsmodels.org/) if they are available.


Installation
------------

The latest stable release (and older versions) can be installed from PyPI:
The latest stable release (and required dependencies) can be installed from PyPI:

pip install seaborn

It is also possible to include the optional dependencies:

pip install seaborn[all]

You may instead want to use the development version from Github:

pip install git+https://github.com/mwaskom/seaborn.git#egg=seaborn
pip install git+https://github.com/mwaskom/seaborn.git

Seaborn is also available from Anaconda and can be installed with conda:

conda install seaborn

Note that the main anaconda repository typically lags PyPI in adding new releases.
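
Reviewer note: the `seaborn[all]` form used above relies on setuptools "extras". The commit message mentions passing `extras_require` to setuptools, but the setup file is not shown in this view, so the following is only a minimal sketch of how such a declaration might look. The names `INSTALL_REQUIRES`, `EXTRAS_REQUIRE`, and `setup_kwargs` are hypothetical, and the version bounds are assumptions loosely based on `ci/deps_pinned.txt`.

```python
# Hypothetical sketch: declaring mandatory and optional dependencies.
# Names and version bounds are assumptions, not the project's actual setup file.
INSTALL_REQUIRES = [
    "numpy>=1.16",
    "pandas>=0.23",
    "matplotlib>=3.0",
]

EXTRAS_REQUIRE = {
    # `pip install seaborn[all]` would pull in everything listed under "all"
    "all": [
        "scipy>=1.2",
        "statsmodels>=0.9",
    ],
}


def setup_kwargs():
    """Return the dependency-related keyword arguments for setuptools.setup()."""
    return {
        "install_requires": INSTALL_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }

# In a real setup.py these would be forwarded to setuptools, for example:
#     from setuptools import setup
#     setup(name="seaborn", **setup_kwargs())
```

With a layout like this, a plain `pip install seaborn` resolves only the mandatory list, while the `[all]` suffix adds the optional libraries.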

Testing
-------
5 changes: 0 additions & 5 deletions ci/deps_latest.txt

This file was deleted.

4 changes: 0 additions & 4 deletions ci/deps_minimal.txt

This file was deleted.

2 changes: 1 addition & 1 deletion ci/deps_pinned.txt
@@ -1,5 +1,5 @@
numpy~=1.16.0
scipy~=1.2.0
pandas~=0.23.0
matplotlib~=3.0.0
scipy~=1.2.0
statsmodels~=0.9.0
21 changes: 12 additions & 9 deletions doc/installing.rst
@@ -9,12 +9,17 @@ Installing and getting started

<div class="col-md-9">

Official releases of seaborn can be installed from `PyPI
<https://pypi.org/project/seaborn/>`_::
Official releases of seaborn can be installed from `PyPI <https://pypi.org/project/seaborn/>`_:

pip install seaborn

The library is also included as part of the `Anaconda <https://repo.anaconda.com/>`_ distribution::
The basic invocation of `pip` will install seaborn and, if necessary, its mandatory dependencies.
It is possible to include optional dependencies that give access to a few advanced features:

pip install seaborn[all]

The library is also included as part of the `Anaconda <https://repo.anaconda.com/>`_ distribution,
and it can be installed with `conda`:

conda install seaborn

@@ -26,15 +31,11 @@ Supported Python versions

- Python 3.7+

Required dependencies
Mandatory dependencies
^^^^^^^^^^^^^^^^^^^^^^

If not already present, these libraries will be downloaded when you install seaborn.

- `numpy <https://numpy.org/>`__

- `scipy <https://www.scipy.org/>`__

- `pandas <https://pandas.pydata.org/>`__

- `matplotlib <https://matplotlib.org>`__
@@ -44,7 +45,9 @@ Optional dependencies

- `statsmodels <https://www.statsmodels.org/>`__, for advanced regression plots

- `fastcluster <https://pypi.org/project/fastcluster/>`__, for clustering large matrices
- `scipy <https://www.scipy.org/>`__, for clustering matrices and some advanced options

- `fastcluster <https://pypi.org/project/fastcluster/>`__, faster clustering of large matrices

Quickstart
~~~~~~~~~~
6 changes: 5 additions & 1 deletion doc/releases/v0.12.0.txt
@@ -2,4 +2,8 @@
v0.12.0 (Unreleased)
--------------------

- Following `NEP29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_, dropped support for Python 3.6 and bumped the minimally-supported versions of the library dependencies.
- Made `scipy` an optional dependency and added `pip install seaborn[all]` as a method for ensuring the availability of compatible `scipy` and `statsmodels` libraries. This has a few minor implications for existing code, which are explained in the Github pull request (:pr:`2398`).

- Following `NEP29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_, dropped support for Python 3.6 and bumped the minimally-supported versions of the library dependencies.

- Removed several previously-deprecated utility functions (`iqr`, `percentiles`, `pmf_hist`, and `sort_df`).
4 changes: 0 additions & 4 deletions requirements.txt

This file was deleted.

14 changes: 11 additions & 3 deletions seaborn/_statistics.py
@@ -26,7 +26,12 @@ class instantiation.
"""
from numbers import Number
import numpy as np
from scipy import stats
try:
from scipy.stats import gaussian_kde
_no_scipy = False
except ImportError:
from .external.kde import gaussian_kde
_no_scipy = True

from .utils import _check_argument

@@ -61,7 +66,7 @@ def __init__(
clip : pair of numbers or None, or a pair of such pairs
Do not evaluate the density outside of these limits.
cumulative : bool, optional
If True, estimate a cumulative distribution function.
If True, estimate a cumulative distribution function. Requires scipy.
"""
if clip is None:
@@ -74,6 +79,9 @@ def __init__(
self.clip = clip
self.cumulative = cumulative

if cumulative and _no_scipy:
raise RuntimeError("Cumulative KDE evaluation requires scipy")

self.support = None

def _define_support_grid(self, x, bw, cut, clip, gridsize):
@@ -129,7 +137,7 @@ def _fit(self, fit_data, weights=None):
if weights is not None:
fit_kws["weights"] = weights

kde = stats.gaussian_kde(fit_data, **fit_kws)
kde = gaussian_kde(fit_data, **fit_kws)
kde.set_bandwidth(kde.factor * self.bw_adjust)

return kde
26 changes: 17 additions & 9 deletions seaborn/categorical.py
@@ -1,18 +1,26 @@
from textwrap import dedent
from numbers import Number
import warnings
import colorsys
from functools import partial

import numpy as np
from scipy import stats
import pandas as pd
try:
from scipy.stats import gaussian_kde
_no_scipy = False
except ImportError:
from .external.kde import gaussian_kde
_no_scipy = True

import matplotlib as mpl
from matplotlib.collections import PatchCollection
import matplotlib.patches as Patches
import matplotlib.pyplot as plt
import warnings

from ._core import variable_type, infer_orient, categorical_order
from . import utils
from .utils import remove_na
from .utils import remove_na, _normal_quantile_func
from .algorithms import bootstrap
from .palettes import color_palette, husl_palette, light_palette, dark_palette
from .axisgrid import FacetGrid, _facet_docs
@@ -662,7 +670,7 @@ def estimate_densities(self, bw, cut, scale, scale_hue, gridsize):

def fit_kde(self, x, bw):
"""Estimate a KDE for a vector of data with flexible bandwidth."""
kde = stats.gaussian_kde(x, bw)
kde = gaussian_kde(x, bw)

# Extract the numeric bandwidth from the KDE object
bw_used = kde.factor
@@ -942,7 +950,7 @@ def draw_box_lines(self, ax, data, support, density, center):
"""Draw boxplot information at center of the density."""
# Compute the boxplot statistics
q25, q50, q75 = np.percentile(data, [25, 50, 75])
whisker_lim = 1.5 * stats.iqr(data)
whisker_lim = 1.5 * (q75 - q25)
h1 = np.min(data[data >= (q25 - whisker_lim)])
h2 = np.max(data[data <= (q75 + whisker_lim)])
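
Reviewer note: since the quartiles are already computed with `np.percentile`, the interquartile range is just their difference, which is why the `stats.iqr` call can be dropped here. A minimal standalone sketch of the whisker computation follows; the function name `whisker_limits` is hypothetical, and the input is assumed to contain no NaNs.

```python
import numpy as np


def whisker_limits(data):
    """Return the most extreme points within 1.5 * IQR of the quartiles."""
    data = np.asarray(data, dtype=float)
    q25, q75 = np.percentile(data, [25, 75])
    # IQR computed directly from the quartiles, no scipy needed
    whisker_lim = 1.5 * (q75 - q25)
    low = np.min(data[data >= (q25 - whisker_lim)])
    high = np.max(data[data <= (q75 + whisker_lim)])
    return low, high


# The outlier at 100 falls outside the upper fence and is excluded
low, high = whisker_limits([1, 2, 3, 4, 5, 100])
```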

@@ -1099,7 +1107,7 @@ def __init__(self, x, y, hue, data, order, hue_order,
jlim = float(jitter)
if self.hue_names is not None and dodge:
jlim /= len(self.hue_names)
self.jitterer = stats.uniform(-jlim, jlim * 2).rvs
self.jitterer = partial(np.random.uniform, low=-jlim, high=+jlim)

def draw_stripplot(self, ax, kws):
"""Draw the points onto `ax`."""
@@ -1120,7 +1128,7 @@ def draw_stripplot(self, ax, kws):

# Plot the points in centered positions
cat_pos = np.ones(strip_data.size) * i
cat_pos += self.jitterer(len(strip_data))
cat_pos += self.jitterer(size=len(strip_data))
kws.update(c=palette[point_colors])
if self.orient == "v":
ax.scatter(cat_pos, strip_data, **kws)
@@ -1138,7 +1146,7 @@ def draw_stripplot(self, ax, kws):
# Plot the points in centered positions
center = i + offsets[j]
cat_pos = np.ones(strip_data.size) * center
cat_pos += self.jitterer(len(strip_data))
cat_pos += self.jitterer(size=len(strip_data))
kws.update(c=palette[point_colors])
if self.orient == "v":
ax.scatter(cat_pos, strip_data, **kws)
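
Reviewer note: `stats.uniform(loc, scale)` covers the interval from `loc` to `loc + scale`, so `stats.uniform(-jlim, jlim * 2)` and `np.random.uniform(low=-jlim, high=+jlim)` describe the same range. Because `low` and `high` are bound by keyword in the `partial`, the call sites have to pass `size=` by keyword as well, which explains the matching changes in `draw_stripplot`. A small sketch, with `make_jitterer` as a hypothetical helper name:

```python
from functools import partial

import numpy as np


def make_jitterer(jitter, n_hue_levels=None, dodge=False):
    """Build a callable that draws symmetric uniform jitter offsets."""
    jlim = float(jitter)
    if n_hue_levels is not None and dodge:
        # Narrow the jitter so dodged hue levels do not overlap
        jlim /= n_hue_levels
    return partial(np.random.uniform, low=-jlim, high=+jlim)


jitterer = make_jitterer(0.1)
# `size` must be passed by keyword; a positional argument would collide with `low`
offsets = jitterer(size=5)
```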
@@ -1846,7 +1854,7 @@ def _lv_box_ends(self, vals):
elif self.k_depth == 'proportion':
k = int(np.log2(n)) - int(np.log2(n * p)) + 1
elif self.k_depth == 'trustworthy':
point_conf = 2 * stats.norm.ppf((1 - self.trust_alpha / 2)) ** 2
point_conf = 2 * _normal_quantile_func((1 - self.trust_alpha / 2)) ** 2
k = int(np.log2(n / point_conf)) + 1
else:
k = int(self.k_depth) # allow having k as input
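
Reviewer note: the definition of `_normal_quantile_func` lives in `seaborn/utils.py` and is not shown in this view. One plausible way to write such a helper without a hard scipy dependency is to fall back to `statistics.NormalDist` from the standard library, which assumes Python 3.8 or newer. The sketch below uses the hypothetical name `normal_quantile` and should not be read as the actual implementation.

```python
import numpy as np


def normal_quantile(q):
    """Standard normal quantile function, preferring scipy when available."""
    try:
        from scipy.stats import norm
        qf = norm.ppf
    except ImportError:
        # Assumption: Python 3.8+, where statistics.NormalDist exists
        from statistics import NormalDist
        qf = np.vectorize(NormalDist().inv_cdf)
    return qf(q)


# Same expression shape as the 'trustworthy' k_depth branch
trust_alpha = 0.05
point_conf = 2 * normal_quantile(1 - trust_alpha / 2) ** 2
```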
7 changes: 4 additions & 3 deletions seaborn/distributions.py
@@ -11,7 +11,6 @@
import matplotlib.transforms as tx
from matplotlib.colors import to_rgba
from matplotlib.collections import LineCollection
from scipy import stats

from ._core import (
VectorPlotter,
@@ -34,6 +33,7 @@
)
from .palettes import color_palette
from .external import husl
from .external.kde import gaussian_kde
from ._decorators import _deprecate_positional_args
from ._docstrings import (
DocstringComponents,
@@ -2395,7 +2395,8 @@ def _freedman_diaconis_bins(a):
a = np.asarray(a)
if len(a) < 2:
return 1
h = 2 * stats.iqr(a) / (len(a) ** (1 / 3))
iqr = np.subtract.reduce(np.nanpercentile(a, [75, 25]))
h = 2 * iqr / (len(a) ** (1 / 3))
# fall back to sqrt(a) bins if iqr is 0
if h == 0:
return int(np.sqrt(a.size))
@@ -2642,7 +2643,7 @@ def pdf(x):
gridsize = fit_kws.pop("gridsize", 200)
cut = fit_kws.pop("cut", 3)
clip = fit_kws.pop("clip", (-np.inf, np.inf))
bw = stats.gaussian_kde(a).scotts_factor() * a.std(ddof=1)
bw = gaussian_kde(a).scotts_factor() * a.std(ddof=1)
x = _kde_support(a, bw, gridsize, cut, clip)
params = fit.fit(a)
y = pdf(x)
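
Reviewer note: the Freedman-Diaconis change earlier in this file replaces `stats.iqr` with `np.subtract.reduce(np.nanpercentile(a, [75, 25]))`, which subtracts the 25th percentile from the 75th while ignoring NaNs. A self-contained sketch of the whole rule is below. The final `return` line is not visible in the hunk, so that part is an assumption, as is the use of `np.nanmax` and `np.nanmin` for the data range.

```python
import numpy as np


def freedman_diaconis_bins(a):
    """Estimate a histogram bin count with the Freedman-Diaconis rule."""
    a = np.asarray(a)
    if len(a) < 2:
        return 1
    # reduce over [q75, q25] yields q75 - q25
    iqr = np.subtract.reduce(np.nanpercentile(a, [75, 25]))
    h = 2 * iqr / (len(a) ** (1 / 3))
    # fall back to sqrt(n) bins if the IQR is 0
    if h == 0:
        return int(np.sqrt(a.size))
    # Assumed final step: number of bins of width h covering the data range
    return int(np.ceil((np.nanmax(a) - np.nanmin(a)) / h))
```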
