DOC: roadmap update #12008

Merged
merged 5 commits into from
May 17, 2020
80 changes: 52 additions & 28 deletions doc/source/roadmap-detailed.rst
@@ -65,32 +65,41 @@ Making this easier is a priority. In addition, we should run them in our CI
(gh-8779 is an ongoing attempt at this).


Other
`````

Use of Cython
`````````````
Regarding Cython code:

- It's not clear how much functionality can be Cythonized without making the
.so files too large. This needs measuring.
- Cython's old syntax for using NumPy arrays should be removed and replaced
with Cython memoryviews.

Regarding build environments:

- SciPy builds from source on Windows now with a MSVC + MinGW-w64 gfortran
toolchain, which we're using for official releases.
MSVC + Intel Fortran + MKL works as well, and is easier for users (as long
as they have access to ifort and MKL of course). This mainly needs better
documentation at the moment.
- We're aiming to gradually increase the minimum version of LAPACK that is
required, so we can use newer features. Support for Accelerate on macOS
has been dropped. We do rely quite heavily on OpenBLAS, and its stability
is a worry (often only one of the recent releases works without test
failures) - improvements in testing and build documentation at least are
needed.
Windows build issues
````````````````````
SciPy critically relies on Fortran code. This is still problematic on Windows.
There are currently only two options: using Intel Fortran, or using MSVC +
gfortran. The former is expensive, while the latter works (it's what we use
for releases) but is quite hard to do correctly. For allowing contributors and
end users to reliably build SciPy on Windows, using the Flang compiler looks
like the best way forward long-term.


Continuous integration is in good shape, it covers Windows, macOS and Linux, as well
as a range of versions of our dependencies and building release quality wheels.
Continuous integration
``````````````````````
Continuous integration is in good shape: it currently covers Windows, macOS
and Linux, including the ARM64 and ppc64le platforms, as well as a range of
versions of our dependencies and building release-quality wheels.


Size of binaries
````````````````
SciPy binaries are quite large (e.g. an unzipped manylinux wheel for 1.4.1 is
91 MB), and this can be problematic - for example for use in AWS Lambda, which
has a 250 MB size limit. We aim to keep binary size as low as possible; when
adding new compiled extensions, this needs checking. Stripping of debug symbols
in ``multibuild`` can likely be improved (see `this issue
<https://github.com/matthew-brett/multibuild/issues/162>`__).
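
One way to do the size check mentioned above is to sum the uncompressed sizes recorded in the wheel, since a wheel is a zip archive. The following is a minimal sketch using only the standard library; the function names ``unpacked_size_bytes`` and ``largest_members`` are hypothetical helpers for illustration and are not part of SciPy or ``multibuild``.

```python
import zipfile


def unpacked_size_bytes(wheel_path):
    """Return the total uncompressed size, in bytes, of all files in a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sum(info.file_size for info in wheel.infolist())


def largest_members(wheel_path, count=5):
    """Return the `count` largest archive members as (name, size) pairs."""
    with zipfile.ZipFile(wheel_path) as wheel:
        sizes = [(info.filename, info.file_size) for info in wheel.infolist()]
    # Sort by size, largest first, so oversized extension modules stand out.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]
```

Comparing the output for a wheel built before and after adding a compiled extension gives a rough measure of how much that extension contributes to the unpacked size.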


Modules
@@ -108,12 +117,8 @@ This module is basically done, low-maintenance and without open issues.

fft
````
This module is in good shape.

Ideas for new features:

- Add a backend/plugin system. At the moment pyFFTW is monkeypatching SciPy,
and ``mkl_fft`` provides ``fftpack``-compatible functions as well. We should
provide a method to support such packages.

integrate
`````````
@@ -152,9 +157,7 @@ are in good shape.

linalg
``````
``scipy.linalg`` is in good shape. We have started requiring more recent
LAPACK versions (minimum version increases from 3.1.0 to 3.4.0 in SciPy 1.2.0);
we want to add support for newer features in LAPACK.
``scipy.linalg`` is in good shape.

Needed:

@@ -168,6 +171,27 @@ Ideas for new features:
- Add type-generic wrappers in the Cython BLAS and LAPACK
- Make many of the linear algebra routines into gufuncs

**BLAS and LAPACK**

The Python and Cython interfaces to BLAS and LAPACK in ``scipy.linalg`` are one
of the most important things that SciPy provides. In general ``scipy.linalg``
is in good shape; however, we can make a number of improvements:

1. Library support. Our released wheels now ship with OpenBLAS, which is
currently the only feasible performant option (ATLAS is too slow, MKL cannot
be the default due to licensing issues, Accelerate support is dropped
because Apple doesn't update Accelerate anymore). OpenBLAS isn't very stable
though: its releases sometimes break things, and it has threading issues
(currently the only issue for using SciPy with PyPy3). We need at the very
least better support for debugging OpenBLAS issues, and better documentation
on how to build SciPy with it. An option is to use BLIS for a BLAS
interface (see `numpy gh-7372 <https://github.com/numpy/numpy/issues/7372>`__).

2. Support for newer LAPACK features. In SciPy 1.2.0 we increased the minimum
supported version of LAPACK to 3.4.0. Now that we dropped Python 2.7, we
can increase that version further (MKL + Python 2.7 was the blocker for
>3.4.0 previously) and start adding support for new features in LAPACK.


misc
````
@@ -414,12 +438,12 @@ The following improvements will help SciPy better serve this role.
- multivariate t distribution
- mixture distributions

- Improve the core calculations provided by SciPy's probability distributions
so they can robustly handle wide ranges of parameter values. Specifically,
replace many of the PDF and CDF methods from the Fortran library CDFLIB
used in scipy.special with better code, perhaps ported from the Boost C++
library.

In addition, we should:

- Continue work on making the function signatures of ``stats`` and
@@ -432,5 +456,5 @@ In addition, we should:
example implement an exact two-sided KS test (see
`gh-8341 <https://github.com/scipy/scipy/issues/8341>`__) or a one-sided
Wilcoxon test (see `gh-9046 <https://github.com/scipy/scipy/issues/9046>`__).
- Address the various issues regarding ``stats.mannwhitneyu``, and pick up the
stalled PR in `gh-4933 <https://github.com/scipy/scipy/pull/4933>`__.
111 changes: 51 additions & 60 deletions doc/source/roadmap.rst
@@ -8,59 +8,10 @@ going forward. For a more detailed roadmap, including per-subpackage status,
many more ideas, API stability and more, see :ref:`scipy-roadmap-detailed`.


Evolve BLAS and LAPACK support
------------------------------

The Python and Cython interfaces to BLAS and LAPACK in ``scipy.linalg`` are one
of the most important things that SciPy provides. In general ``scipy.linalg``
is in good shape, however we can make a number of improvements:

1. Library support. Our released wheels now ship with OpenBLAS, which is
currently the only feasible performant option (ATLAS is too slow, MKL cannot be
the default due to licensing issues, Accelerate support is dropped because
Apple doesn't update Accelerate anymore). OpenBLAS isn't very stable though,
sometimes its releases break things and it has issues with threading (currently
the only issue for using SciPy with PyPy3). We need at the very least better
support for debugging OpenBLAS issues, and better documentation on how to build
SciPy with it. An option is to use BLIS for a BLAS interface (see `numpy
gh-7372 <https://github.com/numpy/numpy/issues/7372>`__).

2. Support for newer LAPACK features. In SciPy 1.2.0 we increased the minimum
supported version of LAPACK to 3.4.0. Now that we dropped Python 2.7, we can
increase that version further (MKL + Python 2.7 was the blocker for >3.4.0
previously) and start adding support for new features in LAPACK.


Implement sparse arrays in addition to sparse matrices
------------------------------------------------------

The sparse matrix formats are mostly feature-complete, however the main issue
is that they act like ``numpy.matrix`` (which will be deprecated in NumPy at
some point). What we want is sparse *arrays* that act like ``numpy.ndarray``.
This is being worked on in https://github.com/pydata/sparse, which is quite far
along. The tentative plan is:

- Start depending on ``pydata/sparse`` once it's feature-complete enough (it
still needs a CSC/CSR equivalent) and okay performance-wise.
- Add support for ``pydata/sparse`` to ``scipy.sparse.linalg`` (and perhaps to
``scipy.sparse.csgraph`` after that).
- Indicate in the documentation that for new code users should prefer
``pydata/sparse`` over sparse matrices.
- When NumPy deprecates ``numpy.matrix``, vendor that or maintain it as a
stand-alone package.


Fourier transform enhancements
------------------------------

The new ``scipy.fft`` subpackage should be extended to add a backend system with
support for PyFFTW and mkl-fft.


Support for distributed arrays and GPU arrays
---------------------------------------------

NumPy is splitting its API from its execution engine with
NumPy has split its API from its execution engine with
``__array_function__`` and ``__array_ufunc__``. This will enable parts of SciPy
to accept distributed arrays (e.g. ``dask.array.Array``) and GPU arrays (e.g.
``cupy.ndarray``) that implement the ``ndarray`` interface. At the moment it is
@@ -72,19 +23,28 @@ In addition to making use of NumPy protocols like ``__array_function__``, we can
make use of these protocols in SciPy as well. That will make it possible to
(re)implement SciPy functions like, e.g., those in ``scipy.signal`` for Dask
or GPU arrays (see
`NEP 18 - use outside of NumPy <http://www.numpy.org/neps/nep-0018-array-function-protocol.html#use-outside-of-numpy>`__).
`NEP 18 - use outside of NumPy <http://www.numpy.org/neps/nep-0018-array-function-protocol.html#use-outside-of-numpy>`__). NumPy's features in this area are still evolving
(see e.g. `NEP 37 - A dispatch protocol for NumPy-like modules <https://numpy.org/neps/nep-0037-array-module.html>`__),
and SciPy is an important "client" for those features.
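
To make the idea of using such a protocol outside of NumPy concrete, here is a simplified pure-Python sketch of how a library function could defer to an argument's ``__array_function__`` method. It borrows the method signature from NEP 18 but is not NumPy's or SciPy's actual dispatch machinery; ``dispatchable``, ``detrend_mean`` and ``WrappedArray`` are hypothetical names introduced only for this illustration.

```python
import functools


def dispatchable(func):
    """Let arguments override `func` through an ``__array_function__`` method."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Collect the unique argument types that implement the protocol.
        types = []
        for arg in args:
            if hasattr(type(arg), "__array_function__") and type(arg) not in types:
                types.append(type(arg))
        # Give each implementing argument a chance to handle the call.
        for arg in args:
            if type(arg) in types:
                result = arg.__array_function__(wrapper, tuple(types), args, kwargs)
                if result is not NotImplemented:
                    return result
        if types:
            raise TypeError("no implementation found for {}".format(func.__name__))
        # No argument opted in: fall back to the default implementation.
        return func(*args, **kwargs)
    return wrapper


@dispatchable
def detrend_mean(data):
    """Default implementation: subtract the mean from a sequence of numbers."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


class WrappedArray:
    """Stand-in for a Dask or GPU array type that overrides `detrend_mean`."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        if func is detrend_mean:
            # Reuse the default implementation on the unwrapped data.
            return WrappedArray(func.__wrapped__(self.values))
        return NotImplemented
```

With this in place, ``detrend_mean([1, 2, 3])`` runs the default code path, while ``detrend_mean(WrappedArray([1, 2, 3]))`` returns a ``WrappedArray``, so the array type controls how the function is executed.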


Performance improvements
------------------------

Speed improvements, lower memory usage and the ability to parallelize
algorithms are beneficial to most science domains and use cases. We have
established an API design pattern for multiprocessing - using the ``workers``
keyword - that can be adopted in many more functions.
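
As a rough illustration of that pattern, the sketch below shows a function that accepts ``workers`` either as an integer or as a map-like callable, with ``-1`` meaning all available cores. It assumes this convention matches how existing SciPy functions treat ``workers``; ``parallel_map`` and ``square`` are hypothetical names, not SciPy API.

```python
import os
from multiprocessing import Pool


def square(x):
    """Example task; defined at module level so worker processes can pickle it."""
    return x * x


def parallel_map(func, items, workers=1):
    """Apply `func` to every item, honoring a `workers` keyword.

    `workers` may be an int (number of processes, -1 for all cores)
    or a map-like callable such as ``pool.map``.
    """
    if callable(workers):
        # The caller supplied their own map implementation.
        return list(workers(func, items))
    if workers == -1:
        workers = os.cpu_count() or 1
    if workers == 1:
        # Serial path: no process startup overhead.
        return [func(item) for item in items]
    with Pool(workers) as pool:
        return pool.map(func, items)
```

Accepting a map-like callable lets users plug in their own pool, for example one that is shared between calls. When an integer greater than 1 is passed, ``func`` must be picklable, and on platforms that spawn new processes the call should sit under an ``if __name__ == "__main__":`` guard.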

Improve source builds on Windows
--------------------------------
Enabling the use of an accelerator like Pythran, possibly via Transonic, and
making it easier for users to use Numba's ``@njit`` in their code that relies
on SciPy functionality would unlock large performance gains. That needs a
strategy though, since all solutions are still maturing (see for example
`this overview <https://fluiddyn.bitbucket.io/transonic-vision.html>`__).

SciPy critically relies on Fortran code. This is still problematic on Windows.
There are currently only two options: using Intel Fortran, or using
MSVC + gfortran. The former is expensive, while the latter works (it's what we
use for releases) but is quite hard to do correctly. For allowing contributors
and end users to reliably build SciPy on Windows, using the Flang compiler
looks like the best way forward long-term. Until Flang support materializes,
we need to streamline and better document the MSVC + gfortran build.
Finally, many individual functions can be optimized for performance.
``scipy.optimize`` and ``scipy.interpolate`` functions are particularly often
requested in this respect.


Statistics enhancements
@@ -97,3 +57,34 @@ particularly high importance to the project.
- Expand the set of hypothesis tests. In particular, include all the basic
variations of analysis of variance.
- Add confidence intervals for all statistical tests.


Support for more hardware platforms
-----------------------------------

SciPy now has continuous integration for ARM64 (or ``aarch64``) and POWER8/9
(or ``ppc64le``), and binaries are available via
`Miniforge <https://github.com/conda-forge/miniforge>`__. Wheels on PyPI for
these platforms are now also possible (with the ``manylinux2014`` standard),
and requests for those are becoming more frequent.

Additionally, having IBM Z (or ``s390x``) in CI is now possible with TravisCI
but has not been done yet; once it is, ``manylinux2014`` wheels for that
platform become possible as well. Finally, resolving open AIX build issues
would help users.


Implement sparse arrays in addition to sparse matrices
------------------------------------------------------

The sparse matrix formats are mostly feature-complete; however, the main issue
is that they act like ``numpy.matrix`` (which will be deprecated in NumPy at
some point). What we want is sparse *arrays* that act like ``numpy.ndarray``.
This is being worked on in https://github.com/pydata/sparse, which is quite far
along. The tentative plan is:

- Start depending on ``pydata/sparse`` once it's feature-complete enough (it
still needs a CSC/CSR equivalent) and okay performance-wise.
- Indicate in the documentation that for new code users should prefer
``pydata/sparse`` over sparse matrices.
- When NumPy deprecates ``numpy.matrix``, vendor that or maintain it as a
stand-alone package.