Finish up NXEP 4 first draft (#5391)
* Minor updates to wording.

* Summarize and add links to relevant sklearn discussions.

* Add an implementation section.

* Add suggestion from Dan re: pkg-level toggle.
rossbar committed Mar 15, 2022
1 parent 51eb1ad commit c6e9065
Showing 1 changed file with 96 additions and 10 deletions.
106 changes: 96 additions & 10 deletions doc/developer/nxeps/nxep-0004.rst
@@ -17,7 +17,7 @@ Pseudo-random numbers play an important role in many graph and network analysis
algorithms in NetworkX.
NetworkX provides a :ref:`standard interface to random number generators <randomness>`
that includes support for `numpy.random` and the Python built-in `random` module.
`numpy.random` is used extensively within NetworkX and in most cases is the
`numpy.random` is used extensively within NetworkX and in several cases is the
preferred package for random number generation.
NumPy introduced a new interface in the `numpy.random` package in NumPy version
1.17.
@@ -171,10 +171,11 @@ This can be addressed with a compatibility class similar to the
layer between `random` and `numpy.random.RandomState`.

`create_random_state` currently returns the global ``numpy.random.mtrand._rand``
`RandomState` instance when the input is `None` or the numpy.random module.
`RandomState` instance when the input is `None` or the ``numpy.random`` module.
By switching to `numpy.random.Generator`, this will no longer be possible as
there is no global, internal `Generator` instance in the `numpy.random` module.
This should have no effect on users.
This should have no effect on users, as ``seed=None`` currently does not
guarantee reproducible results.
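To illustrate the difference between the two behaviors for ``seed=None``, here is a minimal sketch. It assumes only public NumPy calls (``np.random.seed``, ``np.random.random_sample``, ``np.random.default_rng``); the variable names are illustrative and not part of NetworkX.

```python
import numpy as np

# Current behavior: seed=None resolves to NumPy's hidden global RandomState.
# The module-level functions draw from that same instance, and
# np.random.seed() reseeds it.
np.random.seed(1234)
legacy_first = np.random.random_sample(3)
np.random.seed(1234)
legacy_second = np.random.random_sample(3)

# Proposed behavior: seed=None builds a fresh Generator seeded from OS
# entropy, so reseeding the global RandomState has no influence on it.
np.random.seed(1234)
new_first = np.random.default_rng().random(3)
np.random.seed(1234)
new_second = np.random.default_rng().random(3)

print(np.array_equal(legacy_first, legacy_second))
print(np.array_equal(new_first, new_second))
```

The two legacy draws are expected to match, while the two ``Generator`` draws are expected to differ. Code that calls ``np.random.seed`` globally and then relies on ``seed=None`` would therefore need to pass an explicit seed to stay reproducible.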

Detailed description
--------------------
@@ -188,17 +189,80 @@ function is either an integer or `None`.
Related Work
------------

- NEP 19
- TODO
Scikit-learn has a similar pattern for imposing determinism on functions that
depend on randomness.
For example, many functions in ``scikit-learn`` have a ``random_state`` argument
that functions similarly to how ``seed`` behaves in many NetworkX function
signatures.
One difference between ``scikit-learn`` and ``networkx`` is that scikit-learn
**only** supports ``RandomState`` via the ``random_state`` keyword argument,
whereas NetworkX implicitly supports the built-in `random` module as well as
both NumPy ``RandomState`` and ``Generator`` instances (depending on
the type of ``seed``).
This is reflected in the name of the keyword argument, as ``random_state``
(used by scikit-learn) is less ambiguous than ``seed`` (used by NetworkX).

There are multiple relevant discussions in the scikit-learn community about
potential approaches to supporting the new NumPy random interface:

- `scikit-learn/scikit-learn#16988 <sklearn16988_>`_ covers strategies and concerns
  related to enabling users to use the ``Generator``-based random number generators.
- `scikit-learn/scikit-learn#14042 <sklearn14042_>`_ is a higher-level discussion
  that includes additional information about the design considerations and constraints
  related to scikit-learn's ``random_state``.
- There is also a related `SLEP <slep011_>`_.

.. _sklearn16988: https://github.com/scikit-learn/scikit-learn/issues/16988
.. _sklearn14042: https://github.com/scikit-learn/scikit-learn/issues/14042
.. _slep011: https://github.com/scikit-learn/enhancement_proposals/pull/24

Implementation
--------------


TODO: simple diff here

The implementation itself is quite simple. Most of the work will go into
improved/reorganized tests.
The implementation itself is quite simple. The logic that determines how
inputs are mapped to random number generators is encapsulated in the
`~networkx.utils.misc.create_random_state` function (and the related
`~networkx.utils.misc.create_py_random_state`).
Currently (i.e. NetworkX <= 2.X), this function maps inputs like ``None``,
``numpy.random``, and integers to ``RandomState`` instances::

    def create_random_state(random_state=None):
        if random_state is None or random_state is np.random:
            return np.random.mtrand._rand
        if isinstance(random_state, np.random.RandomState):
            return random_state
        if isinstance(random_state, int):
            return np.random.RandomState(random_state)
        if isinstance(random_state, np.random.Generator):
            return random_state
        msg = (
            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
            "numpy.random.Generator instance"
        )
        raise ValueError(msg)

This NXEP proposes to modify the function to produce ``Generator`` instances
for these inputs. An example implementation might look something like::


    def create_random_state(random_state=None):
        if random_state is None or random_state is np.random:
            return np.random.default_rng()
        if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
            return random_state
        if isinstance(random_state, int):
            return np.random.default_rng(random_state)
        msg = (
            f"{random_state} cannot be used to create a numpy.random.RandomState or\n"
            "numpy.random.Generator instance"
        )
        raise ValueError(msg)


The above captures the essential change in logic, though implementation details
may differ.
Most of the work related to implementing this change will be associated with
improved/reorganized tests, including adding tests for rng-stream reproducibility.
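As a rough idea of what such tests might look like, the sketch below repeats the proposed mapping in a standalone form and checks a few properties of it. The test function names and the shortened error message are assumptions made for illustration; they are not taken from the NetworkX test suite.

```python
import numpy as np


def create_random_state(random_state=None):
    # Standalone copy of the proposed mapping (error message shortened).
    if random_state is None or random_state is np.random:
        return np.random.default_rng()
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    if isinstance(random_state, int):
        return np.random.default_rng(random_state)
    raise ValueError(f"{random_state} cannot be used to create a NumPy RNG")


def test_integer_seed_gives_generator():
    assert isinstance(create_random_state(42), np.random.Generator)


def test_same_seed_same_stream():
    # Two generators built from the same integer yield identical draws.
    first = create_random_state(42).random(5)
    second = create_random_state(42).random(5)
    assert np.array_equal(first, second)


def test_existing_instances_pass_through():
    # Both legacy and new-style instances are returned unchanged.
    legacy = np.random.RandomState(42)
    modern = np.random.default_rng(42)
    assert create_random_state(legacy) is legacy
    assert create_random_state(modern) is modern


def test_invalid_seed_raises():
    try:
        create_random_state("not a seed")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unsupported seed type")
```

Note that an integer seed produces a different stream under ``Generator`` than under ``RandomState``, so any existing test that hard-codes expected random values for a given seed would need to be revisited.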

Alternatives
------------
@@ -208,6 +272,28 @@ acceptable alternative.
``RandomState`` is not deprecated, and is expected to maintain its stream-compatibility
guarantee in perpetuity.

Another possible alternative would be to provide a package-level toggle that
users could use to switch the behavior of the ``seed`` kwarg for all functions
decorated by ``np_random_state`` or ``py_random_state``.
To illustrate (ignoring implementation details)::

    >>> import networkx as nx
    >>> from networkx.utils.misc import create_random_state

    # NetworkX 2.X behavior: RandomState by default

    >>> type(create_random_state(12345))
    numpy.random.mtrand.RandomState

    # Change random backend by setting pkg attr

    >>> nx._random_backend = "Generator"

    >>> type(create_random_state(12345))
    numpy.random._generator.Generator
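One way such a toggle could work is sketched below as a standalone module. Everything here is hypothetical: the ``_random_backend`` variable mirrors the attribute name used in the illustration above, while ``set_random_backend`` and the shortened error message are invented for this sketch and do not exist in NetworkX.

```python
import numpy as np

# Hypothetical package-level switch; "RandomState" keeps the 2.X behavior.
_random_backend = "RandomState"


def set_random_backend(name):
    """Hypothetical helper that validates and stores the backend name."""
    global _random_backend
    if name not in ("RandomState", "Generator"):
        raise ValueError(f"unknown random backend: {name!r}")
    _random_backend = name


def create_random_state(random_state=None):
    """Map ``random_state`` to a NumPy RNG, honoring the toggle."""
    # Existing RNG instances pass through unchanged under either backend.
    if isinstance(random_state, (np.random.RandomState, np.random.Generator)):
        return random_state
    use_generator = _random_backend == "Generator"
    if random_state is None or random_state is np.random:
        if use_generator:
            return np.random.default_rng()
        return np.random.mtrand._rand
    if isinstance(random_state, int):
        if use_generator:
            return np.random.default_rng(random_state)
        return np.random.RandomState(random_state)
    raise ValueError(f"{random_state} cannot be used to create a NumPy RNG")
```

A toggle like this is global mutable state, so a library that flips it would change the behavior seen by every other caller in the same process. That trade-off is presumably part of what the discussion of this alternative would need to weigh.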


Discussion
----------
