Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: rearrange the cython files in numpy.random #14608

Merged
merged 10 commits into from
Oct 17, 2019
Merged

Conversation

mattip
Copy link
Member

@mattip mattip commented Sep 27, 2019

This is the code part of #14604 to cleanup the api and allow numpy.random to be used from cython.

Principles:

  • since the pxd files define the cython API, any public except mtrand *.so should ship a *.pxd and any *.pxd should have a *.so. Any "private" *.so should begin with _. Edit: qualify to state "public except mtrand", which cannot be made private as per NEP 19. We only have private c-extension modules now, so there are no pxd files to ship.
  • move h files needed by the pxd files into numpy/random/include
  • ship the pxd files and headers Edit: there are no pxd or headers to ship

Cleanup

  • common -> _common
  • bound_integers -> _bound_integers ?
  • don't import *
  • remove libc imports from _common, let each pxd import on its own
  • move bitgen_t from common to bit_generator.pxd
  • fold distributions.pxd into generator.pxd, legacy_distributions.pxd into mtrans.pxd.

Renaming and removing unused functions in distributions.h and legacy_distributions.h

  • random_gauss_zig -> random_standard_normal
  • all the others should loose the zig suffix
    edit: except random_standard_exponential_zig
  • make the float versions consistent with _f suffix
  • random_double -> random_random for consistency with random.random random_standard_uniform
  • functions that are not used should be removed
  • random_geometric_* do not need to be exposed
  • cleanup gauss names

After this cleanup we can create some cython user stories and check that the API is sufficient.

@mattip
Copy link
Member Author

mattip commented Sep 27, 2019

The first changeset moves things around without any function renaming yet.

# config.add_data_files('mtrand.pxd')
# config.add_data_files('pcg64.pxd')
# config.add_data_files('philox.pxd')
# config.add_data_files('sfc64.pxd')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to create these pxd files so users can do from pcg64 cimport PCG64 to use the BitGenerator directly from cython

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep this in the main namespace? PCG64 is in numpy.random so I'd much prefer it not be in separate pxd files here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main namespace

This mimics the way we import the BitGenerators now.
Do you mean we should refactor the module so that __init__.py imports differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already done remember? gh-14360

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still seeing the BitGenerators in both the c-extension module and as imported by __init__.py:

>>> np.random.mt19937.MT19937
<class 'numpy.random.mt19937.MT19937'>
>>> np.random.MT19937
<class 'numpy.random.mt19937.MT19937'>
>>> 

I am not sure we can flatten the namespace without unifying all the c-extension modules into one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, missing some more underscores.

I am not sure we can flatten the namespace without unifying all the c-extension modules into one

Of course we can, it's just a matter of adding underscores to the .pyx files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made private in 9bdf1749e. mtrand must remain public as per NEP 19

... the global RandomState instance MUST be accessible in this initial release by the name numpy.random.mtrand._rand ...

@rgommers
Copy link
Member

Those principles all sound good, thanks @mattip.

@rgommers
Copy link
Member

legacy_distributions.pxd into mtrans.pxd.

There was no reply yet to your mailing list question on whether these really need to be exposed. Your preference on the mailing list was "no" I believe. I'd like to see a good rationale too before we expose the legacy generator.

@mattip
Copy link
Member Author

mattip commented Sep 28, 2019

the next commit moves common, bounded_integers to _common, _bounded_integers and removes all * imports

@mattip
Copy link
Member Author

mattip commented Oct 7, 2019

I am going back and forth about shipping pxd files for the BitGenerators Philox, PCG64, MT19937, and SFC64. I am not sure I see the use case for cimport Philox. I think I would prefer to modify the goal to "any public so should have a pxd", which would be bit_generator.pxd, generator.pxd and mtrand.pxd. Those three already exist.

@rgommers
Copy link
Member

rgommers commented Oct 7, 2019

I think I would prefer to modify the goal to "any public

I think that would be better indeed.

so should have a pxd", which would be bit_generator.pxd, generator.pxd and mtrand.pxd. Those three already exist.

I still don't quite understand why generator and bit_generator should be public. From __init__.py:

from .generator import Generator, default_rng
from .bit_generator import SeedSequence

Is there anything else in those two submodules that is public but not imported into the numpy.random namespace?

@mattip
Copy link
Member Author

mattip commented Oct 8, 2019

it seems not. I would like to get more input though, @bashtage, @rkern? Can we prepend _ to bit_generator and _generator, leaving mtrand as the only "public" module? Would we then still ship _bit_generator.pxd, _generator.pxd?

@mattip mattip force-pushed the random-api branch 2 times, most recently from 082e17e to 724ec06 Compare October 11, 2019 12:06
@mattip mattip changed the title WIP, API: rearrange the cython files in numpy.random API: rearrange the cython files in numpy.random Oct 11, 2019
@mattip
Copy link
Member Author

mattip commented Oct 11, 2019

I think this is ready for review now. Note there are no public c-extension modules, all of them (except mtrand, which is private but cannot be renamed as per NEP 19) are private. Once this is merged, we can revisit #14604 to try to develop the correct C-API.

@rgommers
Copy link
Member

Awesome, thanks Matti.

Once this is merged, we can revisit #14604 to try to develop the correct C-API.

sounds like a good plan

bit_generator.ISeedSequence
bit_generator.ISpawnableSeedSequence
bit_generator.SeedlessSeedSequence
_bit_generator.ISeedSequence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. You should never have something starting with an underscore in a toctree listing (or anywhere in the docs really).

So when I asked if there was anything in bit_generator besides SeedSequence that needs to be public, I guess the right answer was not "no" but "ISeedSequence, ISpawnableSeedSequence and SeedlessSeedSequence" (these are needed for writing other bitgenerators outside of NumPy, right?)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are needed to implement new SeedSequence interfaces as kind of an abstract base class. BitGenerators should use bit_generator.SeedSequence. I think we should remove these then, and maybe document them in the (as yet nonexistant) random C-API documentation, not in the random python reference documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're not supposed to be used from Python at all, but only from Cython/C for new BitGenerator authors, then that sounds like the right course of action.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do with text like this "One may also pass in an implementor of the ISeedSequence interface like SeedSequence"? How about simplifying it to not mention ISeedSequence ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adopted the "don't mention" approach for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good to me

@rgommers
Copy link
Member

API and docs part of this PR look quite good to me now, module the few typos I just pointed out.

@bashtage
Copy link
Contributor

I'm confused why numpy/random/generator.pyx → numpy/random/_generator.pyx has been marked private. Isn't this the main file that one would use to get low-level access to the C functions?

@mattip
Copy link
Member Author

mattip commented Oct 14, 2019

The public/private distinction is for python access. C and Cython access still need to be worked out, once we have solidified the python side, see #14604.

@mattip mattip mentioned this pull request Oct 15, 2019
@bashtage
Copy link
Contributor

Did you mean to commit numpy/random/_bounded_integers.pyx? The is a template-processed output from numpy/random/_bounded_integers.pyx.in.

@bashtage
Copy link
Contributor

Same comment for numpy/random/_bounded_integers.pxd which is dervied from numpy/random/_bounded_integers.pxd.in

Copy link
Contributor

@bashtage bashtage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small points. Only worthwhile ones are the commits of the pyx and pxd that come from templates.

PCG64 <pcg64>
Philox <philox>
SFC64 <sfc64>
MT19937 <mt19937>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 3 or 4 spaces matter? File seem to use both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unified to 4, would be nice to do this gradually across the documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only has to be consistent per file to work. Pretty sure the default is 3 spaces everywhere.

doc/source/reference/random/new-or-different.rst Outdated Show resolved Hide resolved
__all__ = []

np.import_array()

cdef extern from "include/distributions.h":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be in _bounded_integers.pxd.in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, then we would have to ship distributions.h. But I think we should revisit this in terms of documented use-cases in #14604 after this goes in.

numpy/random/tests/test_seed_sequence.py Show resolved Hide resolved
@mattip mattip requested a review from rgommers October 17, 2019 13:19
@rgommers
Copy link
Member

Okay this seems good to go now. Let's get this in so we can start looking at the API part. Thanks @mattip!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants