-
-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add signature argument to vectorize for vectorizing like generalized ufuncs #8054
Conversation
@@ -1,3 +1,5 @@ | |||
.. _c-api.generalized-ufuncs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to add a reference to this doc page from the docstring for guvectorize
. Do we have a standard way to do that? I guess a URL under "References" is probably most robust, though I was thinking a sphinx reference might work...
I've added tests, so this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a nice feature
do we need a new api name guvectorize
? could this be put into vectorize
with an optional signature
argument?
core_dims : Tuple[str, ...] | ||
Core dimensions for this argument. | ||
""" | ||
if core_dims: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not core_dims: return
instead of an extra indent level for the whole function might be nicer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, done
self.pyfunc = pyfunc | ||
self.signature = signature | ||
|
||
if not re.match(_SIGNATURE, signature): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part is just _parse_gufunc_signature
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, indeed, good catch
_SIGNATURE = '^%s->%s$' % (_ARGUMENT_LIST, _ARGUMENT_LIST) | ||
|
||
|
||
def _parse_gufunc_signature(signature): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't these functions just be private methods of the guvectorize class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer writing functions for isolated functionality. Among other virtues, it makes things easier to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use @classmethod
, can be tested in isolation of creating the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, could use @classmethod
or @staticmethod
here, but my feeling is that it's poor style to use a class merely as a namespace. (It's actually against our internal style guide.)
@juliantaylor This is a good point. A |
Example usage (from the docstring): Vectorized convolution: >>> convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)') >>> convolve(np.eye(4), [1, 2, 1]) array([[ 1., 2., 1., 0., 0., 0.], [ 0., 1., 2., 1., 0., 0.], [ 0., 0., 1., 2., 1., 0.], [ 0., 0., 0., 1., 2., 1.]])
This is ready for further review. I rewrote this to simply add a |
Tuple of input and output core dimensions parsed from the signature, each | ||
of the form List[Tuple[str, ...]]. | ||
""" | ||
if not re.match(_SIGNATURE, signature): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test carry much overhead? My always-try-to-optimize the normal path temptation would be to put the return statement inside a try/except
clause like
try:
return ...
except Exception:
if not re.match...
raise ValueError(...)
else:
raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parsing is not really performance relevant, it is done on the object creation, the time consuming part is executing it on the data
but if you want to improve it, precompiling the regex might be worthwhile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @juliantaylor that this is unlikely to be a performance bottleneck, given that it is done on construction. My timing shows it takes around 10us, which is really in micro-benchmark territory.
My understanding is that in most cases it isn't necessary to worry about precompiling a regex because the re
module maintains its own cache: https://docs.python.org/2/library/re.html#re.compile
@@ -2242,17 +2243,116 @@ def disp(mesg, device=None, linefeed=True): | |||
return | |||
|
|||
|
|||
# See http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html | |||
_DIMENSION_NAME = r'\w+' | |||
_CORE_DIMENSION_LIST = '(?:%s(?:,%s)*)?' % (_DIMENSION_NAME, _DIMENSION_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below: being relative new to python, I got used to new-style formats immediately, and always found the old-style ones a bit clunky. Would it be an idea to try to use new-style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's a little better. Neither is especially readable for building regular expressions, to be honest.
This looks really nice. Only two trivial comments, which you can feel free to ignore. |
Note to self: this would be a good time to fix #5868, by adding a sensible error message if |
CC @mforbes (the last person to significantly update |
r = f([0, 3, 6, 9], [1, 3, 5, 7]) | ||
assert_array_equal(r, [1, 6, 1, 2]) | ||
|
||
def test_siganture_mean_last(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
siganture
is a mispelling
I added the fix for size 0 inputs, in case anyone wants to take another look. |
I'd love to merge this in time for v1.12.0. @mhvk @juliantaylor would one of you mind taking another look at my revisions? Thanks. |
% (len(input_core_dims), len(args))) | ||
args = tuple(asanyarray(arg) for arg in args) | ||
|
||
# consider checking for size 0 inputs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still necessary? You do seem to check for size 0 things at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this is outdated. Removed.
@shoyer - yes, indeed, this should be in 1.2, it is a very nice addition! I looked over the code fairly carefully again, and only had a a comment about a comment... |
@shoyer Sounds like time to cleanup the commit history. |
@charris how do you feel about GitHub's "Squash and merge" option? I think that can result in something that looks very similar to what I would make by hand here. |
@shoyer Don't like it, it squashes, then fast forwards the commit to master, so there is no record of the merge or the PR #. Some folks dislike the "bubble" git history resulting from lots of small merges, but for us I think it is a plus as both the authors and committers are documented along with the PR #, which makes it easy to automate part of the release documentation and link to the relevant PR. Maybe we could fix up the github function at some point if there is a hook. Don't remember what happens with the documentation. |
@charris OK, no problem I will rebase. I disabled "Allow squash merging" and "Allow rebase merging" in GitHub settings so those buttons no longer tempt me :). I also disabled force pushing to master while I was at it, since I'm pretty sure we don't want to allow that and this avoids accidental force pushes (which can happen). |
That said, I only ran the experiment once, so I might have missed some options along the way. Might be worth creating a test repo and experimenting with the buttons. |
"Squash and merge" puts everything in one commit, which by default has the PR number in the title, e.g., for this PR:
The bodies for each commit get combined into the body of the squashed commit. I like it because figuring out how to rebase to cleanup git history can be a burden for new committers. |
Hmm, so the commit message comes from the PR title rather than the first commit? What about the commit message body, was that an option to add? |
@charris You should try clicking the "Squash and merge" button once (I just re-enabled it). It doesn't merge until you click a second time to confirm, and it will show you an editable preview of the commit message. |
I think there is documentation saying that issue numbers should be part to the commit message summary line. We should change that if we allow squashing so we can reliably pull out the PR number as opposed to issues. See the |
A quick grep on the commit log exposes what looks like three uses of the squash button. One is mine, I suppose the other two are yours. |
Another solution would be to not bother squashing. My thinking about this is colored by trying to generate useful logs automatically as well as tracking committers. But I agree that the squashed commit message has much to recommend it and loss of merge tracking might not be so bad. |
Giving it a shot. I think the button needs to be used with discrimination. |
OK, one thing that is off is that the summary line of the commit, taken from the PR title, is too long at 88 characters. So one thing to do when using this option is to edit the title line to fix it up. Note that the messages from the consolidated commits also get indented an extra four spaces. All in all, I think manually squashing is a better option at this point but using the squash button is a possiblity. |
I brought this up on the mailing list today and received some support, so I
thought I would share a preliminary patch.
Still needs tests and more extensive documentation.Example usage (from the docstring):
A note on speed: a
vectorize
withsignature
has about 6x the fixed overhead ofvectorize
(60us vs 10us) and also a much larger overhead per loop (2us vs 100ns). I don't think these numbers will make any difference for the intended use case (only trivial Python functions runs in less than 10us) but I thought I would toss them out there anyways. I have yet to make any attempts at optimization.Also fixes #5868