ENH: Add kwarg support for vectorize for ticket #2100 and ticket #1156 #275

Closed
wants to merge 1 commit into
from

Projects

None yet

2 participants

mforbes commented May 10, 2012

Ticket #2100: kwarg support for vectorize

  • Added _get_argspec() to determine kwargs
  • API: Optional argspect argument to vectorize
  • API: Optional exclude argument to exclude some args from vectorization.
  • API: Optional get signature and doc from member .original_function
  • Added documentation, examples, and coverage tests
  • Added additional coverage test and base case for functions with no args
  • Factored original behaviour into _vectorize_call
  • Some minor documentation and error message corrections

Ticket #1156: Support vectorizing over instance methods

  • Don't vectorize over self if thefunc is a method and .im_self is not None.
Michael McNeil Forbes ENH: Add kwarg support for vectorize for ticket #2100 and ticket #1156
Ticket #2100: kwarg support for vectorize
- Added _get_argspec() to determine kwargs
- API: Optional argspect argument to vectorize
- API: Optional exclude argument to exclude some args from vectorization.
- API: Optional get signature and doc from member .original_function
- Added documentation, examples, and coverage tests
- Added additional coverage test and base case for functions with no args
- Factored original behaviour into _vectorize_call
- Some minor documentation and error message corrections

Ticket #1156: Support vectorizing over instance methods
- Don't vectorize over self if thefunc is a method and .im_self is not None.
d313d3b
@mforbes mforbes commented on the diff May 10, 2012
numpy/lib/function_base.py
self.lastcallargs = 0
- def __call__(self, *args):
+ if not exclude:
+ exclude = set()
+ else:
+ exclude = set(exclude)
mforbes
mforbes May 10, 2012

This might be somewhat redundant, but self.excluded needs membership tests in __call__ which could in principle be slow if a non-hashed sequence is passed.

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
@@ -1830,11 +1880,72 @@ class vectorize(object):
>>> type(out[0])
<type 'numpy.float64'>
+ The `exclude` argument can be used to prevent vectorizing over certain
+ variables
+
+ >>> def mypolyval(p, x):
+ ... _p = list(p)
+ ... res = _p.pop(0)
+ ... while _p:
+ ... res = res*x + _p.pop(0)
+ ... return res
+ >>> vpolyval = np.vectorize(mypolyval, exclude='p')
+ >>> vpolyval(p=[1, 2, 3], x=[0, 1])
+ array([3, 6])
+ >>> vpolyval(x=[0, 1], p=[1, 2, 3])
stefanv
stefanv May 10, 2012 Contributor

Why this second line that does the same as the previous?

mforbes
mforbes May 10, 2012

For some reason I had the idea that I should demonstrate that argument order does not matter -- there was an issue where something like this worked but in a non-intuitive way. This is of course pointless as written. I will remove (though it suggests that maybe vpolyval([1,2,3], x=[0,1]) should perhaps work...

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
+ >>> def mypolyval(p, x):
+ ... _p = list(p)
+ ... res = _p.pop(0)
+ ... while _p:
+ ... res = res*x + _p.pop(0)
+ ... return res
+ >>> vpolyval = np.vectorize(mypolyval, exclude='p')
+ >>> vpolyval(p=[1, 2, 3], x=[0, 1])
+ array([3, 6])
+ >>> vpolyval(x=[0, 1], p=[1, 2, 3])
+ array([3, 6])
+
+ In order for vectorize to be able to generate a function that processes
+ keyword arguments properly, it must know the original signature. If this
+ cannot be deduced from the function (for example if the original function is
+ wrapped)
stefanv
stefanv May 10, 2012 Contributor

This is also mentioned above in the parameters, but I think it could still be made clearer when exactly this would happen.

The example below is good. Maybe just emphasize that it is when **kwargs is used (unless there are other cases that I forgot about, maybe in compiled functions?).

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
+ >>> def f(a):
+ ... return 2*a
+ >>> def wrapped_f(*v, **kw):
+ ... return f(*v, **kw)
+ >>> np.vectorize(wrapped_f)(a=[1,2])
+ Traceback (most recent call last):
+ ...
+ TypeError: wrapped_f() go an unexpected keyword argument 'a'
+
+ then it can be specified using the `argspec` argument,
+
+ >>> import inspect
+ >>> np.vectorize(wrapped_f, argspec=inspect.getargspec(f))(a=[1,2])
+ array([2, 4])
+
+ or by including the original function as ``pyfunc.original_function``.
stefanv
stefanv May 10, 2012 Contributor

This feels like a hack.

How about rather suggesting another wrapper:

def upper_wrap(x, y, p=[1, 2]):
    x = do_something_with_p(x)
    return inner_func(x)
mforbes
mforbes May 10, 2012

I am not sure what you are suggesting here or how it helps provide the signature. The motivation for pyfunc.original_function is to facilitate the writing of decorators:

def mydecorator(f):
    def wrapped_f(*v, **kw):
        return f(*v, **kw)
    wrapped_f.original_function = f
    return wrapped_f

@vectorize 
@mydecorator
def f(x, p):
    return x**p

If we remove this feature, how would you write mydecorator?

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
"""
- def __init__(self, pyfunc, otypes='', doc=None):
+ def __init__(self, pyfunc, otypes='', doc=None,
+ argspec=None, exclude=None):
+
+ # Implements enhancement ticket #2100: kwarg support.
+
+ # Look for an attribute the user may have added to function
stefanv
stefanv May 10, 2012 Contributor

I'd prefer to remove this, but I'd like others to weigh in as well.

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
+ this apparently fails for some functions like `math.cos` so we
+ provide additional tests here.
+ """
+ import inspect
+ try:
+ argnames, vargs, kwargs, defaults = inspect.getargspec(obj)
+ if (inspect.ismethod(obj)
+ and obj.im_self is not None): # Fix #1156
+ argnames = argnames[1:] # Remove self
+ except:
+ # Fallback to _get_nargs if inspect fails.
+ nargs, ndefaults = _get_nargs(obj)
+ argnames = [None,] * nargs
+ vargs = None
+ kwargs = None
+ defaults = ndefaults
stefanv
stefanv May 10, 2012 Contributor

Shouldn't this be a list of defaults?

mforbes
mforbes May 10, 2012

Actually, it should be a dictionary of the defaults, but of what?

The problem here I am trying to solve is setting ndefaults when the actual defaults cannot be deduced properly. In principle, I could try to rewrite _get_nargs to return the dictionary of defaults but did not want to potentially introduce new bugs. This is related to the later check of defaults below.

stefanv
stefanv May 10, 2012 Contributor

If there is no valid value, then I guess None would suffice. It should just be clear what's happening.

mforbes
mforbes May 10, 2012

Correction: it should be a tuple of the defaults. The problem is that the length ndefaults is needed. This happens in the fallback where the information is extracted from the error message. I am thinking of returning (NotImplemented,)*ndefaults so the main code can just do ndefaults=len(defaults). Is there some "Unknown" type I could use instead? Maybe I should just use a class _Unknown: pass.

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
"""
- def __init__(self, pyfunc, otypes='', doc=None):
+ def __init__(self, pyfunc, otypes='', doc=None,
+ argspec=None, exclude=None):
+
+ # Implements enhancement ticket #2100: kwarg support.
+
+ # Look for an attribute the user may have added to function
+ # (see the example for argspec in the docstring)
+ original_function = getattr(pyfunc, 'original_function', pyfunc)
+ if not argspec:
stefanv
stefanv May 10, 2012 Contributor

The test for None is

if argspec is not None:

(This holds for all the None tests below, such as default etc.)

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
self.thefunc = pyfunc
self.ufunc = None
- nin, ndefault = _get_nargs(pyfunc)
+ nin = len(argnames)
+ if defaults:
+ ndefault = len(defaults)
+ self.defaults = dict(zip(argnames[-ndefault:], defaults))
+ else:
stefanv
stefanv May 10, 2012 Contributor

What does this else test for? defaults cannot be None, due to the definition of _get_argspec, so I assume zero? Either way, should be explicitly written as

if defaults == 0:

or

if defaults is None:
...
else:
...
mforbes
mforbes May 10, 2012

Will update _get_argspec so that defaults is always a list of length ndefaults so this logic will be removed.

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
self.lastcallargs = 0
- def __call__(self, *args):
+ if not exclude:
+ exclude = set()
+ else:
+ exclude = set(exclude)
+ for _k in exclude:
stefanv
stefanv May 10, 2012 Contributor
if not argnames.issuperset(exclude):
mforbes
mforbes May 10, 2012

Done. (Actually, will use if not exclude.issubset(argnames): since exclude is guaranteed to be a set at this point).

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
+ if kwargs:
+ # Process kwargs, appending them to args as positional arguments
+ argnames = self.argspec[0]
+ if self.excluded:
+ # Exclude variables by defining a new function akin to
+ # functools.partial, but backward compatible with 2.4. The new
+ # function calls the old function with all kwargs for
+ # simplicity.
+ old_args = list(args)
+ args = []
+ new_arg_names = []
+ constants = {}
+ for _n, _v in enumerate(argnames):
+ if _n < len(args):
+ if _v in self.excluded:
+ constants[_v] = args.pop(0)
stefanv
stefanv May 10, 2012 Contributor

Would this work if there are only positional arguments and the first is excluded?

The logic here is fairly convoluted; I wonder if it can't be cleaned up using set inclusion tests, etc. I.e., the current logic goes "for each item in the parameters list, do a check to see if it should be included, and execute some behaviour depending on the outcome of the test", whereas an alternative may be "for each of the parameters, add it to the list with the appropriate transformation".

mforbes
mforbes May 11, 2012

There are some issues with mixed positional and kw arguments. I will try to simplify this logic.

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
# Convert to object arrays first
- newargs = [array(arg,copy=False,subok=True,dtype=object) for arg in args]
- if self.nout == 1:
+ newargs = [array(arg,copy=False, subok=True, dtype=object)
stefanv
stefanv May 10, 2012 Contributor

Space after arg.

mforbes
mforbes May 10, 2012

Done. (And a few others found too).

@stefanv stefanv commented on the diff May 10, 2012
numpy/lib/function_base.py
# Convert to object arrays first
- newargs = [array(arg,copy=False,subok=True,dtype=object) for arg in args]
- if self.nout == 1:
+ newargs = [array(arg,copy=False, subok=True, dtype=object)
+ for arg in args]
+
+ # Check for a peculiar base case of no inputs
+ if 0 == nargs and not self.defaults:
+ _res = thefunc()
stefanv
stefanv May 10, 2012 Contributor

nargs == 0, if defaults is a None comparison not self.defaults is None.

mforbes
mforbes May 10, 2012

self.defaults is now a dictionary, so this is a test for and empty dict which I believe should be done this way (standard python idiom) when there is no possible array confusion.

@mforbes mforbes commented on the diff May 10, 2012
numpy/lib/function_base.py
self.lastcallargs = 0
- def __call__(self, *args):
+ if not exclude:
mforbes
mforbes May 10, 2012

This should be if exclude is not None.

mforbes commented May 11, 2012

@stefanv
You suggested that the logic is somewhat convoluted and I agree. After thinking about this a bit, I decided to try a rather radical solution -- remove the caching and introspection altogether:

https://github.com/mforbes/numpy/tree/new-vectorize

This makes the code and logic much simpler, and resolves several additional issues (tickets #2100, #1156, and #1487) without any special cases. Since there is no introspection, there is no need for the argspec stuff or the complicated regexp tests,

Performance will take a hit, but since vectorize is for convenience, not performance, this might be okay. If not, it should be possible to put caching back in, either with an option, or with a careful check of arguments etc. to determine when to recache.

Let me know what you think. All tests pass, and as far as I can tell, this should be functionally backward compatible, but much more general.

If performance is an issue, can you point me to some benchmarks so I can try to reimplement the caching?

Contributor
stefanv commented May 11, 2012

That code reads much better--thank you for considering the different options. The only concern is with the extra execution of the function to determine the output types. It may be worth caching the results of that operation, since it should always yield the same result (?).

mforbes commented May 11, 2012

Let me work a bit on caching to avoid the extra call and maybe caching the ufunc. I will reopen that as a PR when I am finished.

@mforbes mforbes closed this May 11, 2012
Contributor
stefanv commented May 11, 2012

Perfect, thanks a lot, Michael. While you are at it, feel free to rename variables--those legacy names don't help!

mforbes commented May 17, 2012

This is superseded by: #282

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment