BUG: let distribution methods raise an error if too many args given. #400

Merged
merged 4 commits into from Feb 3, 2013

2 participants

@rgommers
SciPy member

Closes #1815.

@josef-pkt
SciPy member

Thanks, looks good. (I guess we don't have a discrete distribution with numargs=0)

smaller extra change (if you want):
I think stats could use _fix_loc_scale, after poping moments from kwds, instead of duplicating the code

@rgommers
SciPy member

There indeed aren't any discrete distributions with zero shape params.

I don't directly see how to reuse _fix_loc_scale from the stats method; moments can be given as position arg so it can't just be popped.

@josef-pkt
SciPy member

Ok, I didn't see that, I never used moments as args, only as kwd.
docstring is not very explicit on args versus keyword, and optional indicates keyword.

The only other method, besides stats, with extra "keyword" is rvs
rvs : requires size as keyword, uses _fix_loc_scale, docstring also not explicit on keyword, except optional

@rgommers
SciPy member

Hmm, missed rvs:

>>> stats.gamma.rvs(3, 0, 1, 3)  # loc, scale, size
...
TypeError

That's the problem with this kind of stuff. It's also not impossible that someone has his own rv_generic subclass with some method that uses _fix_loc_scale (even though the underscore suggests that's a bad idea).

@josef-pkt
SciPy member

just realised: Why do you raise TypeError and not ValueError?

rvs requires size as keyword and is supposed to raise,
or do we want everywhere that keywords can be used as positional arguments in python? We didn't support this before, but IIRC the question showed up once (only) on the mailing list.

@josef-pkt
SciPy member

I don't think anyone subclasses rv_generic, it's rather small and useless on it's own. And, and I don't remember ever having seen it in a open source code.
(But why do you worry about that, I don't see any backwards compatibility problem if subclasses follow the scipy.stats pattern.)

@josef-pkt
SciPy member

Actually, now that I think about it, this PR also enables correct positional arguments for keywords. And should be enabled also for rvs.
(rvs also has the extra discrete keyword, which is not documented and that I never used. makes only sense for discrete rvs)

(no access to trac ticket right now, to add anything)
I didn't check positional arguments before, but the results were wrong if not the correct number of args was given.

>>> stats.gamma.pdf(2, 3, 1, 2)
0.037908166232039589
>>> stats.gamma.pdf(2, 3, 1, 2, 3)
0.2706705664732254
>>> stats.gamma.pdf(2, 3, 0, 1, 2)
0.2706705664732254
>>> stats.gamma.pdf(2, 3, 0, 5, 1, 2)
0.2706705664732254
>>> stats.gamma.numargs
1
>>> stats.gamma.pdf(2, 3, 0, 5, loc=1, scale=2)
0.037908166232039589
>>> stats.gamma.pdf(2, 3, 1, 2, 0, 5)
0.2706705664732254
>>> stats.gamma.pdf(2, 3)    # default loc=0, scale=1
0.2706705664732254
@rgommers
SciPy member

Why TypeError:

In [1]: f = lambda x: x

In [2]: f(1, 2)
---------------------------------------------------------------------------
TypeError  

You're right that size is keyword-only in rvs(), due to *args, **kwargs construction. So I didn't break anything. Will add this point to rvs docstring, since it's not obvious. moments in stats() is supported as positional argument; it's assumed in Python that kw-as-positional-arg will always work AFAIK.

Right, no need to worry about subclassing in the wild. All that will break is added methods with loc, scale and then another keyword that is then used as positional arg. Not likely to happen.

@rgommers
SciPy member

Positional args for loc/scale were always supported. I'm not sure what you mean exactly with the last comment. Is it easy to add to this PR?

@josef-pkt
SciPy member

Ok about TypeError, I never paid enough attention (and maybe didn't use it consistently)

It's good to merge this way, but the logical conclusion of this PR would be to make size in rvs also work as positional argument, in a similar way as the current handling of args, kwds in the stats method.

On the mailing list the idea just came up to add a dtype keyword for the return type. This would require all of the methods to handle an extra keyword. (But that's for a possible future.)

(My other examples were just to illustrate, that if loc and scale are used as positional arguments, and the number of arguments is wrong, then the old behavior silently returns unexpected results. If loc and scale are used as keywords, then the extra args are just ignored.)

@rgommers
SciPy member

OK, size in rvs can now be given as positional arg.

Not sure about the dtype idea. More args makes the wrong number of args go undetected more easily.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Feb 2, 2013
scipy/stats/distributions.py
N = len(args)
if N > self.numargs:
if N == self.numargs + 1 and loc is None:
# loc is given without keyword
loc = args[-1]
- if N == self.numargs + 2 and scale is None:
+ elif N == self.numargs + 2 and scale is None:
@josef-pkt
SciPy member

this should also have loc:
elif N == self.numargs + 2 and loc is None and scale is None
otherwise loc could be keyword and we have one extra arg

@josef-pkt
SciPy member

because now we have elif, loc is not set before this condition anymore

@rgommers
SciPy member
rgommers added a note Feb 3, 2013

It wasn't set before, because the N == part can't be true twice. Anyway, fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josef-pkt josef-pkt and 1 other commented on an outdated diff Feb 2, 2013
scipy/stats/distributions.py
return args, loc, scale
+ def _fix_loc_scale_kwarg3(self, args, loc, scale=1,
+ kwarg3=1, kwarg3_default=None):
+ """Parse args/kwargs input to methods with a third kwarg.
+
+ At the moment these methods are ``stats`` and ``rvs``.
+ """
+ N = len(args)
+ if N > self.numargs:
+ if N == self.numargs + 1 and loc is None:
+ # loc is given without keyword
+ loc = args[-1]
+ elif N == self.numargs + 2 and scale is None:
@josef-pkt
SciPy member

also loc is None, as above

@rgommers
SciPy member
rgommers added a note Feb 3, 2013

Correct, although then you already have 2 args too many. I copied that from the existing _fix_loc_scale. Will add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josef-pkt josef-pkt commented on an outdated diff Feb 2, 2013
scipy/stats/distributions.py
return args, loc, scale
+ def _fix_loc_scale_kwarg3(self, args, loc, scale=1,
+ kwarg3=1, kwarg3_default=None):
+ """Parse args/kwargs input to methods with a third kwarg.
+
+ At the moment these methods are ``stats`` and ``rvs``.
+ """
+ N = len(args)
+ if N > self.numargs:
+ if N == self.numargs + 1 and loc is None:
+ # loc is given without keyword
+ loc = args[-1]
+ elif N == self.numargs + 2 and scale is None:
+ # loc and scale given without keyword
+ loc, scale = args[-2:]
+ elif N == self.numargs + 3 and kwarg3 is None:
@josef-pkt
SciPy member

same here I think, loc is None and scale is None is also required AFAICS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josef-pkt josef-pkt commented on an outdated diff Feb 2, 2013
scipy/stats/distributions.py
return args, loc, scale
+ def _fix_loc_scale_kwarg3(self, args, loc, scale=1,
+ kwarg3=1, kwarg3_default=None):
@josef-pkt
SciPy member

outsourcing this is good.
if or when we need it in future, we can try to make this more generic to handle an arbitrary number of kwds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josef-pkt josef-pkt commented on the diff Feb 2, 2013
scipy/stats/distributions.py
loc, moments = args[-2:]
+ else:
+ raise TypeError("Too many input arguments.")
@josef-pkt
SciPy member

is it possible to use the new _fix_loc_scale_kwarg3 now for this?

@rgommers
SciPy member
rgommers added a note Feb 3, 2013

On second thought, yes. Although it then needs a check

if kwarg3 is not None:
    # 3 positional args
    raise TypeError("Too many input arguments.")

because otherwise 3 positional args won't be detected correctly anymore in _fix_loc_scale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josef-pkt josef-pkt commented on the diff Feb 2, 2013
scipy/stats/tests/test_distributions.py
@@ -954,5 +955,42 @@ def test_hypergeom_interval_1802():
assert_equal(stats.hypergeom.ppf(1, 100, 100, 8), 8)
+def test_distribution_too_many_args():
+ # Check that a TypeError is raised when too many args are given to a method
+ # Regression test for ticket 1815.
+ x = np.linspace(0.1, 0.7, num=5)
+ assert_raises(TypeError, stats.gamma.pdf, x, 2, 3, loc=1.0)
@josef-pkt
SciPy member

why does this raise, isn't 3 interpreted as scale? (python rules, if we want to follow them)

missing case: numargs+2 and loc=1 should raise, case in comments above

(I have a bit of a hard time thinking of all possible cases.)

@josef-pkt
SciPy member

I cannot find the python rules for argument parsing, but I think I mixed up something.

@rgommers
SciPy member
rgommers added a note Feb 3, 2013

Using pdf(x, 2, loc=1.0, 3) is invalid syntax, and if 3 would be an actual positional arg it would be loc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@rgommers
SciPy member

All comments addressed.

@josef-pkt
SciPy member

This looks good, ready to merge, AFAICS

@rgommers
SciPy member

Thanks Josef. Merging.

@rgommers rgommers merged commit ec9f6aa into scipy:master Feb 3, 2013
@rgommers rgommers deleted the rgommers:stats-ticket-1815 branch Feb 6, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment