Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: logit and expit ufuncs for scipy.special #52

Closed
wants to merge 5 commits into from

Conversation

chrisjordansquire
Copy link
Contributor

logit and expit ufuncs for scipy.stats. Logit and expit are commonly used functions in statistical modeling, and they do not appear to be anywhere in scipy or numpy.

Note: One of the tests in test_logit currently fails due to a bug in numpy. I'll be submitting a patch for the numpy bug in a day or so, and then it passes just fine. (The isnumber function in numpy.testing.utils doesn't recognize the half-float as a number.)

@rgommers
Copy link
Member

rgommers commented Aug 3, 2011

Tests look good. This still needs a complete docstring, and preferably one that is added with the add_newdocs mechanism instead of hidden in the C code.

More importantly, why write this in C? There's no C code in scipy.stats right now, and I don't see a reason to add some now.

For the failing test, it would be good to conditionally skip that for numpy < 1.7 because scipy 0.10 should work with numpy 1.6 (maybe even 1.5).

@chrisjordansquire
Copy link
Contributor Author

I wrote it in C because I wanted to make them ufuncs. Also to use them as a prototype for updating the numpy ufunc creation docs, which is something I'm currently working on. Of course, there's also a minor speed boost--the ufuncs run in about 75% of the time as a python level call np.log(x/(1-x)). The speed bump is important since expit and logit are commonly called in iterative fitting algorithms.

logit and expit could go in scipy.special instead, but there's nothing else in scipy.special that a statistician would care about as far as I can tell. It seems like they belong in scipy.stats so a stat user wouldn't have to constantly call
import scipy.stats
from scipy.special import logit, expit

The doc-string is unsatisfying, but I copied what's done in scipy.special for its ufuncs. Mark had mentioned the add_newdocs but I didn't pursue it. I'll take care of that later today.

@rgommers
Copy link
Member

rgommers commented Aug 3, 2011

Agreed that they do belong in stats. I'm just not convinced that 250 lines of C instead of a one-liner in Python for a 25% increase in speed is a good performance/maintainability trade-off. But let's see what other devs think, especially Josef since he's the stats maintainer.

@chrisjordansquire
Copy link
Contributor Author

In its defense, the ufuncs are very simple as C-Numpy goes, the docs are about to get a major update, and there's already cython c in scipy.stats. Most of the C code is boiler plate anyways. The C code that actually computes the logit is only around 10-20 lines. The rest of it is the standard way of defining a ufunc. If that part of the code isn't working then many more basic ufuncs in numpy will also not work. Also, as a ufunc, the user automatically get access to all the keywords for ufuncs. Things like out, where, setting the order, etc..

The ufuncs could be generated via cython, but that would also generate C code, and require anyone looking at the code to be conversant with cython's ins and outs.

@rgommers
Copy link
Member

rgommers commented Aug 3, 2011

Yes, as C code goes it's very clean. Cython is better still though - if you understand Python code you understand Cython code too. You normally don't have to look at the C generated from it. (not suggesting you rewrite this, if others think it's fine, then it's fine with me too)

@josef-pkt
Copy link
Member

I think this belongs in scipy.special, alongside expm1, log1p and similar. scipy.stats has no pure functions like this. scipy.stats imports a large number of functions from scipy.special, so the pattern to have the functions in scipy.special already exists.
And the implementation and c code should be looked at by Pauli, as the specialist for scipy.special functions.

I would also be in favor of cython code, for example special.orthogonal_eval.pyx looks much easier to read and has separation between ufunc boilerplate and cython implementation of the actual function.

In any case, I won't be maintaining the code, so I leave the decision to others.

Two asides:
for complex derivatives, I would prefer if these kind of functions are also implemented for complex

from scipy import special
special.expm1(5)
147.4131591025766
special.expm1(5+1j)
Traceback (most recent call last):
File "<pyshell#19>", line 1, in
special.expm1(5+1j)
TypeError: function not supported for these types, and can't coerce safely to supported types
np.exp(5+1j) -1
(79.187972084297229+124.88536714849616j)

What's the numpy requirement for the next scipy release? I don't think having the function in half floats would be worth increasing the version requirement for numpy. (I don't know the details, but I wonder whether it compiles against older numpy, not just whether we have to skip some tests).

They are useful functions, but in terms of how much they are used, I think xlogy would be more important.

@josef-pkt
Copy link
Member

One more comment about scipy.special docstrings: From what I have seen they are semiautomatically created. I think eventually we should switch to docstrings following the numpy standard. There are several functions where I would have liked to add more information. The current docstrings of many special functions are awfully uninformative about details.

@chrisjordansquire
Copy link
Contributor Author

I'll put them in scipy.special and remove the half-floats. Though scipy.stats currently isn't importing anything at the top level from scipy.special.

I poked around a bit in scipy.orthogonal_eval.pyx. It's largely not ufuncs or it's a very simple ufunc. The only ufunc defined there is _eval_chebyt, and it's only defined for the input dtypes of (long, double) and output of double. It does not take multiple dtypes and output the appropriate corresponding dtype.

The user calls eval_chebyt rather than _eval_chebyt. So the user doesn't actually call the ufunc, and instead calls a python wrapper. The user won't have access to various keyword arguments that ufuncs have, and in general it's not robust to future changes and additions to the ufunc. Similarly, all the other functions are just calling each other (they're convenience functions for different scalings) or calling ufuncs from the _cepehs library.

I think if orthongal_eval.pyx was written to support multiple dtypes and so that every function was a ufunc then the code would look essentially like the logit and expit C code. The eval_poly_chebyt would certainly have to be rewritten for each type to ensure that cython compiled it for all the types. (Unless cython supports templating of some form.) Some of the loop boilerplate could be eliminated by passing functions into things like _loop_1d_d for the other dtypes. But that could be done in C, too. They have many loops like that defined in numpy.core.src.umath in loops.c.src.

As for the comments...that's actually not as simple as it sounds. Or at least it wasn't. I just submitted a pull request to numpy to help change that. ufuncs are python objects whose docstrings are read-only. They are set at object creation. The either (1)must be passed in to the C function creating the ufunc (at which point you have to have the docs in some C header file or parse them in from a Python header file with the docs), or (2) you must reset what the ufunc doc pointer points to at the C level. The add_newdoc method doesn't work on ufuncs. (And in fact there are a number of calls to add_newdoc in numpy/add_newdocs.py that don't change the documentation. It isn't noticed because add_newdoc specifically catches all errors and does nothing.)

@chrisjordansquire
Copy link
Contributor Author

I moved logit and expit to scipy.special. It looks like all the tests work out, though the docs depend on the add_newdoc_ufunc that currently has a pull request in numpy. If add_newdoc_ufunc isn't in numpy then the docs simply aren't modified and remain empty.

@pv
Copy link
Member

pv commented Aug 27, 2011

I don't have a problem with adding new functions to scipy.special. However, I think there should be some reorganization, as it does not make sense to add a new extension module every time someone wants to contribute a new C/Cython function. The ufunc loops also look a bit repetitive, so using templating could help here.

@chrisjordansquire
Copy link
Contributor Author

Templating? In C? I don't think I follow.

I second the reorganization, but I'm unsure what the best way to do that would be. I had thought about trying to put logit/expit in the existing files, but it didn't seem worth the effort of figuring out how. It wasn't nearly as straight-forward as just creating a new file.

@pv
Copy link
Member

pv commented Aug 28, 2011

I think we can postpone the reorganization, so it does not have to be done in this pull request. We already have a sort of a mess there, but we can clean it up later. So let's just do it as you do now.

Templating: rename the .c file to .c.src and you can use the same built-in templating stuff that Numpy has. For example here: https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/npy_math.c.src#L317 -- if you only need to change the data type, you can replace the copied-and-pasted set of functions with a "for loop" where the type changes.

@pv
Copy link
Member

pv commented Aug 29, 2011

... like so: https://github.com/pv/scipy-work/commits/oth/chrisjordansquire/logit -- will try to think again tomorrow

@chrisjordansquire
Copy link
Contributor Author

Ah, thank you for the example. Changes are made and pushed.

I thought it was C preprocessor dark magic I hadn't seen before, but according to this
http://mail.scipy.org/pipermail/numpy-discussion/2009-March/040836.html
from C. Harris, it's simply a python script.

Since you showed this to me, I'm curious if

  1. This trick is perfectly fine to use for non-library functions, since it's just using numpy distutils?
  2. Can this same trick be used when creating ufuncs with cython?
  3. How would one use the generic loops in numpy/numpy/core/src/umath?

I'd like to know all that so I can incorporate it into the ufunc tutorial I wrote a few weeks ago.

@pv
Copy link
Member

pv commented Sep 1, 2011

Note also that you need to use npy_expf from npy_math.h instead of expf, because the f and l versions of floating point functions are standard only in C99, and we want Scipy to build on C89.

@chrisjordansquire
Copy link
Contributor Author

Done.

@rgommers
Copy link
Member

This is merged, so closing.

It's also segfaulting under python3 though. Did one of you try this with python3? This is the first time numpy-core-style templating is used in scipy, so there may be some problems. Should be solved asap, since this was just released in a beta...

@rgommers rgommers closed this Sep 13, 2011
@rgommers
Copy link
Member

That segfault may actually be fixed by a patch Christoph Gohlke sent; I'm going to test that now.

@rgommers
Copy link
Member

That did fix it, added it to #77

One failure left related to logit with numpy 1.5.1:

ValueError: 
Arrays are not almost equal
 x: array([  -inf, -2.079, -1.253, -0.693, -0.223,  0.223,  0.693,  1.253,  2.079,    inf])
 y: array([  -inf, -2.079, -1.253, -0.693, -0.223,  0.223,  0.693,  1.253,  2.079,    inf])

@rgommers
Copy link
Member

Failure fixed in commit b6728b0

@chrisjordansquire
Copy link
Contributor Author

Sorry. I thought I'd tested this under Python 3, but I confused it with another pull request.

@rgommers
Copy link
Member

No problem.

@pv
Copy link
Member

pv commented Sep 14, 2011

Mea culpa, I was in too much of a hurry to get it in and forgot to test it on Python 3.

WarrenWeckesser pushed a commit to WarrenWeckesser/scipy that referenced this pull request Jan 7, 2021
FIX: refactor so biasedurn works with numpy<1.17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants