New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: logit and expit ufuncs for scipy.special #52
Conversation
Tests look good. This still needs a complete docstring, and preferably one that is added with the add_newdocs mechanism instead of hidden in the C code. More importantly, why write this in C? There's no C code in scipy.stats right now, and I don't see a reason to add some now. For the failing test, it would be good to conditionally skip that for numpy < 1.7 because scipy 0.10 should work with numpy 1.6 (maybe even 1.5). |
I wrote it in C because I wanted to make them ufuncs. Also to use them as a prototype for updating the numpy ufunc creation docs, which is something I'm currently working on. Of course, there's also a minor speed boost--the ufuncs run in about 75% of the time as a python level call np.log(x/(1-x)). The speed bump is important since expit and logit are commonly called in iterative fitting algorithms. logit and expit could go in scipy.special instead, but there's nothing else in scipy.special that a statistician would care about as far as I can tell. It seems like they belong in scipy.stats so a stat user wouldn't have to constantly call The doc-string is unsatisfying, but I copied what's done in scipy.special for its ufuncs. Mark had mentioned the add_newdocs but I didn't pursue it. I'll take care of that later today. |
Agreed that they do belong in stats. I'm just not convinced that 250 lines of C instead of a one-liner in Python for a 25% increase in speed is a good performance/maintainability trade-off. But let's see what other devs think, especially Josef since he's the stats maintainer. |
In its defense, the ufuncs are very simple as C-Numpy goes, the docs are about to get a major update, and there's already cython c in scipy.stats. Most of the C code is boiler plate anyways. The C code that actually computes the logit is only around 10-20 lines. The rest of it is the standard way of defining a ufunc. If that part of the code isn't working then many more basic ufuncs in numpy will also not work. Also, as a ufunc, the user automatically get access to all the keywords for ufuncs. Things like out, where, setting the order, etc.. The ufuncs could be generated via cython, but that would also generate C code, and require anyone looking at the code to be conversant with cython's ins and outs. |
Yes, as C code goes it's very clean. Cython is better still though - if you understand Python code you understand Cython code too. You normally don't have to look at the C generated from it. (not suggesting you rewrite this, if others think it's fine, then it's fine with me too) |
I think this belongs in scipy.special, alongside expm1, log1p and similar. scipy.stats has no pure functions like this. scipy.stats imports a large number of functions from scipy.special, so the pattern to have the functions in scipy.special already exists. I would also be in favor of cython code, for example special.orthogonal_eval.pyx looks much easier to read and has separation between ufunc boilerplate and cython implementation of the actual function. In any case, I won't be maintaining the code, so I leave the decision to others. Two asides:
What's the numpy requirement for the next scipy release? I don't think having the function in half floats would be worth increasing the version requirement for numpy. (I don't know the details, but I wonder whether it compiles against older numpy, not just whether we have to skip some tests). They are useful functions, but in terms of how much they are used, I think xlogy would be more important. |
One more comment about scipy.special docstrings: From what I have seen they are semiautomatically created. I think eventually we should switch to docstrings following the numpy standard. There are several functions where I would have liked to add more information. The current docstrings of many special functions are awfully uninformative about details. |
I'll put them in scipy.special and remove the half-floats. Though scipy.stats currently isn't importing anything at the top level from scipy.special. I poked around a bit in scipy.orthogonal_eval.pyx. It's largely not ufuncs or it's a very simple ufunc. The only ufunc defined there is _eval_chebyt, and it's only defined for the input dtypes of (long, double) and output of double. It does not take multiple dtypes and output the appropriate corresponding dtype. The user calls eval_chebyt rather than _eval_chebyt. So the user doesn't actually call the ufunc, and instead calls a python wrapper. The user won't have access to various keyword arguments that ufuncs have, and in general it's not robust to future changes and additions to the ufunc. Similarly, all the other functions are just calling each other (they're convenience functions for different scalings) or calling ufuncs from the _cepehs library. I think if orthongal_eval.pyx was written to support multiple dtypes and so that every function was a ufunc then the code would look essentially like the logit and expit C code. The eval_poly_chebyt would certainly have to be rewritten for each type to ensure that cython compiled it for all the types. (Unless cython supports templating of some form.) Some of the loop boilerplate could be eliminated by passing functions into things like _loop_1d_d for the other dtypes. But that could be done in C, too. They have many loops like that defined in numpy.core.src.umath in loops.c.src. As for the comments...that's actually not as simple as it sounds. Or at least it wasn't. I just submitted a pull request to numpy to help change that. ufuncs are python objects whose docstrings are read-only. They are set at object creation. The either (1)must be passed in to the C function creating the ufunc (at which point you have to have the docs in some C header file or parse them in from a Python header file with the docs), or (2) you must reset what the ufunc doc pointer points to at the C level. The add_newdoc method doesn't work on ufuncs. (And in fact there are a number of calls to add_newdoc in numpy/add_newdocs.py that don't change the documentation. It isn't noticed because add_newdoc specifically catches all errors and does nothing.) |
I moved logit and expit to scipy.special. It looks like all the tests work out, though the docs depend on the add_newdoc_ufunc that currently has a pull request in numpy. If add_newdoc_ufunc isn't in numpy then the docs simply aren't modified and remain empty. |
I don't have a problem with adding new functions to |
Templating? In C? I don't think I follow. I second the reorganization, but I'm unsure what the best way to do that would be. I had thought about trying to put logit/expit in the existing files, but it didn't seem worth the effort of figuring out how. It wasn't nearly as straight-forward as just creating a new file. |
I think we can postpone the reorganization, so it does not have to be done in this pull request. We already have a sort of a mess there, but we can clean it up later. So let's just do it as you do now. Templating: rename the |
... like so: https://github.com/pv/scipy-work/commits/oth/chrisjordansquire/logit -- will try to think again tomorrow |
Ah, thank you for the example. Changes are made and pushed. I thought it was C preprocessor dark magic I hadn't seen before, but according to this Since you showed this to me, I'm curious if
I'd like to know all that so I can incorporate it into the ufunc tutorial I wrote a few weeks ago. |
Note also that you need to use |
Done. |
This is merged, so closing. It's also segfaulting under python3 though. Did one of you try this with python3? This is the first time numpy-core-style templating is used in scipy, so there may be some problems. Should be solved asap, since this was just released in a beta... |
That segfault may actually be fixed by a patch Christoph Gohlke sent; I'm going to test that now. |
That did fix it, added it to #77 One failure left related to logit with numpy 1.5.1:
|
Failure fixed in commit b6728b0 |
Sorry. I thought I'd tested this under Python 3, but I confused it with another pull request. |
No problem. |
Mea culpa, I was in too much of a hurry to get it in and forgot to test it on Python 3. |
FIX: refactor so biasedurn works with numpy<1.17
logit and expit ufuncs for scipy.stats. Logit and expit are commonly used functions in statistical modeling, and they do not appear to be anywhere in scipy or numpy.
Note: One of the tests in test_logit currently fails due to a bug in numpy. I'll be submitting a patch for the numpy bug in a day or so, and then it passes just fine. (The isnumber function in numpy.testing.utils doesn't recognize the half-float as a number.)