Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add convenience functions nans, infs, nans_like, infs_like #2875

Merged
merged 21 commits into from
Jun 30, 2013

Conversation

ahojnnes
Copy link
Contributor

@ahojnnes ahojnnes commented Jan 1, 2013

I find myself often in the situation where I type the long version of this:

b = np.empty_like(a, dtype=np.double)
b[:] = np.inf

Please let me know if you find this useful as well and I will add test cases for those functions.

@njsmith
Copy link
Member

njsmith commented Jan 2, 2013

Hmm, this is creating something of a proliferation of different functions. I think the usual way to do this is to just write

b = np.ones_like(a, dtype=np.double) * np.inf

but obviously that is substantially less efficient than the multi-line version you suggested (it creates and initializes two separate arrays).

Maybe we should add np.filled and np.filled_like functions instead, which take an argument specifying the fill value? Still not entirely sure whether it'd be worth adding to the numpy api, but it does seem like a useful convenience and less conceptually cluttered than the current patch. Possibly we could also simplify numpy's code by reimplementing zeros, zeros_like, etc., in terms of these new functions.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 5, 2013

I think that this is unnecessary. Adding np.filled or np.filled_like would be better. I think the idiomatic way to do this right now is:
b = np.empty_like(a, dtype=np.double).fill(np.inf)
This is substantially faster than the technique shown above.

In [11]: timeit np.empty(4096).fill(np.inf)
10000 loops, best of 3: 29.2 us per loop

In [12]: timeit np.ones(4096) * np.inf
1000 loops, best of 3: 1.08 ms per loop

@njsmith
Copy link
Member

njsmith commented Jan 5, 2013

That can't be the idiomatic way, b/c .fill doesn't return self :-)

In [10]: b = np.empty_like(a, dtype=np.double).fill(np.inf)

In [11]: b is None
True

So I think right now the * is the only single-statement way to do this,
even though it's dumb and slow.

I guess I'm +0.5, maybe even +1, on np.filled/np.filled_like.

On Sat, Jan 5, 2013 at 9:00 PM, Eric Moore notifications@github.com wrote:

I think that this is unnecessary. Adding np.filled or np.filled_likewould be better. I think the idiomatic way to do this right now is:
b = np.empty_like(a, dtype=np.double).fill(np.inf)
This is substantially faster than the technique shown above.

In [11]: timeit np.empty(4096).fill(np.inf)
10000 loops, best of 3: 29.2 us per loop

In [12]: timeit np.ones(4096) * np.inf
1000 loops, best of 3: 1.08 ms per loop


Reply to this email directly or view it on GitHubhttps://github.com//pull/2875#issuecomment-11919794.

@ewmoore
Copy link
Contributor

ewmoore commented Jan 5, 2013

Touche. That said, my point stands. Using a multiply to fill an array is a terrible idea. Even if the better way takes two lines rather than one.

b = np.empty_like(a, dtype=np,double)
b.fill(np.inf)

Given this I think filled/filled_like are an okay idea.

@ahojnnes
Copy link
Contributor Author

ahojnnes commented Jan 6, 2013

I agree, filled and filled_like are the best option. Give me some days and I will implement those two functions as part of this PR.

@ahojnnes
Copy link
Contributor Author

filled and filled_like including tests added.

@njsmith
Copy link
Member

njsmith commented Jan 13, 2013

I see tests for filled_like, but not for filled. Am I missing something, or do you need to add more tests? :-)

@ahojnnes
Copy link
Contributor Author

Yes, I forgot about that because I couldn't find tests for ones and zeros. Do you know where I can find those? If those do not yet exist, I could implement those as well, of course.

@njsmith
Copy link
Member

njsmith commented Jan 13, 2013

I'm not finding any tests for the basic ones/zeros functionality either.
They get some coverage from being used in other tests, but it's probably
not thorough WRT things like different shapes, dtypes, orders, etc. If
you'd like to add tests for all of them at once that would be great :-)

(And it's possible that we're both just missing them, but a little
duplication in the test suite is far better than having nothing in the test
suite!)

On Sun, Jan 13, 2013 at 6:34 PM, Johannes Schönberger <
notifications@github.com> wrote:

Yes, I forgot about that because I couldn't find tests for ones and zeros.
Do you know where I can find those? If those do not yet exist, I could
implement those as well, of course.


Reply to this email directly or view it on GitHubhttps://github.com//pull/2875#issuecomment-12197751.

@ahojnnes
Copy link
Contributor Author

Added test cases.

@njsmith
Copy link
Member

njsmith commented Jan 13, 2013

TestCreationFuncs is failing on Python 3: https://travis-ci.org/numpy/numpy/builds/4128136

@ahojnnes
Copy link
Contributor Author

I don't have a running Python 3 environment here, so I skip all string/unicode comparisons. Maybe someone else can have a look into this. Maybe this is a bug?

@seberg
Copy link
Member

seberg commented Jan 14, 2013

The difference is:

Python 2.7:

In [8]: np.dtype('S1').type(0)
Out[8]: '0'

In [9]: np.dtype('S0')
Out[9]: dtype('S')

Python 3.2:

In [8]: np.dtype('S1').type(0)
Out[8]: b''

In [9]: np.dtype('S0')
TypeError: data type not understood

The second difference does not really matter much I believe, 'S0' is not really a sensible type anyway. But the first one is an inconsistency between the two and I believe as such a bug in python 3 string handling.

@ahojnnes
Copy link
Contributor Author

Maybe someone with more knowledge regarding the specifics of Python 3 string handling can have another look at this and tell what he/she thinks?

@ahojnnes
Copy link
Contributor Author

Has someone had time to look into this yet? If not I'll tackle this in the coming days.

@charris
Copy link
Member

charris commented May 4, 2013

In Python3.3:

>>> np.dtype('S1').type(0)
b''
>>> np.dtype('S0')
dtype('S')

So the problem looks to be only with 3.2. Now that we are not supporting Python < 2.6 the b prefix can be used, in Python2 it just gets ignored.

@charris
Copy link
Member

charris commented May 4, 2013

@seberg Looks like you thought this was good apart from the errors.

@ahojnnes
Copy link
Contributor Author

ahojnnes commented May 4, 2013

Rebased on current master and fixed the Python 3 test cases.

Tell me if there is anything else you need me to do.

@charris
Copy link
Member

charris commented May 4, 2013

Don't know what's going on with the Travis bot notification here, all the tests seem to have passed.

The new functions need to be mentioned in doc/release/1.8.0-notes.rst.


Please refer to the documentation for `zeros` for further details.

Other parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just copy the documentation from zeros, no point in sending people around the 'see blah' loop. I think the main justification for that practice was keeping things in sync but I don't think the documentation gets updated that often.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also do it for ones, … etc.?

@ahojnnes
Copy link
Contributor Author

ahojnnes commented May 6, 2013

@charris Doc strings updated with separate parameter description and short example sections.

@ahojnnes
Copy link
Contributor Author

ahojnnes commented May 6, 2013

I guess all remaining tasks should be addressed now. Please, check if the release notes are appropriate?

@njsmith
Copy link
Member

njsmith commented May 6, 2013

Looks good to me, but one more subtle API question: should the default dtype for filled be np.float64, or np.array(fill_value).dtype?

@ahojnnes
Copy link
Contributor Author

ahojnnes commented May 6, 2013

That's actually a good question. On the one hand np.float64 is more consistent with the existing zeros etc. functions, on the other hand np.array(fill_value).dtype is definitely applicable to more types of input.

I'm +1 on changing it to np.array(fill_value).dtype. Any other opinions?

@ahojnnes
Copy link
Contributor Author

I checked the behavior of the current implementation and it is already the case that np.array(fill_value).dtype is used the dtype for the filled function.

@njsmith
Copy link
Member

njsmith commented May 22, 2013

I like np.array(fill_value).dtype better too.

The docs currently say that filled uses float64 by default, so that would still need to be changed.

You have a merge conflict that needs resolving.

@ahojnnes
Copy link
Contributor Author

Doc string updated and rebased on current state of master branch.

@ahojnnes
Copy link
Contributor Author

ahojnnes commented Jun 6, 2013

/ping

@charris
Copy link
Member

charris commented Jun 6, 2013

Needs a rebase,

@ahojnnes
Copy link
Contributor Author

ahojnnes commented Jun 6, 2013

@charris Rebased on current master.

@ahojnnes
Copy link
Contributor Author

/ping

@charris
Copy link
Member

charris commented Jun 12, 2013

@njsmith Is this ready?

@njsmith
Copy link
Member

njsmith commented Jun 12, 2013

Sigh.

The code looks great to me and I like the API. But I just went back and looked at the mailing list discussion:
http://thread.gmane.org/gmane.comp.python.numeric.general/52763 archives link

We're going to have a mess if we just declare this done and merge it as is, because now that I remind myself, the general opinion in that thread seems to have been:

  • You can't possibly call this thing filled because that conflicts with np.ma.filled. But there are no better names.
  • Therefore the best solution is to call it is np.empty(shape, fill=value) instead.

My personal opinion is that this argument is completely wrong. Making np.ma's API clean is not as important as making numpy proper's API clean, esp. since np.ma already is already inconsistent with the word 'fill' being used for totally different things. So we can just live with the inconsistency between np.filled and np.ma.filled, or can deprecate np.ma.filled and tell people to use the method version instead (which already exists). And the np.empty idea is just horribly ugly, will confuse newbies and everyone else forever, and I'm 100% convinced we would regret it.

But, just ignoring the discussion and merging anyway will do more harm than good, even if everyone does eventually agree the PR is totally awesome...

I guess there's not really anything we can do except send another email and re-open the debate.

Sigh.

@ahojnnes
Copy link
Contributor Author

If naming is an issue, how about np.vals(…) and np.vals_like(…)?

@njsmith
Copy link
Member

njsmith commented Jun 12, 2013

I think that suggestion did come up in the last thread, though I don't
remember the conclusion off-hand.

New thread: http://thread.gmane.org/gmane.comp.python.numeric.general/54467 archive link

On Wed, Jun 12, 2013 at 2:42 PM, Johannes Schönberger <
notifications@github.com> wrote:

If naming is an issue, how about np.vals(…) and np.vals_like(…)?


Reply to this email directly or view it on GitHubhttps://github.com//pull/2875#issuecomment-19325972
.

@ahojnnes
Copy link
Contributor Author

Nevertheless, I can bring up this topic on the mailing list again.

@njsmith
Copy link
Member

njsmith commented Jun 12, 2013

Might be better to wait a day or so to see if the current 'filled' names go
through this time and only bring that up if not... thread length tends to
be exponential in the number of ideas suggested, and probability of
successful resolution is inversely proportional to thread length.

On Wed, Jun 12, 2013 at 2:50 PM, Johannes Schönberger <
notifications@github.com> wrote:

Nevertheless, I can bring up this topic on the mailing list again.


Reply to this email directly or view it on GitHubhttps://github.com//pull/2875#issuecomment-19326572
.

@ahojnnes
Copy link
Contributor Author

@njsmith Sorry, it is hard for me as an outside person to follow the discussion, but we do not seem to approach any consensus as far as I can see, do we? ;-)

@njsmith
Copy link
Member

njsmith commented Jun 30, 2013

@ahojnnes - Phew. At long last, it looks like people are all okay with full and full_like -- want to do a quick search/replace and get this merged before someone changes their mind? :-)

(Sorry this has been such a hassle. Thanks for sticking with it so far!)

@ahojnnes
Copy link
Contributor Author

@njsmith No problem. Now, we only need Travis to be happy with the changes... ;-)

njsmith added a commit that referenced this pull request Jun 30, 2013
Add convenience functions nans, infs, nans_like, infs_like
@njsmith njsmith merged commit 29dcc54 into numpy:master Jun 30, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants