mean() (and median()) should work with "object" arrays #4063

Closed
lebigot opened this Issue Nov 19, 2013 · 11 comments

@lebigot
Contributor

lebigot commented Nov 19, 2013

With NumPy 1.8, mean() started to break when calculating the (global) mean of an array that contains objects (arrays with an object dtype). This also breaks median() on such arrays. Here is an example:

>>> numpy.arange(10).astype(object).mean()
Traceback (most recent call last):
  File "<ipython-input-11-782b7c0104c3>", line 1, in <module>
    numpy.arange(10).astype(object).mean()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/_methods.py", line 67, in _mean
    ret = ret.dtype.type(ret / rcount)
AttributeError: 'int' object has no attribute 'dtype'

Another example is the case of numbers with uncertainties from the uncertainties package (lebigot/uncertainties#22).

I think it would be better if NumPy did not assume that scalar results have a dtype, since arrays can contain objects that have a meaningful mean. Such objects should not be forced to have a dtype, which is obviously NumPy-specific (Python scalars like floats cannot even have one). Furthermore, a dtype is in principle not necessary for calculating the mean of such objects, so it would look strange if they were required to have one.

The problem is that numpy.mean() assumes that the intermediate result obtained has a dtype (with a type attribute).

Therefore, I suggest that NumPy's mean() also handle arrays of objects that are not of the standard NumPy types (their dtype is object, and they contain objects that have a meaningful mean, like ints, floats, numbers with uncertainties, etc.).
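For what it is worth, a plain sum-and-divide still works on such an array, which shows that the mean itself is well defined for these objects; only the final dtype-based cast fails (this is just an illustrative workaround, not a proposed implementation):

>>> a = numpy.arange(10).astype(object)
>>> a.sum() / float(a.size)  # the objects themselves know how to add and divide
4.5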

@lebigot lebigot referenced this issue in lebigot/uncertainties Nov 19, 2013

Closed

NumPy 1.8 breaks mean() in arrays #22

@juliantaylor
Contributor

juliantaylor commented Nov 19, 2013

introduced in f16b12e by @charris

@seberg
Member

seberg commented Nov 20, 2013

Hmmm, this is annoying. It seems to me like the only way to fix this is probably to see if dtype is given, and then use np.dtype(dtype).type(...) and otherwise just do the plain operation?
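Concretely, the suggestion amounts to something like this, written as a standalone sketch rather than the actual code in numpy/core/_methods.py (_mean_like is a made-up name):

import numpy as np

def _mean_like(arr, dtype=None):
    # only go through a dtype when one was explicitly given
    rcount = arr.size
    ret = arr.sum(dtype=dtype)
    if dtype is not None:
        # honor the requested dtype for the scalar result
        return np.dtype(dtype).type(ret / rcount)
    # otherwise just do the plain operation, which also works when ret is a
    # Python scalar or an arbitrary object coming from an object array
    return ret / rcount

Under true division this returns 4.5 for numpy.arange(10).astype(object) instead of raising the AttributeError above.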

@charris
Member

charris commented Nov 20, 2013

Yeah, I was thinking along the same lines.

@lebigot
Contributor

lebigot commented Nov 21, 2013

I would be curious to see what issue prompted the change, to see if I can come up with a better suggestion (who knows): what was the problem, exactly?

@charris
Member

charris commented Nov 21, 2013

The scalar returns didn't preserve type, i.e., float32 would go to float64. That was on account of type precedence between scalars being different from type precedence between scalars and arrays.
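For later readers, the asymmetry looked roughly like this under the value-based promotion rules NumPy used at the time (described here as I understand those rules; promotion has since changed again):

>>> x = np.float32(1.0)
>>> n = np.intp(10)                               # a NumPy integer scalar, e.g. an element count
>>> (x / n).dtype                                 # scalar / scalar: promotes to float64
>>> (np.ones(3, dtype=np.float32) / n).dtype      # array / scalar: stays float32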

@seberg
Member

seberg commented Nov 21, 2013

@charris do we even care about that? Or is it enough if the passed-in dtype actually gets honored?

@seberg
Member

seberg commented Nov 25, 2013

I honestly have trouble figuring out a good way to preserve the type exactly right for the scalar result. I now think we may have to just check for object dtype input (or a passed-in dtype). The most robust method I can think of would be a new keyword argument to the ufuncs to skip PyArray_Return (that would probably be slower, though), but unless that is useful elsewhere it is not worth the trouble either.
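Concretely, the object-dtype check could be as small as this (a sketch only; _takes_plain_path is a made-up helper name, not proposed NumPy API):

import numpy as np

def _takes_plain_path(arr, dtype=None):
    # "check for object dtype input (or passed in dtype)": skip the scalar
    # dtype-preserving cast whenever objects are involved
    return arr.dtype == np.dtype(object) or (
        dtype is not None and np.dtype(dtype) == np.dtype(object))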

@jbzdak

jbzdak commented Feb 4, 2014

Any progress on this one? It cost me an hour of debugging today. If doing this properly is hard, please consider fixing the error message so it is obvious what's wrong.

@juliantaylor
Contributor

juliantaylor commented Feb 4, 2014

@charris do you have time to have a look at this?

I also think we have accumulated enough fixes to warrant a 1.8.1 release if we add this and the C99 Windows fix. Thoughts?

@charris
Member

charris commented Feb 4, 2014

I'll get it done today sometime. Agree on 1.8.1, I came to that conclusion this morning. We should also fix the divide and true_divide ufuncs when the dtype is given.

@charris
Member

charris commented Feb 5, 2014

I need to think about this a bit more before putting up a fix.

charris added a commit to charris/numpy that referenced this issue Feb 10, 2014

BUG: Fix mean, var, std methods for object arrays.
This takes care to preserve the object type for scalar returns and
fixes the error that resulted when the scalar did not have a dtype
attribute.

Closes #4063.
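In rough terms, the fix this commit message describes amounts to a division step of the following shape (a paraphrase written as a standalone sketch, not the literal patch; _object_safe_mean is a made-up name):

import numpy as np

def _object_safe_mean(a, dtype=None):
    arr = np.asanyarray(a)
    if dtype is None and issubclass(arr.dtype.type, (np.integer, np.bool_)):
        dtype = np.float64          # sum integers/booleans in float64, as np.mean does
    rcount = arr.size
    ret = arr.sum(dtype=dtype)
    if hasattr(ret, 'dtype'):
        # NumPy scalar: cast back through its own dtype so e.g. float32 stays float32
        return ret.dtype.type(ret / rcount)
    # plain Python object from an object array: no dtype attribute, just divide
    return ret / rcount

With this shape, numpy.arange(10).astype(object) no longer trips over the missing dtype attribute, and a float32 input still comes back as float32.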

juliantaylor added a commit to juliantaylor/numpy that referenced this issue Feb 15, 2014

BUG: Fix mean, var, std methods for object arrays.
This takes care to preserve the object type for scalar returns and
fixes the error that resulted when the scalar did not have a dtype
attribute.

Closes #4063.

Conflicts:
	numpy/core/tests/test_multiarray.py