# mean() (and median()) should work with "object" arrays #4063

Closed
opened this Issue Nov 19, 2013 · 11 comments

5 participants
Contributor

### lebigot commented Nov 19, 2013

With NumPy 1.8, `mean()` started to break when calculating the (global) mean of an array that contains objects (an array with an `object` dtype). This also breaks `median()` on such arrays. Here is an example:

```
>>> numpy.arange(10).astype(object).mean()
Traceback (most recent call last):
  File "", line 1, in
    numpy.arange(10).astype(object).mean()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/_methods.py", line 67, in _mean
    ret = ret.dtype.type(ret / rcount)
AttributeError: 'int' object has no attribute 'dtype'
```

Another example is the case of numbers with uncertainties from the uncertainties package (lebigot/uncertainties#22).

I think it would be better if NumPy did not assume that scalar results have a `dtype`, since arrays can contain objects that have a meaningful mean. I believe such objects should not be forced to have a `dtype`, which is obviously NumPy-specific (Python scalars like floats cannot even have one). Furthermore, a `dtype` is in principle not necessary for calculating the mean of such objects, so it would look strange if they had to have one.

The problem is that `numpy.mean()` assumes that the intermediate result has a `dtype` (with a `type` attribute). Therefore, I suggest that NumPy's `mean()` also handle arrays of objects that are not of the standard NumPy types (their `dtype` is `object`, and they contain objects that have a meaningful mean, like ints, floats, numbers with uncertainties, etc.).
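The traceback points at a single step: the scalar result of the sum is divided by the element count and then cast with `ret.dtype.type(...)`. A minimal sketch that paraphrases that step (the helper `mean_assuming_dtype` is hypothetical, not NumPy's actual `_mean`) shows why it works for float arrays but fails for object arrays, whose sum is a plain Python `int`:

```python
import numpy as np


def mean_assuming_dtype(arr):
    """Hypothetical paraphrase of the failing step in the traceback."""
    arr = np.asanyarray(arr)
    ret = arr.sum()          # full reduction: a scalar, not an array
    rcount = arr.size
    # Assumes the scalar result carries a .dtype; plain Python ints do not.
    return ret.dtype.type(ret / rcount)


# A float array sums to a NumPy scalar, which has a .dtype, so this works.
print(mean_assuming_dtype(np.arange(10.0)))

# An object array sums to the bare Python object inside it.
total = np.arange(10).astype(object).sum()
print(type(total).__name__, hasattr(total, "dtype"))  # int False

try:
    mean_assuming_dtype(np.arange(10).astype(object))
except AttributeError as exc:
    print("AttributeError:", exc)
```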


Contributor

### juliantaylor commented Nov 19, 2013

 introduced in f16b12e by @charris
Member

### seberg commented Nov 20, 2013

 Hmmm, this is annoying. It seems to me that the only way to fix this is probably to check whether `dtype` is given, and then use `np.dtype(dtype).type(...)`, and otherwise just do the plain operation?
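A minimal sketch of the approach suggested here, assuming a simplified mean built on `ndarray.sum()` (the helper `mean_dtype_aware` is hypothetical and not NumPy's code): an explicit `dtype` is honored by casting, and otherwise the division is left to the operands.

```python
import numpy as np


def mean_dtype_aware(arr, dtype=None):
    """Hypothetical sketch: cast only when the caller passed a dtype."""
    arr = np.asanyarray(arr)
    ret = arr.sum(dtype=dtype)
    rcount = arr.size
    if dtype is not None:
        # Honor the requested dtype by casting the scalar result.
        return np.dtype(dtype).type(ret / rcount)
    # No dtype given: plain division, the operands decide the result type.
    return ret / rcount


print(mean_dtype_aware(np.arange(10).astype(object)))
print(type(mean_dtype_aware(np.arange(10), dtype=np.float32)))
```

Without an explicit `dtype`, this sketch leaves the result type to ordinary promotion, so a `float32` input may come back as `float64` under older promotion rules, which is the type-preservation concern discussed further down the thread.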
Member

### charris commented Nov 20, 2013

 Yeah, I was thinking along the same lines.
Contributor

### lebigot commented Nov 21, 2013

 I would be curious to know which issue prompted the change, to see whether I can come up with a better suggestion (who knows). What was the problem, exactly?
Member

### charris commented Nov 21, 2013

 The scalar returns didn't preserve the type, i.e., float32 would be promoted to float64. That was because type precedence between scalars is different from type precedence between scalars and arrays.
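A small illustration of the difference, and of why casting back through the scalar's own `dtype` restores the type. The scalar-by-Python-int result type depends on the NumPy version's promotion rules, so the sketch only notes it in a comment rather than asserting it:

```python
import numpy as np

s = np.float32(45.0)      # scalar result of a float32 reduction

# Scalar / Python int: may be float64 under older (pre-NEP 50) promotion
# rules and float32 under newer ones, so the type is version-dependent.
q = s / 10

# Casting back through the reduction's own scalar type keeps float32
# regardless of how the division above was promoted.
fixed = s.dtype.type(q)

# The array path keeps float32: a float32 array divided by a Python int
# stays float32.
arr_q = np.array([45.0], dtype=np.float32) / 10

print(type(q).__name__, type(fixed).__name__, arr_q.dtype)
```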
Member

### seberg commented Nov 21, 2013

 @charris do we even care about that? Or is it enough that a passed-in dtype actually gets honored?
Member

### seberg commented Nov 25, 2013

 I honestly have trouble figuring out a good method of preserving the type correctly for the scalar result. I now think we may have to just check for object dtype input (or a passed-in dtype). The most secure method I can think of would be a new keyword argument to the ufuncs to skip PyArray_Return (which would probably be slower, though), but unless that is useful elsewhere, it is not worth the trouble either.
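For context on why the scalar result is awkward: a full reduction returns a scalar rather than a 0-d array, and for object arrays that scalar is the bare Python object. The sketch below uses `keepdims=True` only as a loose analogy for the hypothetical "skip `PyArray_Return`" keyword; it is not what the comment proposes verbatim.

```python
import numpy as np

a32 = np.arange(10, dtype=np.float32)
obj = np.arange(10).astype(object)

# Full reductions are unwrapped into scalars.
s32 = np.add.reduce(a32)    # NumPy float32 scalar, still has .dtype
sobj = np.add.reduce(obj)   # the Python int stored in the object array

# keepdims=True keeps the result as a 1-element ndarray, so the dtype
# travels with it and array (not scalar) promotion rules apply.
k32 = np.add.reduce(a32, keepdims=True)
kobj = np.add.reduce(obj, keepdims=True)

print(type(s32).__name__, type(sobj).__name__)
print((k32 / 10).dtype, (kobj / 10).dtype)
```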


### jbzdak commented Feb 4, 2014

 Any progress on this one? It cost me an hour of debugging today. If doing this properly is hard, please consider fixing the error message so that it is obvious what's wrong.
Contributor

### juliantaylor commented Feb 4, 2014

 @charris do you have time to have a look at this? I also think we have accumulated enough fixes to warrant a 1.8.1 release if we add this and the C99 Windows fix. Thoughts?
Member

### charris commented Feb 4, 2014

 I'll get it done today sometime. Agree on 1.8.1, I came to that conclusion this morning. We should also fix the `divide` and `true_divide` ufuncs when the `dtype` is given.
Member

### charris added a commit to charris/numpy that referenced this issue Feb 10, 2014

```
BUG: Fix mean, var, std methods for object arrays.

This takes care to preserve the object type for scalar returns and
fixes the error that resulted when the scalar did not have a dtype
attribute.

Closes #4063.
```
`ee2ddbf`
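The commit message describes two goals: preserve the object's own type for scalar results, and stop assuming a `dtype` attribute. A minimal sketch of one way to meet both, assuming a `hasattr` check on the reduction result (the helper `mean_scalar_safe` is hypothetical and this is not the committed code), with `fractions.Fraction` standing in for objects such as numbers with uncertainties:

```python
from fractions import Fraction

import numpy as np


def mean_scalar_safe(arr):
    """Hypothetical sketch: cast back only when the result has a dtype."""
    arr = np.asanyarray(arr)
    ret = arr.sum()
    rcount = arr.size
    if hasattr(ret, "dtype"):
        # NumPy scalar: cast back so e.g. float32 stays float32.
        return ret.dtype.type(ret / rcount)
    # Arbitrary Python object: plain division, the object decides the type.
    return ret / rcount


fr = np.array([Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)], dtype=object)
print(repr(mean_scalar_safe(fr)))                      # Fraction(1, 3)
print(mean_scalar_safe(np.arange(10).astype(object)))  # 4.5
print(type(mean_scalar_safe(np.array([1, 2], dtype=np.float32))).__name__)
```

The `Fraction` case shows the object type surviving the mean, while the `float32` case still goes through the cast-back path that motivated the original change.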

Merged

### charris added a commit to charris/numpy that referenced this issue Feb 10, 2014

```
BUG: Fix mean, var, std methods for object arrays.

This takes care to preserve the object type for scalar returns and
fixes the error that resulted when the scalar did not have a dtype
attribute.

Closes #4063.
```
`4ebf25e`

### juliantaylor added a commit to juliantaylor/numpy that referenced this issue Feb 15, 2014

```
BUG: Fix mean, var, std methods for object arrays.

This takes care to preserve the object type for scalar returns and
fixes the error that resulted when the scalar did not have a dtype
attribute.

Closes #4063.

Conflicts:
numpy/core/tests/test_multiarray.py
```
`e1fc6bd`

Closed