
COMPAT: np 1.18 wants explicit dtype=object #30035

Closed (11 commits)
Conversation

jbrockmendel (Member)

xref #30030, xref numpy/numpy#15041, numpy/numpy#15045

ATM we have 85 tests failing in the npdev build; this gets it down to 14 (at least for me locally).

@jreback jreback added the Compat (pandas objects compatibility with NumPy or Python functions) label Dec 4, 2019
@jreback jreback added this to the 1.0 milestone Dec 4, 2019
```python
with warnings.catch_warnings():
    # See https://github.com/numpy/numpy/issues/15041
    warnings.filterwarnings("ignore", ".*with automatic object dtype.*")
    out = np.sum(list_of_columns, axis=0)
```
Contributor:

Can these be replaced with a `sep.join(...)` with a list comprehension?
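A hedged sketch of what that join-based replacement might look like; the sample data and the `sep` value here are assumptions, not from the diff:

```python
import numpy as np

# Hypothetical sample data: string columns concatenated elementwise.
list_of_columns = [np.array(["a", "b"], dtype=object),
                   np.array(["1", "2"], dtype=object)]

# Current approach: np.sum concatenates strings elementwise via "+".
out = np.sum(list_of_columns, axis=0)

# Join-based alternative with an (assumed) separator sep:
sep = ""
alt = np.array([sep.join(row) for row in zip(*list_of_columns)], dtype=object)
```

The comprehension never asks NumPy to infer a dtype from nested sequences, which is the point of the suggestion.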

Member Author:

Maybe? ATM I'm just trying to get a handle on the scope of the problem.

Contributor:

We will revert this for 1.18, probably also on master (but probably only temporarily, to pacify downstream test suites for a bit!). The interesting thing would be if there are cases where it cannot be fixed easily; i.e., adding a warning filter should not be necessary except for specific tests. (I am sure you are aware, but the filter context manager is not a good solution in this code, since it is not thread safe at all.)

```diff
@@ -1482,6 +1482,11 @@ def _format_strings(self) -> List[str]:
         if is_categorical_dtype(values.dtype):
             # Categorical is special for now, so that we can preserve tzinfo
             array = values._internal_get_values()
+        elif values.dtype.kind == "O":
```
Contributor:

Prefer to use `is_object_dtype`.
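A small sketch of the difference being suggested (the sample arrays are illustrative):

```python
import numpy as np
from pandas.api.types import is_object_dtype

# is_object_dtype handles anything with a dtype (including pandas
# ExtensionArrays), while .dtype.kind == "O" only works for objects
# exposing a NumPy-style dtype and reads less clearly.
values = np.array(["a", "b"], dtype=object)

assert is_object_dtype(values)      # preferred spelling
assert values.dtype.kind == "O"     # what the diff checks
assert not is_object_dtype(np.array([1, 2]))
```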

Member Author:

I didn't check whether that works on JSONArray, just that `values.dtype == object` doesn't.

jreback (Contributor) left a comment:

Small comments. Ideally, can you xfail the other instances that are not fixed here?

```diff
-arr = np.array(arr_list)
-tm.assert_numpy_array_equal(np.array(ujson.decode(ujson.encode(arr))), arr)
+arr = np.array(arr_list, dtype=object)
+result = np.array(ujson.decode(ujson.encode(arr)), dtype=object)
```
Contributor:

Can you create an `expected` here?
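A hedged illustration of what adding an explicit `expected` might look like, with stdlib `json` standing in for `ujson` and hypothetical sample data:

```python
import json
import numpy as np

# arr_list is an assumption, not the fixture used in the test.
arr_list = ["a", ["b", 1], 3.0]
arr = np.array(arr_list, dtype=object)

# Round-trip through JSON, then compare against an expected built
# independently, rather than comparing the round-trip to itself.
result = np.array(json.loads(json.dumps(arr.tolist())), dtype=object)
expected = np.array(arr_list, dtype=object)
```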

Comment on lines 151 to 154
```python
with warnings.catch_warnings():
    # See https://github.com/numpy/numpy/issues/15041
    warnings.filterwarnings("ignore", ".*with automatic object dtype.*")
    result = np.asarray(scalars, dtype=dtype)
```
Contributor:

This should be fixed at the caller, not here.

Member Author:

Yah, pretty much every usage where we're just suppressing the warnings was a kludge.

Contributor:

My thought for a way forward with the new sentinel in numpy/numpy#15119 is that you would leave this unchanged. When needed, you would add the sentinel as the dtype in the call to _from_sequence (or even pass it to whoever is calling _from_sequence). During the deprecation period, try to rework the use case that justifies the inefficient ragged array, and get back to NumPy with the use case for such a construct so we can stop or rethink the deprecation.

Comment on lines +1767 to 1770
```python
if self.dtype.kind == "O":
    # See https://github.com/numpy/numpy/issues/15041
    return np.asarray(self.values, dtype=object)
return np.asarray(self.values)
```
Contributor:

Since this is documented as being 1d only, this should be:

```python
x = np.empty(len(self.values), dtype=self.dtype)
x[:] = self.values
return x
```

which will work correctly even if `values == [[1, 2], [3, 4]]`.
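A runnable sketch of the pre-allocate-and-fill pattern suggested here, with illustrative data:

```python
import numpy as np

# Fixing the output as a 1-D object array up front keeps nested
# sequences as elements instead of letting np.asarray infer extra
# dimensions.
values = [[1, 2], [3, 4]]

x = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    # Per-element fill; a single slice assignment can itself trip over
    # shape inference when the nested rows happen to be equal length.
    x[i] = v
```

For comparison, `np.asarray(values, dtype=object)` infers a 2-D `(2, 2)` shape from the same input.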

eric-wieser (Contributor)

I think we (cc @mattip) might not have communicated how to fix this problem well enough in NEP 34. As I see it, there are two cases when dealing with coercing object arrays:

- You know exactly what shape or dimension you want the output to be, so pre-allocate with `np.empty`, then fill:

  ```python
  def as_1d(x):
      arr = np.empty(len(x), dtype=object)
      arr[:] = x
      return arr
  ```

  Now admittedly, it's not very easy to write such a function for anything other than 0d and 1d. Maybe we need to solve that before we undo the reversion.

- You don't know what shape the array is supposed to be. The caller is responsible for specifying this clearly, by performing the ambiguous conversion for you. Silencing the warning defeats the point of the deprecation:

  ```python
  def do_some_things(arr, obj):
      arr = np.asarray(arr)  # if this emits a warning, it is the caller's fault, not yours
  ```

  It's unfortunate that Python doesn't make it easy to append context to a warning and propagate it, so you can explain to pandas users that they need to fix the warning.
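The second case above can be sketched end to end; the function and data names here are illustrative, not from pandas:

```python
import numpy as np

# Library code: no dtype guessing, no warning filters. Any ambiguity
# must already be resolved by the time the array reaches us.
def do_some_things(arr):
    return np.asarray(arr)

# Caller code: the caller knows this ragged input should become a
# 1-D object array, so it performs that conversion explicitly.
ragged = [[1, 2], [3]]
out = do_some_things(np.array(ragged, dtype=object))
```

This keeps the deprecation warning pointed at the call site that actually owns the decision.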

jreback (Contributor) commented Dec 8, 2019

@jbrockmendel can you rebase?

mattip (Contributor) commented Dec 8, 2019

NumPy is considering adding something like np.array(vals, dtype=np.legacy_behaviour) so libraries (like pandas) could use that to keep the old behaviour. The idea would be that NumPy could add an additional API as mentioned in the Alternatives section in the NEP, or libraries could work out strategies that avoid using NumPy for this kind of container, and until then you could continue using the old behaviour.

jreback (Contributor) commented Dec 8, 2019

> NumPy is considering adding something like np.array(vals, dtype=np.legacy_behaviour) so libraries (like pandas) could use that to keep the old behaviour. The idea would be that NumPy could add an additional API as mentioned in the Alternatives section in the NEP, or libraries could work out strategies that avoid using NumPy for this kind of container, and until then you could continue using the old behaviour.

thanks @mattip

yeah we likely need a combination of strategies here; e.g. to fix where we can be sure of what the input is and suppress / remove where needed.

```diff
@@ -183,6 +185,15 @@ def concat_categorical(to_concat, axis: int = 0):
     return result


+def _safe_array(x):
```
Contributor:

So I actually like this as a real function; can you move it to pandas.compat.numpy and use it everywhere?

Contributor:

Can you do this one?

```diff
@@ -78,7 +78,7 @@ def data_for_sorting(allow_in_pandas, dtype):
     if dtype.numpy_dtype == "object":
         # Use an empty tuple for first element, then remove,
         # to disable np.array's shape inference.
-        return PandasArray(np.array([(), (2,), (3,), (1,)])[1:])
+        return PandasArray(np.array([(), (2,), (3,), (1,)], dtype=object)[1:])
```
Member Author:

@TomAugspurger does the comment above about shape inference have any bearing on the inference here?

Contributor:

I think using object dtype is still correct, and I think the comment is still correct.

If we want to get rid of the empty tuple / shape stuff, we could probably allocate the array with the right shape and set the values later:

```python
In [6]: a = np.empty((3,), dtype=object)

In [7]: a[:] = (2,), (3,), (1,)
```

jreback (Contributor) commented Dec 12, 2019

@jbrockmendel can you rebase again

jreback (Contributor) commented Dec 15, 2019

needs a rebase again

seberg (Contributor) commented Dec 16, 2019

Just to make sure you guys are up to date: it is very likely that the opt-in to the old behaviour will not be object, but rather a new sentinel object.

mattip (Contributor) commented Dec 16, 2019

> Just to make sure you guys are up to date: it is very likely that the opt-in to the old behaviour will not be object, but rather a new sentinel object.

There is a difference between `np.array([1, 2], dtype=object)` and `np.array([1, 2], dtype=np.sentinel_that_means_legacy_behaviour)`: the first will result in dtype=object, the second in dtype=int. See numpy/numpy#15083.

jbrockmendel (Member Author)

> Just to make sure you guys are up to date: it is very likely that the opt-in to the old behaviour will not be object, but rather a new sentinel object.

@seberg @mattip thanks. Is there a way to re-enable the warning so I can see where we need to update things?

mattip (Contributor) commented Dec 16, 2019

You could build numpy with numpy/numpy#15119. I just pushed it so you might want to wait to see if it passes all tests before trying it out.

jbrockmendel (Member Author)

Closing, will work on this locally.

@jbrockmendel jbrockmendel deleted the cifix5 branch December 18, 2019 21:26
seberg (Contributor) commented Dec 23, 2019

@jbrockmendel did you already make progress on this? I am curious if an np.allow_object flag to opt-in to the old behaviour solves the problem (also, assuming that we may want to remove it eventually). E.g. I could imagine that it may be convenient to be able to opt-in to the future error immediately.

In the best case, the cases that are not easily fixed are cases where np.allow_object can be pushed to the end-user as a (potentially) intermediate solution.

jbrockmendel (Member Author)

> did you already make progress on this? […]

We've updated most of the places that were easy to just add an explicit "dtype=object".

I haven't gotten around to working through this locally as suggested here. Do you agree that would be a good next step (and if so, is that still the right branch to use)?

seberg (Contributor) commented Dec 23, 2019

Good question; mostly good to know for us right now. If you (and we) are fine with dtype=object allowing ragged object arrays, and using object indiscriminately is actually OK for you, then we are fine (we may have to think about it again at NumPy). In most cases that can run into this (and where the warning/error is incorrect), I would hope you can even do np.empty(..., dtype=object) and set the shape manually. We need to figure out what downstream requires here to know what our realistic options are.

jbrockmendel (Member Author)

> and for you using object indiscriminately is actually OK

No, there are still many places where we currently call np.array without passing a dtype and depend on numpy's type inference. In many/most of those cases, we won't have pre-checked for non-raggedness.

There are a good chunk of those cases where we know we want 1D output, so as soon as we see a listlike element we know it'll end up object-dtype. Would passing ndim=1 in some cases where we can't specify dtype make this any easier?
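The 1-D case described above can be sketched directly; the helper name and data are illustrative, not pandas API:

```python
import numpy as np

def construct_1d_array(values):
    """Build a 1-D array. Once any element is list-like, the result
    must be object dtype, so construct it explicitly instead of
    relying on ragged-array inference."""
    if any(isinstance(v, (list, tuple, np.ndarray)) for v in values):
        out = np.empty(len(values), dtype=object)
        for i, v in enumerate(values):
            out[i] = v
        return out
    return np.asarray(values)  # scalars only: inference is unambiguous

mixed = construct_1d_array([1, [2, 3]])
```

This is roughly what an `ndim=1` hint would let NumPy do internally, but done on the pandas side.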

seberg (Contributor) commented Dec 23, 2019

@jbrockmendel adding an ndim keyword argument is one of the options, yes. We would prefer to add only the things that are really needed, though. We could also add the opposite (do not allow ragged), although that forbids using the allow_object name for other things in the future; in that case your pattern would be to use a try/except. The try/except idea seems only really useful, though, if: 1. you want to deprecate this also, and 2. not doing anything (forcing the user to fix it up) is not an option.

Do you have a start with the particularly tough spots? Maybe we have to look at it from numpy to get a better feel for the pain in pandas. I would be happy if pandas is just fine with np.allow_object/ragged, but even then it would be nice if at some far future we could move away from supporting it entirely.

jbrockmendel (Member Author)

> Do you have a start with the particularly tough spots?

I just cloned this branch and will test against it shortly.

jbrockmendel (Member Author)

I didn't get any of the expected errors on that branch. Is there another one I should try?

mattip (Contributor) commented Jan 13, 2020

xref numpy/numpy#15119
