Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: .get_values() #19617

Closed
jreback opened this issue Feb 9, 2018 · 6 comments

Comments

@jreback
Copy link
Contributor

commented Feb 9, 2018

this was always meant as a private implementation detail. deprecate it now!

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Feb 9, 2018

To be clear: the only difference with .values is that for sparse .values gives a sparse array and .get_values() a dense array? So for that case we can recommend .values.to_dense() ?

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Feb 9, 2018

that's about right. or really _ndarray_values is the right thing. I think the SparseEA should just be able to handle this properly.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Feb 9, 2018

Series[Categorical].get_values() returns an ndarray. So in that case we'd recommend np.array(series).

Really, np.array(thing) is maybe the best way of saying "give me an ndarray". We can implement __array__ to do the right thing if necessary.

In general, I think we'll want two public APIs to mirror our two internal ones

  1. A way to get the data as an ndarray, even if it means losing information (DatetimeTZ) or it's expensive (Categorical)
  2. A way to get the data backing the Series/Index, even if it isn't an ndarray (the extension array).
@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Feb 9, 2018

It's not fully a mirror, as the internal _ndarray_values is different and not meant to public I think? (in that sense I actually wanted to comment on the other PR that that name can still be somewhat confusing, not that I have a (short) alternative in mind)

even if it means losing information (DatetimeTZ) or it's expensive (Categorical)

Here you actually still have multiple possibilities for DatetimeTZ, because you could also go the "expensive" way (object array of timestamps), like we do for unique

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Feb 9, 2018

Here you actually still have multiple possibilities for DatetimeTZ, because you could also go the "expensive" way (object array of timestamps), like we do for unique

True... Even with something like PeriodIndex._ndarray_values, there are different use-cases (indexing expects integer ordinals, but no user would expect that).

Having the user do np.array(series[datetime-with-tz], dtype=object / 'datetime64[ns]) seems like the most explicit way of doing things. But the properties that "do the right thing" are also convenient.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented May 15, 2019

The problem with removing get_values is that we ourselves use it in a bunch of places (eg values_from_object which is massively used).

The current situation:

  • DataFrame.get_values simply points to .values so is (I think?) always a numpy ndarray.
  • Series.get_values points to _data.get_values() (the SingleBlockManager), which in turn does np.array(self._block.to_dense(), copy=False).
    So the Block.to_dense() in general returns a view on the array, although for ExtensionBlock it does np.array(self.values), with the exception for DatetimeTZBlock, which ensures a non-object array with np.asarray(self.values, dtype=_NS_DTYPE), and for CategoricalBlock, it returns the Categorical.get_values which can be a DatetimeIndex.
  • Index.get_values points to .values, so is a numpy ndarray with the exception of:
    • newer Index objects which are already backed by EA: IntervalIndex returns IntervalArray here (but eg DatetimeIndex with tz aware data does return a numpy array)
    • CategoricalIndex overrides the parent method to return the get_values of the underlying Categorical, which in turn mostly returns an ndarray, but an EA for interval-backed dtypes and DatetimeIndex for datetimetz aware backed categorical.

Summary: a complete mess :-)

And there is not a simple replacement internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.