Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: .get_values() #19617

Closed
jreback opened this issue Feb 9, 2018 · 6 comments · Fixed by #26409
Closed

DEPR: .get_values() #19617

jreback opened this issue Feb 9, 2018 · 6 comments · Fixed by #26409
Labels
Blocker Blocking issue or pull request for an upcoming release Deprecate Functionality to remove in pandas
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Feb 9, 2018

this was always meant as a private implementation detail. deprecate it now!

@jreback jreback added the Deprecate Functionality to remove in pandas label Feb 9, 2018
@jreback jreback added this to the 0.23.0 milestone Feb 9, 2018
@jreback jreback mentioned this issue Feb 9, 2018
34 tasks
@jorisvandenbossche
Copy link
Member

To be clear: the only difference with .values is that for sparse .values gives a sparse array and .get_values() a dense array? So for that case we can recommend .values.to_dense() ?

@jreback
Copy link
Contributor Author

jreback commented Feb 9, 2018

that's about right. or really _ndarray_values is the right thing. I think the SparseEA should just be able to handle this properly.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 9, 2018

Series[Categorical].get_values() returns an ndarray. So in that case we'd recommend np.array(series).

Really, np.array(thing) is maybe the best way of saying "give me an ndarray". We can implement __array__ to do the right thing if necessary.

In general, I think we'll want two public APIs to mirror our two internal ones

  1. A way to get the data as an ndarray, even if it means losing information (DatetimeTZ) or it's expensive (Categorical)
  2. A way to get the data backing the Series/Index, even if it isn't an ndarray (the extension array).

@jorisvandenbossche
Copy link
Member

It's not fully a mirror, as the internal _ndarray_values is different and not meant to public I think? (in that sense I actually wanted to comment on the other PR that that name can still be somewhat confusing, not that I have a (short) alternative in mind)

even if it means losing information (DatetimeTZ) or it's expensive (Categorical)

Here you actually still have multiple possibilities for DatetimeTZ, because you could also go the "expensive" way (object array of timestamps), like we do for unique

@TomAugspurger
Copy link
Contributor

Here you actually still have multiple possibilities for DatetimeTZ, because you could also go the "expensive" way (object array of timestamps), like we do for unique

True... Even with something like PeriodIndex._ndarray_values, there are different use-cases (indexing expects integer ordinals, but no user would expect that).

Having the user do np.array(series[datetime-with-tz], dtype=object / 'datetime64[ns]) seems like the most explicit way of doing things. But the properties that "do the right thing" are also convenient.

@jorisvandenbossche
Copy link
Member

The problem with removing get_values is that we ourselves use it in a bunch of places (eg values_from_object which is massively used).

The current situation:

  • DataFrame.get_values simply points to .values so is (I think?) always a numpy ndarray.
  • Series.get_values points to _data.get_values() (the SingleBlockManager), which in turn does np.array(self._block.to_dense(), copy=False).
    So the Block.to_dense() in general returns a view on the array, although for ExtensionBlock it does np.array(self.values), with the exception for DatetimeTZBlock, which ensures a non-object array with np.asarray(self.values, dtype=_NS_DTYPE), and for CategoricalBlock, it returns the Categorical.get_values which can be a DatetimeIndex.
  • Index.get_values points to .values, so is a numpy ndarray with the exception of:
    • newer Index objects which are already backed by EA: IntervalIndex returns IntervalArray here (but eg DatetimeIndex with tz aware data does return a numpy array)
    • CategoricalIndex overrides the parent method to return the get_values of the underlying Categorical, which in turn mostly returns an ndarray, but an EA for interval-backed dtypes and DatetimeIndex for datetimetz aware backed categorical.

Summary: a complete mess :-)

And there is not a simple replacement internally.

@jreback jreback added the Blocker Blocking issue or pull request for an upcoming release label Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants