DEPR: let's deprecate #18262

Open
jreback opened this issue Nov 13, 2017 · 37 comments

@jreback
Contributor

commented Nov 13, 2017

xref #18202

We have some cruft, let's deprecate it (I have noted some which already have an issue associated).

From ndarray

  • Series.compress() (#21930)
  • Series.imag / Series.real (#27106)
  • Series.item()
  • Series.nonzero() (#24048)
  • Series.put() (#27106)
  • Series.itemsize
  • Series.flags
  • Series.strides

timeseries specific

  • Series.first()
    let head / tail take a timedelta
  • Series.last()

other

  • Series.swapaxes()
    • No reason once panel is gone.

non-controversial

  • MultiIndex.to_hierarchical (only used for Panel) (#21613)
  • Series/DataFrame.compound() (#26405)
  • Series.ptp() (#21614)
  • Series.from_array (#18213)
  • Series.valid() (#18800)
  • Series/DataFrame.slice_shift()
  • Series/DataFrame.tshift()
  • Series/DataFrame.get_values() (#19617)
  • Index.dtype_str (#27106)
  • Index.summary() (#18217)
  • .get_ftype_counts (#18243) (#20404)
  • .get_dtype_counts #27145
  • Index/Series.asobject (#18237) (#18572)
  • Index.to_native_types() (make private)
  • DataFrame/Series.as_matrix (#18458)
  • .clip_upper/.clip_lower (replace by .clip) (#24203)

Potentially

  • .ftypes (#18243) (#26744)
  • .xs() (#6249)
  • .iat/.at
  • .take
  • .lookup (non-trivial if you actually need this though)
    Think about this one. Maybe a standalone function somewhere.
  • DataFrame.from_items
  • Series/DataFrame.add_prefix/add_suffix (#18347)
    • Maybe add suffix / prefix to concat?
  • NDFrame.filter

@jreback jreback added this to the 0.22.0 milestone Nov 13, 2017

@jreback

Contributor Author

commented Nov 13, 2017

cc @jorisvandenbossche @TomAugspurger if any comments / objections pls note and I will update the top section.

@TomAugspurger

Contributor

commented Nov 13, 2017

I think I'm -1 on deprecating xs.

-0 on deprecating ptp

What's the alternative to tshift? I think that's sometimes useful when a shift won't quite work.

@jorisvandenbossche

Member

commented Nov 13, 2017

I was just writing up a similar issue :-) (but only for Series: while fixing the API docs and docstrings, I bumped into quite a few methods unknown to me)

Overview of methods that could be considered for removal (note that this list is very long and consists of methods I would not miss if they were gone; that does not mean they are not useful to others, it's just to stir discussion):

  • Related to the original ndarray subclassing:

    • Series.compress()
    • Series.flags
    • Series.imag / Series.real
    • Series.item()
    • Series.itemsize
    • Series.nonzero()
    • Series.put()
    • Series.strides
    • Series.ptp
  • Time series specific ones (the question here is if they are all worth it as method, while very specific in application):

    • Series.at_time()
    • Series.first()
    • Series.last()
    • Series.between_time()
    • Series.tshift()
  • Finance? specific

    • Series.compound()
  • Other:

    • Series.asobject
    • Series.as_matrix
    • Series.between()
    • Series.first_valid_index() / Series.last_valid_index()
    • Series.from_array()
    • Series.slice_shift()
    • Series.swapaxes()
    • Series.truncate()
    • Series.valid()
@jorisvandenbossche

Member

commented Nov 13, 2017

What's the alternative to tshift? I think that's sometimes useful when a shift won't quite work.

shift already seems to have a freq keyword as well, and it dispatches to tshift if freq is specified
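A minimal sketch of the difference, assuming a recent pandas (where tshift itself no longer exists): with freq, shift moves the index labels and leaves the values where they are, which is what tshift did; without freq, the values move and a NaN appears at the start.

```python
import pandas as pd

idx = pd.to_datetime(["2017-11-01", "2017-11-02", "2017-11-03"])
s = pd.Series([1, 2, 3], index=idx)

# Plain shift: values move down one slot, the index is unchanged,
# and the first value becomes NaN.
shifted_values = s.shift(1)

# shift with freq: the index moves forward one day and the values are unchanged
# (the behaviour tshift used to provide).
shifted_index = s.shift(1, freq="D")

print(shifted_values)
print(shifted_index)
```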

Note that my list above is very long. The more obvious ones to me that are not yet in the list in the top post are: as_matrix (and maybe swapaxes?)

@topper-123

Contributor

commented Nov 16, 2017

Could .add_prefix and .add_suffix be added to the deprecation list?

The DataFrame/Series namespace is huge, and cutting it down can make the API easier to grasp. I would also think it more logical and idiomatic to operate directly on the columns rather than on the DataFrame.

@jreback

Contributor Author

commented Nov 16, 2017

@topper-123 added, feel free to submit PR's for any of these!

@manrajgrover

Contributor

commented Nov 16, 2017

@jorisvandenbossche @jreback I would love to submit PRs for this issue. How should I go about deprecating these APIs?

@jreback

Contributor Author

commented Nov 16, 2017

see for example #18258

@jorisvandenbossche

Member

commented Nov 17, 2017

@manrajgrover Best to first post a comment here saying which one you would start with, as I think not all of those listed above are uncontroversial.

@topper-123

Contributor

commented Nov 17, 2017

I think .compound, while not very useful ATM, could be more useful if it were cumulative, i.e. using .cumprod instead of .prod and returning a Series:

>>> s = pd.Series([0.2, 0.2, 0.2])
>>> s.compound()  # essentially the same as (s+1).cumprod() - 1
0    0.200
1    0.440
2    0.728
dtype: float64

The above would play excellently together with .pct_change, so data.pct_change().compound() would read really well and be very useful in many use cases.

Any opinions on whether .compound could be changed like above rather than deprecated? If .compound keeps returning a scalar as it does today, I agree it should be deprecated.
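For reference, the proposed cumulative behaviour can already be spelled with existing methods. A minimal sketch, assuming the input is a Series of prices and that "compound" means the cumulative compounded return (the cumulative .compound() above is only a proposal, not an existing method):

```python
import pandas as pd

prices = pd.Series([100.0, 120.0, 144.0, 172.8])

# Period-over-period returns; the first entry is NaN because it has no predecessor.
returns = prices.pct_change()

# Cumulative compounded return, i.e. what the proposed cumulative .compound() would give.
# cumprod skips the leading NaN by default and leaves it in place.
compounded = (returns + 1).cumprod() - 1

# Cross-check: compounding the returns recovers growth relative to the first price.
growth = prices / prices.iloc[0] - 1
print(compounded)
```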

@topper-123

Contributor

commented Nov 17, 2017

I've started a PR for add_prefix and add_suffix.

I will take on Series.asobject and NDFrame.as_matrix next, unless @manrajgrover wants to do them, in which case you're welcome to.

And yes, I like removing superfluous methods that start with a, as these are so visible when tab-completing in the REPL :-)

@jorisvandenbossche

Member

commented Nov 18, 2017

Although I never use add_prefix / add_suffix, I think they are used quite a bit (looking at the number of Stack Overflow questions), so I am not yet fully convinced they are OK to deprecate.
So I would rather start with the others, like asobject, as_matrix, valid, tshift, ..

as_matrix is also used a bit (more than the others mentioned in the list above), but because it is a very confusing name for what it does, I think it would be good to deprecate.

@manrajgrover

Contributor

commented Nov 18, 2017

@jorisvandenbossche I can start with Index.summary() for now and pick the next one from the following list:

  • Index.dtype_str
  • .ftypes/.get_ftype_counts (#18243)
  • Index/Series.asobject (#18237)
  • Index.to_native_types() (make private)
@topper-123

Contributor

commented Nov 18, 2017

@jorisvandenbossche, what about removing a prefix/suffix or doing any other transformation you'd want to do on an index? My point is that .add_prefix/.add_suffix are way too specialized methods, and pandas should have methods that are more generally useful. .rename is great in that respect, and should be the canonical method for changing axis values.
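To make the .rename argument concrete, a minimal sketch (the "x_" prefix is just an illustrative value): add_prefix is a special case of rename with a callable, and the same callable form also handles the reverse transformation that add_prefix cannot express.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# add_prefix on a DataFrame is the same as renaming the columns with a callable.
prefixed = df.add_prefix("x_")
renamed = df.rename(columns=lambda c: "x_" + c)

# rename also covers transformations add_prefix/add_suffix cannot express,
# such as stripping the prefix again (str.removeprefix needs Python 3.9+).
stripped = renamed.rename(columns=lambda c: c.removeprefix("x_"))
```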

I've already made a proposal for .add_prefix/.add_suffix (#18347), so that can wait to see what the agreement will be on that. I would appreciate input, though, if you see anything obvious, as that is the first deprecation PR I've made.

In the same vein, the difference between .as_matrix and .values is minuscule, to the point where df[columns].values is the same as df.as_matrix(columns). pandas will be cleaner and leaner with only one way to achieve such a common result (obviously so, IMO...).
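A minimal sketch of the replacement, assuming a current pandas where as_matrix is no longer available: select the columns first, then take the array. to_numpy() (added in pandas 0.24) is shown alongside .values, since it is the spelling now recommended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0], "c": ["x", "y"]})
cols = ["a", "b"]

# What df.as_matrix(columns=cols) returned: the selected columns as one array.
arr_values = df[cols].values
# to_numpy() gives the same result; int and float columns upcast to float64.
arr_numpy = df[cols].to_numpy()
```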

I'll make a PR for as_matrix, as there seems to be agreement on that.

@jorisvandenbossche

Member

commented Nov 22, 2017

My point is that .add_prefix/.add_suffix are way too specialized methods, and pandas should have methods that are more generally useful. .rename is great in that respect, and should be the canonical method for changing axis values.

I completely agree with this. But, you also have the fact that people are using it and thus a removal will cause inconvenience / break code. So it is always a balance between both.

I am certainly +1 on deprecating as_matrix. As you say this is almost exactly the same as .values, and although I think this method is also used quite a bit (the argument I use for add_prefix ..), it's an awfully confusing name, so that's for me an extra reason to deprecate it.

@tdpetrou

Contributor

commented Nov 22, 2017

Here are my deprecation suggestions:

  • I'd like to see read_table deprecated. It's exactly the same as read_csv with a tab delimiter
  • remove get_dtype_counts/get_ftype_counts - these are just a convenience for DataFrame.dtypes.value_counts
  • remove the indexers iat/at. They give a small performance boost for an increase in API complexity
  • Use one of iterrows/itertuples to iterate over rows
  • remove lookup and take - other indexers do the same thing
  • remove combine - never used it and almost no use on SO. Looks to do nothing more than DataFrame.add
  • Probably get rid of applymap - should do the same thing with apply and then map inside of it
  • Use only agg not aggregate
  • Remove clip_upper and clip_lower and keep DataFrame.clip for both
  • Combine add_prefix/add_suffix into one method
  • One of the biggest issues are the methods that work only with DataFrames with a DatetimeIndex - first, last, truncate, at_time, between_time, to_period, to_timestamp - These could be removed or put in an accessor
  • Remove reindex_axis and reindex_like in favor of just reindex
  • remove isna - it's an alias of isnull
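Several of these bullets can be checked directly. A minimal sketch, assuming a current pandas (where read_table is still available and clip_lower/clip_upper are gone); the small tab-separated sample is made up for illustration:

```python
import io
import pandas as pd

data = "a\tb\n1\t-5\n2\t15\n"

# read_table is read_csv with a tab as the default separator.
t1 = pd.read_table(io.StringIO(data))
t2 = pd.read_csv(io.StringIO(data), sep="\t")

# A get_dtype_counts-style summary built from the dtypes Series.
df = t1.assign(c=[0.5, 1.5])
dtype_counts = df.dtypes.astype(str).value_counts()

# clip covers both clip_lower and clip_upper through its lower/upper arguments.
clipped = df["b"].clip(lower=0, upper=10)

# isna and isnull are aliases and return identical results.
same_null_mask = df.isna().equals(df.isnull())
```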
@jreback

Contributor Author

commented Nov 22, 2017

remove the indexers iat/at. They give a small performance boost for an increase in API complexity

these are convenience methods, not sure they add much to API burden
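For context, a minimal sketch of what .at/.iat are a convenience for: they are scalar-only counterparts of .loc/.iloc and return the same value for a single cell (labels here are illustrative).

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20], "y": [30, 40]}, index=["r1", "r2"])

# .at is a scalar-only .loc (label-based).
by_label = df.at["r2", "y"]
by_label_loc = df.loc["r2", "y"]

# .iat is a scalar-only .iloc (position-based).
by_position = df.iat[1, 0]
by_position_iloc = df.iloc[1, 0]
```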

take

this is a very common notion and is a very array-like method
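The array-like meaning of take in a minimal sketch: it selects by integer position, exactly like numpy.take on the underlying array, and here matches .iloc with a list of positions.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# take selects by integer position and keeps the matching index labels.
taken = s.take([2, 0])
via_iloc = s.iloc[[2, 0]]

# Same positions applied to the raw array with numpy.take.
via_numpy = np.take(s.to_numpy(), [2, 0])
```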

remove isna - it's an alias of isnull

this was just added for compat with dropna, fillna (see the pattern :>), so if anything we would remove isnull, but that has been in the API so long that it may well be nigh impossible to actually remove (and, more to the point, very annoying).

@jorisvandenbossche

Member

commented Nov 23, 2017

Remove reindex_axis and reindex_like in favor of just reindex

reindex_axis is already deprecated.

One of the biggest issues are the methods that work only with DataFrames with a DatetimeIndex - first, last, truncate, at_time, between_time, to_period, to_timestamp - These could be removed or put in an accessor

to_period and to_timestamp on a Series do something different from the methods in the .dt accessor: the former work on the index, the latter on the values. So it's not possible to just move them.
But on the other datetime-related I agree, I also find it a bit unfortunate that those exist (certainly first and last are very confusing in naming)
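A minimal sketch of the index-versus-values distinction, using monthly periods as an example: Series.to_period converts the DatetimeIndex, while Series.dt.to_period converts datetime values and leaves the index alone.

```python
import pandas as pd

dates = pd.to_datetime(["2017-01-15", "2017-02-15", "2017-03-15"])

# Series.to_period converts the *index* (which must be a DatetimeIndex).
by_index = pd.Series([1, 2, 3], index=dates)
index_periods = by_index.to_period("M")

# Series.dt.to_period converts the *values* and keeps the default RangeIndex.
by_values = pd.Series(dates)
value_periods = by_values.dt.to_period("M")
```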

@tdpetrou

Contributor

commented Nov 23, 2017

@jreback There are only a total of 13 occurrences of df.take in all of Stack Overflow and in my opinion should never be used.

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

I guess there is no going back on isna/isnull but I really dislike having methods that are aliases of one another.

@jorisvandenbossche I wasn't being clear, but all those DataFrame/Series methods that only work with a DatetimeIndex could be put in their own accessor (not .dt), though I don't think even that would be a good idea. Perhaps just deprecating all of them would be best.
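If first/last were deprecated, plain boolean masking on the index covers the common case. A minimal sketch, assuming a fixed Timedelta window (first/last also accept calendar offsets, which this does not cover); it mirrors first("3D") and last("3D") on this sample:

```python
import pandas as pd

idx = pd.date_range("2018-04-09", periods=6, freq="2D")
ts = pd.Series(range(6), index=idx)

# Roughly ts.first("3D"): rows less than 3 days after the first timestamp.
start = ts.index[0]
first_3d = ts[ts.index < start + pd.Timedelta(days=3)]

# Roughly ts.last("3D"): rows less than 3 days before the last timestamp.
end = ts.index[-1]
last_3d = ts[ts.index > end - pd.Timedelta(days=3)]
```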

@jreback

Contributor Author

commented Nov 23, 2017

@jreback There are only a total of 13 occurrences of df.take in all of Stack Overflow and in my opinion should never be used.

.take() is a common name for array-like things.

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

your suggestion is much less readable

I guess there is no going back on isna/isnull but I really dislike having methods that are aliases of one another.

sure, but isnull is even more entrenched than anything else.

@max-sixty

Contributor

commented Nov 23, 2017

df.iat/.at are probably too entrenched in legacy code to remove but they provide no extra functionality. Indexing is the most confusing aspect to pandas and the less the better. Maybe a better design would have been to do df.loc(type='scalar')['row', 'col']

As indexing is simplified & improved, the speed difference between .loc and .at should fall (a lot of the time it's a very similar function). Then deprecating .at will cause less strife.

@topper-123

Contributor

commented Nov 26, 2017

DataFrames have a method .boxplot. I would assume this should be deprecated and people should use .plot.box instead?

@jreback

Contributor Author

commented Nov 26, 2017

yes i think there is an issue about this

@TomAugspurger

Contributor

commented Nov 26, 2017

@jreback jreback added this to the 0.24.0 milestone Apr 14, 2018

@topper-123

Contributor

commented Apr 16, 2018

IMO DataFrame.filter is confusingly named, and is easily confused with the similarly named DataFrame.groupby(...).filter when googling etc.

I propose that DataFrame.filter be deprecated and a similarly functioning DataFrame.select_filter be added. By doing this rename, the relation to select_dtypes is emphasized.

It could also be named just select (shorter), but that means that the change will have to wait until the current deprecated select method is removed.
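To illustrate the naming clash, a minimal sketch: DataFrame.filter selects labels (columns by default) without looking at the data, whereas GroupBy.filter drops whole groups based on a predicate over their contents.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val_1": [1, 2, 3], "val_2": [4, 5, 6]})

# DataFrame.filter: label selection, here on column names.
cols = df.filter(like="val_")
cols_re = df.filter(regex=r"_1$")

# GroupBy.filter: keeps only the groups for which the predicate is True.
big_groups = df.groupby("key").filter(lambda g: len(g) > 1)
```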

@h-vetinari

Contributor

commented Jul 17, 2018

I found the OP from @TomAugspurger in #21894 quite important, and since that issue is closed now, I quote it here:

As a reminder, the plan is to have no new deprecations in 0.25.x and 1.0.0. So this [v0.24] is the last round of deprecations before 1.0.

In this context, I'd like to bring up for discussion the following two issues: #21950 #21951

Finally, I'll repeat a comment I made in the other thread:

Most likely too late to the game, but for completeness I'd like to add: if #21855 #21858 are solved for v0.24, then combine_first could be deprecated at the same time, see #21859.

jzwinck added a commit to jzwinck/pandas that referenced this issue Jul 28, 2018

MAINT: refactor from_items() using from_dict(). Fixes pandas-dev#21850
This removes the deprecation warnings introduced in pandas-dev#18262,
by reimplementing DataFrame.from_items() in the recommended
way using DataFrame.from_dict() and collections.OrderedDict.

This eliminates the maintenance burden of separate code for
from_items(), while allowing existing uses to keep working.

A small cleanup can be done once pandas-dev#8425 is fixed.
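The from_dict route described in the commit can be sketched as follows, assuming the column-oriented form of from_items (a list of (column name, values) pairs; the orient="index" variant is not covered). The OrderedDict keeps the column order of the input list.

```python
from collections import OrderedDict
import pandas as pd

items = [("b", [1, 2]), ("a", [3, 4])]

# Equivalent of DataFrame.from_items(items): an ordered mapping of
# column name -> values, so the columns follow the order of the list.
df = pd.DataFrame.from_dict(OrderedDict(items))
```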

@jzwinck jzwinck referenced this issue Jul 28, 2018

Closed

MAINT: refactor from_items() using from_dict() #22094

4 of 4 tasks complete

@D-K-E


commented Jul 29, 2018

Hi, I am also -1 on deprecating xs().
I read most of the discussion in the related issue.
But as someone who uses pandas from time to time, I find the sheer range of what loc can do somewhat confusing.
Especially when I am using a MultiIndex with several levels: I tend to remember the levels not by their position in the hierarchy but by their names. Selecting with loc becomes a hassle when I want to slice the third and fifth levels, because most of the time I confuse the third with the second, or the fourth with the fifth, etc.
I don't lose much time on it, but it is still a little inefficient compared to simply passing a value and the name of the level to a function.
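A minimal sketch of the by-name selection described above, with made-up level names: xs only needs the level's name, whereas the equivalent loc call needs the key in the right tuple position. Note that xs drops the selected level by default, while loc keeps it.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["x", "y"], ["p", "q"], [1, 2]], names=["first", "second", "third"]
)
df = pd.DataFrame({"v": range(8)}, index=idx)

# xs selects by level *name*; its position in the hierarchy does not matter.
by_name = df.xs("q", level="second")

# The loc equivalent must place the key at the level's position in the tuple.
by_loc = df.loc[pd.IndexSlice[:, "q", :], :]
```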

@jimmywan


commented Jan 25, 2019

Why on earth would you deprecate read_table? That makes no damn sense.
The suggested change is to call read_csv to read things that are not comma-separated? This is 100% backwards.

@st-bender


commented Feb 1, 2019

One could argue why not deprecate read_csv() instead of read_table() since table sounds more flexible.

Edit:
I have to agree with @jimmywan here: if they are basically the same, why not at least keep it as an alias? One could always wrap it, but then people would not be confused or put off updating.

@pilkibun

Contributor

commented Jun 26, 2019

DataFrame.where and DataFrame.mask are duals, but their names don't indicate that. Perhaps deprecate mask, since mask is just where(~cond), IIUC? Alternatively, rename it to where_not.
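A quick check of the duality, as a minimal sketch: where keeps entries where the condition holds, mask replaces them, and mask(cond) gives the same result as where(~cond).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})
cond = df > 0

# where keeps entries where cond is True and replaces the rest (NaN by default).
kept = df.where(cond)
# mask is the dual: it replaces entries where cond is True.
masked = df.mask(cond)

# mask(cond) is the same as where(~cond).
same = masked.equals(df.where(~cond))
```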
