DEPR: filter & select #12401

jreback · 2016-02-20T18:23:15Z

do we need label selectors? we should for sure just have a single method for this. maybe call it query_labels? to be consistent with .query as the workhorse for data selection.

.select (DEPR: deprecate .select() #17633)
.filter

xref #6599


jorisvandenbossche · 2017-02-14T23:19:21Z

I personally find filter a useful function (at least I have used it to good purpose in my own work) to select certain columns. See also the examples added in #12399. Although it should rather be called select ...

Less sure about select. That seems less useful, certainly now that loc accepts a function.

jreback · 2017-02-14T23:30:51Z

I think I have revised my thoughts here.

we should promote (in the docs, as the one way to do it) .select as the main label filtering function, and deprecate .filter (which ATM serves the same purpose). This may need some API tweaks.

.filter is traditionally a data selection / filtering function.

jorisvandenbossche · 2017-02-14T23:34:19Z

They are quite different at the moment:

filter:
- acts on columns by default (for dataframe)
- can select based on list, or simple 'like'/more advanced regex
select:
- acts on index by default
- selects based on function applied to index labels
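To make the filter half of this comparison concrete, here is a minimal sketch on a made-up frame (the column and index names are assumptions for illustration only). DataFrame.select is deliberately not called, since this thread ends with its deprecation.

```python
import pandas as pd

# Toy frame; all labels are invented for this example.
df = pd.DataFrame(
    {"alpha": [1, 2], "beta": [3, 4], "gamma": [5, 6]},
    index=["row_a", "row_b"],
)

# filter acts on column labels by default.
by_items = df.filter(items=["alpha", "gamma"])   # exact labels from a list
by_like = df.filter(like="mm")                   # substring match
by_regex = df.filter(regex="^(alpha|beta)$")     # regular expression match

# Passing axis=0 applies the same matching to the index labels instead.
rows_like = df.filter(like="_b", axis=0)

print(list(by_items.columns), list(by_like.columns), list(by_regex.columns))
print(list(rows_like.index))
```

Note that filter never looks at the data values, only at the labels.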

jreback · 2017-02-14T23:48:52Z

further: .filter uses .select for regex matching in its implementation.

jreback · 2017-02-14T23:51:17Z

further: we use .filter() in .groupby() to allow a filter for group inclusion (boolean return)
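For contrast, a small sketch of the groupby usage mentioned here: the function passed to GroupBy.filter returns one boolean per group, and whole groups are kept or dropped. The frame and the threshold are invented for illustration.

```python
import pandas as pd

# Toy data; "key" and "val" are assumed column names.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "c", "c"], "val": [1, 2, 10, 3, 4]}
)

# Keep only the rows belonging to groups whose total "val" exceeds 5.
# The callable returns a single boolean per group.
kept = df.groupby("key").filter(lambda group: group["val"].sum() > 5)

print(kept)
```

Here the filtering is driven by the data in each group, not by labels, which is a different meaning of "filter" than DataFrame.filter has.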

shoyer · 2017-02-15T14:28:23Z

I have found DataFrame.filter to be useful, especially with like or regex. I have never used DataFrame.select, which feels very non-idiomatic to me.

So I would be happy to deprecate select. It's also highly confusing how GroupBy.filter works like DataFrame.select, not .filter.

dkasak · 2017-06-29T15:40:17Z

> It's also highly confusing how GroupBy.filter works like DataFrame.select, not .filter.

I agree this is highly confusing. Is renaming one of those out of the question? filter is a common name for a higher-order function which filters elements based on the result of a Boolean-valued function that was passed in, exactly like GroupBy.filter, so that seems like an appropriate name for what is currently DataFrame.select. There's also Python's builtin filter function.

Another option might be merging the functionality of select and filter under one name, so it supports both list-like and function arguments.

jreback · 2017-07-15T16:58:44Z

so the problem as highlighted by @jorisvandenbossche is that .select acts on the index (which is what groupby.filter and boolean selection do). so it is a highly confusing name.

.filter is also a confusing name as it acts on the labels of columns.

We need the combined functionality of the current DataFrame.select/filter (IOW, it should select labels from an axis and accept a list-like, scalar, or callable, like most other functions)

signature should be something like this (default for most functions is axis=0)

def select_labels(arraylike or scalar or callable, axis=0, regex=False)

now as to what to do:

- select_labels I think is a nice name (open to suggestions), though other systems (spark & sql) use .select to mean label/column selection.
- deprecate .select in favor of .select_labels
- deprecate .filter in favor of select_labels

@dkasak interested in taking this on?
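A rough sketch of what the proposed combined behaviour could look like, written as a free function on top of .loc. This is purely hypothetical: select_labels was never confirmed in this thread as a pandas method, and the parameter name `labels` and the matching rules below are assumptions.

```python
import re

import pandas as pd


def select_labels(obj, labels, axis=0, regex=False):
    """Hypothetical helper (not a pandas API): pick labels along one axis.

    `labels` may be a scalar, a list-like, or a callable applied per label;
    with regex=True it is treated as a pattern matched with re.search.
    """
    axis_labels = obj.index if axis == 0 else obj.columns
    if callable(labels):
        mask = [bool(labels(label)) for label in axis_labels]
    elif regex:
        pattern = re.compile(labels)
        mask = [pattern.search(str(label)) is not None for label in axis_labels]
    else:
        # Strings are not list-like here, so a scalar becomes a one-item set.
        wanted = set(labels) if pd.api.types.is_list_like(labels) else {labels}
        mask = [label in wanted for label in axis_labels]
    # A boolean mask keeps the original label order of the frame.
    return obj.loc[mask] if axis == 0 else obj.loc[:, mask]


df = pd.DataFrame({"x1": [1, 2], "x2": [3, 4], "y": [5, 6]}, index=["a", "b"])

rows = select_labels(df, "a")                                     # scalar, axis=0
cols_regex = select_labels(df, r"^x", axis=1, regex=True)         # regex
cols_func = select_labels(df, lambda c: c.endswith("2"), axis=1)  # callable
cols_list = select_labels(df, ["y", "x1"], axis=1)                # list-like
```

One design choice in this sketch: list-like input returns columns in the frame's order rather than the order given, which differs from how filter(items=...) behaves.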

shoyer · 2017-07-15T18:41:54Z

I would suggest simply deprecating/removing select without making a replacement. Indexing is a fine alternative.

DataFrame.filter() is useful. I wish it were called select instead, both because that matches SQL and filter suggests filtering rows with a boolean expression (like filter in dplyr or Ibis), but I don't think changing the name is worth the hassle.

In general, I think we should avoid making small changes in the API for the basic grammar of data manipulation in pandas, unless we rethink things more broadly for a larger, breaking change (e.g., in pandas2).

jreback · 2017-07-15T18:50:08Z

> but I don't think changing the name is worth the hassle

sure it is - pandas is going to exist for 1.x for quite some time

better to make changes to the right spelling sooner rather than later

I am all for deprecating filter and calling it select (or select_labels)

dkasak · 2017-07-16T12:13:05Z

I don't have time to handle this at the moment, but I may be interested in doing it when time permits if it hasn't been done already by then.

FWIW, upon some thought, I still think changing the name of .filter to .select* would be best. I don't feel strongly about .select vs .select_labels. I generally prefer shorter names, but the added verbosity here might make things clearer. Calling it .select has the benefit that only one name is deprecated, not two.

I'm not so sure about dropping the current behaviour of .select entirely because I have a use case which I'm not sure how to implement without it (and without resorting to things like .reset_index() to regain the ability to select by using a function).

In particular, I have a MultiIndex with 2 levels, each of which has elements of type str. In other words, each index value is conceptually a pair of strings. Currently I'm doing something like

df.select(lambda x: condition1(x[0]) and condition2(x[1]))

and similar to select particular rows. How could this be implemented without current .select functionality?
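One possible way to express this use case without .select, sketched under assumptions: the index values and the bodies of condition1 and condition2 below are invented stand-ins, and a boolean mask built from the index tuples is passed to .loc.

```python
import pandas as pd

# Two-level MultiIndex of strings; the values are made up.
index = pd.MultiIndex.from_tuples(
    [
        ("apple", "red"),
        ("apple", "green"),
        ("banana", "yellow"),
        ("avocado", "green"),
    ],
    names=["fruit", "colour"],
)
df = pd.DataFrame({"count": [1, 2, 3, 4]}, index=index)


def condition1(fruit):
    # Hypothetical stand-in for the first-level condition.
    return fruit.startswith("a")


def condition2(colour):
    # Hypothetical stand-in for the second-level condition.
    return colour == "green"


# Iterating a MultiIndex yields one tuple per row, so each pair can be tested.
mask = [condition1(fruit) and condition2(colour) for fruit, colour in df.index]
selected = df.loc[mask]

print(selected)
```

This avoids reset_index(), at the cost of building the mask in plain Python.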

jreback · 2017-07-16T13:46:36Z

can you show a complete example of how you are using select?


jorisvandenbossche · 2017-12-05T16:48:46Z

On the pandas-dev mailing list, a concern was raised about the deprecation of select, see https://mail.python.org/pipermail/pandas-dev/2017-November/000649.html

I think the example makes a point. For me the alternative like .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)] is harder to read and to teach than .select(complex_fxn_that_selects_a_few_cols). Which makes the deprecation of select a step backwards for those cases.
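For readers unfamiliar with the callable form of .loc referred to here, a minimal sketch follows. The frame and the selector function temperature_columns are hypothetical stand-ins for complex_fxn_that_selects_a_few_cols.

```python
import pandas as pd

# Toy frame; column names are assumptions for illustration.
df = pd.DataFrame(
    {"temp_in": [20.5, 21.0], "temp_out": [5.0, 6.5], "humidity": [40, 42]}
)


def temperature_columns(columns):
    """Hypothetical selector: boolean mask over the given column labels."""
    return columns.str.startswith("temp_")


# .loc calls the lambda with the frame itself and uses the returned
# boolean mask to pick columns.
selected = df.loc[:, lambda frame: temperature_columns(frame.columns)]

print(list(selected.columns))
```

The callable receives the whole frame rather than one label at a time, which is part of why the two spellings read differently.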

jondo · 2018-01-16T10:31:26Z

The deprecation message currently only suggests a replacement for the case axis=0.

I suggest expanding this to:

use df.loc[df.index.map(crit)] to select labels, df.loc(axis=1)[df.columns.map(crit)] to select columns.
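A short sketch of both suggested replacements on a toy frame; the labels and the body of crit are assumptions chosen only so that each axis has something to match.

```python
import pandas as pd

# Toy frame; labels are invented for this example.
df = pd.DataFrame(
    {"a1": [1, 2, 3], "a2": [4, 5, 6], "b1": [7, 8, 9]},
    index=["x1", "x2", "y1"],
)


def crit(label):
    # Hypothetical criterion applied to each label in turn.
    return label.endswith("1")


# Index.map applies crit to every label and yields a boolean mask.
rows = df.loc[df.index.map(crit)]            # select along the index
cols = df.loc(axis=1)[df.columns.map(crit)]  # select along the columns

print(list(rows.index), list(cols.columns))
```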

smcinerney · 2018-09-07T22:31:35Z

I only just found out about this change and the doc still doesn't give guidance. For actual selection by column value, people also use numpy operators np.select(condlist, choicelist, ...) (for multiple values) and np.where(cond, valTrue, valFalse) for two values. Is that good/bad/another alternative? Witness the confusion on SO. I think the root of the issue is that the pandas select verb disagreed with what numpy and SQL select do, hence created confusion.
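To illustrate what the two NumPy functions mentioned here actually do, a small sketch on invented data. Note that both compute a new value per element from conditions on the data; neither selects labels the way DataFrame.select or DataFrame.filter did, and nothing in this thread states a maintainer recommendation either way.

```python
import numpy as np
import pandas as pd

# Toy data; the column name and thresholds are assumptions.
df = pd.DataFrame({"score": [35, 62, 88]})

# np.where: choose between two values per element.
df["passed"] = np.where(df["score"] >= 50, "yes", "no")

# np.select: the first matching condition wins; default covers the rest.
conditions = [df["score"] < 50, df["score"] < 80]
choices = ["low", "medium"]
df["band"] = np.select(conditions, choices, default="high")

print(df)
```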

There's still a docbug needed on this, but first we need to know what you actually recommend.

jreback added the Indexing, API Design, Deprecate, and Needs Discussion labels Feb 20, 2016

jreback added this to the 0.19.0 milestone Feb 20, 2016

jreback mentioned this issue Jun 3, 2016

Add example usage to DataFrame.filter #12399

Closed

jreback mentioned this issue Sep 15, 2016

DEPR: 0.21 deprecations master issue #14220

Closed

8 tasks

jreback modified the milestones: 0.20.0, 0.21.0 Mar 29, 2017

TomAugspurger mentioned this issue Sep 11, 2017

RLS: 1.0 #17287

Closed

6 tasks

jreback added a commit to jreback/pandas that referenced this issue Sep 22, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

1581a05

closes pandas-dev#12401

jreback mentioned this issue Sep 22, 2017

DEPR: deprecate .select() #17633

Merged

jreback added a commit to jreback/pandas that referenced this issue Sep 29, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

437db04

closes pandas-dev#12401

jreback added a commit to jreback/pandas that referenced this issue Sep 29, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

9c6734a

closes pandas-dev#12401

jreback added a commit to jreback/pandas that referenced this issue Oct 1, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

9c2b402

closes pandas-dev#12401

jreback added a commit to jreback/pandas that referenced this issue Oct 1, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

5a9bc70

closes pandas-dev#12401

jreback added a commit to jreback/pandas that referenced this issue Oct 2, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

c8dd389

closes pandas-dev#12401

jreback modified the milestones: 0.21.0, 1.0 Oct 2, 2017

jreback added a commit to jreback/pandas that referenced this issue Oct 3, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

dbd2473

closes pandas-dev#12401

jreback added a commit to jreback/pandas that referenced this issue Oct 3, 2017

DEPR: deprecate .select() in favor of .loc(axis=)[]

ef031c1

closes pandas-dev#12401

jorisvandenbossche closed this as completed in 48d0460 Oct 4, 2017

ghost pushed a commit to reef-technologies/pandas that referenced this issue Oct 16, 2017

DEPR: deprecate .select() in favor of .loc[] (pandas-dev#17633)

433a7f9

closes pandas-dev#12401

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

DEPR: deprecate .select() in favor of .loc[] (pandas-dev#17633)

5b83dac

closes pandas-dev#12401

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

DEPR: deprecate .select() in favor of .loc[] (pandas-dev#17633)

011e85c

closes pandas-dev#12401

topper-123 mentioned this issue Jun 4, 2019

Rename NDFrame.filter to .select? #26642

Open
