New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 filtering_settings #169

Open
chriddyp opened this Issue Oct 24, 2018 · 19 comments

Comments

9 participants
@chriddyp
Copy link
Member

chriddyp commented Oct 24, 2018

a placeholder issue to discuss the next version of the filtering_settings property. In particular:

  • Its name
  • Its structure: is it a string that the user has to parse or a more complex object?
  • What does the grammar look like? country eq("Canada") or country is Canada or country == Canada or country = Canada, etc

Related issues
#252
#301

@chriddyp chriddyp added this to Needs triage in Dash Table via automation Oct 24, 2018

@chriddyp chriddyp moved this from Needs triage to Priority #2 - Beta in Dash Table Oct 24, 2018

chriddyp added a commit that referenced this issue Oct 24, 2018

📝 prop type comments
Many of these props will change in the future, I’ve linked properties
to the following issues:
- #166
- #167
- #168
- #168
- #167
- #169
- #169
- #169
- #170
- #171

chriddyp added a commit that referenced this issue Oct 25, 2018

📝 prop type comments
Many of these props will change in the future, I’ve linked properties
to the following issues:
- #166
- #167
- #168
- #168
- #167
- #169
- #169
- #169
- #170
- #171

chriddyp added a commit that referenced this issue Oct 25, 2018

Docstrings (#173)
Many of these props will change in the future, I’ve linked properties
to the following issues:
- #166
- #167
- #168
- #168
- #167
- #169
- #169
- #169
- #170
- #171
@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Nov 2, 2018

How can we make it easier for backend authors to write queries? num() and string casting (") is more complex to write and so in the docs, we don't support them. This makes the front-end filtering different than the back-end filtering, which is confusing.

For example, in the docs, front-end filtering only supports:

  • Enter eq "Asia" in the "continent" column
  • Enter > num(5000) in the "gdpPercap" column
  • Enter < num(80) in the lifeExp column

whereas backend filtering only supports:

  • Enter eq Asia in the "continent" column
  • Enter > 5000 in the "gdpPercap" column
  • Enter < 80 in the lifeExp column
@jonmmease

This comment has been minimized.

Copy link

jonmmease commented Nov 3, 2018

Could we provide some way to automatically convert these expressions into expressions in the NumExpr DSL (https://numexpr.readthedocs.io/en/latest/user_guide.html#casting-rules)?

This DSL is directly support by pandas in the pd.query operation (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html), making backend filtering an efficient one-liner. It also opens up the possibility to filter on expression involving multiple columns.

Update:
For that matter, any reason to not adopt the expression language itself for the front-end as well?

@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Nov 3, 2018

That's a really great idea 👍

@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Nov 3, 2018

In addition to the NumExpr DSL, I'd like for us to consider how this would scale for additional datatypes beyond str and numbers, including:

  • dates
  • times
  • IP addresses
  • categories vs freeform strings
  • numbers that could be considered strings (like zip codes)

Also, we should consider what the default "operator" is in the UI. Users shouldn't have to always specify an operator like contains or eq or ==: they should be able to just type some text (e.g. canada or 15) and a reasonable filtering operation should occur.

Here's a proposal that I worked on previously. Perhaps this can be updated to be more compatible with the NumExpr DSL


With the introduction of datatypes, the following changes could be made:

  • string and enum
    • Default operator is still "contains"
    • >, >=, <, <= will filter alphanumerically
    • Regex will be supported
  • number
    • Default operator is =, contains is not available
    • >, >=, <, <= will filter numerically
    • Decimal places will be supported
  • date
    • Default operator is "contains" and will filter all dates at a lower resolution. For example:
      • Searching 2015 will filter all values with the year 2015, regardless of the month, day, or time
      • Searching 2015-10-05 will filter all values on the day 2015-10-05, regardless of the time
      • In this mode, it will not be possible to search just "January" across all years
      • >, >=, <, <= will filter by date, starting at the lowest date (e.g. > 2015 is equivalent to > 2015-01-01 00:00:00)
      • = won't be supported. The user can search a specific date with the default and implicit "contains" operator
@eliasdabbas

This comment has been minimized.

Copy link

eliasdabbas commented Nov 3, 2018

I've used RStudio's implementation of DataTable, and I think they have a very nice solution for filtering visually, especially for a non-tech audience

Here's quickly how it works.

I think it's a great idea to provide filters for different data types, especially dates.

If this solution makes sense, then it might probably boil down to utilizing dash-core-components elements within the filtering cells somehow, and that would make a beautiful easy-to-use solution.

By automatically detecting the data type of the column, the appropriate filter would be provided.

Or maybe even give the developer the option to determine which filter they want to add per column.

dcc.RangeSlider: for numerics
dcc.Checklist OR dcc.Dropdown with multi=True: for categorical values when there is a small number of unique options
dcc.DatePickerRange: for dates

@jonmmease

This comment has been minimized.

Copy link

jonmmease commented Nov 4, 2018

It doesn't sound like the exact NumExpr/df.query DSLs are going to cover exactly what you're talking about here, especially when considering the idea of a default operator. So it probably does make sense to still define a dash table DSL, but it would be nice to do it in such a way that the combination of all filtering operations can be transformed into a single pandas query string.

Here are a few example expressions that I was playing around with:

For the numerical column a

df.query('a == 1')
df.query('a > 0')
df.query('0 < a and a < 2')

For the string or categorical column c

df.query('c.str.contains("Tw")')
df.query('c.str.startswith("T")')
df.query('c.str.match("\w*o")')
df.query('c.str.count("([a-g])") > 1')

For the datetime column d

df.query('d.dt.year == 2018')
df.query('d >= "2018-04"')
df.query('d.dt.day_name() == "Monday"')
df.query('d.dt.is_month_start')

So my thought is that if a table has filter expressions on columns a, b, and c then the combination of expressions gets transformed into a query string like

df.query('a > 0 and c.str.match("\w*o") and d >= "2018-04"')
@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Nov 7, 2018

Another thing to consider is how this relates to conditional formatting. For conditional formatting, we use the same filtering DSL (see https://dash.plot.ly/datatable/style). It's not clear to me if pandas uses the NumExpr/df.query DSL for their conditional formatting, their docs mostly show an imperative workflow (https://pandas.pydata.org/pandas-docs/stable/style.html)

@gaoy400

This comment has been minimized.

Copy link

gaoy400 commented Nov 20, 2018

@eliasdabbas I am also looking forward to have a filter with RangeSlider and/or dropdown. The datatable video in your comment is of great interest to me. One thing that I am looking for beyond the video is a backend filtering option even when using dropdown/RangeSlider filter.

Really looking forward to have this feature in the dash table in the future!

@eliasdabbas

This comment has been minimized.

Copy link

eliasdabbas commented Nov 21, 2018

@gaoy400 That would also be very nice.
This might be related somehow to #213.

@rbolado

This comment was marked as off-topic.

Copy link

rbolado commented Nov 21, 2018

Hello,

I recently started using the new 'DataTable' function. I am trying to implement a simple conditional formatting applied to a column. My issue is with the 'style_data_conditional' under the 'filter': option. It looks like the num() function rounds to the nearest integer not allowing us to applied <> to decimal numbers in the column. Is this correct? Any help is very much appreciated.

Many thanks

@marfel

This comment has been minimized.

Copy link

marfel commented Nov 30, 2018

I appreciate the discussion here, but implementing all this really looks like an epic endeavor, and I wonder if its really necessary for the average user? When I first stumbled on upon the dash_table package, ran the example and it took me a while to figure out how to specify filter expressions correctly. I found it not very intuitive.

Wouldn't it cover about 95% of all cases, if any column (even numerical ones) could be filtered by simply typing a few letters or numbers into their filter field, which are then matched anywhere and without case. In this datatable implementation
https://datatables.net/manual/styling/jqueryui
there is even only one global search field! If you get too many results, you just add another word. Aside from that, the "<" and ">" operators for numerical columns can simply be achieved by sorting by the respective columns.

IMHO, simplicity, intuitiveness and responsiveness are more valuable than being able to "program" a filter with lots of conditions and operators. Perhaps such a simple, alternative filtering interface could be implemented quickly, as an option? The RStudio example is certainly fancy, but as an interface for the typical non-programmer user it's probably a little bit over the top.

Just my two cents.

@eliasdabbas

This comment has been minimized.

Copy link

eliasdabbas commented Nov 30, 2018

Thanks @marfel
I fully agree that having the user "program" filters is difficult, even for programmers, because they would still have to learn the syntax.
Incidentally, the RStudio example is build with datatables that you link to.
The default is one search field and the others are optional if the developer thinks they are necessary.
I don't see why they are difficult to use. I think they are very easy and intuitive. They are visual controls, and once you click on them you immediately see the max and min numbers and a slider, (or text filters) so they seem, to me at least, very easy to use.

@gaoy400

This comment has been minimized.

Copy link

gaoy400 commented Dec 3, 2018

I totally agree with @marfel that the current filtering feature in dash_table is not quite user-friendly. But the one in the video shared by @eliasdabbas seems really intuitive to me. It does not require users to type any programming-like strings or so.

Still, implementing a similar feature as the one in the video would be an epic endeavor for sure...

@JeroneGagliano

This comment has been minimized.

Copy link

JeroneGagliano commented Jan 16, 2019

I agree with @marfel
I love what you all are doing with Dash but this feature is really needed. I had been using dash-table-experiments and chose it because of the easy filtering. Why not implement the "Filter Rows" that was in dash-table-experiments? It is so easy to use and intuitive!!! It was dynamic text filtering and numbers only required an intuitive <, >, <= etc operator. The current filtering in the currently Dash DataTable is not useful. One has to type the full text string to filter which is not useful when strings are long or have changing case.

Is the code used in dash-table-experiments not compatible with the new DashTable?

@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Jan 16, 2019

Why not implement the "Filter Rows" that was in dash-table-experiments? It is so easy to use and intuitive!

That's what I've proposed in this issue (#169 (comment)) and what we're planning on doing. we're working on it!

@JeroneGagliano

This comment has been minimized.

Copy link

JeroneGagliano commented Jan 16, 2019

Thanks, great to hear that! Sorry I did not see that in your previous post.
I see that this issue is a Priority#2. Does this mean in the next few months?

@chriddyp

This comment has been minimized.

Copy link
Member Author

chriddyp commented Jan 16, 2019

Does this mean in the next few months?

That's the goal. Also, as always, if your organization has budget for software, then certain features (like this one) can be prioritized and sponsored (https://plot.ly/products/consulting-and-oem/). It's really just a question of bandwidth :)

@JeroneGagliano

This comment has been minimized.

Copy link

JeroneGagliano commented Jan 16, 2019

I hear you. I am an independent consultant but would like to contribute financially to plotly dash development once my cashflow is up as a way of thanking you all.

@Harsha-Nori

This comment has been minimized.

Copy link

Harsha-Nori commented Feb 15, 2019

Same story here as @JeroneGagliano , I just wanted to chime in and say that the Dash-Table-Experiments was a real joy to use for filtering / sorting. It'd be awesome if there's any current way of getting DataTable to operate like that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment