
[lens] [discuss] What kind of nested aggregations should the XY chart allow users to configure? #40823

Closed
wylieconlon opened this issue Jul 10, 2019 · 7 comments


@wylieconlon
Contributor

wylieconlon commented Jul 10, 2019

When using the aggregated index pattern data source, what kind of queries should we allow users to build with drag & drop?

For the purpose of this discussion, we'll use some specific examples. The simplest is a Date Histogram query, split by Top Destinations, showing the count of matching documents. Most users will prefer the nesting order implied by that query, which could be described as "Comparing the same top 5 destinations across all the days". This is roughly:

{
  "terms": {
    "date_histogram": {
      "count": 5
    }
  }
}

Some users don't want to see only the top 5 destinations across the entire range; instead, they want to see the top 5 for each day. This changes the nesting order:

{
  "date_histogram": {
    "terms": {
      "count": 5
    }
  }
}

This is a UX challenge for any chart editor, because these choices have to be presented to the user visually. In the example above, I would expect the "date_histogram" to be mapped to the X axis, the "terms" to the split, and "count" to the Y axis.
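To make the two nesting orders concrete, here is a minimal TypeScript sketch that turns an ordered list of bucket dimensions into an Elasticsearch `aggs` body, with the metric always nested innermost. The `Dimension`/`Metric` types, the `buildAggs` helper, the aggregation names (`dim_0`, `dim_1`, `metric`) and the field names (`dest`, `@timestamp`) are hypothetical placeholders, not Lens code. It uses the `calendar_interval` parameter of newer Elasticsearch versions (older versions used `interval`), and a count needs no metric aggregation because every bucket already reports its `doc_count`.

```typescript
// Hypothetical description of one bucket dimension in a chart.
type Dimension =
  | { kind: "terms"; field: string; size: number }
  | { kind: "date_histogram"; field: string; calendarInterval: string };

// Hypothetical metric. "count" needs no sub-aggregation, because every
// Elasticsearch bucket already reports its doc_count.
type Metric = { op: "count" } | { op: "avg" | "sum"; field: string };

type Aggs = Record<string, unknown>;

// Translates one dimension into an Elasticsearch bucket aggregation.
function toAgg(dim: Dimension): Aggs {
  return dim.kind === "terms"
    ? { terms: { field: dim.field, size: dim.size } }
    : { date_histogram: { field: dim.field, calendar_interval: dim.calendarInterval } };
}

// Nests the dimensions in the given order (first = outermost) and places
// the metric, if any, inside the innermost bucket aggregation.
function buildAggs(order: Dimension[], metric: Metric): Aggs {
  let inner: Aggs | undefined =
    metric.op === "count" ? undefined : { metric: { [metric.op]: { field: metric.field } } };
  for (let i = order.length - 1; i >= 0; i--) {
    const agg = toAgg(order[i]);
    if (inner) agg.aggs = inner;
    inner = { [`dim_${i}`]: agg };
  }
  return inner ?? {};
}

const topDestinations: Dimension = { kind: "terms", field: "dest", size: 5 };
const perDay: Dimension = { kind: "date_histogram", field: "@timestamp", calendarInterval: "1d" };

// "The same top 5 destinations across all days": terms is the outer aggregation.
const sameTop5 = buildAggs([topDestinations, perDay], { op: "count" });
// "The top 5 destinations of each day": date_histogram is the outer aggregation.
const top5PerDay = buildAggs([perDay, topDestinations], { op: "count" });
```

The only difference between the two requests is which dimension is passed first; the chart mapping (date_histogram on X, terms on the split, count on Y) stays the same.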

There are at least four options we could consider, each detailed below, and each has significant UX tradeoffs. Ranked by "powerfulness" (how much flexibility users have), from most to least powerful, they are:

  1. What is shown in last year's proof of concept using eui-charts
  2. What Visualize supports today
  3. What our feature/lens branch has so far
  4. What is shown in the user-tested proof of concept from May

To go in reverse from least to most powerful, these are the options:

4. The user-tested proof of concept from May

This version is restricted to exactly one X, Y, and Split. The Split is always the top level of aggregation, and users can't customize this at all. The UI looks like:

[Screenshot: UI of the May proof of concept]

Pros:

  • Predictable behavior
  • Easy to build a good suggestion algorithm

Cons:

  • Does not support multiple metrics on the Y axis, although this could be done
  • No ability for users to change the nesting order

3. What our feature/lens branch has so far

This version supports multiple Y axes and Split series. The nesting order is basically the same as what Visualize supports, but with more opinionated defaults. "Buckets" are always first, then "Metrics", although users don't see those terms. Our opinionated default is that split series are always at the top level of aggregations.

If we continue implementing this option, we could build a "tree sorter" using drag & drop that shows users the nesting level more clearly than Visualize does.

Our default is:

[Screenshot: default configuration in the feature/lens branch]

Pros:

  • Can create similar graphs to Visualize
  • Better defaults let us guide users more than Visualize

Cons:

  • Having a separate UI for changing nesting order is complicated for users to understand

2. What Visualize supports today

Visualize supports a single X axis and split chart, with multiple split series. The nesting order is changed by rearranging the expanded panels:

[Screenshot: rearranging the expanded aggregation panels in Visualize]

One restriction of this editor is that all metrics are placed at the lowest level of nesting:

[Screenshot: metrics at the lowest level of nesting in Visualize]

Pros:

  • Very clear separation between "buckets" and "metrics" which is familiar to Elasticsearch DSL users
  • Flexible for experienced users

Cons:

  • Requires familiarity with how Elasticsearch works to use

1. What is shown in last year's proof of concept using eui-charts

This proof of concept does nesting in a totally different way from the above. It has a system of "layers" that are merged using a single X axis.

[Screenshot: layers in last year's eui-charts proof of concept]

To visualize how these layers interact:

[Screenshot: diagram of how the layers interact]

This is the most powerful model because it lets users visualize a Split metric on Y directly against an un-split metric. A query that is possible in this model, but not in the others, is comparing the Average of Bytes from all sources against the Average of Bytes split by source. The nesting for this looks different: there are two sibling aggregations, and we would have to merge their dates:

{
  "terms": {
    "date_histogram": {
      "average": 5
    }
  },
  "date_histogram": {
    "average": 10
  }
}
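In the layers model, each layer produces its own table, and the chart has to line the rows up on the shared X axis. Below is a minimal TypeScript sketch of that merge step, assuming each layer has already been flattened into rows of `{ x, series, value }`. The `Row` shape and the `mergeLayersByX` helper are hypothetical, and series labels are assumed to be unique across layers.

```typescript
// Hypothetical flattened row from one layer's data table.
interface Row {
  x: number;      // e.g. the date bucket key in epoch milliseconds
  series: string; // series label, e.g. a source name or "All sources"
  value: number;  // e.g. average bytes
}

// Merges rows from several layers into one table keyed by X value, so that
// split and un-split series can be drawn against a single shared X axis.
// A series with no value at a given X is simply absent from that row.
// Assumes series labels are unique across layers (later layers overwrite).
function mergeLayersByX(layers: Row[][]): Map<number, Record<string, number>> {
  const merged = new Map<number, Record<string, number>>();
  for (const rows of layers) {
    for (const { x, series, value } of rows) {
      const bucket: Record<string, number> = merged.get(x) ?? {};
      bucket[series] = value;
      merged.set(x, bucket);
    }
  }
  // Sort by X so that line and area charts draw points in order.
  return new Map(Array.from(merged.entries()).sort(([a], [b]) => a - b));
}

const day1 = Date.UTC(2019, 6, 10);
const day2 = Date.UTC(2019, 6, 11);

// Layer 1: average bytes split by source (terms -> date_histogram -> average).
const splitLayer: Row[] = [
  { x: day2, series: "source:a", value: 120 },
  { x: day1, series: "source:a", value: 100 },
  { x: day1, series: "source:b", value: 300 },
];
// Layer 2: average bytes over all sources (date_histogram -> average).
const totalLayer: Row[] = [
  { x: day1, series: "All sources", value: 200 },
  { x: day2, series: "All sources", value: 120 },
];

const table = mergeLayersByX([splitLayer, totalLayer]);
```

This is also where the "potentially slower" con shows up: the merge is extra client-side work on top of running one query per layer.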

Pros:

  • Concept of "layers" should be easy to teach, and each layer maps well to the concept in elastic-charts
  • Even more flexible than current Visualize
  • Can potentially support queries on multiple index patterns

Cons:

  • Potentially slower if it involves creating multiple data tables
  • Might require users to manually create join-like connections between data
  • Changing nesting order with layers is confusing

Personally, I would be happiest as a user with an option using Layers like the mocks from last year.

Which one should we go with for our beta target?

@chrisdavies @flash1293 @cchaos @AlonaNadler @timroes @rayafratkina

@AlonaNadler

@wylieconlon, excellent way to describe the options.
To me, the split series should by default always be a top-level aggregation; that is the most common and expected behavior.
Regarding option 4: it is optimized for the more common request, splitting at the top level.
You named very good pros. Regarding the cons:

Cons:
Does not support multiple metrics on the Y axis, although this could be done

We can allow users to add multiple metrics and split series for every y-axis. In addition, we can give users the ability to add another, separate y-axis that is not split (that was the thinking when we did the mockups).

No ability for users to change the nesting order

In the future, we can have a setting in the split series configuration that reverses the order.

Regarding option 3: why can't we use the model from option 3 to support the UI in option 4? What are your concerns?

Regarding option 2: the UI, terminology, and experience of Visualize only serve us when users are either familiar with ES or learn how to use Kibana and practice it, and even then it is not guaranteed that they remember how it works when they need to use it. In Lens we are trying to improve the experience; building split series charts and changing the order of the aggregations is one of the most confusing aspects of visualizing today, and even power users fall into that trap. In addition, "metrics" and "buckets" are terminology we are trying to move away from, since it requires our users to understand how ES works before they can build visualizations. Kibana has become so popular that not all users are even aware it runs on ES.

I personally think the combination of 4 and 1 might be the best approach. By default, we should always aggregate the split series before the x-axis or y-axis (btw, this doesn't mean we need to support it from day 1).
Option 1 allows us to later support multiple index patterns better, and to combine split and non-split y-axes.
In my mind, the combination could work like this drawing:

[Image: drawing of how options 4 and 1 could be combined]

@cchaos
Contributor

cchaos commented Jul 15, 2019

Of course I'd push for option 4 😉. I'm not totally following what's going on in the background, so I'm just going to focus my comment on the UI, as that's my area of concern.

The UI from the original POC is mostly what I'd expect in terms of "layering", and it's where I originally thought we'd end up in the new feature branch; we just haven't gotten there yet. Essentially, I look at it as one (cartesian/histogram) chart having:

  • A single x-axis that is shared by all y-axes
  • 1+ y-axes that are completely unaware of each other and house their own aggregations, splits, and styles
  • The ability to split itself once

When it comes to the order of splits and aggregations, however it ends up working, all I ask is that it's very clear to the user that order matters and that there is a way to change this order.

Here is an example mock:

  1. The shared X-axis
  2. The separate Y-axes
  3. Indicate with language that there is an order of operations. "This then that". Perhaps give them a quick way to reverse the direction.
  4. I think this is what you're getting at about splitting first, then aggregating
  5. House possible splits within the axis/series/layer that it can be applied to
  6. Split chart should be moved outside of the layers, since you can only do this once and it doesn't pertain to any specific series or layer.

Q: Re-reading this, is there also an order between the x-axis aggregation and the y-axis, not just with splitting?

Caveat: This is how I see it. Most of the responses I get from people are "that's not how it works". If that's true, how would you modify the concept and mock above to align them with how it works?

@flash1293
Contributor

A small comment to option 4:

I'm not sure whether we should only have a single x axis config panel. I imagine it being part of each "layer" (defaulting to the same operation as the existing layers when a new layer is created), enabling the user to change it on a per-layer basis. There are a few reasons for this:

  • If we are going to support multiple index patterns in a single chart, the user has to specify the field for the x axis for each participating index pattern.
  • There are valid use cases for using different fields on the x axis for different layers, even within a single index pattern. For example, an index pattern may include different time fields, such as start and end time; in this case one layer would use the start_date field on the x axis and another layer would use the end_date field.
  • (Not really a big use case) A user would probably want to use different date histogram intervals for different layers; with a single config for the x axis, this is not possible.

From the technical side this wouldn't be a problem; we would just have to make sure each x axis dimension panel uses the same data type (dates don't mix well with strings).
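The same-data-type requirement could be enforced with a small validation step whenever layers are configured. A minimal TypeScript sketch, assuming a hypothetical `LayerXAxis` shape that records each layer's x axis field and data type; the `validateSharedXAxis` helper and the example field names are illustrative only.

```typescript
// Hypothetical per-layer x axis configuration.
type XDataType = "date" | "number" | "string";

interface LayerXAxis {
  layerId: string;
  field: string; // e.g. "start_date" in one layer, "end_date" in another
  dataType: XDataType;
}

// Returns an error message if the layers' x axes cannot share one axis,
// or null if every layer uses the same data type (the fields may differ).
function validateSharedXAxis(layers: LayerXAxis[]): string | null {
  if (layers.length === 0) return null;
  const expected = layers[0].dataType;
  const mismatched = layers.filter((l) => l.dataType !== expected);
  if (mismatched.length === 0) return null;
  const ids = mismatched.map((l) => l.layerId).join(", ");
  return `x axis of layer(s) ${ids} is not of type "${expected}"`;
}

// Different date fields per layer are fine.
const ok = validateSharedXAxis([
  { layerId: "starts", field: "start_date", dataType: "date" },
  { layerId: "ends", field: "end_date", dataType: "date" },
]);
// Mixing a date x axis with a string x axis is rejected.
const bad = validateSharedXAxis([
  { layerId: "starts", field: "start_date", dataType: "date" },
  { layerId: "hosts", field: "host.keyword", dataType: "string" },
]);
```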

@cchaos, one thing your mockup doesn't capture is that a split series can happen before or after the x axis, which is hard to make clear if the x axis is configured in a completely separate panel. This would become simpler with an x axis config for each layer.

@timroes
Contributor

timroes commented Jul 16, 2019

I think we should rule out Option 2. One of the goals of the new editor was to remove the need for Elasticsearch knowledge. Especially around nesting, we've seen in the past that users find it very hard to understand why it matters at all and how it influences the visualizations, so we should try not to rely on an understanding of ES nesting as a UX "feature".

Although Option 4 is super nice to implement and the simplest version, I don't think it will satisfy our needs in the long run (I'll leave aside whether we need this in the first release or not).

To me, the split series should by default always be a top-level aggregation; that is the most common and expected behavior.

I don't agree with that statement, and that's also why I think Option 4 won't be enough for us. I think the nesting order users "expect" actually depends on the visualization (and its parameters), and maybe even on the selected aggregation. Let me give a quick example:

If you have a line chart where you want to draw the Count of requests for the top 5 countries over time, you would expect the order Split (terms) -> X-axis (date histogram) -> Count, because otherwise you end up in the weird "I got way more than 5 lines" scenario that very often confuses users.

If you want to visualize the top 3 visited domains for each of the top 4 countries users came from in a bar chart, you would most likely expect the opposite order: X-axis (terms on countries) -> Split (terms on domains) -> Count.
If you apply the split-first order here, you will very easily end up with a bar chart like the following:

[Screenshot: bar chart produced by the split-first order]

That's most likely not what you wanted to see: you also wanted to see the top 3 visited domains for the other 3 countries, but if the split is done first, you only get the overall top domains, in whichever countries those happen to be.
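Both effects can be reproduced without Elasticsearch by computing top-N buckets over a small in-memory document list. This TypeScript sketch uses made-up documents and hypothetical helper names (`topTerms`, `splitFirstSeries`, `xFirstSeries`). Splitting first picks the top N countries over the whole range, so there are never more than N series, but each X bucket only shows those global winners. Bucketing by X first picks the top N within each bucket, so the union of series across buckets can exceed N.

```typescript
interface Doc {
  day: string;     // X axis bucket
  country: string; // split field
}

// Returns the `size` most frequent values of `key` in `docs`
// (ties broken alphabetically, for deterministic output in this sketch).
function topTerms(docs: Doc[], key: keyof Doc, size: number): string[] {
  const counts = new Map<string, number>();
  for (const d of docs) counts.set(d[key], (counts.get(d[key]) ?? 0) + 1);
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, size)
    .map(([term]) => term);
}

// Split first: top countries over the whole range, then bucket by day.
function splitFirstSeries(docs: Doc[], size: number): Set<string> {
  return new Set(topTerms(docs, "country", size));
}

// X axis first: bucket by day, then top countries within each day.
function xFirstSeries(docs: Doc[], size: number): Set<string> {
  const series = new Set<string>();
  for (const day of Array.from(new Set(docs.map((d) => d.day)))) {
    const docsForDay = docs.filter((d) => d.day === day);
    for (const c of topTerms(docsForDay, "country", size)) series.add(c);
  }
  return series;
}

const make = (day: string, country: string, n: number): Doc[] =>
  Array.from({ length: n }, () => ({ day, country }));

const docs: Doc[] = [
  ...make("d1", "US", 3), ...make("d1", "DE", 2), ...make("d1", "FR", 1),
  ...make("d2", "FR", 3), ...make("d2", "JP", 2), ...make("d2", "US", 1),
];

// Top 2 split: split-first gives 2 lines (FR, US); x-first gives 4 (US, DE, FR, JP).
const splitFirst = splitFirstSeries(docs, 2);
const xFirst = xFirstSeries(docs, 2);
```

With a top-2 split, the x-first order already produces twice as many series as requested, which is the "more than 5 lines" effect in the line chart case; the split-first order hides DE from d1 and JP from d2 even though they are in those days' own top 2, which is the bar chart problem.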

I think the order will generally follow the chart type. With a continuous chart type (Line, Area), you most likely want the split to be done first, because you want the same lines/areas to be present in every bucket on the x axis. With a non-continuous chart like a bar chart, you most likely want the top values per bar, not the top values overall, which can leave half of the x-axis slots/bars empty.

Given that, I think a) an "always split aggregation first, then the rest" rule is not enough, and b) we should let the visualization decide this and pick sensible defaults based on scenarios like the ones above.

Unfortunately, there might be scenarios where those defaults won't satisfy the user: without knowing the semantic content of the data, we can never make the 100% right decision the way the user can. That's why I think we need some way to let the user customize this. How we design this will, imho, be one of the most crucial UX parts of the new editor; we can get a lot right with it, but unfortunately also destroy a lot of good UX if we get it wrong.

As @cchaos already mentioned above, we would need to somehow let the user specify this order explicitly between different types of aggregations (x-axis, y-axis, split), and I currently have no good idea what that would look like. My worry with the "ordering" approach in the screenshot above (if we extended it to the different types of aggregations) is that it's not necessarily clear to the user what comes "after" something else.

For the bar chart example above, I think a user can still easily imagine that they first want the top countries, then the top domains. For the date histogram, I fear it's not as intuitive anymore. Wouldn't a user think they want the time field first, then the top countries? I have the feeling that understanding the "then" relation in that context will require understanding ES nesting again, which we are trying to get away from.

Regarding the first MVP: if a visualization could set the order that is reasonable for it (bar chart: x-axis first; line/area: split first), I think that would be enough for the very first release and would leave us more time to figure out a really good UX for solving this in a generic way.
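This MVP proposal, where each visualization picks its own default order, could be as small as a lookup table per chart type. A minimal TypeScript sketch; the `ChartType`, `Role`, `DEFAULT_ORDER` and `defaultNestingOrder` names are hypothetical. Metrics stay innermost because Elasticsearch metric aggregations cannot contain sub-aggregations.

```typescript
type ChartType = "line" | "area" | "bar";
type Role = "x" | "split" | "metric";

// Default nesting order per chart type, outermost aggregation first.
// Line/area: split first, so every X bucket shows the same series.
// Bar: X axis first, so each bar gets its own top values.
// Metrics always come last, nested inside the bucket aggregations.
const DEFAULT_ORDER: Record<ChartType, Role[]> = {
  line: ["split", "x", "metric"],
  area: ["split", "x", "metric"],
  bar: ["x", "split", "metric"],
};

function defaultNestingOrder(chart: ChartType): Role[] {
  // Return a copy so a later user override can't mutate the shared defaults.
  return [...DEFAULT_ORDER[chart]];
}
```

Keeping the defaults in data rather than in branching logic leaves room for the later, user-customizable UX: the editor would start from this array and let the user reorder it.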

@cchaos
Contributor

cchaos commented Jul 16, 2019

Building more on @flash1293's comments:

The mock I posted is then probably trying to optimize too much by "sharing" an X-axis.

We can build on this idea of "layers" and say that users can add multiple layers, but they have to re-indicate an x-axis for each one. When a new layer is created, Lens can default to auto-filling the x-axis with the field and aggregation from the previous layer, but the user can also change it.

  1. Unique data source for each layer
  2. The UI will now need to indicate at the aggregation level what part of the viz the field has been added to
  3. The x-axis is the same now, but users can change it (and its order of operations)
  4. Each panel is now a full "layer"
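The layer-creation default described above (copy the previous layer's x-axis field and aggregation, but let the user change it) could look like the following TypeScript sketch. The `Layer` and `XAxisConfig` shapes, the `addLayer` helper, the index pattern names, and the fallback field for the first layer are all hypothetical.

```typescript
interface XAxisConfig {
  field: string;
  operation: "date_histogram" | "terms" | "histogram";
}

interface Layer {
  id: string;
  indexPattern: string; // each layer may use its own data source
  xAxis: XAxisConfig;
}

// Creates a new layer whose x axis starts as a copy of the previous layer's,
// so a fresh layer lines up with the existing chart until the user edits it.
function addLayer(layers: Layer[], id: string, indexPattern: string): Layer[] {
  const previous = layers[layers.length - 1];
  const xAxis: XAxisConfig = previous
    ? { ...previous.xAxis } // a copy, so editing one layer doesn't change another
    : { field: "@timestamp", operation: "date_histogram" }; // hypothetical fallback
  return [...layers, { id, indexPattern, xAxis }];
}

let layers = addLayer([], "layer1", "logs-*");
layers = addLayer(layers, "layer2", "events-*");
// The user changes only the second layer's x axis field.
layers[1].xAxis.field = "end_date";
```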

@flash1293
Contributor

flash1293 commented Jul 16, 2019

That looks good to me, @cchaos

@timroes, about this paragraph:

For the bar chart example above, I think a user can still easily imagine that they first want the top countries, then the top domains. For the date histogram, I fear it's not as intuitive anymore

I understand your concerns here. We also talked about how doing this never makes sense for line charts, so what about having the following UI for each layer:

-----------------------------------------
|  X-Axis                               |
|  ------------------------------------ |
|  [ ] (split each bar individually)    |
|  Split Series 1                       |
|  then                                 |
|  Split Series 2                       |
|  then                                 |
|  Split Series 3                       |
|  (Add split series)                   |
|  ------------------------------------ |
|  Y Axis 1                             |
|  (Add y axis)                         |
-----------------------------------------

The checkbox applies the split series aggregations after the x axis aggregation and is only shown for bar charts, not for line charts. You can rearrange the splits among themselves, but you can't change the order of the y axes and the x axis (y axes can't be mixed in, because metrics always have to be nested completely inside bucket aggregations).
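The ordering rules of this proposal fit in one function that turns a layer's configuration into an aggregation order. A minimal TypeScript sketch under stated assumptions: the `LayerConfig` and `AggregationOrder` shapes and the `aggregationOrder` name are hypothetical, and the checkbox is modeled as a boolean that is rejected for non-bar charts.

```typescript
type ChartType = "bar" | "line" | "area";

// Hypothetical per-layer configuration mirroring the mockup above.
interface LayerConfig {
  chartType: ChartType;
  xAxis: string;         // the x axis dimension
  splitSeries: string[]; // split series in the user's "then" order
  splitEachBar: boolean; // the "(split each bar individually)" checkbox
  yAxes: string[];       // metrics
}

interface AggregationOrder {
  buckets: string[]; // bucket aggregations, outermost first
  metrics: string[]; // always nested inside the innermost bucket
}

function aggregationOrder(layer: LayerConfig): AggregationOrder {
  if (layer.splitEachBar && layer.chartType !== "bar") {
    throw new Error("'split each bar individually' is only offered for bar charts");
  }
  // Splits keep the user's order among themselves; the checkbox only decides
  // whether the x axis comes before or after all of them.
  const buckets = layer.splitEachBar
    ? [layer.xAxis, ...layer.splitSeries]
    : [...layer.splitSeries, layer.xAxis];
  return { buckets, metrics: [...layer.yAxes] };
}

// timroes' bar chart example with the checkbox ticked.
const barOrder = aggregationOrder({
  chartType: "bar",
  xAxis: "Top 4 countries",
  splitSeries: ["Top 3 domains"],
  splitEachBar: true,
  yAxes: ["Count"],
});
// barOrder.buckets is ["Top 4 countries", "Top 3 domains"]
```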

@wylieconlon
Contributor Author

I believe we have improved this enough that this is no longer an open discussion. If more discussion is needed we can open a new issue.

5 participants