Proposal to rename Kibana "Index Patterns" #44955

Open
sqren opened this issue Sep 5, 2019 · 20 comments

@sqren
Member

commented Sep 5, 2019

Today in Kibana there are two very different things which are both called "index patterns".
There is the string based index pattern which is used to match indices. A string based index pattern can be apm-*.
Then there is also the object based index pattern. This is mostly used by core Kibana plugins like Discover to provide metadata about one or more indices and is built as an abstraction on top of the field capabilities API.

The string based index pattern and object based index pattern are two very different things, and have very different use cases. Yet they are related and it is easy to mix the two up in conversations and in documentation.

The string based index pattern is very aptly named: it is a pattern that matches indices. Object based index patterns OTOH are not. They are not patterns, rather descriptions of fields in an index.
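
A minimal TypeScript sketch of the distinction, for illustration only. The names `ObjectIndexPattern`, `FieldDescription` and `matchesIndexPattern` are hypothetical and are not Kibana's actual types, the sample field values are made up, and the matcher is a simplification that only understands the `*` wildcard:

```typescript
// A string based index pattern is just a string, e.g. "apm-*".
type StringIndexPattern = string;

// Hypothetical shape of an "object based index pattern":
// a description of fields rather than a pattern.
interface FieldDescription {
  name: string;
  type: string;
  searchable: boolean;
  aggregatable: boolean;
}

interface ObjectIndexPattern {
  title: StringIndexPattern; // the string pattern the description was built from
  fields: FieldDescription[];
}

// Simplified matcher: "*" matches any run of characters, everything else is literal.
function matchesIndexPattern(pattern: StringIndexPattern, indexName: string): boolean {
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`).test(indexName);
}

// Illustrative values only.
const apmDescription: ObjectIndexPattern = {
  title: "apm-*",
  fields: [{ name: "@timestamp", type: "date", searchable: true, aggregatable: true }],
};
```

Here `apmDescription.title` is the only part that behaves like a pattern; everything else is metadata about fields.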

I therefore propose that the object based index patterns are renamed. Non-exhaustive list of suggestions for names (feel free to come up with others - this is just a starting point):

  • "Index Mappings"
  • "Kibana Mapping"
  • "Field Capabilities Mapping"
  • "Index Capabilities Mapping"

Isn't this just a waste of time?
My first thought was yes. Then two years passed and I'm now confident that the answer is no.
Naming frames how we think about things, and how others perceive them. Misleading names give incorrect mental models, and result in wrong assumptions and misunderstandings.
I think index pattern is one such thing where the poor choice of naming has caused them to be misused and misunderstood for a long time. The fact that the name collides with another concept only makes this worse. From personal experience, it took a long time for me to grok the fact that these were two separate concepts - and only much later did I understand what they each actually were. Asking people familiar with Kibana I can understand that I'm not the only one, and I've yet to meet someone who said they weren't confused by index patterns - initially at least.

I must admit: I don't know exactly how much work this change requires. I can imagine there is a tonne of places in the product that need to be changed. On top of that there is documentation. And then we need to re-educate the community. I can completely understand if this feels like too much work for too little value. I've been thinking that for two years - until now :)

Related

@elasticmachine

Contributor

commented Sep 5, 2019

@ogupte

Contributor

commented Sep 5, 2019

This makes a lot of sense. When I started contributing to Kibana, I was very confused about what was meant by the term "index pattern" since it seemed to have multiple meanings based on context. Glad to see that I'm not alone in that thinking.

@mattkime

Contributor

commented Sep 6, 2019

There are two sides to this - there's the UI that users interact with and there's the code. Would renaming object index patterns create a mismatch with the user interface?

@sqren

Member Author

commented Sep 6, 2019

Would renaming object index patterns create a mismatch with the user interface?

For this change to be effective we should change it both from a developer perspective and from an end user perspective. I think it's equally confusing for both.

@jasonrhodes

Member

commented Sep 6, 2019

Like @sqren, I'm not really sure how difficult this rename would be, but overall I'm a huge +1 for clearing up this naming confusion. It makes talking about ES queries from a Kibana context so difficult and comes up in almost every project I've worked on here.

@jasonrhodes

Member

commented Sep 6, 2019

I think this is probably also related to this discussion: #35481

Notably pain point number 2 from that issue:

User must understand difference between indices and index patterns, understand how wildcards work

@monfera

Member

commented Sep 6, 2019

tl;dr from this one person:

  • yes for a new name
  • issues with proposed names
  • there should first be a super clear concept of what the thing is, before naming it, and it's not available yet

I agree with the reasons why the current name is confusing. Also, that names, esp. for such central things, are important to get right. Names for central structures are preferably held constant - it helps when people run into docs, forum and SO discussions and Elastic versions over many years. By extension I also agree that now is the best time to discuss misnomers since we haven't yet invented time travel 😄

Re the suggested names: it doesn't feel like a mapping. What is being mapped, from what to what? Also even if it can be seen as a mapping, it might not be the most specific term. We tend to use Mapping for various things already in the entire stack so it has a somewhat diluted meaning.

Maybe someone could define, as briefly yet fully accurately, what these things are, and what goals they serve, or point to an existing one?

The term "object based index pattern" is something Google couldn't find. But our docs do mention rollup index patterns. Also, the string-based index pattern is not quite equal to "apm-*" if I understand it properly - it is also a representation and configuration of a subset of the fields that are found among the indices. Also, so many things are latched onto index patterns - field formatters, scripted fields, source filters, and field popularity data - that it stops becoming an entity with clear, well-defined meaning and starts to become an operational entity to which various things have been historically linked as functionality grew (gorilla, banana and the entire jungle).

So I'd be in favor of not looking at the name in isolation, as name should be apt; which requires a clear concept and definition; which, if my reading of eg. the above #35481 and my confusion is any hint, isn't yet worked out.

Sticking to just the string pattern based index patterns, there seem to be other potential issues:

  1. They don't have a name - their definition (eg. "apm-*") is their identifier
  2. They are not just a shorthand for a subset of the Elastic store - they're a first class object to which other things eg. field formatters, scripted fields, source filters, and field popularity data are referring to - it feels unusual that if the definition of some kind of subset of the index list is deleted, all these go away
  3. The combination of points 1. and 2. has the consequence that one can't even redefine (amend) what indices are part of an index pattern and what aren't, because their name is the pattern string. It doesn't feel ideal that even slightly changing the set of participating indexes leads to loss of field formatters, scripted fields etc.

@sqren

Member Author

commented Sep 6, 2019

there should first be a super clear concept of what the thing is, before naming it, and it's not available yet

You are completely right, and it was probably not a good idea of me to suggest names this early in the process. The important thing is that we can start the conversation now about what index patterns are, and then afterwards we can discuss alternative names.

it doesn't feel like a mapping. What is being mapped, from what to what?

The reason I suggested "mapping" is because the object based index pattern is similar to Elasticsearch's concept of Mapping, which is a description of the fields in an index.

the string-based index pattern is not quite equal to "apm-*" if I understand it properly - it is also a representation and configuration of a subset of the fields that are found among the indices.

I'm not sure what you mean here. Isn't that exactly the problem that the term "index pattern" has more than one meaning? I'm proposing that we stop using the term "index pattern" for anything but the string based form.

Also, so many things are latched onto index patterns - field formatters, scripted fields, source filters, and field popularity data - that it stops becoming an entity with clear, well-defined meaning and starts to become an operational entity to which various things have been historically linked as functionality grew (gorilla, banana and the entire jungle).

++ this definitely adds to the confusion. When I realized that it was mostly built as an extension of the field capabilities API it became a little less blurry but there are still a lot of things I do not understand.

So I'd be in favor of not looking at the name in isolation, as name should be apt; which requires a clear concept and definition;

I was hoping for this reaction: "We can't rename it because we don't know what it is". If we make it a goal to find a more appropriate name, and this forces us to define (and perhaps re-define) what Kibana index patterns inherently are, that might be a very good side-effect.

You list some issues with string based index patterns:

  1. They don't have a name - their definition (eg. "apm-*") is their identifier

I don't see how that is a problem

  2. They are not just a shorthand for a subset of the Elastic store - they're a first class object to which other things eg. field formatters, scripted fields, source filters, and field popularity data are referring to - it feels unusual that if the definition of some kind of subset of the index list is deleted, all these go away

Isn't this conflating string based index patterns with object based index patterns again? If we agree that (string based) index patterns are just strings, then we can come up with other abstractions that do what you suggest. For instance rich structures that have stable identifiers orthogonal to the underlying indices that it references via string based index patterns. The important thing is that we don't call this rich structure an "index pattern" like today.
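
One way such a rich structure could look, as a rough TypeScript sketch. `DataDescription` and `withIndexPattern` are hypothetical names invented for this illustration, not existing Kibana code, and the attached properties are reduced to simple placeholders:

```typescript
// Hypothetical rich structure: its identity is independent of the pattern string.
interface DataDescription {
  id: string; // stable identifier, never derived from the pattern
  name: string; // human-readable label, freely editable
  indexPattern: string; // plain string based index pattern, e.g. "apm-*"
  scriptedFields: string[];
  fieldFormatters: Record<string, string>;
}

// Changing which indices are covered returns a copy that keeps the same
// id, name and attached configuration.
function withIndexPattern(source: DataDescription, indexPattern: string): DataDescription {
  return { ...source, indexPattern };
}

// Illustrative values only.
const apm: DataDescription = {
  id: "a1b2c3",
  name: "APM data",
  indexPattern: "apm-*",
  scriptedFields: ["duration_ms"],
  fieldFormatters: { duration_ms: "number" },
};

const widened = withIndexPattern(apm, "apm-*,traces-*");
```

In this sketch `widened` references a different set of indices but is still the same object as far as anything that refers to its `id` is concerned.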

@elasticmachine

Contributor

commented Sep 6, 2019

@mattkime

Contributor

commented Sep 6, 2019

IMO, this is highly intertwined with #35481

@sqren

Member Author

commented Sep 6, 2019

IMO, this is highly intertwined with #35481

You might be right. I am trying to decrease the scope by not solving all the pain points related to index patterns but just defining what it is and thereafter renaming it to avoid the ambiguity that currently exists.

@monfera

Member

commented Sep 7, 2019

there should first be a super clear concept of what the thing is, before naming it, and it's not available yet

You are completely right, and it was probably not a good idea of me to suggest names this early in the process. The important thing is that we can start the conversation now about what index patterns are, and then afterwards we can discuss alternative names.

I 👍-d your OP as there's a ton of value in thinking about the name concurrently, and feel that continuously keeping in mind the naming problem is a control/constraint that helps us avoid defining a newer, but still fuzzy (underdefined, unintuitive or highly coupled) concept.

it doesn't feel like a mapping. What is being mapped, from what to what?

The reason I suggested "mapping" is because the object based index pattern is similar to Elasticsearch's concept of Mapping, which is a description of the fields in an index.

Yes I thought so, and appreciate the appeal to analogy or familiarity, though

  • it doesn't feel like an ideal name there either, though that fits in the regular meaning of "mapping" to a larger extent than what we call index patterns (which is a bag of only loosely related concerns, some of them not at all mapping-like)
  • not sure if the analogy/familiarity benefits trump the benefits of meaning-based naming and the avoidance of intermingling it with the ES field mapping
  • maybe I misread it but you seem to imply below that the notion of "index pattern", at least for the string index patterns, should be just the selection of indices via a pattern string eg. "apm-*" - in this case, it's more of a subset of the ES indices, and the word "mapping" doesn't cover it; "index set", or a less precise but more sellable "index group" would (I don't necessarily suggest these names btw.)

the string-based index pattern is not quite equal to "apm-*" if I understand it properly - it is also a representation and configuration of a subset of the fields that are found among the indices.

I'm not sure what you mean here. Isn't that exactly the problem that the term "index pattern" has more than one meaning? I'm proposing that we stop using the term "index pattern" for anything but the string based form.

I agree with what you say here, my note you cite was in response to "The string based index pattern is very aptly named" - it's aptly named only in a superficial sense, precisely because of what you say here, and what I mentioned later in my comment (it's also a hub for a LOT of things eg. scripted fields, besides just being a pattern that matches indices)

Also, so many things are latched onto index patterns - field formatters, scripted fields, source filters, and field popularity data - that it stops becoming an entity with clear, well-defined meaning and starts to become an operational entity to which various things have been historically linked as functionality grew (gorilla, banana and the entire jungle).

++ this definitely adds to the confusion. When I realized that it was mostly built as an extension of the field capabilities API it became a little less blurry but there are still a lot of things I do not understand.

Haha I looked at index patterns in terms of what they currently are (rather than assuming that first it was just proper matching patterns only and then more functions became coupled to it) and thought of it as a non-separable part of the confusion, rather than an add-on aspect :-) So to me, what latches on is an integral part of the problem, even if the solution might be the very uncoupling (or cohesion/integration) of these things.

So I'd be in favor of not looking at the name in isolation, as name should be apt; which requires a clear concept and definition;

I was hoping for this reaction: "We can't rename it because we don't know what it is".

Indeed, looks like others' and my responses are in effect, what you expected :-)

If we make it a goal to find a more appropriate name, and this forces us to define (and perhaps re-define) what Kibana index patterns inherently are, that might be a very good side-effect.

💯

You list some issues with string based index patterns:

  1. They don't have a name - their definition (eg. "apm-*") is their identifier

I don't see how that is a problem

I tried to address it - it couples the name (and id/reference) with the content, which is a constraint that has no benefit that I know of, but excludes valid use cases, as a "schema" evolves over time:

  1. You can't change the name. Maybe even initially, you can come up with, and want to use a name more descriptive than "apm-*" or "some-techy-index-name-*"; you might even want to refine or change the name later
  2. You can't leave the name in place while changing the set of indices it covers (eg. a new index is added and you want to cover that; why should the name change if the meaning of the thing didn't change?)
  3. You mention the "object" index patterns, for which no kind of such pattern names would be suitable anyway, yet it's good to have one naming concept, rather than a separate one, depending on the type of the thing (it feels it's in the spirit of your post also)
  4. The "index pattern" is currently not just a subset (selection) of the indices, so it's a misleading name (it makes it feel like it's about a subset of indices, but as we discussed, a lot more things hang on this, eg. scripted fields)
  5. I guess it would be totally legitimate to create two "index patterns" of the same index subsetting (eg. "apm-*") while naming them differently and hanging different things onto them (such as scripted fields), or just selecting a different subset of fields, or formatting them differently
  6. Even putting aside all the above points, there's something accidental about using the contents as the name just because it's a string and can fit on a line, and it's not just because it can contain stars, commas and other decidedly non-name like things

Also see the linked issue comment where @ruflin mentions the problem of confounding the name with the content:

"[wishlist] Giving index patterns a name: It's possible to use multiple patterns like filebeat-*,metricbeat-*,apm-* for one index pattern. It would be nice if then a name like o11y could be given to it instead of showing up like the above in the Discovery UI"

  2. They are not just a shorthand for a subset of the Elastic store - they're a first class object to which other things eg. field formatters, scripted fields, source filters, and field popularity data are referring to - it feels unusual that if the definition of some kind of subset of the index list is deleted, all these go away

Isn't this conflating string based index patterns with object based index patterns again?

I don't think so (although as I mentioned I'm not 100% clear on what you mean by the object based index patterns so apologies if I misspeak), you can create a string based index pattern and then go ahead and associate them with field formatters, scripted fields etc. Please correct me if a string based index pattern is just exclusively a subset of the set of indices in ES. Eg. this writes about field formatting, which decidedly does not fit into the concept of "subset of indices" that a pattern like "apm-*" implies. Even the selection of the fields is not much to do with selecting a subset of indices other than the fact that the union of the index fields is offered as the starting position.

If we agree that (string based) index patterns are just strings

As I tried to imperfectly convey, I don't identify with this definition for the current situation, as they are a lot more. Again, it's from the viewpoint of looking at things as they currently are. The specification of the index subset (eg. "apm-*") is but the first step in a sequence of steps that assigns various semi-related definitions either during the creation of the index pattern, or subsequently (eg. scripted fields).

But it's possible, and feels totally appealing, that the rethinking of the concepts leads to a much less tightly coupled set of relations, where indeed, the subsetting of the indices is orthogonal to, or at least properly separated from, the concepts of field selection, field formatting, scripted fields etc.

In this case, I think the meaning still shouldn't be conflated with the means. The meaning (value) is a subset of ES indices. The means of subsetting the ES indices currently is that the user can specify a matcher string. But let's keep these two things separate: it may well be the case in the future that on the Kibana side, or even in ES, we introduce other ways for specifying sets of indices.
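
To illustrate keeping the meaning (a set of indices) apart from the means (how that set is specified), a small TypeScript sketch. `IndexSelection` and `resolveIndices` are hypothetical names, the "explicit" variant is an invented example of another way to specify a set, and the pattern handling is simplified to a single trailing `*`:

```typescript
// The "means": different ways of specifying a set of indices.
type IndexSelection =
  | { kind: "pattern"; pattern: string } // e.g. "apm-*"
  | { kind: "explicit"; indices: string[] }; // hypothetical alternative

// The "meaning": the resulting subset of the available indices.
function resolveIndices(selection: IndexSelection, allIndices: string[]): string[] {
  switch (selection.kind) {
    case "pattern": {
      const { pattern } = selection;
      if (pattern.charAt(pattern.length - 1) !== "*") {
        // No wildcard: only an exact name matches.
        return allIndices.filter((name) => name === pattern);
      }
      // Simplification: a single trailing "*" is treated as a prefix match.
      const prefix = pattern.slice(0, -1);
      return allIndices.filter((name) => name.slice(0, prefix.length) === prefix);
    }
    case "explicit":
      return allIndices.filter((name) => selection.indices.indexOf(name) !== -1);
  }
}
```

Anything that consumes the result of `resolveIndices` does not need to know whether the set came from a matcher string or from some other kind of definition.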

then we can come up with other abstractions that do what you suggest. For instance rich structures that have stable identifiers orthogonal to the underlying indices that it references via string based index patterns. The important thing is that we don't call this rich structure an "index pattern" like today.

Totally! I feel like we're pretty much on the same page in all aspects that matter here, and any clarification from me may be due to the fact that you're probably ahead in thinking about how to disassociate (extract out) currently attached things from "index patterns" while I try to avoid implying a solution as your issue (including its title) is about naming the thing, while the linked issue is about what it should stand for.

@monfera

Member

commented Sep 7, 2019

... maybe I'm getting too much into the details here, a video call could be more efficient going forward

@cjcenizal

Contributor

commented Sep 9, 2019

I think this is long overdue @sqren. Thank you for raising this!

As a user, I define a Kibana index pattern based on its properties and behavior:

  • It has a "name", which happens to be an Elasticsearch index pattern
  • It has a list of fields and their types, which is an aggregation of the Elasticsearch mappings of all indices captured by the ES index pattern
  • It has a list of scripted fields
  • It has a list of source filters
  • It provides the data that's consumed/visualized by a few of the core original apps in Kibana

When I look at this list of properties and behaviors, the one essential characteristic that jumps out at me is the last one. These apps depend on the Kibana index pattern to provide the data that they consume and visualize. And "search" is the mechanism for retrieving data. Because this thing is the source of this search capability, I tentatively offer "search source" as a name.
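
The second property in the list, a field list aggregated from the mappings of all matched indices, can be sketched roughly as follows in TypeScript. `SimpleMappings`, `MergedField` and `mergeFields` are hypothetical names and the mapping shape is heavily simplified; this is not how Kibana or the field capabilities API actually represents the data:

```typescript
// Simplified, hypothetical mapping shape: index name -> (field name -> type).
type SimpleMappings = Record<string, Record<string, string>>;

interface MergedField {
  name: string;
  types: string[]; // more than one entry means the indices disagree on the type
}

// Combine the fields of all given indices into one list, sorted by field name.
function mergeFields(mappings: SimpleMappings): MergedField[] {
  const typesByField: Record<string, string[]> = {};
  Object.keys(mappings).forEach((indexName) => {
    const fields = mappings[indexName];
    Object.keys(fields).forEach((fieldName) => {
      const types = typesByField[fieldName] || (typesByField[fieldName] = []);
      if (types.indexOf(fields[fieldName]) === -1) {
        types.push(fields[fieldName]);
      }
    });
  });
  return Object.keys(typesByField)
    .sort()
    .map((name) => ({ name, types: typesByField[name].slice().sort() }));
}

// Illustrative values only.
const merged = mergeFields({
  "apm-1": { "@timestamp": "date", duration: "long" },
  "apm-2": { "@timestamp": "date", duration: "keyword" },
});
```

In this made-up example `duration` ends up with two types, which is the kind of conflict a consumer of the merged list would have to handle.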

"Search source" seems to me that it would be specific enough to be distinct from other terms and support SEO. It also seems like it would be flexible enough to allow us to add new capabilities to it over time without requiring a name change. We already support a special type of rollup index pattern for searching rollup data; this would become a "rollup search source". Maybe we'll add other types of data in the future, and the term "search source" would accommodate that.

One minor concern is that "search source" is already a term used in the Courier code, but that shouldn't be difficult to rename and I think we're moving away from using Courier in general anyway.

@sqren

Member Author

commented Sep 9, 2019

maybe we're getting quite a bit into the details here, perhaps a zoom call among those involved could be more efficient going forward

Yeah, we went over quite a few things. A zoom call sounds like a good way to move forward. I'll set something up.

I tentatively offer "search source" as a name

I can definitely see "search source" work as a name. It is a little generic/vague, which might be both an advantage (it's easy to evolve the concept over time without another rename) and a drawback (it might still be difficult to grasp for newcomers). Either way it's much better than status quo since it removes the ambiguity around "index pattern". So I'm tentatively +1 :)

@ruflin

Contributor

commented Sep 9, 2019

As this is going in the direction of "rethinking" Index Patterns I would like to throw this ES issue in here: elastic/elasticsearch#33267. It might be that some of the data above actually belongs in Elasticsearch directly, which could simplify things so that we need fewer names.

@monfera

Member

commented Sep 9, 2019

Good call @ruflin—the field/dimension metadata question also came up when discussing future Lens evolution, I wonder if there are other issues to cross-link @AlonaNadler @cchaos ?

@monfera

Member

commented Sep 9, 2019

I like the name "search source" as long as we want to bake in the word "search" (it feels like a response to eg. a SQL query can be seen as search, but maybe it's a slight broadening of the concept, still perfectly OK though). As a non-native speaker it's a bit hard for me to pronounce it, any opinion on "data source"? Vega and other tools use it.

Related: it's not just pointing to source data; it's also an augmentation of source data with metadata that ES doesn't manage as part of indices or related structures. In a more traditional, eg. relational database, much of the schema info (metadata) would be an integral part of the database itself.

An interesting question is whether everything that is needed ideally resides within Kibana, or would better belong in ES. So it's worth checking if the current boundaries between ES and Kibana are considered optimal.

I assume it could work such that calculated fields ("scripted" fields), field formatters etc. could be shared among data / search sources.

@sqren

Member Author

commented Sep 17, 2019

@ruflin Moving the concept of index patterns from Kibana to ES would be a great way to make it more consistent across the stack 👍

@sqren

Member Author

commented Sep 17, 2019

FYI: I've created an item on the Dev Calendar for discussing index patterns. Friday at 9AM EDT / 15.00 CEST.

10 participants