System Indices #50251

Open
17 of 23 tasks
Tracked by #104987
jaymode opened this issue Dec 16, 2019 · 12 comments
Labels
:Core/Infra/Core Core issues without another label Meta Team:Core/Infra Meta label for core/infra team Top Ask

Comments

@jaymode
Member

jaymode commented Dec 16, 2019

This issue serves as an overarching issue for what we call System Indices and
details our plan to implement this feature.

As Elasticsearch and the rest of the stack have evolved, there has been an increasing
reliance on indices as an implementation detail of features. Certain features
are critical, like Security, and this implementation detail should not be
necessary for a user to know or think about. Unfortunately, this implementation
detail has been exposed to users in various ways.

In general, most of the indices that would be considered system indices have
names that start with a . and are referred to as dot indices. This is modeled
after dot files to suggest that the index is an implementation detail that
should not be interacted with. Searches over all indices, such as those done by
Kibana by default, include dot indices, whereas to a user "all indices" means all
the indices they created. Snapshots include dot indices by default, but restoring
them does not work, because the target index already exists and would have to be
wholly replaced. Upgrades of dot indices are managed with
complicated aliases and versioned names, where each dot index has its own
versioning and upgrade logic that must be run externally.

Dot indices will be superseded by system indices that are registered through
a module or plugin. A special plugin type will be added that defines the system
indices that the plugin provides along with the mappings and APIs necessary to
access these indices. The existing search API (and other data APIs) will not be
able to search a system index. APIs used for monitoring and troubleshooting will
continue to operate as they do today and include system indices in their output.

There are some dot indices that do not necessarily fit the mold of a normal
system index; instead they store data that the system produces with the intent
that users can also query against this data. For these indices we will
introduce the concept of a hidden index, which is not resolved by wildcards by default.
Such indices can be requested by name, or an IndicesOption can be specified
so that they are not ignored during wildcard resolution. An index may
only be marked as hidden at creation time, and this is done through
an index setting.

Hidden indices may have names that are prefixed by a dot. Hidden indices will not
inherit from a global index template.
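
As a rough illustration of the two access paths described above, the sketch below builds (but does not send) a create-index request that marks the index as hidden, and a search request that opts in to hidden indices during wildcard expansion. The `index.hidden` setting and the `hidden` value for `expand_wildcards` are the names used in the hidden-indices commit referenced by this issue; the helper functions and the index name `my-hidden-index` are made up for the example.

```python
import json
from urllib.parse import urlencode


def create_hidden_index_request(name):
    """Return (method, path, body) for creating an index that is hidden from creation."""
    # The hidden flag is supplied as an index setting in the create request.
    body = {"settings": {"index.hidden": True}}
    return "PUT", f"/{name}", json.dumps(body)


def search_request(pattern, include_hidden=False):
    """Return (method, path) for a search, optionally expanding wildcards to hidden indices."""
    states = ["open"]
    if include_hidden:
        # `hidden` is used in conjunction with `open` and/or `closed`.
        states.append("hidden")
    query = urlencode({"expand_wildcards": ",".join(states)}, safe=",")
    return "GET", f"/{pattern}/_search?{query}"


if __name__ == "__main__":
    print(create_hidden_index_request("my-hidden-index"))  # hypothetical index name
    print(search_request("*", include_hidden=True))
```

Requesting the hidden index by its exact name would need no extra parameter; only wildcard expansion is affected.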

To facilitate the migration away from accessing dot indices directly, the following
phased approach will be taken.

The initial work is the introduction of the infrastructure to define hidden indices and
system index plugins with dedicated APIs.

Tasks

The above steps will get us to a point where other teams can begin migrating from direct access to their system index and make use of the new APIs. The majority of the internal (to Elasticsearch) work will still need to be implemented.

System Index Patterns

IngestGeoIpPlugin

  • .geoip_databases

KibanaPlugin

  • .kibana_* (alias: .kibana)
  • .reporting-*
  • .apm-agent-configuration
  • .apm-custom-link

AsyncResultsIndexPlugin

  • .async-search

EnrichPlugin

  • .enrich-*

Fleet

  • .fleet-actions~(-results*) (alias: .fleet-actions)
  • .fleet-agents* (alias: .fleet-agents)
  • .fleet-enrollment-api-keys* (alias: .fleet-enrollment-api-keys)
  • .fleet-policies-[0-9]+* (alias: .fleet-policies)
  • .fleet-policies-leader* (alias: .fleet-policies-leader)
  • .fleet-servers* (alias: .fleet-servers)
  • .fleet-artifacts* (alias: .fleet-artifacts)
  • .fleet-actions-results

Logstash

  • .logstash

MachineLearning

  • .ml-meta*
  • .ml-config*
  • .ml-inference-*
    [associated indices]
  • .ml-anomalies-*
  • .ml-state*
  • .ml-stats-*
  • .ml-notifications*
  • .ml-annotations*

SearchableSnapshots

  • .snapshot-blob-cache

Security

  • .security-[0-9]+ (alias: .security)
  • .security-tokens-[0-9]+ (alias: .security-tokens)

Transforms

  • .transform-internal-*
    [associated indices]
  • .transform-notifications-*

Watcher

  • .watches*
  • .triggered_watches*

APIs allowed to access system indices

The APIs that we currently plan to allow system index access for are:

  • GET _cluster/health
  • GET {index}/_recovery
  • GET _cluster/allocation/explain
  • GET _cluster/state
  • POST _cluster/reroute
  • GET {index}/_stats
  • GET {index}/_segments
  • GET {index}/_shard_stores
  • GET _cat/[aliases,indices,health,recovery,shards,segments]
@jaymode jaymode added :Core/Infra/Core Core issues without another label Meta labels Dec 16, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@jaymode jaymode mentioned this issue Dec 20, 2019
8 tasks
jaymode added a commit that referenced this issue Jan 17, 2020
This change introduces a new feature for indices so that they can be
hidden from wildcard expansion. The feature is referred to as hidden
indices. An index can be marked hidden through the use of an index
setting, `index.hidden`, at creation time. One primary use case for
this feature is to have a construct that fits indices that are created
by the stack that contain data used for display to the user and/or
intended for querying by the user. The desire to keep them hidden is
to avoid confusing users when searching all of the data they have
indexed and getting results returned from indices created by the
system.

Hidden indices have the following properties:
* API calls for all indices (empty indices array, _all, or *) will not
  return hidden indices by default.
* Wildcard expansion will not return hidden indices by default unless
  the wildcard pattern begins with a `.`. This behavior is similar to
  shell expansion of wildcards.
* REST API calls can enable the expansion of wildcards to hidden
  indices with the `expand_wildcards` parameter. To expand wildcards
  to hidden indices, use the value `hidden` in conjunction with `open`
  and/or `closed`.
* Creation of a hidden index will ignore global index templates. A
  global index template is one with a match-all pattern.
* Index templates can make an index hidden, with the exception of a
  global index template.
* Accessing a hidden index directly requires no additional parameters.

Relates #50251
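
The resolution rules listed in this commit message can be modeled in a few lines. This is only a toy model under stated assumptions, not Elasticsearch's resolver: index names are held in a plain dict mapping name to a hidden flag, glob matching is delegated to Python's `fnmatch`, and the function name `resolve` and the sample index names are invented for the example.

```python
from fnmatch import fnmatchcase


def resolve(expression, indices, expand_hidden=False):
    """Resolve an index expression against a {name: is_hidden} mapping.

    Toy model of the listed rules; `expand_hidden` stands in for passing
    `hidden` in the `expand_wildcards` parameter.
    """
    match_all = expression in ("", "_all")
    if not match_all and "*" not in expression:
        # Direct access to a hidden index requires no additional parameters.
        return [expression] if expression in indices else []

    pattern = "*" if match_all else expression
    # A wildcard pattern that begins with "." may expand to hidden indices.
    dot_pattern = pattern.startswith(".")

    resolved = []
    for name, hidden in sorted(indices.items()):
        if not fnmatchcase(name, pattern):
            continue
        if hidden and not (expand_hidden or dot_pattern):
            continue
        resolved.append(name)
    return resolved


if __name__ == "__main__":
    # Hypothetical cluster contents.
    cluster = {"logs-1": False, ".watcher-history-1": True, "hidden-metrics": True}
    print(resolve("*", cluster))
    print(resolve(".watcher*", cluster))
    print(resolve("*", cluster, expand_hidden=True))
```

In this model a hidden index without a leading dot is only reachable by exact name or with the explicit opt-in, which mirrors the shell-like behaviour the commit message describes.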
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
jaymode added a commit to jaymode/elasticsearch that referenced this issue Feb 28, 2020
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Additionally, some failing tests were encountered where a search would
find the documents in the history index but a subsequent search would
fail to find the documents. This is most likely due to different
refresh times between the primary and replica shards. The failures
were resolved by using `assertBusy` to retrieve the documents.

Relates elastic#50251
jaymode added a commit that referenced this issue Feb 28, 2020
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Relates #50251
jaymode added a commit to jaymode/elasticsearch that referenced this issue Feb 28, 2020
jaymode added a commit that referenced this issue Mar 6, 2020
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Relates #50251
Backport of #52962
@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
jaymode added a commit to jaymode/elasticsearch that referenced this issue Aug 10, 2020
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to the lesser of half of the available processors
or 5. Given the combination of both get and search operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.
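
The sizing rule above works out as follows. This is a sketch of the arithmetic only: rounding half the processor count down and enforcing a floor of one thread are assumptions made here, since the commit message does not say how odd or very small processor counts are handled, and the function and constant names are invented.

```python
# Queue size stated in the commit message.
SYSTEM_READ_QUEUE_SIZE = 2000


def system_read_pool_size(available_processors):
    """Lesser of half the available processors or 5.

    Assumptions: integer division rounds down, and at least one thread is kept.
    """
    half = max(1, available_processors // 2)
    return min(half, 5)


if __name__ == "__main__":
    for cpus in (1, 2, 8, 16, 64):
        print(cpus, system_read_pool_size(cpus))
```

Under these assumptions the pool reaches its cap of 5 threads at 10 processors and does not grow beyond that.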

In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have
this flag added to their metadata upon upgrade to a version with this
capability.

Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.

Relates elastic#50251
Relates elastic#37867
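
To make the two lookups mentioned for `SystemIndices` concrete, here is a small Python model. It is not the Java class from the commit: the `Descriptor` type, the method names, and the pattern translation are assumptions for illustration. In particular, the sketch assumes `.` is literal and `*` matches any run of characters, lets other syntax such as `[0-9]+` pass through as a regular expression, and does not handle the `~(...)` form that appears in the Fleet patterns listed in this issue.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical stand-in for a system index descriptor."""
    feature: str
    pattern: str

    def matches(self, index_name):
        # Assumption: '.' is literal, '*' matches any sequence of characters.
        translated = self.pattern.replace(".", r"\.").replace("*", ".*")
        return re.fullmatch(translated, index_name) is not None


class SystemIndices:
    """Toy registry offering the two lookups the commit message mentions."""

    def __init__(self, descriptors):
        self._descriptors = list(descriptors)

    def find_descriptor(self, index_name):
        """Return the first descriptor whose pattern matches, or None."""
        for descriptor in self._descriptors:
            if descriptor.matches(index_name):
                return descriptor
        return None

    def is_system_index(self, index_name):
        return self.find_descriptor(index_name) is not None


if __name__ == "__main__":
    registry = SystemIndices([
        Descriptor("security", ".security-[0-9]+"),
        Descriptor("watcher", ".watches*"),
        Descriptor("enrich", ".enrich-*"),
    ])
    print(registry.is_system_index(".security-7"))
    print(registry.find_descriptor(".watches"))
    print(registry.is_system_index("logs-1"))
```

A per-request pattern scan like this is exactly the cost the commit avoids on the search path by recording a system flag in index metadata instead.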
jaymode added a commit that referenced this issue Aug 10, 2020
@nemhods

nemhods commented Jun 16, 2021

Hey, I just got to the bottom of a problem that prevented Fleet Server from working, and it comes back to this issue.
I had an index template set up like
{ "index_patterns": [ "*" ], "order": 10, "settings": { "number_of_shards": "2", "number_of_replicas": "1" } }

... which also affected the system indices .fleet-actions and .fleet-policies. This resulted in the fleet server not working, logging lines like:

{"log.level":"info","index":".fleet-actions","ctx":"index monitor","error.message":"elastic fail 400:status_exception:wait_for_advance only supports indices with one shard. [shard count: 2]","@timestamp":"2021-06-13T16:38:14.423Z","message":"failed on waiting for global checkpoints advance"}

I manually repaired this by shrinking the affected indices down to one shard again, as well as changing my custom template. This wouldn't have happened if my custom wildcard template was prevented from tampering with system indices.

More here: https://discuss.elastic.co/t/osquery-live-queries-dont-go-through/274428/5
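
A minimal sketch of why the `*` template in this report reached the Fleet indices, assuming template patterns behave like simple globs in which `*` also matches a leading dot. The helper name and the `logs-app` index are hypothetical; `.fleet-actions` and `.fleet-policies` are the indices named in the report.

```python
from fnmatch import fnmatchcase


def indices_matched_by_template(index_patterns, index_names):
    """Return the index names that at least one template pattern matches."""
    return [
        name for name in index_names
        if any(fnmatchcase(name, pattern) for pattern in index_patterns)
    ]


if __name__ == "__main__":
    names = [".fleet-actions", ".fleet-policies", "logs-app"]  # logs-app is made up
    # The match-all template from the report also captures the dot-prefixed indices.
    print(indices_matched_by_template(["*"], names))
    # A narrower pattern leaves them alone.
    print(indices_matched_by_template(["logs-*"], names))
```

Scoping a custom template to a prefix, as in the second call, is one way to keep its shard settings away from indices created by the stack.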

@williamrandolph
Contributor

@nemhods Thanks for reporting this issue with Fleet Server and system indexes. That was impressive detective work, helpfully explained, and we appreciate it a lot. I've created a new issue for some targeted work around templates and system indices: #74271

@hungnguyen-elastic

What is the plan for the .ds-* data stream indices? They are not really system indices but when users try to delete the index that is part of a data stream, Kibana gives this error which seems more scary than it really is

[Image: screenshot of the Kibana error]

@gwbrown
Contributor

gwbrown commented Jul 16, 2021

@hungnguyen-elastic That seems like a Kibana bug. Data stream backing indices are hidden indices, and have been since their introduction.

@hungnguyen-elastic

hungnguyen-elastic commented Jul 16, 2021

Thanks @gwbrown - ticket #106040 was created for this

Fyi @nicpenning and @nmprokop

@stefnestor
Contributor

stefnestor commented Apr 23, 2022

Had pasted reformatted table of system indices. It's not "official" in any sense. Removing to sub-drop-down to avoid confusing folks googling.

before

👋🏼 This Github appears to be a popular system index reference point / collection, so expanding known indices/aliases from description list (still hugely WIP):

| feature | alias | index |
|---|---|---|
| elastic endgame | endgame-${version} | endgame-${version}-* |
| elasticsearch enrich geoip | NULL | .geoip_databases |
| elasticsearch enrich | UNK | .enrich-* |
| elasticsearch index ilm | ilm-history-${#} | ilm-history* |
| elasticsearch index search async | NULL | .async-search |
| elasticsearch index searchable snapshot | NULL | .snapshot-blob-cache |
| elasticsearch ingest transform | .data-frame-internal-${#} | .transform-internal-* |
| elasticsearch ingest transform | .transform-notifications-read | .transform-notifications-* |
| elasticsearch node (task/threadPool?) | NULL | .tasks (StackOverFlow) at least v5.0 through v17.0 |
| elasticsearch security | .security-tokens | .security-tokens-[0-9]+ |
| elasticsearch security | .slm-history-${#} | .security-[0-9]+ |
| elasticsearch snapshot slm | .slm-history-${#} | .slm-history* |
| elasticsearch watcher | UNK | .triggered_watches* |
| elasticsearch watcher | UNK | .watches* |
| kibana analytic machine learning | UNK | .ml-annotations* |
| kibana analytic machine learning | UNK | .ml-anomalies-* |
| kibana analytic machine learning | UNK | .ml-config* |
| kibana analytic machine learning | UNK | .ml-inference-* |
| kibana analytic machine learning | UNK | .ml-meta* |
| kibana analytic machine learning | UNK | .ml-notifications* |
| kibana analytic machine learning | UNK | .ml-state* |
| kibana analytic machine learning | UNK | .ml-stats-* |
| kibana entSearch | UNK | .ent-search* |
| kibana fleet | .fleet-actions | .fleet-actions~(-results*) |
| kibana fleet | .fleet-agents | .fleet-agents* |
| kibana fleet | .fleet-artifacts | .fleet-artifacts* |
| kibana fleet | .fleet-enrollment-api-keys | .fleet-enrollment-api-keys* |
| kibana fleet | .fleet-policies-leader | .fleet-policies-leader* |
| kibana fleet | .fleet-policies | .fleet-policies-[0-9]+* |
| kibana fleet | .fleet-servers | .fleet-servers* |
| kibana fleet | UNK | .fleet-actions-results |
| kibana observ apm | UNK | .apm-agent-configuration |
| kibana observ apm | UNK | .apm-custom-link |
| kibana reporting | UNK | .reporting-* |
| kibana security | NULL | .kibana_security_session* |
| kibana siem | .alerts-security.alerts-${space} | .alerts-security.alerts-*, .siem-signals-${space}-* |
| kibana siem | .items-${space} | .items-${space}-* |
| kibana siem | .lists-${space} | .lists-${space}-* |
| kibana siem | UNK | .alerts-observability |
| kibana | .kibana-event-log-${version} | .kibana-event-log-${version}-* |
| kibana | .kibana_task_manager | .kibana_task_manager_${version}_* |
| kibana | .kibana | .kibana_${version}_* |
| logstash | UNK | .logstash |
| UNK | UNK | .metrics-endpoint |

@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@krishhteja

Once everything is migrated and direct access to system indices is removed, will it still be possible to access system indices with some elevated permissions or rules, or will it not be possible at all?

@williamrandolph
Contributor

@krishhteja There are still some back doors for superuser access to system indices, but our intention is that any management of or interaction with a system index can be performed via feature-specific REST APIs. See, for example, the PUT /_security/settings API. If there's something you need to be able to do but we don't have an API for it, we can consider a feature request, though you may want to raise your issue in our discuss forums first.
