Upgrade assistant should warn of incompatible system indices settings when migrating from 7 to 8 (the index will become red) #88324

Open
lucabelluccini opened this issue Jul 6, 2022 · 7 comments
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team

Comments

@lucabelluccini
Contributor

lucabelluccini commented Jul 6, 2022

Elasticsearch Version

7.x, 8.x

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

In 8.x, system indices no longer accept most setting overrides - only a few settings are allowed at the time of writing:

index.blocks.read_only
index.blocks.read
index.blocks.write
index.blocks.metadata
index.blocks.read_only_allow_delete

When migrating from 7.x to 8.x, a system index (e.g. .security-7) may carry a setting that was allowed in 7.x but is rejected in 8.x, and its shards then fail to be allocated during the upgrade.
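
The settings currently applied to a system index can be reviewed before upgrading, and any setting outside the list above reset while still on 7.x. A minimal sketch, assuming a user permitted to access the restricted index (the slowlog setting shown is the one used in the reproduction below):

GET .security-7/_settings?flat_settings=true

PUT .security-7/_settings
{
  "index.search.slowlog.level": null
}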

Steps to Reproduce

  1. Create a 7.17.5 cluster.

  2. Perform:

PUT .security-7/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.search.slowlog.threshold.fetch.trace": "200ms",
  "index.search.slowlog.level": "info"
}
  3. Go to the Upgrade Assistant - all good.

  4. Upgrade to 8.3.1.

  5. At some point during the upgrade, the cluster turns red.

  6. The cluster allocation explain output is:

{
  "can_allocate": "yes",
  "index": ".security-7",
  "target_node": {
    "attributes": {
      "server_name": "instance-0000000001.018d26f51e90476dac2a56befc09ccfc",
      "availability_zone": "us-central1-b",
      "region": "unknown-region",
      "instance_configuration": "gcp.es.datahot.n2.68x10x45",
      "xpack.installed": "true",
      "logical_availability_zone": "zone-1",
      "data": "hot"
    },
    "transport_address": "10.42.4.59:19836",
    "id": "iy10uNC9QRCJVc9xcRkJsg",
    "name": "instance-0000000001"
  },
  "node_allocation_decisions": [
    {
      "node_decision": "yes",
      "transport_address": "10.42.6.132:19611",
      "node_name": "instance-0000000000",
      "node_id": "LiMkJ1k9QUWqa-j972KGmw",
      "store": {
        "in_sync": true,
        "allocation_id": "aK-9wh3OQgq2RoyyRfsPaQ"
      },
      "node_attributes": {
        "server_name": "instance-0000000000.018d26f51e90476dac2a56befc09ccfc",
        "availability_zone": "us-central1-c",
        "region": "unknown-region",
        "instance_configuration": "gcp.es.datahot.n2.68x10x45",
        "xpack.installed": "true",
        "logical_availability_zone": "zone-0",
        "data": "hot"
      }
    },
    {
      "node_decision": "yes",
      "transport_address": "10.42.4.59:19836",
      "node_name": "instance-0000000001",
      "node_id": "iy10uNC9QRCJVc9xcRkJsg",
      "store": {
        "in_sync": true,
        "allocation_id": "BcMGDowKQlKcm9TbEEiaHw"
      },
      "node_attributes": {
        "server_name": "instance-0000000001.018d26f51e90476dac2a56befc09ccfc",
        "availability_zone": "us-central1-b",
        "region": "unknown-region",
        "instance_configuration": "gcp.es.datahot.n2.68x10x45",
        "xpack.installed": "true",
        "logical_availability_zone": "zone-1",
        "data": "hot"
      }
    }
  ],
  "allocation_id": "BcMGDowKQlKcm9TbEEiaHw",
  "current_state": "unassigned",
  "shard": 0,
  "primary": true,
  "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "allocate_explanation": "Elasticsearch can allocate the shard.",
  "unassigned_info": {
    "last_allocation_status": "no",
    "reason": "ALLOCATION_FAILED",
    "failed_allocation_attempts": 5,
    "at": "2022-07-06T17:27:45.267Z",
    "details": "failed shard on node [iy10uNC9QRCJVc9xcRkJsg]: failed to create index, failure java.lang.IllegalArgumentException: unknown setting [index.search.slowlog.level] please check that any required plugins are installed, or check the breaking changes documentation for removed settings\n\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:563)\n\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:509)\n\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:479)\n\tat org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:688)\n\tat org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:607)\n\tat org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:177)\n\tat org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:505)\n\tat org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:232)\n\tat org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:545)\n\tat org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:531)\n\tat org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:504)\n\tat org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:429)\n\tat org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:710)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.lang.Thread.run(Thread.java:833)\n"
  }
}
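
For reference, the output above comes from the cluster allocation explain API; the unassigned primary can also be targeted explicitly:

GET _cluster/allocation/explain
{
  "index": ".security-7",
  "shard": 0,
  "primary": true
}
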
  7. Recovering from this situation requires a role with "allow_restricted_indices": true and a user from the file realm (the native realm is unavailable because the .security index is red); a sketch of such a role and user follows.
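
A minimal sketch of that recovery setup, assuming a file-based role definition (and that allow_restricted_indices is accepted in roles.yml) plus the elasticsearch-users tool; role and user names are illustrative, and the role API itself cannot be used while the .security index is red:

# roles.yml on each node
restricted_indices_admin:
  cluster: [ "all" ]
  indices:
    - names: [ ".security*" ]
      privileges: [ "all" ]
      allow_restricted_indices: true

# create a file realm user with that role
bin/elasticsearch-users useradd recovery_admin -p <password> -r restricted_indices_admin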

But the API request below, executed with a role having "allow_restricted_indices": true:

PUT .security-7/_settings
{
    "index": {
        "search": {
            "slowlog": null
        }
    }
}

is still rejected with:

{
  "status": 403,
  "error": {
    "root_cause": [
      {
        "reason": "action [indices:admin/settings/update] is unauthorized for user [elastic...] with roles [found-internal-admin,superuser] on restricted indices [.security-7], this action is granted by the index privileges [manage,all]",
        "type": "security_exception"
      }
    ],
    "type": "security_exception",
    "reason": "action [indices:admin/settings/update] is unauthorized for user [elastic...] with roles [found-internal-admin,superuser] on restricted indices [.security-7], this action is granted by the index privileges [manage,all]"
  }
}
  8. Trying with:
POST _snapshot/found-snapshots/cloud-snapshot-2022.07.06-sormf1rgr7iq3v8hmysqzg/_restore
{
  "include_global_state": false,
  "feature_states": ["none"],
  "indices": ".security-7",
    "index_settings": {
    "index.search.slowlog": null
  }
}

We get:

{
  "status": 400,
  "error": {
    "root_cause": [
      {
        "reason": "requested system indices [.security-7], but system indices can only be restored as part of a feature state",
        "type": "illegal_argument_exception"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "requested system indices [.security-7], but system indices can only be restored as part of a feature state"
  }
}
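
For context, the available feature states (and the system indices they cover) can be listed with the get features API; the security system indices belong to the security feature state used in the next step:

GET _features
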
  9. Trying with:
POST _snapshot/found-snapshots/cloud-snapshot-2022.07.06-sormf1rgr7iq3v8hmysqzg/_restore
{
  "indices": "-*",
  "include_global_state": false,
  "feature_states": ["security"],
  "index_settings": {
    "index.search.slowlog": null
  }
}

The request is acknowledged.

  10. The index becomes green:
green open .security-7                     8Yt___ZlRg6_mZBAlmQBdQ 1 1 122  79   1.1mb 458.4kb
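
For reference, the line above is cat indices output and can be reproduced with (exact values will differ per cluster):

GET _cat/indices/.security-7?v
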
  11. The settings are still there!
{
  ".security-7": {
    "settings": {
      "index": {
        "provided_name": ".security-7",
        "number_of_replicas": "1",
        "search": {
          "slowlog": {
            "level": "debug",
            "threshold": {
              "query": {
                "warn": "10s",
                "debug": "2s",
                "info": "5s",
                "trace": "500ms"
              },
              "fetch": {
                "warn": "1s",
                "debug": "500ms",
                "info": "800ms",
                "trace": "200ms"
              }
            }
          }
        },
        ...
      "archived": {
        "index": {
          "search": {
            "slowlog": {
              "level": "info"
            }
          }
        }
      }

  12. Executing:
PUT .security-7/_settings?flat_settings=true
{
    "index.search.slowlog.*": null
}

We get:

{
  "status": 400,
  "error": {
    "suppressed": [
      {
        "reason": "unknown setting [archived.index.search.slowlog] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
        "type": "illegal_argument_exception"
      }
    ],
    "root_cause": [
      {
        "reason": "unknown setting [archived.index.search.slowlog.level] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
        "type": "illegal_argument_exception"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unknown setting [archived.index.search.slowlog.level] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
  }
}
  13. With:
PUT .security*/_settings
{
    "index.slowlog.*": null
}

The request is acknowledged, but the index still has the broken settings.

Logs (if relevant)

No response

@lucabelluccini lucabelluccini added >bug :Core/Infra/Core Core issues without another label needs:triage Requires assignment of a team area label labels Jul 6, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jul 6, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@lucabelluccini lucabelluccini changed the title Upgrade assistant should warn of incompatible system indices settings when migrating from 7 to 8 Upgrade assistant should warn of incompatible system indices settings when migrating from 7 to 8 (the index will become red) Jul 6, 2022
@grcevski grcevski added team-discuss and removed needs:triage Requires assignment of a team area label labels Jul 6, 2022
@grcevski grcevski self-assigned this Jul 20, 2022
@grcevski
Contributor

grcevski commented Jul 20, 2022

We had a discussion on this at the core/infra meeting and there are a few follow-up bugs/issues we need to resolve here:

  • The upgrade assistant should've caught this and it didn't. [Confirmed: system indices are ignored by the upgrade assistant because users cannot affect them]
  • Archived settings are useless on system indices and we should simply remove them on startup instead of archiving them. [Done via #88903 (Delete invalid settings for system indices)]
  • Archiving filtered settings (which predate secure settings) can cause issues such as customer passwords being exposed, and we need to fix this somehow.
  • Preventing index and cluster settings updates in the presence of archived settings is not the best way to warn users that they need to fix something in their setup. Instead of blocking updates to these settings, we should leverage the new health API to bring the problems front and centre.

@grcevski
Contributor

I did some debugging on why the upgrade assistant didn't warn us about these deprecated options on a system index. It turns out that Elasticsearch correctly reports the critical deprecation; however, the following code in the upgrade assistant ignores deprecations on system indices:

https://github.com/elastic/kibana/blob/1bfeab7553899efcfa9a6e46b37dc3c7681dcf3b/x-pack/plugins/upgrade_assistant/server/lib/es_deprecations_status.ts#L33

We correctly reported it in Elasticsearch:

"index_settings": {
    ".security-7": [
      {
        "level": "critical",
        "message": "Setting [index.indexing.slowlog.level] is deprecated",
        "url": "https://ela.st/es-deprecation-7-slowlog-settings",
        "details": "Remove the [index.indexing.slowlog.level] setting. Use the [index.*.slowlog.threshold] settings to set the log levels.",
        "resolve_during_rolling_upgrade": false,
        "_meta": {
          "actions": [
            {
              "action_type": "remove_settings",
              "objects": [
                "index.indexing.slowlog.level"
              ]
            }
          ]
        }
      }
    ]
  }
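
For reference, the entry above comes from the Elasticsearch deprecation info API, which can be queried directly (it is the same data the upgrade assistant consumes):

GET /_migration/deprecations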

@lucabelluccini
Contributor Author

That's great @grcevski - I'm sorry I didn't try to call the API on the Elasticsearch side when reproducing.

My biggest concern here is that we allow the node to start and end up with a red index, with no way to fix the settings once the index is migrated. Is there something I didn't try in the reproduction that would allow a user to remove the problematic settings if they've already upgraded?

@grcevski
Contributor

Oh no problem @lucabelluccini, I was just mentioning what I had found. We'll need to fix this one way or another. It seems the upgrade assistant code expects all problems related to system indices to be reported in the system index migration section, while these particular deprecations are generic to all indices and are reported with the normal deprecations. I'll bring this up for a team discussion to see what the best way to fix it is.

@BBQigniter

BBQigniter commented Sep 20, 2022

found this issue too late :(

Edit:

Some more details - I stumbled into this issue today on our staging cluster, which runs on Kubernetes/ECK. The upgrade worked pretty well until only a non-data master node-pod and the last hot node-pod on version 7.17.6 were left; the indices with the index.indexing.slowlog.level setting had been moved onto that node in the meantime. From there on, the upgrade procedure stalled and I had 10 hidden indices in yellow state.

After thorough consultation with the marvelous Elastic support ( <3 ) I was told to do a "full cluster restart" - I stopped event ingestion by scaling the Logstash pods to 0, set cluster.routing.allocation.enable to primaries only, and then deleted ALL Elasticsearch pods via kubectl at once. The pods are simply recreated.

Magically, the Elasticsearch cluster fixed itself (as always) within a few minutes, and I then removed the cluster.routing.allocation.enable setting again.
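
A minimal sketch of the allocation toggling described above (whether persistent or transient is a choice; the Logstash scaling and kubectl pod deletion are environment specific and omitted):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# once the pods have been recreated and the shards have recovered, restore the default
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}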

@grcevski grcevski removed their assignment Feb 16, 2023
@igorwwwwwwwwwwwwwwwwwwww

We ran into this at @GitLab as well during the ES 7 => 8 upgrade.
