Kibana stays read only when ES high disk watermark has been exceeded and later gone beneath the limit #13685

Closed
algestam opened this Issue Aug 24, 2017 · 21 comments

algestam commented Aug 24, 2017

Kibana version: 6.0.0-beta1

Elasticsearch version: 6.0.0-beta1

Server OS version: Ubuntu 16.04.2 LTS

Browser version: Chrome 60.0.3112.90

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): Official tar.gz packages

Description of the problem including expected versus actual behavior:

I'm running a single node Elasticsearch instance, logstash and Kibana. Everything runs on the same host in separate docker containers.

If the high disk watermark is exceeded on the ES host, the following is logged in the elasticsearch log:

[2017-08-24T07:45:11,757][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-08-24T07:45:41,760][WARN ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] flood stage disk watermark [95%] exceeded on [CSOifArqQK-7PBZM_keNoA][CSOifAr][/data/elasticsearch/nodes/0] free: 693.8mb[2.1%], all indices on this node will marked read-only

When this has occurred, changes to the .kibana index will of course fail, as the index cannot be written to. This can be observed by trying to change any setting under Management->Advanced Settings, where a change to e.g. search:queryLanguage fails with the message Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

If more disk space is now made available, ES will log that the node has gone under the high watermark:

[2017-08-24T07:47:11,774][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [one or more nodes has gone under the high or low watermark]

One would now assume that it would be possible to make changes to Kibana settings, but trying to make a settings change still fails with the error message:

Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Steps to reproduce:

  1. Make sure that setting changes can be performed without errors
  2. Fill up the elasticsearch data disk so that the high disk watermark is exceeded (I used fallocate -l9G largefile)
  3. Verify in the ES log that the high disk watermark has been exceeded and the indices have been marked read-only
  4. Perform a setting change and verify that it fails since writes are prohibited
  5. Resolve the high disk watermark condition (which I did with rm largefile)
  6. Verify that the ES log states that the node has gone under the high disk watermark (and thus the indices should be writable again?)
  7. Perform a setting change; it fails when it should actually succeed.
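
The checks in steps 3 and 6 can also be made against the index settings instead of the log. The following is only a minimal sketch, assuming an unsecured node at http://localhost:9200; the `ES_URL` variable and the `show_block` function name are illustrative and not part of the original report.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on localhost:9200 without TLS or auth.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the index.blocks.read_only_allow_delete setting of one index.
# $1 is the index name, for example .kibana
show_block() {
  curl -s "${ES_URL}/${1}/_settings/index.blocks.read_only_allow_delete?flat_settings=true"
}
```

Running `show_block .kibana` after step 3 and again after step 6 should show whether the block is still set once space has been freed.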

scaarup commented Sep 29, 2017

So how do I recover from this? .kibana stays in read only no matter what I do. I have tried to snapshot it, delete it and recover it from snapshot - still read only...

darkpixel commented Nov 20, 2017

I just ran into this on a test machine. For the life of me I can't continue putting data into the cluster. I finally had to blow away all the involved indices.

sz3n commented Nov 20, 2017

I resolved the issue by deleting the .kibana index:
DELETE /.kibana/
I lost some configurations, visualizations and dashboards, but it unlocked.

xose commented Nov 20, 2017

I just got hit by this. It's not just Kibana, all indexes get locked when the disk threshold is reached and never get unlocked when space is freed.

To unlock all indexes manually:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
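
Before clearing the block it can be worth confirming that the node really is back under the watermark, since Elasticsearch may put the block back at its next disk check if the disk is still too full. A minimal sketch, assuming an unsecured node at http://localhost:9200 (the `ES_URL` variable and the function names are illustrative, not from this thread):

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on localhost:9200 without TLS or auth.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show per-node disk usage as Elasticsearch sees it (with a header row).
disk_usage() {
  curl -s "${ES_URL}/_cat/allocation?v"
}

# Clear the read_only_allow_delete block on every index (same request as above).
unlock_all() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "${ES_URL}/_all/_settings" \
    -d '{"index.blocks.read_only_allow_delete": null}'
}
```

Call `disk_usage` first and check the disk.percent column, then `unlock_all` once enough space is free.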

algestam commented Nov 23, 2017

Thanks @xose, I just got hit by this again and was able to recover by using the command you suggested :)

The problem occurred on all indices, not just the .kibana one.

According to the ES logs, the indices were set to read-only due to low disk space on the Elasticsearch host. I run a single host with Elasticsearch, Kibana and Logstash dockerized together with some other tools. As this problem affects other indices, I think this is more of an Elasticsearch problem and that the problem seen in Kibana is a symptom of another issue.

saberkun commented Nov 27, 2017

This bug is stupid. Can you Unbreak it for now? At least you should display a warning and list a possible solution. It is really stupid for me to look into js error log and find this thread!

darkpixel commented Nov 27, 2017

@saberkun You can unbreak it by following the command @xose posted:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

saberkun commented Nov 27, 2017

darkpixel commented Nov 27, 2017

Can you provide additional information? Did you receive an error when running the command? Did the indices unlock and now you're getting a new error message? What error messages are you seeing in your log files now?

saberkun commented Nov 27, 2017

kesha-antonov commented Nov 27, 2017

+1
Receiving this error after upgrade from 5.5 to 6.0

purplesrl commented Nov 27, 2017

+1

ELK 6, cleared half the drive still read-only, logstash is allowed to write again, kibana remained read-only

Managed to solve the issue with the workaround provided by @xose

harmenverburg commented Dec 4, 2017

+1, same error for me.

sangeetawakhale commented Dec 5, 2017

Same issue for me. Got resolved by solution given by @xose.

patodevilla commented Jan 10, 2018

Same here. All hail @xose.

darkpixel commented Jan 14, 2018

I just upgraded a single-node cluster from 6.0.0 to 6.1.1 (both ES and Kibana). When I started the services back up, Kibana was throwing:

blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Same as last time--I had to delete the .kibana index to get it back up and going. There was also the current logstash index with one of the shards listed as unallocated. I deleted it as well and then got the usual flood of alerts in.

I didn't run out of space--there's ~92 GB out of 120 GB free on this test machine. The storage location is ZFS and a scrub didn't reveal any data corruption.

The only errors in the log appear to be irrelevant:

[2018-01-13T20:48:14,579][INFO ][o.e.n.Node               ] [ripley1] stopping ...
[2018-01-13T20:48:14,597][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:760) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:428) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:113) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1027) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:301) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at org.elasticsearch.http.netty4.Netty4HttpChannel.sendResponse(Netty4HttpChannel.java:146) [transport-netty4-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:491) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:380) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:375) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:91) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:908) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:891) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:310) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:230) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:168) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:106) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.stop(Node.java:713) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.close(Node.java:735) [elasticsearch-6.0.0.jar:6.0.0]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:89) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.elasticsearch.bootstrap.Bootstrap$4.run(Bootstrap.java:185) [elasticsearch-6.0.0.jar:6.0.0]
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] stopped
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] closing ...
[2018-01-13T20:48:14,704][INFO ][o.e.n.Node               ] [ripley1] closed
[2018-01-13T20:48:39,879][INFO ][o.e.n.Node               ] [ripley1] initializing ...
[2018-01-13T20:48:40,054][INFO ][o.e.e.NodeEnvironment    ] [ripley1] using [1] data paths, mounts [[/scratch/elasticsearch (scratch/elasticsearch)]], net usable_space [92.5gb], net total_space [93.6gb], types [zfs]
[2018-01-13T20:48:40,055][INFO ][o.e.e.NodeEnvironment    ] [ripley1] heap size [989.8mb], compressed ordinary object pointers [true]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] node name [ripley1], node ID [TvkaGbQpR5KZ-ZScMZN6AQ]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] version[6.1.1], pid[6942], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/4.10.0-38-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2018-01-13T20:48:40,120][INFO ][o.e.n.Node               ] [ripley1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [aggs-matrix-stats]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [analysis-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [ingest-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-expression]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-mustache]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-painless]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [mapper-extras]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [parent-join]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [percolator]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [reindex]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [repository-url]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [transport-netty4]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [tribe]
[2018-01-13T20:48:41,321][INFO ][o.e.p.PluginsService     ] [ripley1] no plugins loaded
[2018-01-13T20:48:43,801][INFO ][o.e.d.DiscoveryModule    ] [ripley1] using discovery type [zen]
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] initialized
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,759][INFO ][o.e.t.TransportService   ] [ripley1] publish_address {192.168.42.40:9300}, bound_addresses {[::]:9300}
[2018-01-13T20:48:44,792][INFO ][o.e.b.BootstrapChecks    ] [ripley1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-13T20:48:47,864][INFO ][o.e.c.s.MasterService    ] [ripley1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}
[2018-01-13T20:48:47,869][INFO ][o.e.c.s.ClusterApplierService] [ripley1] new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}, reason: apply cluster state (from master [master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-13T20:48:47,884][INFO ][o.e.h.n.Netty4HttpServerTransport] [ripley1] publish_address {192.168.42.40:9200}, bound_addresses {[::]:9200}
[2018-01-13T20:48:47,884][INFO ][o.e.n.Node               ] [ripley1] started
[2018-01-13T20:48:48,326][INFO ][o.e.g.GatewayService     ] [ripley1] recovered [6] indices into cluster_state
[2018-01-13T20:49:01,493][INFO ][o.e.c.m.MetaDataDeleteIndexService] [ripley1] [logstash-2018.01.14/D0f_lDkSQpebPFcey6NHFw] deleting index
[2018-01-13T20:49:18,793][INFO ][o.e.c.m.MetaDataCreateIndexService] [ripley1] [logstash-2018.01.14] creating index, cause [auto(bulk api)], templates [logstash-*], shards [5]/[0], mappings []
[2018-01-13T20:49:18,937][INFO ][o.e.c.r.a.AllocationService] [ripley1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2018.01.14][4]] ...]).

zjhgx commented Feb 7, 2018

+1 same error in 6.1.2

tylersmalley (Member) commented Feb 7, 2018

This is a function of Elasticsearch. Per the Elasticsearch error, all indices on this node will marked read-only.

To revert this for an index you can set index.blocks.read_only_allow_delete to null.

More information on this can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
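
As an illustration of resetting the setting for one index only, here is a minimal sketch. It assumes an unsecured node at http://localhost:9200; the `ES_URL` variable and the `unblock_index` function name are made up for this example.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on localhost:9200 without TLS or auth.
ES_URL="${ES_URL:-http://localhost:9200}"

# Reset index.blocks.read_only_allow_delete to null on a single index.
# $1 is the index name, for example .kibana
unblock_index() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "${ES_URL}/${1}/_settings" \
    -d '{"index.blocks.read_only_allow_delete": null}'
}
```

For example, `unblock_index .kibana` targets only the Kibana index and leaves the other indices as they are.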

darkpixel commented Mar 27, 2018

FYI - for anyone still running into this, here's a quick one-liner to fix the indices:
curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'

It grabs a list of all the indices in your cluster, then for each one it sends the command to make it not read-only.
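
A variant of the same loop, sketched under the assumption of an unsecured node at http://localhost:9200, asks `_cat/indices` for the index column only (`h=index`) instead of picking the third column with awk, which can shift if a row has an empty health column (as closed indices may). The `ES_URL` variable and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on localhost:9200 without TLS or auth.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print one index name per line.
list_indices() {
  curl -s "${ES_URL}/_cat/indices?h=index"
}

# Clear the read-only block on each index, one request per index.
unblock_each() {
  list_indices | while read -r index _; do
    # Skip blank lines in the listing.
    [ -n "$index" ] || continue
    curl -s -XPUT -H "Content-Type: application/json" \
      "${ES_URL}/${index}/_settings" \
      -d '{"index.blocks.read_only_allow_delete": null}'
  done
}
```

Calling `unblock_each` sends one settings request per index, so on clusters with many indices it will be slow.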

outworlder commented Oct 2, 2018

> FYI - for anyone still running into this, here's a quick one-liner to fix the indices:
> curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'
>
> It grabs a list of all the indices in your cluster, then for each one it sends the command to make it not read-only.

I too was doing this until I found @darkpixel's solution (#13685 (comment))

You can apply this setting to _all instead of going one by one. In my case, it takes quite a while to do it for hundreds of indices, while setting it on _all takes only a few seconds.

curl -XPUT -H "Content-Type: application/json" https://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Frank591 commented Nov 29, 2018

> i resolved the issue by deleting the .kibana index:
> delete /.kibana/
> I lose certains configurations/visualizations/dashboards but it dislocked.

Thanks a lot for this workaround. It solved the problem for me.
