
Cluster Unable to Assign Shards, Marvel or otherwise. #16708

Closed
zukeru opened this issue Feb 17, 2016 · 4 comments
Labels: :Distributed/Allocation, feedback_needed

Comments

@zukeru commented Feb 17, 2016

Hello, I'm trying to start a new Elasticsearch cluster, but I can't get the shards to allocate correctly. I upgraded to the latest Marvel and Elasticsearch 2.2.0, and the cluster won't allocate the Marvel shards. I can't figure out why, and I can't even allocate them manually, because the command tells me the shard is disabled.

I then created a custom index with a few shards, and those shards remain unassigned as well.

curl -XPUT http://localhost:9200/test -d '
{
   "settings" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
   }
}

'
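
A quick way to confirm that those shards stay unassigned (a suggested check, not part of the original report) is the cluster health and cat shards APIs:

# "test" is the index created above
curl -XGET 'http://localhost:9200/_cluster/health/test?pretty'
curl -XGET 'http://localhost:9200/_cat/shards/test?v'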

In the logs I get the following error:

[2016-02-17 20:04:52,458][ERROR][marvel.agent             ] [i-11a6decb] background thread had an uncaught exception
ElasticsearchException[failed to flush exporter bulks]
    at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
    at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.17], type [node_stats], id [AVLw1O4Ctq-FZ8CmFK_-], message [UnavailableShardsException[[.marvel-es-2016.02.17][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.17][0]}]]]];
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
        ... 3 more
    Caused by: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.17], type [node_stats], id [AVLw1O4Ctq-FZ8CmFK_-], message [UnavailableShardsException[[.marvel-es-2016.02.17][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.17][0]}]]]]
        at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
        ... 3 more

Then when I try to run a reroute, I get:

Kenzans-MacBook-Pro-39:~ grantzukel$ curl -XPOST http://localhost:9200/_cluster/reroute?pretty -d '{ "commands" : [ { "allocate" : { "index" : ".marvel-es-data", "shard" : 0, "node" :"i-e098e03a" } } ] }' 
{
  "error" : {
    "root_cause" : [ {
      "type" : "remote_transport_exception",
      "reason" : "[i-169f4cce][10.194.35.20:9300][cluster:admin/reroute]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "[allocate] trying to allocate a primary shard [.marvel-es-data][0], which is disabled"
  },
  "status" : 400
}
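
To narrow down why nothing is being allocated, a listing of the unassigned shards and of the nodes the master can see would help (a suggested diagnostic, not part of the original report):

# list every shard and keep only the unassigned ones
curl -XGET 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED
# confirm that all expected data nodes have actually joined the cluster
curl -XGET 'http://localhost:9200/_cat/nodes?v'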

Here is my elasticsearch config, where I enable rebalancing and rerouting and set allow_primary to true:

---

cluster.name: infra_elastic_cluster_3

index.number_of_shards: 3
index.store.throttle.type: none

action.auto_create_index: true

index.number_of_replicas: 1
index.requests.cache.enable: true
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

index.refresh_interval: 1

cloud:
    aws:
      region: us-west-2
    node:
      auto_attributes: true

discovery:
    type: ec2
    ec2:
      groups: infra_elastic_cluster_3
      any_group: false
      ping_timeout: 60s
    zen:
      minimum_master_nodes: 2

node:
  data: true
  master: false
  name: i-29fd85f3

http:
  max_content_length: 1000mb
  cors.allow-origin: "/.*/"
  cors.enabled: true

bootstrap.mlockall: true

script.inline: on 
script.indexed: on 

tr.logging.maxlength: 500000

indices.memory.index_buffer_size: 30%
indices.store.throttle.max_bytes_per_sec: 1000mb
indices.store.throttle.type: Merge
indices.fielddata.cache.size:  40%

threadpool.bulk.type: fixed
threadpool.bulk.size: 100
threadpool.bulk.queue_size: 10000

network.host: _eth0_

query.bool.max_clause_count: 10240

cluster.routing.allocation.enable: all
cluster.routing.allocation.disable_new_allocation: false
cluster.routing.allocation.disable_allocation: false

cluster.routing.allocation.allow_primary: true
cluster.routing.allocation.allow_rebalance: always
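
Note that elasticsearch.yml is not the whole story: transient or persistent settings applied through the API take precedence. Whether any such overrides exist, and what settings each node actually loaded, can be checked like this (a suggested check, not part of the original report):

# settings applied via the cluster settings API (these override the config file)
curl -XGET 'http://localhost:9200/_cluster/settings?pretty'
# settings each node actually started with
curl -XGET 'http://localhost:9200/_nodes/settings?pretty'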

Trace log output:

[2016-02-17 21:57:03,686][TRACE][action.bulk              ] [i-29fd85f3] primary shard [[.marvel-es-2016.02.17][0]] is not yet active, scheduling a retry: action [indices:data/write/bulk[s]], request [shard bulk {[.marvel-es-2016.02.17][0]}], cluster state version [50]
[2016-02-17 21:57:03,686][TRACE][action.bulk              ] [i-29fd85f3] observer: sampled state rejected by predicate (version [50], status [APPLIED]). adding listener to ClusterService
[2016-02-17 21:57:03,686][TRACE][action.bulk              ] [i-29fd85f3] observer: postAdded - predicate rejected state (version [50], status [APPLIED])
[2016-02-17 21:57:43,104][DEBUG][org.apache.http.impl.conn.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-02-17 21:57:43,104][DEBUG][com.amazonaws.internal.SdkSSLSocket] shutting down output of ec2.us-west-2.amazonaws.com/205.251.235.5:443
[2016-02-17 21:57:43,105][DEBUG][com.amazonaws.internal.SdkSSLSocket] closing ec2.us-west-2.amazonaws.com/205.251.235.5:443
[2016-02-17 21:57:43,106][DEBUG][org.apache.http.impl.conn.DefaultClientConnection] Connection 0.0.0.0:38289<->205.251.235.5:443 closed
[2016-02-17 21:58:03,687][TRACE][action.bulk              ] [i-29fd85f3] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2016-02-17 21:58:03,687][TRACE][action.bulk              ] [i-29fd85f3] primary shard [[.marvel-es-2016.02.17][0]] is not yet active, scheduling a retry: action [indices:data/write/bulk[s]], request [shard bulk {[.marvel-es-2016.02.17][0]}], cluster state version [50]
[2016-02-17 21:58:03,687][TRACE][action.bulk              ] [i-29fd85f3] operation failed. action [indices:data/write/bulk[s]], request [shard bulk {[.marvel-es-2016.02.17][0]}]
UnavailableShardsException[[.marvel-es-2016.02.17][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.17][0]}]]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:555)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:431)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:520)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:794)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-02-17 21:58:03,687][ERROR][marvel.agent             ] [i-29fd85f3] background thread had an uncaught exception
ElasticsearchException[failed to flush exporter bulks]
    at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
    at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.17], type [node_stats], id [AVLxPI5GwpZSvKDdhqNh], message [UnavailableShardsException[[.marvel-es-2016.02.17][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.17][0]}]]]];
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
        ... 3 more
    Caused by: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.17], type [node_stats], id [AVLxPI5GwpZSvKDdhqNh], message [UnavailableShardsException[[.marvel-es-2016.02.17][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.17][0]}]]]]
        at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
        ... 3 more
[2016-02-17 21:58:13,693][TRACE][action.bulk              ] [i-29fd85f3] primary shard [[.marvel-es-2016.02.17][0]] is not yet active, scheduling a retry: action [indices:data/write/bulk[s]], request [shard bulk {[.marvel-es-2016.02.17][0]}], cluster state version [50]
[2016-02-17 21:58:13,693][TRACE][action.bulk              ] [i-29fd85f3] observer: sampled state rejected by predicate (version [50], status [APPLIED]). adding listener to ClusterService
[2016-02-17 21:58:13,694][TRACE][action.bulk              ] [i-29fd85f3] observer: postAdded - predicate rejected state (version [50], status [APPLIED])


@zukeru changed the title from "Marvel ElasticsearchException[failed to flush exporter bulks]" to "Cluster Unable to Assign Shards, Marvel or otherwise." Feb 17, 2016
@ywelsch (Contributor) commented Feb 17, 2016

Can you try the reroute command again by setting allow_primary to true (see https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html)? This allows the allocate command to also allocate primary shards (Note that this loses all existing data for that shard):

curl -XPOST http://localhost:9200/_cluster/reroute?pretty -d '{ "commands" : [ { "allocate" : { "index" : ".marvel-es-data", "shard" : 0, "node" :"i-e098e03a", "allow_primary": "true" } } ] }' 
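
If the forced allocation goes through, the shard should subsequently show up as STARTED (a verification step added here for completeness, not part of the original comment):

curl -XGET 'http://localhost:9200/_cat/shards/.marvel-es-data?v'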

@clintongormley

No further feedback. Closing

@portante

@clintongormley, I encountered the same problem, and the above reroute fixed that instance. How do I fix this so that all new marvel indices don't have this problem? Do I need to add a template that addresses this?
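
For illustration only, such a template might look like the sketch below; the template name and the replica count are placeholders, not a confirmed fix for this issue:

# "marvel_defaults" and "number_of_replicas: 0" are illustrative placeholders
curl -XPUT http://localhost:9200/_template/marvel_defaults -d '
{
  "template": ".marvel-es-*",
  "settings": {
    "number_of_replicas": 0
  }
}
'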

@clintongormley

@portante the important thing to figure out is why the index is not being allocated - we never got to the bottom of the story here. Possibly to do with allocation settings? Feel free to open a new issue so we can delve into it

@lcawl added the :Distributed/Distributed label and removed the :Allocation label Feb 13, 2018
@clintongormley added the :Distributed/Allocation label and removed the :Distributed/Distributed label Feb 14, 2018