river instance not created after successful creation of _meta document #3840

Closed
karol-gwaj opened this Issue Oct 7, 2013 · 1 comment


karol-gwaj commented Oct 7, 2013

The river instance is not created after successfully issuing a PUT _meta request.

This happens when creating a river on a completely fresh cluster, and only for the first river (subsequent updates to that river's _meta document still don't start it). Creating a second river works fine, but the first one stays down.

I'm using the latest elasticsearch sources from the master branch and running them from Eclipse (one-node cluster, no replication). To reproduce: delete all cluster data files, start a single-node cluster, and create a river as the very first request to the cluster (roughly as in the sketch below).
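
For reference, this is roughly how the river gets created; a minimal sketch using the Java client, where the river name ("my_river") and type ("dummy") are placeholders, since any river type shows the problem:

```java
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;

// Minimal sketch of the reproduction: register a river by indexing its _meta
// document as the very first request against a fresh single-node cluster.
// "my_river" and the "dummy" river type are placeholders.
public class CreateRiverRepro {
    public static void main(String[] args) {
        Node node = nodeBuilder().local(true).node();   // fresh one-node cluster
        Client client = node.client();

        // Equivalent of: PUT /_river/my_river/_meta  {"type": "dummy"}
        client.prepareIndex("_river", "my_river", "_meta")
              .setSource("{\"type\": \"dummy\"}")
              .execute().actionGet();

        // The _meta document is indexed successfully, yet the river never starts
        // (keep the node running and watch the debug logs).
    }
}
```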

Overall it looks like a race condition. After debugging for a while I can see that when the river cluster state change is processed (RiverClusterStateUpdateTask), event.state().metaData().index(riverIndexName) does not yet return the newly created _river index (probably delayed by the initial index creation), so the river is never started. A simplified sketch of the check that comes up empty is shown below.
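
The following is only the shape of the problem, not the actual RiverClusterStateUpdateTask source; class and field names are approximations:

```java
import org.elasticsearch.cluster.ClusterChangedEvent;
import org.elasticsearch.cluster.ClusterStateListener;
import org.elasticsearch.cluster.metadata.IndexMetaData;

// Simplified sketch of the cluster-state check that races with index creation.
public class RiverStartRaceSketch implements ClusterStateListener {

    private final String riverIndexName = "_river";

    @Override
    public void clusterChanged(ClusterChangedEvent event) {
        IndexMetaData riverIndexMetaData = event.state().metaData().index(riverIndexName);
        if (riverIndexMetaData == null) {
            // Race: the PUT _meta already succeeded, but the _river index metadata
            // is not visible in this cluster state yet, so nothing is started.
            return;
        }
        // ...otherwise read the _meta documents and start the corresponding rivers
    }
}
```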

debug logs:

[2013-10-07 15:35:38,937][DEBUG][cluster.service ] [Peepers] processing [create-index [_river], cause [auto(index api)]]: execute
[2013-10-07 15:35:38,950][DEBUG][indices ] [Peepers] creating Index [_river], shards [1]/[0]
[2013-10-07 15:35:43,713][DEBUG][index.mapper ] [Peepers] [_river] using dynamic[true], default mapping: default_mapping_location[null], loaded_from[file:/mnt/data/projects/git/elasticsearch/elasticsearch/target/classes/org/elasticsearch/index/mapper/default-mapping.json], default percolator mapping: location[null], loaded_from[null]
[2013-10-07 15:35:43,722][DEBUG][index.cache.query.parser.resident] [Peepers] [_river] using [resident] query cache with max_size [100], expire [null]
[2013-10-07 15:35:43,965][DEBUG][index.store.fs ] [Peepers] [_river] using index.store.throttle.type [node], with index.store.throttle.max_bytes_per_sec [0b]
[2013-10-07 15:35:45,458][INFO ][cluster.metadata ] [Peepers] [_river] creating index, cause [auto(index api)], shards [1]/[0], mappings []
[2013-10-07 15:35:45,560][DEBUG][index.cache.filter.weighted] [Peepers] [_river] full cache clear, reason [close]
[2013-10-07 15:35:45,570][DEBUG][cluster.service ] [Peepers] cluster state updated, version [3], source [create-index [_river], cause [auto(index api)]]
[2013-10-07 15:35:45,572][DEBUG][cluster.service ] [Peepers] publishing cluster state version 3
[2013-10-07 15:35:45,574][DEBUG][cluster.service ] [Peepers] set local cluster state to version 3
[2013-10-07 15:35:45,579][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: execute
[2013-10-07 15:35:45,584][DEBUG][indices.cluster ] [Peepers] [_river] creating index
[2013-10-07 15:35:45,587][DEBUG][indices ] [Peepers] creating Index [_river], shards [1]/[0]
[2013-10-07 15:35:45,615][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: no change in cluster_state
[2013-10-07 15:35:47,464][DEBUG][index.mapper ] [Peepers] [_river] using dynamic[true], default mapping: default_mapping_location[null], loaded_from[file:/mnt/data/projects/git/elasticsearch/elasticsearch/target/classes/org/elasticsearch/index/mapper/default-mapping.json], default percolator mapping: location[null], loaded_from[null]
[2013-10-07 15:35:47,472][DEBUG][index.cache.query.parser.resident] [Peepers] [_river] using [resident] query cache with max_size [100], expire [null]
[2013-10-07 15:35:47,519][DEBUG][index.store.fs ] [Peepers] [_river] using index.store.throttle.type [node], with index.store.throttle.max_bytes_per_sec [0b]
[2013-10-07 15:35:47,593][DEBUG][indices.cluster ] [Peepers] [_river][0] creating shard
[2013-10-07 15:35:47,595][DEBUG][index.service ] [Peepers] [_river] creating shard_id [0]
[2013-10-07 15:35:49,258][DEBUG][index.deletionpolicy ] [Peepers] [_river][0] Using [keep_only_last] deletion policy
[2013-10-07 15:35:49,287][DEBUG][index.merge.policy ] [Peepers] [_river][0] using [tiered] merge policy with expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_once[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], segments_per_tier[10.0], reclaim_deletes_weight[2.0], async_merge[true]
[2013-10-07 15:35:49,293][DEBUG][index.merge.scheduler ] [Peepers] [_river][0] using [concurrent] merge scheduler with max_thread_count[1]
[2013-10-07 15:35:49,377][DEBUG][index.shard.service ] [Peepers] [_river][0] state: [CREATED]
[2013-10-07 15:35:49,389][DEBUG][index.translog ] [Peepers] [_river][0] interval [5s], flush_threshold_ops [5000], flush_threshold_size [200mb], flush_threshold_period [30m]
[2013-10-07 15:35:49,467][DEBUG][index.shard.service ] [Peepers] [_river][0] state: [CREATED]->[RECOVERING], reason [from gateway]
[2013-10-07 15:35:49,474][DEBUG][index.gateway ] [Peepers] [_river][0] starting recovery from local ...
[2013-10-07 15:35:49,620][DEBUG][index.engine.robin ] [Peepers] [_river][0] starting engine
[2013-10-07 15:35:49,708][DEBUG][cluster.service ] [Peepers] processing [create-index [_river], cause [auto(index api)]]: done applying updated cluster_state (version: 3)
[2013-10-07 15:35:50,875][DEBUG][index.shard.service ] [Peepers] [_river][0] scheduling refresher every 1s
[2013-10-07 15:35:50,882][DEBUG][index.shard.service ] [Peepers] [_river][0] scheduling optimizer / merger every 1s
[2013-10-07 15:35:50,884][DEBUG][index.shard.service ] [Peepers] [_river][0] state: [RECOVERING]->[POST_RECOVERY], reason [post recovery from gateway, no translog]
[2013-10-07 15:35:50,888][DEBUG][index.gateway ] [Peepers] [_river][0] recovery completed from [local], took [1.4s]
[2013-10-07 15:35:50,890][DEBUG][cluster.action.shard ] [Peepers] sending shard started for [_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING], indexUUID [hhVydPb3QvuD110I9V4XyA], reason [after recovery from gateway]
[2013-10-07 15:35:50,892][DEBUG][cluster.action.shard ] [Peepers] received shard started for [_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING], indexUUID [hhVydPb3QvuD110I9V4XyA], reason [after recovery from gateway]
[2013-10-07 15:35:50,900][DEBUG][cluster.service ] [Peepers] processing [shard-started ([_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING]), reason [after recovery from gateway]]: execute
[2013-10-07 15:35:50,905][DEBUG][cluster.action.shard ] [Peepers] [_river][0] will apply shard started [_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING], indexUUID [hhVydPb3QvuD110I9V4XyA], reason [after recovery from gateway]
[2013-10-07 15:35:50,915][DEBUG][cluster.service ] [Peepers] cluster state updated, version [4], source [shard-started ([_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING]), reason [after recovery from gateway]]
[2013-10-07 15:35:50,917][DEBUG][cluster.service ] [Peepers] publishing cluster state version 4
[2013-10-07 15:35:50,920][DEBUG][cluster.service ] [Peepers] set local cluster state to version 4
[2013-10-07 15:35:50,924][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: execute
[2013-10-07 15:35:50,936][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: no change in cluster_state
[2013-10-07 15:35:50,937][DEBUG][index.shard.service ] [Peepers] [_river][0] state: [POST_RECOVERY]->[STARTED], reason [global state moved to started]
[2013-10-07 15:35:51,155][DEBUG][cluster.service ] [Peepers] processing [shard-started ([_river][0], node[FDoGx4I-T8SLGc1IYsBkzw], [P], s[INITIALIZING]), reason [after recovery from gateway]]: done applying updated cluster_state (version: 4)
[2013-10-07 15:35:51,762][DEBUG][cluster.service ] [Peepers] processing [update-mapping [_river][test]]: execute
[2013-10-07 15:35:51,962][DEBUG][cluster.metadata ] [Peepers] [_river] update_mapping test with source [{"test":{"index_analyzer":"default_index","search_analyzer":"default_search","properties":{"type":{"type":"string"}}}}]
[2013-10-07 15:35:52,022][DEBUG][cluster.service ] [Peepers] cluster state updated, version [5], source [update-mapping [_river][test]]
[2013-10-07 15:35:52,032][DEBUG][cluster.service ] [Peepers] publishing cluster state version 5
[2013-10-07 15:35:52,042][DEBUG][cluster.service ] [Peepers] set local cluster state to version 5
[2013-10-07 15:35:52,055][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: execute
[2013-10-07 15:35:52,233][DEBUG][river.cluster ] [Peepers] processing [reroute_rivers_node_changed]: no change in cluster_state
[2013-10-07 15:35:52,382][DEBUG][cluster.service ] [Peepers] processing [update-mapping [_river][test]]: done applying updated cluster_state (version: 5)

ghost assigned javanna Nov 10, 2013

javanna added a commit to javanna/elasticsearch that referenced this issue Nov 10, 2013

Schedule retry if the river type is available but the _meta document isn't

With #3782 we changed the execution order of dynamic mapping updates and index operations: we now first send the mapping update to the master node, and only then index the document. This makes sense, but it caused issues with rivers. A river is started by the cluster changed event triggered on the master node right after the mapping update has been applied, but starting it requires its _meta document to be available, which is no longer the case at that point since the index operation most likely hasn't happened yet. As a result, in most cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.

Closes #4089, #3840
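
The idea behind the fix, sketched roughly (not the committed patch; the class name, retry count, and delay below are illustrative assumptions):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.threadpool.ThreadPool;

// Sketch of the retry idea: if the river type is known but its _meta document
// can't be fetched yet, schedule another attempt instead of giving up.
public class RiverMetaRetrySketch {

    private static final int MAX_ATTEMPTS = 5;                                   // assumed value
    private static final TimeValue RETRY_DELAY = TimeValue.timeValueSeconds(1);  // assumed value

    private final Client client;
    private final ThreadPool threadPool;

    public RiverMetaRetrySketch(Client client, ThreadPool threadPool) {
        this.client = client;
        this.threadPool = threadPool;
    }

    void startRiver(final String riverName, final int attempt) {
        boolean metaFound = client.prepareGet("_river", riverName, "_meta")
                .execute().actionGet().isExists();
        if (metaFound) {
            // ...proceed to instantiate and start the river
            return;
        }
        if (attempt < MAX_ATTEMPTS) {
            // The index operation for _meta may simply not have happened yet:
            // try again shortly rather than silently never starting the river.
            threadPool.schedule(RETRY_DELAY, ThreadPool.Names.GENERIC, new Runnable() {
                @Override
                public void run() {
                    startRiver(riverName, attempt + 1);
                }
            });
        }
    }
}
```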

javanna added a commit that referenced this issue Nov 10, 2013

Schedule retry if the river type is available but the _meta document isn't

javanna added a commit that referenced this issue Nov 10, 2013

Schedule retry if the river type is available but the _meta document isn't
javanna (Member) commented Nov 10, 2013

I pushed a fix for this, as I said in #4089. Could you confirm it solves the issue?

javanna closed this Nov 10, 2013

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Schedule retry if the river type is available but the _meta document isn't