
[BUG][Segment Replication] ReplicationFailedException and ALLOCATION_FAILED #9966

Closed
kksaha opened this issue Sep 11, 2023 · 2 comments · Fixed by #10370
Labels
  • bug (Something isn't working)
  • Indexing:Replication (Issues and PRs related to core replication framework eg segrep)
  • Indexing (Indexing, Bulk Indexing and anything related to indexing)
  • v2.11.0 (Issues and PRs related to version 2.11.0)

Comments

kksaha commented Sep 11, 2023

Describe the bug

Shard failure, reason [replication failure], failure [ReplicationFailedException]
Failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException

Logs:

[2023-09-08T09:28:10,201][WARN ][o.o.c.r.a.AllocationService] [master-2] failing shard [failed shard, shard [ppe-000298][0], node[ylgTCi8VSk-iytumnaGxlg], [R], s[STARTED], a[id=1OR-X-XDTrCMLvXWr8k0sw], message [shard failure, reason [replication failure]], failure [ReplicationFailedException[[ppe-000298][0]: Replication failed on (failed to clean after replication)]; nested: CorruptIndexException[Problem reading index. (resource=/usr/share/opensearch/data/nodes/0/indices/rIJ86tpXTIG4h-Cn_MoPRg/0/index/_7tvv.cfe)]; nested: NoSuchFileException[/usr/share/opensearch/data/nodes/0/indices/rIJ86tpXTIG4h-Cn_MoPRg/0/index/_7tvv.cfe]; ], markAsStale [true]]

[2023-09-08T09:28:20,522][WARN ][o.o.c.r.a.AllocationService] [master-2] failing shard [failed shard, shard [ppe-000298][0], node[EDhutdeXT5W5luFLpIF7sw], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=5cUw3rGZSbuWLOSrQygkvA], unassigned_info[[reason=ALLOCATION_FAILED], at[2023-09-08T09:28:10.201Z], failed_attempts[1], delayed=false, details[failed shard on node [ylgTCi8VSk-iytumnaGxlg]: shard failure, reason [replication failure], failure ReplicationFailedException[[ppe-000298][0]: Replication failed on (failed to clean after replication)]; nested: CorruptIndexException[Problem reading index. (resource=/usr/share/opensearch/data/nodes/0/indices/rIJ86tpXTIG4h-Cn_MoPRg/0/index/_7tvv.cfe)]; nested: NoSuchFileException[/usr/share/opensearch/data/nodes/0/indices/rIJ86tpXTIG4h-Cn_MoPRg/0/index/_7tvv.cfe]; ], allocation_status[no_attempt]], expected_shard_size[13863289464], message [failed to create shard], failure [IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ppe-000298][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [19241322ms]]; ], markAsStale [true]]

[2023-09-08T09:28:30,607][WARN ][o.o.c.r.a.AllocationService] [master-2] failing shard [failed shard, shard [ppe-000298][0], node[ylgTCi8VSk-iytumnaGxlg], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=A7J2Jm_7SAOY2RLIXH-qZA], unassigned_info[[reason=ALLOCATION_FAILED], at[2023-09-08T09:28:20.522Z], failed_attempts[2], failed_nodes[[EDhutdeXT5W5luFLpIF7sw]], delayed=false, details[failed shard on node [EDhutdeXT5W5luFLpIF7sw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ppe-000298][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [19241322ms]]; ], allocation_status[no_attempt]], message [failed to create shard], failure [IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ppe-000298][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [20397ms]]; ], markAsStale [true]]

It seems the segment replication event failed with an index corruption exception because a segment file is missing
(NoSuchFileException: "/usr/share/opensearch/data/nodes/0/indices/rIJ86tpXTIG4h-Cn_MoPRg/0/index/_7tvv.cfe" does not exist),
followed by a ShardLockObtainFailedException on shard 0.

{ "index": "ppe-000298", "shard": 0, "primary": false, "current_state": "unassigned", "unassigned_info": { "reason": "ALLOCATION_FAILED", "at": "2023-09-08T14:12:35.637Z", "failed_allocation_attempts": 5, "details": "failed shard on node [ylgTCi8VSk-iytumnaGxlg]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ppe-000298][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [17065425ms]]; ", "last_allocation_status": "no_attempt" }, "can_allocate": "no", "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes", "node_allocation_decisions": [ { "node_id": "17zuTMYtQ9KvKKNa7gm0Ig", "node_name": “\data-az1-1", "transport_address": “*.*.*.*:9300", "node_attributes": { "zone": "az1", "shard_indexing_pressure_enabled": "true" }, "node_decision": "no", "deciders": [ { "decider": "max_retry", "decision": "NO", "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2023-09-08T14:12:35.637Z], failed_attempts[5], failed_nodes[[EDhutdeXT5W5luFLpIF7sw, ylgTCi8VSk-iytumnaGxlg]], delayed=false, details[failed shard on node [ylgTCi8VSk-iytumnaGxlg]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[ppe-000298][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [17065425ms]]; ], allocation_status[no_attempt]]]" } ] }

Eventually the replica shard falls too far behind the primary, resulting in significant replication lag.
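On versions that support it, the per-replica lag can be inspected with the `_cat/segment_replication` API. A sketch, assuming localhost:9200 (output columns vary by version):

```sh
# Show segment replication status and lag for the affected index.
curl -s "http://localhost:9200/_cat/segment_replication/ppe-000298?v"
```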

Screenshots
index shard prirep state docs store ip node
ppe-000298 1 p STARTED 33768999 31.4gb 10...* data-az2-3
ppe-000298 1 r STARTED 1412441 1.3gb 10...* data-az1-4
ppe-000298 2 p STARTED 33763101 35.3gb 10...* data-az1-6
ppe-000298 2 r STARTED 5928658 5.1gb 10...* data-az2-2
ppe-000298 0 p STARTED 33758088 30.1gb 10...* data-az2-1
ppe-000298 0 r UNASSIGNED
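The listing above matches the default `_cat/shards` columns and can be reproduced with (host is an assumption):

```sh
# List all shards of the index with the default columns (index shard prirep state docs store ip node).
curl -s "http://localhost:9200/_cat/shards/ppe-000298?v"
```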

Host/Environment (please complete the following information):

  • OS: Linux
  • Version: 2.8.0

We have tried to manually reroute the shard allocation but that didn't help.
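For completeness, the manual retry referenced in the max_retry decider output is typically issued like this (host is an assumption; it only retries allocations that exceeded the retry limit):

```sh
# Reset the failed-allocation counter and retry assigning the unassigned replica.
curl -s -X POST "http://localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```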

@kksaha kksaha added bug Something isn't working untriaged labels Sep 11, 2023
kksaha (Author) commented Sep 15, 2023

Can anyone please advise on how to fix this issue?

@kotwanikunal kotwanikunal added the Indexing Indexing, Bulk Indexing and anything related to indexing label Sep 19, 2023
@anasalkouz anasalkouz added Indexing:Replication Issues and PRs related to core replication framework eg segrep and removed untriaged labels Sep 22, 2023
mch2 (Member) commented Sep 25, 2023

Hi @kksaha, this looks like the same issue reported here.

To fix this on the running cluster, you would need to bounce the node that is trying to allocate the unassigned replica shard (ppe-000298 0 r UNASSIGNED).
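For illustration only, assuming a systemd-managed OpenSearch installation (adjust to your deployment), bouncing the node usually amounts to:

```sh
# Run on the node currently holding the stale shard lock (ylgTCi8VSk-... in this trace).
# Restarting the process releases the in-memory shard lock so the replica can be re-allocated.
sudo systemctl restart opensearch
```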

The cause of the corruption has been fixed in 2.9 with this PR. However, we still need to figure out why the shard was not able to auto-recover and re-allocate to the same node.

From your trace, it looks like the shard lock is still being held by the store https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/store/Store.java#L868C3-L868C3, meaning that after the corruption we weren't able to close the shard, so it cannot be re-created. Working on reproducing this to see what's going on.

@anasalkouz anasalkouz added the v2.11.0 Issues and PRs related to version 2.11.0 label Sep 25, 2023