
[BUG] Same translog metadata file uploaded from old primary during a race condition #11322

Closed
ashking94 opened this issue Nov 24, 2023 · 2 comments

@ashking94 (Member)

Describe the bug
Across different nodes, the combination of primary term and translog generation must be unique in the translog metadata file name.
There is a bug where the old primary can still upload a translog metadata file with the same primary term and generation as the file generated by the new primary as part of the relocation handoff. This happens when an internal or background flush is triggered around the same time as the relocation handoff, just before primary mode becomes false on the old primary. In the cases where we found the issue, the internal flush was triggered because there had been no writes on the shard in the last 5 minutes, and the relocation happened around the same time as that internal flush (the IndexShard#flushOnIdle path shown below):

public void flushOnIdle(long inactiveTimeNS) {
    Engine engineOrNull = getEngineOrNull();
    if (engineOrNull != null && System.nanoTime() - engineOrNull.getLastWriteNanos() >= inactiveTimeNS) {
        boolean wasActive = active.getAndSet(false);
        if (wasActive) {
            logger.debug("flushing shard on inactive");
            threadPool.executor(ThreadPool.Names.FLUSH).execute(new AbstractRunnable() {
                @Override
                public void onFailure(Exception e) {
                    if (state != IndexShardState.CLOSED) {
                        logger.warn("failed to flush shard on inactive", e);
                    }
                }

                @Override
                protected void doRun() {
                    flush(new FlushRequest().waitIfOngoing(false).force(false));
                    periodicFlushMetric.inc();
                }
            });
        }
    }
}
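
For context on why the collision matters: the idle flush above drives a remote translog upload on the old primary, while the new primary's handoff flush produces a metadata file for the same primary term and generation. Below is a minimal sketch of the naming collision, using a hypothetical metadataFileName helper keyed only on the (primaryTerm, generation) pair; the actual remote-store file name carries additional components.

```java
// Hypothetical sketch, not the actual OpenSearch naming code: the real translog
// metadata file name has more fields, but the uniqueness requirement rests on each
// (primaryTerm, generation) pair being produced by exactly one node.
final class TranslogMetadataNaming {
    static String metadataFileName(long primaryTerm, long generation) {
        return "metadata__" + primaryTerm + "__" + generation;
    }

    public static void main(String[] args) {
        // In the race described above, both nodes compute the same name, e.g. for
        // primary term 5 and generation 42:
        String fromOldPrimaryIdleFlush = metadataFileName(5, 42);    // old primary, idle flush
        String fromNewPrimaryHandoffFlush = metadataFileName(5, 42); // new primary, handoff flush
        assert fromOldPrimaryIdleFlush.equals(fromNewPrimaryHandoffFlush);
    }
}
```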

To Reproduce
This is very difficult to reproduce and shows up at very high scale. However, we can still attempt to reproduce it by creating multiple indices and triggering the relocation around the 5-minute mark of no writes on the shard, as sketched below.
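
A hedged reproduction sketch follows. ClusterClient and its methods (createRemoteStoreIndex, indexDoc, relocatePrimary) are hypothetical stand-ins for whatever harness is used (an integration test or a script against the REST API); the only detail taken from the report is the 5-minute idle-flush window.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical harness interface; not a real OpenSearch API.
interface ClusterClient {
    void createRemoteStoreIndex(String index);
    void indexDoc(String index, String jsonDoc);
    void relocatePrimary(String index, String targetNode);
}

final class IdleFlushRelocationRepro {
    static void run(ClusterClient client) throws InterruptedException {
        // Create several remote-store indices and write once to each.
        for (int i = 0; i < 50; i++) {
            String index = "test-" + i;
            client.createRemoteStoreIndex(index);
            client.indexDoc(index, "{\"field\":\"value\"}");
        }
        // Stop writing and wait until the 5-minute idle-flush threshold, then trigger
        // relocations so the handoff races with flushOnIdle on the old primaries.
        TimeUnit.MINUTES.sleep(5);
        for (int i = 0; i < 50; i++) {
            client.relocatePrimary("test-" + i, "other-node");
        }
    }
}
```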

Expected behavior
The old primary must not upload translog metadata once control reaches the handoff stage.
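
A minimal sketch of the kind of guard this implies, assuming a hypothetical uploader with a primaryMode flag flipped during the relocation handoff; this is illustrative only, not the actual OpenSearch change.

```java
// Illustrative sketch with made-up types; not the actual OpenSearch classes.
// The expected behavior is that once primary mode is handed off, the old primary
// skips any further remote translog metadata uploads.
interface RemoteTranslogStore {
    void upload(String fileName, byte[] payload);
}

final class TranslogMetadataUploader {
    private final RemoteTranslogStore store;
    // Assumption: flipped to false on the old primary as part of the relocation handoff.
    private volatile boolean primaryMode = true;

    TranslogMetadataUploader(RemoteTranslogStore store) {
        this.store = store;
    }

    void onRelocationHandoff() {
        primaryMode = false;
    }

    /** Returns true if uploaded, false if skipped because primary mode was already handed off. */
    boolean maybeUpload(long primaryTerm, long generation, byte[] payload) {
        if (primaryMode == false) {
            return false; // the new primary now owns this (primaryTerm, generation) pair
        }
        store.upload("metadata__" + primaryTerm + "__" + generation, payload);
        return true;
    }
}
```

In the real code, the check and the upload would also need to be ordered against the handoff itself, since the race described in this issue sits exactly in the window just before primary mode becomes false.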


@ashking94 added the bug and untriaged labels on Nov 24, 2023
@ashking94 self-assigned this on Nov 24, 2023
@ashking94 added the Storage:Durability, Storage:Remote, and v2.12.0 labels and removed the untriaged label on Nov 24, 2023
@kiranprakash154 (Contributor)

Hi, are we on track for this to be released in 2.12?

@ashking94 (Member, Author)

This has been solved; the PR is referenced.
