Data race condition in automaton queries #105911

scampi · 2024-03-04T15:15:17Z

Elasticsearch Version

8.12.1

Installed Plugins

No response

Java Version

bundled

OS Version

archlinux 6.7.8

Problem Description

As first described in the forum post https://discuss.elastic.co/t/data-race-condition-in-automaton-queries/353937?u=yfful, there seems to be a data race when evaluating automaton queries such as regexp. In a local unit test involving a runtime field and a regexp query, I have seen inconsistent result counts.

Steps to Reproduce

The query below targets a runtime field backed by the script parity, which returns even or odd depending on the value of another numeric field. It sometimes returns an unexpected number of hits.

{
  "query": {
    "regexp": {
      "outer_parity": {
        "value": "e.e."
      }
    }
  },
  "runtime_mappings": {
    "outer_parity": {
      "type": "keyword",
      "script": {
        "lang": "painless",
        "source": "parity"
      }
    }
  }
}
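
To make the expected hit count concrete, here is a minimal sketch of what the query should select. It assumes the parity script behaves like the hypothetical `parity` method below, and it uses `java.util.regex` as a stand-in for Lucene's regexp engine (the two syntaxes differ in general, but both treat `e.e.` as a four-character, whole-value match). The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class ParityRegexSketch {
    // Hypothetical stand-in for the runtime field script described in the issue.
    static String parity(long value) {
        return value % 2 == 0 ? "even" : "odd";
    }

    // Whole-value match, like a regexp query on a keyword field.
    static boolean matches(String fieldValue) {
        return Pattern.matches("e.e.", fieldValue);
    }

    // Counts the documents in [from, to) whose parity value matches the pattern.
    static long countMatches(long from, long to) {
        long hits = 0;
        for (long v = from; v < to; v++) {
            if (matches(parity(v))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only the even values should be counted.
        System.out.println(countMatches(0, 10));
    }
}
```

With a deterministic script like this, every run over the same documents should report the same count, so a varying count points at the query evaluation rather than the data.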

I believe that, since the change that enabled parallelization by default, queries extending AbstractStringScriptFieldAutomatonQuery have a data race. If I understand that change correctly, the code below can be called concurrently:

elasticsearch/server/src/main/java/org/elasticsearch/search/runtime/AbstractScriptFieldQuery.java

Lines 71 to 77 in f0e4317

    
    public Scorer scorer(LeafReaderContext ctx) {
        S scriptContext = scriptContextFunction.apply(ctx);
        DocIdSetIterator approximation = DocIdSetIterator.all(ctx.reader().maxDoc());
        TwoPhaseIterator twoPhase = new TwoPhaseIterator(approximation) {
            @Override
            public boolean matches() {
                return AbstractScriptFieldQuery.this.matches(scriptContext, approximation.docID());

Therefore, the BytesRefBuilder scratch below is shared by all threads that execute a search on different segments, which would lead to the race condition I am seeing. With the query shared above, scratch could contain even, for example, although the values list passed as an argument contains only odd. The race condition would explain this inconsistency.

elasticsearch/server/src/main/java/org/elasticsearch/search/runtime/AbstractStringScriptFieldAutomatonQuery.java

Line 20 in f0e4317

private final BytesRefBuilder scratch = new BytesRefBuilder();
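
The suspected interleaving can be replayed deterministically in a single thread. The sketch below is not the Elasticsearch code: `StringBuilder` stands in for `BytesRefBuilder`, `String.matches` stands in for the automaton, and the `load`/`test` split is a hypothetical way of making the two steps of a match visible.

```java
public class SharedScratchSketch {
    // Stand-in for the query-level BytesRefBuilder field: one instance per query,
    // so every segment evaluated by this query writes to the same buffer.
    private final StringBuilder scratch = new StringBuilder();

    // Step 1 of a match: copy the document's value into the shared scratch.
    void load(String value) {
        scratch.setLength(0);
        scratch.append(value);
    }

    // Step 2 of a match: run the pattern against whatever the scratch holds now.
    boolean test() {
        return scratch.toString().matches("e.e.");
    }

    // Sequential evaluation: nothing happens between load and test.
    static boolean sequential() {
        SharedScratchSketch query = new SharedScratchSketch();
        query.load("odd");
        return query.test();
    }

    // Replays one unlucky schedule: segment B loads its value after
    // segment A has loaded "odd" but before segment A runs its test.
    static boolean interleaved() {
        SharedScratchSketch query = new SharedScratchSketch();
        query.load("odd");   // segment A, document with value "odd"
        query.load("even");  // segment B overwrites the shared scratch
        return query.test(); // segment A now tests "even" instead of "odd"
    }

    public static void main(String[] args) {
        System.out.println("sequential match: " + sequential());
        System.out.println("interleaved match: " + interleaved());
    }
}
```

In the interleaved schedule the document whose value is odd is reported as a match, which is the kind of wrong hit count described above.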

Logs (if relevant)

No response


elasticsearchmachine · 2024-03-04T16:19:22Z

Pinging @elastic/es-search (Team:Search)

When we introduced queries against runtime fields, Elasticsearch did not yet support inter-segment concurrency, so it was fine to assume that segments would be searched sequentially. AbstractStringScriptFieldAutomatonQuery had a BytesRefBuilder instance shared across segments, which was re-initialized when each segment started its work. This is no longer safe with inter-segment concurrency. Closes elastic#105911
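
One way to remove the shared state is to give each segment its own scratch buffer. The sketch below illustrates that idea only and is not taken from the actual fix: `StringBuilder` again stands in for `BytesRefBuilder`, and `countSegment`, `search`, and `sampleSegments` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerSegmentScratchSketch {
    // Counts matching values in one segment. The scratch buffer is local to
    // this call, so no other thread can observe or overwrite it.
    static int countSegment(List<String> segment) {
        StringBuilder scratch = new StringBuilder();
        int hits = 0;
        for (String value : segment) {
            scratch.setLength(0);
            scratch.append(value);
            if (scratch.toString().matches("e.e.")) {
                hits++;
            }
        }
        return hits;
    }

    // Searches all segments concurrently and sums the per-segment hit counts.
    static int search(List<List<String>> segments) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> segment : segments) {
                Callable<Integer> task = () -> countSegment(segment);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Three small segments holding three "even" values in total.
    static List<List<String>> sampleSegments() {
        return List.of(
            List.of("even", "odd", "even"),
            List.of("odd", "odd"),
            List.of("even")
        );
    }

    public static void main(String[] args) {
        System.out.println(search(sampleSegments()));
    }
}
```

Because the only mutable state lives inside `countSegment`, the total no longer depends on how the segment tasks are scheduled.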


scampi added the >bug and needs:triage labels Mar 4, 2024

joegallo added the :Search/Search label Mar 4, 2024

elasticsearchmachine added the Team:Search label and removed the needs:triage label Mar 4, 2024

javanna self-assigned this Mar 11, 2024

javanna mentioned this issue Mar 22, 2024

Fix concurrency bug in AbstractStringScriptFieldAutomatonQuery #106678

Merged

javanna closed this as completed in #106678 Mar 26, 2024
