Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race condition in automaton queries #105911

Closed
scampi opened this issue Mar 4, 2024 · 1 comment · Fixed by #106678
Closed

Data race condition in automaton queries #105911

scampi opened this issue Mar 4, 2024 · 1 comment · Fixed by #106678
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@scampi
Copy link
Contributor

scampi commented Mar 4, 2024

Elasticsearch Version

8.12.1

Installed Plugins

No response

Java Version

bundled

OS Version

archlinux 6.7.8

Problem Description

First described in the post https://discuss.elastic.co/t/data-race-condition-in-automaton-queries/353937?u=yfful there seems to be a data race condition when evaluating automaton queries like regexp. In a local unit test involving a runtime field and a regexp query, I have experienced search inconsistencies with the result count.

Steps to Reproduce

The query shown below uses the script parity which returns even or odd depending on the value of another numeric field. It returns sometimes an unexpected number of hits.

{
  "query": {
    "regexp": {
      "outer_parity": {
        "value": "e.e."
      }
    }
  },
  "runtime_mappings": {
    "outer_parity": {
      "type": "keyword",
      "script": {
        "lang": "painless",
        "source": "parity"
      }
    }
  }
}

I believe that following this change that enabled parallelization by default, the queries extending AbstractStringScriptFieldAutomatonQuery have a data race condition. If I understand that change correctly, then the code below can be called concurrently:

public Scorer scorer(LeafReaderContext ctx) {
S scriptContext = scriptContextFunction.apply(ctx);
DocIdSetIterator approximation = DocIdSetIterator.all(ctx.reader().maxDoc());
TwoPhaseIterator twoPhase = new TwoPhaseIterator(approximation) {
@Override
public boolean matches() {
return AbstractScriptFieldQuery.this.matches(scriptContext, approximation.docID());

Therefore, the BytesRefBuilder scratch below is shared by all threads that execute a search on different segments, which would lead to the race condition I am seeing. With the query shared above, the scratch variable would contain even for example, although the values list passed in argument contains only odd. The race condition would explain this inconsistency.

Logs (if relevant)

No response

@scampi scampi added >bug needs:triage Requires assignment of a team area label labels Mar 4, 2024
@joegallo joegallo added the :Search/Search Search-related issues that do not fall into other categories label Mar 4, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Mar 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna self-assigned this Mar 11, 2024
javanna added a commit to javanna/elasticsearch that referenced this issue Mar 25, 2024
Back when we introduced queries against runtime fields, Elasticsearch did not support
inter-segment concurrency yet. At the time, it was fine to assume that segments will be
searched sequentially. AbstractStringScriptFieldAutomatonQuery used to have a BytesRefBuilder
instance shared across the segments, which gets re-initialized when each segment starts its work.
This is no longer possible with inter-segment concurrency.

Closes elastic#105911
javanna added a commit that referenced this issue Mar 26, 2024
Back when we introduced queries against runtime fields, Elasticsearch did not support
inter-segment concurrency yet. At the time, it was fine to assume that segments will be
searched sequentially. AbstractStringScriptFieldAutomatonQuery used to have a BytesRefBuilder
instance shared across the segments, which gets re-initialized when each segment starts its work.
This is no longer possible with inter-segment concurrency.

Closes #105911
javanna added a commit that referenced this issue Mar 26, 2024
Back when we introduced queries against runtime fields, Elasticsearch did not support
inter-segment concurrency yet. At the time, it was fine to assume that segments will be
searched sequentially. AbstractStringScriptFieldAutomatonQuery used to have a BytesRefBuilder
instance shared across the segments, which gets re-initialized when each segment starts its work.
This is no longer possible with inter-segment concurrency.

Closes #105911
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants