Skip to content

Conversation

piergm
Copy link
Member

@piergm piergm commented Oct 15, 2025

50% code from @alexey-ivanov-es 😄

Description to be added before I transition this PR to be ready for review

listener.onResponse(FieldCapabilitiesResponse.builder().withFailures(failures).build());
} else {
// throw back the first exception
listener.onFailure(failures.get(0).getException());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is a bit tricky for CCS/CPS index resolution. As commented it literally re-throws back the first exception as is it was reported by remote.

For example for FieldCapsRequest(indices=[r1:index1,r2:index1,r3:index1]) (note index names are the same, only remotes are different), it could throw something like: org.elasticsearch.transport.RemoteTransportException with a cause org.elasticsearch.index.IndexNotFoundException: no such index [index1]. Unfortunately this lacks the information about remote originally caused the issue even though we know it from failures.get(0).getIndices() (although it might be caused my multiple indices).

This information is very important for ES|QL (and for other usecases) in order to properly report index resolution issues.

I believe there are 2 ways to resolve it:

  • introduce a flag that causes FC to always resolve listener nonExceptionally with list of result and failures (basically first branch here). This way each caller might deduce the root cause of the issue on top of all exceptions and construct corresponding error message.
  • introduce a flag that causes FC to wrap failures into new exception (maybe IndexResolutionException) retaining the index information and summarizing the errors. This way error reporting/formatting logic could be the same for all callers.

I believe this should be done separately. Let me open a draft/proposal with such change.

Comment on lines +929 to +931
Set<String> concreteIndices = expression.localExpressions().expressions();
assertEquals(1, concreteIndices.size());
assertTrue(concreteIndices.contains("new_index"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Set<String> concreteIndices = expression.localExpressions().expressions();
assertEquals(1, concreteIndices.size());
assertTrue(concreteIndices.contains("new_index"));
assertThat(expression.localExpressions().expressions(), containsInAnyOrder("new_index"));

I am not a big fan of hamcrest, but I suggest we use above assertion instead (here and in other tests).
This way in case of test failure (such as more or other indices resolved) we would immediately see the actual collection contents in error message. This could save some time when debugging/reproducing the issue.

assertEquals(1, resolvedLocally.expressions().size());
ResolvedIndexExpression expression = expressions.get(0);
assertEquals("*index", expression.original());
Set<String> concreteIndices = expression.localExpressions().expressions();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced in another pr, but I find naming a bit confusing.

response.getResolvedLocally().expressions() -> I assume those are expressions originally supplied by the user, all good here
response.getResolvedLocally().expressions().getFirst().localExpressions().expressions() -> the last 2 expressions are a bit confusing. Should it be something like response.getResolvedLocally().expressions().getFirst().resolution().concreteIndices()? (and corresponding LocalExpressions renamed to LocalIndexResolution). Maybe I misunderstand the content/meaning of the data structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also confused by ResolvedIndexExpression#remoteExpressions. What does it represent and do we need to keep it in context of FC call?

response.getResolvedLocally().expressions().getFirst().remoteExpressions() and response.getResolvedRemotely().get(clusterAlias).expressions().getFirst().remoteExpressions() looks confusing.


private final String[] indices;
private final ResolvedIndexExpressions resolvedLocally;
private final transient Map<String, ResolvedIndexExpressions> resolvedRemotely;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why resolvedLocally are serialized but resolvedRemotely not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the "coordinating" node can populate resolvedRemotely, because it knows from which remote the response was from.
Also, since chaining is disabled each FieldCapsResponse can only have the locally resolved index expressions as of now.

So. If the coordinating node receives the following index expression (in ccs notation) local-index,r*:remote-index
The coordinating node will populate resolvedLocally and forward the "r*:remote-index" to all matching remotes/projects.
The remotes/projects will then "resolveLocally" and respond back to the coordinating node. So for the remotes/projects we will never have the resolvedRemotely section populated.

When the coordinator receives back the responses from the remotes will then populate its local resolvedRemotely based on the responses it gets back.
resolvedRemotely then can be used either by ES|QL code to get infos on which original index expression what resolved to what and where and/or by CPS error handling code to possibly throw errors.

Sounds reasonable?
LMK if you have any other further questions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify - is the coordinator node always the same node that sent the original FieldCapabilitiesRequest?

this.resolvedRemotely = Collections.emptyMap();
} else {
this.resolvedLocally = null;
this.resolvedRemotely = Collections.emptyMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: identical in both branches, so can be outside if

}

/**
* Locally resolved index expressions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is wrong, originally my fault, sorry 😕

}

public static Builder threadSafeBuilder() {
return new Builder(Collections.synchronizedList(new ArrayList<>()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case ResolvedIndexExpressions for remotes on the coordinating node will always be created with a synchronized list and passed to response consumers. I don't think synchronization is strictly necessary in the Builder right now since modifications always happen through ConcurrentHashMap.computeIfAbsent and access is also through the map.
I'm fine with keeping it to prevent future concurrent bugs. But, I'd just suggest copying the list in build() to avoid synchronization when consumers use it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants