-
Notifications
You must be signed in to change notification settings - Fork 25.6k
WIP Field caps transport changes to return for each original expression what it was resolved to #136632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
listener.onResponse(FieldCapabilitiesResponse.builder().withFailures(failures).build()); | ||
} else { | ||
// throw back the first exception | ||
listener.onFailure(failures.get(0).getException()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is a bit tricky for CCS/CPS index resolution. As commented it literally re-throws back the first exception as is it was reported by remote.
For example for FieldCapsRequest(indices=[r1:index1,r2:index1,r3:index1])
(note index names are the same, only remotes are different), it could throw something like: org.elasticsearch.transport.RemoteTransportException
with a cause org.elasticsearch.index.IndexNotFoundException: no such index [index1]
. Unfortunately this lacks the information about remote originally caused the issue even though we know it from failures.get(0).getIndices()
(although it might be caused my multiple indices).
This information is very important for ES|QL (and for other usecases) in order to properly report index resolution issues.
I believe there are 2 ways to resolve it:
- introduce a flag that causes FC to always resolve listener nonExceptionally with list of result and failures (basically first branch here). This way each caller might deduce the root cause of the issue on top of all exceptions and construct corresponding error message.
- introduce a flag that causes FC to wrap failures into new exception (maybe IndexResolutionException) retaining the index information and summarizing the errors. This way error reporting/formatting logic could be the same for all callers.
I believe this should be done separately. Let me open a draft/proposal with such change.
Set<String> concreteIndices = expression.localExpressions().expressions(); | ||
assertEquals(1, concreteIndices.size()); | ||
assertTrue(concreteIndices.contains("new_index")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set<String> concreteIndices = expression.localExpressions().expressions(); | |
assertEquals(1, concreteIndices.size()); | |
assertTrue(concreteIndices.contains("new_index")); | |
assertThat(expression.localExpressions().expressions(), containsInAnyOrder("new_index")); |
I am not a big fan of hamcrest, but I suggest we use above assertion instead (here and in other tests).
This way in case of test failure (such as more or other indices resolved) we would immediately see the actual collection contents in error message. This could save some time when debugging/reproducing the issue.
assertEquals(1, resolvedLocally.expressions().size()); | ||
ResolvedIndexExpression expression = expressions.get(0); | ||
assertEquals("*index", expression.original()); | ||
Set<String> concreteIndices = expression.localExpressions().expressions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was introduced in another pr, but I find naming a bit confusing.
response.getResolvedLocally().expressions()
-> I assume those are expressions originally supplied by the user, all good here
response.getResolvedLocally().expressions().getFirst().localExpressions().expressions()
-> the last 2 expressions are a bit confusing. Should it be something like response.getResolvedLocally().expressions().getFirst().resolution().concreteIndices()
? (and corresponding LocalExpressions
renamed to LocalIndexResolution
). Maybe I misunderstand the content/meaning of the data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also confused by ResolvedIndexExpression#remoteExpressions
. What does it represent and do we need to keep it in context of FC call?
response.getResolvedLocally().expressions().getFirst().remoteExpressions()
and response.getResolvedRemotely().get(clusterAlias).expressions().getFirst().remoteExpressions()
looks confusing.
|
||
private final String[] indices; | ||
private final ResolvedIndexExpressions resolvedLocally; | ||
private final transient Map<String, ResolvedIndexExpressions> resolvedRemotely; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why resolvedLocally
are serialized but resolvedRemotely
not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the "coordinating" node can populate resolvedRemotely, because it knows from which remote the response was from.
Also, since chaining is disabled each FieldCapsResponse can only have the locally resolved index expressions as of now.
So. If the coordinating node receives the following index expression (in ccs notation) local-index,r*:remote-index
The coordinating node will populate resolvedLocally and forward the "r*:remote-index" to all matching remotes/projects.
The remotes/projects will then "resolveLocally" and respond back to the coordinating node. So for the remotes/projects we will never have the resolvedRemotely section populated.
When the coordinator receives back the responses from the remotes will then populate its local resolvedRemotely
based on the responses it gets back.
resolvedRemotely
then can be used either by ES|QL code to get infos on which original index expression what resolved to what and where and/or by CPS error handling code to possibly throw errors.
Sounds reasonable?
LMK if you have any other further questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify - is the coordinator node always the same node that sent the original FieldCapabilitiesRequest?
this.resolvedRemotely = Collections.emptyMap(); | ||
} else { | ||
this.resolvedLocally = null; | ||
this.resolvedRemotely = Collections.emptyMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: identical in both branches, so can be outside if
} | ||
|
||
/** | ||
* Locally resolved index expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is wrong, originally my fault, sorry 😕
} | ||
|
||
public static Builder threadSafeBuilder() { | ||
return new Builder(Collections.synchronizedList(new ArrayList<>())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case ResolvedIndexExpressions
for remotes on the coordinating node will always be created with a synchronized list and passed to response consumers. I don't think synchronization is strictly necessary in the Builder
right now since modifications always happen through ConcurrentHashMap.computeIfAbsent
and access is also through the map.
I'm fine with keeping it to prevent future concurrent bugs. But, I'd just suggest copying the list in build()
to avoid synchronization when consumers use it
50% code from @alexey-ivanov-es 😄
Description to be added before I transition this PR to be ready for review