Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow any map key type when serialising hash maps #88686

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Jul 21, 2022

Before this change only maps with String keys
were supported. There is no specific reason
why we should not support non-String keys
too, provided that they are serialisable. With
this PR we include support for other map key
types.

Resolves #66057

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.4.0 labels Jul 21, 2022
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Jul 21, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@salvatore-campagna salvatore-campagna added the test-full-bwc Trigger full BWC version matrix tests label Jul 21, 2022
@nik9000 nik9000 added the :Core/Infra/Core Core issues without another label label Jul 21, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jul 21, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@not-napoleon
Copy link
Member

Is StreamOutput the right place to fix this issue? I'm worried that will impact more than just scripted metric. Maybe that's the right thing to do though, I haven't thought about it. But I need a little more explanation of what's going on here.

@nik9000
Copy link
Member

nik9000 commented Jul 21, 2022

I think it'd be kinder to catch this when the script tries to write a non-string key into the map. Painless doesn't support generics and the API we give for these is Map - which can have Object keys. We could call toString on all the keys on the way in. Or fail any that aren't strings. That'd be kindest, I think.

@nik9000
Copy link
Member

nik9000 commented Jul 21, 2022

That'd be kindest, I think.

@jdconrad mentioned it could be possible to make a StringObjectMap implements Map<String, Object>. Painless would see that type. It could just wrap a regular HashMap.

@nik9000
Copy link
Member

nik9000 commented Jul 21, 2022

Er, well, maybe we shouldn't do this. If you had a stored script that required the old type signature then it won't compile and it'll fail to start the node. That's terrible.... Maybe better to just do it at runtime. It'll be more expensive, but scripted metric isn't fast.

@rjernst
Copy link
Member

rjernst commented Jul 21, 2022

What is the behavior we see today? We are already doing a hard cast to Map<String, Object>, so I think a class cast exception would already occur?

Technically we could support any key type, it just needs to be (1) hashable and (2) supported by StreamInput/StreamOutput. Today we assume String, but we could not do the cast to Map<String,Object>, and then call writeGenericValue on the key, just like we do for the value?

@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jul 22, 2022

Is StreamOutput the right place to fix this issue? I'm worried that will impact more than just scripted metric. Maybe that's the right thing to do though, I haven't thought about it. But I need a little more explanation of what's going on here.

I suspected this but I was not sure where those values are generated. I also tried a solution where I just toString the key no matter what the type is. In that case we don't have an error anymore, but I didn't go for that, because from the original issue my understanding is that we want to give back an error. Is my understanding wrong?

Anyway I am not sure I am changing the behaviour of other parts because:

  1. if the key is a String result stays the same.
  2. if it is not a String the difference is that we have an IllegalArgumentException instead of a ClassCastException both being RuntimeException.

Should we go for a solution using key.toString()? What about the (string) format in that case? In this case, anyway, we don't generate an error anymore...which means we are actually changing the behaviour.

@salvatore-campagna
Copy link
Contributor Author

What is the behavior we see today? We are already doing a hard cast to Map<String, Object>, so I think a class cast exception would already occur?

Technically we could support any key type, it just needs to be (1) hashable and (2) supported by StreamInput/StreamOutput. Today we assume String, but we could not do the cast to Map<String,Object>, and then call writeGenericValue on the key, just like we do for the value?

As it is now, without changes, this results in a ClassCastException.

Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.lambda$static$15(StreamOutput.java:695)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeGenericValue(StreamOutput.java:820)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeCollection(StreamOutput.java:1160)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.search.aggregations.metrics.InternalScriptedMetric.doWriteTo(InternalScriptedMetric.java:70)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregation.writeTo(InternalAggregation.java:60)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1091)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteableList(StreamOutput.java:1213)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregations.writeTo(InternalAggregations.java:67)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.DelayableWriteable.getSerializedSize(DelayableWriteable.java:218)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.common.io.stream.DelayableWriteable$Referencing.getSerializedSize(DelayableWriteable.java:121)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.QueryPhaseResultConsumer$PendingMerges.ramBytesUsedQueryResult(QueryPhaseResultConsumer.java:313)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:339)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:109)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:598)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:120)
        at org.elasticsearch.server@8.4.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:338)

@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jul 22, 2022

I understand that the problem is here:

"map_script": "state.test[doc.instanceId.value] = 1",

The script creates a map where doc.InstanceId.value is a Long...which then results in the creation of a Map<Long, Object> and now I am wondering: Why do we have to reject keys that are not strings? As long as keys are hash-able they should be fine right?

@salvatore-campagna
Copy link
Contributor Author

I have the feeling my last commit might break a few tests...

@salvatore-campagna salvatore-campagna changed the title fix: do not allow map key types other than String Allow any map key type when serialising hash maps Jul 22, 2022
@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jul 22, 2022

I had a look at ScriptedMetricAggContexts, InternalScriptedMetric and so on. I see they use Map<String, Object> to collect params and state. It is not clear to me why the use String as a key type. I guess that is bacause the state, for instance, uses a map of string to save some kind of context where the key is just the name used in the script. I also understand that this issue might be dealt with fixing it at "scripted metric level" other then in StreamOutput/StreamInput. I don't get, anyway, why in StreamOutput/StreamInput we only deal with maps whose key type is String. Could it be that allowing any type other than String we might hit performance issues due to serialisation and/or hashing overhead?

@salvatore-campagna salvatore-campagna self-assigned this Nov 9, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Dec 16, 2022

I think the BWC issue here is because we have a lot of places where maps are declared like

Map<String, Object>

while, according to this change we would need a

Map<Object, Object>

I tried to replace some of them but the change propagates to a very large number of classes.
Moreover, in some places we use writeMapWithConsistentOrder which sorts by key...keys that need to be Comparable, not the case for Object.

Sorting maps requires keys to be comparamble. As a result,
we can't use a generic key.
@salvatore-campagna
Copy link
Contributor Author

I have no clue why the BWC tests are failing...it looks correct to me.

@rjernst
Copy link
Member

rjernst commented Dec 16, 2022

This was a tricky issue to find! The mismatch is because you are using version directly. If you use getVersion(), the tests will pass. The reason is FilterStreamInput is used, which overrides getVersion(). That can then create a mismatch of versions which cause writing one way but reading in the other way. IMO this is a bug (or trappiness) in StreamInput. While it would be better if we could enforce using getVersion(), I don't see any way to do that, so i've raised #92422 to at least make it so this.version is correct.

rjernst added a commit to rjernst/elasticsearch that referenced this pull request Dec 16, 2022
When version is set on FilterStreamInput, it is passed to the delegate
stream. However, any uses of this.version directly on the stream are not
correct. While it would be better to force using getVersion()
consistently, this commit makes the situation less trappy by also
setting the version on the wrapper stream.

relates elastic#88686
elasticsearchmachine pushed a commit that referenced this pull request Dec 16, 2022
When version is set on FilterStreamInput, it is passed to the delegate
stream. However, any uses of this.version directly on the stream are not
correct. While it would be better to force using getVersion()
consistently, this commit makes the situation less trappy by also
setting the version on the wrapper stream.

relates #88686
@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Dec 19, 2022

This was a tricky issue to find! The mismatch is because you are using version directly. If you use getVersion(), the tests will pass. The reason is FilterStreamInput is used, which overrides getVersion(). That can then create a mismatch of versions which cause writing one way but reading in the other way. IMO this is a bug (or trappiness) in StreamInput. While it would be better if we could enforce using getVersion(), I don't see any way to do that, so i've raised #92422 to at least make it so this.version is correct.

Ok now I understand. May I ask how did you find that? Just step by step debugging? I spent quite some time debugging this without realizing that could be the issue.

If getVersion is overridded a mismatch is possible between
version and getVersion. As a result of this mismatch it is
possible that version checking on reading and writing ends
is done incorrectly.
@salvatore-campagna
Copy link
Contributor Author

🥳 thanks @rjernst

@rjernst
Copy link
Member

rjernst commented Dec 19, 2022

May I ask how did you find that?

it was time consuming. I started by repeating one of the tests that failed. That test was failing on deserializing runtime mappings in a search source builder, so I added printouts on several Stream Input/Output methods regarding maps. Eventually I saw the write side was writing a generic string, but the read side was reading a boolean. I realized the boolean type id 5 was the size of the string, and then added a couple more printouts that showed it was using the new code on the write side, but then old code on the read side. After staring at the read code for a while, I wondered what would happen if getVersion were overridden, and that’s when I saw FilterStreamInput did just that.

Debugging the transport protocol can be quite tedious. I’m sure there are more efficient ways than what I did above, but in any case I think long term we need a better way to debug (I have an idea for this for a future spacetime project).

@salvatore-campagna
Copy link
Contributor Author

@nik9000 @rjernst do you think this should be backported?

@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc please

@rjernst
Copy link
Member

rjernst commented Dec 22, 2022

I would not worry about backporting unless there is a compelling reason to do so (eg an anticipated PR that would want to use a map with an object key). However, without backporting it might make all future PRs (to 7.17) that have a map more difficult (though I expect since they are just calling the methods, there will not be a conflict). It really comes down to how easy the backport applies, if it's at all difficult, I wouldn't worry about it.

@salvatore-campagna
Copy link
Contributor Author

@rjernst @nik9000 to day I am going to have another quick review...anyway I need at least an approval to merge this.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I think fixing this in InternalScriptedMetric would be tedious (given that it just serialises a generic list). The fix in StreamOutput/StreamInput is smaller and shouldn't harm existing usages of maps in the internal protocol.

o.writeGenericValue(entry.getValue());
if (o.getVersion().onOrAfter(Version.V_8_7_0)) {
@SuppressWarnings("unchecked")
final Map<Object, Object> map = (Map<Object, Object>) v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe final Map<?, ?> map = (Map<?, ?>) v; ? Then @SuppressWarnings("unchecked") can be removed.

o.writeMap(map, StreamOutput::writeGenericValue, StreamOutput::writeGenericValue);
} else {
@SuppressWarnings("unchecked")
final Map<String, Object> map = (Map<String, Object>) v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the key needs to be a String since we need to use StreamOutput::writeString below. Using a String requires an unchecked cast here (but not above).

We need to explicitly cast the map so to have String keys
because of the call writeMap taking StreamOutput::writeString.
As a result, here we need the unchecked cast and we need the
SuppressWarnings.
@salvatore-campagna salvatore-campagna merged commit 8f8f55e into elastic:main Jan 10, 2023
danielmitterdorfer pushed a commit to danielmitterdorfer/elasticsearch that referenced this pull request Jan 12, 2023
Before this change only maps with String keys were supported.
There is no reason why we should not support non-String keys
too, provided that they are serialisable. With this PR we include
support for other map key types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug :Core/Infra/Core Core issues without another label Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team test-full-bwc Trigger full BWC version matrix tests v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scripted Metric Aggregation Failure During Serialization