-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing XContent SerializableString/SerializedString #104857
base: main
Are you sure you want to change the base?
Conversation
The pair works as an abstraction over cached string serialization (Jackson SerializedString), to avoid creation and copy of byte arrays every time a "symbol" (const string) is serialized (e.g. enums and field names)
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
||
public class URLRepositoryPlugin extends Plugin implements RepositoryPlugin { | ||
private final AtomicReference<URLHttpClient.Factory> httpClientFactory = new AtomicReference<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this unrelated change sneaked in, could you remove it from the changeset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly don't know why -- it's the second time :) seems something sneaked into Armin's branch and got carried over. Will revert it and double check.
@@ -31,6 +31,8 @@ public class ParseField { | |||
|
|||
private static final String[] EMPTY = new String[0]; | |||
|
|||
private final SerializableString serializableName; | |||
|
|||
private ParseField( | |||
String name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just replacing the type of String name
with SerializableString
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mosche we could do that, by introducing a SerializableString#getValue
(or even overriding toString
, why not?) to return the original value (name) and use that inside ParseField, where needed. Is that what you are meaning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 👍 Or alternatively, given value
is final / immutable, it could also just be publicly visible.
And it's probably necessary to implement hashCode / equals then.
import java.util.function.Function; | ||
|
||
public class SerializableString { | ||
private final String name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename more generally to string
or value
?
|
||
import java.util.function.Function; | ||
|
||
public class SerializableString { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some short Javadocs explaining the purpose of the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, will do!
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> supplier) { | ||
if (serializedString != null) { | ||
assert serializedString.getClass().equals(clazz); | ||
return clazz.cast(serializedString); | ||
} | ||
var serializedString = supplier.apply(name); | ||
this.serializedString = serializedString; | ||
return serializedString; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make initialization more deterministic? Also, given we know T
, just cast using (T)
?
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> supplier) { | |
if (serializedString != null) { | |
assert serializedString.getClass().equals(clazz); | |
return clazz.cast(serializedString); | |
} | |
var serializedString = supplier.apply(name); | |
this.serializedString = serializedString; | |
return serializedString; | |
} | |
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> factory) { | |
if (serializedString == null) { | |
synchronized (this) { | |
if (serializedString == null) { | |
return (T) (serializedString = factory.apply(name)); | |
} | |
} | |
} | |
assert serializedString.getClass().equals(clazz); | |
return (T) serializedString; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this a bit with Armin if you remember - the worst that can happen is that the String is serialized twice, and we "give out" different copies.
But then again using the double-checked locking pattern would make the synchronization block overhead negligible, so I'm OK with that!
Re: T, I feel it is cleaner to do a Class.cast
instead of (T)
, but I'm not rigid about it (unless using (T) gives us some IDE or compiler warning - if so, I'd stick with Class.cast
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using clazz.cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembered that this pattern is safe in C# but only under some conditions, and I am aware that Java has a weaker memory model, so I've looked up if it's safe in Java too. And it seems it's not: https://wiki.sei.cmu.edu/confluence/display/java/LCK10-J.+Use+a+correct+form+of+the+double-checked+locking+idiom
In particular I fear that the double-checking as it is written now may run into 2 problems:
- lack of happens-before relationship between the first read at line 26 and the third read at line 34 means that it is possible for the third read (line 34) to obtain a stale null value (perhaps because its value was cached or reordered by the compiler), causing the method to return a null pointer.
- return the newly created (non-null) value of the reference, but the object could be seen to be in an invalid or incorrect state - the JMM guarantees that objects are fully constructed before they become visible to any other thread only if they are immutable (sort of).
I think you can get rid of the first problem this way:
if (v != null) {
return clazz.cast(v);
}
var v = this.serializedValue; // Only un-synchronized read
synchronized (this) {
v = serializedValue; // In synchronized block, so this is safe
if (v == null) {
v = supplier.apply(value);
this.serializedValue = v;
}
}
return clazz.cast(v);
But I also think this suffers from no.2. BTW, I believe my original code suffers from no.2 as well.
Now, all parameters in jackson SerializableString are final or effectively final, so I think we are safe here, but we cannot assume it for every <T>
-- can we?
The easy fix would be to make serializedString
volatile, which was what we were hoping not to do. I'm calling @original-brownbear here to see if it's really important.
If it is, we need to find a way to make sure that threads always observe serializedValue
to be either null or a fully constructed, valid object. Let me think about it a bit more :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix on 1), you're lines are just a bit out of order.
Good point on 2, yes I think both our versions could theoretically suffer from that but should be fine for SerializedString
(with _value
being final). If we want to be safe, we could store serializedValue in an immutable helper (with a single final field), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's my understanding too.
I think that introducing a Holder class that holds the newly created serializedValue in a final field should guarantee safe publication of the reference, at the cost of one more level of indirection.
But I'll wait for @original-brownbear or @rjernst to chime in/confirm that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that should work: https://shipilev.net/blog/2014/safe-public-construction/
Pretty much identical (the if is reversed) but for the immutable helper/final wrapper:
private FinalWrapper wrappedSerializedValue;
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> factory) {
var v = this.wrappedSerializedValue;
if (v == null) {
synchronized(this) {
v = this.wrappedSerializedValue;
if (v == null) {
v = new FinalWrapper(supplier.apply(value));
this.wrappedSerializedValue = v;
}
}
}
return clazz.cast(v.instance);
}
private static class FinalWrapper {
public final Object value;
public FinalWrapper(Object value) {
this.value = value;
}
}
But it also seems to gain little over volatile reads (that article has some good explanations and benchmarks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, lgtm :)
private ParseField( | ||
String name, | ||
Predicate<RestApiVersion> forRestApiVersion, | ||
String[] deprecatedNames, | ||
boolean fullyDeprecated, | ||
String allReplacedWith | ||
) { | ||
this.name = name; | ||
this.serializableName = SerializableString.of(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use SerializableString in the private constructor?
The pair works as an abstraction over cached string serialization (Jackson SerializedString), to avoid creation and copy of byte arrays every time a "symbol" (const string) is serialized (e.g. enums and field names).
Ref: Spacetime
#1353