Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing XContent SerializableString/SerializedString #104857

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ldematte
Copy link
Contributor

@ldematte ldematte commented Jan 29, 2024

The pair works as an abstraction over cached string serialization (Jackson SerializedString), to avoid creation and copy of byte arrays every time a "symbol" (const string) is serialized (e.g. enums and field names).

Ref: Spacetime #1353

The pair works as an abstraction over cached string serialization (Jackson SerializedString), to avoid creation and copy of byte arrays every time a "symbol" (const string) is serialized (e.g. enums and field names)
@ldematte ldematte added :Core/Infra/Core Core issues without another label >non-issue labels Jan 29, 2024
@ldematte ldematte marked this pull request as ready for review January 29, 2024 13:06
@ldematte ldematte requested a review from mosche January 29, 2024 13:06
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jan 29, 2024

public class URLRepositoryPlugin extends Plugin implements RepositoryPlugin {
private final AtomicReference<URLHttpClient.Factory> httpClientFactory = new AtomicReference<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this unrelated change sneaked in, could you remove it from the changeset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know why -- it's the second time :) seems something sneaked into Armin's branch and got carried over. Will revert it and double check.

@@ -31,6 +31,8 @@ public class ParseField {

private static final String[] EMPTY = new String[0];

private final SerializableString serializableName;

private ParseField(
String name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just replacing the type of String name with SerializableString?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mosche we could do that, by introducing a SerializableString#getValue (or even overriding toString, why not?) to return the original value (name) and use that inside ParseField, where needed. Is that what you are meaning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍 Or alternatively, given value is final / immutable, it could also just be publicly visible.
And it's probably necessary to implement hashCode / equals then.

import java.util.function.Function;

public class SerializableString {
private final String name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename more generally to string or value?


import java.util.function.Function;

public class SerializableString {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some short Javadocs explaining the purpose of the class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, will do!

Comment on lines 25 to 33
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> supplier) {
if (serializedString != null) {
assert serializedString.getClass().equals(clazz);
return clazz.cast(serializedString);
}
var serializedString = supplier.apply(name);
this.serializedString = serializedString;
return serializedString;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make initialization more deterministic? Also, given we know T, just cast using (T)?

Suggested change
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> supplier) {
if (serializedString != null) {
assert serializedString.getClass().equals(clazz);
return clazz.cast(serializedString);
}
var serializedString = supplier.apply(name);
this.serializedString = serializedString;
return serializedString;
}
public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> factory) {
if (serializedString == null) {
synchronized (this) {
if (serializedString == null) {
return (T) (serializedString = factory.apply(name));
}
}
}
assert serializedString.getClass().equals(clazz);
return (T) serializedString;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this a bit with Armin if you remember - the worst that can happen is that the String is serialized twice, and we "give out" different copies.
But then again using the double-checked locking pattern would make the synchronization block overhead negligible, so I'm OK with that!

Re: T, I feel it is cleaner to do a Class.cast instead of (T), but I'm not rigid about it (unless using (T) gives us some IDE or compiler warning - if so, I'd stick with Class.cast)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to using clazz.cast

Copy link
Contributor Author

@ldematte ldematte Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered that this pattern is safe in C# but only under some conditions, and I am aware that Java has a weaker memory model, so I've looked up if it's safe in Java too. And it seems it's not: https://wiki.sei.cmu.edu/confluence/display/java/LCK10-J.+Use+a+correct+form+of+the+double-checked+locking+idiom
In particular I fear that the double-checking as it is written now may run into 2 problems:

  1. lack of happens-before relationship between the first read at line 26 and the third read at line 34 means that it is possible for the third read (line 34) to obtain a stale null value (perhaps because its value was cached or reordered by the compiler), causing the method to return a null pointer.
  2. return the newly created (non-null) value of the reference, but the object could be seen to be in an invalid or incorrect state - the JMM guarantees that objects are fully constructed before they become visible to any other thread only if they are immutable (sort of).

I think you can get rid of the first problem this way:

if (v != null) {
   return clazz.cast(v);
}
var v = this.serializedValue;      // Only un-synchronized read
synchronized (this) {
      v = serializedValue;          // In synchronized block, so this is safe
      if (v == null) {
         v = supplier.apply(value);
         this.serializedValue = v; 
      }
}
return clazz.cast(v);

But I also think this suffers from no.2. BTW, I believe my original code suffers from no.2 as well.
Now, all parameters in jackson SerializableString are final or effectively final, so I think we are safe here, but we cannot assume it for every <T> -- can we?

The easy fix would be to make serializedString volatile, which was what we were hoping not to do. I'm calling @original-brownbear here to see if it's really important.

If it is, we need to find a way to make sure that threads always observe serializedValue to be either null or a fully constructed, valid object. Let me think about it a bit more :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix on 1), you're lines are just a bit out of order.
Good point on 2, yes I think both our versions could theoretically suffer from that but should be fine for SerializedString (with _value being final). If we want to be safe, we could store serializedValue in an immutable helper (with a single final field), right?

Copy link
Contributor Author

@ldematte ldematte Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my understanding too.
I think that introducing a Holder class that holds the newly created serializedValue in a final field should guarantee safe publication of the reference, at the cost of one more level of indirection.
But I'll wait for @original-brownbear or @rjernst to chime in/confirm that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that should work: https://shipilev.net/blog/2014/safe-public-construction/
Pretty much identical (the if is reversed) but for the immutable helper/final wrapper:

private FinalWrapper wrappedSerializedValue;

public <T> T computeIfAbsent(Class<T> clazz, Function<String, T> factory) {
    var v = this.wrappedSerializedValue;
    if (v == null) {
      synchronized(this) {
        v = this.wrappedSerializedValue;
        if (v == null) {
          v = new FinalWrapper(supplier.apply(value));
          this.wrappedSerializedValue = v; 
        }
      }
    }
    return clazz.cast(v.instance);
  }

  private static class FinalWrapper {
    public final Object value;
    public FinalWrapper(Object value) {
      this.value = value;
    }
  }

But it also seems to gain little over volatile reads (that article has some good explanations and benchmarks)

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, lgtm :)

private ParseField(
String name,
Predicate<RestApiVersion> forRestApiVersion,
String[] deprecatedNames,
boolean fullyDeprecated,
String allReplacedWith
) {
this.name = name;
this.serializableName = SerializableString.of(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use SerializableString in the private constructor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >non-issue Team:Core/Infra Meta label for core/infra team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants