
Conversation

carl-mastrangelo (Contributor)

No description provided.

carl-mastrangelo (Contributor, Author) commented Sep 2, 2016

Alternatives considered:

  1. Cache the transformation inside MethodDescriptor. This is difficult to expose to the transport layer, since it would mean handing out access to the mutable byte[]. It would also still cost the allocation of an AsciiString.
  2. Use a single static synchronized map in Netty to cache the transformation. This saves memory but adds synchronization.
  3. Use a Netty FastThreadLocal. Since users provide their own thread factory, we can't assume threads are the FastThreadLocalThread instances that FastThreadLocal needs for its fast path.

Option 1 is the best, but I am wary of putting a public accessor method on the API even if it is marked @Internal. We are already finding places where users are abusing our API quirks.
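
For context, the per-RPC work being discussed looks roughly like this (a sketch with assumed names, not the exact gRPC code): the Netty transport turns fullMethodName into an HTTP/2 :path value on every outbound RPC, allocating a temporary String and an AsciiString each time.

import io.netty.util.AsciiString;

class PathEncodingSketch {
  static AsciiString encodePath(String fullMethodName) {
    // "/" + name allocates an intermediate String; wrapping it in an
    // AsciiString allocates again (the object plus its backing byte[]).
    return new AsciiString("/" + fullMethodName);
  }
}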

buchgr (Contributor) commented Sep 5, 2016

Weak LGTM.

I am unsure about this change. I don't think I would make it myself. It seems very complex relative to the benefit (if there is one): a hash table lookup versus an allocation that almost certainly dies young. Hmm 🤔. If we decide not to use the thread-local approach, the allocation from prepending the / might be removed by escape analysis; to be sure, you could allocate a correctly sized byte array and prepend the / "manually", as in the sketch below.
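
A minimal sketch of that manual-prepend idea (assumes the method name is ASCII; names are illustrative):

class ManualPrependSketch {
  static byte[] encodePath(String fullMethodName) {
    // Exactly sized: one byte for '/' plus one byte per ASCII char,
    // with no intermediate String allocation.
    byte[] path = new byte[fullMethodName.length() + 1];
    path[0] = '/';
    for (int i = 0; i < fullMethodName.length(); i++) {
      path[i + 1] = (byte) fullMethodName.charAt(i);
    }
    return path;
  }
}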


import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

carl-mastrangelo (Contributor, Author):

After thinking about this over a long bike ride, I think I am going to revert this change in favor of a cleaner solution. Sorry for the wasted review.

buchgr (Contributor):

no worries :-). Looking forward to your other solution!

pgrosu:

@carl-mastrangelo It's amazing how many insights I get during long rides as well. No review is ever a waste; rather, I look at them as a natural progression :) Though, is Supplier also in Java 6? Maybe that check might not be necessary for this part of the code.

carl-mastrangelo (Contributor, Author)

@buchgr PTAL

This solution trades startup time for fast access time. I tried to make it somewhat shade-proof by expecting io.grpc to always be a package prefix of io.grpc.netty.

@ejona86 If you like how this CL works, I think we should consider using it to unify how we access internal data across package boundaries. There are a few places today where we mark things @Internal, but that doesn't really have any teeth. This is a clean and enforceable way to set package boundaries: someone would have to resort to reflection to bypass them, which is exactly what they would have to do if the fields were actually private.

@pgrosu yes, I find them meditatively useful.

*/
@Internal
public InternalAccessor() {
  StackTraceElement[] st = new Throwable().getStackTrace();
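  // A hedged guess at how the check might continue from here (the PR's
  // actual logic may differ): require the immediate caller's class to
  // share this class's package prefix. The prefix check survives shading
  // because io.grpc.netty is relocated together with io.grpc.
  String caller = st[1].getClassName();
  String pkg = InternalAccessor.class.getName();
  pkg = pkg.substring(0, pkg.lastIndexOf('.')); // e.g. "io.grpc"
  if (!caller.startsWith(pkg)) {
    throw new IllegalStateException("Unexpected caller: " + caller);
  }
}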
buchgr (Contributor):

@carl-mastrangelo do we REALLY need to verify this? I don't know how, but I have a feeling that this might fail in some weird way on some application / build system / JVM combo :).

carl-mastrangelo (Contributor, Author):

With other internal methods, probably not, but this one is exposing mutable state.

When analyzing allocations, this showed up (with a little fire icon in YourKit), which is the motivation for this PR.

ejona86 (Member):

I think I'd be okay with just exposing the method with @Internal. If we want a stronger approach, maybe have an Internal class instead that provides accessors like this? That makes it more obvious to callers that they are doing something they probably shouldn't. Such a class doesn't actively prohibit use (although it could, with the same trick used here), but maybe doing so is too strong? A rough sketch is below.

Looking at the class name wouldn't work too well on Android, although, yes, only Netty is involved in this case.
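
A rough sketch of that Internal-class idea (hypothetical names; not gRPC's actual API):

// Hypothetical stand-in for the real class; only the package-private
// field matters for this sketch.
class MethodDescriptor {
  byte[] rawPath; // package-private: invisible outside its package
}

// Lives in the same package as MethodDescriptor, so it can reach
// package-private state; the "Internal" name makes the transgression
// obvious at every call site.
final class InternalMethodDescriptors {
  private InternalMethodDescriptors() {}

  static byte[] rawPath(MethodDescriptor descriptor) {
    return descriptor.rawPath;
  }
}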

carl-mastrangelo (Contributor, Author):

In terms of API stability, I think having a higher barrier to entry is the key. Marking classes as @Internal has worked so far, but this is the first time we are exposing something that could really break if someone started depending on it.

I also toyed with the idea of making this hard in other ways, such as masking the return type, or giving it a weird name. For example:

@Internal
public Object internalOnlyGetRawPath();

This would force consumers to cast it, and reduce the risk of autocomplete picking it up. However, our own code would have to do the same thing, so anyone looking at it might get the idea that it's okay. With the reflection guard, there is no way to misinterpret the intent.
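
Concretely, a caller would end up writing something like this (hypothetical names):

// The unavoidable cast is the deterrent: it looks wrong because it is.
byte[] rawPath = (byte[]) method.internalOnlyGetRawPath();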

The reflection guard also conveniently works for internal constructors, where it is more difficult to change the name.

Thoughts?

carl-mastrangelo (Contributor, Author):

Also, I realize this is not exactly clean, but it's easy to remove later. Would you consider letting this in, under the premise that it can be removed if it causes problems later?

ejona86 (Member):

I'm sorta okay with letting this in as-is, but I really don't see why bothering in the first place. What is so much worse about this method than, say, Metadata.serialize()? I can understand the concern, but I don't see this as a new concern. Thus caller-verification seems orthogonal to me, and I don't understand why I have to accept the caller enforcement to receive the performance optimization. I'm driven toward doing what we've done in the past: KISS and just have an @Internal getter.

I don't think we should be in the habit of adding things and removing them if they cause trouble. Instead, we add things when they have been shown to be necessary. I see trying to merge this as-is for speed of review as a false dichotomy.

buchgr (Contributor) commented Sep 6, 2016

Weak LGTM.

I feel like the construct you introduced is very ingenious and probably a good idea if we really need access across package boundaries. I am not convinced that this is the case here. It seems like this small allocation is not going to make a difference.

Would be curious to know what @ejona86 or @nmittler think.

carl-mastrangelo (Contributor, Author)

After talking with Louis, I changed the API a bit. There is no more reflection. PTAL

buchgr (Contributor) commented Sep 7, 2016

Weak LGTM :). I like this approach best so far, but it still seems to me that this allocation is not going to make a difference; at least, not a difference that justifies these changes.

carl-mastrangelo (Contributor, Author)

@buchgr

Ran the numbers; it looks like a minor nanosecond-level bump. These changes are not obviously better, but that's mainly because this PR is a testbed for how to do cross-package optimization. The numbers:

Benchmark                                      Mode      Cnt   Score   Error  Units
MethodDescriptorBenchmark.direct             sample  1179094  43.483 ± 0.610  ns/op
MethodDescriptorBenchmark.old                sample  1079285  66.210 ± 5.402  ns/op
MethodDescriptorBenchmark.transportSpecific  sample  1408070  36.800 ± 0.423  ns/op

Direct is the original reflection-based version of this PR, old is the current approach, and transportSpecific uses the AtomicReference array, sketched below.
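
My reading of the transportSpecific shape, as a hedged sketch (all names illustrative, not the merged API): the descriptor carries a small per-transport slot array, and each transport caches its pre-encoded path in its own slot, so after the first RPC the lookup is a plain array read.

import java.util.concurrent.atomic.AtomicReferenceArray;

final class TransportSlotsSketch {
  // One slot per known transport; each transport is assigned a fixed index.
  private final AtomicReferenceArray<Object> slots =
      new AtomicReferenceArray<Object>(2);

  Object get(int transportIndex) {
    return slots.get(transportIndex);
  }

  void put(int transportIndex, Object encodedPath) {
    slots.set(transportIndex, encodedPath);
  }
}

On the Netty path, the first RPC would compute the AsciiString and store it; every later RPC reads the slot with no allocation.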

I think I am going to proceed with this PR. (If the reason for hiding our internals this strongly is not clear, we can chat offline.)

carl-mastrangelo merged commit ca5a402 into grpc:master on Sep 7, 2016
carl-mastrangelo deleted the noMethodAlloc branch on September 7, 2016, 21:41
private MethodDescriptor(
MethodType type, String fullMethodName,
Marshaller<ReqT> requestMarshaller,
boolean idempotent) {
ejona86 (Member):

Indent is a bit weird here.

carl-mastrangelo (Contributor, Author):

Whoops! I'll fix it.

ejona86 (Member) commented Sep 7, 2016

LGTM.

carl-mastrangelo (Contributor, Author)

One other note: assuming that the 30ns gain is real, it happens on every outbound RPC. I think we can get about 80,000 QPS per core, which works out to roughly a 2.4ms-per-second speedup per core. Not a huge amount, but I think it would be noticeable. It is getting increasingly hard to find allocations that are short-lived garbage rather than data we actually need to keep.
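
Spelling out that arithmetic: 30 ns/RPC × 80,000 RPC/s = 2,400,000 ns/s, i.e. about 2.4 ms of CPU saved per second per core, or roughly 0.24% of a core.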

buchgr (Contributor) commented Sep 8, 2016

Well, can you show that it's noticeable? :) That's 0.2%. There is so much non-determinism that I can't even measure a 1 or 2% improvement reliably :).

carl-mastrangelo (Contributor, Author)

Well, I can't show that it is noticeable, but I think that's because the benchmarks have ~1-3% error noise in them.

To be honest, these recent CLs are more about denoising the allocation profiling. They show up there and throw the cumulative accounting off.

lock bot locked as resolved and limited conversation to collaborators on Jan 21, 2019