netty: cache method path conversion #2233
Conversation
Alternatives considered:
Option 1 is the best, but I am wary of putting a public accessor method on the API even if marked as @Internal.
Weak LGTM. I am unsure about this change. I don't think I would make it myself. It seems very complex for the benefit (if there is one): a hash table lookup vs. an allocation that certainly dies young. Hmm 🤔. If deciding not to use the thread-local approach, then there is still the allocation due to prepending the '/'.
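(The thread-local idea isn't spelled out in the thread; one possible shape, purely as an illustrative assumption and not code from this PR, would be a per-thread single-slot cache of the last converted name:)

    import io.netty.util.AsciiString;

    // Illustrative assumption only: remember the last (name, path) pair
    // per thread, so repeated calls for the same method skip the
    // prepending allocation.
    final class ThreadLocalPathCache {
      private static final ThreadLocal<Object[]> LAST =
          ThreadLocal.withInitial(() -> new Object[2]);

      static AsciiString pathFor(String fullMethodName) {
        Object[] slot = LAST.get();
        if (slot[0] != fullMethodName) { // identity check; interned names hit
          slot[0] = fullMethodName;
          slot[1] = new AsciiString("/" + fullMethodName);
        }
        return (AsciiString) slot[1];
      }
    }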
    import java.util.IdentityHashMap;
    import java.util.Map;
    import java.util.function.Supplier;
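(These imports hint at the shape of the cache; a minimal sketch under that assumption, not the actual PR code, might look like:)

    import java.util.Collections;
    import java.util.IdentityHashMap;
    import java.util.Map;

    // Assumption-based sketch: map each method descriptor instance
    // (by identity) to its pre-computed "/"-prefixed path.
    final class PathCache {
      private static final Map<Object, String> CACHE =
          Collections.synchronizedMap(new IdentityHashMap<>());

      static String pathFor(Object descriptor, String fullMethodName) {
        String path = CACHE.get(descriptor);
        if (path == null) {
          path = "/" + fullMethodName;
          CACHE.put(descriptor, path); // benign race: value is deterministic
        }
        return path;
      }
    }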
After thinking about this over a long bike ride, I think I am going to revert this change in favor of a cleaner solution. Sorry for the wasted review.
no worries :-). Looking forward to your other solution!
@carl-mastrangelo It's amazing how many insights I get during long rides as well. No review is ever a waste; rather, I look at them as a natural progression :) Though is Supplier also in Java 6? Maybe that check might not be necessary for this part of the code.
(force-pushed from eb534bf to 45a8f96)
@buchgr PTAL. This solution trades startup time for fast access time. I tried to make it somewhat shade-proof by expecting io.grpc to always be a prefix of the io.grpc.netty package name.

@ejona86 If you like how this CL works, I think we should consider using it to unify how we access internal data across package boundaries. There are a few places today where we mark things @Internal.

@pgrosu Yes, I find them meditatively useful.
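(A sketch of the shade-proofing idea as I read it; the class names are real grpc-java classes, but the check itself is my reconstruction, not the PR's code:)

    // If the whole library is shaded (e.g. relocated under
    // "com.example.shaded."), io.grpc and io.grpc.netty move together,
    // so deriving the core package at runtime keeps the prefix check valid.
    final class ShadeProofCheck {
      static boolean callerAllowed(String callerClassName) {
        String coreClass = io.grpc.MethodDescriptor.class.getName();
        String corePackage = coreClass.substring(0, coreClass.lastIndexOf('.'));
        return callerClassName.startsWith(corePackage + ".");
      }
    }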
     */
    @Internal
    public InternalAccessor() {
      StackTraceElement[] st = new Throwable().getStackTrace();
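(The diff hunk is truncated above; a minimal reconstruction of the reflect-guard idea, my sketch rather than the exact PR code, would be:)

    // Sketch of the stack-trace caller check: only code under io.grpc
    // may construct the accessor.
    public final class InternalAccessor {
      public InternalAccessor() {
        StackTraceElement[] st = new Throwable().getStackTrace();
        // st[0] is this constructor's frame; st[1] is the direct caller.
        String caller = st[1].getClassName();
        if (!caller.startsWith("io.grpc.")) {
          throw new IllegalStateException("Unauthorized caller: " + caller);
        }
      }
    }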
@carl-mastrangelo do we REALLY need to verify this? I don't know how, but I have a feeling that this might fail in some weird way on some application / build system / JVM combo :).
With other internal methods, probably not, but this one is exposing mutable state. When analyzing allocations, this showed up (with a little fire icon in YourKit), which is the motivation for this PR.
I think I'd be okay with just exposing the method with @Internal. If we want a stronger approach, maybe have an Internal class instead that provides accessors like this? That would make it more obvious to callers that they are doing something they probably shouldn't. Such a class doesn't actively prohibit using it (although it could, with the same trick here), but maybe doing so is too strong?

Looking at the class name wouldn't work too well on Android, although, yes, only Netty is involved in this case.
In terms of API stability, I think having a higher barrier to entry is the key. Marking classes as @Internal has worked so far, but this is the first time we are exposing something that could really break if someone started depending on it.

I also toyed with the idea of making this hard in other ways, such as masking the return type or giving it a weird name. For example:
    @Internal
    public Object internalOnlyGetRawPath();
This would force consumers to cast it, and reduce the risk of autocomplete picking it up. However, our own code would have to do the same thing, so anyone looking at it might get the idea that it's okay. With the reflect guard, there is no way to misinterpret the intent.
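(A call site under that scheme would look something like the fragment below; `method` and the AsciiString target type are assumptions for illustration:)

    // Hypothetical call site: the Object return type forces an explicit,
    // eye-catching cast at every use.
    Object raw = method.internalOnlyGetRawPath();
    AsciiString path = (AsciiString) raw;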
This also conveniently works for internal constructors, where it is more difficult to change the name.
Thoughts?
Also, I realize this is not exactly clean, but it's easy to remove later. Would you consider letting this in, under the premise that it can be removed if it causes problems later?
I'm sorta okay with letting this in as-is, but I really don't see why we're bothering in the first place. What is so much worse about this method than, say, Metadata.serialize()? I can understand the concern, but I don't see this as a new concern. Thus caller-verification seems orthogonal to me, and I don't understand why I have to accept the caller enforcement to receive the performance optimization. I'm driven toward doing what we've done in the past: KISS and just have an @Internal getter.

I don't think we should be in the habit of adding things and removing them if they cause trouble. Instead, we add things when they have been shown to be necessary. I see trying to merge this as-is for speed of review as a false dichotomy.
Weak LGTM. I feel like the construct you introduced is very ingenious and probably a good idea if we really need access across package boundaries. I am not convinced that this is the case here. It seems like this small allocation is not going to make a difference.

After talking with Louis, I changed the API a bit. There is no more reflection. PTAL.

Weak LGTM :). I like this approach best so far, but it just seems to me that having this allocation is not going to make a difference. Not a difference that justifies these changes, at least.
Ran the numbers; it seems like a minor ns bump. These changes are not obviously better, but that's mainly because this is a testbed for seeing how to do cross-package optimization. The numbers are below.

Direct is the original reflection-based version of this PR, old is the current way, and transportSpecific is with the AtomicRef array. I think I am going to proceed with this PR. (If the reason for hiding our internals so strongly is not clear, we can chat offline.)
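(The AtomicRef-array variant would presumably look something like the sketch below; the class, field, and method names are my assumptions, not the PR's code:)

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Assumed shape: one slot per known transport on the descriptor, so
    // Netty can stash its pre-encoded path without any map lookup.
    final class TransportSpecificSlots {
      private static final int KNOWN_TRANSPORTS = 1; // e.g. just Netty here
      private final AtomicReferenceArray<Object> slots =
          new AtomicReferenceArray<>(KNOWN_TRANSPORTS);

      Object get(int transportOrdinal) {
        return slots.get(transportOrdinal);
      }

      void set(int transportOrdinal, Object value) {
        slots.set(transportOrdinal, value);
      }
    }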
(force-pushed from c70edc7 to 1df5ed4)
    Benchmark                                    Mode    Cnt      Score    Error   Units
    MethodDescriptorBenchmark.direct             sample  1179094  43.483  ± 0.610  ns/op
    MethodDescriptorBenchmark.old                sample  1079285  66.210  ± 5.402  ns/op
    MethodDescriptorBenchmark.transportSpecific  sample  1408070  36.800  ± 0.423  ns/op
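(For reference, a JMH skeleton of the shape that would produce sample-mode output like the above; the benchmark bodies are stand-ins, not the repo's actual benchmark:)

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    public class MethodDescriptorBenchmark {
      // Non-final fields so JMH doesn't constant-fold them.
      private String fullMethodName = "helloworld.Greeter/SayHello";
      private String cachedPath = "/" + fullMethodName;

      @Benchmark
      public String old() {
        return "/" + fullMethodName; // allocates a fresh String per call
      }

      @Benchmark
      public String transportSpecific() {
        return cachedPath; // returns the pre-computed path
      }
    }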
(force-pushed from 1df5ed4 to ca5a402)
    private MethodDescriptor(
        MethodType type, String fullMethodName,
        Marshaller<ReqT> requestMarshaller,
        // … (remaining parameters not shown in this hunk)
        boolean idempotent) {
Indent is a bit weird here.
Whoops! I'll fix it.
LGTM.
One other note: assuming that the 30 ns gain is real, it happens on every outbound RPC. I think we can get about 80,000 qps per core, which means it's about a 2 ms speed-up. Not a huge amount, but I think it would be noticeable. It is getting increasingly hard to find memory allocations that are garbage rather than non-garbage.
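(Spelling out that arithmetic: 30 ns/RPC × 80,000 RPC/s ≈ 2,400,000 ns ≈ 2.4 ms of CPU saved per second per core, i.e. roughly 0.2% of a core.)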
Well, can you show that it's noticeable? :) That's 0.2%. There is so much non-determinism that I can't even measure a 1 or 2% improvement reliably :).
Well, I can't show that it is noticeable, but I think that's because the benchmarks have ~1-3% error noise in them. To be honest, these recent CLs are more about denoising the allocation profiling. They show up there and throw the cumulative accounting off.