PAYARA-3468 MP FT 2.0 update #3911

Merged
merged 32 commits into payara:master on May 1, 2019

4 participants
@jbee
Contributor

commented Apr 23, 2019

Implements MicroProfile Fault-Tolerance 2.0.

Although 2.0 did not add many new features, larger changes were made for two reasons:

  1. To make the interaction between different FT behaviours compliant with 2.0 semantics, some restructuring was required, partly because the semantics were slightly extended and more strictly defined in 2.0, and partly because some cases were not compliant before. For example, @Fallback needed to be allowed alone and in any combination with other annotations. The best solution to most of the interaction and processing problems seemed to be to merge the interceptors into one.
  2. The implementation should become more modular, less repetitive, (unit) testable and easier to follow by introducing abstractions. While none of this was strictly necessary, I think it's time well spent. Otherwise this time would have gone into lengthy debugging sessions looking for errors. This was also a prerequisite for the unit tests I wanted to add for TCK tests that cannot be run due to test setup problems.

Change summary:

  • Interceptors have been merged into FaultToleranceInterceptor, which activates on the new FaultTolerance marker annotation (FT handling itself was further extracted into FaultTolerancePolicy).
  • The new FaultTolerance annotation is added (at runtime) to all methods with FT annotations (that is how "one interceptor to rule them all" is achieved).
  • FaultToleranceExtension now handles interceptor priority changes done via Config (a previously missing feature).
  • Validation now happens at invocation time (cached) in addition to annotation processing time, so that not only the annotations but also the values actually used after Config overrides are applied get validated. This is captured in the overall FaultTolerancePolicy that should be applied to a method. It combines all possible FT policy values for a method. Each of the policies has the same fields as the annotation it represents, except that it holds the values after overrides were applied. Validation is thereby effectively enforced by construction: using invalid policies is no longer possible.
  • Most of the policy analysis moved from each invocation to once per method on first invocation, to reduce the overhead of now validating the policies actually used. To still allow changing Config overrides at runtime, the policy in use has a TTL of one minute, after which it is recreated.
  • Use of Config and override logic was extracted and abstracted into FaultToleranceConfig interface
  • Use of MetricsRegistry and key names was extracted and abstracted into FaultToleranceMetrics interface
  • Use of any "services" was abstracted into FaultToleranceService interface, the implementation was renamed (and somewhat re-purposed) to FaultToleranceServiceImpl.
  • FaultToleranceObject was renamed FaultToleranceApplicationState and adapted to other changes
  • BulkheadSemaphore class was created to contain bulkhead specific requirements on the basic Semaphore.
  • Fallback method lookup is much more sophisticated, taking method parameter types and inheritance into account (see MethodLookupUtils; actually checked by TCK)
  • A couple of other smaller changes were needed to decouple things to the point where they could be tested in better isolation.
  • Moved all the service implementation specific "private" classes into the new package service
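
The per-method caching with a TTL mentioned in the list above could look roughly like the following. This is a minimal sketch and not the code of this PR: `PolicyCache`, its `Entry` class, the injected clock and the factory function are hypothetical names chosen for illustration. Only the idea (build once on first invocation, recreate after the TTL has passed) comes from the description.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: caches one policy per method and rebuilds it once the TTL has passed. */
final class PolicyCache<P> {

    /** A policy together with the time from which it counts as expired. */
    private static final class Entry<P> {
        final P policy;
        final long expiresAtMillis;

        Entry(P policy, long expiresAtMillis) {
            this.policy = policy;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<Method, Entry<P>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;          // injected so that expiry can be tested
    private final Function<Method, P> factory; // assumed to analyse and validate the annotations

    PolicyCache(long ttlMillis, LongSupplier clock, Function<Method, P> factory) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.factory = factory;
    }

    /** Returns the cached policy, creating or recreating it when missing or expired. */
    P get(Method method) {
        long now = clock.getAsLong();
        Entry<P> entry = entries.compute(method, (m, current) ->
                current != null && now < current.expiresAtMillis
                        ? current
                        : new Entry<>(factory.apply(m), now + ttlMillis));
        return entry.policy;
    }
}
```

With a TTL of 60,000 ms and `System::currentTimeMillis` as the clock, a changed Config override would be picked up at the latest one minute after the policy was last created.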

Overall the FT logic moved from the interceptors into FaultTolerancePolicy where each annotation is handled by a method. These methods are called in a fixed chain, each representing a stage of the overall FT handling as required for the policy in place.
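
To illustrate the idea of a fixed chain in which each stage handles one annotation and only acts when the corresponding policy is present, here is a small sketch. It is not the implementation from this PR: the class `ChainSketch`, its fields and the reduction to just two stages (fallback outermost, then retry) are simplifications assumed for the example, and the target is a plain `Supplier` to keep the code free of checked exceptions.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a policy that processes an invocation in a fixed chain of stages. */
final class ChainSketch {

    private final int maxRetries;            // 0 means: no retry policy present
    private final Supplier<Object> fallback; // null means: no fallback policy present

    ChainSketch(int maxRetries, Supplier<Object> fallback) {
        if (maxRetries < 0) {
            // invalid values are rejected on construction, so an invalid policy cannot be used
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        this.maxRetries = maxRetries;
        this.fallback = fallback;
    }

    /** Entry point: always starts with the outermost stage. */
    Object proceed(Supplier<Object> target) {
        return fallbackStage(target);
    }

    /** Stage 1: if a fallback is present it replaces any failure of the inner stages. */
    private Object fallbackStage(Supplier<Object> target) {
        if (fallback == null) {
            return retryStage(target);
        }
        try {
            return retryStage(target);
        } catch (RuntimeException ex) {
            return fallback.get();
        }
    }

    /** Stage 2: repeats the call up to maxRetries additional times on failure. */
    private Object retryStage(Supplier<Object> target) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return target.get();
            } catch (RuntimeException ex) {
                last = ex;
            }
        }
        throw last; // never null here: the loop ran at least once and every attempt failed
    }
}
```

Because the order of the stages is fixed in code, combinations such as a fallback alone or a fallback together with retry need no special handling; an absent policy simply makes its stage pass the call through.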

Logging

  • Level FINER is used for execution status information.
  • Level FINE is used for "event-like" information.

Tests & Testing:
All TCK tests that need to be excluded because they expect an unwrapped exception were replicated in added unit tests, so that we do test correct behaviour. In addition, I added tests for config overrides because the logic is somewhat confusing and not very well illustrated by the TCK tests. Last but not least, I added tests for the asynchronous error handling, since I discovered that the TCK has very little coverage of this important aspect (and indeed I did find another error in the implementation when adding the tests).

For most unit tests there is a corresponding method with FT annotations. The test method and the corresponding method under test are linked via a naming convention: the method under test has the name <test-method-name>_Method (which I took from some TCK test).
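
A lookup following this naming convention might be sketched as below. The helper `MethodUnderTestLookup` and the example test class are hypothetical and only illustrate the convention; the FT annotations on the method under test are omitted to keep the example self-contained.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical helper linking a test method to its method under test by name. */
final class MethodUnderTestLookup {

    private MethodUnderTestLookup() {
    }

    /** Finds the method named "<testMethodName>_Method" declared in the given test class. */
    static Optional<Method> find(Class<?> testClass, String testMethodName) {
        String expectedName = testMethodName + "_Method";
        return Arrays.stream(testClass.getDeclaredMethods())
                .filter(m -> m.getName().equals(expectedName))
                .findFirst();
    }
}

/** Example test class following the convention. */
class ExampleConventionTest {

    public void fallbackIsUsedOnError() {
        // a real test would run fallbackIsUsedOnError_Method through the FT policy here
    }

    // in a real test this method would carry the FT annotations under test
    public String fallbackIsUsedOnError_Method() {
        return "result";
    }
}
```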

FYI: With the added tests the coverage of the module is now ~65%, with the main logic being around 80% covered. The 73 tests run in less than a second.

jbee added some commits Mar 29, 2019

PAYARA-3468 TCK bulkhead, circuit breaker, config, disableEnv and fallbackmethod PASS; added unit tests for fallback method lookup and validation taken from TCK scenarios

@jbee jbee self-assigned this Apr 23, 2019

@jbee jbee requested a review from Pandrex247 Apr 23, 2019

@jbee

Contributor Author

commented Apr 23, 2019

jenkins test please

@jbee

Contributor Author

commented Apr 25, 2019

jenkins test please

@pdudits
Contributor

left a comment

At least the typo should be corrected ;)

...fish/payara/microprofile/faulttolerance/cdi/FaultToleranceExtension.java (resolved, outdated)
...fish/payara/microprofile/faulttolerance/cdi/FaultToleranceExtension.java (resolved)
* A simple cache with a fix {@link #TTL} with a policy for each target method.
*/
private static final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, FaultTolerancePolicy>> POLICY_BY_METHOD
= new ConcurrentHashMap<>();

@pdudits

pdudits Apr 26, 2019

Contributor

Adding an actual tuple type for key (Class, Method) might simplify working with this map
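
As a rough illustration of what such a tuple key could look like (a sketch only; the name `MethodKey` and its fields are hypothetical and not part of this PR):

```java
import java.lang.reflect.Method;
import java.util.Objects;

/** Hypothetical composite key of target class and method for a single-level map. */
final class MethodKey {

    private final Class<?> targetClass;
    private final Method method;

    MethodKey(Class<?> targetClass, Method method) {
        this.targetClass = Objects.requireNonNull(targetClass);
        this.method = Objects.requireNonNull(method);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof MethodKey)) {
            return false;
        }
        MethodKey other = (MethodKey) obj;
        return targetClass == other.targetClass && method.equals(other.method);
    }

    @Override
    public int hashCode() {
        return 31 * targetClass.hashCode() + method.hashCode();
    }
}
```

The cache would then be a single `ConcurrentHashMap<MethodKey, FaultTolerancePolicy>`, at the cost of creating one key object per lookup.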

@jbee

jbee Apr 29, 2019

Author Contributor

Originally this was intentionally done this way to avoid object creation on lookup. I was hoping to not have to create garbage for each invocation. Later this turned out to be very difficult, so there will be 2-3 garbage objects per invocation anyway. We could change this and accept a bit more garbage, and it would certainly increase readability. On the other hand, there are just 2 methods using this, so the simplification isn't that big, and not creating an additional object might still be worth it. WDYT?

@arjantijms

arjantijms Apr 30, 2019

Member

My gut feeling says don't optimise at this level until it's really proven to be a bottleneck (e.g. this being in a tight loop with thousands of lookups). In CDI these temp objects might be dwarfed by all the other things going on.

Also, in select cases the JVM will allocate objects on the stack (if escape analysis proves they can't escape their scope), making object creation for temp objects very cheap.

@jbee

jbee Apr 30, 2019

Author Contributor

It was not really deliberate optimizing. I saw two options: use a key class or use a nested map. I knew there were only two usages of the structure, and nested maps had the benefit of avoiding garbage objects, so I went for that option. Making this decision a big thing is maybe also the wrong focus. The only reason I did not change it later was that it does not make much sense to put more work into something that makes so little difference, as its scope is and will be tiny.

@pdudits

pdudits Apr 30, 2019

Contributor

I was looking at this more from a code readability point of view, as operations on nested maps have always been a bit hard to read (although it's now better with computeIfAbsent).

A theoretical performance argument for the key class would be that it reduces lookup time by using a single map rather than two.

A compromise solution would be to encapsulate the map in a separate class that only exposes the operations that are needed.
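
Such an encapsulation could be sketched like this. The class name `PolicyByMethod` and the two operations shown are assumptions made for the example; which operations the callers actually need would have to be taken from the interceptor code.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical wrapper hiding the nested map behind the operations callers need. */
final class PolicyByMethod<P> {

    private final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, P>> policies =
            new ConcurrentHashMap<>();

    /** Returns the policy for the method on the target class, creating it on first use. */
    P getOrCreate(Class<?> targetClass, Method method, BiFunction<Class<?>, Method, P> factory) {
        return policies
                .computeIfAbsent(targetClass, type -> new ConcurrentHashMap<>())
                .computeIfAbsent(method, m -> factory.apply(targetClass, m));
    }

    /** Drops all policies of a class (assumed use case: the class is no longer in use). */
    void remove(Class<?> targetClass) {
        policies.remove(targetClass);
    }
}
```

This keeps the nested maps, so no key object is created per lookup, while the nesting itself is no longer visible at the call sites.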

@jbee

jbee Apr 30, 2019

Author Contributor

I was thinking about readability as well. So ultimately the question now is: do we think improved readability is worth the effort of changing this? If so, I think we should do it. I hesitated since there are literally 2 lines of code affected, and using a key class would actually mean more code line-wise, so it felt less clear and didn't really call for action; I left it alone.

...fish/payara/microprofile/faulttolerance/policy/FaultTolerancePolicy.java (resolved)
...fish/payara/microprofile/faulttolerance/policy/FaultTolerancePolicy.java (resolved)
*
* @author Jan Bernitt
*/
public class BulkheadBasicTest {

@pdudits

pdudits Apr 26, 2019

Contributor

Oh actually, those are nicely testable.

For concurrency, can you think about a stress test? This is what has helped me from time to time to validate that my view of the concurrent behavior of my code is also the CPU's view of it :)

For example, limit to at most 4 concurrent invocations over an 8-thread thread pool, put a few thousand invocations in the pool, and verify some reasonable invariants?
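
A stress test along these lines might look like the following sketch. It is only an outline under assumptions: a plain `java.util.concurrent.Semaphore` stands in for the bulkhead, rejected calls are merely counted, and the class name `BulkheadStressSketch` is made up for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stress test sketch: a plain Semaphore stands in for the bulkhead. */
final class BulkheadStressSketch {

    /** Returns {completed, rejected, maxConcurrent, availablePermitsAtEnd}. */
    static int[] run(int limit, int threads, int invocations) {
        Semaphore bulkhead = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxConcurrent = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < invocations; i++) {
            pool.execute(() -> {
                if (!bulkhead.tryAcquire()) {
                    rejected.incrementAndGet(); // a real bulkhead would reject the call here
                    return;
                }
                try {
                    int now = running.incrementAndGet();
                    maxConcurrent.accumulateAndGet(now, Math::max);
                    Thread.yield(); // give other threads a chance to overlap
                    completed.incrementAndGet();
                } finally {
                    running.decrementAndGet();
                    bulkhead.release();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stress run did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return new int[] {
            completed.get(), rejected.get(), maxConcurrent.get(), bulkhead.availablePermits() };
    }
}
```

Reasonable invariants for `run(4, 8, 5000)` would be that completed plus rejected calls add up to 5000, that the observed concurrency never exceeds 4, and that all 4 permits are available again at the end.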

@jbee

jbee Apr 29, 2019

Author Contributor

I added a test that uses all annotations except @Timeout (since I did not want to get into real time waiting) and spawns a number of concurrent callers, each doing a number of calls. The tested method will fail hard every 3rd time and "soft" every 5th time (unless this is also every 3rd time). The attributes on the annotations are set in such a way that the failures have different effects: they definitely will cause retries, they definitely will cause circuit breaker transitions, and there is a good chance that even after retrying some calls (from the caller's point of view) fail entirely. This is nothing we can be sure of, though. The test asserts that the numbers make sense and that the end state is clean.

@jbee

Contributor Author

commented Apr 29, 2019

jenkins test please

@jbee

Contributor Author

commented Apr 29, 2019

@pdudits addressed all your comments. PR ready for re-review.

Pandrex247 added some commits Apr 30, 2019

@Pandrex247
Member

left a comment

Aside from this one comment and the couple of tiny things I've fixed for you, all seems bon :) Nicely done.

@arjantijms

Member

commented May 1, 2019

The single policy looks good, and may make it easier indeed to combine the several aspects of FT in a somewhat more coherent way.

I do have some reservations about hardcoding the FT asynchronous annotation here, and would have loved to see this working with any (CDI based) asynchronous annotation, without the policy having explicit knowledge about which one was used, but this reservation for now is not big enough to ask for changes.

}
}

public static final class PriorityLiteral extends AnnotationLiteral<Priority> implements Priority {

@arjantijms

arjantijms May 1, 2019

Member

Note to self and others: should propose literals to be in the FT spec for each annotation

@arjantijms

Member

commented May 1, 2019

Another general comment, also not strong enough to warrant requesting changes: what about renaming Policy to something a bit more descriptive, like FaultTolerancePolicy?

@jbee

Contributor Author

commented May 1, 2019

@arjantijms I had not forgotten your input from our show and tell. I was planning to do the "multi-annotation" support as an extra PR. I'll need some input from you on what the goal is. Maybe this is reasonable to handle by a separate jira where you can dump what you know about annotations that should work or just comment on PAYARA-3468?

On the Policy name: The name FaultTolerancePolicy was already taken :D. That is the overall policy combining the 6 possible policies.

@jbee

Contributor Author

commented May 1, 2019

Thanks @Pandrex247 for the fixes. I addressed your comment on the version and moved it to dependency management as discussed.

@Pandrex247

Member

commented May 1, 2019

Jenkins test please

@jbee

Contributor Author

commented May 1, 2019

jenkins test please

@Pandrex247 Pandrex247 merged commit 0aa6aa0 into payara:master May 1, 2019

59 checks passed

Payara Quick Build and Test Quick build and test passed!
security/snyk - api/payara-api/pom.xml (payara-ci) No new issues
security/snyk - api/pom.xml (payara-ci) No new issues
security/snyk - appserver/admin/pom.xml (payara-ci) No new issues
security/snyk - appserver/admingui/pom.xml (payara-ci) No new issues
security/snyk - appserver/ant-tasks/pom.xml (payara-ci) No new issues
security/snyk - appserver/appclient/pom.xml (payara-ci) No new issues
security/snyk - appserver/batch/pom.xml (payara-ci) No new issues
security/snyk - appserver/common/pom.xml (payara-ci) No new issues
security/snyk - appserver/concurrent/pom.xml (payara-ci) No new issues
security/snyk - appserver/connectors/pom.xml (payara-ci) No new issues
security/snyk - appserver/core/pom.xml (payara-ci) No new issues
security/snyk - appserver/deployment/pom.xml (payara-ci) No new issues
security/snyk - appserver/distributions/pom.xml (payara-ci) No new issues
security/snyk - appserver/ejb/pom.xml (payara-ci) No new issues
security/snyk - appserver/extras/pom.xml (payara-ci) No new issues
security/snyk - appserver/featuresets/pom.xml (payara-ci) No new issues
security/snyk - appserver/flashlight/pom.xml (payara-ci) No new issues
security/snyk - appserver/grizzly/pom.xml (payara-ci) No new issues
security/snyk - appserver/ha/pom.xml (payara-ci) No new issues
security/snyk - appserver/installer/pom.xml (payara-ci) No new issues
security/snyk - appserver/javaee-api/pom.xml (payara-ci) No new issues
security/snyk - appserver/jdbc/pom.xml (payara-ci) No new issues
security/snyk - appserver/jms/pom.xml (payara-ci) No new issues
security/snyk - appserver/load-balancer/pom.xml (payara-ci) No new issues
security/snyk - appserver/orb/pom.xml (payara-ci) No new issues
security/snyk - appserver/osgi-platforms/pom.xml (payara-ci) No new issues
security/snyk - appserver/packager/pom.xml (payara-ci) No new issues
security/snyk - appserver/payara-appserver-modules/pom.xml (payara-ci) No new issues
security/snyk - appserver/persistence/pom.xml (payara-ci) No new issues
security/snyk - appserver/pom.xml (payara-ci) No new issues
security/snyk - appserver/registration/pom.xml (payara-ci) No new issues
security/snyk - appserver/resources/pom.xml (payara-ci) No new issues
security/snyk - appserver/security/pom.xml (payara-ci) No new issues
security/snyk - appserver/tests/pom.xml (payara-ci) No new issues
security/snyk - appserver/transaction/pom.xml (payara-ci) No new issues
security/snyk - appserver/web/pom.xml (payara-ci) No new issues
security/snyk - appserver/webservices/pom.xml (payara-ci) No new issues
security/snyk - copyright/pom.xml (payara-ci) No new issues
security/snyk - nucleus/admin/pom.xml (payara-ci) No new issues
security/snyk - nucleus/cluster/pom.xml (payara-ci) No new issues
security/snyk - nucleus/common/pom.xml (payara-ci) No new issues
security/snyk - nucleus/core/pom.xml (payara-ci) No new issues
security/snyk - nucleus/deployment/pom.xml (payara-ci) No new issues
security/snyk - nucleus/diagnostics/pom.xml (payara-ci) No new issues
security/snyk - nucleus/distributions/pom.xml (payara-ci) No new issues
security/snyk - nucleus/flashlight/pom.xml (payara-ci) No new issues
security/snyk - nucleus/grizzly/pom.xml (payara-ci) No new issues
security/snyk - nucleus/hk2/pom.xml (payara-ci) No new issues
security/snyk - nucleus/osgi-platforms/pom.xml (payara-ci) No new issues
security/snyk - nucleus/packager/pom.xml (payara-ci) No new issues
security/snyk - nucleus/payara-modules/pom.xml (payara-ci) No new issues
security/snyk - nucleus/pom.xml (payara-ci) No new issues
security/snyk - nucleus/resources-l10n/pom.xml (payara-ci) No new issues
security/snyk - nucleus/resources/pom.xml (payara-ci) No new issues
security/snyk - nucleus/security/pom.xml (payara-ci) No new issues
security/snyk - nucleus/test-utils/pom.xml (payara-ci) No new issues
security/snyk - nucleus/tests/pom.xml (payara-ci) No new issues
security/snyk - pom.xml (payara-ci) No new issues