Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-9127 Component registry #6232

Merged

Conversation

danberindei
Copy link
Member

@danberindei danberindei commented Sep 3, 2018

https://issues.jboss.org/browse/ISPN-9127

I extracted the main functionality of the component registries in a BasicComponentRegistry interface, implemented by BasicComponentRegistryImpl.

  • It has no cache-specific logic or references to specific components.
  • All the dependencies are wired before being injected in another component. ComponentRef<T> fields or injection method parameters can be used as lazy dependencies to break dependency cycles. Lazy dependencies are only created and wired when the wired() method is called.
  • Start/stop priorities only have a meaning within a component, if a component has multiple start/stop methods. A component's start methods are always invoked after the start methods of its non-lazy dependencies.
  • registerComponent() doesn't overwrite existing components.
  • replaceComponent() replaces existing components, but it is intended only for tests.
  • getComponent() and registerComponent() return a ComponentRef<T>, and the component is always at least wired (all the dependencies have been injected). Use running() to get the actual instance, or wired() iff running() would create a cycle.

I also changed the behaviour of [Global]ComponentRegistry a bit:

  • getComponent() always tries to create and start the component (skips the start if the registry is only INSTANTIATED)
  • registerComponent() doesn't overwrite existing components

@danberindei danberindei force-pushed the ISPN-9127_component_registry branch 3 times, most recently from 7b52cea to 93f3743 Compare September 7, 2018 09:30
@wburns
Copy link
Member

wburns commented Sep 7, 2018

Quite a few test failures - also the point of this refactor is to ensure that all components are started before operations can begin? Is it not as simple as ensuring the start order is changed to not allow remote commands until other components are started?

@danberindei danberindei force-pushed the ISPN-9127_component_registry branch 3 times, most recently from 49b6795 to 23c91d8 Compare September 11, 2018 14:36
@tristantarrant
Copy link
Member

Needs rebase

@danberindei danberindei force-pushed the ISPN-9127_component_registry branch 7 times, most recently from 6f5cc8d to 62e2db5 Compare September 17, 2018 14:59
@gustavocoding
Copy link

gustavocoding commented Sep 17, 2018

I still have trouble starting servers. Not getting exception in the counters anymore, but one of the nodes get suspected and the state transfer times out preventing the cluster to form. I am gathering some logs...

@danberindei danberindei force-pushed the ISPN-9127_component_registry branch 5 times, most recently from 595c00e to 39ee2d2 Compare September 25, 2018 11:33
@danberindei danberindei changed the title [PREVIEW] ISPN-9127 Component registry ISPN-9127 Component registry Sep 26, 2018
@danberindei
Copy link
Member Author

Should be ok to integrate now, it passed the spark connector tests.
I removed the preview label and fleshed out the description a bit.

@tristantarrant
Copy link
Member

And CI says it's green !

@gustavocoding
Copy link

gustavocoding commented Oct 1, 2018

Tested again today and the deadlock is not happening anymore. Also, the NPE starting the server is related to other issue https://issues.jboss.org/browse/ISPN-9563, so LGTM

@danberindei danberindei force-pushed the ISPN-9127_component_registry branch 4 times, most recently from db50ad9 to 8c58bb1 Compare October 3, 2018 13:46
@gustavocoding gustavocoding self-assigned this Oct 3, 2018
@gustavocoding
Copy link

Let's wait for CI

@gustavocoding
Copy link

CI timed out on some query tests:

A few of those were thrown:

Exception in thread "remote-thread-MassIndexingTest-NodeB-p11241-t5" org.infinispan.IllegalLifecycleStateException: Cannot wire or start components while the registry is not running
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.prepareWrapperChange(BasicComponentRegistryImpl.java:610)

and the one that crashed:

Test org.infinispan.query.affinity.ShardDistributionTest.testAllocation has been running for more than 300 seconds. Interrupting the test thread and dumping thread stacks of the test suite process and its children.
Dumping thread stacks of process 17174 to /home/infinispan/workspace/Infinispan_PR-6232-CX6DTOWKPIX5PKDSWX2BR3HEF5425SISGE6LZF6JTVH5U7U5JLAQ/query/threaddump-org_infinispan_query_affinity_ShardDistributionTest_testAllocation-2018-10-04-17174.log

@gustavocoding
Copy link

needs rebase again

@danberindei
Copy link
Member Author

danberindei commented Oct 4, 2018

The core test timeouts are almost certainly related to the JGroups upgrade. I have reproduced them with trace logging and the problem is that JGroups isn't able to install a proper view after the coordinator leaves. Unfortunately I can't reproduce it with trace enabled for org.jgroups.protocols.UDP, so I'm not sure where the view message is lost.

10:06:21,444 TRACE (jgroups-10,Test-NodeD-37696:[]) [GMS] Test-NodeD-37696: mcasting view [Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941]
10:06:22,078 TRACE (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:06:36,043 TRACE (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:06:49,878 TRACE (jgroups-26,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|11] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:07:03,759 TRACE (jgroups-23,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|12] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|11] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:07:17,704 TRACE (jgroups-26,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|13] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|12] (2) [Test-NodeB-39137, Test-NodeC-57941]
...
10:10:17,435 TRACE (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|26] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|25] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:10:31,254 TRACE (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|27] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|26] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:10:45,102 TRACE (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|28] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|27] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:10:58,917 TRACE (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|29] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|28] (2) [Test-NodeB-39137, Test-NodeC-57941]
10:11:12,745 TRACE (jgroups-8,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|30] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|29] (2) [Test-NodeB-39137, Test-NodeC-57941]

The query ones are possibly related, I'm still investigating.

@gustavocoding
Copy link

gustavocoding commented Oct 4, 2018

I am trying to reproduce the query issue here Not related

@danberindei
Copy link
Member Author

Updated! I thought I'd fix the IllegalLifecycleStateException in the log as well, but it's taking longer than I thought so I'm going to make it a separate PR.

@danberindei danberindei added this to the 9.4.0.Final milestone Oct 4, 2018
Introduce BasicComponentRegistry, which handles dependency injection but
doesn't have any logic specific to caches or managers.

Each component has its own lifecycle status, and starting a component
also starts its dependencies. Components can start in parallel.

ComponentRegistry and GlobalComponentRegistry still need to maintain
their own lifecycle status, but it is now possible to start a cache
before all the global components have finished starting.
Also fix NullPointerException when stopping a cache with indexing
disabled.
@gustavocoding gustavocoding merged commit 8787441 into infinispan:master Oct 5, 2018
@gustavocoding
Copy link

Merged, thanks @danberindei !

@danberindei danberindei deleted the ISPN-9127_component_registry branch October 5, 2018 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants