FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

Pandrex247 · 2023-06-08T15:37:40Z

Description

Fixes the concurrent synchronisation issue with clustered singletons by applying the lock at the point of reading from Hazelcast, rather than at the point of executing the method.

Without this change multiple threads would read the singleton state from Hazelcast at the same time, and then lock/wait for the lock to release. This means that when the lock is released and they proceed with their own execution they won't have received the updates that any other execution on another thread will have written to Hazelcast.

The lock on reading from Hazelcast should still retain respect for definition of @AccessTimeouts on methods to prevent it locking indefinitely.

@lprimak if you're hovering about I know clustered singletons are your baby 😜

Important Info

Dependant PRs

None

Blockers

None

Testing

New tests

None

Testing Performed

Ran Petr's reproducer attached to the Jira issue (adjusted for Jakarta EE 10 and Java 11) on a single instance, adjusted with my own logging and so that it only removes 6 items per invocation. I also added a call for it to print out what it locally sees the sequence as in the browser.

RestResource.java

@Path("request")
@RequestScoped
public class RestResource {

    @Resource
    ManagedExecutorService executor;

    @EJB
    private SingletonClusteredEJB singletonClusteredEjb;

    @GET
    @Path("clustered")
    @Produces(MediaType.TEXT_PLAIN)
    public String getEjbClustered() {
        for (int i = 0; i < 6; i++) {
            executor.submit(() -> {
                System.out.println("!!!!!!!! Starting a new task, calling hello.");
                singletonClusteredEjb.hello();
            });
        }

        try {
            Thread.sleep(10000);
        } catch (InterruptedException interruptedException) {
            // Nom nom nom
        }

        return "Launched via @EJB, look at the Payara log. Sequence is: " + singletonClusteredEjb.getSequence();
    }
...

SingletonClusteredEJB.java

@Clustered
@Singleton
public class SingletonClusteredEJB implements Serializable {

    private Map<String, List<Long>> idCache = new ConcurrentHashMap<>();

    @Lock
    public void hello() {
        System.out.println("Start..." + this);
        
        List<Long> cachedIDs = idCache.get("sequence");
        if (cachedIDs == null) {
            cachedIDs = new ArrayList<>();
            idCache.put("sequence", cachedIDs);
        }
        if (cachedIDs.isEmpty()) {
            for (long i = 0; i < 20; i++) {
                cachedIDs.add(i);
            }
            System.out.println("############################################################\n"
                    + "  Repopulating sequence\n"
                    + "  ############################################################\n");
        }

        System.out.println("[ID] cached ids " + cachedIDs + ", this: " + this);

        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            Logger.getLogger(SingletonClusteredEJB.class.getName()).log(Level.SEVERE, null, ex);
        }

        Long removedId = cachedIDs.remove(0);
        System.out.println("[ID] removed ID " + removedId);
        System.out.println("...end.");
    }

    public String getSequence() {
        if (idCache == null) {
            return "Null!";
        }

        return Arrays.toString(idCache.get("sequence").toArray());
    }
...

Test 1 - DAS Default Config

Reploy reproducer to a clean DAS
Hit endpoint: http://localhost:8080/SimpleWAR/rest/request/clustered
Should get a sequence of [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Hit endpoint again, should get a sequence of [12, 13, 14, 15, 16, 17, 18, 19]

Test 2 - Deployment Group - Non-standard Executor

Create two instances in a deployment group
Configure both instances' EJB Container to use Hazelcast for persistence and availability
Configure both instances to use Hazelcast for EJB Timer
Deploy test application to the deployment group
Edit default managed executor service to have
- Core = 6
- Max = 12
- Queue = 18
Hit endpoint on both instances simultaneously http://localhost:28080/SimpleWAR/rest/request/clustered & http://localhost:28081/SimpleWAR/rest/request/clustered
Should get a sequence on both instances of [12, 13, 14, 15, 16, 17, 18, 19]
- You should also be able to see in the logs that only one invocation gets to execute at a time

Test 3 - Deployment Group - Non-standard Executor with AccessTimeout

Adjust the reproducer by adding @AccessTimeout(2000) to the hello method
Create two instances in a deployment group
Configure both instances' EJB Container to use Hazelcast for persistence and availability
Configure both instances to use Hazelcast for EJB Timer
Deploy test application to the deployment group
Edit default managed executor service to have
- Core = 6
- Max = 12
- Queue = 18
Hit endpoint on both instances simultaneously http://localhost:28080/SimpleWAR/rest/request/clustered & http://localhost:28081/SimpleWAR/rest/request/clustered
Should get a sequence on both instances of [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] (or something similar - it might also manage to remove the 3, but the result on both instances should be the same)
- You should also be able to see in the logs that only one invocation gets to execute at a time, and numerous access timeouts

Testing Environment

Windows 11, Zulu 11.0.19

Documentation

N/A

Notes for Reviewers

When I was testing, I dropped in Hazelcast 5.3.0 rather than the 5.2.2 still defined in the pom - this fixes the NPEs that get spammed from Hazelcast.

I left Bean Managed Concurrency alone. It should not lock on this and therefore the concurrency exceptions will still be present. I understood the spec description of BMC to mean that a user would need to order and control concurrent access rather than us.

…ingleton

…ingleton When @clustered @singleton service was executed from different threads, the initialization created new instances.

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

Pandrex247 · 2023-06-08T16:32:12Z

Jenkins test please

lprimak · 2023-06-08T17:51:27Z

have you looked at CDI version? Does that work correctly?

Otherwise LGTM, good job!

lprimak · 2023-06-08T22:15:13Z

Looks like the CDI version is susceptible to the same issue.
Here is my suggested implementation:

See Pandrex247#4

updated CDI version of clustered singletons

Pandrex247 · 2023-06-13T16:38:03Z

@lprimak
After some testing CDI is indeed susceptible to the same issue.
Tested before and after your PR and your PR indeed seems to fix it - thanks!

Test executed

ApplicationScoped Bean

@Clustered(lock = DistributedLockType.LOCK)
@ApplicationScoped
public class ClusteredApplicationBean implements Serializable {
    private Map<String, List<Long>> idCache = new ConcurrentHashMap<>();

    public void hello() {
        System.out.println("Start..." + this);
        
        List<Long> cachedIDs = idCache.get("sequence");
        if (cachedIDs == null) {
            cachedIDs = new ArrayList<>();
            idCache.put("sequence", cachedIDs);
        }
        if (cachedIDs.isEmpty()) {
            for (long i = 0; i < 20; i++) {
                cachedIDs.add(i);
            }
            System.out.println("############################################################\n"
                    + "  Repopulating sequence\n"
                    + "  ############################################################\n");
        }

        System.out.println("[ID] cached ids " + cachedIDs + ", this: " + this);

        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            Logger.getLogger(ClusteredApplicationBean.class.getName()).log(Level.SEVERE, null, ex);
        }

        Long removedId = cachedIDs.remove(0);
        System.out.println("[ID] removed ID " + removedId);
        System.out.println("...end.");
    }

    public String getSequence() {
        if (idCache == null) {
            return "Null!";
        }

        return Arrays.toString(idCache.get("sequence").toArray());
    }
}

REST endpoint

@Path("request")
@RequestScoped
public class RestResource {
    @Resource
    ManagedExecutorService executor;

    public RestResource() {
    }

    @Inject
    ClusteredApplicationBean clusteredApplicationBean;

    @GET
    @Path("clustered-application")
    @Produces(MediaType.TEXT_PLAIN)
    public String getClusteredApplication() {
        for (int i = 0; i < 6; i++) {
            executor.submit(() -> {
                System.out.println("!!!!!!!! Starting a new task, calling hello.");
                clusteredApplicationBean.hello();
            });
        }

        try {
            Thread.sleep(10000);
        } catch (InterruptedException interruptedException) {
            // Nom nom nom
        }

        return "Launched @ApplicationScoped bean, look at the Payara log. Sequence is: " + clusteredApplicationBean.getSequence();
    }

Deployment Group test utilising standard Managed Executor

Create two instances in a deployment group
Configure both instances' EJB Container to use Hazelcast for persistence and availability
Configure both instances to use Hazelcast for EJB Timer
Deploy test application to the deployment group
Hit endpoint on both instances simultaneously http://localhost:28080/SimpleWAR/rest/request/clustered-application & http://localhost:28081/SimpleWAR/rest/request/clustered-application
Should get a sequence on both instances of [12, 13, 14, 15, 16, 17, 18, 19]
- You should also be able to see in the logs that only one invocation gets to execute at a time

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

Remove static imports, move methods, and add explanative comment Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

Pandrex247 · 2023-06-15T14:46:31Z

Jenkins test please

aubi

Tested, went carefully thought the code, discussed...
Thanks for the explanatory comments, @Pandrex247!
LGTM

aubi and others added 5 commits June 8, 2023 15:34

FISH-6291 fix creation of duplicate instances annotated @clustered @s…

11573c6

…ingleton When @clustered @singleton service was executed from different threads, the initialization created new instances.

Comment typo

40ebbdf

FISH-6291 Lock around reading and writing singleton context

4a4a4cc

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

FISH-6291 Remove duplicate code and cleanup

368804e

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

FISH-6291 Adjust locking to only lock once and respect timeouts

9633d77

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

Pandrex247 requested review from pdudits and aubi June 8, 2023 15:37

lprimak and others added 2 commits June 8, 2023 18:50

fixed CDI version of clustered singletons sync issues

e11c423

Merge pull request #4 from flowlogix/clustered-singleton-cdi-sync

11bd9c0

updated CDI version of clustered singletons

Copyright

cd8f151

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

lprimak approved these changes Jun 14, 2023

View reviewed changes

Pandrex247 added 2 commits June 15, 2023 14:50

FISH-6291 Petr review changes

a40fa34

Remove static imports, move methods, and add explanative comment Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

FISH-6291 More static import removals

01c9341

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>

aubi approved these changes Jun 15, 2023

View reviewed changes

Pandrex247 merged commit 0240e63 into payara:master Jun 16, 2023
1 check passed

Pandrex247 deleted the FISH-6291-Community branch June 16, 2023 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

Pandrex247 commented Jun 8, 2023 •

edited

Loading

Pandrex247 commented Jun 8, 2023

lprimak commented Jun 8, 2023 •

edited

Loading

lprimak commented Jun 8, 2023 •

edited

Loading

Pandrex247 commented Jun 13, 2023

Pandrex247 commented Jun 15, 2023

aubi left a comment

FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

Conversation

Pandrex247 commented Jun 8, 2023 • edited Loading

Description

Important Info

Dependant PRs

Blockers

Testing

New tests

Testing Performed

Test 1 - DAS Default Config

Test 2 - Deployment Group - Non-standard Executor

Test 3 - Deployment Group - Non-standard Executor with AccessTimeout

Testing Environment

Documentation

Notes for Reviewers

Pandrex247 commented Jun 8, 2023

lprimak commented Jun 8, 2023 • edited Loading

lprimak commented Jun 8, 2023 • edited Loading

Pandrex247 commented Jun 13, 2023

Test executed

ApplicationScoped Bean

REST endpoint

Deployment Group test utilising standard Managed Executor

Pandrex247 commented Jun 15, 2023

aubi left a comment

Choose a reason for hiding this comment

Pandrex247 commented Jun 8, 2023 •

edited

Loading

lprimak commented Jun 8, 2023 •

edited

Loading

lprimak commented Jun 8, 2023 •

edited

Loading