Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FISH-6291 Fix Data Synchronization Issue in a @Clustered @Singleton #6302

Merged
merged 10 commits into from
Jun 16, 2023

Conversation

Pandrex247
Copy link
Member

@Pandrex247 Pandrex247 commented Jun 8, 2023

Description

Fixes the concurrent synchronisation issue with clustered singletons by applying the lock at the point of reading from Hazelcast, rather than at the point of executing the method.

Without this change multiple threads would read the singleton state from Hazelcast at the same time, and then lock/wait for the lock to release. This means that when the lock is released and they proceed with their own execution they won't have received the updates that any other execution on another thread will have written to Hazelcast.

The lock on reading from Hazelcast should still retain respect for definition of @AccessTimeouts on methods to prevent it locking indefinitely.

@lprimak if you're hovering about I know clustered singletons are your baby 😜

Important Info

Dependant PRs

None

Blockers

None

Testing

New tests

None

Testing Performed

Ran Petr's reproducer attached to the Jira issue (adjusted for Jakarta EE 10 and Java 11) on a single instance, adjusted with my own logging and so that it only removes 6 items per invocation. I also added a call for it to print out what it locally sees the sequence as in the browser.

RestResource.java

@Path("request")
@RequestScoped
public class RestResource {

    @Resource
    ManagedExecutorService executor;

    @EJB
    private SingletonClusteredEJB singletonClusteredEjb;

    @GET
    @Path("clustered")
    @Produces(MediaType.TEXT_PLAIN)
    public String getEjbClustered() {
        for (int i = 0; i < 6; i++) {
            executor.submit(() -> {
                System.out.println("!!!!!!!! Starting a new task, calling hello.");
                singletonClusteredEjb.hello();
            });
        }

        try {
            Thread.sleep(10000);
        } catch (InterruptedException interruptedException) {
            // Nom nom nom
        }

        return "Launched via @EJB, look at the Payara log. Sequence is: " + singletonClusteredEjb.getSequence();
    }
...

SingletonClusteredEJB.java

@Clustered
@Singleton
public class SingletonClusteredEJB implements Serializable {

    private Map<String, List<Long>> idCache = new ConcurrentHashMap<>();

    @Lock
    public void hello() {
        System.out.println("Start..." + this);
        
        List<Long> cachedIDs = idCache.get("sequence");
        if (cachedIDs == null) {
            cachedIDs = new ArrayList<>();
            idCache.put("sequence", cachedIDs);
        }
        if (cachedIDs.isEmpty()) {
            for (long i = 0; i < 20; i++) {
                cachedIDs.add(i);
            }
            System.out.println("############################################################\n"
                    + "  Repopulating sequence\n"
                    + "  ############################################################\n");
        }

        System.out.println("[ID] cached ids " + cachedIDs + ", this: " + this);

        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            Logger.getLogger(SingletonClusteredEJB.class.getName()).log(Level.SEVERE, null, ex);
        }

        Long removedId = cachedIDs.remove(0);
        System.out.println("[ID] removed ID " + removedId);
        System.out.println("...end.");
    }

    public String getSequence() {
        if (idCache == null) {
            return "Null!";
        }

        return Arrays.toString(idCache.get("sequence").toArray());
    }
...

Test 1 - DAS Default Config

Test 2 - Deployment Group - Non-standard Executor

  • Create two instances in a deployment group
  • Configure both instances' EJB Container to use Hazelcast for persistence and availability
  • Configure both instances to use Hazelcast for EJB Timer
  • Deploy test application to the deployment group
  • Edit default managed executor service to have
    • Core = 6
    • Max = 12
    • Queue = 18
  • Hit endpoint on both instances simultaneously http://localhost:28080/SimpleWAR/rest/request/clustered & http://localhost:28081/SimpleWAR/rest/request/clustered
  • Should get a sequence on both instances of [12, 13, 14, 15, 16, 17, 18, 19]
    • You should also be able to see in the logs that only one invocation gets to execute at a time

Test 3 - Deployment Group - Non-standard Executor with AccessTimeout

  • Adjust the reproducer by adding @AccessTimeout(2000) to the hello method
  • Create two instances in a deployment group
  • Configure both instances' EJB Container to use Hazelcast for persistence and availability
  • Configure both instances to use Hazelcast for EJB Timer
  • Deploy test application to the deployment group
  • Edit default managed executor service to have
    • Core = 6
    • Max = 12
    • Queue = 18
  • Hit endpoint on both instances simultaneously http://localhost:28080/SimpleWAR/rest/request/clustered & http://localhost:28081/SimpleWAR/rest/request/clustered
  • Should get a sequence on both instances of [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] (or something similar - it might also manage to remove the 3, but the result on both instances should be the same)
    • You should also be able to see in the logs that only one invocation gets to execute at a time, and numerous access timeouts

Testing Environment

Windows 11, Zulu 11.0.19

Documentation

N/A

Notes for Reviewers

When I was testing, I dropped in Hazelcast 5.3.0 rather than the 5.2.2 still defined in the pom - this fixes the NPEs that get spammed from Hazelcast.

I left Bean Managed Concurrency alone. It should not lock on this and therefore the concurrency exceptions will still be present. I understood the spec description of BMC to mean that a user would need to order and control concurrent access rather than us.

aubi and others added 5 commits June 8, 2023 15:34
…ingleton

When @clustered @singleton service was executed from different threads,
the initialization created new instances.
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
@Pandrex247 Pandrex247 requested review from pdudits and aubi June 8, 2023 15:37
@Pandrex247
Copy link
Member Author

Jenkins test please

@lprimak
Copy link
Contributor

lprimak commented Jun 8, 2023

have you looked at CDI version? Does that work correctly?

Otherwise LGTM, good job!

@lprimak
Copy link
Contributor

lprimak commented Jun 8, 2023

Looks like the CDI version is susceptible to the same issue.
Here is my suggested implementation:

See Pandrex247#4

@Pandrex247
Copy link
Member Author

@lprimak
After some testing CDI is indeed susceptible to the same issue.
Tested before and after your PR and your PR indeed seems to fix it - thanks!

Test executed

ApplicationScoped Bean

@Clustered(lock = DistributedLockType.LOCK)
@ApplicationScoped
public class ClusteredApplicationBean implements Serializable {
    private Map<String, List<Long>> idCache = new ConcurrentHashMap<>();

    public void hello() {
        System.out.println("Start..." + this);
        
        List<Long> cachedIDs = idCache.get("sequence");
        if (cachedIDs == null) {
            cachedIDs = new ArrayList<>();
            idCache.put("sequence", cachedIDs);
        }
        if (cachedIDs.isEmpty()) {
            for (long i = 0; i < 20; i++) {
                cachedIDs.add(i);
            }
            System.out.println("############################################################\n"
                    + "  Repopulating sequence\n"
                    + "  ############################################################\n");
        }

        System.out.println("[ID] cached ids " + cachedIDs + ", this: " + this);

        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            Logger.getLogger(ClusteredApplicationBean.class.getName()).log(Level.SEVERE, null, ex);
        }

        Long removedId = cachedIDs.remove(0);
        System.out.println("[ID] removed ID " + removedId);
        System.out.println("...end.");
    }

    public String getSequence() {
        if (idCache == null) {
            return "Null!";
        }

        return Arrays.toString(idCache.get("sequence").toArray());
    }
}

REST endpoint

@Path("request")
@RequestScoped
public class RestResource {
    @Resource
    ManagedExecutorService executor;

    public RestResource() {
    }

    @Inject
    ClusteredApplicationBean clusteredApplicationBean;

    @GET
    @Path("clustered-application")
    @Produces(MediaType.TEXT_PLAIN)
    public String getClusteredApplication() {
        for (int i = 0; i < 6; i++) {
            executor.submit(() -> {
                System.out.println("!!!!!!!! Starting a new task, calling hello.");
                clusteredApplicationBean.hello();
            });
        }

        try {
            Thread.sleep(10000);
        } catch (InterruptedException interruptedException) {
            // Nom nom nom
        }

        return "Launched @ApplicationScoped bean, look at the Payara log. Sequence is: " + clusteredApplicationBean.getSequence();
    }

Deployment Group test utilising standard Managed Executor

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Remove static imports, move methods, and add explanative comment

Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
Signed-off-by: Andrew Pielage <pandrex247@hotmail.com>
@Pandrex247
Copy link
Member Author

Jenkins test please

Copy link
Contributor

@aubi aubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, went carefully thought the code, discussed...
Thanks for the explanatory comments, @Pandrex247!
LGTM

@Pandrex247 Pandrex247 merged commit 0240e63 into payara:master Jun 16, 2023
1 check passed
@Pandrex247 Pandrex247 deleted the FISH-6291-Community branch June 16, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants