Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow concurrent additions to a single parent resource #2021

Merged
merged 8 commits into from
Feb 2, 2023

Conversation

whikloj
Copy link
Collaborator

@whikloj whikloj commented Jan 13, 2023

JIRA Ticket: https://fedora-repository.atlassian.net/browse/FCREPO-3800

Creating as draft until we figure out #2019

What does this Pull Request do?

  • Adds a new ResourceLockType enum to differentiate between exclusive and non-exclusive locks.
  • Adds a new ResourceLock interface and implementation which stores the transaction ID, resource ID and ResourceLockType
  • Alters the Transaction interface and implementation to add a new lockResourceNonExclusive (the default is still an exclusive lock)
  • Alters the InMemoryLockManager to allow multiple resources to hold a non-exclusive lock on the same resource, but does not allow a non-exclusive and exclusive lock to occur on the same resource at the same time (unless it is the same transaction, i.e. create and delete a resource in a single transaction.)

How should this be tested?

  1. Create an object
    curl -v -XPUT -ufedoraAdmin:fedoraAdmin http://localhost:8080/rest/A
    
  2. Create 1 long-running transaction
    curl -v -XPOST -ufedoraAdmin:fedoraAdmin -XPOST http://localhost:8080/rest/fcr:tx
    
  3. Do a POST to the new object inside the transaction
    curl -v -XPOST -ufedoraAdmin:fedoraAdmin -XPOST -H"Atomic-ID: http://localhost:8080/rest/fcr:tx/<what ever your transaction ID is>" http://localhost:8080/rest/A
    
  4. Do a similar POST without the long running transaction
    curl -v -XPOST -ufedoraAdmin:fedoraAdmin -XPOST http://localhost:8080/rest/A
    

With this PR step 4 succeeds, without you get a 409 Conflict and message like Cannot update info:fedora/A because it is being updated by another transaction.

If (with this PR) you commit your transaction

curl -v -XPUT -ufedoraAdmin:fedoraAdmin http://localhost:8080/rest/fcr:tx/<whatever your transaction ID is>

you will see two children for http://localhost:8080/rest/A

Interested parties

Tag (@ mention) interested parties or, if unsure, @fcrepo/committers

Copy link
Collaborator

@pwinckles pwinckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Thanks for working on this

private Object acquireInternalLock(final String resourceId) {
return internalResourceLocks.computeIfAbsent(resourceId, key -> new Object());
private Object acquireInternalLock(final String txId, final String resourceId, final ResourceLockType lockType) {
return internalResourceLocks.computeIfAbsent(resourceId,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this will work correctly. The purpose of the internal lock was to synchronize the internal lock acquisition at the resource level. That it to say, there can only ever be one thread attempting to acquire a lock on a specific resource at a time, but multiple threads could be concurrently acquiring locks on different resources.

These code changes will create a new internal lock for every thread, which has the affect of not synchronizing the lock acquisition at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand what you mean. If a thread calls this method to acquire the lock defined by the key resourceID how does storing a ResourceLock differ from storing a new Object(). I don't see how the old code actually stopped a second thread from acquiring the same lock.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code was synchronizing on a single lock instance per resource. You can synchronize on a ResourceLock instance instead, if you want, but the important part is that the synchronization happens on the same exact instance in all threads. Otherwise, the lock won't work.

That aside, I don't have this code pulled down currently, but it seems like this method might be returning a boolean now, which is the output of Set.add(). If that is true, it actually means that the current code is synchronizing across all threads regardless of the resource they're attempting to lock.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am synchronizing on a single ResourceLock instance for each resource, won't that mean I can't have concurrent non-exclusive locks? Do I need to have the ResourceLock contain all the locks for a resource? i.e.

class ResourceLock {
    private String resourceId;
    private Map<String, ResourceLockType> transactionLocks;
    ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, for the internal lock it doesn't actually matter what the object is, which is why it was originally using Object. The point of the internal lock is so that one thread can safely acquire the resource lock without a race condition. The internal lock is only held long enough for a thread to either acquire the resource lock that it's attempting to acquire or to find out that it's not allowed to have a lock on the resource.

For example, take the following alternate implementation:

public class InMemoryResourceLockManager {
    private synchronized void acquireInternal(final String txId, final FedoraId resourceId, final ResourceLockType lockType) {
     // code goes here
    }
}

In that example, Java will only allow a single thread to enter the acquireInternal() method at a time. That code is equivalent to:

public class InMemoryResourceLockManager {
    private void acquireInternal(final String txId, final FedoraId resourceId, final ResourceLockType lockType) {
        syncronized (this) {
            // code goes here
         }
    }
}

You could write the lock manager like that, and it would work correctly. The downside of that approach is that every thread would block on that method regardless of the resource they were trying to lock. So, instead of using a single internal lock guard lock acquisition for all resource, the intent was to use a per-resource internal lock, so that there is only contention between threads that are attempting to lock the same resource. To continue the example, this is the equivalent of:

public class InMemoryResourceLockManager {
    private void acquireInternal(final String txId, final FedoraId resourceId, final ResourceLockType lockType) {
        syncronized (internalResourceLocks.computeIfAbsent(resourceId, key -> new Object())) {
            // code goes here
         }
    }
}

The point is, that it is only safe for one thread at a time to be executing the logic that awards or declines a lock on a resource, so you need a way to synchronize the lock acquisition code.

Does that make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I think there's a bug in the way the original code worked that could result in two different threads independently holding different internal locks on the same resource...

I'll think about it a little more this weekend. For sure the safest thing to do would be to globally synchronize lock acquisition and release. It would perform a little worse than synchronizing by resource, but it certainly wouldn't be the largest latency source in Fedora.

@whikloj
Copy link
Collaborator Author

whikloj commented Jan 13, 2023

Force push after rebase on main again.

@andreasnef
Copy link

Hi @whikloj
I did run our tests today and they look good when using this PR.
Best, Andreas

@whikloj whikloj marked this pull request as ready for review January 23, 2023 22:26
@whikloj
Copy link
Collaborator Author

whikloj commented Jan 24, 2023

@fcrepo/committers now both myself and @pwinckles have our hands in this PR, so we'll need another reviewer.

@bbpennel bbpennel self-assigned this Jan 30, 2023
Copy link
Collaborator

@bbpennel bbpennel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading of this looks good over all, added a few thoughts and suggestions

}

/**
* Ghost nodes are special paths that become part of a single resource, so you cannot have the below situation of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ghost nodes are kind of mysterious and I don't see a page dedicated to them in confluence, we should probably add documentation discussing what they are and what restrictions apply to them. Would this test have the same expectations without transactions (only one thread succeeding)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


@Test
public void testExclusiveLockString() {
testLockString("exclusive", ResourceLockType.EXCLUSIVE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm being honest, I'm having a really hard time parsing what these test methods are evaluating. After numerous re-reads I think its just confirming that using the constructor with a string or enum produce the same results, but I think the combination of instance variables being compared against local objects instantiated with a mixture of the same instance and local variables is hurting my brain. Maybe getting rid of testLockString and testLockType would reduce the cognitive load some? (and might make the test class slightly shorter). Or it might just a personal problem, so feel free to disregard.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to remove them, I wondered after staring at them for a while if they had any value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests were just to make me comfortable that the class was tracking appropriately. All it confirms is that the information given is the same as you get back. I'm actually fine to remove the entire test class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with keeping the test, doesn't hurt to have direct unit test coverage of the class and I find it more readable now, but I won't protest if you want to remove it

@andreasnef
Copy link

FWIW: Thomas asked me to do another test run with last week's changes which I did and can confirm that they fail as expected for 6.2 and 6.3, but are successful with this PR.

@whikloj
Copy link
Collaborator Author

whikloj commented Feb 1, 2023

I covered all the code review comments except switching out String for FedoraId in the first commit, the second does the switch so if we don't like it we can revert easily.

bbpennel
bbpennel previously approved these changes Feb 2, 2023
}

/**
* Ghost nodes are special paths that become part of a single resource, so you cannot have the below situation of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bbpennel bbpennel merged commit 780eff4 into fcrepo:main Feb 2, 2023
@whikloj whikloj deleted the FCREPO-3800 branch February 6, 2023 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants