
Fix unnecessary locking in Authentication Service; improve Publish #6918

Closed
landreev opened this issue May 19, 2020 · 1 comment · Fixed by #7118

Comments

@landreev
Contributor

landreev commented May 19, 2020

See IQSS/dataverse.harvard.edu#73 for the details on the investigation. AuthenticationServiceBean, being a @Singleton without any @Lock or @ConcurrencyManagement annotations, defaults to being a locking time bomb, with the capacity to paralyze the entire application. The simple solution is to move the methods that don't need to be locking out of the service and into a regular @Stateless bean. Though it'll probably be cleaner to create a new @Singleton bean - AuthenticationProviderRegistrationService? - and move the provider registration methods there (the original intended purpose of the service bean), and leave the methods that do "normal" work in AuthenticationService, changing it to @Stateless.
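Roughly, a sketch of the split I have in mind - the annotations are the standard javax.ejb ones, but the class layout and all names other than AuthenticationServiceBean are placeholders, not an actual design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.ejb.Stateless;

// Today's behavior: a @Singleton with no @Lock / @ConcurrencyManagement
// annotations gets container-managed concurrency, and every business method
// is implicitly @Lock(LockType.WRITE) - one caller at a time, everyone else
// queues up (and eventually times out) behind it.

// Possible split: the registry keeps the singleton semantics it actually
// needs, with lookups explicitly marked as non-exclusive reads.
@Singleton
@Startup
public class AuthenticationProviderRegistrationService {

    private final Map<String, Object> providers = new ConcurrentHashMap<>();

    @Lock(LockType.READ)   // concurrent lookups allowed
    public Object getProvider(String id) {
        return providers.get(id);
    }

    @Lock(LockType.WRITE)  // only (de)registration serializes callers
    public void registerProvider(String id, Object provider) {
        providers.put(id, provider);
    }
}

// The "normal work" methods move to a plain @Stateless bean - no instance
// locking at all; the container just pools instances.
@Stateless
class AuthenticationService {
    // user lookups, conversions, etc. go here, unannotated and unlocked
}
```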

Part 2: We were exceptionally lucky that this wasn't happening with any regularity before, because we are more or less ASKING FOR IT. If publishing a dataset in synchronous mode takes any time at all, the spinner goes away very quickly, and then "nothing happens" - which all but encourages the user to click the Publish button again. Making the service methods above non-locking will not stop these multiple Publish commands from blocking each other; but it will of course prevent them from locking up the entire application.
Two things:

  1. The "spinner disappearing too quickly" appears to be unique to our prod. setup (a minute or so?) It goes on for much longer on my dev. box; so it must be something specific to how our production server is (mis)configured behind Apache. Is there an issue for that? - I couldn't find one right away. But let's investigate that - it's likely messing up other updates and such.
  2. Publish NEEDS A LOCK!! Even if the spinner kept going for an hour - a user could still reload the page and try again; or end up doing it from a different window, etc. We only lock the dataset when we go into the async. mode for the file registration. We just need to lock it every time we publish, the same way we do with EditInProgress when saving changes.
    The dataset that was causing havoc yesterday has 10 files - right under the cutoff, so it was being done synchronously, and they appear to have tried to publish it 4 times, in parallel.
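A rough sketch of the guard I mean in item 2 - purely illustrative, with placeholder names; the real fix would of course go through the existing DatasetLock machinery (the EditInProgress pattern), not an in-memory map:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, self-contained illustration of the check-then-lock-then-release
// flow: take a per-dataset "publish in progress" lock before running the
// synchronous publish steps, so a second click, a page reload, or a second
// window cannot start a parallel publish.
class PublishGuard {

    // datasetId -> userId currently publishing it
    private final ConcurrentMap<Long, String> publishLocks = new ConcurrentHashMap<>();

    void publish(long datasetId, String userId, Runnable publishSteps) {
        // atomic check-and-lock: any concurrent attempt fails here
        String holder = publishLocks.putIfAbsent(datasetId, userId);
        if (holder != null) {
            throw new IllegalStateException(
                "Dataset " + datasetId + " is already being published by " + holder);
        }
        try {
            publishSteps.run();             // the existing synchronous publish command
        } finally {
            publishLocks.remove(datasetId); // always release, even on failure
        }
    }
}
```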

I suggest we address both the evil singleton and the "publish lock" part above in one PR - because why not.

@djbrooke
Contributor

djbrooke commented May 20, 2020

  • When this is complete, we'll patch production (as Harvard Dataverse is the only installation we know of experiencing this).
  • The patch will not include a code change for locking upon publish (settings change only); the code change will be added in the PR that will be included in Dataverse 5.
  • We'll create another issue for applying the patch (#6922, "Production patch for 4.20, 6918") and pick a time that's minimally disruptive.

@djbrooke djbrooke added this to the Dataverse 5 milestone Jun 4, 2020
@landreev landreev moved this from Up Next 🛎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jun 23, 2020
@landreev landreev self-assigned this Jun 23, 2020
landreev added a commit that referenced this issue Jul 20, 2020
… conditions.

MUCH SIMPLIFIED, in terms of the number of files that had to be modified; but should be
even more effective in ensuring that the service should never become locked under
normal operating conditions. (#6918)
landreev added a commit that referenced this issue Jul 23, 2020
landreev added a commit that referenced this issue Jul 27, 2020
landreev added a commit that referenced this issue Jul 27, 2020
…FT" to the url (?).

Was pre-existing - but why not. (#6918)
landreev added a commit that referenced this issue Jul 27, 2020
landreev added a commit that referenced this issue Jul 29, 2020
landreev added a commit that referenced this issue Aug 5, 2020
…for RestAssured to work;

but also some changes for the application proper to address some problematic things in the
execution framework of the publish commands. Will explain more in the PR. (#7159)