Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScopeManager can get confused when dealing with a heavy thread load. #2248

Closed
MarkTJohnson opened this issue Dec 3, 2021 · 19 comments · Fixed by #3240
Closed

ScopeManager can get confused when dealing with a heavy thread load. #2248

MarkTJohnson opened this issue Dec 3, 2021 · 19 comments · Fixed by #3240

Comments

@MarkTJohnson
Copy link
Contributor

Environment

Liquibase Version: 4.5

Liquibase Integration & Version: gradle

Liquibase Extension(s) & Version:

Database Vendor & Version: Sql Server 2018 at least

Operating System Type & Version: Linux inside Kubernetes pods

Description

We use Liquibase to perform migrations in a service hosted in a Kubernetes cluster, our service is multithreaded and we can updated several thousand databases at a time. We were on Liquibase 3.10 but were getting the occasional concurrentModification exception. I updated our service to utilize version 4.5 because it looked like the newer version handled that exception. I deployed the changes yesterday and saw 3557 migration failures when the processing window opened. This was the error (the scope names were all different)

liquibase.exception.LiquibaseException: java.lang.RuntimeException: Cannot end scope ehscwwaeie when currently at scope otgxsyxdtb at liquibase.Liquibase.runInScope(Liquibase.java:2409) at liquibase.Liquibase.update(Liquibase.java:211) at liquibase.Liquibase.update(Liquibase.java:197) at liquibase.Liquibase.update(Liquibase.java:193) at liquibase.Liquibase.update(Liquibase.java:185) at com.concur.reportdbservice.manager.ReportingSchemaMgr.update(ReportingSchemaMgr.java:66) at com.concur.reportdbservice.service.ReportingSchemaService.updateDatabase(ReportingSchemaService.java:46) at com.concur.reportdbservice.service.MigTaskService.entityMigrate(MigTaskService.java:48) at com.concur.reportdbservice.thread.MigMigrateJob.executeJob(MigMigrateJob.java:27) at com.concur.reportdbservice.thread.Job.run(Job.java:46) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.base/java.util.concurrent.FutureTask.run(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: Cannot end scope ehscwwaeie when currently at scope otgxsyxdtb at liquibase.Scope.exit(Scope.java:226) at liquibase.Scope.child(Scope.java:188) at liquibase.Scope.child(Scope.java:176) at liquibase.Scope.child(Scope.java:155) at liquibase.Liquibase.runInScope(Liquibase.java:2404)

Steps To Reproduce

We have 8 processing pods that send a request to 3 pods which use liquibase to perform the schema updates.
Each of the liquibase service pods runs a service that creates a new thread for each request coming in.

Actual Behavior

The error message was recorded above

Expected/Desired Behavior

We can run a heavy load of schema migrations successfully like was the case in version 3.10

@molivasdat
Copy link
Contributor

HI @MarkTJohnson Thanks for alerting us to this issue. We will add it to the list of issues that we are processing.

@ptkrv
Copy link

ptkrv commented Jan 11, 2022

We've faced the same problem too but we use MySQL and maven

@molivasdat
Copy link
Contributor

Hi @ptkrv Thanks for alerting us of the issue. Can you share how you are executing or using Liquibase? Are you using Kubernetes, pods and services as well or what is your use case with MySQL and maven?

@adamjones0101
Copy link

We are also experiencing this issue. We use Liquibase to perform upgrades in a Kubernetes environment on Linux. Our service is multithreaded and we usually upgrade only 10 to 50 databases in parallel yet this error is pervasive. We have used Liquibase version 3.8.5 for years without any issues but all of our attempts to upgrade to various levels of Version 4 have failed with this error. We use liquibase jar files, create a new Liquibase object and call the update method. PLEASE FIX THIS!!! This is a major regression for us. I am representing a fortune 500 company that depends on liquibase to maintain database integrity for our customers.

@ananni13
Copy link

ananni13 commented Apr 5, 2022

Any plans/updates?

This basically prevents any concurrent usage of the library, it's a pretty major regression.

@kataggart kataggart added this to To Do in Conditioning++ via automation Apr 5, 2022
@kataggart kataggart added the k8s label Apr 5, 2022
@adamjones0101
Copy link

The fix is actually quite trivial. In SingletonScopeManager just change "private Scope currentScope" to "private ThreadLocal currentScope = new ThreadLocal<>()" and make the appropriate changes to the getter and setter methods.

@nvoxland
Copy link
Contributor

nvoxland commented Apr 5, 2022

Changing it from a global scope to a thread scope would impact people using the threads differently. We have that SingletonScopeManager named as such under the plan to eventually make a ThreadLocalScopeManager for times that works better.

With the setups you have, does it work to make a class like:

public class ThreadLocalScopeManager extends ScopeManager {

    private ThreadLocal<Scope> currentScope = new ThreadLocal<>();

    @Override
    public synchronized Scope getCurrentScope() {
        return currentScope.get();
    }

    @Override
    protected Scope init(Scope scope) throws Exception {
        return scope;
    }

    @Override
    protected synchronized void setCurrentScope(Scope scope) {
        this.currentScope.set(scope);
    }
}

and then before calling anything else in Liquibase, call Scope.setScopeManager(new ThreadLocalScopeManager()) ?

@kataggart kataggart moved this from To Do to In discussion in Conditioning++ Apr 5, 2022
@Broscious
Copy link

For what it's worth, I ran into a similar issue with a similar use case of multi-threading upgrades moving from 3.8.9 to 4.8.0.

It seems your suggested @nvoxland caused us a stack overflow due to what looks like some license check.

Exception in thread "tenantDBChangeManagerPool-18" java.lang.StackOverflowError
	at liquibase.util.ObjectUtil.convert(ObjectUtil.java:261)
	at liquibase.util.SmartMap.get(SmartMap.java:46)
	at liquibase.Scope.get(Scope.java:279)
	at liquibase.Scope.getSingleton(Scope.java:316)
	at liquibase.configuration.ConfigurationDefinition.getCurrentConfiguredValue(ConfigurationDefinition.java:87)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsInstalled(Unknown Source)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsValid(Unknown Source)
	at liquibase.configuration.pro.EnvironmentValueProvider.lookupProvidedValue(Unknown Source)
	at liquibase.configuration.AbstractMapConfigurationValueProvider.getProvidedValue(AbstractMapConfigurationValueProvider.java:52)
	at liquibase.configuration.LiquibaseConfiguration.getCurrentConfiguredValue(LiquibaseConfiguration.java:123)
	at liquibase.configuration.ConfigurationDefinition.getCurrentConfiguredValue(ConfigurationDefinition.java:93)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsInstalled(Unknown Source)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsValid(Unknown Source)
	at liquibase.configuration.pro.EnvironmentValueProvider.lookupProvidedValue(Unknown Source)
	at liquibase.configuration.AbstractMapConfigurationValueProvider.getProvidedValue(AbstractMapConfigurationValueProvider.java:52)
	at liquibase.configuration.LiquibaseConfiguration.getCurrentConfiguredValue(LiquibaseConfiguration.java:123)
	at liquibase.configuration.ConfigurationDefinition.getCurrentConfiguredValue(ConfigurationDefinition.java:93)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsInstalled(Unknown Source)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsValid(Unknown Source)
	at liquibase.configuration.pro.EnvironmentValueProvider.lookupProvidedValue(Unknown Source)
	at liquibase.configuration.AbstractMapConfigurationValueProvider.getProvidedValue(AbstractMapConfigurationValueProvider.java:52)
	at liquibase.configuration.LiquibaseConfiguration.getCurrentConfiguredValue(LiquibaseConfiguration.java:123)
	at liquibase.configuration.ConfigurationDefinition.getCurrentConfiguredValue(ConfigurationDefinition.java:93)
	at liquibase.license.pro.DaticalTrueLicenseService.licenseIsInstalled(Unknown Source)
	...

I think this is because the suggested solution causes Scope.getCurrentScope() to instantiate a new root scope which instantiates some singletons for each new scope. I guess instantiating those multiple times may be causing some issue where the environment value provider ends up cyclically looking for a license; maybe due to the way the providers or resource allocators are registered.

Instead I made a slight modification to ensure that the thread local scopes are children of the singleton root scope as follows:

public class ThreadLocalScopeManager extends ScopeManager {

  private final ThreadLocal<Scope> currentScope = new ThreadLocal<>();
  private final Scope rootScope;

  ThreadLocalScopeManager() {
    this.rootScope = Scope.getCurrentScope();
  }

  @Override
  public synchronized Scope getCurrentScope() {
    Scope returnedScope = currentScope.get();

    if (returnedScope == null) {
      returnedScope = rootScope;
    }

    return returnedScope;
  }

  @Override
  protected Scope init(Scope scope) throws Exception {
    return scope;
  }

  @Override
  protected synchronized void setCurrentScope(Scope scope) {
    this.currentScope.set(scope);
  }
}

As long as the ScopeManager enforces a singleton root scope instantiation it's fine so including the current scope lookup in the constructor forces this behavior since Scope default to the SingletonScopeManager instantiating some root scope before the ThreadLocalScopeManager has to touch it.

Cursory tests seem to be working fine for multithreading upgrades.

@kataggart
Copy link
Contributor

related post on Forum

@bvremmeinfor
Copy link

Hi,
created a sample project with threading attempt here https://github.com/bvremmeinfor/liquibase-threading-demo

I don't think it's possible to do threading without a minor modification in Liquibase (SqlGeneratorFactory) - see sample project for details.

@jeevangali-agi
Copy link

I have too ran into the same problem. are we taking this up for 4.14.0?

@kataggart
Copy link
Contributor

Wanted to let folks know we are not ignoring this and see all the feedback. Right now there is not a plan to address this in next release, but it is an active issue we are reviewing/considering/discussing. Thanks as always for all your feedback.

@kataggart kataggart removed their assignment Jul 28, 2022
@kataggart kataggart removed this from In discussion in Conditioning++ Aug 22, 2022
@amritanshvns
Copy link

Ran into the same issue. Is there any update on this?

@chaitanyawho
Copy link

I'm facing this issue as well, is there a fix planned for this anytime soon?

@kataggart
Copy link
Contributor

Yes, @chaitanyawho it is in our backlog of planned issues to resolve. I unfortunately don't have a timeline to resolution.

@kataggart
Copy link
Contributor

FYI @nvoxland and team for our review; @scott-jackman notes in #3203 that this is a blocker for them upgrading, and they need to upgrade to receive the XML security fix we did a while back.

@kataggart
Copy link
Contributor

@nvoxland would your fixes here help resolve any of the issues reported here #2966 ? thanks!

@rogierslag
Copy link
Contributor

rogierslag commented Sep 20, 2022

Will a release be created for this bugfix so we can validate its behaviour in production?

@kataggart
Copy link
Contributor

@rogierslag yes - you can follow the status here #3240

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.