Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition and possible lock bypass if two nodes run on clean schema #1584

Closed
zholub opened this issue Dec 11, 2020 · 5 comments · Fixed by #2327
Closed

Race condition and possible lock bypass if two nodes run on clean schema #1584

zholub opened this issue Dec 11, 2020 · 5 comments · Fixed by #2327

Comments

@zholub
Copy link

zholub commented Dec 11, 2020

Environment

Liquibase Version:
3.10.x, 4.0.x, 4.1.x
Liquibase Integration & Version: <Pick one: CLI, maven, gradle, spring boot, servlet, etc.>
All affected
Database Vendor & Version:
Some parts is specific for Oracle, DB2 or Derby, others are vendor independent
Operating System Type & Version:
OS independent

Description

When liquibase run on clean schema it creates and initializes DatabaseChangeLogLock table before it can use it's standard locking mechanism. In clustered environment, applications can be deployed for the first time to multiple nodes at same time. This initialization process is not protected against race conditions.

Table creation part of this issue is already mentioned here: https://liquibase.jira.com/browse/CORE-2596
It tried to fix the problem by parsing the database error message for table already exists condition.

catch (DatabaseException e) {
  if ((e.getMessage() != null) && e.getMessage().contains("exists")) {
  //hit a race condition where the table got created by another node.
}

This solution does not work with all database and/or non English environment. A bug already reported here: #1036

DatabaseChangeLogLock table content initialization (insert) can fail with primary key violated error if another node inserts this record between selecting and inserting operations.
On Derby and DB2 database a DatabaseChangeLogLock table recreation can happen if it has wrong column type. It can delete an already acquired lock so can bypass locking.

Possible solutions:

  • Forbid running 'update' on schemas without DatabaseChangeLogLock table
  • Make liquibase table initialization process protected from race conditions

For the latter, i would suggest:

  • Generalizing race condition handling in DatabaseChangeLogLock table initialization:
    • check table/record existence
    • create/insert if table/record does not exist
    • Retry (GOTO 1) once if table creation or record insertion fails
  • Change DatabaseChangeLogLock table modification to table alter instead of drop/recreate and move it to the synchronized section (do it while having the lock)

Steps To Reproduce

Run multiple instances of liquibase update on an empty database schema at same time

Actual Behavior

  • Sometimes create table DATABASECHANGELOGLOCK fails with 'table already exists' error (On oracle database)
  • Sometimes insert into DATABASECHANGELOGLOCK fails with 'primary key violated' error
  • Sometimes multiple nodes start updating the schema (On Derby and Db2)
  • Sometimes liquibase creates and initializes it's tables successfully

Expected/Desired Behavior

  • Liquibase always creates and initializes it's tables successfully
@molivasdat
Copy link
Contributor

Hi @zholub Thanks for bringing this issue to our attention. We will add this to the list of issues we need to process.

@schrieveslaach
Copy link
Contributor

@molivasdat, @nvoxland, I created a PR that tries to fix the race condition. Do you mind and have a look?

@schrieveslaach
Copy link
Contributor

The PR ist ready for review and in my local tests I was able to start five instances of a service at the same time on a fresh DB. None of these instances crashes anymore.

@schrieveslaach
Copy link
Contributor

@nvoxland, @r2liquibase is there a change to review #1901

@kataggart kataggart added this to To Do in Conditioning++ via automation Jan 4, 2022
@kataggart kataggart removed this from To Do in Conditioning++ Aug 15, 2022
@kataggart kataggart linked a pull request Aug 15, 2022 that will close this issue
3 tasks
@kataggart
Copy link
Contributor

marked "still an issue" as this is associated with #2327

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment