creation of rows in knex_migration_lock table is not safe concurrently #3538
Comments
I think this might be #3457, but against MySQL rather than Postgres.
Please add simple single file reproduction code. Explaining steps is not enough. It is normal behavior that
That might be a bug (but not necessarily, since the actual locking is done at a different stage). I just read the migration / migration locking code once more and I still feel that it could be better... It also seems to be using implicit transactions for every database, which might cause problems with mysql, oracle and sqlite...
Yes, but it is not normal behavior that the lock is never correctly released and migration cannot ever succeed in the future.
Let me be more clear about the observed behavior as well as try to provide a test case. Unfortunately testing concurrency is challenging in a "single file" type of scenario, but I will do my best to provide something that can illustrate the issue we saw.
In my reading of the code (linked, line by line, in my original PR), the issue is fairly clear to me. Much like the test case in #3457:
Please note that this may need to be run many times to observe the behavior, and of course depends on a high number of uncontrollable variables. Concurrency in a single-file test reduction without access to environmental setup is a very fickle thing. I have observed that the issue can be reproduced more readily if there is network latency to the DB server.
Observed behavior: "Migration table is already locked" is thrown from ALL nodes. No node is able to "win" the lock and perform migration. Restarting the process in a single-concurrency environment does not help either: "Migration table is already locked" continues to be thrown.
Here is the debug output from 3 concurrent processes, where you can observe the race between the SELECT and INSERT causing 3 nodes to insert the rows at the same time:
Here is the debug output from a single node trying to migrate after this behavior has occurred, where you can observe the existence of multiple rows in the locking table preventing the locking functionality from ever working again:
On inspection of the database, knex_migrations_lock contains multiple rows:
It is again highly evident, from reading the code at https://github.com/knex/knex/blob/master/lib/migrate/table-creator.js#L30, how these rows were created, and again evident from reading the code at https://github.com/knex/knex/blob/master/lib/migrate/Migrator.js#L318, which verifies that the number of rows updated is equal to one, how having multiple rows in the knex_migrations_lock table will prevent migrations from ever running.
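A minimal in-memory model of why the extra rows are fatal, assuming (per the description above) that taking the lock updates every unlocked row and then requires exactly one row to have changed. The `tryAcquireLock` function, the `attempt` helper, and the row shape are hypothetical illustrations, not the code in Migrator.js. The `rollBack` flag exists only to show that the outcome is the same whether or not the failed update is undone.

```javascript
// Hypothetical model of the lock-taking step; not Knex's actual code.
// Marks every unlocked row as locked, then requires exactly one row changed.
function tryAcquireLock(rows, { rollBack = true } = {}) {
  const unlocked = rows.filter((row) => row.is_locked === 0);
  unlocked.forEach((row) => { row.is_locked = 1; });
  if (unlocked.length !== 1) {
    if (rollBack) {
      // Undo the update, as a failed transaction would.
      unlocked.forEach((row) => { row.is_locked = 0; });
    }
    throw new Error('Migration table is already locked');
  }
}

// Runs several attempts in sequence and records 'acquired' or the error message.
function attempt(rows, options, times) {
  const results = [];
  for (let i = 0; i < times; i++) {
    try {
      tryAcquireLock(rows, options);
      results.push('acquired');
    } catch (err) {
      results.push(err.message);
    }
  }
  return results;
}

// Builds a lock table with n unlocked rows.
const makeRows = (n) =>
  Array.from({ length: n }, (_, i) => ({ index: i + 1, is_locked: 0 }));

// A healthy table with a single row: the first attempt wins the lock.
console.log(attempt(makeRows(1), {}, 1));
// Three rows left behind by the race: every attempt fails,
// with or without rollback of the failed update.
console.log(attempt(makeRows(3), { rollBack: true }, 2));
console.log(attempt(makeRows(3), { rollBack: false }, 2));
```

With three rows the update touches either three rows (nothing locked yet) or zero rows (everything already locked), so the "exactly one" check can never pass until the table is reset by hand.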
Has there been any update on a "fix" for this if even possible? It's been 2 years since a comment or status...
Environment
Knex version: 0.19.5
Database + version: XtraDB 5.7
OS: CentOS
Bug
We have encountered a concurrency-related issue around the knex_migrations_lock table. The issue we have observed is as follows:
3 nodes concurrently try to execute db.migrate.latest() against a database which has not yet been initialized for use with Knex.
The knex_migrations_lock table is created with three entries, rather than only one, because of a data race here: https://github.com/knex/knex/blob/master/lib/migrate/table-creator.js#L30 . The race is between the return of
return getTable(trxOrKnex, lockTable, schemaName).select('*');
and the check of data.length in
!data.length && trxOrKnex.into(lockTableWithSchema).insert({ is_locked: 0 })
Multiple nodes execute the (non-locking) select query simultaneously and all return 0 rows. Next, multiple nodes execute the (non-locking) insert query simultaneously, leading to the creation of multiple sequential locking rows.
After this has happened, no migrations can function until the table is reset, because the check in https://github.com/knex/knex/blob/master/lib/migrate/Migrator.js#L318 verifies that only one row is updated by the query which takes the lock.
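The interleaving described above can be reproduced deterministically with a small in-memory model. This is only a sketch: the table object, the `ensureLockRow` generator, and the round-robin scheduler are hypothetical stand-ins for the database and the concurrent nodes, not Knex code. Each simulated node runs the select, pauses (as it would while waiting on the network), and then inserts if its earlier result was empty.

```javascript
// In-memory stand-in for the lock table (hypothetical, not Knex code).
function createLockTable() {
  return { rows: [], nextIndex: 1 };
}

// Models the non-locking SELECT: returns a snapshot of the current rows.
function selectAll(table) {
  return table.rows.slice();
}

// Models the non-locking INSERT with an auto-incrementing index column.
function insertRow(table, row) {
  table.rows.push({ index: table.nextIndex++, ...row });
}

// One node's check-then-insert. The `yield` marks the window between
// taking the SELECT snapshot and issuing the INSERT, where other nodes can run.
function* ensureLockRow(table) {
  const data = selectAll(table);
  yield;
  if (!data.length) {
    insertRow(table, { is_locked: 0 });
  }
}

// Round-robin scheduler: advances every node by one step until all finish.
function runInterleaved(nodes) {
  let pending = nodes;
  while (pending.length > 0) {
    pending = pending.filter((node) => !node.next().done);
  }
}

// Runs `nodeCount` simulated nodes against a fresh table and returns its rows.
function simulateConcurrentInit(nodeCount) {
  const table = createLockTable();
  const nodes = Array.from({ length: nodeCount }, () => ensureLockRow(table));
  runInterleaved(nodes);
  return table.rows;
}

// Three nodes: each saw an empty table, so each inserts its own row.
console.log(simulateConcurrentInit(3));
// One node: a single row, as intended.
console.log(simulateConcurrentInit(1));
```

Because every node takes its snapshot before any node inserts, the `!data.length` check passes for all of them, which matches the three rows observed in the table.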
Perhaps an upsert primitive could be used to insert the locking record into the table initially, or the first indexed lock row can be utilized as the semaphore for the lock?
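One way to picture the first suggestion is an insert keyed on a fixed identifier, so that a second insert of the same key is a no-op instead of a new row. The sketch below models that in memory, under the assumption that the database performs the insert-if-absent as a single atomic step. The `insertIfAbsent` helper, the fixed key `1`, and the scheduler are hypothetical and not part of Knex.

```javascript
// In-memory stand-in for a lock table with a unique key (hypothetical).
function createKeyedLockTable() {
  return new Map();
}

// Models an atomic insert-if-absent (upsert-style) on a unique key:
// the existence check and the write happen in one indivisible step.
function insertIfAbsent(table, key, row) {
  if (!table.has(key)) {
    table.set(key, row);
    return true;
  }
  return false;
}

// One node's initialization. The `yield` still lets other nodes run
// before the write, but there is no stale SELECT result to act on.
function* ensureLockRowAtomically(table) {
  yield;
  insertIfAbsent(table, 1, { is_locked: 0 });
}

// Round-robin scheduler: advances every node by one step until all finish.
function runInterleaved(nodes) {
  let pending = nodes;
  while (pending.length > 0) {
    pending = pending.filter((node) => !node.next().done);
  }
}

// Runs `nodeCount` simulated nodes against a fresh keyed table.
function simulateAtomicInit(nodeCount) {
  const table = createKeyedLockTable();
  const nodes = Array.from({ length: nodeCount }, () =>
    ensureLockRowAtomically(table)
  );
  runInterleaved(nodes);
  return table;
}

// Three nodes race, but only the first write for key 1 takes effect.
console.log(simulateAtomicInit(3).size);
```

In a real database the same effect would depend on a unique constraint on the key column and on the dialect's own upsert syntax, neither of which this model covers.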
Migration table is already locked
database connection to MySQL or PostgreSQL, then single file example which initializes
needed data and demonstrates the problem.
Executing db.migrate.latest in multiple truly concurrent processes in a loop will eventually trigger various migration concurrency/race issues, this being the most common one.