
Add Mutex Lock for Database Migrations #191

Closed
kris-evans opened this issue Oct 29, 2019 · 7 comments

@kris-evans

We currently use Goose migrations as part of our deployments to Kubernetes. We like to deploy our database changes before we deploy our code, so we run migrations as part of the startup sequence for the web service (e.g. we set up our db connections, run any pending migrations, then bind and start our service).

We run into an issue during deploys: if two pods start up while a migration takes longer than a few seconds, both pods will execute the migrations simultaneously, which can leave the database migration process in a weird state.

Could we please add a lock to the database migrations table that would make it impossible to accidentally run two migrations simultaneously?

@VojtechVitek
Collaborator

Someone has already tried, but the PR was not accepted: #119

@mkevanz In our Kubernetes pipeline, we run goose in a pod (metadata.name: goose) to completion, which pretty much enforces a singleton process via the unique pod name within the namespace. Have you considered that?

@JDWardle

I've seen other migration tools use advisory locks to accomplish this, more info on that here and here. You can see this being used in Rails and in the github.com/go-migrate/migrate tool. It looks like PR #119 wasn't accepted because the lock was specific to a transaction. Advisory locks avoid that: a session-level lock stays active for as long as the session that acquired it is open, which resolves the issues they were having with transactions. There's the added benefit that Postgres will clean up any advisory locks when the database session that holds them is closed. This solution is specific to Postgres, but I believe there are similar options for other databases.

@JDWardle

You can accomplish this if you create your own CLI tool to run migrations:

const migrationLockID = 1234

// This lock will be released once the database connection is closed.
func lock(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "select pg_advisory_lock($1);", migrationLockID)
	if err != nil {
		return err
	}
	return nil
}

// Any advisory locks retrieved during a database session will be released once
// that session is closed. But this function allows explicit unlocking at any
// point if needed.
func unlock(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "select pg_advisory_unlock($1);", migrationLockID)
	if err != nil {
		return err
	}
	return nil
}

// MigrateUp will run all pending migrations. It retrieves an advisory lock on
// the database to ensure that only one process can be running migrations at a
// time.
func MigrateUp(ctx context.Context, db *sql.DB) (err error) {
	if err = lock(ctx, db); err != nil {
		return err
	}
	defer func() {
		// Explicitly release the advisory lock just in case the caller of this
		// function doesn't immediately close the database connection.
		err = unlock(ctx, db)
	}()

	if err = goose.Run("up", db, "./database/migrations"); err != nil {
		return err
	}

	return nil
}
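
For reference, here's a minimal sketch of how MigrateUp might be wired up. The driver import and connection string are placeholders, and this assumes the functions above live in the same package:

package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // placeholder Postgres driver
)

func main() {
	// Placeholder DSN; adjust for your environment.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// Closing the pool ends the session(s), which also releases any advisory locks.
	defer db.Close()

	if err := MigrateUp(context.Background(), db); err != nil {
		log.Fatal(err)
	}
}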

@mwmahlberg

@mkevanz As a side note, imho, this is not the Kubernetes way to do it. Setting up the environment is what init containers are there for. For obvious reasons, they need to be idempotent. The advantage, however, is that the init container execution and the app container execution are independent of each other. From a higher-level perspective, it is even more questionable to run migrations as the same user under which you do the actual work. Personally, coming from a DBA background, I would not allow changes to the database by an application's default user.

@mfridman
Collaborator

Agreed with Vojtech's answer; I've had a similar experience. #258 (comment)

For those who do choose to have multiple pods run migrations, with only one of them succeeding, @JDWardle's answer might be sufficient. Will add this to the docs:

#191 (comment)

@dahu33
Contributor

dahu33 commented May 18, 2022

The code in #191 (comment) has several bugs:

  1. Session-level advisory locks can only be released on the same database connection in which they were obtained; in the code above that may not be the case, since the database pool is used directly. This may cause the lock not to be released.
  2. The goose error in MigrateUp is overwritten in the defer function by err = unlock(ctx, db), effectively hiding any migration error.

The code below should fix the mentioned bugs, but anyone using it should be aware that this solution is not perfect. Ideally, the migrations should run in the same connection/session in which the exclusive session-level advisory lock was obtained. If they run in a different connection/session, there is a risk of the lock being released while the migrations are running (if the connection/session in which the lock was obtained is closed).

// pgAdvisoryLock obtains the given exclusive session level advisory lock id on the given connection.
func pgAdvisoryLock(conn *sql.Conn, lockID int64) error {
	_, err := conn.ExecContext(context.Background(), "select pg_advisory_lock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

// pgAdvisoryUnLock releases the given exclusive session level advisory lock id on the given connection.
// IMPORTANT: session-level advisory locks can only be released on the same database
// connection in which they were obtained.
func pgAdvisoryUnLock(conn *sql.Conn, lockID int64) error {
	_, err := conn.ExecContext(context.Background(), "select pg_advisory_unlock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

func migrate(db *sql.DB) error {
	const lockID = 5887940537704921958 // CRC-64 hash of "goose" (but could be anything else)

	// pg_advisory_lock and pg_advisory_unlock have to be executed on the same connection!
	conn, err := db.Conn(context.Background())
	if err != nil {
		return err
	}
	defer conn.Close() // return the connection to the pool

	if err := pgAdvisoryLock(conn, lockID); err != nil {
		return err
	}

	if err := goose.Up(db, "db/migrations"); err != nil {
		_ = pgAdvisoryUnLock(conn, lockID) // unlock but discard any error
		return err
	}
	return pgAdvisoryUnLock(conn, lockID)
}

As a workaround until #335 is implemented, and to run the migrations inside the same connection/session in which the exclusive session-level advisory lock was obtained, I wrote a "hack" that uses the connection pool settings to make sure only a single connection is used and re-used:

// pgAdvisoryLock obtains the given exclusive session level advisory lock id on the given connection.
func pgAdvisoryLock(db *sql.DB, lockID int64) error {
	_, err := db.ExecContext(context.Background(), "select pg_advisory_lock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

// pgAdvisoryUnLock releases the given exclusive session level advisory lock id on the given connection.
// IMPORTANT: session-level advisory locks can only be released on the same database
// connection in which they were obtained.
func pgAdvisoryUnLock(db *sql.DB, lockID int64) error {
	_, err := db.ExecContext(context.Background(), "select pg_advisory_unlock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

func migrate(db *sql.DB) error {
	if err := goose.SetDialect("postgres"); err != nil {
		return err
	}

	// Below is a workaround until https://github.com/pressly/goose/issues/335 is implemented.
	// It basically wraps Goose in a Postgres exclusive session-level advisory lock.
	// We have to change the database connection pool settings to keep only a single
	// connection open, for the following reasons:
	// 1) Goose supports only *sql.DB (we cannot pass it a database connection directly).
	// 2) Session-level advisory locks can only be released on the same database connection
	//    in which they were obtained.
	// 3) We want the migration to run inside the same connection in which the exclusive
	//    session-level advisory lock was obtained, to avoid the risk of the lock being
	//    released while the migrations are running (which could happen if that
	//    connection/session were closed).
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)
	db.SetConnMaxIdleTime(0)
	db.SetConnMaxLifetime(0)

	const lockID = 5887940537704921958 // CRC-64 hash of "goose" (but could be anything else)
	if err := pgAdvisoryLock(db, lockID); err != nil {
		return err
	}

	if err := goose.Up(db, "db/migrations"); err != nil {
		_ = pgAdvisoryUnLock(db, lockID) // unlock but discard any error
		return err
	}
	return pgAdvisoryUnLock(db, lockID)
}
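
Note that this changes the pool settings on the shared *sql.DB, so if the application keeps using the same pool after migrating, you may want to restore your usual limits once the migration completes. A hypothetical helper (restorePool, with placeholder values; requires the time import):

// restorePool resets the connection pool to the application's normal settings
// after the single-connection migration workaround (values are placeholders).
func restorePool(db *sql.DB) {
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(25)
	db.SetConnMaxIdleTime(5 * time.Minute)
	db.SetConnMaxLifetime(30 * time.Minute)
}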

@mfridman
Collaborator

This functionality has been added in v3.16.0 and higher. Requires a goose.Provider and the goose.WithSessionLocker option.

A bit more detail in this blog post:

https://pressly.github.io/goose/blog/2023/goose-provider/#database-locking
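
Roughly, usage looks like this. This is a minimal sketch, so treat the driver, DSN, and migrations directory as placeholders and check the blog post and godoc for the exact signatures:

package main

import (
	"context"
	"database/sql"
	"log"
	"os"

	"github.com/pressly/goose/v3"
	"github.com/pressly/goose/v3/lock"

	_ "github.com/jackc/pgx/v5/stdlib" // placeholder Postgres driver
)

func main() {
	db, err := sql.Open("pgx", "postgres://localhost:5432/app") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A Postgres session-level advisory lock guards the migration run.
	sessionLocker, err := lock.NewPostgresSessionLocker()
	if err != nil {
		log.Fatal(err)
	}

	provider, err := goose.NewProvider(
		goose.DialectPostgres,
		db,
		os.DirFS("db/migrations"), // placeholder migrations dir
		goose.WithSessionLocker(sessionLocker),
	)
	if err != nil {
		log.Fatal(err)
	}

	if _, err := provider.Up(context.Background()); err != nil {
		log.Fatal(err)
	}
}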

Thank you to everyone for the feedback, fruitful discussions, and contributions! Going to close this issue, but feel free to continue the discussion; looking forward to your feedback.

NOTE: only postgres is currently supported, but we can add implementations as needed.
