
Add Mutex Lock for Database Migrations #191

Closed
kris-evans opened this issue Oct 29, 2019 · 7 comments

@kris-evans

We currently use Goose migrations as part of our deployments to Kubernetes. We like to deploy our database changes before we deploy our code, so we run migrations as part of the startup sequence for the web service (e.g. we set up our db connections, run any pending migrations, then bind and start our service).

We run into an issue during deploys: if two pods start up while a migration takes longer than a few seconds, both pods will execute the migrations simultaneously, which can leave the database migration process in a weird state.

Could we please add a lock to the database migrations table that would make it impossible to accidentally run two migrations simultaneously?

@VojtechVitek
Collaborator

Someone has already tried, but the PR was not accepted: #119

@mkevanz In our Kubernetes pipeline, we run goose in a pod (metadata.name: goose) to completion, which pretty much enforces a singleton process via the unique pod name within the namespace. Have you considered that?

@JDWardle

I've seen other migration tools use advisory locks to accomplish this, more info on that here and here. You can see this being used in Rails and in the github.com/go-migrate/migrate tool. It looks like PR #119 wasn't accepted because the lock was specific to a transaction. Advisory locks avoid that: a session-level lock stays active for as long as the session that acquired it is open, which resolves the issues they were having with transactions. There's the added benefit that Postgres will clean up any advisory locks when the database session that holds them is closed. This solution is specific to Postgres, but I believe there are similar options for other databases.

@JDWardle

You can accomplish this if you create your own CLI tool to run migrations:

const migrationLockID = 1234

// This lock will be released once the database connection is closed.
func lock(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "select pg_advisory_lock($1);", migrationLockID)
	if err != nil {
		return err
	}
	return nil
}

// Any advisory locks retrieved during a database session will be released once
// that session is closed. But this function allows explicit unlocking at any
// point if needed.
func unlock(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "select pg_advisory_unlock($1);", migrationLockID)
	if err != nil {
		return err
	}
	return nil
}

// MigrateUp will run all pending migrations. It retrieves an advisory lock on
// the database to ensure that only one process can be running migrations at a
// time.
func MigrateUp(ctx context.Context, db *sql.DB) (err error) {
	if err = lock(ctx, db); err != nil {
		return err
	}
	defer func() {
		// Explicitly release the advisory lock just in case the caller of this
		// function doesn't immediately close the database connection.
		err = unlock(ctx, db)
	}()

	if err = goose.Run("up", db, "./database/migrations"); err != nil {
		return err
	}

	return nil
}
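
For reference, here's a minimal sketch of how MigrateUp might be wired up. The driver import and connection string are placeholders, and this assumes the functions above live in the same package:

package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // placeholder Postgres driver
)

func main() {
	// Placeholder DSN; adjust for your environment.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// Closing the pool ends the session(s), which also releases any advisory locks.
	defer db.Close()

	if err := MigrateUp(context.Background(), db); err != nil {
		log.Fatal(err)
	}
}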

@mwmahlberg

@mkevanz As a side note, imho, this is not the Kubernetes way to do it. Setting up the environment is what init containers are there for. For obvious reasons, they need to be idempotent. The advantage, however, is that the init container execution and the app container execution are independent of each other. From a higher-level perspective, it is even more questionable to run migrations as the same user under which you do the actual work. Personally, coming from a DBA background, I would not allow changes to the database by an application's default user.

@mfridman
Collaborator

Agreed with Vojtech's answer; I've had a similar experience. #258 (comment)

For those who do choose to have multiple pods run migrations, with only one of them succeeding, @JDWardle's answer might be sufficient. Will add this to the docs:

#191 (comment)

@dahu33
Contributor

dahu33 commented May 18, 2022

The code in #191 (comment) has several bugs:

  1. Session-level advisory locks can only be released on the same database connection in which they were obtained; in the code above that may not be the case, since the database pool is used directly. This may cause the lock not to be released.
  2. The goose error in MigrateUp is overwritten in the defer function by err = unlock(ctx, db), effectively hiding any migration error.

The code below should fix the mentioned bugs, but anyone using it should be aware that this solution is not perfect. Ideally, the migrations should run in the same connection/session in which the exclusive session-level advisory lock was obtained. If they run in a different connection/session, there is a risk of the lock being released while the migrations are running (if the connection/session in which the lock was obtained is closed).

// pgAdvisoryLock obtains the given exclusive session level advisory lock id on the given connection.
func pgAdvisoryLock(conn *sql.Conn, lockID int64) error {
	_, err := conn.ExecContext(context.Background(), "select pg_advisory_lock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

// pgAdvisoryUnLock releases the given exclusive session level advisory lock id on the given connection.
// IMPORTANT: session-level advisory locks can only be released on the same database
// connection in which they were obtained.
func pgAdvisoryUnLock(conn *sql.Conn, lockID int64) error {
	_, err := conn.ExecContext(context.Background(), "select pg_advisory_unlock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

func migrate(db *sql.DB) error {
	const lockID = 5887940537704921958 // CRC-64 hash of "goose" (but could be anything else)

	// pg_advisory_lock and pg_advisory_unlock have to be executed on the same connection!
	conn, err := db.Conn(context.Background())
	if err != nil {
		return err
	}
	defer conn.Close() // return the connection to the pool

	if err := pgAdvisoryLock(conn, lockID); err != nil {
		return err
	}

	if err := goose.Up(db, "db/migrations"); err != nil {
		_ = pgAdvisoryUnLock(conn, lockID) // unlock but discard any error
		return err
	}
	return pgAdvisoryUnLock(conn, lockID)
}

As a workaround until #335 is implemented, and to run the migrations inside the same connection/session in which the exclusive session-level advisory lock was obtained, I wrote a "hack" that uses the connection pool settings to make sure only a single connection is used and re-used:

// pgAdvisoryLock obtains the given exclusive session level advisory lock id on the given connection.
func pgAdvisoryLock(db *sql.DB, lockID int64) error {
	_, err := db.ExecContext(context.Background(), "select pg_advisory_lock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

// pgAdvisoryUnLock releases the given exclusive session level advisory lock id on the given connection.
// IMPORTANT: session-level advisory locks can only be released on the same database
// connection in which they were obtained.
func pgAdvisoryUnLock(db *sql.DB, lockID int64) error {
	_, err := db.ExecContext(context.Background(), "select pg_advisory_unlock($1);", lockID)
	if err != nil {
		return err
	}
	return nil
}

func migrate(db *sql.DB) error {
	if err := goose.SetDialect("postgres"); err != nil {
		return err
	}

	// Below is a workaround until https://github.com/pressly/goose/issues/335 is implemented.
	// It basically wraps Goose in a Postgres exclusive session-level advisory lock.
	// We have to change the database connection pool settings to keep only a single
	// connection open, for the following reasons:
	// 1) Goose supports only *sql.DB (we cannot pass it a database connection directly).
	// 2) Session-level advisory locks can only be released on the same database connection
	//    in which they were obtained.
	// 3) We want the migration to run inside the same connection in which the exclusive
	//    session-level advisory lock was obtained, to avoid the risk of the lock being
	//    released while the migrations are running (which could happen if that
	//    connection/session were closed).
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)
	db.SetConnMaxIdleTime(0)
	db.SetConnMaxLifetime(0)

	const lockID = 5887940537704921958 // CRC-64 hash of "goose" (but could be anything else)
	if err := pgAdvisoryLock(db, lockID); err != nil {
		return err
	}

	if err := goose.Up(db, "db/migrations"); err != nil {
		_ = pgAdvisoryUnLock(db, lockID) // unlock but discard any error
		return err
	}
	return pgAdvisoryUnLock(db, lockID)
}
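
Note that this changes the pool settings on the shared *sql.DB, so if the application keeps using the same pool after migrating, you may want to restore your usual limits once the migration completes. A hypothetical helper (restorePool, with placeholder values; requires the time import):

// restorePool resets the connection pool to the application's normal settings
// after the single-connection migration workaround (values are placeholders).
func restorePool(db *sql.DB) {
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(25)
	db.SetConnMaxIdleTime(5 * time.Minute)
	db.SetConnMaxLifetime(30 * time.Minute)
}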

@mfridman
Collaborator

This functionality has been added in v3.16.0 and higher. Requires a goose.Provider and the goose.WithSessionLocker option.

A bit more detail in this blog post:

https://pressly.github.io/goose/blog/2023/goose-provider/#database-locking
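
Roughly, usage looks like this. This is a minimal sketch, so treat the driver, DSN, and migrations directory as placeholders and check the blog post and godoc for the exact signatures:

package main

import (
	"context"
	"database/sql"
	"log"
	"os"

	"github.com/pressly/goose/v3"
	"github.com/pressly/goose/v3/lock"

	_ "github.com/jackc/pgx/v5/stdlib" // placeholder Postgres driver
)

func main() {
	db, err := sql.Open("pgx", "postgres://localhost:5432/app") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A Postgres session-level advisory lock guards the migration run.
	sessionLocker, err := lock.NewPostgresSessionLocker()
	if err != nil {
		log.Fatal(err)
	}

	provider, err := goose.NewProvider(
		goose.DialectPostgres,
		db,
		os.DirFS("db/migrations"), // placeholder migrations dir
		goose.WithSessionLocker(sessionLocker),
	)
	if err != nil {
		log.Fatal(err)
	}

	if _, err := provider.Up(context.Background()); err != nil {
		log.Fatal(err)
	}
}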

Thank you to everyone for the feedback, fruitful discussions, and contributions! Going to close this issue, but feel free to continue the discussion; looking forward to your feedback.

NOTE: only postgres is currently supported, but we can add implementations as needed.
