Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hold Auth init lock for the duration of initialization #29593

Merged
merged 1 commit into from
Jul 27, 2023

Conversation

rosstimothy
Copy link
Contributor

Auth now uses RunWhileLocked instead of AcquireLock to ensure that the initialization lock is held until the bootstrapping process is completed. Prior, Auth only held the lock for 30s which could allow multiple Auth instances to attempt bootstrapping simultaneously. Initialization should complete prior to 30s in most cases, but it is not guaranteed, especially on first boot when CAs are being generated and a large data migration may be needed.

Auth now uses `RunWhileLocked` instead of `AcquireLock` to ensure
that the initialization lock is held until the bootstrapping
process is completed. Prior, Auth only held the lock for 30s which
could allow multiple Auths to attempt bootstrapping simultaneously.
Initialization should complete prior to 30s in most cases, but it
is not guarateed, especially on first boot when CAs are being
generated and a large data migration may be needed.
@rosstimothy rosstimothy marked this pull request as ready for review July 25, 2023 19:04
@github-actions github-actions bot requested review from ravicious and Tener July 25, 2023 19:04
Backend: cfg.Backend,
LockName: domainName,
TTL: 30 * time.Second,
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does RunWhileLocked release the lock as soon as the operation is completed, or only after the TTL elapses?

If the former, what's the harm in making this (for example) 60s?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the lock is released before returning from RunWhileLocked:

// lock.Release should be called with separate ctx. If someone cancels via ctx
// RunWhileLocked method, we want to at least try releasing lock.
releaseLockCtx, releaseLockCancel := context.WithTimeout(context.Background(), cfg.ReleaseCtxTimeout)
defer releaseLockCancel()
if err := lock.Release(releaseLockCtx, cfg.Backend); err != nil {
return trace.NewAggregate(fnErr, err)
}

In the event that the Auth instance holding the lock is terminated during initialization a longer TTL would block any other instances from initializing until they could acquire the lock. The longer TTL would result in less backend activity caused by lengthy migrations though.

@@ -320,7 +333,7 @@ func Init(ctx context.Context, cfg InitConfig, opts ...ServerOption) (*Server, e
// singletons). However, we need to keep them around while Telekube uses them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Telekube?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 - I have intentionally left the legacy/odd things in Init alone while focusing on improving start up time and reliability of initialization. I was planning to circle back to cfg.Roles, cfg.Authorities, cfg.ReverseTunnels and migrateLegacyResources to determine if they can be eliminated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the name of teleport when it was running as a component of Gravity. Some deep cuts.

@rosstimothy
Copy link
Contributor Author

Friendly ping @ravicious @Tener

@rosstimothy rosstimothy added this pull request to the merge queue Jul 27, 2023
Merged via the queue into master with commit 6eb0551 Jul 27, 2023
27 checks passed
@rosstimothy rosstimothy deleted the tross/auth_init_lock branch July 27, 2023 18:04
@public-teleport-github-review-bot

@rosstimothy See the table below for backport results.

Branch Result
branch/v11 Failed
branch/v12 Failed
branch/v13 Create PR

rosstimothy added a commit that referenced this pull request Jul 27, 2023
Auth now uses `RunWhileLocked` instead of `AcquireLock` to ensure
that the initialization lock is held until the bootstrapping
process is completed. Prior, Auth only held the lock for 30s which
could allow multiple Auths to attempt bootstrapping simultaneously.
Initialization should complete prior to 30s in most cases, but it
is not guarateed, especially on first boot when CAs are being
generated and a large data migration may be needed.
rosstimothy added a commit that referenced this pull request Jul 27, 2023
Auth now uses `RunWhileLocked` instead of `AcquireLock` to ensure
that the initialization lock is held until the bootstrapping
process is completed. Prior, Auth only held the lock for 30s which
could allow multiple Auths to attempt bootstrapping simultaneously.
Initialization should complete prior to 30s in most cases, but it
is not guarateed, especially on first boot when CAs are being
generated and a large data migration may be needed.
github-merge-queue bot pushed a commit that referenced this pull request Jul 28, 2023
* Add configuration with optional timeout to AcquireLock (#24559)

* Add configuration with optional timeout to AcquireLock

* rename to RetryInterval

* backport RunWhileLocked changes from #25639

* Hold Auth init lock for the duration of initialization (#29593)

Auth now uses `RunWhileLocked` instead of `AcquireLock` to ensure
that the initialization lock is held until the bootstrapping
process is completed. Prior, Auth only held the lock for 30s which
could allow multiple Auths to attempt bootstrapping simultaneously.
Initialization should complete prior to 30s in most cases, but it
is not guarateed, especially on first boot when CAs are being
generated and a large data migration may be needed.

---------

Co-authored-by: Tobiasz Heller <14020794+tobiaszheller@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 28, 2023
* Add configuration with optional timeout to AcquireLock (#24559)

* Add configuration with optional timeout to AcquireLock

* rename to RetryInterval

* backport RunWhileLocked changes from #25639

* Hold Auth init lock for the duration of initialization (#29593)

Auth now uses `RunWhileLocked` instead of `AcquireLock` to ensure
that the initialization lock is held until the bootstrapping
process is completed. Prior, Auth only held the lock for 30s which
could allow multiple Auths to attempt bootstrapping simultaneously.
Initialization should complete prior to 30s in most cases, but it
is not guarateed, especially on first boot when CAs are being
generated and a large data migration may be needed.

---------

Co-authored-by: Tobiasz Heller <14020794+tobiaszheller@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants