Skip to content

(PUP-2958) Protect the SSL state machine with an ssl lockfile#7530

Merged
kris-bosland merged 5 commits intopuppetlabs:masterfrom
joshcooper:rerevert-ssl-lock
May 31, 2019
Merged

(PUP-2958) Protect the SSL state machine with an ssl lockfile#7530
kris-bosland merged 5 commits intopuppetlabs:masterfrom
joshcooper:rerevert-ssl-lock

Conversation

@joshcooper
Copy link
Contributor

@joshcooper joshcooper commented May 21, 2019

Commit 926e0e6 locked the agent lockfile before running the SSL state machine, but had to be reverted in 3e82d0f due to puppet infrastructure locking the agent lock and running puppet ssl bootstrap.

These commits introduce a new ssl lockfile so that the ssl and catalog application lock files have different scopes.

Pass cert and ssl providers to the state machine so that we can allow/expect on
those instances rather than relying on any_instance, which is more brittle.
Use break instead of return to avoid non-local return.
Previously the Timeout::Error referred to a non-existent `@path` instance
variable. Refer to the `path` argument instead and update the spec test to
match.
Puppet will poll for the lockfile, up to the maximum `timeout` which defaults to
5 minutes. Add debugging messages so it's clear what's going on when running
with `puppet agent -td`.
@joshcooper joshcooper requested a review from a team May 21, 2019 01:17
@puppetcla
Copy link

CLA signed by all contributors.

@joshcooper
Copy link
Contributor Author

jenkins please test this on windows2016-64a with servertests

#
# @private
class Puppet::SSL::StateMachine
LOCKFILE_TIMEOUT_SECS = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this long enough? Can it be user configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I debated making it configurable. One thing is right now there isn't any feedback if the agent is stuck polling for the lock, unless you're running with debug. I imagine that could cause some confusion. Thinking about making the retry message a warning instead:

Warning: Failed to lock '/Users/josh/.puppetlabs/etc/puppet/ssl/ssl.lock' retrying in 1.94 milliseconds

Thoiughts? Then I could increase the default timeout and make it configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the retry message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it configurable. I didn't make the message a warning, because I'm concerned it could cause transient failures in CI, due to tests matching puppet agent output, especially in PE.

#
# @private
class Puppet::SSL::StateMachine
LOCKFILE_TIMEOUT_SECS = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the retry message.

@joshcooper joshcooper force-pushed the rerevert-ssl-lock branch from d8c09d9 to 48c64d3 Compare May 29, 2019 20:14
@jtappa
Copy link
Contributor

jtappa commented May 29, 2019

At least one of the AppVeyor failures looks legit

@joshcooper joshcooper force-pushed the rerevert-ssl-lock branch from 48c64d3 to cea456d Compare May 30, 2019 18:05
Adds a new `Puppet[:ssl_lockfile]` setting of type `:string`. We don't use
`:file`, because we don't want puppet to create it when applying settings
catalogs.

Protect the ssl state machine with the ssl lockfile.
@joshcooper joshcooper force-pushed the rerevert-ssl-lock branch from cea456d to be4abce Compare May 30, 2019 23:43
@joshcooper
Copy link
Contributor Author

joshcooper commented May 30, 2019

I reworked this to not use File#flock as that led to many platform specific issues with POSIX file locking and Windows. For now, we create an ssl specific lockfile (using the same locking mechanism as is used for the catalog application lockfile). If we fail to acquire the lock, then we fail and don't retry. We have a separate epic to improve agent locking, so I think it's best to solve the "stale lock recovery" issue in that epic.

@joshcooper joshcooper requested a review from kris-bosland May 31, 2019 04:12
@kris-bosland kris-bosland merged commit e8a59b3 into puppetlabs:master May 31, 2019
@joshcooper joshcooper deleted the rerevert-ssl-lock branch May 31, 2019 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants