-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect hardware or enrollment change #1492
Conversation
d80ee7a
to
b38f990
Compare
6eec17b
to
cfae2be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty good looking. I think we should hold off on merging until 1.3 is stable. But it looks about right
if err := tx.DeleteBucket([]byte(s.bucketName)); err != nil { | ||
return fmt.Errorf("deleting bucket: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this is called a bucket is deleted? Docs say:
DeleteBucket deletes a bucket. Returns an error if the bucket cannot be found or if the key represents a non-bucket value.
My guess is that we have to:
// If the bucket doesn't exist, it will error. And there's nothing to delete
if _, err := tx.Bucket([]byte(s.bucketName)); err != nil {
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like if a bucket doesn't exist, we should be alerted about that and bubble up the error? We expect the buckets to always exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so. After all, no buckets means we can't have enough info to decide to wipe the DB. 😆
But I wonder what happens if some of the buckets exist.
ee/agent/reset.go
Outdated
} | ||
|
||
if storedValue != nil && currentValue != string(storedValue) { | ||
k.Slogger().Log(context.TODO(), slog.LevelInfo, "hardware-identifying value has changed", "key", string(dataKey)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k.Slogger().Log(context.TODO(), slog.LevelInfo, "hardware-identifying value has changed", "key", string(dataKey)) | |
k.Slogger().Log(context.TODO(), slog.LevelInfo, "db-identifying value has changed", "key", string(dataKey)) |
I imagine this is being logged here, since it's the only place logging which key has changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value doesn't identify the database, it identifies the hardware (UUID or serial) or enrollment/rollout (tenant munemo). I'll update to hardware- or enrollment-identifying value has changed
since that's more precise, but let me know if it's still confusing/imprecise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change triggers wiping the db, so db-identifying
was my lazy cojoining. 🤷
Maybe we just say what it is.
ee/agent/reset.go
Outdated
wipeDatabase(k) | ||
|
||
// Store the backup data | ||
if err := k.HostDataStore().Set(hostDataKeyOldHostData, backup); err != nil { | ||
k.Slogger().Log(ctx, slog.LevelWarn, "could not store database backup", "err", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If wipe fails, do we still want to store the backups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted error handling for wipeDatabase and addressed this in 252dcc1
ee/agent/reset.go
Outdated
// takeDatabaseBackup retrieves the data we want to preserve from various db stores | ||
// as a snapshot of this db, appends it to previous snapshots if they exist, and | ||
// returns the collection of backup data. | ||
func takeDatabaseBackup(k types.Knapsack) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't call this a backup
. (It's not, it's more of an oldDatabaseRecord
or tombstone
or something like that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that take backup
is misleading. I don't want to overload tombstone
with the k2 meaning (esp since we have a tombstone_id inside the returned data), and a database record sounds like a single item in a database rather than a collection of items to me. snapshot
, which I use in the docblock and I think in the tests too, feels a little less incorrect but still misleading.
Maybe prepareDatabaseResetRecord
? Now that we have both a reset reason and a reset timestamp, maybe it makes more sense to rename the oldHostData
struct to dbResetRecord
.
Going to go with that for now, but let me know if you have other ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about the method name, and it might make sense to change anything else. Maybe oldDatabaseRecord
? dbWipeRecord
or dbResetRecord
are good too.
Maybe I like dbResetRecord
best. Or maybe it's all good enough
ee/agent/reset.go
Outdated
localPubKey, err := getLocalPubKey(k) | ||
if err != nil { | ||
k.Slogger().Log(context.TODO(), slog.LevelWarn, "could not get local pubkey from store", "err", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but I know there are (and will be) multiple keys. So at some point we'll need this to become an array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oldHostData
expects an array already! 🙂 We set PubKeys: [][]byte{localPubKey},
as an array of keys below. From elsewhere in the code, it looks like we usually pull this singular local pubkey, and separately pull the hardware key -- I think in the future we'd pull other keys here, and then dump them all into the PubKeys
array below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm blocking the merge here. The code is fine, but I'm not ready to have this merge, and am using the github tool to block it
Going to limit the scope of this PR to report and not wipe the database for now -- changes incoming. |
dbe0ad0
to
50bf07b
Compare
cmd/launcher/launcher.go
Outdated
@@ -258,6 +258,8 @@ func runLauncher(ctx context.Context, cancel func(), slogger, systemSlogger *mul | |||
signalListener := newSignalListener(sigChannel, cancel, logger) | |||
runGroup.Add("sigChannel", signalListener.Execute, signalListener.Interrupt) | |||
|
|||
agent.ResetDatabaseIfNeeded(ctx, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we call this something else since it's not actually resetting anymore, or maybe leave a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, thank you, very good point! I will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed and added a comment as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NICE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 Let's try!
Relates to #1346
If hardware changes (serial or hardware UUID), or if a rollout changes (enrollment secret changes), log the change. This PR also prepares for backing up and resetting the database, but does not perform those tasks yet.