Detect hardware or enrollment change #1492

Merged
merged 28 commits into from
Dec 29, 2023


Contributor

@RebeccaMahany RebeccaMahany commented Dec 4, 2023

Relates to #1346

If hardware changes (serial or hardware UUID), or if a rollout changes (enrollment secret changes), log the change. This PR also prepares for backing up and resetting the database, but does not perform those tasks yet.

Contributor

@directionless directionless left a comment

This is looking pretty good. I think we should hold off on merging until 1.3 is stable, but it looks about right.

Comment on lines +128 to +130
if err := tx.DeleteBucket([]byte(s.bucketName)); err != nil {
	return fmt.Errorf("deleting bucket: %w", err)
}
Contributor

What happens if this is called after the bucket has already been deleted? The docs say:

DeleteBucket deletes a bucket. Returns an error if the bucket cannot be found or if the key represents a non-bucket value.

My guess is that we have to:

// If the bucket doesn't exist, there's nothing to delete.
// tx.Bucket returns nil for a missing bucket rather than an error.
if tx.Bucket([]byte(s.bucketName)) == nil {
	return nil
}

Contributor Author

I feel like if a bucket doesn't exist, we should be alerted about that and bubble up the error? We expect the buckets to always exist.

Contributor

I guess so. After all, no buckets means we can't have enough info to decide to wipe the DB. 😆

But I wonder what happens if only some of the buckets exist.

ee/agent/reset.go
}

if storedValue != nil && currentValue != string(storedValue) {
	k.Slogger().Log(context.TODO(), slog.LevelInfo, "hardware-identifying value has changed", "key", string(dataKey))
Contributor

Suggested change
k.Slogger().Log(context.TODO(), slog.LevelInfo, "hardware-identifying value has changed", "key", string(dataKey))
k.Slogger().Log(context.TODO(), slog.LevelInfo, "db-identifying value has changed", "key", string(dataKey))

I imagine this is being logged here, since it's the only place logging which key has changed?

Contributor Author

The value doesn't identify the database; it identifies the hardware (UUID or serial) or the enrollment/rollout (tenant munemo). I'll update the message to "hardware- or enrollment-identifying value has changed" since that's more precise, but let me know if it's still confusing or imprecise!

Contributor

The change triggers wiping the db, so "db-identifying" was my lazy conjoining. 🤷

Maybe we just say what it is.

Comment on lines 76 to 81
wipeDatabase(k)

// Store the backup data
if err := k.HostDataStore().Set(hostDataKeyOldHostData, backup); err != nil {
	k.Slogger().Log(ctx, slog.LevelWarn, "could not store database backup", "err", err)
}
Contributor

If wipe fails, do we still want to store the backups?

Contributor Author

Adjusted error handling for wipeDatabase and addressed this in 252dcc1

// takeDatabaseBackup retrieves the data we want to preserve from various db stores
// as a snapshot of this db, appends it to previous snapshots if they exist, and
// returns the collection of backup data.
func takeDatabaseBackup(k types.Knapsack) ([]byte, error) {
Contributor

Please don't call this a backup. (It's not, it's more of an oldDatabaseRecord or tombstone or something like that)

Contributor Author

I agree that "take backup" is misleading. I don't want to overload "tombstone" with the k2 meaning (especially since we have a tombstone_id inside the returned data), and "database record" sounds to me like a single item in a database rather than a collection of items. "Snapshot", which I use in the docblock and I think in the tests too, feels a little less incorrect but is still misleading.

Maybe prepareDatabaseResetRecord? Now that we have both a reset reason and a reset timestamp, maybe it makes more sense to rename the oldHostData struct to dbResetRecord.

Going to go with that for now, but let me know if you have other ideas.

Contributor

I'm thinking about the method name, and whether it makes sense to change anything else. Maybe oldDatabaseRecord? dbWipeRecord or dbResetRecord are good too.

Maybe I like dbResetRecord best. Or maybe it's all good enough.

Comment on lines 232 to 235
localPubKey, err := getLocalPubKey(k)
if err != nil {
	k.Slogger().Log(context.TODO(), slog.LevelWarn, "could not get local pubkey from store", "err", err)
}
Contributor

This is fine, but I know there are (and will be) multiple keys, so at some point we'll need this to become an array.

Contributor Author

The oldHostData expects an array already! 🙂 We set PubKeys: [][]byte{localPubKey}, as an array of keys below. From elsewhere in the code, it looks like we usually pull this singular local pubkey, and separately pull the hardware key -- I think in the future we'd pull other keys here, and then dump them all into the PubKeys array below.

@RebeccaMahany RebeccaMahany marked this pull request as ready for review December 7, 2023 14:46
zackattack01 previously approved these changes Dec 7, 2023
Contributor

@zackattack01 zackattack01 left a comment

LGTM 🚀

Contributor

@directionless directionless left a comment

I'm blocking the merge here. The code is fine, but I'm not ready to have this merged, and am using the GitHub tool to block it.

@RebeccaMahany RebeccaMahany changed the title Back up and wipe the database if hardware changes [Do not merge until after v1.3.0 goes to stable] Back up and wipe the database if hardware changes Dec 7, 2023
@RebeccaMahany RebeccaMahany changed the title [Do not merge until after v1.3.0 goes to stable] Back up and wipe the database if hardware changes Back up and wipe the database if hardware changes Dec 21, 2023
Contributor Author

RebeccaMahany commented Dec 21, 2023

@directionless this is ready to merge now that 1.3.2 has been released to stable.

Going to limit the scope of this PR to report and not wipe the database for now -- changes incoming.

@RebeccaMahany RebeccaMahany reopened this Dec 21, 2023
@RebeccaMahany RebeccaMahany changed the title Back up and wipe the database if hardware changes [Do not merge] Back up and wipe the database if hardware changes Dec 22, 2023
@RebeccaMahany RebeccaMahany changed the title [Do not merge] Back up and wipe the database if hardware changes Detect hardware or enrollment change Dec 22, 2023
@@ -258,6 +258,8 @@ func runLauncher(ctx context.Context, cancel func(), slogger, systemSlogger *mul
signalListener := newSignalListener(sigChannel, cancel, logger)
runGroup.Add("sigChannel", signalListener.Execute, signalListener.Interrupt)

agent.ResetDatabaseIfNeeded(ctx, k)
Contributor

Should we call this something else, since it's not actually resetting anymore? Or maybe leave a comment here?

Contributor Author

Oh yes, thank you, very good point! I will update

Contributor Author

Renamed and added a comment as well

Contributor

@James-Pickett James-Pickett left a comment

NICE

Contributor

@directionless directionless left a comment

🚀 Let's try!

@RebeccaMahany RebeccaMahany added this pull request to the merge queue Dec 29, 2023
Merged via the queue into kolide:main with commit ac7bbca Dec 29, 2023
25 checks passed
@RebeccaMahany RebeccaMahany deleted the becca/reset-db branch December 29, 2023 14:09
4 participants