[Draft] Add an API to support snapshots in ZFS #2633

vk-en · 2022-05-23T06:14:06Z

This PR adds an API that allows you to process commands for working with snapshots in ZFS and send information about them.

This PR is the first part of two and will make it easier to test the functionality for working with snapshots before merging into the main branch.

rvs

Wait, so what's the proposed use case for this?

vk-en · 2022-06-10T16:08:39Z

> Wait, so what's the proposed use case for this?

The implementation of the snapshot functionality is in PR #2607 (Will be updated after the API merge), if you are talking about it.

rvs · 2022-06-11T11:14:12Z

> The implementation of the snapshot functionality is in PR #2607 (Will be updated after the API merge), if you are talking about it.

I haven't seen any explanation in that other PR either.

eriknordmark · 2022-07-21T09:20:11Z

api/proto/info/info.proto

+  Z_SNAPSHOT_STATE_DELETING = 3;
+  // This state is used to send information to the controller about a
+  // snapshot that was implicitly deleted after a rollback snapshot
+  // or volume delete command.


If the volume is deleted it means all snapshots for that volume are also deleted. If we are going to include those as IMPLICITLY_DELETED, then for how long do you expect EVE-OS to keep sending them as IMPLICITLY_DELETED?
It might be simpler to just omit snapshots for volumes which have been deleted from the info message.

> If the volume is deleted it means all snapshots for that volume are also deleted. If we are going to include those as IMPLICITLY_DELETED, then for how long do you expect EVE-OS to keep sending them as IMPLICITLY_DELETED?

I'm assuming that after the rollback we report that the snapshot has been implicitly deleted, and the next time we don't expect to get a SnapshotConfig for it.

> It might be simpler to just omit snapshots for volumes which have been deleted from the info message.

I don't think we can rely on the controller knowing that newer snapshots were deleted by the rollback.
If the controller deletes older snapshots before the rollback, then reporting the implicit deletion will not break the logic, but if the controller does not delete them, then the report becomes critical.
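To make the implicit deletion concrete: ZFS refuses to roll back past the most recent snapshot unless the later ones are destroyed (`zfs rollback -r`), so every snapshot of the same volume that is newer than the target disappears. The following is only a minimal sketch of that bookkeeping. The `Snapshot` class, its `created_at` field, and the function name are hypothetical and not part of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    uuid: str          # snapshot UUID, as carried in SnapshotConfig
    volume_uuid: str   # volume the snapshot belongs to
    created_at: int    # hypothetical creation counter or timestamp


def implicitly_deleted_on_rollback(snapshots, target_uuid):
    """Return UUIDs of snapshots that a rollback to target_uuid would destroy.

    Only snapshots of the same volume that are newer than the target are
    affected; snapshots of other volumes are left alone.
    """
    by_uuid = {s.uuid: s for s in snapshots}
    target = by_uuid[target_uuid]
    return sorted(
        s.uuid
        for s in snapshots
        if s.volume_uuid == target.volume_uuid and s.created_at > target.created_at
    )


snaps = [
    Snapshot("1", "X", created_at=10),
    Snapshot("2", "X", created_at=20),
    Snapshot("3", "Y", created_at=30),
]
print(implicitly_deleted_on_rollback(snaps, "1"))  # ['2']
```

Rolling back to the newest snapshot of a volume yields an empty list, in which case there is nothing to report as implicitly deleted.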

> I'm assuming that after the rollback we report that the snapshot has been implicitly deleted, and the next time we don't expect to get a SnapshotConfig for it.

It doesn't make sense for EVE-OS to make such an assumption. For one thing, there can be races where you report the info with the implicitly deleted snapshot at the same time as you fetch the EdgeDevConfig. And with network outages it could take an hour or a week from processing a rollback until the info and a subsequent config are handled. Last but not least, EVE-OS should be robust even if the controller doesn't remove the snapshots from the config after a rollback. (I'd expect a well-behaved controller to remove them as part of putting the RollbackCmd in the config, but EVE-OS code should not assume such things.)

If we want to avoid a race condition in such a case, we can add an additional check for the presence of the logical volume before sending "implicitly deleted": if the logical volume for the snapshot no longer exists, we do not send a message with that status.

As for the rest, we can't delete what is already implicitly deleted, so EVE will mark such snapshots as deleted and stop reporting them in subsequent info messages. If we still continue to receive configurations for implicitly deleted snapshots, then something has likely gone wrong on the controller, and we should probably send an update about the snapshots that are no longer there.

In any case, this should not affect the stability of EVE; the only open question is how the controller gets informed.

Wouldn't this be a reliable option for EVE-OS?
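The additional check proposed above could look roughly like this. It is a sketch under assumptions: the state name `IMPLICITLY_DELETED`, the shape of the `statuses` mapping, and the function name are all hypothetical, and the real info message is a protobuf, not a dict.

```python
IMPLICITLY_DELETED = "IMPLICITLY_DELETED"  # hypothetical state name


def snapshots_to_report(statuses, existing_volumes):
    """Select the snapshot states that go into the next info message.

    statuses maps snapshot_uuid -> (volume_uuid, state).
    An implicitly deleted snapshot is reported only while its volume still
    exists; once the volume is gone, the entry is omitted.
    """
    report = {}
    for snapshot_uuid, (volume_uuid, state) in statuses.items():
        if state == IMPLICITLY_DELETED and volume_uuid not in existing_volumes:
            continue  # volume was deleted, so drop the snapshot from the report
        report[snapshot_uuid] = state
    return report


statuses = {
    "1": ("X", "CREATED"),
    "2": ("X", IMPLICITLY_DELETED),  # removed by a rollback on volume X
    "3": ("Y", IMPLICITLY_DELETED),  # volume Y itself was deleted
}
print(snapshots_to_report(statuses, existing_volumes={"X"}))
# {'1': 'CREATED', '2': 'IMPLICITLY_DELETED'}
```

This covers both cases discussed in the thread: a rollback leaves the volume in place and the implicit deletion is reported, while a volume delete removes the snapshots from the info message entirely.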

This isn't part of the API definition but of the EVE-OS implementation; I'm asking to make sure we won't hit issues in the implementation if a controller is suboptimal and does not remove the newer SnapshotConfig when doing a rollback to something old.
For example, the controller starts with uuid=1 for volume_uuid=X, then adds a second SnapshotConfig with uuid=2 for volume_uuid=X, and later sends a RollbackCmd with snapshot_uuid=1 while the SnapshotConfig with uuid=2 for volume_uuid=X remains. How does EVE-OS know that it has already done (and implicitly deleted) uuid=2? Does it need to persist the implicitly deleted SnapshotConfig across reboots so that it can tell it should ignore that stale uuid=2 from the controller?

> How does EVE-OS know that it has already done (and implicitly deleted) uuid=2? Does it need to persist the implicitly deleted SnapshotConfig across reboots so that it can tell it should ignore that stale uuid=2 from the controller?

Yes, the implementation is supposed to save the statuses of all snapshots in EVE. As such, EVE will always be aware of the status of snapshots that have been removed.
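A rough illustration of persisting statuses so that the stale uuid=2 from the example is still ignored after a reboot. This is only a sketch: the JSON file, the class and function names, and the state name are hypothetical and say nothing about how EVE actually stores its state.

```python
import json
import os
import tempfile

IMPLICITLY_DELETED = "IMPLICITLY_DELETED"  # hypothetical state name


class SnapshotStatusStore:
    """Keeps snapshot states in a JSON file so they survive a reboot."""

    def __init__(self, path):
        self.path = path
        self.states = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.states = json.load(f)

    def set_state(self, snapshot_uuid, state):
        self.states[snapshot_uuid] = state
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.states, f)
        os.replace(tmp_path, self.path)  # replace the old file in one step

    def is_implicitly_deleted(self, snapshot_uuid):
        return self.states.get(snapshot_uuid) == IMPLICITLY_DELETED


def split_config(store, snapshot_configs):
    """Split incoming SnapshotConfig entries into (to_apply, stale) UUID lists."""
    to_apply, stale = [], []
    for cfg in snapshot_configs:
        if store.is_implicitly_deleted(cfg["uuid"]):
            stale.append(cfg["uuid"])
        else:
            to_apply.append(cfg["uuid"])
    return to_apply, stale


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "snapshot_status.json")
    # Rollback to uuid=1 implicitly deleted uuid=2; record that on disk.
    SnapshotStatusStore(path).set_state("2", IMPLICITLY_DELETED)

    # Simulate a reboot by loading a fresh store from the same file.
    store_after_reboot = SnapshotStatusStore(path)
    config = [{"uuid": "1", "volume_uuid": "X"}, {"uuid": "2", "volume_uuid": "X"}]
    to_apply, stale = split_config(store_after_reboot, config)

print(to_apply, stale)  # ['1'] ['2']
```

The `stale` list is what EVE would ignore in the config and could keep reporting as implicitly deleted until the controller drops the entry.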

eriknordmark · 2022-07-27T11:57:55Z

api/proto/info/info.proto

+  // This state is used to send information to the controller about a
+  // snapshot that was implicitly deleted after a rollback snapshot
+  // or volume delete command. After sending a message with this status once,
+  // EVE no longer expects to receive the configuration (SnapshotConfig) for this snapshot.


Can't make that assumption.
EVE-OS will ignore the snapshot config for the entries which have been implicitly deleted.

That's right, EVE-OS will ignore the snapshot configuration for entries that have been implicitly deleted. We can't do anything with a snapshot that doesn't exist.
But we can send a message about its status if for some reason the controller does not know that it was implicitly deleted.

eriknordmark · 2022-07-27T11:58:45Z

api/proto/metrics/metrics.proto

+// unique to (and thus used by) other snapshots.
+message ZMetricSnapshot {
+  // Snapshot UUID
+  string uuid = 1;


It makes sense to call this field snapshot_uuid as well.

giggsoff · 2022-08-02T11:06:36Z

api/proto/config/storage.proto

+  string snapshot_uuid = 1;
+  string volume_uuid = 2;    // UUID of the volume to rollback


Do we need snapshot_uuid and volume_uuid here? As far as I can see, RollbackCmd is included in SnapshotConfig, which already has both fields. What is the goal of duplicating them?

It seems this came from a comment by @eriknordmark (#2633 (comment)). Do we plan to reuse RollbackCmd somewhere else?
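One consequence of the duplication is that the receiver has to decide what to do when the two copies disagree. The sketch below mirrors the messages as plain dataclasses to show such a check. The field names `uuid` and `rollback` on `SnapshotConfig` and the function name are assumptions made for illustration; the actual definitions are in api/proto/config/storage.proto.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollbackCmd:
    snapshot_uuid: str
    volume_uuid: str  # UUID of the volume to rollback


@dataclass
class SnapshotConfig:
    uuid: str                               # assumed field name
    volume_uuid: str
    rollback: Optional[RollbackCmd] = None  # assumed field name


def rollback_mismatches(cfg):
    """List the duplicated fields whose values disagree with the enclosing config."""
    if cfg.rollback is None:
        return []
    mismatches = []
    if cfg.rollback.snapshot_uuid != cfg.uuid:
        mismatches.append("snapshot_uuid")
    if cfg.rollback.volume_uuid != cfg.volume_uuid:
        mismatches.append("volume_uuid")
    return mismatches


consistent = SnapshotConfig("1", "X", RollbackCmd("1", "X"))
inconsistent = SnapshotConfig("1", "X", RollbackCmd("1", "Y"))
print(rollback_mismatches(consistent))    # []
print(rollback_mismatches(inconsistent))  # ['volume_uuid']
```

If RollbackCmd is never reused outside SnapshotConfig, dropping the duplicated fields would make this check unnecessary.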

giggsoff · 2022-08-02T11:33:16Z

api/proto/metrics/metrics.proto

+  // The amount of space that is "logically" accessible by this dataset.
+  // See the referenced property. The logical space ignores the effect of
+  // the compression and copies properties, giving a quantity closer to
+  // the amount of data that applications see.
+  // However, it does include space consumed by metadata.
+  // (in bytes)
+  uint64 logicalreferenced = 7;


From the comment on logicalreferenced, I cannot tell what "the amount of data that applications see" means in the case of a snapshot.
A more general question: is it possible to calculate the overhead of a particular snapshot at a given time? I mean, we have something like current usage and a maximum space limit for zvols (volumes). Is there a field that will indicate that snapshot X uses N bytes and snapshot Y uses M bytes of the persist storage pool at some moment in time?
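As background for this question: in ZFS, a snapshot's `used` property is the space unique to that snapshot, meaning the space that would be freed if only that snapshot were destroyed, while `referenced` and `logicalreferenced` describe the data reachable through it. A sketch of reading these per snapshot from the script-mode output of `zfs get -Hp -o name,property,value used,referenced,logicalreferenced -t snapshot` follows. The dataset names and byte values in `SAMPLE` are made up, and the function names are hypothetical.

```python
# Made-up sample of tab-separated `zfs get -Hp -o name,property,value ...` output.
SAMPLE = (
    "persist/vol-X@snap-1\tused\t1048576\n"
    "persist/vol-X@snap-1\treferenced\t8388608\n"
    "persist/vol-X@snap-1\tlogicalreferenced\t16777216\n"
    "persist/vol-X@snap-2\tused\t0\n"
    "persist/vol-X@snap-2\treferenced\t9437184\n"
    "persist/vol-X@snap-2\tlogicalreferenced\t18874368\n"
)


def parse_zfs_get(output):
    """Parse name/property/value lines into {snapshot: {property: bytes}}."""
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = int(value)
    return props


def per_snapshot_used(props):
    """Space (in bytes) that destroying each snapshot on its own would free."""
    return {name: values["used"] for name, values in props.items()}


parsed = parse_zfs_get(SAMPLE)
print(per_snapshot_used(parsed))
# {'persist/vol-X@snap-1': 1048576, 'persist/vol-X@snap-2': 0}
```

Note that the sum of the per-snapshot `used` values is generally not the total space held by snapshots, because blocks shared by several snapshots are not counted in any single one; the dataset-level `usedbysnapshots` property reports that total.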

Signed-off-by: Vitaliy Kuznetsov <ahil52_25@mail.ru>

rouming · 2023-01-19T11:51:31Z

@OhmSpectator could you please take a look at this? Do you need this for your current snapshot implementation? Or can we close this?

OhmSpectator · 2023-01-19T12:11:26Z

Leave it for now, please. I can use it as one of the references. Maybe we can add some tag to the PR.

eriknordmark · 2023-01-24T00:39:23Z

@OhmSpectator I made it into a draft for now

eriknordmark · 2023-05-18T07:26:20Z

Done using this as a reference for #3216

vk-en requested review from eriknordmark and rvs as code owners May 23, 2022 06:14

vk-en force-pushed the zfsSnapAPI branch 3 times, most recently from b6a00ae to 0c62a82 Compare May 23, 2022 09:38

giggsoff reviewed May 23, 2022

api/proto/info/info.proto

vk-en force-pushed the zfsSnapAPI branch 2 times, most recently from 8ea5b1a to ab556b0 Compare May 23, 2022 12:38

eriknordmark reviewed May 23, 2022

api/proto/config/storage.proto