-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Add an API to support snapshots in ZFS #2633
Conversation
b6a00ae
to
0c62a82
Compare
8ea5b1a
to
ab556b0
Compare
888f708
to
4d9e662
Compare
29801ab
to
fa4c17c
Compare
802c29a
to
e4d0e38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, so what's the proposed use case for this?
The implementation of the snapshot functionality is in PR #2607 (Will be updated after the API merge), if you are talking about it. |
I haven't seen any explanation in that other PR either. |
api/proto/info/info.proto
Outdated
Z_SNAPSHOT_STATE_DELETING = 3; | ||
// This state is used to send information to the controller about a | ||
// snapshot that was implicitly deleted after a rollback snapshot | ||
// or volume delete command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the volume is deleted it means all snapshots for that volume are also deleted. If we are going to include those as IMPLICTLY_DELETED, then for how long time do you expect EVE-OS to keep sending them as IMPLICITLY_DELETED?
It might be simpler to just omit snapshots for volumes which have been deleted from the info message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the volume is deleted it means all snapshots for that volume are also deleted. If we are going to include those as IMPLICTLY_DELETED, then for how long time do you expect EVE-OS to keep sending them as IMPLICITLY_DELETED?
I'm assuming that after the rollback we report that the snapshot has been implicitly deleted, and the next time we don't expect to get a SnapshotConfig for it.
It might be simpler to just omit snapshots for volumes which have been deleted from the info message.
I think we can't rely on the controller to know that newer snapshots have been deleted on rollback.
If the controller deletes older snapshots before the rollback, then reporting implicit deletion will not break the logic, but if the controller does not delete it, then it will be critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that after the rollback we report that the snapshot has been implicitly deleted, and the next time we don't expect to get a SnapshotConfig for it.
It doesn't make sense for EVE-OS to make such an assumption. For one reason there can be races where you report the info with the implicitly deleted at the same time as you fetch the EdgeDevConfig. And with network outages there could be an hour or a week from processing a rollback until the info and a subsequent config are handled. Last but not least, EVE-OS should be robust even if the controller doesn't remove the snapshots from the config even after a rollback. (I'd expect a well-behaved controller to remove them as part of putting the rollbackcmd in the config, but EVE-OS code should not assume such things.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to avoid a race condition for such a case, we can add a condition an additional check for the presence of a logical volume, before sending "implicitly deleted", and if the logical volume for the snapshot does not already exist, then do not send a message with such a status.
As for the rest, we can't delete what is already implicitly deleted, so EVE will mark messages as deleted and not notify about them in the next informational messages, but if we still continue to receive configurations for implicitly deleted messages, then it is likely on the controller that something went wrong and we should probably send an update on snapshots that are no longer there.
In any case, this should not affect the stability of EVE, except that there is a question of informing.
Wouldn't this be a reliable option for EVE-OS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't part of the API definition but the EVE-OS implementation but I'm asking to make sure we won't hit issues in the implementation. If a controller is suboptimal and does not remove the newer SnapshotConfig when doing a rollback to something old.
For example, start with uuid=1 for volume_uuid=X, then add a second SnapshotConfig with uuid=2 for volume_uuid=X and then later it can send a RollbackCmd with snapshot_uuid=1, and the SnapshotConfig with uuid=2 for volume_uuid=X remains. How does EVE-OS know that it has already done (and implicitly deleted) uuid=2? Does it need to persist the implicitly deleted SnapshotConfig across reboots so that it can tell it should ignore that stale uuid=2 from the controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does EVE-OS know that it has already done (and implicitly deleted) uuid=2? Does it need to persist the implicitly deleted SnapshotConfig across reboots so that it can tell it should ignore that stale uuid=2 from the controller?
Yes, in the implementation is supposed to save statuses for all snapshots in EVE. As such, EVE must always be aware of the status of snapshots that have been removed.
api/proto/info/info.proto
Outdated
// This state is used to send information to the controller about a | ||
// snapshot that was implicitly deleted after a rollback snapshot | ||
// or volume delete command. After sending a message with this status once, | ||
// EVE no longer expects to receive the configuration (SnapshotConfig) for this snapshot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't make that assumption.
EVE-OS will ignore the snapshot config for the entries which have been implictly deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, EVE-OS will ignore the snapshot configuration for entries that have been implicitly deleted. We can't do anything with a snapshot that doesn't exist.
But we can send a message about its status if for some reason the controller does not know that it was implicitly deleted.
api/proto/metrics/metrics.proto
Outdated
// unique to (and thus used by) other snapshots. | ||
message ZMetricSnapshot { | ||
// Snapshot UUID | ||
string uuid = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense calling this field snapshot_uuid as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected
string snapshot_uuid = 1; | ||
string volume_uuid = 2; // UUID of the volume to rollback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need snapshot_uuid
and volume_uuid
here? As I can see, RollbackCmd
included into SnapshotConfig
. SnapshotConfig
has both fields inside. What is the goal to duplicate fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it was in comment from @eriknordmark #2633 (comment). Do we plan to reuse RollbackCmd somewhere else?
api/proto/metrics/metrics.proto
Outdated
// The amount of space that is "logically" accessible by this dataset. | ||
// See the referenced property. The logical space ignores the effect of | ||
// the compression and copies properties, giving a quantity closer to | ||
// the amount of data that applications see. | ||
// However, it does include space consumed by metadata. | ||
// (in bytes) | ||
uint64 logicalreferenced = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot understand from the comment for logicalreferenced
what is the amount of data that applications see
in case of snapshot.
More general question: is it possible to calculate overhead of the particular snapshot at a time? I mean, we have something like current usage and maximum space limit for zvols (volumes). Is there some field that will indicate that snapshot X use N bytes, snapshot Y use M bytes from the persist storage pool at some moment of time?
Signed-off-by: Vitaliy Kuznetsov <ahil52_25@mail.ru>
Signed-off-by: Vitaliy Kuznetsov <ahil52_25@mail.ru>
@OhmSpectator could you please take a look on this? Do you need this for your current snapshot implementation? Or we can close this? |
Leave it for now, please. I can use it as one of the references. Maybe we can add some tag to the PR. |
@OhmSpectator I made it into a draft for now |
Done using this as a reference for #3216 |
This PR adds an API that allows you to process commands for working with snapshots in ZFS and send information about them.
This PR is the first part of two and will make it easier to test the functionality for working with snapshots before merging into the main branch.