ws_restore: Restore WS between filesystems #73

Open
URZ-HD opened this issue Feb 4, 2021 · 2 comments

URZ-HD (Contributor) commented Feb 4, 2021

Hi,
we are migrating users from one filesystem (work) to another (gpfs) and set the old system to

allocatable no
extendable no

and made the new filesystem the default one.

But some users have expired workspaces, which they want to restore and migrate to the new filesystem.
It looks like "ws_restore" is not able to do this because in every combination the workspace could be found:

> ws_list
id: perftest-beegfs
[...]
     filesystem name      : work
     available extensions : 10
id: acltest
[...]
     filesystem name      : gpfs
     available extensions : 10

> ws_restore -l
work:
hd_qq150-io500-1612449887
        unavailable since Thu Feb  4 15:44:47 2021
gpfs:

> ws_restore hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

> ws_restore -F work hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

> ws_restore -F gpfs hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

Restoring into the existing workspace "perftest-beegfs", which is on the same filesystem, works as expected.

I'm not sure whether this is a bug or a feature, because restoring between filesystems would be more than just a fast "mv".
But in either case it is a real problem for users.

The only way to work around this was to temporarily re-enable allocation on "work".

holgerBerger (Owner) commented

You guessed right: it is a mv, and it is meant to be fast.
So the two workspaces (the expired one and the new one) have to be in the same filesystem.
The thinking behind it, if I remember right, was that anything other than a mv could take a long time,
and a user interrupting it with Ctrl-C could leave some pretty strange states.
It would probably require some kind of multi-stage implementation,
which could be pretty complex.
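
To illustrate (just a rough Python sketch, not the actual ws_restore code; the paths and the helper name are made up): within one filesystem the restore can be a single rename(), which is atomic and effectively instant, while across filesystems rename() refuses and the fallback is a staged copy that can be interrupted halfway.

import errno
import os
import shutil

def restore(deleted_path, target_path):
    try:
        # Same filesystem: one atomic rename, nothing to interrupt halfway.
        os.rename(deleted_path, target_path)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Different filesystems: rename() fails with EXDEV, so a real
        # implementation would need several stages (copy, verify, switch
        # over, remove the source), each of which has to survive a Ctrl-C.
        shutil.copytree(deleted_path, target_path)
        shutil.rmtree(deleted_path)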

URZ-HD (Contributor, Author) commented Feb 4, 2021

OK, preventing user interruption is of course a crucial concern during the restore process.
But maybe the error message could be clearer if a user tries to restore between filesystems?

Additionally, the admin can configure the "restorable" and "allocatable" parameters separately. But if allocation is not allowed, a user has no way to restore a workspace successfully (unless a previous workspace still exists on that filesystem).

So maybe some additional checks would be useful for the ws_restore command, e.g. (a rough sketch follows the list):

  • if a target filesystem is specified and exists, check whether it matches the filesystem of the given deleted workspace
  • when listing the restorable workspaces (-l), only consider workspaces for which the user has an available target workspace (maybe mark the others as not restorable but still existing)
  • or, if no target workspace is given, list only the workspaces on allocatable filesystems (and mark the others as not restorable)
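
Something along these lines, maybe (only a rough Python sketch of the suggested checks; the function and attribute names are invented, not the real ws_restore internals):

def valid_targets(deleted_ws, user_workspaces):
    # A deleted workspace can only be restored into a workspace
    # on the same filesystem.
    return [ws for ws in user_workspaces
            if ws.filesystem == deleted_ws.filesystem]

def check_restore(deleted_ws, target_ws):
    # Fail early with a clear message instead of "workspace does not exist".
    if target_ws.filesystem != deleted_ws.filesystem:
        raise SystemExit(
            "Error: target workspace is on filesystem '%s', but the deleted "
            "workspace is on '%s'; restoring between filesystems is not "
            "supported." % (target_ws.filesystem, deleted_ws.filesystem))

def list_restorable(deleted_workspaces, user_workspaces, filesystems):
    for dws in deleted_workspaces:
        fs = filesystems[dws.filesystem]
        if valid_targets(dws, user_workspaces) or fs.allocatable:
            print(dws.name)
        else:
            # Still listed, but marked: no target workspace exists and the
            # filesystem does not allow allocating a new one.
            print("%s (not restorable: no target workspace on '%s')"
                  % (dws.name, dws.filesystem))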

I think these additional checks would be very helpful if you have more than one or two filesystems in your cluster.
