
Add Storage Management API #15295

Merged
jmchilton merged 42 commits into galaxyproject:dev from davelopez:add_storage_api on Mar 20, 2023

Conversation

@davelopez (Contributor) commented Jan 11, 2023

This adds new API endpoints that make it easier for users to monitor and clean up their used storage.

Before these changes, the Storage Dashboard (#13113) in the client had to do some extra logic and workarounds to use the existing API. This change also offers the opportunity to improve the performance of some of the requests.

I'm still a bit unsure about the route naming. So far, this is the convention I will follow unless there are other ideas: `/api/storage/{object}/{condition}/[summary]`, where `{object}` can be either `histories` or `datasets` and `{condition}` refers to the way they are discovered. For example:

  • For deleted and non-purged histories (discarded) the relevant endpoints will look like this:
    GET /api/storage/histories/discarded/summary
    GET /api/storage/histories/discarded
    
  • For "big" datasets (bigger than a certain user-defined amount) it will look like this:
    GET /api/storage/datasets/big/summary
    GET /api/storage/datasets/big
    
  • For old histories and datasets, etc... you get the idea :) (a rough route sketch follows below)
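
For instance, the new routes might be declared roughly like the sketch below. This is a FastAPI-style sketch (Galaxy's newer API endpoints use FastAPI), but the function names, signatures, and paging parameters here are illustrative assumptions, not the PR's actual code:

```python
from fastapi import APIRouter

# Hypothetical router for the proposed storage cleanup routes.
router = APIRouter(prefix="/api/storage")

@router.get("/histories/discarded/summary")
def discarded_histories_summary():
    # Would return something like {"total_size": ..., "total_items": ...}
    ...

@router.get("/histories/discarded")
def discarded_histories(offset: int = 0, limit: int = 10):
    # Would return a paginated list of discarded histories;
    # the paging parameter names are assumptions.
    ...
```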

The summary endpoints will just return the total number of objects and the total size they take up:

{
  "total_size": 4242283,
  "total_items": 2
}

while the regular endpoint will return a paginated list of objects with the following information for each object:

[
  {
      "id": "79966582feb6c081",
      "name": "The name of the history",
      "type": "history", // or "dataset"
      "size": 2444027,
      "update_time": "2022-12-17T09:54:59.292171"
  },
  //...
]

There is also an endpoint for purging objects in batch: you can pass a list of object IDs, and it will purge all of them and return detailed information about the result:

{
  "total_item_count": 5,
  "success_item_count": 4,
  "total_free_bytes": 546987560,
  "errors": [
    {
      "item_id": "79966582feb6c081",
      "error": "This item couldn't be cleaned for some reason"
    }
  ]
}
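
For illustration, calling the batch purge endpoint from a script might look like the sketch below. The HTTP method, endpoint path, payload field name, and server URL are assumptions for illustration only, since the PR text doesn't spell them out:

```python
import requests

GALAXY_URL = "https://galaxy.example.org"  # placeholder server
API_KEY = "YOUR_API_KEY"  # placeholder credentials

# Hypothetical batch purge call; the path and payload field name are assumed.
response = requests.delete(
    f"{GALAXY_URL}/api/storage/datasets",
    json={"item_ids": ["79966582feb6c081"]},
    headers={"x-api-key": API_KEY},
)
print(response.json())  # e.g. {"total_item_count": 1, "success_item_count": 1, ...}
```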

TODO

  • Implement endpoints for discarded histories
  • Implement endpoints for discarded datasets
  • Adapt client to new endpoints

How to test the changes?

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@nsoranzo (Member)

I think /api/storage/{object}/{condition}/ feels more "natural" than /api/storage/{condition}/{object}/ but maybe that's just me. I haven't looked at the PR code, so maybe there's a technical reason for the order.

@davelopez (Contributor, Author)

Thanks, Nicola! There is no technical reason for the order; I just thought that it reads more naturally this way:
GET /api/storage/discarded/histories/summary -> "Give me the discarded histories summary" or "Give me the big datasets summary" since the {condition} will be an adjective.
While:
GET /api/storage/histories/discarded/summary -> "Give me the histories discarded summary" doesn't read as nicely, but it may be more in line with what we already have 🤔

So I'm 100% fine with either option (or a totally different one) 😄

@nsoranzo (Member)

I guess /api/storage/histories could return something like:

{
    "big": 34,
    "discarded": 43,
    "other_condition": N,
}

and similarly for datasets.

@davelopez (Contributor, Author) commented Jan 11, 2023

Ok, that is a good point. I won't implement this general endpoint for now, but it might be helpful in the future, so I will change the order to accommodate it.

By the way, I thought the `deprecated` module was standard, but apparently it isn't. Is it worth installing, or is it better to just drop the commit "Mark manual service encoding/decoding as deprecated"?

@davelopez force-pushed the add_storage_api branch 2 times, most recently from b21dba2 to 1c67f4b on January 27, 2023 13:32
@davelopez marked this pull request as ready for review on January 31, 2023 14:18
@github-actions bot added this to the 23.1 milestone on Jan 31, 2023
@davelopez force-pushed the add_storage_api branch 2 times, most recently from 612b311 to 8bcedd3 on February 3, 2023 10:36
@itisAliRH (Member)

ReviewCleanupDialog shows the wrong pagination numbers after permanently deleting items:

(video attachment: storage.dashboard.mp4)

The "select all" checkbox state is not correct when clicked:

(video attachment: storage.dashboard.2.mp4)

@davelopez (Contributor, Author)

Thank you for having a look @itisAliRH! 👍

> ReviewCleanupDialog shows the wrong pagination numbers after permanently deleting items

Does this happen with Celery disabled? With Celery enabled, it looks like the cleanup request finishes and reports its results while Celery tasks in the background are still purging the actual datasets... so the updated summary fetched after the cleanup still counts some items as not purged, and by the time you open the review dialog a few more may have been purged. This is definitely a bug; I need to think about how to keep the requests in sync... 🤔

> The "select all" checkbox state is not correct when clicked

This is intentional behavior: because the items are paginated, the checkbox at the top left only selects the current "page". If you really want to select all items, you need to click the "select all X items" link at the top right. But it looks like it may be confusing... any UX ideas on how to communicate this better to the user?

@davelopez (Contributor, Author)

I think I fixed the issue by splitting the cleanup of datasets into two phases:

1. Mark all HDAs as purged in a single transaction.
2. Launch a single task to remove all the underlying datasets from the store (or do it synchronously if Celery is disabled).

This should make every subsequent request consistent, independently of the status of the scheduled task. The cleanup result will also be more accurate now: removing a copy of a dataset will report that it freed 0 bytes, since it was just a copy. I will try to add more information to the cleanup result dialog so users don't get confused by these "differences" between expected and actual recovered storage space.
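
A minimal, self-contained sketch of the two-phase idea (all names are illustrative stand-ins, not Galaxy's actual code; a thread plays the role of the Celery task):

```python
import threading
from dataclasses import dataclass

@dataclass
class Dataset:
    id: str
    purged: bool = False

@dataclass
class HDA:  # stand-in for Galaxy's HistoryDatasetAssociation
    id: str
    dataset: Dataset
    purged: bool = False

def purge_datasets(datasets):
    # Stand-in for removing the underlying files from the object store.
    for dataset in datasets:
        dataset.purged = True

def cleanup(hdas, celery_enabled):
    # Phase 1: mark every association as purged up front (in Galaxy this
    # happens in a single database transaction), so any request arriving
    # after this point already sees consistent results.
    for hda in hdas:
        hda.purged = True
    # Phase 2: remove the underlying datasets in a single background task
    # when Celery is available, otherwise synchronously.
    datasets = [hda.dataset for hda in hdas]
    if celery_enabled:
        threading.Thread(target=purge_datasets, args=(datasets,)).start()
    else:
        purge_datasets(datasets)
```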

@itisAliRH can you give it another try with these changes? Thank you!

@itisAliRH (Member)

> I think I fixed the issue by splitting the cleanup of datasets into two phases: 1. Mark all HDAs as purged in a single transaction. 2. Launch a single task to remove all the underlying datasets from the store (or do it synchronously if Celery is disabled).
>
> This should make every subsequent request consistent, independently of the status of the scheduled task. The cleanup result will also be more accurate now: removing a copy of a dataset will report that it freed 0 bytes, since it was just a copy. I will try to add more information to the cleanup result dialog so users don't get confused by these "differences" between expected and actual recovered storage space.
>
> @itisAliRH can you give it another try with these changes? Thank you!

I just tried it again, and the first issue is fixed, thanks @davelopez!

> This is intentional behavior: because the items are paginated, the checkbox at the top left only selects the current "page". If you really want to select all items, you need to click the "select all X items" link at the top right. But it looks like it may be confusing... any UX ideas on how to communicate this better to the user?

I see, but in this case the "select all" icon shows the wrong status: even when clicking "select all X items", it shows the indeterminate state, not the full check.

(video attachment: Screen.Recording.2023-02-09.at.12.00.53.mp4)

@davelopez (Contributor, Author) commented Feb 10, 2023

> I see, but in this case the "select all" icon shows the wrong status: even when clicking "select all X items", it shows the indeterminate state, not the full check.

Thanks for noticing this! I've removed the limit on how many items can be reviewed at once and made the general checkbox in the table behave like the "select all X items" link. I think I only had to do all that juggling with pagination because the request for the items used to be slow. With the new API, my local tests with hundreds of items are surprisingly fast, so I could drop the select-all limitation, and the selection should be simpler now.

@itisAliRH give it another try when you have time and let me know if you like it better now 😉

@itisAliRH (Member)

> > I see, but in this case the "select all" icon shows the wrong status: even when clicking "select all X items", it shows the indeterminate state, not the full check.
>
> Thanks for noticing this! I've removed the limit on how many items can be reviewed at once and made the general checkbox in the table behave like the "select all X items" link. I think I only had to do all that juggling with pagination because the request for the items used to be slow. With the new API, my local tests with hundreds of items are surprisingly fast, so I could drop the select-all limitation, and the selection should be simpler now.
>
> @itisAliRH give it another try when you have time and let me know if you like it better now 😉

@davelopez Thanks! I'd prefer to select all items on the current page by clicking the general checkbox, and all items on all pages by clicking "select all X items". Having two ways to do the same thing would be confusing.

> ReviewCleanupDialog shows the wrong pagination numbers after permanently deleting items:

This problem came back again :(

@davelopez (Contributor, Author)

Ok, I think the least confusing option here is to remove the "select all" link and leave just the checkbox. Enabling the checkbox to select individual pages can be confusing too, and the link was just a workaround for the pagination issue, which no longer seems to be a problem.

@davelopez force-pushed the add_storage_api branch 2 times, most recently from 752d748 to 1d31285 on February 13, 2023 13:43
@davelopez marked this pull request as ready for review on February 13, 2023 14:09
@davelopez (Contributor, Author)

I think this one is ready for review, especially the backend part: the new database queries and the way datasets are now purged in two phases. I'm happy to rethink this if the approach is not desirable.

The client part is mostly refactorings and simplifications to adapt to the new API, plus a few fixes/enhancements around the cleanable items review dialog.

Commit messages:

  • No point in keeping it required, since the encoding/decoding should be done directly in the model parsing going forward.
  • When listing datasets, the `total_size` can be None in some error statuses, so make it 0 by default.
  • To avoid interactions with other tests that may alter the state of datasets or histories.
  • This reverts commit edfa629. Maybe worth adding it back later.
  • Purge associations and datasets in 2 steps: first, synchronously mark all associations as purged in one transaction; then purge internal datasets in a Celery task, or synchronously if not available. This will ensure that the associations are marked as purged when the request finishes, keeping the results consistent between subsequent requests.
  • The new API is much faster now, so maybe this is an early optimization that hinders usability.
  • The checkbox toggle now runs asynchronously, so we must wait for the promise to finish.
  • It can happen that storage space is not recovered even though items were permanently deleted, if those items were copies of other items.
  • This makes sure that the summary data gets re-requested after a potential change in the results.
@davelopez marked this pull request as draft on March 7, 2023 12:51
@davelopez (Contributor, Author)

This requires some adjustments to make it compatible with #14073 (working on it).

@davelopez marked this pull request as ready for review on March 7, 2023 18:35
@jmchilton merged commit e831a7d into galaxyproject:dev on Mar 20, 2023
39 of 40 checks passed
@jmchilton (Member)

Very nice polish - thank you!

@davelopez deleted the add_storage_api branch on March 20, 2023 15:18