Add a retry mechanism for partially completed uploads #151

Open
cwognum opened this issue Jul 22, 2024 · 1 comment
Labels
feature Annotates any PR that adds new features; Used in the release process

@cwognum
Collaborator

cwognum commented Jul 22, 2024

Is your feature request related to a problem? Please describe.

Uploading a dataset to the Hub comprises multiple, independent steps. The first step creates an entry in our database, whereas the subsequent steps upload the actual files to the Hub's storage backend. If any of these subsequent steps fails, the dataset is visible on the Hub to its owner but is never marked as ready, so it remains stuck in limbo.

When you delete such a dataset, you can never recreate a dataset with the same name, because Polaris uses soft-deletion and artifact names must be unique: the soft-deleted entry keeps occupying the name.

Describe the solution you'd like

For any dataset that has been created in the Hub's database, but for which some of the file uploads failed, it should be possible to retry the failed uploads. It should also be clearly communicated that this is the recommended next step once an upload fails.
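To make the request concrete, here is a minimal sketch of what retrying only the failed uploads could look like. Everything in it is hypothetical and for illustration only: `UploadState`, `retry_failed_uploads`, the `upload_file` callback, and the file names are assumptions, not part of the Polaris client or the Hub API.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class UploadState:
    """Hypothetical record of which files of a dataset reached storage."""

    pending: Set[str]
    done: Set[str] = field(default_factory=set)

    @property
    def ready(self) -> bool:
        # The dataset can only be marked as ready once nothing is pending.
        return not self.pending


def retry_failed_uploads(
    state: UploadState,
    upload_file: Callable[[str], None],
    max_attempts: int = 3,
) -> UploadState:
    """Re-upload only the files that are still pending.

    `upload_file` is an assumed callback that raises OSError on failure.
    Files that already succeeded are never uploaded again.
    """
    # sorted() copies the names, so mutating the set in the loop is safe.
    for name in sorted(state.pending):
        for _ in range(max_attempts):
            try:
                upload_file(name)
            except OSError:
                continue  # try this file again, up to max_attempts
            state.pending.discard(name)
            state.done.add(name)
            break
    return state


# Usage: one file fails once, then succeeds on the retry.
calls = []


def flaky_upload(name: str) -> None:
    calls.append(name)
    if name == "table.parquet" and calls.count(name) == 1:
        raise OSError("transient storage error")


state = UploadState(pending={"table.parquet", "root.zarr"}, done={"metadata.json"})
retry_failed_uploads(state, flaky_upload)
```

If a file still fails after `max_attempts`, it stays in `pending` and `ready` remains false, which is the point at which the client could tell the user that retrying is the recommended next step.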

Describe alternatives you've considered

Alternatively, we could:

  • Switch from soft-deletes to hard-deletes. That way, a user could just delete a failed upload and try again. My worry is that this leads to a worse overall user experience, because data cannot be recovered if it is accidentally deleted.
  • Update our mechanism to ensure uniqueness of the slug across non-deleted artifacts only. This could get technically complex, e.g. what if a user wants to recover a deleted artifact but has created a new dataset with the same name in the meantime?
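
The second alternative, and the complication it raises, can be illustrated with a partial unique index. This is only a sketch under stated assumptions: it uses an in-memory SQLite database, and the `dataset` table, its `slug` and `deleted_at` columns, and the index name are all made up for the example. The Hub's actual schema and database are not described in this issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: a soft-deleted row has a non-NULL deleted_at.
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, slug TEXT NOT NULL, deleted_at TEXT)"
)
# Uniqueness is enforced only across rows that are not soft-deleted.
conn.execute(
    "CREATE UNIQUE INDEX uq_dataset_slug_active "
    "ON dataset (slug) WHERE deleted_at IS NULL"
)

# Create a dataset, soft-delete it, then reuse the same slug.
conn.execute("INSERT INTO dataset (slug) VALUES ('my-dataset')")
conn.execute("UPDATE dataset SET deleted_at = '2024-07-22' WHERE id = 1")
conn.execute("INSERT INTO dataset (slug) VALUES ('my-dataset')")  # allowed

# Recovering the soft-deleted row now collides with the new dataset.
try:
    conn.execute("UPDATE dataset SET deleted_at = NULL WHERE id = 1")
    restore_conflict = False
except sqlite3.IntegrityError:
    restore_conflict = True

row_count = conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]
```

Reusing the name works, but restoring the original row is rejected by the index, so the recovery path would need an explicit policy (rename on restore, refuse the restore, or similar).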

Additional context

This issue came up in #147

@cwognum cwognum added the feature Annotates any PR that adds new features; Used in the release process label Jul 22, 2024
@cwognum cwognum added this to the XL Datasets milestone Aug 13, 2024
@jstlaurent
Contributor

Hard delete

The main blocker is that we need to clean up the bucket data, which we don't do now. If we have a deletion workflow that cleans up everything, then there's no problem with a hard delete anymore.
