polaris.utils.errors.PolarisHubError: Error opening Zarr store at dataset upload
#147
Comments
Uploading Zarr is a multi-stage process, so it can happen that you do see the dataset on the Hub already, even though a later step in the upload process fails. However, you should always see a banner at the top stating that the dataset upload has not completed yet. |
It worked now for another dataset. The one that failed at upload was broken, with the banner being there for 16+ hours. I deleted it in the UI in order to upload it again. Now I get
Do I need to refresh something to make the delete be registered? |
@fteufel |
Still something hanging
|
@fteufel |
Hey there! I'm going to close this issue because it's no longer related to the original issue that was raised. But before I do, I wanted to provide some context and hopefully solve your issue in the process @fteufel:
We made the decision to implement deletion on the Hub as a soft-delete: You won't see deleted content on the Hub anymore, but the actual files still exist. That way we could restore deleted artifacts if needed. This does imply, however, that you cannot create a dataset with the same name as a previously deleted artifact.
In an effort to unblock you quickly, @zhu0619 manually hard-deleted the artifact from our database (solving this), but did not yet delete the associated files that had already completed uploading (causing this). So where do we go next?
This situation is an exceptional case. Moving forward, we never want to manually delete content because such a manual process is too error-prone and risks our data integrity. For just this once, however, given that we already deleted the entry from our database, we decided to also manually delete the associated, orphaned files from our storage backend. @fteufel This implies that uploading your dataset should work now! If it doesn't, however, please reach out over Discord! Given that it's such an exceptional case, that's the better place to get personal support. |
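To make the soft-delete behaviour described above concrete, here is a minimal in-memory sketch. It is not the Hub's implementation: the `Registry` class and its methods are hypothetical names invented for illustration. It only shows why a soft-deleted artifact disappears from view while its name stays reserved.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy artifact registry with soft-delete semantics (hypothetical, not the Hub's code)."""

    # Maps artifact name -> True if soft-deleted, False if live.
    _deleted: dict[str, bool] = field(default_factory=dict)

    def create(self, name: str) -> None:
        # The record survives a soft delete, so the name stays reserved.
        if name in self._deleted:
            raise ValueError(f"name already taken: {name}")
        self._deleted[name] = False

    def soft_delete(self, name: str) -> None:
        # Hide the artifact but keep its record (and, by analogy, its files).
        self._deleted[name] = True

    def visible(self) -> list[str]:
        # Only live artifacts are listed.
        return [name for name, gone in self._deleted.items() if not gone]


registry = Registry()
registry.create("my-dataset")
registry.soft_delete("my-dataset")

# Re-creating under the same name fails because the old record still exists.
try:
    registry.create("my-dataset")
    recreated = True
except ValueError:
    recreated = False
```

In this toy model, a hard delete would mean removing the record entirely, which frees the name but leaves any separately stored files orphaned unless they are cleaned up too.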
Ok, understand now - would be great to have that spelled out in the confirmation popup you get when deleting. But it failed again :(
|
@fteufel I see that some data made it to our storage backend. I'm not sure why the upload failed midway, but my best guess is that your login expired during the upload or that you hit a timeout. You're helping us stress-test the system here. The downside of working mostly with small molecules is that most datasets I'm used to are small! 😅
Since there is no formal retry mechanism yet, could you complete the upload with the steps below?
First, refresh your login token:
Then:
import logging

import zarr
from polaris.hub.client import PolarisHubClient

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Create the exact same dataset locally again
dataset = ...

with PolarisHubClient() as client:
    # Increase the timeout
    client.settings.default_timeout = (100, 2000)

    # Open the destination Zarr archive again
    dest = client.open_zarr_file(
        owner="mlls",
        name="bend-chromatin-accessibility",
        path="polarisfs://data.zarr",
        mode="w",
        as_consolidated=False,
    )

    # Copy the files to the destination, skipping any files that already exist.
    # With this code, you will also see additional print output during the process that may help us debug if it fails again.
    logger.info("Copying Zarr archive to the Hub. This may take a while.")
    zarr.copy_store(
        source=dataset.zarr_root.store.store,
        dest=dest.store,
        log=print,
        if_exists="skip",
    )
|
But I can also just wait until there is a retry mechanism. |
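For readers wondering what a retry mechanism around a skip-existing copy could look like, here is a rough sketch. It uses plain dictionaries in place of Zarr stores and does not reflect how Polaris or `zarr.copy_store` are implemented. `copy_skip_existing`, `with_retries`, and `FlakyStore` are hypothetical helpers made up for this example.

```python
import time


def copy_skip_existing(source: dict, dest: dict, log=print) -> int:
    """Copy keys that are missing from dest and return how many were copied."""
    copied = 0
    for key, value in source.items():
        if key in dest:
            log(f"skip {key}")
            continue
        dest[key] = value
        copied += 1
    return copied


def with_retries(func, attempts: int = 3, base_delay: float = 0.0):
    """Call func, retrying on OSError with exponential backoff between attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except OSError as error:
            last_error = error
            time.sleep(base_delay * 2**attempt)
    raise last_error


class FlakyStore(dict):
    """Hypothetical destination that fails exactly once, on its second write."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        if self.writes == 2:
            raise ConnectionError("simulated timeout")
        super().__setitem__(key, value)


source = {"a/0": b"x", "a/1": b"y", "a/2": b"z"}
dest = FlakyStore()

# The first attempt dies midway; the retry skips what already arrived.
copied = with_retries(lambda: copy_skip_existing(source, dest, log=lambda msg: None))
```

Because chunks that already exist are skipped, a retry only pays for the part of the upload that did not arrive, which is the same idea as passing `if_exists="skip"` in the snippet earlier in this thread.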
Thanks! #144 will also add more informative error messages for this case, helping us to investigate. |
I added myself temporarily to the
Since we haven't implemented the delete operation, overwriting the archive fails. This would be fixed by changing it to |
Polaris version
dev
Python Version
3.10
Operating System
Linux
Installation
pip
Description
I'm trying to upload a Zarr dataset. I'm not doing anything special as far as I can tell. I think the Zarr upload fails, but the dataset gets created on the Hub anyway. I'm not sure what's going wrong.
Steps to reproduce
Additional output
No response