polaris.utils.errors.PolarisHubError: Error opening Zarr store at dataset upload #147

Closed
fteufel opened this issue Jul 21, 2024 · 13 comments · Fixed by #146
Labels
bug Something isn't working

Comments

@fteufel
Contributor

fteufel commented Jul 21, 2024

Polaris version

dev

Python Version

3.10

Operating System

Linux

Installation

pip

Description

I'm trying to upload a Zarr dataset. I'm not doing anything special as far as I can tell. I think the Zarr upload fails, but the dataset gets created on the Hub anyway. I'm not sure what's going wrong.

2024-07-21 17:05:44.769 | INFO     | polaris._mixins:md5sum:27 - Computing the checksum. This can be slow for large datasets.
Finding all files in the Zarr archive: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1027/1027 [00:00<00:00, 2670.07it/s]
💥 ERROR: Failed to upload dataset. 
Traceback (most recent call last):
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 330, in open_zarr_file
    return zarr.open(store, mode=mode)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/convenience.py", line 123, in open
    return open_group(_store, mode=mode, **kwargs)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/hierarchy.py", line 1581, in open_group
    init_group(store, overwrite=True, path=path, chunk_store=chunk_store)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/storage.py", line 682, in init_group
    _init_group_metadata(store=store, overwrite=overwrite, path=path, chunk_store=chunk_store)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/storage.py", line 704, in _init_group_metadata
    rmdir(store, path)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/storage.py", line 212, in rmdir
    store.rmdir(path)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/storage.py", line 1548, in rmdir
    if self.fs.isdir(store_path):
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/fsspec/spec.py", line 705, in isdir
    return self.info(path)["type"] == "directory"
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/fsspec/spec.py", line 665, in info
    out = self.ls(self._parent(path), detail=True, **kwargs)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/polarisfs.py", line 94, in ls
    response.raise_for_status()
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/httpx/_models.py", line 761, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://polarishub.io/api/v1/storage/dataset/mlls/BEND_chromatin_accessibility/ls'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/novo/users/fegt/BEND/scripts/upload_polaris_datasets.py", line 236, in <module>
    dataset.upload_to_hub(owner='mlls')
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/dataset/_dataset.py", line 372, in upload_to_hub
    self.client.upload_dataset(self, access=access, owner=owner)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 587, in upload_dataset
    dest = self.open_zarr_file(
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 333, in open_zarr_file
    raise PolarisHubError("Error opening Zarr store") from e
polaris.utils.errors.PolarisHubError: Error opening Zarr store

Steps to reproduce

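import zarr
from polaris.dataset import ColumnAnnotation, Dataset
from polaris.utils.types import HubOwner

# I have a `df` and a `multihot_labels` array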

root = zarr.open('chromatin.zarr', "w")
root.array("labels", multihot_labels) # np array (n_samples, 125)
zarr.consolidate_metadata('chromatin.zarr') # this seems necessary, not sure why

df['label'] = [f'labels#{i}' for i in range(len(df))]

annotations = {
    "sequence": ColumnAnnotation(
        # modality="dna",
        description="The nucleotide sequence of the DNA region",
        # user_attributes={"unit": "mL/min/kg"},
    ),
    "strand": ColumnAnnotation(
        description="The strand of the DNA region",
    ),
    "chromosome": ColumnAnnotation(
        description="The chromosome of the DNA region",
    ),
    "start": ColumnAnnotation(
        description="The start coordinate of the DNA region",
    ),
    "end": ColumnAnnotation(
        description="The end coordinate of the DNA region",
    ),
    "label": ColumnAnnotation(
        description="The labels indicating the chromatin accessibility of the DNA region in the cell lines",
        is_pointer=True
    ),
}


dataset = Dataset(
    # The table is the core data-structure required to construct a dataset
    table=df.loc[:, ["sequence", "strand", "chromosome", "start", "end", "label"]],
    # Additional meta-data on the dataset level.
    name="BEND_chromatin_accessibility",
    description="Multilabel classification of chromatin accessibility in cell lines from the BEND benchmark",
    source="https://doi.org/10.1038/nature11247",
    annotations=annotations,
    curation_reference="https://arxiv.org/abs/2311.12570",
    owner=HubOwner(user_id="fteufel", slug="fteufel"),
    user_attributes={"year": "2023"},
    zarr_root_path="chromatin.zarr",
    license="CC-BY-4.0"
)

print(dataset.get_data(row=1, col='label'))

dataset.upload_to_hub(owner='mlls')

Additional output

No response

@fteufel added the bug label Jul 21, 2024
@cwognum
Collaborator

cwognum commented Jul 21, 2024

Thanks for reporting, @fteufel !

I think this would be fixed by #146. The upload currently builds the storage URL from the dataset name rather than from the slug (i.e. only lowercase letters and dashes), which is why the request in your traceback returns a 404.
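
To illustrate the difference between a name and a slug, here is a minimal sketch. The `slugify` helper is hypothetical and is not the Polaris implementation; it assumes a slug is the lowercased name with every run of other characters replaced by a single dash (digits are kept as well, which is an assumption).

```python
import re


def slugify(name: str) -> str:
    """Hypothetical helper, not the Polaris implementation.

    Lowercase the name and replace every run of characters other than
    a-z and 0-9 with a single dash.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # Drop dashes left over at the start or end of the string.
    return slug.strip("-")


name = "BEND_chromatin_accessibility"
slug = slugify(name)
print(slug)
```

For the dataset in this issue, the sketch yields `bend-chromatin-accessibility`, which differs from the name that appears in the failing URL.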

@cwognum
Collaborator

cwognum commented Jul 21, 2024

Uploading Zarr is a multi-stage process, so it can happen that you do see the dataset on the Hub already, even though a later step in the upload process fails. However, you should always see a banner at the top stating that the dataset upload has not completed yet.

@cwognum
Collaborator

cwognum commented Jul 22, 2024

Thanks to @zhu0619, #146 is now merged and this fix was included in release 0.7.3!

Could you try upgrading Polaris to the latest version and let me know if the issue persists?

@fteufel
Contributor Author

fteufel commented Jul 22, 2024

It worked now for another dataset.

The one that failed at upload was broken, with the banner being there for 16+ hours. I deleted it in the UI in order to upload it again. Now I get

  "message": "Dataset 'bend-chromatin-accessibility', with slug 'bend-chromatin-accessibility', already exists"

Do I need to refresh something to make the delete be registered?

@zhu0619
Contributor

zhu0619 commented Jul 22, 2024

@fteufel
The reason is that the metadata is still registered in the database.
I just removed it from the database, so you can try again.

@fteufel
Contributor Author

fteufel commented Jul 22, 2024

Still something hanging

2024-07-22 16:36:47.580 | INFO     | polaris._mixins:md5sum:27 - Computing the checksum. This can be slow for large datasets.
Finding all files in the Zarr archive: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1027/1027 [00:00<00:00, 2661.58it/s]
💥 ERROR: Failed to upload dataset. 
Traceback (most recent call last):
  File "/novo/users/fegt/BEND/scripts/upload_polaris_datasets.py", line 243, in <module>
    dataset.upload_to_hub(owner='mlls')
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/dataset/_dataset.py", line 377, in upload_to_hub
    self.client.upload_dataset(self, access=access, owner=owner)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 581, in upload_dataset
    hub_response.raise_for_status()
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/httpx/_models.py", line 761, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '405 Method Not Allowed' for url 'https://polarishub.io/storage/dataset/mlls/bend-chromatin-accessibility/table.parquet'

@zhu0619
Contributor

zhu0619 commented Jul 22, 2024

@fteufel
We are working on a solution to enable dataset updates and aim to make it available as soon as possible.
For now, a quick workaround is to change your dataset name.

@cwognum
Collaborator

cwognum commented Jul 22, 2024

Hey there! I'm going to close this issue because it's no longer related to the original issue that was raised.

But before I do, I wanted to provide some context and hopefully solve your issue in the process @fteufel: We made the decision to implement deletion on the Hub as a soft-delete: You won't see deleted content on the Hub anymore, but the actual files still exist. That way we could restore deleted artifacts if needed.

This does imply, however, that you cannot create a dataset with the same name as a previously deleted artifact. In an effort to unblock you quickly, @zhu0619 manually hard-deleted the artifact from our database (solving the "already exists" error), but did not yet delete the associated files that had already completed uploading (causing the "405 Method Not Allowed" error).
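
The interaction between soft-delete and unique names can be sketched with a toy in-memory registry. Everything here (the `Registry` class and its methods) is hypothetical and only models the behavior described above; it is not how the Hub is implemented.

```python
class Registry:
    """Toy in-memory model of a registry with soft-delete (hypothetical)."""

    def __init__(self):
        # Maps slug -> record; soft-deleted records stay in this dict.
        self._records = {}

    def create(self, slug):
        if slug in self._records:
            # A soft-deleted record still occupies the slug.
            raise ValueError(f"Dataset with slug '{slug}' already exists")
        self._records[slug] = {"deleted": False}

    def soft_delete(self, slug):
        # Hide the record, but keep it so it could be restored.
        self._records[slug]["deleted"] = True

    def hard_delete(self, slug):
        # Remove the record entirely, which frees the slug.
        del self._records[slug]

    def visible(self):
        return [s for s, r in self._records.items() if not r["deleted"]]


registry = Registry()
registry.create("bend-chromatin-accessibility")
registry.soft_delete("bend-chromatin-accessibility")
print(registry.visible())  # the dataset is no longer listed

try:
    registry.create("bend-chromatin-accessibility")
except ValueError as err:
    print(err)  # the slug is still taken

registry.hard_delete("bend-chromatin-accessibility")
registry.create("bend-chromatin-accessibility")  # works after a hard delete
print(registry.visible())
```

In this model, only a hard delete frees the name again, which mirrors why the manual database cleanup was needed here.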

So where do we go next?

  1. For cases like yours where an upload only partially completes, we want to make it easier to retry uploading the files that failed. I created an issue for this: "Add a retry mechanism for partially completed uploads" (#151). Please use that issue to share any thoughts or ideas on how this should(n't) work.
  2. We need to do a better job at clearly communicating that a dataset name is unique (and thus explain the consequences of deleting an artifact). Any suggestion on where and how you would have expected such information?

This situation is an exceptional case. Moving forward, we never want to manually delete content because such a manual process is too error-prone and risks our data integrity. For just this once, however, given that we already deleted the entry from our database, we decided to also manually delete the associated, orphaned files from our storage backend. @fteufel This implies that uploading your dataset should work now!

If it doesn't, however, please reach out over Discord! Given that it's such an exceptional case, that's the better place to get personal support.

@cwognum closed this as completed Jul 22, 2024
@fteufel
Contributor Author

fteufel commented Jul 23, 2024

Ok, understand now - would be great to have that spelled out in the confirmation popup you get when deleting.

But it failed again :(

⠹ Uploading dataset...2024-07-23 09:55:01.849 | INFO     | polaris.hub.client:upload_dataset:602 - Copying Zarr archive to the Hub. This may take a while.
💥 ERROR: Failed to upload dataset. 
Traceback (most recent call last):
  File "/novo/users/fegt/BEND/scripts/upload_polaris_datasets.py", line 243, in <module>
    dataset.upload_to_hub(owner='mlls')
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/dataset/_dataset.py", line 377, in upload_to_hub
    self.client.upload_dataset(self, access=access, owner=owner)
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 603, in upload_dataset
    zarr.copy_store(
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/convenience.py", line 756, in copy_store
    dest[dest_key] = data
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/zarr/storage.py", line 1470, in __setitem__
    self.map[key] = value
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/fsspec/mapping.py", line 175, in __setitem__
    self.fs.pipe_file(key, maybe_convert(value))
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/polarisfs.py", line 223, in pipe_file
    raise PolarisHubError("Could not get signed URL from Polaris Hub.")
polaris.utils.errors.PolarisHubError: Could not get signed URL from Polaris Hub.

@cwognum
Collaborator

cwognum commented Jul 23, 2024

@fteufel I see some data made it to our storage backend. I'm not sure why the upload failed midway through, but my best explanation is that your login expired as the upload happened or that you hit some timeout. You're helping us stress test the system here - The downside of working mostly with small molecules is that most datasets I'm used to are small! 😅

Lacking a formal retry mechanism, could you complete the upload manually with the steps below?

First, refresh your login token:

polaris login --overwrite

Then:

import zarr
from loguru import logger

from polaris.hub.client import PolarisHubClient

# Create the exact same dataset locally again
dataset = ...

with PolarisHubClient() as client: 
    # Increase the timeout
    client.settings.default_timeout = (100, 2000)

    # Open the destination Zarr archive again
    dest = client.open_zarr_file(
        owner="mlls",
        name="bend-chromatin-accessibility",
        path="polarisfs://data.zarr",
        mode="w",
        as_consolidated=False,
    )
    
    # Copy the files to the destination, skipping any files that already exist. 
    # With this code, you will also see additional print output during the process that may help us debug if it fails again.
    logger.info("Copying Zarr archive to the Hub. This may take a while.")
    zarr.copy_store(
        source=dataset.zarr_root.store.store,
        dest=dest.store,
        log=print,
        if_exists="skip",
    )
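
To show what skipping existing files means for a partially completed upload, here is a toy model that uses plain dicts in place of the Zarr stores. The `copy_missing` function and the example keys are hypothetical; this is a sketch of the idea behind `if_exists="skip"`, not the `zarr.copy_store` implementation.

```python
def copy_missing(source, dest, log=print):
    """Copy every key from source to dest unless dest already has it.

    Toy model of a skip-if-exists copy between two key-value stores.
    Returns the number of copied and skipped keys.
    """
    copied, skipped = 0, 0
    for key in sorted(source):
        if key in dest:
            log(f"skip {key}")
            skipped += 1
        else:
            dest[key] = source[key]
            log(f"copy {key}")
            copied += 1
    return copied, skipped


# Hypothetical local archive with two chunks.
source = {
    ".zgroup": b"{}",
    "labels/.zarray": b"{}",
    "labels/0.0": b"\x00",
    "labels/1.0": b"\x01",
}
# Hypothetical remote archive after an upload that stopped midway.
dest = {
    ".zgroup": b"{}",
    "labels/.zarray": b"{}",
    "labels/0.0": b"\x00",
}

print(copy_missing(source, dest))
```

Only the missing chunk is transferred on the retry, and running the copy again afterwards would skip every key.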

@fteufel
Contributor Author

fteufel commented Jul 23, 2024

    dest = client.open_zarr_file(
  File "/novo/users/fegt/miniconda3/envs/bend/lib/python3.10/site-packages/polaris/hub/client.py", line 333, in open_zarr_file
    raise PolarisHubError("Error opening Zarr store") from e
polaris.utils.errors.PolarisHubError: Error opening Zarr store

But can also just wait until there is a retry mechanism for now.

@cwognum
Collaborator

cwognum commented Jul 23, 2024

Thanks! #144 will also add more informative error messages for this case, helping us to investigate.

@cwognum
Collaborator

cwognum commented Jul 23, 2024

I added myself temporarily to the mlls organization and I see the issue! It's because of mode="w". From the Zarr docs:

‘w’ means create (overwrite if exists);

Since we haven't implemented the delete operation, overwriting the archive fails.

This would be fixed by changing it to mode="a".
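
The reasoning above can be sketched with a toy model. `NoDeleteStore` and `open_store` are hypothetical stand-ins, not Zarr or Polaris code; they only assume that `"w"` has to clear existing content before creating the archive, while `"a"` keeps whatever is already there.

```python
class NoDeleteStore(dict):
    """Hypothetical stand-in for a remote store that cannot delete yet."""

    def rmdir(self):
        raise NotImplementedError("delete is not implemented")


def open_store(store, mode):
    """Toy version of the two modes discussed above (not Zarr's code)."""
    if mode == "w":
        # 'w': create, overwriting if it exists, so clear the store first.
        store.rmdir()
    elif mode != "a":
        raise ValueError(f"unsupported mode: {mode}")
    # 'a': keep existing content and create only what is missing.
    store.setdefault(".zgroup", b"{}")
    return store


# A store that already holds one chunk from an earlier, partial upload.
store = NoDeleteStore({"labels/0.0": b"\x00"})

try:
    open_store(store, "w")
except NotImplementedError as err:
    print(f"mode='w' failed: {err}")

opened = open_store(store, "a")
print(sorted(opened))
```

In this model, opening with `"a"` succeeds and leaves the previously uploaded chunk in place, which is what a retry needs.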

3 participants