feat: 🎸 upgrade datasets to 2.3.1 #375

Merged
merged 7 commits into from
Jun 15, 2022

severo (Collaborator) commented Jun 15, 2022

No description provided.

severo (Collaborator, Author) commented Jun 15, 2022

It seems like we have an issue with the gated datasets:

https://github.com/huggingface/datasets-server/runs/6895195217?check_suite_focus=true

E           aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://huggingface.co/datasets/severo/dummy_gated/resolve/99194748bed3625a941aaf785740df02ca5762c9/data/train-00000-of-00001.parquet')

cc @lhoestq: do you have an idea? Linked to https://github.com/huggingface/datasets/pull/4472/files?
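For anyone who wants to poke at the failing URL outside the test suite, here is a minimal sketch of how an authenticated request for that file could be built. It only assumes the `resolve` URL layout visible in the traceback above and the standard `Authorization: Bearer <token>` header; the helper name `build_resolve_request` and the token value are hypothetical, and this is not how `datasets` issues its requests internally.

```python
from typing import Optional
from urllib.request import Request

HUB_URL = "https://huggingface.co"


def build_resolve_request(
    repo_id: str, revision: str, path: str, token: Optional[str] = None
) -> Request:
    """Build (but do not send) a request for a file in a dataset repo.

    Hypothetical helper: the URL layout mirrors the one in the traceback.
    """
    url = f"{HUB_URL}/datasets/{repo_id}/resolve/{revision}/{path}"
    headers = {}
    if token:
        # Without this header, a gated repo is expected to refuse the request.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


# Same file as in the failing CI run; "hf_xxx" is a placeholder token.
request = build_resolve_request(
    "severo/dummy_gated",
    "99194748bed3625a941aaf785740df02ca5762c9",
    "data/train-00000-of-00001.parquet",
    token="hf_xxx",
)
```

Sending `request` with `urllib.request.urlopen` once with and once without a real token should show whether the 401 comes from a missing header or from something else.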

severo changed the title from "feat: 🎸 upgrade datasets to 2.3.0" to "feat: 🎸 upgrade datasets to 2.3.1" on Jun 15, 2022
lhoestq (Member) commented Jun 15, 2022

This is fixed in 2.3 ;)

It is due to the Hub changing the error code for unauthenticated requests to gated repos.

severo (Collaborator, Author) commented Jun 15, 2022

Hmmm: I get this error with datasets 2.3.0, and still have it with 2.3.1. It must be something else; I'll investigate and put together a reproducible example.

lhoestq (Member) commented Jun 15, 2022

I'm getting this error on my side Oo

ValueError: Arrow type extension<arrow.py_extension_type<pyarrow.lib.UnknownExtensionType>> does not have a datasets dtype equivalent.

We were using extension types at the very beginning of the Image feature type, and we dropped them quickly because they were super buggy. You might have to regenerate this dataset.

severo (Collaborator, Author) commented Jun 15, 2022

Yes, I get this too when loading without streaming mode. I'll create another test dataset.

severo merged commit 3991247 into main on Jun 15, 2022
severo deleted the upgrade-datasets branch on June 15, 2022 13:44
severo (Collaborator, Author) commented Jun 15, 2022

I deployed and also launched a cache refresh for all the datasets.


2 participants