Backwards compatibility broken for cached datasets that use .filter()
#2943
Comments
Hi ! I guess the caching mechanism should have considered the new
If it's easy enough to implement, then yes please 😄 But this issue can be low-priority, since I've only encountered it in a couple of
Well it can cause issue with anyone that updates
I just merged a fix, let me know if you're still having this kind of issues :) We'll do a release soon to make this fix available
Definitely works on several manual cases with our dummy datasets, thank you @lhoestq !
Fixed by #2947.
Describe the bug
After upgrading to datasets 1.12.0, some cached .filter() steps from 1.11.0 started failing with:
ValueError: Keys mismatch: between {'indices': Value(dtype='uint64', id=None)} and {'file': Value(dtype='string', id=None), 'text': Value(dtype='string', id=None), 'speaker_id': Value(dtype='int64', id=None), 'chapter_id': Value(dtype='int64', id=None), 'id': Value(dtype='string', id=None)}
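The error message above compares two sets of column names: a single indices column on one side and the dataset's own columns (file, text, speaker_id, chapter_id, id) on the other. One plausible reading, which is an assumption and not confirmed in this thread, is that the newer version expects the cached .filter() file to hold an indices mapping while the file written by 1.11.0 holds the filtered rows themselves. The sketch below only illustrates that kind of key comparison in plain Python; it is not the datasets implementation, and the name check_keys_match is hypothetical.

```python
def check_keys_match(expected: dict, found: dict) -> None:
    """Raise ValueError when two schemas do not share the same column names.

    Hypothetical helper for illustration only; not part of the datasets library.
    """
    if set(expected) != set(found):
        raise ValueError(f"Keys mismatch: between {expected} and {found}")


# Schema the newer version is assumed to expect for a cached filter step.
new_filter_schema = {"indices": "uint64"}

# Schema of the rows found in the cache file written by the older version.
old_cache_schema = {
    "file": "string",
    "text": "string",
    "speaker_id": "int64",
    "chapter_id": "int64",
    "id": "string",
}

try:
    check_keys_match(new_filter_schema, old_cache_schema)
except ValueError as err:
    # The stale cache file no longer matches what the caller expects.
    print(err)
```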
Related feature: #2836
❓ This is probably a wontfix bug, since it can be solved by simply cleaning the related cache dirs, but the workaround could be useful for someone googling the error :)
Workaround
Remove the cache for the given dataset, e.g.
rm -rf ~/.cache/huggingface/datasets/librispeech_asr
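
For anyone who prefers to do the same cleanup from Python, here is a minimal sketch using only the standard library. The function name remove_dataset_cache is hypothetical, and the sketch assumes the cache lives under ~/.cache/huggingface/datasets unless the HF_DATASETS_CACHE environment variable points somewhere else; check your own setup before deleting anything.

```python
import os
import shutil
from pathlib import Path


def remove_dataset_cache(dataset_name: str, cache_root=None) -> bool:
    """Delete the cache directory of one dataset. Returns True if something was removed.

    Hypothetical helper; cache_root defaults to HF_DATASETS_CACHE if set,
    otherwise to ~/.cache/huggingface/datasets (an assumption about the setup).
    """
    if not dataset_name or Path(dataset_name).name != dataset_name:
        # Refuse empty names or names containing path separators, so the
        # call can never delete the whole cache root or a directory outside it.
        raise ValueError(f"Invalid dataset name: {dataset_name!r}")

    if cache_root is None:
        cache_root = os.environ.get(
            "HF_DATASETS_CACHE",
            Path.home() / ".cache" / "huggingface" / "datasets",
        )

    target = Path(cache_root).expanduser() / dataset_name
    if not target.is_dir():
        return False

    shutil.rmtree(target)
    return True


# Example call (commented out so that importing this snippet deletes nothing):
# remove_dataset_cache("librispeech_asr")
```

Only the directory of the named dataset is removed, so caches of other datasets stay in place.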
Steps to reproduce the bug
1. Delete ~/.cache/huggingface/datasets/librispeech_asr if it exists.
2. pip install datasets==1.11.0 and run the following snippet:
3. pip install datasets==1.12.1 and re-run the code
Expected results
Same result as with the previous datasets version.
Actual results
Environment info
datasets version: 1.12.1