Description
Describe the bug
Hello,
I'm trying to convert a Hugging Face dataset to a TFRecord file, so I'm importing both tensorflow and datasets.
Depending on the order in which I import these libraries, my code hangs forever and cannot be killed (Ctrl+C doesn't work; I have to kill my shell entirely).
Thank you for your help
🙏
Steps to reproduce the bug
Running the following script hangs forever:
```python
# tensorflow is imported BEFORE datasets: this hangs
import tensorflow as tf
import datasets

dataset = datasets.load_dataset("imagenet-1k", split="validation", streaming=True)
print(next(iter(dataset)))
```
However, running the following works fine (I only swapped the order of the two imports):
```python
# datasets is imported BEFORE tensorflow: this works
import datasets
import tensorflow as tf

dataset = datasets.load_dataset("imagenet-1k", split="validation", streaming=True)
print(next(iter(dataset)))
```
Expected behavior
I expect the script to run to completion and, in my case, print the first item of the dataset:
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=408x500 at 0x70C646A03110>, 'label': 91}
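Because the hanging variant ignores Ctrl+C, one way to compare the two import orders without losing the shell is to run each script in a child interpreter under a watchdog. This is a minimal sketch, not part of `datasets` or TensorFlow; the helper name `run_with_watchdog` and the timeout values are hypothetical. It relies on the standard library's `faulthandler.dump_traceback_later(..., exit=True)`, which prints the traceback of every thread and exits the child if it is still running after the timeout, so a deadlock report can show where each thread is stuck.

```python
import subprocess
import sys
import textwrap


def run_with_watchdog(snippet: str, seconds: float = 60.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh Python interpreter (hypothetical helper).

    If the child is still running after `seconds`, faulthandler dumps the
    stack of every thread to stderr and exits the child with status 1.
    """
    # Prepended to the snippet so the watchdog is armed before any import runs.
    watchdog = (
        "import faulthandler, sys\n"
        f"faulthandler.dump_traceback_later({seconds!r}, exit=True, file=sys.stderr)\n"
    )
    return subprocess.run(
        [sys.executable, "-c", watchdog + textwrap.dedent(snippet)],
        capture_output=True,
        text=True,
        # Hard backstop: on expiry subprocess.run kills the child and raises TimeoutExpired.
        timeout=seconds + 30,
    )
```

For example, passing the first reproduction script above as `snippet` with `seconds=120` should, if it deadlocks, return a `CompletedProcess` with `returncode == 1` and the per-thread tracebacks in `stderr`, while the parent shell stays responsive.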
Environment info
$ datasets-cli env
- `datasets` version: 3.3.2
- Platform: Linux-6.8.0-1017-aws-x86_64-with-glibc2.35
- Python version: 3.11.7
- `huggingface_hub` version: 0.29.1
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- `fsspec` version: 2024.12.0
I'm also using tensorflow==2.18.0.