Import order crashes script execution #7419

@DamienMatias

Describe the bug

Hello,

I'm trying to convert an HF dataset into a TFRecord, so I'm importing both tensorflow and datasets.
Depending on the order in which I import these libraries, my code hangs forever and cannot be killed (CTRL+C doesn't work; I have to kill my shell entirely).

Thank you for your help
🙏

Steps to reproduce the bug

Running the following script hangs forever:

import tensorflow as tf
import datasets

dataset = datasets.load_dataset("imagenet-1k", split="validation", streaming=True)
print(next(iter(dataset)))

However, the following runs fine (I only changed the order of the imports):

import datasets
import tensorflow as tf

dataset = datasets.load_dataset("imagenet-1k", split="validation", streaming=True)
print(next(iter(dataset)))
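
To compare the two import orders without losing the shell, each script above can be run in its own interpreter with a watchdog. This is only a sketch: `SNIPPETS`, `run_snippet`, `dump_after` and `hard_timeout` are names made up for illustration, and it assumes the stuck process still responds to the kill signal that `subprocess.run` sends on timeout, which has not been verified for this hang. The standard-library `faulthandler` watchdog prints the stack of every thread to stderr before exiting the child, which may show where the import or the first read is stuck.

```python
import subprocess
import sys

# The two scripts from the report; only the order of the imports differs.
BODY = (
    'dataset = datasets.load_dataset("imagenet-1k", split="validation", streaming=True)\n'
    "print(next(iter(dataset)))\n"
)
SNIPPETS = {
    "tensorflow_first": "import tensorflow as tf\nimport datasets\n" + BODY,
    "datasets_first": "import datasets\nimport tensorflow as tf\n" + BODY,
}


def run_snippet(code, dump_after=60.0, hard_timeout=90.0):
    """Run `code` in a fresh interpreter and return (status, stderr)."""
    # After `dump_after` seconds the watchdog dumps all thread stacks to
    # stderr and exits the child process (exit=True).
    prelude = (
        "import faulthandler\n"
        f"faulthandler.dump_traceback_later({dump_after!r}, exit=True)\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", prelude + code],
            capture_output=True,
            text=True,
            timeout=hard_timeout,  # backstop: subprocess.run kills the child
        )
    except subprocess.TimeoutExpired:
        return "killed", ""
    if "Timeout (" in proc.stderr:  # header line written by the watchdog
        return "hung", proc.stderr
    return ("ok" if proc.returncode == 0 else "error"), proc.stderr
```

Calling `run_snippet(SNIPPETS["tensorflow_first"])` and `run_snippet(SNIPPETS["datasets_first"])` one after the other should then give a status for each order. Line numbers in the dumped traceback are shifted by two because of the prepended watchdog lines.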

Expected behavior

I expect the script to run to completion and, in my case, print the content of the first item in the dataset:

{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=408x500 at 0x70C646A03110>, 'label': 91}

Environment info

$ datasets-cli env
- `datasets` version: 3.3.2
- Platform: Linux-6.8.0-1017-aws-x86_64-with-glibc2.35
- Python version: 3.11.7
- `huggingface_hub` version: 0.29.1
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- `fsspec` version: 2024.12.0

I'm also using tensorflow==2.18.0.
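
Since the `datasets-cli env` output above does not list tensorflow, here is a small sketch for collecting all the relevant versions in one go. `PACKAGES` and `installed_versions` are illustrative names, and it assumes each library is installed under exactly the distribution name shown (a variant such as `tensorflow-cpu` would show up as missing).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the environment info above, plus tensorflow.
PACKAGES = ["datasets", "tensorflow", "huggingface_hub", "pyarrow", "pandas", "fsspec"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"- {name}: {ver or 'not installed'}")
```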
