Type error when importing datasets on Kaggle #6753

Closed
jtv199 opened this issue Mar 24, 2024 · 7 comments

jtv199 commented Mar 24, 2024

Describe the bug

When trying to run

import datasets
print(datasets.__version__)

It generates the following error

TypeError: expected string or bytes-like object

It looks like it cannot find a valid version of fsspec,

although the fsspec version looks fine when I check it directly:

import fsspec
print(fsspec.__version__)
# output: 2024.3.1
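
To make the mismatch described above easier to see, here is a minimal diagnostic sketch. It compares what the installed package metadata reports with what the imported module reports, since the traceback shows datasets reading the version from importlib.metadata rather than from fsspec.__version__. The helper names (metadata_version, module_version, describe) are made up for illustration and are not part of datasets or fsspec.

```python
import importlib
import importlib.metadata


def metadata_version(package):
    """Version string recorded in the installed package metadata, or None."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_version(package):
    """The module's own __version__ attribute, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def describe(package):
    """Collect both views of the version so a mismatch is easy to spot."""
    meta = metadata_version(package)
    mod = module_version(package)
    return {
        "package": package,
        "metadata": meta,
        "module": mod,
        "consistent": meta is not None and meta == mod,
    }


if __name__ == "__main__":
    print(describe("fsspec"))
```

If "metadata" comes back as None while "module" shows a real version, the installed metadata for fsspec is likely missing or broken, which would match the error above.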

Detailed crash report

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import datasets
      2 print(datasets.__version__)

File /opt/conda/lib/python3.10/site-packages/datasets/__init__.py:18
      1 # ruff: noqa
      2 # Copyright 2020 The HuggingFace Datasets Authors and the TensorFlow Datasets Authors.
      3 #
   (...)
     13 # See the License for the specific language governing permissions and
     14 # limitations under the License.
     16 __version__ = "2.18.0"
---> 18 from .arrow_dataset import Dataset
     19 from .arrow_reader import ReadInstruction
     20 from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder

File /opt/conda/lib/python3.10/site-packages/datasets/arrow_dataset.py:66
     63 from multiprocess import Pool
     64 from tqdm.contrib.concurrent import thread_map
---> 66 from . import config
     67 from .arrow_reader import ArrowReader
     68 from .arrow_writer import ArrowWriter, OptimizedTypedSequence

File /opt/conda/lib/python3.10/site-packages/datasets/config.py:41
     39 # Imports
     40 DILL_VERSION = version.parse(importlib.metadata.version("dill"))
---> 41 FSSPEC_VERSION = version.parse(importlib.metadata.version("fsspec"))
     42 PANDAS_VERSION = version.parse(importlib.metadata.version("pandas"))
     43 PYARROW_VERSION = version.parse(importlib.metadata.version("pyarrow"))

File /opt/conda/lib/python3.10/site-packages/packaging/version.py:49, in parse(version)
     43 """
     44 Parse the given version string and return either a :class:`Version` object
     45 or a :class:`LegacyVersion` object depending on if the given version is
     46 a valid PEP 440 version or a legacy version.
     47 """
     48 try:
---> 49     return Version(version)
     50 except InvalidVersion:
     51     return LegacyVersion(version)

File /opt/conda/lib/python3.10/site-packages/packaging/version.py:264, in Version.__init__(self, version)
    261 def __init__(self, version: str) -> None:
    262 
    263     # Validate the version and parse it into pieces
--> 264     match = self._regex.search(version)
    265     if not match:
    266         raise InvalidVersion(f"Invalid version: '{version}'")

TypeError: expected string or bytes-like object
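
The last frame fails inside a regex search, which suggests that the value passed to version.parse was not a string at all (most likely None). The following is only a sketch of that mechanism using the standard library: parse_simple_version and its pattern are simplified stand-ins written for illustration, not the real packaging code.

```python
import re

# Simplified stand-in for a version pattern; the real one in packaging is far more complete.
_SIMPLE_VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def parse_simple_version(version):
    """Parse 'X.Y.Z' into a tuple of ints, mimicking the validate-then-parse flow."""
    # A non-string argument (for example None) makes search() raise TypeError
    # before the "invalid version" branch is ever reached.
    match = _SIMPLE_VERSION_RE.search(version)
    if not match:
        raise ValueError(f"Invalid version: '{version}'")
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    print(parse_simple_version("2024.3.1"))
    try:
        parse_simple_version(None)
    except TypeError as exc:
        print("TypeError:", exc)
```

So the TypeError points at a missing version value, not at a version string that is too new or malformed.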

Steps to reproduce the bug

  1. Run !pip install -U datasets on Kaggle.
  2. Check that datasets is installed:
import datasets
print(datasets.__version__)

Expected behavior

The datasets version should be printed, e.g. 2.18.0.

Environment info

Running on Kaggle, latest environment. Here is the notebook: https://www.kaggle.com/code/jtv199/mistrial-7b-part2

Edgar454 commented Mar 24, 2024

I have the same problem.
It seems to appear only when using a GPU.
It seems to work fine with version 2.17, though.

@RechieKho

Same here.

msusol commented Mar 29, 2024

> I have the same problem.
> It seems to appear only when using a GPU.
> It seems to work fine with version 2.17, though.

I downgraded from 2.18 to 2.17, and it works with CPU/GPU, except that pyarrow now complains:

...
File /opt/conda/lib/python3.10/site-packages/pyarrow/array.pxi:830, in pyarrow.lib._PandasConvertible.to_pandas()

File /opt/conda/lib/python3.10/site-packages/pyarrow/table.pxi:3989, in pyarrow.lib.Table._to_pandas()

ImportError: cannot import name table_to_blockmanager

See also: https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data/discussion/487474#2722594

jtv199 commented Mar 30, 2024

Solved for me by downgrading: !pip install -U datasets==2.16.0. Works with GPU as well.

jtv199 closed this as completed Mar 30, 2024

@RechieKho

I think you should keep this issue open. It works with the previous version but not with the later ones, so it is likely a bug that the maintainers should take note of.

msusol commented Mar 30, 2024

> Solved for me by downgrading: !pip install -U datasets==2.16.0. Works with GPU as well.

Verified that it works with GPU if I make these 3 changes:

datasets==2.16.0
fsspec==2023.10.0
gcsfs==2023.10.0

But the issue shouldn't be closed; this is just a workaround until the problem with 2.18.0 is resolved.

See also: https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data/discussion/487474
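
For anyone who wants to apply the three pins above from Python instead of a !pip cell, a rough sketch follows. The versions are the ones listed in this comment; pip_install_command and install are hypothetical helper names, and nothing is installed unless install() is called explicitly.

```python
import subprocess
import sys

# Versions reported as working in this thread (a workaround, not a fix).
PINS = {
    "datasets": "2.16.0",
    "fsspec": "2023.10.0",
    "gcsfs": "2023.10.0",
}


def pip_install_command(pins):
    """Build a pip command that installs exactly the pinned versions."""
    requirements = [f"{name}=={version}" for name, version in pins.items()]
    # Use the running interpreter so the packages land in the notebook's environment.
    return [sys.executable, "-m", "pip", "install", *requirements]


def install(pins):
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(pins))


if __name__ == "__main__":
    # Only print the command here; call install(PINS) to actually run it.
    print(" ".join(pip_install_command(PINS)))
```

After installing, the notebook kernel probably needs a restart so that the already imported modules are reloaded.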

CQHofsns commented Apr 4, 2024

> > Solved for me by downgrading: !pip install -U datasets==2.16.0. Works with GPU as well.
>
> Verified that it works with GPU if I make these 3 changes:
>
> datasets==2.16.0
> fsspec==2023.10.0
> gcsfs==2023.10.0
>
> But the issue shouldn't be closed; this is just a workaround until the problem with 2.18.0 is resolved.
>
> See also: https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data/discussion/487474

This also works for me, thanks
