Describe the bug dask_cudf.read_parquet is unable to read a list of parquet files from a remote bucket, even though it is able to acquire the list of files from the bucket.
(rapids) root@cf56b55ceac7:~# conda list |grep fsspec
fsspec 2021.5.0 pypi_0 pypi
It appears that the vanilla container image rapidsai/rapidsai:0.19-cuda11.2-runtime-ubuntu18.04-py3.8 uses fsspec 2021.4.0. fsspec must have been updated to 2021.5.0 by something I installed while running the Jupyter notebooks. My bad.
But yes, #8275 looks like exactly what is happening.
Update: Installing the dask/gcsfs library pulls in the 2021.5.0 version, since it requires fsspec==2021.5.0. The latter is what causes the system to try to read from a local directory.
I have forcibly pinned fsspec==2021.4.0 everywhere, including in the containers, and that seems to be working.
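That pin can also be guarded programmatically. The sketch below is an illustration, not part of any RAPIDS API: the helper name and the "known bad" set are assumptions based only on the version observed to misbehave in this report.

```python
# Sketch: flag the fsspec release observed to break remote parquet reads.
# The helper name and KNOWN_BAD_FSSPEC set are hypothetical, chosen for
# this issue; only fsspec 2021.5.0 was seen to misbehave here.

KNOWN_BAD_FSSPEC = {(2021, 5, 0)}

def parse_version(version: str) -> tuple:
    """Parse a dotted version string like '2021.5.0' into an int tuple."""
    return tuple(int(part) for part in version.split("."))

def fsspec_is_known_bad(version: str) -> bool:
    """Return True if this fsspec version is known to route remote
    parquet reads to the local filesystem (see the report above)."""
    return parse_version(version) in KNOWN_BAD_FSSPEC

# Example: check the installed version before reading remote parquet.
# import fsspec
# assert not fsspec_is_known_bad(fsspec.__version__), "pin fsspec==2021.4.0"
```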
Steps/Code to reproduce bug
Throws a FileNotFoundError, as it tries to read from a local directory:
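A minimal, cudf-free sketch of this failure mode: when a gs:// URI falls through to local file machinery instead of the gcsfs backend, Python treats the whole string as a relative local path and raises the same error class. The bucket and file names below are hypothetical.

```python
# When the remote URI is handed to local-file handling, Python's built-in
# open() treats "gs://my-bucket/..." as a relative local path; no such
# directory exists, so a FileNotFoundError is raised.
caught = None
try:
    open("gs://my-bucket/data.parquet", "rb")  # hypothetical bucket/key
except FileNotFoundError as exc:
    caught = exc

print(type(caught).__name__)  # FileNotFoundError
```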
Expected behavior
This should behave like dask_cudf.read_csv, which seems to work. dask.dataframe.read_parquet also works.

Environment overview (please complete the following information)
docker pull & docker run commands used with the rapidsai/rapidsai:0.19-cuda11.2-runtime-ubuntu18.04-py3.8 image.

Environment details
Cudf versions in both the pods and the local client, from conda list | grep cudf, are the following:

Additional context
It should read the following files in the remote bucket due to the wildcard:
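The way such a wildcard selects files from a bucket listing can be sketched with the standard library's fnmatch; the key names and pattern below are made up for illustration, since they are not part of the report.

```python
import fnmatch

# Hypothetical object keys, standing in for a remote bucket listing.
keys = [
    "data/part-0.parquet",
    "data/part-1.parquet",
    "data/readme.txt",
]

# A glob pattern selects only the parquet parts; this mirrors how fsspec
# expands wildcards over the keys it lists from the bucket.
matches = fnmatch.filter(keys, "data/part-*.parquet")
print(matches)  # ['data/part-0.parquet', 'data/part-1.parquet']
```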