Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# df = pd.read_parquet("gs://cloud-samples-data/bigquery/us-states/us-states.parquet") # works
df = pd.read_parquet(["gs://cloud-samples-data/bigquery/us-states/us-states.parquet"]) # fails
df = pd.read_parquet(["gs://cloud-samples-data/bigquery/us-states/us-states.parquet"], storage_options={"token":"anon"}) # fails with a different error
Issue Description
When I pass a list of remote file paths to read_parquet, the call fails, whereas passing the same path directly as a single string works.
Without storage_options, the error is:
File /.venv/lib/python3.11/site-packages/pyarrow/dataset.py:368, in <listcomp>(.0)
361 is_local = (
362 isinstance(filesystem, (LocalFileSystem, _MockFileSystem)) or
363 (isinstance(filesystem, SubTreeFileSystem) and
364 isinstance(filesystem.base_fs, LocalFileSystem))
365 )
367 # allow normalizing irregular paths such as Windows local paths
--> 368 paths = [filesystem.normalize_path(_stringify_path(p)) for p in paths]
370 # validate that all of the paths are pointing to existing *files*
371 # possible improvement is to group the file_infos by type and raise for
372 # multiple paths per error category
373 if is_local:
File /.venv/lib/python3.11/site-packages/pyarrow/_fs.pyx:1012, in pyarrow._fs.FileSystem.normalize_path()
File /.venv/lib/python3.11/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File /.venv/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowInvalid: Expected a local filesystem path, got a URI: 'gs://cloud-samples-data/bigquery/us-states/us-states.parquet'
With any storage_options passed, the error is:
File .venv/lib/python3.11/site-packages/pandas/io/parquet.py:258, in PyArrowImpl.read(self, path, columns, filters, use_nullable_dtypes, dtype_backend, storage_options, filesystem, **kwargs)
256 if manager == "array":
257 to_pandas_kwargs["split_blocks"] = True
--> 258 path_or_handle, handles, filesystem = _get_path_or_handle(
259 path,
260 filesystem,
261 storage_options=storage_options,
262 mode="rb",
263 )
264 try:
265 pa_table = self.api.parquet.read_table(
266 path_or_handle,
267 columns=columns,
(...) 270 **kwargs,
271 )
File .venv/lib/python3.11/site-packages/pandas/io/parquet.py:129, in _get_path_or_handle(path, fs, storage_options, mode, is_dir)
123 fs, path_or_handle = fsspec.core.url_to_fs(
124 path_or_handle, **(storage_options or {})
125 )
126 elif storage_options and (not is_url(path_or_handle) or mode != "rb"):
127 # can't write to a remote url
128 # without making use of fsspec at the moment
--> 129 raise ValueError("storage_options passed with buffer, or non-supported URL")
131 handles = None
132 if (
133 not fs
134 and not is_dir
(...) 139 # fsspec resources can also point to directories
140 # this branch is used for example when reading from non-fsspec URLs
ValueError: storage_options passed with buffer, or non-supported URL
Expected Behavior
read_parquet should succeed whether it is given a single path (to a file or a directory) or a list of file paths.
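Both tracebacks above are consistent with the branch shown at parquet.py:123-129: a filesystem is only resolved when the path is a single string URL, so a list either falls through to pyarrow's local filesystem (no storage_options) or hits the ValueError (with storage_options). The model below is a simplified sketch of that dispatch, not pandas' actual code; `looks_like_url` and `model_path_dispatch` are hypothetical names, and the URL check deliberately ignores pandas' special handling of http(s) URLs.

```python
import re
from typing import Any, Optional

# Simplified stand-in for pandas' URL check: only a *string* of the form
# "<scheme>://..." counts. (Simplification: pandas also special-cases http(s).)
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+\-.]*://")


def looks_like_url(path: Any) -> bool:
    return isinstance(path, str) and bool(_URL_RE.match(path))


def model_path_dispatch(path: Any, storage_options: Optional[dict] = None) -> str:
    """Hypothetical model of the branch at pandas/io/parquet.py:123-129."""
    if looks_like_url(path):
        # A single string URL: a filesystem is resolved from its scheme.
        return "resolve filesystem"
    if storage_options:
        # A list is not a string, so it is never recognized as a URL and
        # storage_options cannot be used to build a filesystem.
        raise ValueError("storage_options passed with buffer, or non-supported URL")
    # No filesystem was resolved, so pyarrow receives the list as-is and
    # checks each entry against its local filesystem (ArrowInvalid for gs://).
    return "pass through unchanged"
```

Under this model, the single-string call resolves a filesystem, the bare list passes through unchanged, and the list with storage_options raises, matching the three lines of the reproducible example.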
Workarounds:
- Create a filesystem object explicitly and pass it to read_parquet through the filesystem argument.
- Read the files one by one and concatenate the results.
Installed Versions
pandas : 2.3.3
numpy : 1.26.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : 9.6.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.12.0
html5lib : None
hypothesis : None
gcsfs : 2024.12.0
jinja2 : 3.1.6
lxml.etree : 5.4.0
matplotlib : None
numba : 0.61.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyreadstat : None
pytest : 8.4.2
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.16.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None