LazyFrame() not omitting hive partition columns #16404
Labels
bug: Something isn't working
needs triage: Awaiting prioritization by a maintainer
python: Related to Python Polars
Checks
Reproducible example
Log output
Issue description
I'm trying to read Parquet files from S3 that are stored under a Hive-partitioned path, '/year=YYYY/month=MM/day=DD/hour=HH/', using pyarrow's .read() method. The read fails with an error stating that one of the partition columns doesn't exist. If I exclude the partition columns and pass a list of only the columns that are actually present in the file, the read succeeds. According to the documentation, read() ignores Hive partition columns, yet Polars' LazyFrame() still attempts to read them.
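The workaround described above (dropping the partition columns from the requested column list) can be sketched roughly as follows. This is only an illustration using the standard library: the path, the column names, and the helper functions `hive_partition_keys` and `file_columns_only` are hypothetical and are not part of Polars or pyarrow.

```python
def hive_partition_keys(path: str) -> dict[str, str]:
    """Collect the key=value directory segments of a Hive-style path."""
    keys = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        # Only segments of the form "key=value" are partition directories.
        if sep and key:
            keys[key] = value
    return keys


def file_columns_only(requested: list[str], path: str) -> list[str]:
    """Drop requested columns that come from the path rather than the file."""
    partition_keys = hive_partition_keys(path)
    return [col for col in requested if col not in partition_keys]


# Hypothetical object key and column names, for illustration only.
path = "my-bucket/events/year=2024/month=05/day=21/hour=13/part-0.parquet"
requested = ["id", "value", "year", "month", "day", "hour"]

partitions = hive_partition_keys(path)
columns = file_columns_only(requested, path)
print(partitions)
print(columns)
```

The filtered list is what would then be passed as the explicit column selection when reading the file, so that only columns physically stored in the Parquet file are requested.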
Expected behavior
LazyFrame() should omit the hive partition columns.
Installed versions
--------Version info---------
Polars: 0.20.26
Index type: UInt32
Platform: Windows-10-10.0.22000-SP0
Python: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager: 0.10.0
cloudpickle: 2.2.1
connectorx: 0.3.2
deltalake:
fastexcel:
fsspec: 2024.5.0
gevent: 23.7.0
hvplot: 0.9.2
matplotlib: 3.7.2
nest_asyncio: 1.5.6
numpy: 1.23.4
openpyxl: 3.0.10
pandas: 2.1.3
pyarrow: 16.1.0
pydantic: 2.6.3
pyiceberg:
pyxlsb:
sqlalchemy: 1.4.51
torch:
xlsx2csv:
xlsxwriter: