BUG: fix support to read parquet files with list columns #597

theroggy · 2025-11-18T02:27:55Z

In PR #556, support for list-type columns was added, with tests for .geojson files. However, GDAL apparently returns/treats list columns in .parquet files differently from list columns in .geojson files. This PR makes sure .parquet files are handled correctly as well and adds tests for this case.

Remarks:

For ".parquet" files, list columns are already returned as lists, so they don't need to be parsed. However, the returned lists are ndarrays rather than Python lists, which is a small difference compared to .geojson files. As discussed below, we keep this behaviour.
Reading with use_arrow=True or use_arrow=False also gives differences, e.g. None being returned versus np.nan.
A test for nested columns in a parquet file was added, but for now it is skipped when use_arrow=False, because in this case the columns are flattened and it's not clear how we want to deal with this. To be further discussed/followed up in BUG: reading file with JSON ogr subtype is broken with use_arrow=True #592.
When a .parquet file contains list fields with None values inside the lists, these None values are incorrectly returned as 0 or "" when read with use_arrow=False. With use_arrow=True, they are returned as np.nan, which is fine. This has been reported upstream: Parquet: list field types with None values in the list give issues OSGeo/gdal#13448
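As a minimal sketch (not code from this PR), a test could detect the GDAL issue for None values inside lists by looking for positions where the expected list holds None but the value read back is a type default. The helper name `find_lost_nulls` and the set of suspect defaults (0 and "") are assumptions for illustration; np.nan is deliberately not flagged, since that is the acceptable result with use_arrow=True.

```python
from typing import Any, List, Sequence

# Assumed type defaults that can replace a None element in a typed list field:
# 0 (which also compares equal to 0.0 and False) and "" for string lists.
_SUSPECT_DEFAULTS = (0, "")


def find_lost_nulls(expected: Sequence[Any], actual: Sequence[Any]) -> List[int]:
    """Return indices where `expected` holds None but `actual` holds a type default."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual lists differ in length")
    lost = []
    for i, (exp, act) in enumerate(zip(expected, actual)):
        # NaN is not flagged: it is an acceptable missing marker (use_arrow=True).
        if exp is None and act is not None and act in _SUSPECT_DEFAULTS:
            lost.append(i)
    return lost
```

For example, `find_lost_nulls([1, None, 3], [1, 0, 3])` points at index 1, while a NaN in that position would not be reported.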

reference #592


jorisvandenbossche

Thanks for looking into this!

pyogrio/_compat.py

pyogrio/geopandas.py

pyogrio/tests/conftest.py

pyogrio/tests/test_geopandas_io.py

jorisvandenbossche · 2025-11-18T16:45:32Z

For ".parquet" files, list columns are already returned as lists without having to parse them. However, the returned lists are ndarrays rather than Python lists. This PR changes the behaviour for .geojson files so they also return ndarrays, but it could just as well be changed the other way around.

Also commented on that inline, but personally I would just leave this as is and not try to reconcile it exactly (in the end, the issue here is that pandas does not have a proper list type, and once it has one, this will change anyway).
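If downstream code does need plain Python lists regardless of the file format, a small conversion step after reading is one option. This is a hypothetical helper, not part of pyogrio: it relies on duck typing (numpy.ndarray, numpy scalars and array.array all provide `.tolist()`), so it does not need to import numpy.

```python
from typing import Any


def to_python_list(value: Any) -> Any:
    """Return list-like cell values as plain Python lists; other values as-is."""
    if value is None:
        return None
    # numpy.ndarray and array.array both provide .tolist(), so no numpy import is needed.
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    if isinstance(value, (list, tuple)):
        return list(value)
    # Other values (strings, Python scalars, a float NaN for a missing list) pass through.
    return value
```

It could be applied per column, e.g. `df["list_col"] = df["list_col"].map(to_python_list)`, where `df` and `list_col` are placeholders for a frame returned by `read_dataframe` and one of its list columns.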

Reading with use_arrow=True or use_arrow=False also gives differences, e.g. None being returned versus np.nan.

Similarly here, I think we can just accept these as differences that come with the pyarrow->pandas conversion.
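When the same test runs with both use_arrow=True and use_arrow=False, one way to accept the None versus np.nan difference is to compare scalar cell values with both treated as the same missing marker. A minimal stdlib-only sketch; the helper names are hypothetical. `numpy.float64` subclasses Python `float`, so the `isinstance` check also covers it, but other numpy float widths (e.g. float32) would need an extra check.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Return True for None or a float NaN (numpy.float64 is a float subclass)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def values_equal(left: Any, right: Any) -> bool:
    """Compare two scalar cell values, treating None and NaN as equal."""
    if is_missing(left) or is_missing(right):
        # Equal only if both sides are missing, whichever marker each side uses.
        return is_missing(left) and is_missing(right)
    return left == right
```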

jorisvandenbossche · 2025-11-18T16:53:51Z

TODO: after adding the libgdal-arrow-parquet package to a conda CI env, a test (test_read_dataframe_arrow_dtypes) started failing. If libgdal-core is limited to < 3.12, the error disappears again?

So it might be related just to the version of libgdal-core, and not this PR / the fact that libgdal-arrow-parquet was added?

The last test run on main was still using libgdal 3.11
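To tell whether a failure depends on the GDAL version rather than on which packages are installed, a test can be gated on the runtime GDAL version. A minimal sketch, assuming the version is available as a (major, minor, patch) tuple such as `pyogrio.__gdal_version__`; the function names are hypothetical.

```python
from typing import Tuple

# (major, minor, patch), the same shape as pyogrio.__gdal_version__ (assumption).
Version = Tuple[int, int, int]


def gdal_at_least(current: Version, minimum: Version) -> bool:
    """Return True if `current` >= `minimum`; tuples compare element by element."""
    return tuple(current) >= tuple(minimum)


def gdal_ge_312(current: Version) -> bool:
    """Hypothetical flag: is the running GDAL at least 3.12.0?"""
    return gdal_at_least(current, (3, 12, 0))
```

With pytest, such a flag could drive `@pytest.mark.skipif(gdal_ge_312(__gdal_version__), reason="...")` on the affected test.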

theroggy · 2025-11-19T10:10:04Z

TODO: after adding the libgdal-arrow-parquet package to a conda CI env, a test (test_read_dataframe_arrow_dtypes) started failing. If libgdal-core is limited to < 3.12, the error disappears again?

So it might be related just to the version of libgdal-core, and not this PR / the fact that libgdal-arrow-parquet was added?

The last test run on main was still using libgdal 3.11

I started the tests on main manually, and they use libgdal 3.12 now, but they passed. I also tried adding libgdal-arrow-parquet in main (#599), but this still didn't fail.

In this PR I tried moving the pyarrow imports around in different ways... but it keeps failing. If I comment out the new tests, it stops failing, but if they are enabled, it breaks... I moved them further down in the tests_... file so they are (I suppose) executed after the breaking test, but that doesn't help either.
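A common pattern for keeping optional imports such as pyarrow out of module import time is to defer them into a helper that fixtures or tests call only when they need the module. This is a generic sketch with a hypothetical helper name; as the comment above shows, moving the imports did not fix this particular failure. pytest's own `pytest.importorskip("pyarrow.parquet")` covers the common case of skipping when the module is missing.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import module `name` on first use; return None if it is not installed."""
    try:
        # For dotted names, find_spec imports the parent package first and raises
        # ModuleNotFoundError if that parent is missing (e.g. "pyarrow.parquet").
        if importlib.util.find_spec(name) is None:
            return None
    except ModuleNotFoundError:
        return None
    return importlib.import_module(name)
```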

Seems like a flaky thing in general :-(, so I wonder how it behaves "in the wild" in real code...

jorisvandenbossche · 2025-11-19T15:19:15Z

Seems like a flaky thing in general :-(, so I wonder how it behaves "in the wild" in real code...

Managed to reproduce it locally with the test suite, taking a look


pyogrio/tests/test_geopandas_io.py

jorisvandenbossche · 2025-11-20T10:31:31Z

Thanks @theroggy!

ENH: add support for parquet list columns

e8dbd62

theroggy changed the title from "ENH: add support for parquet list columns" to "BUG: fix support to read parquet files with list columns" Nov 18, 2025

theroggy changed the title from "BUG: fix support to read parquet files with list columns" to "BUG: also support to read parquet files with list columns" Nov 18, 2025

theroggy added 14 commits November 18, 2025 10:35

Skip parquet if driver not available

d812965

Update test_geopandas_io.py

8a92f9a

Fix linter issue

f98c0e8

Skip some tests if pyarrow.parquet is not available

b87e799

Skip tests if parquet file cannot be created

89df381

Use a saved version of the test parquet files as it is a problem to always create them on-the-fly

3d637e8

Give error instead of skip if test file cannot be created

a429118

Fix tests for minimal ci env

68fd8f9

Add libgdal-arrow-parquet to the latest env to be able to run parquet reading tests

9bb9472

Skip test for nested columns in parquet without arrow for now

e3f2555

Update test_geopandas_io.py

880182e

Update latest.yml

fc5a4d7

Update latest.yml

74fdf5c

Update latest.yml

c64c8d9

theroggy marked this pull request as ready for review November 18, 2025 15:08

jorisvandenbossche reviewed Nov 18, 2025


theroggy added 9 commits November 18, 2025 18:38

Only check GDAL_HAS_PARQUET in the tests

59fc5dc

Apply feedback

6c7738f

Update conftest.py

30513c5

Update test_geopandas_io.py

425c5d0

Remove some redundant code

cb115b1

Only import pyarrow, parquet if needed in conftest

099f19f

Update conftest.py

223e686

Delay calling list_drivers

4f2ac0e
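The commits "Only check GDAL_HAS_PARQUET in the tests" and "Delay calling list_drivers" point at checking driver availability lazily rather than at import time. A minimal sketch of that idea, assuming a callable that returns a mapping of driver names to capability strings such as "r" or "rw" (the shape returned by `pyogrio.list_drivers()`); the class name is hypothetical and this is not the conftest code from the PR.

```python
from typing import Callable, Mapping, Optional


class LazyDriverCheck:
    """Defer calling a driver-listing function until a capability is first queried."""

    def __init__(self, list_drivers: Callable[[], Mapping[str, str]]) -> None:
        self._list_drivers = list_drivers
        self._drivers: Optional[Mapping[str, str]] = None

    def can_read(self, name: str) -> bool:
        """Return True if driver `name` is listed with read ("r") capability."""
        if self._drivers is None:
            # First query only: fetch and cache the {driver name: capability} mapping.
            self._drivers = dict(self._list_drivers())
        return "r" in self._drivers.get(name, "")
```

For example, `DRIVERS = LazyDriverCheck(pyogrio.list_drivers)` could live at module level, with `DRIVERS.can_read("Parquet")` only evaluated inside a fixture or test.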

Try importing pyarrow sooner

036df1e

rollback previous commit as it didn't help

27d1bee

theroggy added 8 commits November 19, 2025 11:39

Comment new tests

01a72be

Try uncommenting GDAL_HAS_PARQUET

a100f2f

Update test_geopandas_io.py

9aa348a

Move test_read_arrow_dtypes up + reenable new tests

0060a3f

Comment new tests again

97634cc

Comment code to create parquet test files

0650ef9

Activate new tests again

9963d0b

Move new tests down

7916ebc

theroggy added 5 commits November 19, 2025 19:07

Add fix for ArrowDtype error

e4ef8bb

Small improvements

7c62698

Revert reordering of tests

a4f95f5

Revert commenting parquet file creation code

a996beb

Update test_geopandas_io.py

da4f693

theroggy added this to the 0.12.0 milestone Nov 19, 2025

theroggy changed the title from "BUG: also support to read parquet files with list columns" to "BUG: fix support to read parquet files with list columns" Nov 19, 2025

theroggy requested a review from jorisvandenbossche November 19, 2025 18:50

This was referenced Nov 19, 2025

BUG: fix issue reading with use_arrow=True after having read a Parquet file #601

Merged

Parquet: list field types with None values in the list give issues OSGeo/gdal#13448

Closed

jorisvandenbossche added 2 commits November 20, 2025 10:51

Merge remote-tracking branch 'upstream/main' into ENH-add-support-for-parquet-list-columns

45ee07a

fixup merge

424341d

jorisvandenbossche approved these changes Nov 20, 2025


pyogrio/tests/test_geopandas_io.py

Add link to gdal issue for None values in list

0835b58

jorisvandenbossche merged commit d4f51b3 into geopandas:main Nov 20, 2025
3 of 25 checks passed

theroggy deleted the ENH-add-support-for-parquet-list-columns branch November 20, 2025 10:31

theroggy mentioned this pull request Nov 20, 2025

BUG: for parquet files with list type columns, None values in such lists are not read correctly #603

Open
