Raise a helpful error for failed partial loading of list-struct columns #403

dougbrn · 2025-11-04T21:52:31Z

Resolves #394 using the lowest-effort approach: check for the problem and raise an error. I decided against the higher-effort approach of handling this dynamically, because there isn't a more performant way to do it than a full load followed by column selection. I think the user should be aware that their data requires that approach and adjust on their side intentionally.

For the check, I have set it up to read the parquet schema only when a partial load is possible ("." in the column name), and to determine whether the requested name is a base column or a nested column. If it's a nested column, it simply raises the error when the top-level column it refers to is not a struct. Notably, this does not go the full distance of verifying that the column is a struct-list, and I'm not sure whether that would be better here. But it does handle the main case of catching list-structs.
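A minimal sketch of the decision logic described above, not the code in this PR. To keep it self-contained it assumes the schema has already been reduced to a plain mapping of top-level column name to a type kind string; the function name `find_unsupported_partial_loads` and the `top_level_kinds` mapping are hypothetical. In practice that information would presumably come from the parquet schema (for example via `pyarrow.parquet.read_schema` and `pyarrow.types.is_struct`).

```python
def find_unsupported_partial_loads(columns, top_level_kinds):
    """Return requested names that look like partial loads of non-struct columns.

    columns: requested column names, or None to load everything.
    top_level_kinds: hypothetical mapping of top-level column name to a
        type kind string such as "struct", "list" or "double".
    """
    if columns is None:
        # No column selection means a full load, so nothing to check.
        return []

    unsupported = []
    for name in columns:
        if "." not in name:
            # Without a dot the name cannot be a partial load.
            continue
        if name in top_level_kinds:
            # A base column whose name happens to contain a dot.
            continue
        # Assumption: the parent is everything before the first dot.
        parent = name.split(".", 1)[0]
        kind = top_level_kinds.get(parent)
        if kind is not None and kind != "struct":
            # Sub-column requested from something that is not a struct,
            # e.g. a list-struct column.
            unsupported.append(name)
    return unsupported


# Illustrative schema: "lc" is stored as a list-struct, "nested" as a struct.
schema = {"id": "int64", "lc": "list", "nested": "struct", "a.b": "double"}
print(find_unsupported_partial_loads(["id", "lc.flux", "nested.t", "a.b"], schema))
# prints ['lc.flux']
```

As in the description, this only checks "not a struct" rather than confirming the parent really is a list of structs, and it skips names whose parent is missing from the schema so that the reader's own missing-column error still surfaces.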

codecov · 2025-11-04T21:55:31Z

Codecov Report

❌ Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.27%. Comparing base (a7d8124) to head (7e6f939).
⚠️ Report is 14 commits behind head on main.

Files with missing lines	Patch %	Lines
src/nested_pandas/nestedframe/io.py	90.32%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #403      +/-   ##
==========================================
- Coverage   97.33%   97.27%   -0.07%     
==========================================
  Files          19       19              
  Lines        2062     2089      +27     
==========================================
+ Hits         2007     2032      +25     
- Misses         55       57       +2


github-actions · 2025-11-04T22:01:06Z

Before [`a7d8124`]	After [`5245a03`]	Ratio	Benchmark (Parameter)
504±200ms	439±200ms	~0.87	benchmarks.ReadFewColumnsHTTPS.time_run
29.3±1ms	30.3±1ms	1.03	benchmarks.AssignSingleDfToNestedSeries.time_run
48.7±0.7ms	50.1±0.4ms	1.03	benchmarks.ReassignHalfOfNestedSeries.time_run
11.5±0.2ms	11.7±0.3ms	1.02	benchmarks.NestedFrameAddNested.time_run
1.26G	1.29G	1.02	benchmarks.ReadFewColumnsS3.peakmem_run
1.28±0.01ms	1.30±0.01ms	1.01	benchmarks.NestedFrameReduce.time_run
134M	134M	1.00	benchmarks.CountNestedBy.peakmem_run
66.0±0.2ms	66.2±0.8ms	1.00	benchmarks.CountNestedBy.time_run
101M	101M	1.00	benchmarks.NestedFrameAddNested.peakmem_run
106M	106M	1.00	benchmarks.NestedFrameQuery.peakmem_run


hombit

Thanks! I'm a little worried about doing one more read just for that. Maybe we can wrap pyarrow's error, so we read the schema only if pyarrow fails with an error about a missing column?

dougbrn · 2025-11-04T22:44:48Z

I'm worried about that too, which is why I'm only doing the schema read here if we suspect a partial load. The wrapping idea is interesting. Do you think we should catch the value error and then inspect the schema to return a better message?

hombit · 2025-11-04T22:58:39Z

@dougbrn yes, I think it would be the perfect solution. I think we can just rely on the error messages for that. It should be fine as long as we test both the lowest and highest pyarrow versions on CI.

dougbrn · 2025-11-04T23:09:20Z

@hombit Now doing the schema check only after a failed read. The result is that both the original error message and the nested-pandas error message are present.
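A rough sketch of that "check only after a failed read" pattern, under stated assumptions. The reader and the schema check are passed in as plain callables (`read_fn` and `check_fn`, both hypothetical), so the happy path never pays for a schema read. Exception chaining with `raise ... from` is one way to keep the original message visible alongside the new one; whether the PR does exactly this is an assumption.

```python
def read_with_partial_load_check(read_fn, check_fn, columns):
    """Try the read first; inspect the schema only if the read fails.

    read_fn: hypothetical callable that performs the read for `columns`.
    check_fn: hypothetical callable returning the requested names that are
        partial loads of non-struct columns (may need a schema read).
    """
    try:
        return read_fn(columns)
    except ValueError as err:
        unsupported = check_fn(columns)
        if not unsupported:
            # Not the partial-load case: let the original error through.
            raise
        # Chain the errors so both messages appear in the traceback.
        raise ValueError(
            f"Columns {unsupported} request sub-columns of columns that are "
            "not stored as structs, so they cannot be partially loaded. "
            "Load the full column and select sub-columns afterwards."
        ) from err


# Illustrative stand-ins for the real reader and schema check.
def failing_read(columns):
    raise ValueError("original reader error about a missing column")


try:
    read_with_partial_load_check(failing_read, lambda cols: ["lc.flux"], ["lc.flux"])
except ValueError as exc:
    print(exc)            # the more helpful message
    print(exc.__cause__)  # the original reader error
```

The schema check runs at most once per failed read, which addresses the concern about adding an extra read to every call.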

dougbrn · 2025-11-04T23:55:02Z

screenshot example:

src/nested_pandas/nestedframe/io.py

tests/nested_pandas/nestedframe/test_io.py

dougbrn added 5 commits November 4, 2025 11:41

first implementation as a warning

bb42d31

error implementation; needs to move into deeper reading logic

8104880

move to inner reading logic

17509c0

improve error message

f4d15b8

fix quotation overload

145a800

dougbrn added 4 commits November 4, 2025 14:03

add unit test

2889f3a

move test file to own directory

2f4ddc9

only check schema if there's a potential partial load

9aae0b2

handle columns=None

54acc25

dougbrn requested a review from hombit November 4, 2025 22:34

add note to docstring

c4687b7

hombit reviewed Nov 4, 2025


dougbrn added 2 commits November 4, 2025 15:00

only check schema after a read failure

8374b5e

add another test

1d99e93

dougbrn requested a review from hombit November 4, 2025 23:07

hombit approved these changes Nov 5, 2025


src/nested_pandas/nestedframe/io.py (outdated, resolved)

src/nested_pandas/nestedframe/io.py (outdated, resolved)

tests/nested_pandas/nestedframe/test_io.py (resolved)

wrap read_table with partial load checking and use everywhere

7e6f939

dougbrn merged commit cae486d into main Nov 5, 2025
10 of 12 checks passed

dougbrn deleted the list_struct_partial_loads branch November 5, 2025 17:27

3 participants
