Skip to content

Conversation

@hombit
Copy link
Collaborator

@hombit hombit commented Oct 27, 2025

Closes #392

@hombit hombit requested a review from gitosaurus October 27, 2025 16:09
@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 69.69697% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.33%. Comparing base (c0ba0f2) to head (1e15f09).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/nested_pandas/nestedframe/io.py 69.69% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #393      +/-   ##
==========================================
- Coverage   97.79%   97.33%   -0.47%     
==========================================
  Files          19       19              
  Lines        2042     2062      +20     
==========================================
+ Hits         1997     2007      +10     
- Misses         45       55      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Oct 27, 2025

Before [960c0ec] After [8bc53a3] Ratio Benchmark (Parameter)
11.1±0.3ms 11.9±0.7ms 1.07 benchmarks.NestedFrameAddNested.time_run
1.26±0.02ms 1.30±0.01ms 1.03 benchmarks.NestedFrameReduce.time_run
174M 180M 1.03 benchmarks.ReadFewColumnsHTTPS.peakmem_run
1.3G 1.34G 1.03 benchmarks.ReadFewColumnsS3.peakmem_run
258M 260M 1.01 benchmarks.ReassignHalfOfNestedSeries.peakmem_run
253M 253M 1 benchmarks.AssignSingleDfToNestedSeries.peakmem_run
134M 134M 1 benchmarks.CountNestedBy.peakmem_run
64.2±0.3ms 63.9±0.5ms 1 benchmarks.CountNestedBy.time_run
103M 103M 1 benchmarks.NestedFrameAddNested.peakmem_run
108M 107M 1 benchmarks.NestedFrameQuery.peakmem_run

Click here to view all benchmarks.

@hombit hombit requested a review from gitosaurus October 28, 2025 14:55
Copy link
Contributor

@gitosaurus gitosaurus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! This looks good to me, but I have one remaining concern about whether this is supporting more than was supported before.

@gitosaurus gitosaurus self-requested a review October 28, 2025 19:29
Copy link
Contributor

@gitosaurus gitosaurus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@hombit hombit marked this pull request as draft October 28, 2025 22:02
@hombit
Copy link
Collaborator Author

hombit commented Oct 28, 2025

Actually reading directories with column selection doesn't work right now, working on another reimplementation

@hombit
Copy link
Collaborator Author

hombit commented Oct 29, 2025

Reimplemented again, now is_dir is called on almost every path, which, unfortunately, would cause an extra round-trip. We can try to fix it on HATS land by passing the path and filesystem separately, and adding a trailing "/" to directory paths.

It is also really poorly tested, but hopefully will be better covered by cloud tests.

@hombit hombit enabled auto-merge (squash) October 29, 2025 18:01
@hombit hombit merged commit 8917d07 into main Oct 29, 2025
10 of 12 checks passed
@hombit hombit deleted the read-web-dir branch October 29, 2025 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Restore ability to read Parquet files in S3 directories

3 participants