[DataLoader] Remove pyiceberg fork dependency and ArrivalOrder API #504
Merged
cbb330 merged 1 commit into linkedin:main on Mar 18, 2026
Revert pyiceberg from the sumedhsakdeo/iceberg-python fork back to the stable pyiceberg~=0.11.0 release from PyPI. Remove all usage of the fork-only APIs: ArrivalOrder, ScanOrder, TaskOrder, and the batch_size parameter on DataLoaderSplit and OpenHouseDataLoader.

- pyproject.toml: drop allow-direct-references, pin pyiceberg~=0.11.0
- data_loader_split.py: remove ArrivalOrder import and order= kwarg, remove batch_size param
- data_loader.py: remove batch_size param
- Delete test_arrival_order.py
- Remove batch_size tests from test_data_loader.py, test_data_loader_split.py, and integration_tests.py
- Regenerate uv.lock
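The pin change above swaps a PEP 508 direct reference (`name @ url`) for a plain version specifier. As a rough illustration of the difference, the sketch below flags direct references in a list of requirement strings. The helper name `find_direct_references`, the simplified regular expression, and the `example-lib` entry are assumptions made for illustration; they are not part of this PR, and the pattern is not a full PEP 508 parser.

```python
import re

# PEP 508 direct reference: "name [extras] @ url [; marker]".
# Simplified pattern for illustration only; not a full PEP 508 parser.
_DIRECT_REF = re.compile(
    r"""^\s*
        (?P<name>[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)  # distribution name
        \s*(?:\[[^\]]*\])?                                     # optional extras
        \s*@\s*                                                # the "@" separator
        (?P<url>[^\s;]+)                                       # URL, up to a space or marker
    """,
    re.VERBOSE,
)


def find_direct_references(requirements):
    """Return (name, url) pairs for requirements written as direct references."""
    found = []
    for req in requirements:
        match = _DIRECT_REF.match(req)
        if match:
            found.append((match.group("name"), match.group("url")))
    return found


# Hypothetical dependency lists; "example-lib" is a made-up placeholder.
before = [
    "pyiceberg @ git+https://github.com/sumedhsakdeo/iceberg-python@<sha>",
    "example-lib>=1.0",
]
after = ["pyiceberg~=0.11.0", "example-lib>=1.0"]

print(find_direct_references(before))  # one entry: the fork pin
print(find_direct_references(after))   # empty: everything resolves by version
```

A check along these lines could run over a built wheel's `Requires-Dist` entries before publishing, though whether that is worthwhile depends on how the project's release process is set up.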
robreeves approved these changes on Mar 18, 2026
Summary

Remove the temporary dependency on the `sumedhsakdeo/iceberg-python` fork and all APIs it exposed (`ArrivalOrder`, `ScanOrder`, `TaskOrder`, `batch_size`). Reverts pyiceberg to the stable `pyiceberg~=0.11.0` release from PyPI.

The PEP 508 direct reference (`pyiceberg @ git+https://github.com/sumedhsakdeo/iceberg-python@<sha>`) gets baked into the published wheel's `Requires-Dist` metadata, so any downstream consumer that installs `openhouse-dataloader` is forced to resolve pyiceberg from a personal GitHub fork. This fails ELR in downstream consumer environments, which only permit dependencies sourced from approved registries (PyPI, internal Artifactory). The fork pin cannot be approved through ELR because it is not a published release of a recognized OSS project.

Changes

- `pyproject.toml`: Drop `[tool.hatch.metadata] allow-direct-references`, revert pyiceberg from the fork SHA back to `pyiceberg~=0.11.0`.
- `data_loader_split.py`: Remove `ArrivalOrder` import and `order=ArrivalOrder(...)` kwarg from `to_record_batches()`. Remove `batch_size` parameter from `DataLoaderSplit.__init__`.
- `data_loader.py`: Remove `batch_size` parameter from `OpenHouseDataLoader.__init__`.
- `uv.lock`: Regenerated to resolve pyiceberg from PyPI.
- Tests: Delete `test_arrival_order.py` (entire file tested fork-only APIs). Remove `batch_size` tests from `test_data_loader.py`, `test_data_loader_split.py`, and `integration_tests.py`.

Testing Done

`make verify` passes: 135 tests pass; lint, format, and mypy all green.

Additional Information

`batch_size` is removed from `OpenHouseDataLoader` and `DataLoaderSplit`. Any callers passing `batch_size` will need to remove that argument.
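
For callers affected by the `batch_size` removal, the sketch below shows what the migration might look like. It uses a hypothetical stand-in class, `LoaderStandIn`, with a made-up `table_name` parameter, because the real constructor signatures are not shown in this PR. It also assumes the real constructors do not accept arbitrary `**kwargs`, in which case a leftover `batch_size=` raises `TypeError`.

```python
class LoaderStandIn:
    """Hypothetical stand-in; NOT the real OpenHouseDataLoader signature."""

    def __init__(self, table_name):
        self.table_name = table_name


# Keyword arguments removed by this change.
REMOVED_KWARGS = frozenset({"batch_size"})


def without_removed_kwargs(kwargs):
    """Return a copy of kwargs with the removed parameters dropped."""
    return {key: value for key, value in kwargs.items() if key not in REMOVED_KWARGS}


# A call site written against the fork-based API (values are illustrative).
old_call_kwargs = {"table_name": "db.tbl", "batch_size": 1024}

try:
    LoaderStandIn(**old_call_kwargs)
except TypeError as exc:
    # Python rejects keyword arguments the signature no longer declares.
    print(f"old call fails: {exc}")

# After migration: the same call with batch_size stripped.
loader = LoaderStandIn(**without_removed_kwargs(old_call_kwargs))
print(loader.table_name)
```

Deleting the argument at each call site is the simpler fix; a filtering helper like this is mainly useful where keyword arguments are assembled dynamically from configuration.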