refactor: consolidate split logic in base class #20
Merged
ziv-lazarov-nagish merged 1 commit into nagish on Apr 16, 2026
AmitMY (Contributor) requested changes on Apr 16, 2026:
The split we use for DGS is a fixed one, used in research (not only our research).
Using splits.json is preferred here, because it allows us to compare against our previous work, and against others' work as it comes out. As I understand it, this change would create a new split.
Author (Contributor) replied:
Correct. I was thinking of a single way to split it across all datasets.
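To make the reviewer's point concrete: with a fixed split, membership is read from a file rather than computed, so every run (and every paper that uses the same file) sees identical partitions. This is a minimal sketch, assuming a hypothetical splits.json layout that maps each split name to a list of sample IDs; the real file format in this repository is not shown in the PR, and load_fixed_splits is an illustrative name, not the project's API.

```python
import json
import tempfile
from pathlib import Path


def load_fixed_splits(path: Path) -> dict[str, str]:
    """Invert a {split_name: [sample_id, ...]} mapping into {sample_id: split_name}.

    Hypothetical helper: assumes splits.json is a JSON object whose keys are
    split names and whose values are lists of sample IDs.
    """
    with path.open(encoding="utf-8") as f:
        raw = json.load(f)
    lookup: dict[str, str] = {}
    for split_name, sample_ids in raw.items():
        for sample_id in sample_ids:
            # A sample listed under two splits would leak data between them.
            if sample_id in lookup:
                raise ValueError(
                    f"{sample_id!r} appears in both {lookup[sample_id]!r} and {split_name!r}"
                )
            lookup[sample_id] = split_name
    return lookup


# Usage: write a tiny splits.json and look up each sample's split.
with tempfile.TemporaryDirectory() as tmp:
    splits_path = Path(tmp) / "splits.json"
    splits_path.write_text(
        json.dumps({"train": ["a", "b"], "test": ["c"]}), encoding="utf-8"
    )
    demo = load_fixed_splits(splits_path)

print(demo)  # {'a': 'train', 'b': 'train', 'c': 'test'}
```

The file, not the code, is the source of truth here, which is what makes results comparable across projects that share it.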
Branch updated from 7f15fe5 to c96a608.
- Add split_bucket() and assign_split() to common.py for hash-based splits
- Add _init_split_tracking(), _track_and_filter(), get_split_manifest() to BaseSegmentationDataset: removes duplication across all datasets
- DGS: keep fixed splits.json (research-standard split), migrate to base class tracking infra and pathlib
- Platform: use assign_split() + base class helpers
- Remove unused collate_fn/DataLoader imports and duplicate manifest collection in train.py
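A minimal sketch of hash-based split assignment in the spirit of split_bucket() and assign_split(). Their actual signatures, bucket count, and split ratios are not shown in this PR, so the ones below (100 buckets, 10% dev, 10% test, string sample IDs) are assumptions for illustration only.

```python
import hashlib


def split_bucket(sample_id: str, num_buckets: int = 100) -> int:
    """Map a sample ID to a stable bucket in [0, num_buckets).

    Uses SHA-256 rather than the built-in hash(), which is randomized per
    process for strings (PYTHONHASHSEED) and would reshuffle splits between runs.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign_split(sample_id: str, dev_pct: int = 10, test_pct: int = 10) -> str:
    """Return 'train', 'dev', or 'test' for a sample (hypothetical percentages)."""
    bucket = split_bucket(sample_id)
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"


ids = [f"video_{i:04d}" for i in range(1000)]
splits = [assign_split(i) for i in ids]
counts = {name: splits.count(name) for name in ("train", "dev", "test")}
# Proportions approach 80/10/10 as the number of samples grows.
print(counts)
```

Because a sample's split depends only on its own ID, adding or removing other samples never moves an existing sample to a different split, which a seeded shuffle of the whole list cannot guarantee.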
Branch updated from c96a608 to e83c8f0.
AmitMY approved these changes on Apr 16, 2026.
Summary
- Add split_bucket() and assign_split() to common.py: deterministic hash-based split assignment
- Add _init_split_tracking(), _track_and_filter(), get_split_manifest() to BaseSegmentationDataset: removes duplication across all dataset classes
- DGS: keep fixed splits.json (research-standard split), migrate to base class tracking infra and pathlib
- Platform: use assign_split() + base class helpers, remove local _split_bucket
- Remove unused collate_fn/DataLoader imports and duplicate manifest collection in train.py

Depends on #19.
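The base-class consolidation can be pictured as follows. This is a sketch under stated assumptions: the method names _init_split_tracking, _track_and_filter, and get_split_manifest come from this PR, but their signatures and bodies here are guesses, and ToyDataset is a hypothetical subclass. A real subclass would get each sample's split from assign_split() or from splits.json.

```python
from collections import defaultdict


class BaseSegmentationDataset:
    """Hypothetical base class holding the shared split bookkeeping."""

    def __init__(self, split: str) -> None:
        self.split = split
        self._init_split_tracking()

    def _init_split_tracking(self) -> None:
        # split name -> sample IDs seen while scanning the source data
        self._split_ids: dict[str, list[str]] = defaultdict(list)

    def _track_and_filter(self, sample_id: str, sample_split: str) -> bool:
        """Record a sample's split; keep the sample only if it matches self.split."""
        self._split_ids[sample_split].append(sample_id)
        return sample_split == self.split

    def get_split_manifest(self) -> dict[str, list[str]]:
        # Sorted copies so the manifest does not depend on scan order.
        return {name: sorted(ids) for name, ids in self._split_ids.items()}


class ToyDataset(BaseSegmentationDataset):
    """Subclass only decides each sample's split; tracking lives in the base class."""

    def __init__(self, split: str, assignments: dict[str, str]) -> None:
        super().__init__(split)
        self.samples = [
            sample_id
            for sample_id, sample_split in assignments.items()
            if self._track_and_filter(sample_id, sample_split)
        ]


ds = ToyDataset("dev", {"c": "train", "a": "dev", "b": "train"})
print(ds.samples)               # ['a']
print(ds.get_split_manifest())  # {'train': ['b', 'c'], 'dev': ['a']}
```

With the manifest produced by the dataset itself, train.py no longer needs its own manifest collection, which matches the cleanup listed above.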
Changed files
- datasets/common.py: split_bucket, assign_split, base class split tracking
- datasets/dgs/dataset.py: use base class tracking while keeping fixed splits.json, migrate to pathlib
- datasets/annotation_platform/dataset.py: use assign_split + base class helpers
- train.py: remove unused imports and duplicate manifest
- tests/: update imports and dummy dataset

Test plan
- ruff check . passes
- pytest passes (61 tests)