fix(dataloader): prevent data drop and padding during validation for accurate metrics#1100
Conversation
…accurate metrics (pre-commit checked)
Code Review
This pull request introduces a new EvalDistributedSampler class in areal/utils/dataloader.py designed for evaluation tasks. Unlike the standard DistributedSampler, this implementation avoids padding the dataset, ensuring that each sample is evaluated exactly once across the cluster even if the dataset size is not evenly divisible by the number of replicas. The create_dataloader function was updated to use this new sampler when a ValidDatasetConfig is provided. I have no feedback to provide.
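A padding-free sampler of the kind described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual `EvalDistributedSampler`; the constructor arguments mirror the familiar `torch.utils.data.DistributedSampler` signature, but the class here is self-contained:

```python
# Hypothetical sketch of a padding-free distributed sampler for evaluation.
# Unlike the standard DistributedSampler, it never repeats indices to make
# the dataset evenly divisible by num_replicas, so each sample is seen
# exactly once across all replicas.
class EvalDistributedSampler:
    def __init__(self, dataset_size: int, num_replicas: int, rank: int):
        self.dataset_size = dataset_size
        self.num_replicas = num_replicas
        self.rank = rank
        # Replicas with rank < remainder receive one extra sample.
        base, rem = divmod(dataset_size, num_replicas)
        self.num_samples = base + (1 if rank < rem else 0)

    def __iter__(self):
        # Strided assignment: rank r takes indices r, r+R, r+2R, ...
        return iter(range(self.rank, self.dataset_size, self.num_replicas))

    def __len__(self):
        return self.num_samples
```

With this scheme the per-rank index sets are disjoint and their union covers the whole dataset, which is exactly the "evaluated exactly once" property the review describes.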
garrett4wade
left a comment
@Anguo-star Thanks for the fix!
FYI, the new EvalDistributedSampler class may not be mandatory with the single-controller usage (i.e., "python3 train.py" instead of the legacy "python3 -m areal.infra.launcher.local train.py"), because the data is only loaded in the controller process. The fix is good for backward compatibility. Merging anyway.
…rash

In single-controller mode with drop_last=False (as set by PR #1100 for accurate validation metrics), the last validation batch may contain fewer sequences than dp_size. This causes balanced_greedy_partition to raise "The length of nums must be divisible by K" during evaluation dispatch.

Truncate trailing items in _dispatch_tensors when the batch size is not evenly divisible by dp_size, preserving compatibility with both drop_last=False validation and strict equal-partition dispatch.
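The truncation fix from the commit message above can be sketched as a single helper. `truncate_for_dispatch` is a hypothetical stand-in for the trailing-item logic in `_dispatch_tensors`, shown only to illustrate the divisibility repair:

```python
# Illustrative sketch (not the repo's actual code): drop trailing items so
# the batch length becomes divisible by dp_size, satisfying the strict
# equal-partition requirement of balanced_greedy_partition.
def truncate_for_dispatch(batch: list, dp_size: int) -> list:
    # Largest multiple of dp_size that fits within the batch.
    usable = (len(batch) // dp_size) * dp_size
    return batch[:usable]
```

Note the trade-off this commit accepts: truncation keeps dispatch simple, but it silently discards a few trailing samples, which the later padding-based fix addresses.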
…ivisible batches

When drop_last=False, the last eval batch may not be divisible by dp_size, causing balanced_greedy_partition to crash (#1100/#1109). Instead of truncating remainder items (which biases eval metrics), pad with zero-contribution dummy items before dispatch so that all DP replicas participate in collective ops without deadlock.

Key changes:
- Add make_dummy_eval_item in areal/utils/data.py
- Add _pad_eval_batch in train_controller.py for evaluate_* methods
- Move total_loss_weight assertion after all_reduce (allow local zero)
- Add zero-weight fast path in FSDP/Megatron/Archon engines
- Add tests for padding, dummy schema, and zero-weight loss path

Refs: #1100, #1109, #898
…ivisible batches

When drop_last=False, the last eval batch may not be divisible by dp_size, causing balanced_greedy_partition to crash. Instead of truncating remainder items (which biases metrics), pad the batch with zero-contribution dummy items before dispatch. Extend dispatch to support group_size-aware partitioning so RW chosen/rejected pairs stay on the same DP rank.

Key changes:
- Add make_dummy_eval_item() for zero-weight schema-compatible items
- Add _pad_eval_batch() for eval-only dummy padding before dispatch
- Add zero-weight fast path in FSDP/Megatron/Archon engines
- Move total_loss_weight assertion after all_reduce (allow local zero)
- Add group_size-aware dispatch for RW chosen/rejected pairs
- Exclude dummy sequences from RW loss weight via count_nonzero
- Add 20+ tests covering padding, dispatch, and zero-weight paths

Refs: #1095, #1100, #898
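The padding strategy from the commit message above can be sketched like this. The item schema (a dict carrying a `loss_weight` field) is an assumption for illustration; the real `make_dummy_eval_item` / `_pad_eval_batch` operate on the repo's batch structures:

```python
# Hedged sketch of zero-contribution dummy padding for eval batches.
# Dummy items keep every DP replica participating in collective ops
# (avoiding deadlock) while contributing nothing to reduced metrics.
def make_dummy_eval_item(template: dict) -> dict:
    dummy = dict(template)
    dummy["loss_weight"] = 0.0  # excluded from the weighted loss sum
    return dummy

def pad_eval_batch(batch: list, dp_size: int) -> list:
    remainder = len(batch) % dp_size
    if remainder == 0:
        return batch  # already divisible; nothing to pad
    n_pad = dp_size - remainder
    return batch + [make_dummy_eval_item(batch[-1]) for _ in range(n_pad)]
```

Because the dummies carry zero loss weight, the metric numerator and denominator are unchanged, which is why this approach does not bias evaluation the way truncation does. It also explains the related change of moving the total_loss_weight assertion after all_reduce: a rank that received only dummies legitimately has a local weight of zero.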
fix(dataloader): use EvalDistributedSampler for validation to ensure accurate metrics
Description
This PR fixes a critical bug where the validation dataloader was producing inaccurate evaluation metrics due to hardcoded data dropping and standard distributed padding.
Previously, `create_dataloader` hardcoded `drop_last=True` and used the standard `DistributedSampler` for all dataset configs. The standard `DistributedSampler` pads the dataset to make it evenly divisible by the number of replicas, which causes some validation samples to be evaluated twice. Combined with `drop_last=True`, this means the validation set was both artificially inflated (via padding) and truncated (via dropping), leading to biased evaluation results.

Changes made:
- `EvalDistributedSampler`: a custom sampler designed specifically for evaluation that prevents dataset padding. It ensures every sample in the dataset is evaluated exactly once across the cluster.
- `create_dataloader`: added logic to check whether the config is a `ValidDatasetConfig`. If so, it dynamically applies the new `EvalDistributedSampler` and forces `drop_last=False`.

Related Issue
Fixes #1095
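The sampler-selection branch described under "Changes made" can be sketched as follows. The config classes here are minimal stand-ins (the real `ValidDatasetConfig` lives in the repo's config module), and the sampler names are returned as strings purely for illustration:

```python
# Hypothetical sketch of the branch added to create_dataloader.
from dataclasses import dataclass

@dataclass
class TrainDatasetConfig:
    batch_size: int = 32

@dataclass
class ValidDatasetConfig:
    batch_size: int = 32

def sampler_kwargs(config) -> dict:
    if isinstance(config, ValidDatasetConfig):
        # Validation: padding-free sampler, keep the tail batch.
        return {"sampler": "EvalDistributedSampler", "drop_last": False}
    # Training keeps the previous behavior: padded sampler, drop the tail.
    return {"sampler": "DistributedSampler", "drop_last": True}
```

Keying the behavior on the config type keeps the training path untouched, which is why the fix is backward compatible.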
Type of Change
Checklist
- `pre-commit run --all-files`
- `./docs/build_all.sh`
- `main`
- `/review-pr` command
- `/create-pr`

Breaking Change Details (if applicable):
Additional Context
Need help? Check the Contributing Guide or ask in GitHub Discussions!