Support empty batches for arbitrary dataset structures #534

ffuuugor · 2022-10-31T16:26:05Z

For context see discussion in #530 (and thanks @joserapa98 for pointing out the issue)

At the moment (more precisely, once #530 is merged), Opacus supports empty batches only for datasets with a simple structure: every record must be a flat tuple whose elements are each either a tensor or a primitive type.

For instance, datasets with records like (Tensor, int) or (Tensor, Tensor) are supported. However, datasets with nested records like (Tensor, (int, int)) are not.

PyTorch addresses a similar problem in its default collate function with the following piece of code:

if isinstance(elem, collections.abc.Mapping):
    try:
        return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
    except TypeError:
        # The mapping type may not support `__init__(iterable)`.
        return {key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem}
elif isinstance(elem, tuple) and hasattr(elem, '_fields'):  # namedtuple
    return elem_type(*(collate(samples, collate_fn_map=collate_fn_map) for samples in zip(*batch)))
elif isinstance(elem, collections.abc.Sequence):
    # check to make sure that the elements in batch have consistent size
    it = iter(batch)
    elem_size = len(next(it))
    if not all(len(elem) == elem_size for elem in it):
        raise RuntimeError('each element in list of batch should be of equal size')
    transposed = list(zip(*batch))  # It may be accessed twice, so we use a list.

    if isinstance(elem, tuple):
        return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
    else:
        try:
            return elem_type([collate(samples, collate_fn_map=collate_fn_map) for samples in transposed])
        except TypeError:
            # The sequence type may not support `__init__(iterable)` (e.g., `range`).
            return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]

We need to adapt it to our needs and make sure DPDataLoader can handle datasets of arbitrary structure.
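
As a rough illustration of what such an adaptation could look like, here is a minimal sketch that walks one sample with the same Mapping / namedtuple / Sequence dispatch as the PyTorch code above and replaces every leaf with an empty value. The names `empty_like_structure` and `empty_leaf` are hypothetical and not part of Opacus or PyTorch. To keep the sketch dependency-free, the leaf handler returns an empty list as a stand-in; an actual implementation would presumably return a zero-length tensor instead.

```python
from collections import namedtuple
from collections.abc import Mapping, Sequence


def empty_like_structure(sample, make_empty_leaf):
    """Mirror the nesting of `sample`, replacing every leaf with an empty value."""
    if isinstance(sample, Mapping):
        return {
            key: empty_like_structure(value, make_empty_leaf)
            for key, value in sample.items()
        }
    if isinstance(sample, tuple) and hasattr(sample, "_fields"):  # namedtuple
        return type(sample)(
            *(empty_like_structure(value, make_empty_leaf) for value in sample)
        )
    if isinstance(sample, Sequence) and not isinstance(sample, (str, bytes)):
        # Tuples and lists both become lists, as in the collate code above.
        return [empty_like_structure(value, make_empty_leaf) for value in sample]
    # Anything else is treated as a leaf (tensor or primitive).
    return make_empty_leaf(sample)


def empty_leaf(leaf):
    # Hypothetical stand-in: a real version might return a zero-length tensor,
    # e.g. torch.zeros((0, *leaf.shape), dtype=leaf.dtype) for a tensor leaf.
    return []


Point = namedtuple("Point", ["x", "y"])

# A nested record shaped like (Tensor, (int, int)), with plain numbers as leaves.
nested = empty_like_structure((0.5, (3, 4)), empty_leaf)
# A dict record that contains a namedtuple.
mapped = empty_like_structure({"x": 0.5, "y": Point(1, 2)}, empty_leaf)
```

Passing the leaf handler as an argument keeps the structural recursion separate from the question of how an empty tensor or primitive should be represented, which is the part that depends on Opacus internals.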

Relevant code pointer:

opacus/opacus/data_loader.py

Line 31 in 7393ae4

def wrap_collate_with_empty(


ffuuugor mentioned this issue Oct 31, 2022

Support empty batches #530

Closed
