
Add with_rank to Dataset.from_generator #7213

@muthissar

Feature request

Add with_rank to Dataset.from_generator similar to Dataset.map and Dataset.filter.

Motivation

As with Dataset.map and Dataset.filter, this is useful when creating cache files on multiple GPUs, where the rank can be used to select the GPU ID. For now, the rank can be passed through the gen_kwargs argument; however, the rank is then included when the fingerprint is computed.
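To make the fingerprint concern concrete, here is a minimal sketch in plain Python. It does not use the datasets library: `toy_fingerprint` is a hypothetical stand-in that hashes the keyword arguments, and `run_with_rank` is a hypothetical helper showing the requested behaviour, where the rank reaches the generator without entering the hash.

```python
import hashlib
import json


def toy_fingerprint(gen_kwargs):
    """Hypothetical stand-in for a cache fingerprint: a hash of the kwargs.

    This is NOT how the datasets library computes fingerprints; it only
    illustrates that anything placed in gen_kwargs influences the hash.
    """
    payload = json.dumps(gen_kwargs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def gen(shards, rank=0):
    """Example generator that uses the rank to pick a device name."""
    for shard in shards:
        yield {"shard": shard, "device": f"cuda:{rank}"}


# Current workaround: the rank lives inside gen_kwargs, so it changes the hash.
fp_rank0 = toy_fingerprint({"shards": ["a", "b"], "rank": 0})
fp_rank1 = toy_fingerprint({"shards": ["a", "b"], "rank": 1})


def run_with_rank(generator, gen_kwargs, rank):
    """Hypothetical helper: hash only gen_kwargs, pass the rank separately."""
    fingerprint = toy_fingerprint(gen_kwargs)
    rows = list(generator(**gen_kwargs, rank=rank))
    return fingerprint, rows


fp_a, rows_a = run_with_rank(gen, {"shards": ["a", "b"]}, rank=0)
fp_b, rows_b = run_with_rank(gen, {"shards": ["a", "b"]}, rank=1)
```

In this sketch `fp_rank0` and `fp_rank1` differ, while `fp_a` and `fp_b` are identical even though the generated rows point at different devices.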

Your contribution

I opened #7199, which passes the rank based on the job_id set by num_proc.
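The idea of deriving the rank from the job index can be sketched as follows. This is an illustration only, not the code in #7199: the helper names `split_shards` and `run_jobs` are hypothetical, the round-robin split is an assumption, and the jobs run sequentially here rather than in worker processes.

```python
def split_shards(shards, num_proc):
    """Round-robin split of the shard list into num_proc job lists (assumed scheme)."""
    return [shards[job_id::num_proc] for job_id in range(num_proc)]


def gen(shards, rank):
    """Example generator that records which rank produced each row."""
    for shard in shards:
        yield {"shard": shard, "rank": rank}


def run_jobs(generator, shards, num_proc):
    """Run each job in turn, passing its job index as the rank."""
    rows = []
    for job_id, job_shards in enumerate(split_shards(shards, num_proc)):
        # In this sketch the rank is simply the job index.
        rows.extend(generator(shards=job_shards, rank=job_id))
    return rows


rows = run_jobs(gen, ["a", "b", "c", "d"], num_proc=2)
```

Each row carries the index of the job that generated it, which a real generator could use to select a GPU.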

Labels: enhancement (New feature or request)
