
Extend Dry Run Mode to Cover Trainer Initialization in TorchTitan #2044

@fegin


The recent pull request #2012 introduced a dry run mode for TorchTitan. However, the current implementation covers only the configuration system. Other components, such as `Trainer.__init__()`, are not exercised in dry run mode.

To make dry run mode more useful, we could leverage the fake process group (PG) mode. With a fake PG, dry run mode can be extended to cover the entire `Trainer.__init__()` process, allowing the full initialization path to be validated and tested without setting up the real distributed environment the job would normally require.
