Multi-node multi-GPU training issue #197

@Xu-Chen

During multi-node multi-GPU training, the jsonl log file cannot be found.

    if state.is_local_process_zero and self.training_bar is not None:
        jsonl_path = os.path.join(args.output_dir, 'logging.jsonl')
        with open(jsonl_path, 'a', encoding='utf-8') as f:
            f.write(json.dumps(logs) + '\n')

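This raises a "file does not exist" error. I think it happens because, in multi-node multi-GPU training, the output directory is never created on the worker nodes.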
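
A possible workaround, sketched under assumptions: `state.is_local_process_zero` is true for the first process on every node, so each worker node also tries to append to the file, and the open fails if `args.output_dir` only exists on the main node. The helper below (`append_jsonl` is a hypothetical name, not part of the project) creates the directory before appending. It is a minimal sketch, not the project's actual fix.

```python
import json
import os


def append_jsonl(output_dir, logs, filename='logging.jsonl'):
    """Append one JSON record to output_dir/filename.

    Hypothetical helper: creates output_dir first so the append does not
    fail on a node where the directory was never created.
    """
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    jsonl_path = os.path.join(output_dir, filename)
    with open(jsonl_path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(logs) + '\n')
    return jsonl_path
```

Inside the callback this would replace the `open(...)` block with `append_jsonl(args.output_dir, logs)`. Another option, if one log file on the main node is enough, would be to guard with `state.is_world_process_zero` instead, so that only the global rank 0 process writes.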
