
Commit cc897ec

Lzhang-hub and loadams authored
resolve KeyError: 'PDSH_SSH_ARGS_APPEND' (#5318)
Starting a job with `deepspeed --hostfile hostfile --master_addr $MASTER_IP --ssh_port 20023 src/train_bash.py` fails with KeyError: 'PDSH_SSH_ARGS_APPEND' at https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/launcher/multinode_runner.py#L77, because PDSH_SSH_ARGS_APPEND is not set in the environment and the launcher appends to it with `+=` without checking that the key exists.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
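The failure described above can be reproduced outside DeepSpeed. The following is a minimal sketch, assuming the `environment` argument behaves like a plain `dict`; `append_port_unsafe` is a hypothetical stand-in for the pre-fix line, not a DeepSpeed function.

```python
# Stand-in for the environment mapping passed to get_cmd (assumed to be a plain dict).
environment = {"PDSH_RCMD_TYPE": "ssh"}
ssh_port = 20023


def append_port_unsafe(env, port):
    # Mirrors the pre-fix statement: `+=` reads the key before writing it,
    # so a missing key raises KeyError instead of creating the entry.
    env["PDSH_SSH_ARGS_APPEND"] += f" -p {port}"


try:
    append_port_unsafe(environment, ssh_port)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('PDSH_SSH_ARGS_APPEND')
```

Exporting `PDSH_SSH_ARGS_APPEND` (even as an empty string) before launching would presumably avoid the error on an unpatched version, since the key would then exist when `+=` reads it.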
1 parent b5e2045 commit cc897ec

1 file changed

+2
-1
lines changed


deepspeed/launcher/multinode_runner.py

Lines changed: 2 additions & 1 deletion
@@ -74,7 +74,8 @@ def name(self):
     def get_cmd(self, environment, active_resources):
         environment['PDSH_RCMD_TYPE'] = 'ssh'
         if self.args.ssh_port is not None: # only specify ssh port if it is specified
-            environment["PDSH_SSH_ARGS_APPEND"] += f" -p {self.args.ssh_port}"
+            environment["PDSH_SSH_ARGS_APPEND"] = f"{environment.get('PDSH_SSH_ARGS_APPEND', '')} \
+            -p {self.args.ssh_port}"
 
         active_workers = ",".join(active_resources.keys())
         logger.info("Running on the following workers: %s" % active_workers)
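The pattern used by the fix, reading the key with a default before writing it back, can be sketched as a standalone function. This is an illustration under stated assumptions rather than the code in the commit: `append_ssh_port` is a hypothetical helper name, the environment is assumed to be a plain `dict`, and the `-o StrictHostKeyChecking=no` value is only sample data.

```python
def append_ssh_port(env, port):
    """Hypothetical helper: add ' -p <port>' to PDSH_SSH_ARGS_APPEND,
    creating the key when it is not already present."""
    # dict.get with a default avoids the KeyError raised by `+=` on a missing key.
    existing = env.get("PDSH_SSH_ARGS_APPEND", "")
    # Build the value on one line and strip it, so no stray whitespace is stored.
    env["PDSH_SSH_ARGS_APPEND"] = f"{existing} -p {port}".strip()
    return env


fresh = append_ssh_port({}, 20023)
preset = append_ssh_port({"PDSH_SSH_ARGS_APPEND": "-o StrictHostKeyChecking=no"}, 20023)

print(fresh["PDSH_SSH_ARGS_APPEND"])   # -p 20023
print(preset["PDSH_SSH_ARGS_APPEND"])  # -o StrictHostKeyChecking=no -p 20023
```

One difference from the committed line is worth noting: a backslash continuation inside a string literal keeps any indentation of the continued line as part of the string, so the committed form may store extra spaces before `-p`. That is likely harmless for space-separated ssh arguments, but building the string on a single line sidesteps the question.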
