Which is kind of fine, but falls down in a few cases (see the sketch after this list):
- When there are no parameter servers (i.e. a single node), the ps hosts should be an empty array, but in this case the value is just an empty string.
- The host variables for master and worker are comma-separated strings, but the TensorFlow code parses TF_CONFIG as JSON, so they would ideally be array types inside this string.
- The 'task.type' property can be 'master', 'worker' or 'ps', but there doesn't seem to be a corresponding environment variable, so I had to pass the option via command-line arguments.
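For reference, a well-formed value addressing all three points might look like the sketch below. This follows the documented TF_CONFIG layout, but the host addresses and ports are made-up placeholders:

```python
import json
import os

# Hypothetical well-formed TF_CONFIG: host lists are JSON arrays
# (empty for ps when there are no parameter servers), and the task
# type/index sit alongside the cluster spec, so no extra command-line
# arguments are needed. Addresses are placeholders.
tf_config = {
    "cluster": {
        "master": ["10.0.0.4:2222"],
        "worker": ["10.0.0.5:2222", "10.0.0.6:2222"],
        "ps": [],  # an empty array, not an empty string, for single-node runs
    },
    "task": {"type": "worker", "index": 0},
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)
```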
More generally though, providing this configuration via a TF_CONFIG environment variable would significantly lower the bar to getting distributed training working with TensorFlow and Azure Batch. It would also simplify the command-line arguments: only the appropriate data directories would need to be passed, the same arguments could be used across master, worker and ps, and the tensorflowSettings property could potentially be simplified further.
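To illustrate the simplification, a training script could then recover the whole cluster topology from the environment rather than from per-role command-line arguments. A minimal sketch, assuming the TF_CONFIG layout above and TensorFlow's 1.x distributed API:

```python
import json
import os

import tensorflow as tf

# Derive the cluster spec and this task's role from TF_CONFIG, so the
# same launch command works unchanged for master, worker and ps tasks.
tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
cluster = tf.train.ClusterSpec(tf_config.get("cluster", {}))
task = tf_config.get("task", {})

server = tf.train.Server(
    cluster,
    job_name=task.get("type", "worker"),
    task_index=task.get("index", 0),
)

if task.get("type") == "ps":
    # Parameter servers only serve variables; they block here.
    server.join()
```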