Hi, Vega does not support Horovod in the NAS phase.
In the NAS phase, dask.distributed is used to distribute different networks to different nodes so that the search runs in parallel.
The fully train phase does support Horovod. To use it, set distributed to True, delete the models_folder parameter, and set the model_desc_file parameter so that fully train runs on one specific network. See the following:
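The three changes listed above can be sketched as a small Python helper that edits a step configuration after it has been loaded into a dict. This is only an illustration: the function name, the example paths, and the placement of the keys under `trainer` and `model` are assumptions, not taken from the Vega documentation, so check them against your own YAML file.

```python
import copy


def enable_horovod_fully_train(step_cfg, model_desc_file):
    """Return a copy of a fully train step config with the three changes applied.

    Hypothetical helper: the nesting under "trainer" and "model" is assumed.
    """
    cfg = copy.deepcopy(step_cfg)  # leave the caller's dict untouched
    # 1. Turn on distributed training.
    cfg.setdefault("trainer", {})["distributed"] = True
    model = cfg.setdefault("model", {})
    # 2. Drop models_folder so that no folder of NAS outputs is used.
    model.pop("models_folder", None)
    # 3. Point at the description file of the single network to train.
    model["model_desc_file"] = model_desc_file
    return cfg


# Example input with placeholder values (not real Vega defaults).
example = {
    "trainer": {"type": "Trainer", "epochs": 10},
    "model": {"models_folder": "/path/to/nas/output"},
}
updated = enable_horovod_fully_train(example, "/path/to/model_desc.json")
print(updated)
```

The helper works on a copy, so the original configuration can still be used for a non-distributed run.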
When I try to solve this problem for the esr_ea algorithm in this way, I get the error: "/root/.local/lib/python3.6/site-packages/vega/core/pipeline/horovod/run_cluster_horovod_train.sh: No such file or directory". Could you tell me how to deal with it?
I ran the example code like this:
python run_pipeline.py classification/classify.yml
with one new config item, trainer: distributed=True.
An error about Horovod occurs:
ValueError: Horovod has not been initialized; use hvd.init().
Is Horovod not supported in this example?