Training stopped working (probably because of a change Microsoft applied) #87
More information: if my assumption is correct, then as a temporary fix it should be possible to use a DeepSpeed version earlier than v0.9.0. Does anyone know how to force vall-e to install a DeepSpeed dependency earlier than v0.9.0? Note: vall_e has a file called setup.py that contains the line "deepspeed>=0.7.7"; maybe we can try updating it...
Found this temporary fix: I edited "/notebooks/vall-e/setup.py" (don't get confused, there is another setup.py file at /notebooks/setup.py) and changed the line "deepspeed>=0.7.7" to "deepspeed>=0.7.7,<0.9.0". This forces the installation to use an older DeepSpeed version that doesn't contain the change that caused the problem. NOTE THAT THIS IS ONLY A TEMPORARY FIX. The right thing to do is to change vall-e's code to suit the updated DeepSpeed code. I hope someone is up to this challenge!
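To make the effect of the "deepspeed>=0.7.7,<0.9.0" constraint concrete, here is a minimal sketch of which version strings it admits. The helper names `parse_version` and `satisfies_pin` are made up for illustration, and the parser only handles plain numeric versions such as "0.8.3"; pip's real resolver follows the full PEP 440 rules, which this does not attempt to reproduce.

```python
def parse_version(text, width=3):
    """Turn '0.8.3' into (0, 8, 3), padding short versions with zeros.

    Illustrative only: pre-release or local version suffixes are not handled.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


LOWER = parse_version("0.7.7")  # inclusive lower bound from setup.py
UPPER = parse_version("0.9.0")  # exclusive upper bound added by the fix


def satisfies_pin(version_text):
    """Return True if the version falls inside >=0.7.7,<0.9.0."""
    return LOWER <= parse_version(version_text) < UPPER


if __name__ == "__main__":
    for candidate in ["0.7.6", "0.7.7", "0.8.3", "0.9.0", "0.9.1"]:
        print(candidate, satisfies_pin(candidate))
```

The same specifier can also be passed to pip directly, for example `pip install "deepspeed>=0.7.7,<0.9.0"`, if editing setup.py is not convenient.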
Whenever I run training (e.g. python -m vall_e.train yaml=config/libri/ar.yml) I get the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/notebooks/vall-e/vall_e/train.py", line 128, in <module>
    main()
  File "/notebooks/vall-e/vall_e/train.py", line 119, in main
    trainer.train(
  File "/notebooks/vall-e/vall_e/utils/trainer.py", line 125, in train
    engines = engines_loader()
  File "/notebooks/vall-e/vall_e/train.py", line 21, in load_engines
    model=trainer.Engine(
  File "/notebooks/vall-e/vall_e/utils/engines.py", line 22, in __init__
    super().__init__(None, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 264, in __init__
    self._do_sanity_check()
  File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 988, in _do_sanity_check
    if self.optimizer_name() is not None:
  File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 648, in optimizer_name
    return (self.client_optimizer.__class__.__name__ if self.client_optimizer else self._config.optimizer_name)
AttributeError: 'NoneType' object has no attribute 'optimizer_name'
At first glance it seems that a change was applied to Microsoft's DeepSpeed code: when the DeepSpeed engine is initialized, it looks for a config object that contains the attribute optimizer_name.
vall_e uses DeepSpeed and initializes it as part of the class 'Engine' in utils/engines.py, but it does not pass the required config parameter.
I suspect this is the DeepSpeed commit that caused the problem: microsoft/DeepSpeed@47f9f13
Can anyone help?
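For anyone trying to follow the failure, the last frames of the traceback can be imitated with a small stand-in class. This is only a sketch: `ToyEngine` and `try_build` are hypothetical names, and the class is not DeepSpeed's real engine. It just copies the `optimizer_name` expression shown in the traceback and assumes, as suspected above, that the config ends up as None when no config is passed.

```python
from types import SimpleNamespace


class ToyEngine:
    """Hypothetical stand-in for the engine base class, not DeepSpeed's code."""

    def __init__(self, config=None, optimizer=None):
        self._config = config              # stays None if the caller passes no config
        self.client_optimizer = optimizer  # stays None if no optimizer is supplied
        self._do_sanity_check()            # runs during construction, as in the traceback

    def optimizer_name(self):
        # Same expression as the last frame of the traceback.
        return (self.client_optimizer.__class__.__name__
                if self.client_optimizer else self._config.optimizer_name)

    def _do_sanity_check(self):
        if self.optimizer_name() is not None:
            pass  # the real check is omitted; only the attribute access matters here


def try_build(**kwargs):
    """Build a ToyEngine and report either 'ok' or the AttributeError text."""
    try:
        ToyEngine(**kwargs)
        return "ok"
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_build())                                              # no config, no optimizer
    print(try_build(config=SimpleNamespace(optimizer_name=None)))  # config supplied
```

In this toy version the error disappears as soon as either a config object with an `optimizer_name` attribute or an optimizer is supplied, which matches the suspicion that vall_e's Engine needs to pass a config to the newer DeepSpeed. Whether that alone is enough for the real library is not verified here.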