[BUG]: type object 'ChunkManager' has no attribute 'search_chunk_size' #2166
Comments
Ah, I checked it: under colossalai/gemini/chunk/manager, type object 'ChunkManager' really has no attribute 'search_chunk_size'. I want to know where it went.
Sorry for the bug. It is caused by a version mismatch between your colossalai and lightning installations.
Thank you for that, man, I'll try it later. @feifeibear
Hello, did this solution work for you?
@Alfred-Duncan Is your problem solved?
@feifeibear I have the same problem. I wanted to fine-tune the Stable Diffusion 2.0 model, so, following the steps, I upgraded colossalai to 0.1.12 and installed lightning from source, but it still fails with "type object 'ChunkManager' has no attribute 'search_chunk_size'". Can I fine-tune the SD 2.0 model now?
Yeah, that fixed this problem, but afterwards another problem occurred.
@Alfred-Duncan Could you please open a separate issue for the other problem?
Hey, the bug occurs when your colossalai is older than v0.1.10. I guess you did not install 0.1.12 correctly. Can you
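Here is a minimal sketch of a pre-flight version check. It assumes, per the comment above, that `ChunkManager.search_chunk_size` first appears in colossalai v0.1.10. It reads the installed version with the standard library's `importlib.metadata` and compares only the numeric release segments. The helpers `parse_release`, `colossalai_is_new_enough` and `check_installed_colossalai` are hypothetical and are not part of colossalai or Lightning.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: v0.1.10 is the first colossalai release that provides
# ChunkManager.search_chunk_size (taken from the maintainer's comment).
MIN_COLOSSALAI = (0, 1, 10)


def parse_release(ver: str) -> tuple:
    """Turn '0.1.12' or '0.1.12+torch1.12cu11.3' into (0, 1, 12).

    Only the leading numeric release segments are kept; a local '+...' tag
    and pre-release suffixes such as 'rc1' are ignored, so this is a rough
    comparison rather than a full PEP 440 parser.
    """
    release = ver.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '10rc1': keep 10, then stop
            break
    return tuple(parts)


def colossalai_is_new_enough(installed: str, minimum: tuple = MIN_COLOSSALAI) -> bool:
    # Tuples compare element by element, so (0, 1, 12) >= (0, 1, 10).
    return parse_release(installed) >= minimum


def check_installed_colossalai() -> None:
    """Exit with a readable message if colossalai is missing or too old."""
    try:
        installed = version("colossalai")
    except PackageNotFoundError:
        raise SystemExit("colossalai is not installed") from None
    if not colossalai_is_new_enough(installed):
        raise SystemExit(
            f"colossalai {installed} is older than "
            f"{'.'.join(map(str, MIN_COLOSSALAI))}; "
            "ChunkManager.search_chunk_size will be missing"
        )
    print(f"colossalai {installed} looks new enough")
```

Calling `check_installed_colossalai()` at the top of `main.py` would stop early with a readable message, instead of raising the `AttributeError` deep inside `trainer.fit`. The plain tuple comparison avoids a dependency on the `packaging` library. If that library is available, `packaging.version.Version` handles pre-releases more precisely.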
I already posted that, bro; check the issues after this one. Training nearly succeeds now.
@Alfred-Duncan Thanks, I've already seen the issue. The relevant team members will be back this afternoon, and we will try to reproduce your bug ASAP.
@FrankieDong
@Thomas2419
I found the problem: I reinstalled lightning from source and it's OK now, but another problem occurred, which I am working on now.
@FrankieDong Nice! I've closed the issue.
🐛 Describe the bug
When I train the diffusion model, the following happens:
Setting up LambdaLR scheduler...
Traceback (most recent call last):
File "/home/tongange/ColossalAI/examples/images/diffusion/main.py", line 804, in <module>
trainer.fit(model, data)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in fit
call._call_and_handle_interrupt(
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1038, in _run
self.strategy.setup(self)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 333, in setup
self.setup_precision_plugin()
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 270, in setup_precision_plugin
chunk_size = self.chunk_size or ChunkManager.search_chunk_size(
AttributeError: type object 'ChunkManager' has no attribute 'search_chunk_size'
Setting up LambdaLR scheduler...
/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py:437: UserWarning: Error handling mechanism for deadlock detection is uninitialized. Skipping check.
rank_zero_warn("Error handling mechanism for deadlock detection is uninitialized. Skipping check.")
Summoning checkpoint.
Traceback (most recent call last):
File "/home/tongange/ColossalAI/examples/images/diffusion/main.py", line 804, in <module>
trainer.fit(model, data)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in fit
call._call_and_handle_interrupt(
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
return function(*args, **kwargs)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1038, in _run
self.strategy.setup(self)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 333, in setup
self.setup_precision_plugin()
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 270, in setup_precision_plugin
chunk_size = self.chunk_size or ChunkManager.search_chunk_size(
AttributeError: type object 'ChunkManager' has no attribute 'search_chunk_size'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tongange/ColossalAI/examples/images/diffusion/main.py", line 806, in <module>
melk()
File "/home/tongange/ColossalAI/examples/images/diffusion/main.py", line 789, in melk
trainer.save_checkpoint(ckpt_path)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1900, in save_checkpoint
self._checkpoint_connector.save_checkpoint(filepath, weights_only=weights_only, storage_options=storage_options)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 512, in save_checkpoint
_checkpoint = self.dump_checkpoint(weights_only)
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 444, in dump_checkpoint
"state_dict": self._get_lightning_module_state_dict(),
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 526, in _get_lightning_module_state_dict
state_dict = self.trainer.strategy.lightning_module_state_dict()
File "/root/anaconda3/envs/ldm/lib/python3.9/site-packages/pytorch_lightning/strategies/colossalai.py", line 383, in lightning_module_state_dict
assert isinstance(self.model, ZeroDDP)
AssertionError
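In both tracebacks, the real failure is the `AttributeError` raised from Lightning's ColossalAI strategy at `ChunkManager.search_chunk_size(`. The trailing `AssertionError` looks like a follow-on error. After the crash, `melk()` tries to save a checkpoint, and `lightning_module_state_dict` asserts that the model is a `ZeroDDP`. That wrapping presumably never happened, because `setup` had already failed.

Below is a minimal sketch of a capability check that tests for the attribute directly, rather than relying on version numbers. The helper `class_has_attr` is hypothetical. The module path `colossalai.gemini.chunk.manager` is an assumption, taken from the directory named in the first comment.

```python
import importlib


def class_has_attr(module_path: str, cls_name: str, attr: str) -> bool:
    """Return True if module_path.cls_name exists and defines attr.

    A failed import or a missing class is reported as False instead of
    raising, so this can run as a pre-flight check before trainer.fit().
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    cls = getattr(module, cls_name, None)
    return cls is not None and hasattr(cls, attr)
```

For example, suppose `class_has_attr("colossalai.gemini.chunk.manager", "ChunkManager", "search_chunk_size")` returns `False` before `trainer.fit(model, data)`. That would point straight at the colossalai/lightning mismatch, rather than surfacing as the two chained errors above.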
Environment
I trained using the method below, following all the steps exactly:
https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion