Regarding the issue of setup(model, optimizer) in fabric FSDP #3

Closed
Williamwsk opened this issue Oct 8, 2023 · 2 comments

Comments

@Williamwsk

I followed your code, and my environment configuration matches yours, but fabric.setup(model, optimizer) raises an error. Your code is written the same way. Have you ever encountered this problem? Looking forward to your reply!

The source code is as follows:

    import torch
    import torchvision as tv
    import lightning as L


    def main():
        fabric = L.Fabric(accelerator='cuda', precision='bf16-mixed', devices=torch.cuda.device_count(), strategy='fsdp')
        fabric.launch()

        model = tv.models.resnet18()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
        model, optimizer = fabric.setup(model, optimizer)
        # model = fabric.setup_module(model)
        # optimizer = fabric.setup_optimizers(optimizer)
        dataset = tv.datasets.CIFAR10("data", download=True, transform=tv.transforms.ToTensor())
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
        dataloader = fabric.setup_dataloaders(dataloader)


    if __name__ == '__main__':
        main()

The error is reported as follows:

    Traceback (most recent call last):
      File "/project/Code/Skin/aa.py", line 23, in <module>
        main()
      File "/project/Code/Skin/aa.py", line 14, in main
        model, optimizer = fabric.setup(model, optimizer)
      File "/opt/conda/envs/wsk-py310/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 198, in setup
        self._validate_setup(module, optimizers)
      File "/opt/conda/envs/wsk-py310/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 856, in _validate_setup
        raise RuntimeError(
    RuntimeError: The Fabric requires the model and optimizer(s) to be set up separately. Create and set up the model first through model = self.setup_model(model). Then create the optimizer and set it up: optimizer = self.setup_optimizer(optimizer).

@rasbt
Owner

rasbt commented Oct 8, 2023

Thanks for the note and sorry about the hassle. I think that's because I had the current dev version installed, which has not been released yet. The new version allows you to combine the two lines

    model = fabric.setup_module(model)
    optimizer = fabric.setup_optimizers(optimizer)

into a single line

    model, optimizer = fabric.setup(model, optimizer)

But for the latest stable release, the 2 separate lines are necessary. I updated the code accordingly.

@rasbt rasbt closed this as completed Oct 8, 2023
@Williamwsk
Author

Williamwsk commented Oct 9, 2023

> Thanks for the note and sorry about the hassle. I think that's because I had the current dev version installed, which has not been released yet. The new version allows you to combine the two lines
>
>     model = fabric.setup_module(model)
>     optimizer = fabric.setup_optimizers(optimizer)
>
> into a single line
>
>     model, optimizer = fabric.setup(model, optimizer)
>
> But for the latest stable release, the 2 separate lines are necessary. I updated the code accordingly.

I tried the modification method you mentioned, but the optimizer still reported an error:

File "/project/Code/Skin/aa.py", line 21, in <module>
    main()
  File "/project/Code/Skin/aa.py", line 15, in main
    optimizer = fabric.setup_optimizers(optimizer)
  File "/opt/conda/envs/wsk-py310/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 289, in setup_optimizers
    optimizers = [self._strategy.setup_optimizer(optimizer) for optimizer in optimizers]
  File "/opt/conda/envs/wsk-py310/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 289, in <listcomp>
    optimizers = [self._strategy.setup_optimizer(optimizer) for optimizer in optimizers]
  File "/opt/conda/envs/wsk-py310/lib/python3.10/site-packages/lightning/fabric/strategies/fsdp.py", line 213, in setup_optimizer
    raise ValueError(
ValueError: The optimizer does not seem to reference any FSDP parameters. HINT: Make sure to create the optimizer after setting up the model.
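
The HINT in this error points at ordering: the optimizer in the script above is built from `model.parameters()` before `fabric.setup_module(model)` wraps the model, so it still references the pre-FSDP parameters. A minimal sketch of the order the hint describes follows. The helper name `setup_in_order` and its `make_optimizer` callback are hypothetical, introduced only for illustration; the only Fabric calls used are `setup_module` and `setup_optimizers`, both of which appear earlier in this thread.

```python
def setup_in_order(fabric, model, make_optimizer):
    """Wrap the model first, then build the optimizer from the wrapped model.

    `make_optimizer` is a hypothetical callback that takes an iterable of
    parameters and returns an optimizer.
    """
    # 1. Let the strategy (here FSDP) wrap the model.
    model = fabric.setup_module(model)
    # 2. Only now create the optimizer, so it references the wrapped parameters.
    optimizer = make_optimizer(model.parameters())
    # 3. Hand the optimizer to Fabric.
    optimizer = fabric.setup_optimizers(optimizer)
    return model, optimizer


# Possible usage inside main(), after fabric.launch().
# Left as comments because it needs torch, torchvision, lightning, and GPUs:
#
# model, optimizer = setup_in_order(
#     fabric,
#     tv.models.resnet18(),
#     lambda params: torch.optim.SGD(params, lr=0.001),
# )
```

Written out inline, this is the same as moving the `torch.optim.SGD(...)` line so that it sits between the `fabric.setup_module(model)` and `fabric.setup_optimizers(optimizer)` calls.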
