
Conversation

@zhentaocc (Contributor) commented Oct 21, 2021

Resolved 1st task in #3171.

  • Add a base class to convert a PyTorch model to PyTorch Lightning.
  • Add Trainer.compile for users to call (see the sketch below).
  • Add a unit test.
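For orientation, a minimal usage sketch of the API being added here (the bigdl.nano.pytorch import path and the exact fit signature are assumptions based on this thread):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from bigdl.nano.pytorch import Trainer  # import path assumed

model = nn.Linear(10, 1)                    # any plain PyTorch model
loss = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Trainer.compile wraps the torch model into a pl.LightningModule.
pl_model = Trainer.compile(model, loss, optimizer)

data = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
                  batch_size=16)
Trainer(max_epochs=1).fit(pl_model, data)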

):
    r"""
    Create an instance from torch.utils.data.Dataset.
    Overrides pl.LightningDataModule.from_datasets for CPU usage, setting pin_memory to False by default.
That's strange, I wonder why.
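For context, a sketch of what this override might look like (the class name and the defaulted pin_memory argument are assumptions; the signature mirrors pl.LightningDataModule.from_datasets, which hardcodes pin_memory=True):

import pytorch_lightning as pl
from torch.utils.data import DataLoader


class DataModuleFromDataset(pl.LightningDataModule):
    # Sketch only: names and structure are assumptions, not the PR's code.
    @classmethod
    def from_datasets(cls, train_dataset=None, val_dataset=None, test_dataset=None,
                      batch_size=1, num_workers=0, pin_memory=False):
        dm = cls()
        if train_dataset is not None:
            dm.train_dataloader = lambda: DataLoader(
                train_dataset, batch_size=batch_size, shuffle=True,
                num_workers=num_workers, pin_memory=pin_memory)
        if val_dataset is not None:
            dm.val_dataloader = lambda: DataLoader(
                val_dataset, batch_size=batch_size,
                num_workers=num_workers, pin_memory=pin_memory)
        if test_dataset is not None:
            dm.test_dataloader = lambda: DataLoader(
                test_dataset, batch_size=batch_size,
                num_workers=num_workers, pin_memory=pin_memory)
        return dm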

from torch.utils.data import DataLoader, Dataset, IterableDataset


class LightningModuleWrapper(pl.LightningModule):

We have planned to move LightningModuleWrapper into nano.



class LightningModuleWrapper(pl.LightningModule):
    def __init__(self, model_creator, configs: dict):

I think it would be better if we just accept three creators:

  1. model creator
  2. loss creator
  3. optim creator

plus a config (a hypothetical sketch follows this list).
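A hypothetical sketch of that three-creator design (all names are illustrative, not the PR's code; the creator signatures follow the test excerpt further below):

import pytorch_lightning as pl


class LightningModuleWrapper(pl.LightningModule):
    # Sketch of the suggested design: build everything from creators + config.
    def __init__(self, model_creator, loss_creator, optim_creator, config: dict):
        super().__init__()
        self.model = model_creator(config)
        self.loss = loss_creator(config)
        self._optim_creator = optim_creator
        self.config = config

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return self.loss(self(x), y)

    def configure_optimizers(self):
        return self._optim_creator(self.model, self.config)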

@zhentaocc (Contributor Author):

  • Refactored the pytorch lightning wrapper as a decorator on nn.Module.
  • Moved the decorator to nano.

@zhentaocc (Contributor Author):

Resolved 1st task in #3171.

  • Added a base class to wrap a PyTorch model as PyTorch Lightning.

  • Added a base class to create a PyTorch Lightning DataModule from a dataset, with an entry point to modify pin_memory. The default value for pin_memory is False instead of True.

  • Added a unit test for the Lightning wrapper usage on Vanilla_LSTM_pytorch:

    1. Enabled forward() to handle extra arguments
    2. Tested on both the nano and PL trainers

@zhentaocc zhentaocc closed this Oct 27, 2021
@zhentaocc zhentaocc reopened this Oct 27, 2021
@zhentaocc zhentaocc changed the title from "Add Pytorch-Lightning Wrapper" to "Add pytorch-lightning decorator to nano" Oct 27, 2021
@TheaperDeng TheaperDeng linked an issue Oct 27, 2021 that may be closed by this pull request
return getattr(torch.optim, config.get("optim", "Adam"))(model.parameters(), lr=config.get("lr", 0.001))


@lightning_support.lightning_module(loss_creator, optimizer_creator, config)

this is too complex for the user; I think we can simply do something like:

class MyModel(torch.nn.Module):
   ...
model = MyModel(...)
loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)
trainer.fit(model, loss, opt, train_data, val_data)


If a user (let's say Chronos) has an nn.Module and wants both the training optimizations in our bigdl.nano trainer and onnxruntime inference (which is supported on a PL LightningModule), they can't use this API design to do both.

> If a user (let's say Chronos) has an nn.Module and wants both the training optimizations in our bigdl.nano trainer and onnxruntime inference (which is supported on a PL LightningModule), they can't use this API design to do both.

You may be able to do something like:

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...

model = MyModel(...)
trainer.fit(model, train_data, val_data)

But how do you plan to add onnxruntime support in this case?


I will do it like this:

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@onnxruntime()
@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...

model = MyModel(...)
trainer.fit(model, train_data, val_data)

as I stated in #3272


Sounds good to me

@jason-dai (Contributor) commented Oct 29, 2021:

Sure. I think we may make this PR focus only on the decorator support, just to make it easier to review and clearer. And we will raise another PR for the trainer API's change.

Which one do you think will have better user experience?

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@onnxruntime()
@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...
model = MyModel(...)

or

class MyModel(torch.nn.Module):
   ...
model = MyModel(...)

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)
nano_model = trainer.compile(model, loss, opt, onnx=True)

@TheaperDeng (Contributor) commented Nov 1, 2021:

It seems to me that trainer.compile has a better user experience, since you cannot create an optimizer instance before the model instance is built, which leads to some config dict in the decorator case.

Still, the method involves an abnormal usage (i.e., trainer.compile). We need to give a thorough and detailed user guide and in-code warnings.

btw, we don't need to implement the onnx=True parameter in this PR since I have not merged the onnx PR. I will do it after this PR has been merged.

> Still, the method involves an abnormal usage (i.e., trainer.compile). We need to give a thorough and detailed user guide and in-code warnings.

What do you mean by abnormal usage? I think the point is:

  1. If the user simply has a PyTorch model, he or she can directly use it in nano.pytorch.trainer methods (if the default behavior of fit, test, predict, etc., works for him or her).

  2. If the user needs more complex behavior (e.g., onnxruntime support), he or she needs to explicitly convert it to a pl_module, and we can provide an API based on either a decorator or a compile method.

@TheaperDeng (Contributor) commented Nov 1, 2021:

> What do you mean by abnormal usage?

I think we can just ask all users with an nn.Module to do this:

model = Net()
model_pl = trainer.compile(model, loss, opt, onnx=bool)
# then use model_pl (a pl.LightningModule) to do anything else (e.g. `fit`, `test`, `predict` with the trainer)

"Abnormal usage" means that trainer.compile is not an easy name for users to come up with directly; we need examples, quickstarts and user guides to guide them.


It does not have to be trainer.compile; maybe something similar to ray.distributed, such as nano.pl_module

@zhentaocc zhentaocc requested a review from TheaperDeng October 28, 2021 07:41
@zhentaocc (Contributor Author):

Modifications:

  • Pass loss and optim directly
  • Moved the file under the pytorch folder
  • Renamed files, functions and classes
  • Added docstrings

@jason-dai (Contributor) commented Oct 29, 2021:

I don't think we should implement multiple interfaces for the same use case - it just confuses the users; instead, just provide one single interface for one use case.

@zhentaocc (Contributor Author) commented Nov 1, 2021:

> I don't think we should implement multiple interfaces for the same use case - it just confuses the users; instead, just provide one single interface for one use case.

Sure, we need to decide which usage is more common and friendly to users. At the same time, onnxruntime support should be aligned if we choose to wrap a torch model instance instead of a torch class.
On the other hand, I implemented another function which can act either as a decorator or as a usual function on a torch model instance:

@to_lightning(loss, torch.optim.Adam, lr=0.01)
class Net(nn.Module):
    pass
pl_model = Net()

or

class Net(nn.Module):
    pass
pl_model = to_lightning(loss, torch.optim.Adam, lr=0.01)(Net())

I suppose we may have more extensions coming for the PyTorch Lightning model. Maybe we can wrap all these extensions into one, so the user doesn't have to decide which function to use:

def composed(*decs):
    def deco(f):
        for dec in reversed(decs):
            f = dec(f)
        return f
    return deco

def preprocess(loss=None, optimizer=None, config=None, onnx=True):
    return composed(
        onnxruntime(onnx),
        to_lightning(loss, optimizer, **config)
    )
pl_model = preprocess(loss, torch.optim.Adam, {"lr": 0.01}, onnx=True)(model)

We can also integrate this function into trainer.compile().
What are your thoughts on this: provide a unified interface, or just choose one of them?
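As a quick sanity check that composed applies decorators in the usual bottom-up order, here is a self-contained toy example (tag is a stand-in decorator factory, not the real onnxruntime/to_lightning):

def composed(*decs):
    def deco(f):
        for dec in reversed(decs):
            f = dec(f)
        return f
    return deco

def tag(label):
    # Toy decorator factory: records the order in which decorators wrap the class.
    def dec(cls):
        cls.tags = getattr(cls, "tags", []) + [label]
        return cls
    return dec

@composed(tag("outer"), tag("inner"))
class Net:
    pass

print(Net.tags)  # ['inner', 'outer'] -- inner applied first, same as stacking @outer over @inner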

@jason-dai (Contributor):

`pl_model = to_lightning(loss, torch.optim.Adam, lr=0.01(Net())` - is this correct? I think we want to have something similar to ray.remote, but support the cases below:

  • Class definition

    class Net(torch.nn.Module):
        ...
    model = Net()
  • Model instance

    model = torchvision.models.resnet18()
  • Model creator function

    def model_creator():
       model = torchvision.models.resnet18()
       return model

Need a consistent API for all these use cases (see the sketch below).
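One way a single entry point might dispatch over those three cases (a sketch under assumed names; the actual wrapping step is elided):

import inspect

import torch.nn as nn


def pl_module(target, loss=None, optimizer=None):
    """Sketch of one entry point for a class, an instance, or a creator
    function; everything here is an assumption, not the final API."""
    if isinstance(target, nn.Module):                            # model instance
        model = target
    elif inspect.isclass(target) and issubclass(target, nn.Module):
        model = target()                                         # class definition (no-arg init assumed)
    elif callable(target):
        model = target()                                         # model creator function
    else:
        raise TypeError(f"cannot convert {type(target)} to a LightningModule")
    # A real implementation would wrap model/loss/optimizer into a
    # pl.LightningModule here; this sketch only shows the dispatch.
    return model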

@zhentaocc (Contributor Author) commented Nov 2, 2021:

Having discussed with @TheaperDeng @yangw1234, we currently have 3 solutions:

  1. trainer.compile(model, loss, optimizer) returns a lightning module.
  2. trainer.compile(model, loss, optimizer) returns a lightning module, with an extra flag to track whether trainer.compile has been called, so we force the user to use this function before fit().
  3. Bind the model to the trainer, convert the pytorch model to lightning when the trainer is initialized, and no need to pass the model to fit().

We prefer option 2, making the conversion a fixed step and giving proper warnings and errors to inform users of the correct usage (see the sketch below). Finally the user needs to do:
create model --> create loss, optim --> pl_model = compile(model, loss, optim) --> fit(pl_model, ...)
This covers all 3 use cases mentioned above.

The name of trainer.compile(model, loss, optimizer) is still to be decided so that it causes no confusion for users.

@jason-dai What are your thoughts?
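A sketch of how option 2's flag and type check might look (the _compiled flag and the fit override are assumptions; LightningModuleFromTorch is the wrapper this PR later introduces):

import pytorch_lightning as pl
from torch import nn


class Trainer(pl.Trainer):
    # Sketch of option 2: fit() refuses a plain nn.Module that has not
    # gone through compile() first.
    @staticmethod
    def compile(model, loss=None, optimizer=None):
        if isinstance(model, pl.LightningModule):
            return model                      # existing PL code passes through
        # LightningModuleFromTorch is the wrapper added by this PR (sketched further below).
        pl_model = LightningModuleFromTorch(model, loss, optimizer)
        pl_model._compiled = True             # assumed flag, checked in fit()
        return pl_model

    def fit(self, model, *args, **kwargs):
        if not isinstance(model, pl.LightningModule):
            raise TypeError("Got a plain nn.Module; call Trainer.compile("
                            "model, loss, optimizer) before fit().")
        return super().fit(model, *args, **kwargs)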

@jason-dai (Contributor):

> We prefer option 2, making the conversion a fixed step and giving proper warnings and errors to inform users of the correct usage.

I think we only need compile for a PyTorch model; otherwise the user needs to change his or her PL code.

@zhentaocc (Contributor Author):

> I think we only need compile for a PyTorch model; otherwise the user needs to change his or her PL code.

The main reason we want to make compile a necessary step is that, considering other extensions on Lightning (e.g. onnxruntime support; we will probably have more extensions to add later), integrating them all into compile relieves users from calling many different interfaces. Even if the user has already created a lightning model, compile can simply do nothing and return it:

pl_model = trainer.compile(torchvision.models.resnet18(), loss, optimizer, ...)
pl_model = trainer.compile(LightningModule(...))  # does nothing and returns LightningModule(...)

Both of the above are legal, so users don't have to change their lightning code. This way, they can be guided to use compile whenever they need any extra extension in nano, as well as to convert to a pl model.

@jason-dai (Contributor) commented Nov 2, 2021:

> Even if the user has already created a lightning model, compile can simply do nothing and return it.

It's OK to use compile to add nano-specific extensions; however, the user should not be required to call compile if he or she just has standard PL code and wants to use nano for transparent acceleration.

@zhentaocc (Contributor Author):

As discussed above, made the following modifications:

  1. Refactored to_lightning into the class LightningModuleFromTorch (a minimal sketch follows)
  2. Added a unit test for LightningModuleFromTorch
  3. Made Trainer.compile(...) a static method, since self is not used
  4. Added a unit test for Trainer.compile(...)
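For reference, a minimal sketch of what LightningModuleFromTorch might look like (the real class also overrides load_state_dict and handles extra forward() arguments, per the commit log below):

import pytorch_lightning as pl
from torch import nn


class LightningModuleFromTorch(pl.LightningModule):
    # Minimal sketch; details beyond the stored model/loss/optimizer are assumptions.
    def __init__(self, model: nn.Module, loss=None, optimizer=None):
        super().__init__()
        self.model = model
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, *args):
        return self.model(*args)

    def training_step(self, batch, batch_idx):
        *x, y = batch                       # allow extra forward() arguments
        return self.loss(self(*x), y)

    def configure_optimizers(self):
        return self.optimizer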

@staticmethod
def compile(model: nn.Module, loss: _Loss = None, optimizer: torch.optim = None):
    """
    Compile a pytorch model into a pytorch-lightning model and return it.
@jason-dai (Contributor) commented Nov 3, 2021:

the comment is incorrect if we also support LightningModule below

@zhentaocc (Contributor Author):

Please review the modified docstring. Any further suggestions on how to properly describe this function?
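The modified docstring itself isn't captured in this excerpt; a version consistent with the discussion might read:

@staticmethod
def compile(model: nn.Module, loss: _Loss = None, optimizer: torch.optim.Optimizer = None):
    """
    Construct a pytorch-lightning model.

    If `model` is already a pl.LightningModule, it is returned unchanged;
    otherwise it is wrapped into a LightningModuleFromTorch with the given
    loss and optimizer.
    """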

@zhentaocc (Contributor Author):

Passed tests. It's ready now.


@yangw1234 yangw1234 merged commit f92cf4d into intel:branch-2.0 Nov 11, 2021
dding3 pushed a commit to dding3/BigDL that referenced this pull request Nov 17, 2021
* added decorator to create a pytorch lightning model from torch

* added unit test for pytorch lightning decorator

* refactoring - renaming, adding hints and docstring

* moved lightning extension to nano/pytorch

* remove loss, optim creator and directly pass loss and optimizer to initiate

* added another implementation for pytorch to lightning

* use LightningModuleFromTorch to create lightning module from pytorch

* remove temporary change

* remove redundant part

* added trainer.compile to convert pytorch to pytorch-lightning

* added unit test for trainer.compile

* fixed return when input is pl model

* added type hint for LightningModuleFromTorch.copy

* Renamed copy as _copy

* Modified comment of compile

* added input checking

* refactored docstring

* Reformat docstring

* Tiny changes

* reformat

* correct the import

* type check and

* assign model as a member variable

* override load_state_dict

* fix test_trainer_compile

* fix test_lightning

* try lightning module and then self.model

* rename _forward as forward

* type check

* optimize imports

Co-authored-by: Yang Wang <yang3.wang@intel.com>


Development

Successfully merging this pull request may close these issues.

Nano: basic pytorch-lightning "wrapper" design

4 participants