
Conversation

@zhentaocc (Contributor) commented Oct 21, 2021

Resolved 1st task in #3171.

  • Add a base class to convert a PyTorch model to PyTorch Lightning.
  • Add Trainer.compile for users to call (see the sketch below).
  • Add a unit test.
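For orientation, a minimal usage sketch of the API being added here (the bigdl.nano.pytorch import path and the exact fit signature are assumptions based on this thread):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from bigdl.nano.pytorch import Trainer  # import path assumed

model = nn.Linear(10, 1)                    # any plain PyTorch model
loss = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Trainer.compile wraps the torch model into a pl.LightningModule.
pl_model = Trainer.compile(model, loss, optimizer)

data = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
                  batch_size=16)
Trainer(max_epochs=1).fit(pl_model, data)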

):
    r"""
    Create an instance from torch.utils.data.Dataset.
    Overrides pl.LightningDataModule.from_datasets for CPU usage, setting pin_memory to False by default.
That's strange, I wonder why.
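For context, a sketch of what this override might look like (the class name and the defaulted pin_memory argument are assumptions; the signature mirrors pl.LightningDataModule.from_datasets, which hardcodes pin_memory=True):

import pytorch_lightning as pl
from torch.utils.data import DataLoader


class DataModuleFromDataset(pl.LightningDataModule):
    # Sketch only: names and structure are assumptions, not the PR's code.
    @classmethod
    def from_datasets(cls, train_dataset=None, val_dataset=None, test_dataset=None,
                      batch_size=1, num_workers=0, pin_memory=False):
        dm = cls()
        if train_dataset is not None:
            dm.train_dataloader = lambda: DataLoader(
                train_dataset, batch_size=batch_size, shuffle=True,
                num_workers=num_workers, pin_memory=pin_memory)
        if val_dataset is not None:
            dm.val_dataloader = lambda: DataLoader(
                val_dataset, batch_size=batch_size,
                num_workers=num_workers, pin_memory=pin_memory)
        if test_dataset is not None:
            dm.test_dataloader = lambda: DataLoader(
                test_dataset, batch_size=batch_size,
                num_workers=num_workers, pin_memory=pin_memory)
        return dm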

from torch.utils.data import DataLoader, Dataset, IterableDataset


class LightningModuleWrapper(pl.LightningModule):

We have planned to move LightningModuleWrapper into nano.



class LightningModuleWrapper(pl.LightningModule):
    def __init__(self, model_creator, configs: dict):

I think it would be better if we just accept three creators:

  1. model creator
  2. loss creator
  3. optim creator

plus a config (a hypothetical sketch follows this list).
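A hypothetical sketch of that three-creator design (all names are illustrative, not the PR's code; the creator signatures follow the test excerpt further below):

import pytorch_lightning as pl


class LightningModuleWrapper(pl.LightningModule):
    # Sketch of the suggested design: build everything from creators + config.
    def __init__(self, model_creator, loss_creator, optim_creator, config: dict):
        super().__init__()
        self.model = model_creator(config)
        self.loss = loss_creator(config)
        self._optim_creator = optim_creator
        self.config = config

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return self.loss(self(x), y)

    def configure_optimizers(self):
        return self._optim_creator(self.model, self.config)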

@zhentaocc (Contributor Author):

  • Refactored the pytorch lightning wrapper as a decorator on nn.Module.
  • Moved the decorator to nano.

@zhentaocc (Contributor Author):

Resolved 1st task in #3171.

  • Added a base class to wrap a PyTorch model as PyTorch Lightning.

  • Added a base class to create a PyTorch Lightning DataModule from a dataset, with an entry point to modify pin_memory. The default value for pin_memory is False instead of True.

  • Added a unit test for the Lightning wrapper usage on Vanilla_LSTM_pytorch:

    1. Enabled forward() to handle extra arguments
    2. Tested on both the nano and PL trainers

@zhentaocc zhentaocc closed this Oct 27, 2021
@zhentaocc zhentaocc reopened this Oct 27, 2021
@zhentaocc zhentaocc changed the title from "Add Pytorch-Lightning Wrapper" to "Add pytorch-lightning decorator to nano" Oct 27, 2021
@TheaperDeng TheaperDeng linked an issue Oct 27, 2021 that may be closed by this pull request
return getattr(torch.optim, config.get("optim", "Adam"))(model.parameters(), lr=config.get("lr", 0.001))


@lightning_support.lightning_module(loss_creator, optimizer_creator, config)

this is too complex for the user; I think we can simply do something like:

class MyModel(torch.nn.Module):
   ...
model = MyModel(...)
loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)
trainer.fit(model, loss, opt, train_data, val_data)


If a user (let's say Chronos) has an nn.Module and wants both the training optimizations in our bigdl.nano trainer and onnxruntime inference (which is supported on a PL LightningModule), they can't use this API design to do both.

> If a user (let's say Chronos) has an nn.Module and wants both the training optimizations in our bigdl.nano trainer and onnxruntime inference (which is supported on a PL LightningModule), they can't use this API design to do both.

You may be able to do something like:

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...

model = MyModel(...)
trainer.fit(model, train_data, val_data)

But how do you plan to add onnxruntime support in this case?


I will do it like this:

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@onnxruntime()
@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...

model = MyModel(...)
trainer.fit(model, train_data, val_data)

as I stated in #3272


Sounds good to me

@jason-dai (Contributor) commented Oct 29, 2021:

Sure. I think we may make this PR focus only on the decorator support, just to make it easier to review and clearer. And we will raise another PR for the trainer API's change.

Which one do you think will have better user experience?

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)

@onnxruntime()
@pl_module(loss, opt)
class MyModel(torch.nn.Module):
   ...
model = MyModel(...)

or

class MyModel(torch.nn.Module):
   ...
model = MyModel(...)

loss = nn.CrossEntropyLoss()
opt = optim.Adam(...)
nano_model = trainer.compile(model, loss, opt, onnx=True)

@TheaperDeng (Contributor) commented Nov 1, 2021:

It seems to me that trainer.compile has a better user experience, since you cannot create an optimizer instance before the model instance is built, which leads to some config dict in the decorator case.

Still, the method involves an abnormal usage (i.e., trainer.compile). We need to give a thorough and detailed user guide and in-code warnings.

btw, we don't need to implement the onnx=True parameter in this PR since I have not merged the onnx PR. I will do it after this PR has been merged.

> Still, the method involves an abnormal usage (i.e., trainer.compile). We need to give a thorough and detailed user guide and in-code warnings.

What do you mean by abnormal usage? I think the point is:

  1. If the user simply has a PyTorch model, he or she can directly use it in nano.pytorch.trainer methods (if the default behavior of fit, test, predict, etc., works for him or her).

  2. If the user needs more complex behavior (e.g., onnxruntime support), he or she needs to explicitly convert it to a pl_module, and we can provide an API based on either a decorator or a compile method.

@TheaperDeng (Contributor) commented Nov 1, 2021:

> What do you mean by abnormal usage?

I think we can just ask all users with an nn.Module to do this:

model = Net()
model_pl = trainer.compile(model, loss, opt, onnx=bool)
# then use model_pl (a pl.LightningModule) to do anything else (e.g. `fit`, `test`, `predict` with the trainer)

"Abnormal usage" means that trainer.compile is not an easy name for users to come up with directly; we need examples, quickstarts and user guides to guide them.


It does not have to be trainer.compile; maybe something similar to ray.distributed, such as nano.pl_module

@zhentaocc zhentaocc requested a review from TheaperDeng October 28, 2021 07:41
@zhentaocc (Contributor Author):

Modifications:

  • Pass loss and optim directly
  • Moved the file under the pytorch folder
  • Renamed files, functions and classes
  • Added docstrings

@jason-dai (Contributor) commented Oct 29, 2021:

I don't think we should implement multiple interfaces for the same use case - it just confuses the users; instead, just provide one single interface for one use case.

@zhentaocc (Contributor Author) commented Nov 1, 2021:

> I don't think we should implement multiple interfaces for the same use case - it just confuses the users; instead, just provide one single interface for one use case.

Sure, we need to decide which usage is more common and friendly to users. At the same time, onnxruntime support should be aligned if we choose to wrap a torch model instance instead of a torch class.
On the other hand, I implemented another function which can act either as a decorator or as a usual function on a torch model instance:

@to_lightning(loss, torch.optim.Adam, lr=0.01)
class Net(nn.Module):
    pass
pl_model = Net()

or

class Net(nn.Module):
    pass
pl_model = to_lightning(loss, torch.optim.Adam, lr=0.01)(Net())

I suppose we may have more extensions coming for the PyTorch Lightning model. Maybe we can wrap all these extensions into one, so the user doesn't have to decide which function to use:

def composed(*decs):
    def deco(f):
        for dec in reversed(decs):
            f = dec(f)
        return f
    return deco

def preprocess(loss=None, optimizer=None, config=None, onnx=True):
    return composed(
        onnxruntime(onnx),
        to_lightning(loss, optimizer, **config)
    )
pl_model = preprocess(loss, torch.optim.Adam, {"lr": 0.01}, onnx=True)(model)

We can also integrate this function into trainer.compile().
What are your thoughts on this: provide a unified interface, or just choose one of them?
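As a quick sanity check that composed applies decorators in the usual bottom-up order, here is a self-contained toy example (tag is a stand-in decorator factory, not the real onnxruntime/to_lightning):

def composed(*decs):
    def deco(f):
        for dec in reversed(decs):
            f = dec(f)
        return f
    return deco

def tag(label):
    # Toy decorator factory: records the order in which decorators wrap the class.
    def dec(cls):
        cls.tags = getattr(cls, "tags", []) + [label]
        return cls
    return dec

@composed(tag("outer"), tag("inner"))
class Net:
    pass

print(Net.tags)  # ['inner', 'outer'] -- inner applied first, same as stacking @outer over @inner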

@jason-dai (Contributor):

`pl_model = to_lightning(loss, torch.optim.Adam, lr=0.01(Net())` - is this correct? I think we want to have something similar to ray.remote, but support the cases below:

  • Class definition

    class Net(torch.nn.Module):
        ...
    model = Net()
  • Model instance

    model = torchvision.models.resnet18()
  • Model creator function

    def model_creator():
       model = torchvision.models.resnet18()
       return model

Need a consistent API for all these use cases (see the sketch below).
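One way a single entry point might dispatch over those three cases (a sketch under assumed names; the actual wrapping step is elided):

import inspect

import torch.nn as nn


def pl_module(target, loss=None, optimizer=None):
    """Sketch of one entry point for a class, an instance, or a creator
    function; everything here is an assumption, not the final API."""
    if isinstance(target, nn.Module):                            # model instance
        model = target
    elif inspect.isclass(target) and issubclass(target, nn.Module):
        model = target()                                         # class definition (no-arg init assumed)
    elif callable(target):
        model = target()                                         # model creator function
    else:
        raise TypeError(f"cannot convert {type(target)} to a LightningModule")
    # A real implementation would wrap model/loss/optimizer into a
    # pl.LightningModule here; this sketch only shows the dispatch.
    return model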

@zhentaocc (Contributor Author) commented Nov 2, 2021:

Having discussed with @TheaperDeng @yangw1234, we currently have 3 solutions:

  1. trainer.compile(model, loss, optimizer) returns a lightning module.
  2. trainer.compile(model, loss, optimizer) returns a lightning module, with an extra flag to track whether trainer.compile has been called, so we force the user to use this function before fit().
  3. Bind the model to the trainer, convert the pytorch model to lightning when the trainer is initialized, and no need to pass the model to fit().

We prefer option 2, making the conversion a fixed step and giving proper warnings and errors to inform users of the correct usage (see the sketch below). Finally the user needs to do:
create model --> create loss, optim --> pl_model = compile(model, loss, optim) --> fit(pl_model, ...)
This covers all 3 use cases mentioned above.

The name of trainer.compile(model, loss, optimizer) is still to be decided so that it causes no confusion for users.

@jason-dai What are your thoughts?
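A sketch of how option 2's flag and type check might look (the _compiled flag and the fit override are assumptions; LightningModuleFromTorch is the wrapper this PR later introduces):

import pytorch_lightning as pl
from torch import nn


class Trainer(pl.Trainer):
    # Sketch of option 2: fit() refuses a plain nn.Module that has not
    # gone through compile() first.
    @staticmethod
    def compile(model, loss=None, optimizer=None):
        if isinstance(model, pl.LightningModule):
            return model                      # existing PL code passes through
        # LightningModuleFromTorch is the wrapper added by this PR (sketched further below).
        pl_model = LightningModuleFromTorch(model, loss, optimizer)
        pl_model._compiled = True             # assumed flag, checked in fit()
        return pl_model

    def fit(self, model, *args, **kwargs):
        if not isinstance(model, pl.LightningModule):
            raise TypeError("Got a plain nn.Module; call Trainer.compile("
                            "model, loss, optimizer) before fit().")
        return super().fit(model, *args, **kwargs)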

@jason-dai (Contributor):

> We prefer option 2, making the conversion a fixed step and giving proper warnings and errors to inform users of the correct usage.

I think we only need compile for a PyTorch model; otherwise the user needs to change his or her PL code.

@zhentaocc (Contributor Author):

> I think we only need compile for a PyTorch model; otherwise the user needs to change his or her PL code.

The main reason we want to make compile a necessary step is that, considering other extensions on Lightning (e.g. onnxruntime support; we will probably have more extensions to add later), integrating them all into compile relieves users from calling many different interfaces. Even if the user has already created a lightning model, compile can simply do nothing and return it:

pl_model = trainer.compile(torchvision.models.resnet18(), loss, optimizer, ...)
pl_model = trainer.compile(LightningModule(...))  # does nothing and returns LightningModule(...)

Both of the above are legal, so users don't have to change their lightning code. This way, they can be guided to use compile whenever they need any extra extension in nano, as well as to convert to a pl model.

@jason-dai (Contributor) commented Nov 2, 2021:

> Even if the user has already created a lightning model, compile can simply do nothing and return it.

It's OK to use compile to add nano-specific extensions; however, the user should not be required to call compile if he or she just has standard PL code and wants to use nano for transparent acceleration.

@zhentaocc (Contributor Author):

As discussed above, made the following modifications:

  1. Refactored to_lightning into the class LightningModuleFromTorch (a minimal sketch follows)
  2. Added a unit test for LightningModuleFromTorch
  3. Made Trainer.compile(...) a static method, since self is not used
  4. Added a unit test for Trainer.compile(...)
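For reference, a minimal sketch of what LightningModuleFromTorch might look like (the real class also overrides load_state_dict and handles extra forward() arguments, per the commit log below):

import pytorch_lightning as pl
from torch import nn


class LightningModuleFromTorch(pl.LightningModule):
    # Minimal sketch; details beyond the stored model/loss/optimizer are assumptions.
    def __init__(self, model: nn.Module, loss=None, optimizer=None):
        super().__init__()
        self.model = model
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, *args):
        return self.model(*args)

    def training_step(self, batch, batch_idx):
        *x, y = batch                       # allow extra forward() arguments
        return self.loss(self(*x), y)

    def configure_optimizers(self):
        return self.optimizer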

@staticmethod
def compile(model: nn.Module, loss: _Loss = None, optimizer: torch.optim = None):
    """
    Compile a pytorch model into a pytorch-lightning model and return it.
@jason-dai (Contributor) commented Nov 3, 2021:

the comment is incorrect if we also support LightningModule below

@zhentaocc (Contributor Author):

Please review the modified docstring. Any further suggestions on how to properly describe this function?
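The modified docstring itself isn't captured in this excerpt; a version consistent with the discussion might read:

@staticmethod
def compile(model: nn.Module, loss: _Loss = None, optimizer: torch.optim.Optimizer = None):
    """
    Construct a pytorch-lightning model.

    If `model` is already a pl.LightningModule, it is returned unchanged;
    otherwise it is wrapped into a LightningModuleFromTorch with the given
    loss and optimizer.
    """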

@zhentaocc (Contributor Author):

Passed tests. It's ready now.


@yangw1234 yangw1234 merged commit f92cf4d into intel:branch-2.0 Nov 11, 2021
dding3 pushed a commit to dding3/BigDL that referenced this pull request Nov 17, 2021
* added decorator to create a pytorch lightning model from torch

* added unit test for pytorch lightning decorator

* refactoring - renaming, adding hints and docstring

* moved lightning extension to nano/pytorch

* remove loss, optim creator and directly pass loss and optimizer to initiate

* added another implementation for pytorch to lightning

* use LightningModuleFromTorch to create lightning module from pytorch

* remove temporary change

* remove redundant part

* added trainer.compile to convert pytorch to pytorch-lightning

* added unit test for trainer.compile

* fixed return when input is pl model

* added type hint for LightningModuleFromTorch.copy

* Renamed copy as _copy

* Modified comment of compile

* added input checking

* refactored docstring

* Reformat docstring

* Tiny changes

* reformat

* correct the import

* type check and

* assign model as a member variable

* override load_state_dict

* fix test_trainer_compile

* fix test_lightning

* try lightning module and then self.model

* rename _forward as forward

* type check

* optimize imports

Co-authored-by: Yang Wang <yang3.wang@intel.com>


Development

Successfully merging this pull request may close these issues.

Nano: basic pytorch-lightning "wrapper" design

4 participants