
[WIP][Relax][Op] Register gradient for nll_loss and Conv2d #114

Closed

Conversation

Ubospica
Contributor

No description provided.

@Ubospica Ubospica changed the title Register gradient for nll_loss [WIP][Relax][Op] Register gradient for nll_loss Jan 27, 2023
@Ubospica Ubospica changed the title [WIP][Relax][Op] Register gradient for nll_loss [WIP][Relax][Op] Register gradient for nll_loss and Conv2d Jan 28, 2023
@Ubospica Ubospica force-pushed the mlc-dev/2023-01-26-conv2d_nllloss_gradient branch 2 times, most recently from c987f75 to 66ef91a Compare January 29, 2023 17:15
sunggg and others added 23 commits February 5, 2023 15:14
* [WIP] Basic task extraction mechanism is implemented.

* [WIP] For gradual integration with Relay pipeline, meta_schedule/integration.py is created for relax to avoid potential conflict.

* support tir tuning and injection mode

* Add target field for Relax Extracted Task

* 1. Create relax namespace/tvm objects/... for metaschedule to preserve relay support. 2. Promote target field from Optional<Target> to Target

* Support ApplyHistoryBest

* Reflect feedback from Yuchen

* minor improvement and fix linter issue

* add ASF header

* Reorganize file structure

* fix lint errors

* remove the import-outside-toplevel

* Reflect comments

* remove redundant comment

* As per discussion w/ Yuchen, ApplyHistoryBest is introduced as a Relax transformation pass.

* remove redundant print msg

* fix lint

* reflect comments
* Enable tests.

* Updated.

* Updated.

* Updated.
…er (mlc-ai#76)

* [CI] Set up CI; format and lint relax code to pass CI (mlc-ai#72)

* init

* fix lint

* update task_lint

* more lint

* more lint

* lint

* jenkinsfile

* jenkinsfile

* run relax only tests

* python3.7 for pytest

* point to personal ci-cpu docker

* docker pull

* test

* fix cmake config

* update

* update

* rebase

* rebase

* AutoTIR integration (mlc-ai#58)

* [WIP] Basic task extraction mechanism is implemented.

* [WIP] For gradual integration with Relay pipeline, meta_schedule/integration.py is created for relax to avoid potential conflict.

* support tir tuning and injection mode

* Add target field for Relax Extracted Task

* 1. Create relax namespace/tvm objects/... for metaschedule to preserve relay support. 2. Promote target field from Optional<Target> to Target

* Support ApplyHistoryBest

* Reflect feedback from Yuchen

* minor improvement and fix linter issue

* add ASF header

* Reorganize file structure

* fix lint errors

* remove the import-outside-toplevel

* Reflect comments

* remove redundant comment

* As per discussion w/ Yuchen, ApplyHistoryBest is introduced as a Relax transformation pass.

* remove redundant print msg

* fix lint

* reflect comments

* Yuchen's change

* relax ConstantNode in parser and printer

* Add constant data in the metasection

* rebase

* Support ir_module(metadata=json_str)

* update test case

* remove print info

* Update tests

* clang-format

* pylint

* fix ci

* Save a copy of metadata in RelaxTransformer

* Fix comments

* fix comments

Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu>
Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com>
Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
* Clean up task extraction

* black
* Change call_tir convention and fix shape/type deduction.

* test

* output shape as 3rd arg.

* address comments.

* lint
Enhance VM Executable as a Subclass of runtime::Module
* [VM] Refactor and improve vm.

- Have a separate function for RunInstCall.
- Cache func_index lookups in a table to avoid repeated lookups by string.
- Move the PackedFunc call arg stack into Frame to increase locality and avoid re-allocation in repeated calls.
- Make the frame stack hold unique_ptrs to avoid frame re-allocation and copying during frame.resize.
- Pass curr_frame as an argument into sub-functions to make it explicit.

* address review comments
* improve Printer for DynTensorType & ShapeExpr

* add testcases
* Add is_device field to attr.

* Update.

* Address comment.

* update.

* Update.
* Fix call_tir parsing bug.

* update.
* fix structural_equal_hash

(cherry picked from commit e7e962634999739a32129378f61cc95f58335447)

* address comment & pass the ci
The pattern field of the match shape can define variables;
as a result, we need to add DefEqual and Hash here.

Added a regression testcase.

Lesson: we would benefit from more testcases
with check_save_roundtrip checks (like this one) for more relax examples.

Additional change:
- Redirected the TVMScript printer to be able to print relax fragments, useful for debugging.
* Add gpu ci.

* Update autotir gpu test.
SiriusNEO and others added 16 commits February 8, 2023 09:41
TOPI has an implementation of `collapse_sum` internally
(`tvm/topi/reduction.h`), but it is not exposed to FFI and cannot be
called from the Python side. This patch exposes it and adds some related
tests.
Besides, the legalizer can now legalize `collapse_sum_like/to`. However, due to
the TOPI implementation, it cannot handle the symbolic case.
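A minimal sketch of what the exposure enables, assuming the Python-side name is `tvm.topi.collapse_sum(data, target_shape)` (name and signature inferred from the description above, not verified against this patch):

```python
# Sum a (3, 4) tensor down to the target shape (4,), i.e. collapse the broadcast axis.
import numpy as np
import tvm
from tvm import te, topi

data = te.placeholder((3, 4), name="data", dtype="float32")
out = topi.collapse_sum(data, (4,))  # assumed Python name for topi::collapse_sum

s = te.create_schedule(out.op)
f = tvm.build(s, [data, out], target="llvm")

a = tvm.nd.array(np.random.rand(3, 4).astype("float32"))
b = tvm.nd.empty((4,), dtype="float32")
f(a, b)
np.testing.assert_allclose(b.numpy(), a.numpy().sum(axis=0), rtol=1e-5)
```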
This PR migrates mlc-ai#46 to the new struct
info infra, as part of our AD migration.

Because we need to do numerical testing for gradients, this PR depends on
the operator legalizer mlc-ai#96. Also,
because the original version of the legalizer did not handle the negative
indexing case of `relax.mean`, this PR fixes it.

To lower `collapse_sum_to` and `collapse_sum_like` properly, this PR
migrates a previous patch mlc-ai#43 which
introduces `collapse_sum` in TOPI. Now we can remove the skip marker in
the legalizer test for `collapse_sum_to` and `collapse_sum_like`.

The gradients of `cross_entropy` and `softmax_cross_entropy` are
removed. The former will be added back and adjusted to the new
`cross_entropy` introduced in mlc-ai#96.

Further plan in this PR:
- [x] Add gradients for `log_softmax` and `nll_loss` once
mlc-ai#94 is merged.
- [x] Gradients for some tuple related operators such as `split` and
`concat`. It can help us to test the correctness of AD when there are
Tuple-I/O operators.
- (Not in this PR) "Undefined Gradient" representation. As we know, the
gradients of some operators w.r.t. certain inputs are undefined or
meaningless, such as the partial gradient of `indices` in `take(x,
indices)`. Relay directly uses `zeros_like` in this case, as it won't
affect gradient propagation. Another choice is to introduce a dummy Expr
named `UndefinedGradient` to represent it. How should we handle this case in
Relax?
Create a separate yaml file as in mlc-ai#104 for CI on MLC relax to ease
future sync with tlc-pack/relax.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This is the PR following mlc-ai#55, after the source branch moved to a personal repo.

This PR is based on mlc-ai#98.

This PR adds the new automatic differentiation API:
- `Gradient(func: GlobalVar, require_grads: Optional[Union[Var,
List[Var]]] = None) -> tvm.ir.transform.Pass`
- transforms the given function in the IRModule and adds a new function
that calculates the gradient with regard to the function's output

Currently, Gradient only supports differentiating a function in the IRModule
with a single dataflow block, with respect to the function's only return
value, which must be a scalar.
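A minimal usage sketch (the toy module is illustrative, and the pass location `relax.transform.Gradient` is assumed from the API description above):

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3, 3), "float32")) -> R.Tensor((), "float32"):
        with R.dataflow():
            out = R.sum(x)   # scalar output, single dataflow block
            R.output(out)
        return out

# The API takes the GlobalVar of the function to differentiate.
mod = relax.transform.Gradient(Module.get_global_var("main"))(Module)
# `mod` now also contains a generated function computing d(main)/d(x);
# its exact name and return structure are implementation details of the pass.
print(mod.script())
```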

This PR adds two unit test files:
- `tests/python/relax/test_transform_gradient.py` only contains
`assert_structural_equal` assertions.
- `tests/python/relax/test_transform_gradient_numeric.py` contains
numeric checks, including manually derived gradients and the numerical
differentiation method `check_numerical_grads`.

Checkpoints:
- [x] Refactor to use CopyWithNewParams and ExprFunctor
- [x] Check that int64/int32 tensors are not differentiated (currently only
checked for params)
- [x] Rebase & migrate to StructInfo
- [x] Refactor about Tuple
- [x] Refactor about NestedMsg
- [x] Support ops taking in tuple or returning tuple
- [x] Eliminating collapse_sum_to (done in mlc-ai#98)

Future:
- (Not in this PR) Handle undefined gradient in add and return value
	- Now we handle them as zeros

Co-authored-by: SiriusNEO <1713833595@qq.com>
Implements the layout conversion pass.
This PR implements the library dispatcher for Relax, which currently uses CUTLASS as one library.
It introduces the TIR-level pattern registration and matching algorithm.
It introduces a Relax pass to split out subgraphs that match the patterns of backends.
This PR includes:
- Introduce `R.abs` and its legalization (for L1Loss; see the sketch after this list)
- Register most of the unary operators in
[DataAPI](https://data-apis.org/array-api/draft/API_specification/elementwise_functions.html)
(without legalization)
- Split unary arithmetic operators and check operators (e.g. `isnan`)
- Refactor `test_tvmscript_parser`, `test_op_unary` and `test_op_binary`
using `tvm.testing.parameters()`.
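A minimal sketch of `R.abs` as the building block of an L1-style loss, assuming the legalizer is exposed as `relax.transform.LegalizeOps`:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class L1:
    @R.function
    def main(x: R.Tensor((2, 3), "float32"), y: R.Tensor((2, 3), "float32")):
        with R.dataflow():
            diff = R.subtract(x, y)
            abs_diff = R.abs(diff)      # the new unary op
            loss = R.sum(abs_diff)      # sum(|x - y|), as in L1Loss
            R.output(loss)
        return loss

# Lower R.abs (and the other high-level ops) to TIR PrimFuncs.
lowered = relax.transform.LegalizeOps()(L1)
```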
Implements the Relax importer from PyTorch, using torch FX.

An example use of the importer is:

```python
# Import the importer.
from tvm.relax.frontend import from_pytorch

# Define the module
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=10, out_features=7, bias=True)

    def forward(self, input):
        return self.linear(input)

# Instantiate the model and create the input info dict.
torch_model = MyModule()
input_info = {"input_1": ((128, 10), "float32")}

# Use the importer to import the PyTorch model to Relax.
mod: tvm.IRModule = from_pytorch(torch_model, input_info)

# Print out the imported model.
# print(mod.script())
```

---------

Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
This PR introduces loss functions for relax training and provides a tool
`append_loss` which enables users to append a loss after a forward
function (a usage sketch follows the list below).

About the `append_loss`, some previous discussions can be found in
mlc-ai#111.

Currently supported:
- L1Loss
- MSELoss
- CrossEntropyLoss
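A hypothetical sketch of the intended flow; the module paths `tvm.relax.training.loss` / `tvm.relax.training.utils` and the exact call signatures are assumptions for illustration, not taken from this PR:

```python
import tvm
from tvm import relax
from tvm.script import relax as R
from tvm.relax.training.loss import MSELoss        # assumed module path
from tvm.relax.training.utils import append_loss   # assumed module path

@tvm.script.ir_module
class Backbone:
    @R.function
    def main(x: R.Tensor((4, 10), "float32"), w: R.Tensor((10, 10), "float32")):
        with R.dataflow():
            out = R.matmul(x, w)
            R.output(out)
        return out

pred_sinfo = relax.TensorStructInfo((4, 10), "float32")
# Assumed: calling the loss with prediction/target struct info yields a relax
# Function computing the loss, which append_loss splices after the backbone.
loss_func = MSELoss(reduction="sum")(pred_sinfo, pred_sinfo)
fwd_with_loss = append_loss(Backbone["main"], loss_func)
```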
…nvert and CUTLASS dispatch (mlc-ai#116)

Fixes bugs found when importing fp16 UNet & VAE through the importer,
legalization, AMP, layout conversion, and CUTLASS codegen.
…i#118)

This PR enables importing MobileNetV2 from PyTorch FX, which includes
the following changes:

- Add `relax.clip` Op, with PrimValue support (see the sketch after this list)
- Support `torch.clamp` and `torch.ReLU6` in PyTorch FX frontend
- Support `torch.nn.functional.adaptive_avg_pool2d` in PyTorch frontend
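A minimal sketch of the new op (the exact handling of the scalar bounds is an assumption):

```python
from tvm import relax

# The scalar bounds of relax.op.clip are carried as PrimValue arguments in the IR,
# which is what lets the FX frontend map torch.clamp / ReLU6 onto it.
x = relax.Var("x", relax.TensorStructInfo((1, 16, 112, 112), "float32"))
relu6 = relax.op.clip(x, 0.0, 6.0)  # bounds become R.prim_value(0.0) / R.prim_value(6.0)
```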
As discussed before, we won't introduce a new Linear Op at the IR level.
However, this PR adds a user interface on the Python side, which is
composed of transpose, matmul and a bias add.

Additionally, PyTorch supports a 1D weight Tensor for the Linear Op
(https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html).
So we cannot always use `permute_dims(weight, axes=[1, 0])` in the fx importer.
This PR fixes this issue as well.
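A minimal sketch of the composition described above; the helper name `linear` and the ndim check are illustrative, not the actual frontend code:

```python
from tvm import relax

def linear(x, weight, bias=None):
    # PyTorch also allows a 1-D weight, in which case no transpose is needed;
    # handling that case is exactly the fix described above.
    w = weight if weight.struct_info.ndim == 1 else relax.op.permute_dims(weight, axes=[1, 0])
    out = relax.op.matmul(x, w)
    return out if bias is None else relax.op.add(out, bias)

x = relax.Var("x", relax.TensorStructInfo((128, 10), "float32"))
w = relax.Var("w", relax.TensorStructInfo((7, 10), "float32"))
b = relax.Var("b", relax.TensorStructInfo((7,), "float32"))
y = linear(x, w, b)   # (128, 10) x (10, 7) + (7,) -> (128, 7)
```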
Update optimizer APIs.
- Remove `@property state` and `@state.setter`
- Add `init()` interface
- Remove `Optimizer.__call__()`
- Remove underscores before attributes, and unnecessary attributes

Current interfaces:
```python
class Optimizer:
    dtype: str
    name: str
    param_list: List[Var]
    state: tvm.runtime.container.ADT

    def __init__(self, name: str) -> None:
        self.name = name
        self.param_list = None
        self.state = None
        self.dtype = None

    def init(self, params: Union[Var, List[Var]]) -> "Optimizer":
        """Set the parameters, determine the dtype, and build the initial state for the optimizer."""
        pass

    def get_function(self) -> Function:
        """Use blockbuilder to build an optimizer function that executes updates of the parameters
        and the optimizer state."""
        pass
```

Usage examples:

See
<https://github.com/ACMClass-TVM-20/AD-Example/blob/dc255150dc6a4a6de2fffc2c093a8b2bacc1b030/optimizer_api_example.py>
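A hypothetical sketch of the updated flow; the module path `tvm.relax.training.optimizer` and the SGD constructor arguments are assumptions:

```python
from tvm import relax
from tvm.relax.training.optimizer import SGD   # assumed module path

x = relax.Var("x", relax.TensorStructInfo((3, 3), "float32"))

opt = SGD(lr=0.01).init(x)      # bind params, determine dtype, build initial state
opt_func = opt.get_function()   # relax Function updating the params and optimizer state
# opt.state holds the runtime ADT that is threaded through calls to opt_func.
```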

This PR also updates the Gradient API:
- Before: `def Gradient(global_var: GlobalVar, require_grads:
Optional[Union[Var, List[Var]]]) -> tvm.ir.transform.Pass`
- After: `def Gradient(func_name: str, require_grads:
Optional[Union[Var, List[Var]]]) -> tvm.ir.transform.Pass`

Unit tests are changed accordingly.
This is a prototype of `LiftTransformParams`. It allows compiling the
end-to-end model without weights provided. The idea is to annotate the
input parameters that are weights, identify and lift the
transformations applied to the weights, and compile them into a separate function
`transform_params` that can be executed at runtime. Users can run
`transform_params` with the weights to get the weights for the optimized
model as a prep step before deployment. In this way, we perform the
same optimizations and defer the weight transformations to the user
side, while the overhead of the deferred weight transformation can be
ignored as it only needs to be run once.

A demo notebook is available
[here](https://github.com/vinx13/relax/blob/fda857553557c34f97a4f7193a529da607a3421c/tests/python/relax/demo_lift_transform_params.ipynb)

This pass is not yet integrated with the default `vm.build`, as we are
going to iterate on it.
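A minimal sketch of the intended usage; the `num_input` function attribute and the name of the generated function are assumptions based on the description, not taken from this PR:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Mod:
    @R.function
    def main(x: R.Tensor((1, 4), "float32"), w: R.Tensor((4, 4), "float32")):
        with R.dataflow():
            wt = R.permute_dims(w, axes=[1, 0])   # depends only on the weight
            out = R.matmul(x, wt)
            R.output(out)
        return out

mod = Mod
# Annotate that only the first argument is a runtime input; the rest are weights
# (attribute name assumed for illustration).
mod["main"] = mod["main"].with_attr("num_input", 1)
mod = relax.transform.LiftTransformParams()(mod)
# `mod` now also contains a lifted `transform_params`-style function that users
# run once with the real weights to produce the pre-transformed weights.
```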
This PR adds support for torch dynamo
spectrometerHBH pushed a commit to spectrometerHBH/relax that referenced this pull request Feb 9, 2023
* add DataflowBlockPass

* update fma_rewrite

* drop the skip function

* update test_fma_rewrite with DataflowBlockPass

* fix the format

* fix name

* rewrite test in tvm script

* add non-dataflow Vars check

* add fail testcases

* module->IRModule

* add docstring to DataflowBlockNode

* remove unused pattern

* Transform Pass->DataflowBlock Pass

* rename global var to global scope var

* remove print stmt

* reformat tests

* add docstring to DataflowBlockMutator

* fix filename

* minor fix
SiriusNEO and others added 4 commits February 10, 2023 12:40
This PR brings a wrapper for relax training. The following things are
done internally in this trainer:
- Maintain (store/update) the parameters of the module.
- Merge backbone and specified loss function together.
- Build/Compile/Run the module.
- Build/Compile/Run the optimizer (using the same vm_config as the module).

It also provides two interfaces for loading and exporting params.

Example:
```python
trainer = Trainer(MLP, [1, 2], "main") # [1, 2] means input[1] and input[2] are parameters in this module.
trainer.set_loss(MSELoss(reduction="sum"), pred_sinfo, pred_sinfo)
trainer.set_vm_config(target="llvm")
trainer.set_optimizer(optim_type=SGD, lr=0.001).setup()
trainer.setup()
trainer.rand_init_params()
trainer.forward(*fwd_inputs)
trainer.backward(*bwd_inputs)
```
This PR introduces the pipeline namespace, which contains
a collection of pre-defined pipelines that optimize and
lower an IRModule before passing it to the minimum build.
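A minimal sketch, assuming the namespace is exposed as `relax.pipeline.get_pipeline()` with a default pipeline:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Mod:
    @R.function
    def main(x: R.Tensor((4, 4), "float32")):
        with R.dataflow():
            y = R.add(x, x)
            R.output(y)
        return y

seq = relax.pipeline.get_pipeline()          # the default pre-defined pipeline
lowered = seq(Mod)                           # optimize + lower the IRModule
ex = relax.vm.build(lowered, target="llvm")  # minimum build
```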
This PR adds a new pass SimplifyNormInference to unpack the norm
operator into a sequence of simpler operators, which is the same as the
SimplifyInference pass in Relay.
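A conceptual NumPy sketch (not the pass implementation) of what "unpacking" means for batch norm at inference time:

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    # The unpacked form: plain elementwise arithmetic instead of a fused norm op.
    return (x - moving_mean) / np.sqrt(moving_var + eps) * gamma + beta
```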
This PR fixes the timeout rule of MetaSchedule RPCRunner.

Prior to this PR, the RPCRunner set a timeout threshold for jobs
submitted to the popen pool. As a result, jobs were timed from the moment
they were sent to the remote side.

Consider the case where there is only a single device for measurement.
In this case, all jobs can only be executed serially and must queue
up. Therefore, the previous timeout configuration meant that the time spent
queueing was also counted. This caused some jobs, in the worst
case, to time out before they even started executing, and had negative
impacts on RPC MetaSchedule tuning, in terms of both
efficiency and result performance.

Co-authored-by: Bohan Hou
<32121147+spectrometerHBH@users.noreply.github.com>
nll_loss gradient finished

nll loss grad  finished

formatted

nll_loss finished

conv2d gradient finished
test not finished

rename nll_loss_backward

trial

conv2d finished

conv2d finished

tile and repeat finished

conv2d finished

formatted
@Ubospica Ubospica force-pushed the mlc-dev/2023-01-26-conv2d_nllloss_gradient branch from f531042 to e74dc55 Compare February 15, 2023 12:46
@Ubospica
Contributor Author

Moved to #130

@Ubospica Ubospica closed this Feb 16, 2023