
[WIP][Relax][Op] Register gradient for nll_loss and Conv2d #114

Closed

Conversation

Ubospica
Contributor

No description provided.

@Ubospica Ubospica changed the title Register gradient for nll_loss [WIP][Relax][Op] Register gradient for nll_loss Jan 27, 2023
@Ubospica Ubospica changed the title [WIP][Relax][Op] Register gradient for nll_loss [WIP][Relax][Op] Register gradient for nll_loss and Conv2d Jan 28, 2023
@Ubospica Ubospica force-pushed the mlc-dev/2023-01-26-conv2d_nllloss_gradient branch 2 times, most recently from c987f75 to 66ef91a Compare January 29, 2023 17:15
sunggg and others added 23 commits February 5, 2023 15:14
* [WIP] Basic task extraction mechanism is implemented.

* [WIP] For gradual integration with Relay pipeline, meta_schedule/integration.py is created for relax to avoid potential conflict.

* support tir tuning and injection mode

* Add target field for Relax Extracted Task

* 1. Create relax namespace/tvm objects/... for metaschedule to preserve relay support. 2. Promote target field from Optional<Target> to Target

* Support ApplyHistoryBest

* Reflect feedback from Yuchen

* minor improvement and fix linter issue

* add ASF header

* Reorganize file structure

* fix lint errors

* remove the import-outside-toplevel

* Reflect comments

* remove redundant comment

* As per discussion w/ Yuchen, ApplyHistoryBest is introduced as a Relax transformation pass.

* remove redundant print msg

* fix lint

* reflect comments
* Enable tests.

* Updated.

* Updated.

* Updated.
…er (mlc-ai#76)

* [CI] Set up CI; format and lint relax code to pass CI (mlc-ai#72)

* init

* fix lint

* update task_lint

* more lint

* more lint

* lint

* jenkinsfile

* jenkinsfile

* run relax only tests

* python3.7 for pytest

* point to personal ci-cpu docker

* docker pull

* test

* fix cmake config

* update

* update

* rebase

* rebase

* AutoTIR integration (mlc-ai#58)

* [WIP] Basic task extraction mechanism is implemented.

* [WIP] For gradual integration with Relay pipeline, meta_schedule/integration.py is created for relax to avoid potential conflict.

* support tir tuning and injection mode

* Add target field for Relax Extracted Task

* 1. Create relax namespace/tvm objects/... for metaschedule to preserve relay support. 2. Promote target field from Optional<Target> to Target

* Support ApplyHistoryBest

* Reflect feedback from Yuchen

* minor improvement and fix linter issue

* add ASF header

* Reorganize file structure

* fix lint errors

* remove the import-outside-toplevel

* Reflect comments

* remove redundant comment

* As per discussion w/ Yuchen, ApplyHistoryBest is introduced as a Relax transformation pass.

* remove redundant print msg

* fix lint

* reflect comments

* Yuchen's change

* relax ConstantNode in parser and printer

* Add constant data in the metasection

* rebase

* Support ir_module(metadata=json_str)

* update test case

* remove print info

* Update tests

* clang-format

* pylint

* fix ci

* Save a copy of metadata in RelaxTransformer

* Fix comments

* fix comments

Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu>
Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com>
Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
* Clean up task extraction

* black
* Change call_tir convention and fix shape/type deduction.

* test

* output shape as 3rd arg.

* address comments.

* lint
Enhance VM Executable as a Subclass of runtime::Module
* [VM] Refactor and improve vm.

- Have a separate function for RunInstCall.
- Cache func_index lookups in a table to avoid repeated lookups by string.
- Move the PackedFunc call arg stack into Frame to increase locality and avoid re-allocation in repeated calls.
- Make the frame stack hold unique_ptrs to avoid frame re-allocation and copying during frame.resize.
- Pass curr_frame as an argument into sub-functions to make it explicit.

* address review comments
* improve Printer for DynTensorType & ShapeExpr

* add testcases
* Add is_device field to attr.

* Update.

* Address comment.

* update.

* Update.
* Fix call_tir parsing bug.

* update.
* fix structural_equal_hash

(cherry picked from commit e7e962634999739a32129378f61cc95f58335447)

* address comment & pass the ci
The pattern field of the match shape can define variables;
as a result, we need to add DefEqual and Hash here.

Added a regression testcase.

Lesson: we would benefit from more testcases
with check_save_roundtrip checks (like this one) for more relax examples.

Additional change:
- Redirected the TVMScript printer to be able to print relax fragments, useful for debugging.
* Add gpu ci.

* Update autotir gpu test.
SiriusNEO and others added 16 commits February 8, 2023 09:41
TOPI has an implementation of `collapse_sum` internally
(`tvm/topi/reduction.h`), but it is not exposed to FFI and cannot be
called from the Python side. This patch exposes it and adds some related
tests.
Besides, the legalizer can now legalize `collapse_sum_like/to`. However, due to
the TOPI implementation, it cannot handle the symbolic case.
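A minimal sketch of what the exposure enables, assuming the Python-side name is `tvm.topi.collapse_sum(data, target_shape)` (name and signature inferred from the description above, not verified against this patch):

```python
# Sum a (3, 4) tensor down to the target shape (4,), i.e. collapse the broadcast axis.
import numpy as np
import tvm
from tvm import te, topi

data = te.placeholder((3, 4), name="data", dtype="float32")
out = topi.collapse_sum(data, (4,))  # assumed Python name for topi::collapse_sum

s = te.create_schedule(out.op)
f = tvm.build(s, [data, out], target="llvm")

a = tvm.nd.array(np.random.rand(3, 4).astype("float32"))
b = tvm.nd.empty((4,), dtype="float32")
f(a, b)
np.testing.assert_allclose(b.numpy(), a.numpy().sum(axis=0), rtol=1e-5)
```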
This PR migrates mlc-ai#46 to the new struct
info infra, as part of our AD migration.

Because we need to do numerical testing for gradients, this PR depends on
the operator legalizer mlc-ai#96. Also,
because the original version of the legalizer did not handle the negative
indexing case of `relax.mean`, this PR fixes it.

To lower `collapse_sum_to` and `collapse_sum_like` properly, this PR
migrates a previous patch mlc-ai#43 which
introduces `collapse_sum` in TOPI. Now we can remove the skip marker in
the legalizer test for `collapse_sum_to` and `collapse_sum_like`.

The gradients of `cross_entropy` and `softmax_cross_entropy` are
removed. The former will be added back and adjusted to the new
`cross_entropy` introduced in mlc-ai#96.

Further plan in this PR:
- [x] Add gradients for `log_softmax` and `nll_loss` once
mlc-ai#94 is merged.
- [x] Gradients for some tuple related operators such as `split` and
`concat`. It can help us to test the correctness of AD when there are
Tuple-I/O operators.
- (Not in this PR) "Undefined Gradient" representation. As we know, the
gradients of some operators w.r.t. certain inputs are undefined or
meaningless, such as the partial gradient of `indices` in `take(x,
indices)`. Relay directly uses `zeros_like` in this case, as it won't
affect gradient propagation. Another choice is to introduce a dummy Expr
named `UndefinedGradient` to represent it. How should we handle this case in
Relax?
Create a separate yaml file as in mlc-ai#104 for CI on MLC relax to ease
future sync with tlc-pack/relax.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This is the PR following mlc-ai#55, after the source branch moved to a personal repo.

This PR is based on mlc-ai#98.

This PR adds the new automatic differentiation API:
- `Gradient(func: GlobalVar, require_grads: Optional[Union[Var,
List[Var]]] = None) -> tvm.ir.transform.Pass`
- transforms the given function in the IRModule and adds a new function
that calculates the gradient with regard to the function's output

Currently, Gradient only supports differentiating a function in the IRModule
with a single dataflow block, with respect to the function's only return
value, which must be a scalar.
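A minimal usage sketch (the toy module is illustrative, and the pass location `relax.transform.Gradient` is assumed from the API description above):

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((3, 3), "float32")) -> R.Tensor((), "float32"):
        with R.dataflow():
            out = R.sum(x)   # scalar output, single dataflow block
            R.output(out)
        return out

# The API takes the GlobalVar of the function to differentiate.
mod = relax.transform.Gradient(Module.get_global_var("main"))(Module)
# `mod` now also contains a generated function computing d(main)/d(x);
# its exact name and return structure are implementation details of the pass.
print(mod.script())
```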

This PR adds two unit test files:
- `tests/python/relax/test_transform_gradient.py` only contains
`assert_structural_equal` assertions.
- `tests/python/relax/test_transform_gradient_numeric.py` contains
numeric checks, including manually derived gradients and the numerical
differentiation method `check_numerical_grads`.

Checkpoints:
- [x] Refactor to use CopyWithNewParams and ExprFunctor
- [x] Check that int64/int32 tensors are not differentiated (currently only
checked for params)
- [x] Rebase & migrate to StructInfo
- [x] Refactor about Tuple
- [x] Refactor about NestedMsg
- [x] Support ops taking in tuple or returning tuple
- [x] Eliminating collapse_sum_to (done in mlc-ai#98)

Future:
- (Not in this PR) Handle undefined gradient in add and return value
	- Now we handle them as zeros

Co-authored-by: SiriusNEO <1713833595@qq.com>
Implements the layout conversion pass.
This PR implements the library dispatcher for Relax, which currently uses CUTLASS as one library.
It introduces the TIR-level pattern registration and matching algorithm.
It introduces a Relax pass to split out subgraphs that match the patterns of backends.
This PR includes:
- Introduce `R.abs` and its legalization (for L1Loss; see the sketch after this list)
- Register most of the unary operators in
[DataAPI](https://data-apis.org/array-api/draft/API_specification/elementwise_functions.html)
(without legalization)
- Split unary arithmetic operators and check operators (e.g. `isnan`)
- Refactor `test_tvmscript_parser`, `test_op_unary` and `test_op_binary`
using `tvm.testing.parameters()`.
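A minimal sketch of `R.abs` as the building block of an L1-style loss, assuming the legalizer is exposed as `relax.transform.LegalizeOps`:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class L1:
    @R.function
    def main(x: R.Tensor((2, 3), "float32"), y: R.Tensor((2, 3), "float32")):
        with R.dataflow():
            diff = R.subtract(x, y)
            abs_diff = R.abs(diff)      # the new unary op
            loss = R.sum(abs_diff)      # sum(|x - y|), as in L1Loss
            R.output(loss)
        return loss

# Lower R.abs (and the other high-level ops) to TIR PrimFuncs.
lowered = relax.transform.LegalizeOps()(L1)
```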
Implements the Relax importer from PyTorch, using torch FX.

An example use of the importer is:

```python
# Import the importer.
from tvm.relax.frontend import from_pytorch

# Define the module
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=10, out_features=7, bias=True)

    def forward(self, input):
        return self.linear(input)

# Instantiate the model and create the input info dict.
torch_model = MyModule()
input_info = {"input_1": ((128, 10), "float32")}

# Use the importer to import the PyTorch model to Relax.
mod: tvm.IRModule = from_pytorch(torch_model, input_info)

# Print out the imported model.
# print(mod.script())
```

---------

Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
This PR introduces loss functions for relax training and provides a tool
`append_loss` which enables users to append a loss after a forward
function (a usage sketch follows the list below).

About the `append_loss`, some previous discussions can be found in
mlc-ai#111.

Currently supported:
- L1Loss
- MSELoss
- CrossEntropyLoss
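A hypothetical sketch of the intended flow; the module paths `tvm.relax.training.loss` / `tvm.relax.training.utils` and the exact call signatures are assumptions for illustration, not taken from this PR:

```python
import tvm
from tvm import relax
from tvm.script import relax as R
from tvm.relax.training.loss import MSELoss        # assumed module path
from tvm.relax.training.utils import append_loss   # assumed module path

@tvm.script.ir_module
class Backbone:
    @R.function
    def main(x: R.Tensor((4, 10), "float32"), w: R.Tensor((10, 10), "float32")):
        with R.dataflow():
            out = R.matmul(x, w)
            R.output(out)
        return out

pred_sinfo = relax.TensorStructInfo((4, 10), "float32")
# Assumed: calling the loss with prediction/target struct info yields a relax
# Function computing the loss, which append_loss splices after the backbone.
loss_func = MSELoss(reduction="sum")(pred_sinfo, pred_sinfo)
fwd_with_loss = append_loss(Backbone["main"], loss_func)
```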
…nvert and CUTLASS dispatch (mlc-ai#116)

Fixes bugs found when importing fp16 UNet & VAE through the importer,
legalization, AMP, layout conversion, and CUTLASS codegen.
…i#118)

This PR enables importing MobileNetV2 from PyTorch FX, which includes
the following changes:

- Add `relax.clip` Op, with PrimValue support (see the sketch after this list)
- Support `torch.clamp` and `torch.ReLU6` in PyTorch FX frontend
- Support `torch.nn.functional.adaptive_avg_pool2d` in PyTorch frontend
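A minimal sketch of the new op (the exact handling of the scalar bounds is an assumption):

```python
from tvm import relax

# The scalar bounds of relax.op.clip are carried as PrimValue arguments in the IR,
# which is what lets the FX frontend map torch.clamp / ReLU6 onto it.
x = relax.Var("x", relax.TensorStructInfo((1, 16, 112, 112), "float32"))
relu6 = relax.op.clip(x, 0.0, 6.0)  # bounds become R.prim_value(0.0) / R.prim_value(6.0)
```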
As discussed before, we won't introduce a new Linear Op at the IR level.
However, this PR adds a user interface on the Python side, which is
composed of transpose, matmul and a bias add.

Additionally, PyTorch supports a 1D weight Tensor for the Linear Op
(https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html).
So we cannot always use `permute_dims(weight, axes=[1, 0])` in the fx importer.
This PR fixes this issue as well.
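A minimal sketch of the composition described above; the helper name `linear` and the ndim check are illustrative, not the actual frontend code:

```python
from tvm import relax

def linear(x, weight, bias=None):
    # PyTorch also allows a 1-D weight, in which case no transpose is needed;
    # handling that case is exactly the fix described above.
    w = weight if weight.struct_info.ndim == 1 else relax.op.permute_dims(weight, axes=[1, 0])
    out = relax.op.matmul(x, w)
    return out if bias is None else relax.op.add(out, bias)

x = relax.Var("x", relax.TensorStructInfo((128, 10), "float32"))
w = relax.Var("w", relax.TensorStructInfo((7, 10), "float32"))
b = relax.Var("b", relax.TensorStructInfo((7,), "float32"))
y = linear(x, w, b)   # (128, 10) x (10, 7) + (7,) -> (128, 7)
```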
Update optimizer APIs.
- Remove `@property state` and `@state.setter`
- Add `init()` interface
- Remove `Optimizer.__call__()`
- Remove underscores before attributes, and unnecessary attributes

Current interfaces:
```python
class Optimizer:
    dtype: str
    name: str
    param_list: List[Var]
    state: tvm.runtime.container.ADT

    def __init__(self, name: str) -> None:
        self.name = name
        self.param_list = None
        self.state = None
        self.dtype = None

    def init(self, params: Union[Var, List[Var]]) -> "Optimizer":
        """Set the parameters, determine the dtype, and build the initial state for the optimizer."""
        pass

    def get_function(self) -> Function:
        """Use blockbuilder to build an optimizer function that executes updates of the parameters
        and the optimizer state."""
        pass
```

Usage examples:

See
<https://github.com/ACMClass-TVM-20/AD-Example/blob/dc255150dc6a4a6de2fffc2c093a8b2bacc1b030/optimizer_api_example.py>
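A hypothetical sketch of the updated flow; the module path `tvm.relax.training.optimizer` and the SGD constructor arguments are assumptions:

```python
from tvm import relax
from tvm.relax.training.optimizer import SGD   # assumed module path

x = relax.Var("x", relax.TensorStructInfo((3, 3), "float32"))

opt = SGD(lr=0.01).init(x)      # bind params, determine dtype, build initial state
opt_func = opt.get_function()   # relax Function updating the params and optimizer state
# opt.state holds the runtime ADT that is threaded through calls to opt_func.
```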

This PR also updates the Gradient API:
- Before: `def Gradient(global_var: GlobalVar, require_grads:
Optional[Union[Var, List[Var]]]) -> tvm.ir.transform.Pass`
- After: `def Gradient(func_name: str, require_grads:
Optional[Union[Var, List[Var]]]) -> tvm.ir.transform.Pass`

Unit tests are changed accordingly.
This is a prototype of `LiftTransformParams`. It allows compiling the
end-to-end model without weights provided. The idea is to annotate the
input parameters that are weights, identify and lift the
transformations applied to the weights, and compile them into a separate function
`transform_params` that can be executed at runtime. Users can run
`transform_params` with the weights to get the weights for the optimized
model as a prep step before deployment. In this way, we perform the
same optimizations and defer the weight transformations to the user
side, while the overhead of the deferred weight transformation can be
ignored as it only needs to be run once.

A demo notebook is available
[here](https://github.com/vinx13/relax/blob/fda857553557c34f97a4f7193a529da607a3421c/tests/python/relax/demo_lift_transform_params.ipynb)

This pass is not yet integrated with the default `vm.build`, as we are
going to iterate on it.
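A minimal sketch of the intended usage; the `num_input` function attribute and the name of the generated function are assumptions based on the description, not taken from this PR:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Mod:
    @R.function
    def main(x: R.Tensor((1, 4), "float32"), w: R.Tensor((4, 4), "float32")):
        with R.dataflow():
            wt = R.permute_dims(w, axes=[1, 0])   # depends only on the weight
            out = R.matmul(x, wt)
            R.output(out)
        return out

mod = Mod
# Annotate that only the first argument is a runtime input; the rest are weights
# (attribute name assumed for illustration).
mod["main"] = mod["main"].with_attr("num_input", 1)
mod = relax.transform.LiftTransformParams()(mod)
# `mod` now also contains a lifted `transform_params`-style function that users
# run once with the real weights to produce the pre-transformed weights.
```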
This PR adds support for torch dynamo
spectrometerHBH pushed a commit to spectrometerHBH/relax that referenced this pull request Feb 9, 2023
* add DataflowBlockPass

* update fma_rewrite

* drop the skip function

* update test_fma_rewrite with DataflowBlockPass

* fix the format

* fix name

* rewrite test in tvm script

* add non-dataflow Vars check

* add fail testcases

* module->IRModule

* add docstring to DataflowBlockNode

* remove unused pattern

* Transform Pass->DataflowBlock Pass

* rename global var to global scope var

* remove print stmt

* reformat tests

* add docstring to DataflowBlockMutator

* fix filename

* minor fix
SiriusNEO and others added 4 commits February 10, 2023 12:40
This PR brings a wrapper for relax training. The following things are
done internally in this trainer:
- Maintain (store/update) the parameters of the module.
- Merge backbone and specified loss function together.
- Build/Compile/Run the module.
- Build/Compile/Run the optimizer (using the same vm_config as the module).

It also provides two interfaces for loading and exporting params.

Example:
```python
trainer = Trainer(MLP, [1, 2], "main") # [1, 2] means input[1] and input[2] are parameters in this module.
trainer.set_loss(MSELoss(reduction="sum"), pred_sinfo, pred_sinfo)
trainer.set_vm_config(target="llvm")
trainer.set_optimizer(optim_type=SGD, lr=0.001).setup()
trainer.setup()
trainer.rand_init_params()
trainer.forward(*fwd_inputs)
trainer.backward(*bwd_inputs)
```
This PR introduces the pipeline namespace, which contains
a collection of pre-defined pipelines that optimize and
lower an IRModule before passing it to the minimum build.
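A minimal sketch, assuming the namespace is exposed as `relax.pipeline.get_pipeline()` with a default pipeline:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Mod:
    @R.function
    def main(x: R.Tensor((4, 4), "float32")):
        with R.dataflow():
            y = R.add(x, x)
            R.output(y)
        return y

seq = relax.pipeline.get_pipeline()          # the default pre-defined pipeline
lowered = seq(Mod)                           # optimize + lower the IRModule
ex = relax.vm.build(lowered, target="llvm")  # minimum build
```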
This PR adds a new pass SimplifyNormInference to unpack the norm
operator into a sequence of simpler operators, which is the same as the
SimplifyInference pass in Relay.
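A conceptual NumPy sketch (not the pass implementation) of what "unpacking" means for batch norm at inference time:

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    # The unpacked form: plain elementwise arithmetic instead of a fused norm op.
    return (x - moving_mean) / np.sqrt(moving_var + eps) * gamma + beta
```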
This PR fixes the timeout rule of MetaSchedule RPCRunner.

Prior to this PR, the RPCRunner set a timeout threshold for jobs
submitted to the popen pool. As a result, jobs were timed from the moment
they were sent to the remote side.

Consider the case where there is only a single device for measurement.
In this case, all jobs can only be executed serially and must queue
up. Therefore, the previous timeout configuration meant that the time spent
queueing was also counted. This caused some jobs, in the worst
case, to time out before they even started executing, and had negative
impacts on RPC MetaSchedule tuning, in terms of both
efficiency and result performance.

Co-authored-by: Bohan Hou
<32121147+spectrometerHBH@users.noreply.github.com>
nll_loss gradient finished

nll loss grad  finished

formatted

nll_loss finished

conv2d gradient finished
test not finished

rename nll_loss_backward

trial

conv2d finished

conv2d finished

tile and repeat finished

conv2d finished

formatted
@Ubospica Ubospica force-pushed the mlc-dev/2023-01-26-conv2d_nllloss_gradient branch from f531042 to e74dc55 Compare February 15, 2023 12:46
@Ubospica
Contributor Author

Moved to #130

@Ubospica Ubospica closed this Feb 16, 2023