diff --git a/.gitignore b/.gitignore index 2b7448f159c..3c30a631744 100644 --- a/.gitignore +++ b/.gitignore @@ -11,7 +11,6 @@ torch_xla/csrc/version.cpp *.so # Files autogenerated by scripts/generate_code.sh -torch_xla/csrc/aten_xla_type.h torch_xla/csrc/aten_xla_type_default.h torch_xla/csrc/aten_xla_type_default.cpp diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md index ffe0bcbcecc..3d9d68e234e 100644 --- a/OP_LOWERING_GUIDE.md +++ b/OP_LOWERING_GUIDE.md @@ -10,14 +10,13 @@ You should follow the instructions in [here](https://github.com/pytorch/xla/blob You can find the definition of the C++ ATen operations in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml). After you build PyTorch/XLA from source, you will also find our default implementation (forward to PyTorch native CPU) in `xla/torch_xla/csrc/aten_xla_type_default.h/cpp`. PyTorch operations can usually be mapped to [PyTorch tensor api](https://pytorch.org/docs/stable/index.html) easily. If that is not the case, searching the PyTorch native implementation under [PyTorch repo](https://github.com/pytorch/pytorch) is recommended. The goal is to lower the PyTorch operations into a sequence of XLA operations defined in [here](https://www.tensorflow.org/xla/operation_semantics). ## File structure -All file mentioned below lives under the `xla/torch_xla/csrc` folder, with the exception of `xla_native_functions.yaml` +All files mentioned below live under the `xla/torch_xla/csrc` folder -1. `xla_native_functions.yaml` contains the list of all operators that are lowered. Each operator name must directly match a pytorch operator listed in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml). This file serves as the interface to adding new xla operators, and is an input to PyTorch's [codegen machinery](https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen_backend_stubs.py). It generates the below 3 files: `aten_xla_type.h`, `aten_xla_type_default.h`, and `aten_xla_type_default.cpp` -2. `aten_xla_type.h/.cpp` are entry points of PyTorch to the pytorch_xla world. `aten_xla_type.h` is auto-generated through a combination of `xla_native_functions.yaml` and the PyTorch core `native_functions.yaml` file, and contains declarations for kernels that need to be defined in `aten_xla_type.cpp`. The kernels written here need to construct 'XLATensor' using the input `at::Tensor` and other parameters. The resulting `XLATensor` needs to be converted back to the `at::Tensor` before returning to the PyTorch world. -3. `aten_xla_type_default.h/.cpp` are also auto-generated, and contain our default implementation of the PyTorch operations which simply fall back to the underlying CPU implementation. Functions in here will be used if lowering is not explicitly defined in `xla_native_functions.yaml` + `aten_xla_type.cpp`. -4. `tensor.h` contains the `XLATensor` declarations. These declarations are one to one mapping of the `at::Tensor` nodes we declared in `aten_xla_type.h` -5. `tensor_methods.cpp` contains the implementation of `XLATensor node` defined in `tensor.h`. We constructed the corresponding `ir::op` from the parameter’s `ir::Value` and wrapped it inside a `XLATensor`. Ir stands for intermediate representation. -6. `ops/` directory contains all `ir::ops` declaration and definition. Smaller nodes can be put in `ops/ops.h/.cpp`. More complicated nodes can be put into a separate file.
All ops inherit from `ir::ops::Node` and provide a way to lower input `ir::Value` to a sequence of `XlaOp`. +1. `aten_xla_type_default.h/.cpp` are auto-generated by [this script](https://github.com/pytorch/xla/blob/master/scripts/gen.py) and contain our default implementation of the PyTorch operations. Functions in here will be used if a lowering is not explicitly defined in `aten_xla_type.cpp`. +2. `aten_xla_type.h/.cpp` are the entry points from PyTorch into the pytorch_xla world. We need to copy the operation declarations from `aten_xla_type_default.h` here and construct an `XLATensor` using the input `at::Tensor` and other parameters. The resulting `XLATensor` needs to be converted back to an `at::Tensor` before returning to the PyTorch world. +3. `tensor.h` contains the `XLATensor` declarations. These declarations are a one-to-one mapping of the `at::Tensor` nodes we declared in `aten_xla_type.h`. +4. `tensor_methods.cpp` contains the implementation of the `XLATensor` methods defined in `tensor.h`. We construct the corresponding `ir::op` from the parameter’s `ir::Value` and wrap it inside an `XLATensor`. IR stands for intermediate representation. +5. The `ops/` directory contains all `ir::ops` declarations and definitions. Smaller nodes can be put in `ops/ops.h/.cpp`. More complicated nodes can be put into a separate file. All ops inherit from `ir::ops::Node` and provide a way to lower the input `ir::Value` to a sequence of `XlaOp`. ## Unit Test Our CircleCI runs the PyTorch native Python tests for every change and once a day. Those tests will use the XLA implementation if we provide a lowering. We usually don’t need to add additional Python tests for PyTorch/XLA unless we want to verify some XLA behaviors (like dynamic shape) or we skipped the PyTorch native test for some reason. The Python test should be added to `xla/test/test_operations.py` if it is required. We also need to add C++ tests in `xla/test/cpp/test_aten_xla_tensor.cpp`. These tests should call the PyTorch C++ API and verify that our implementation yields the same result as the PyTorch native implementation. We also need to verify that the XLA implementation is called when the tensor is an XLA tensor by checking the `aten::op` and `xla::op` counters. diff --git a/scripts/generate_code.sh b/scripts/generate_code.sh index 07565fd4192..91ba340dccf 100755 --- a/scripts/generate_code.sh +++ b/scripts/generate_code.sh @@ -7,6 +7,9 @@ if [ -z "$PT_INC_DIR" ]; then PT_INC_DIR="$PTDIR/build/aten/src/ATen" fi -python "$PTDIR/tools/codegen/gen_backend_stubs.py" \ - --output_dir="$XDIR/torch_xla/csrc" \ - --source_yaml="xla_native_functions.yaml"\ +python "$CDIR/gen.py" \ + --gen_class_mode \ + --output_folder="$XDIR/torch_xla/csrc" \ + "$XDIR/torch_xla/csrc/aten_xla_type.h" \ + "$PTDIR/build/aten/src/ATen/RegistrationDeclarations.h" \ + "$PT_INC_DIR/Functions.h" \ diff --git a/torch_xla/csrc/aten_xla_type.h b/torch_xla/csrc/aten_xla_type.h new file mode 100644 index 00000000000..41a9aca8eb9 --- /dev/null +++ b/torch_xla/csrc/aten_xla_type.h @@ -0,0 +1,1132 @@ +#pragma once + +#include <ATen/Tensor.h> + +namespace torch_xla { + +// Base ATEN Type class where the XLA specific overrides should be defined. +class AtenXlaType { + public: + static void InitializeAtenBindings(); + + ////////////////////////////////////////////////////////////////////////////// + // ATEN API overrides in alphabetical order.
+ // Note: The C++ signatures must match the ones listed within the following + // pytorch folder file: + // torch/csrc/autograd/generated/RegistrationDeclarations.h + ///////////////////////////////////////////////////////////////////////////// + static at::Tensor& __ilshift__(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& __ilshift__(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& __irshift__(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& __irshift__(at::Tensor& self, const at::Tensor& other); + + static at::Tensor __lshift__(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor __lshift__(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor __rshift__(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor __rshift__(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor _adaptive_avg_pool3d(const at::Tensor& self, + at::IntArrayRef output_size); + + static at::Tensor _adaptive_avg_pool3d_backward(const at::Tensor& grad_output, + const at::Tensor& self); + + static at::Tensor _adaptive_avg_pool2d(const at::Tensor& self, + at::IntArrayRef output_size); + + static at::Tensor _adaptive_avg_pool2d_backward(const at::Tensor& grad_output, + const at::Tensor& self); + + static void _amp_foreach_non_finite_check_and_unscale_( + at::TensorList self, at::Tensor& found_inf, const at::Tensor& inv_scale); + + static at::Tensor& _amp_update_scale_(at::Tensor& current_scale, + at::Tensor& growth_tracker, + const at::Tensor& found_inf, + double scale_growth_factor, + double scale_backoff_factor, + int64_t growth_interval); + + static at::Tensor _copy_from(const at::Tensor& self, const at::Tensor& dst, + bool non_blocking); + + static at::Tensor& _index_put_impl_( + at::Tensor& self, const c10::List>& indices, + const at::Tensor& values, bool accumulate, bool unsafe); + + static at::Tensor _log_softmax(const at::Tensor& self, int64_t dim, + bool half_to_float); + + static at::Tensor _log_softmax_backward_data(const at::Tensor& grad_output, + const at::Tensor& output, + int64_t dim, + const at::Tensor& self); + + static std::tuple _pack_padded_sequence( + const at::Tensor& input, const at::Tensor& lengths, bool batch_first); + + static at::Tensor _s_where(const at::Tensor& condition, + const at::Tensor& self, const at::Tensor& other); + + static at::Tensor _softmax(const at::Tensor& self, int64_t dim, + bool half_to_float); + + static at::Tensor _softmax_backward_data(const at::Tensor& grad_output, + const at::Tensor& output, + int64_t dim, const at::Tensor& self); + + static at::Tensor _trilinear(const at::Tensor& i1, const at::Tensor& i2, + const at::Tensor& i3, at::IntArrayRef expand1, + at::IntArrayRef expand2, at::IntArrayRef expand3, + at::IntArrayRef sumdim, int64_t unroll_dim); + + static at::Tensor _unsafe_view(const at::Tensor& self, at::IntArrayRef size); + + static at::Tensor abs(const at::Tensor& self); + + static at::Tensor& abs_(at::Tensor& self); + + static at::Tensor acos(const at::Tensor& self); + + static at::Tensor& acos_(at::Tensor& self); + + static at::Tensor acosh(const at::Tensor& self); + + static at::Tensor& acosh_(at::Tensor& self); + + static at::Tensor add(const at::Tensor& self, const at::Tensor& other, + const at::Scalar& alpha); + + static at::Tensor add(const at::Tensor& self, const at::Scalar& other, + const at::Scalar& alpha); + + static at::Tensor& add_(at::Tensor& self, const at::Tensor& other, + const at::Scalar& alpha); + + static 
at::Tensor& add_(at::Tensor& self, const at::Scalar& other, + const at::Scalar& alpha); + + static at::Tensor addcdiv(const at::Tensor& self, const at::Tensor& tensor1, + const at::Tensor& tensor2, const at::Scalar& value); + + static at::Tensor& addcdiv_(at::Tensor& self, const at::Tensor& tensor1, + const at::Tensor& tensor2, + const at::Scalar& value); + + static at::Tensor addcmul(const at::Tensor& self, const at::Tensor& tensor1, + const at::Tensor& tensor2, const at::Scalar& value); + + static at::Tensor& addcmul_(at::Tensor& self, const at::Tensor& tensor1, + const at::Tensor& tensor2, + const at::Scalar& value); + + static at::Tensor addmm(const at::Tensor& self, const at::Tensor& mat1, + const at::Tensor& mat2, const at::Scalar& beta, + const at::Scalar& alpha); + + static at::Tensor alias(const at::Tensor& self); + + static at::Tensor all(const at::Tensor& self); + + static at::Tensor all(const at::Tensor& self, int64_t dim, bool keepdim); + + static at::Tensor any(const at::Tensor& self); + + static at::Tensor any(const at::Tensor& self, int64_t dim, bool keepdim); + + static at::Tensor& arange_out(const at::Scalar& start, const at::Scalar& end, + const at::Scalar& step, at::Tensor& out); + + static at::Tensor argmax(const at::Tensor& self, c10::optional dim, + bool keepdim); + + static at::Tensor argmin(const at::Tensor& self, c10::optional dim, + bool keepdim); + + static at::Tensor as_strided(const at::Tensor& self, at::IntArrayRef size, + at::IntArrayRef stride, + c10::optional storage_offset); + + static const at::Tensor& as_strided_(const at::Tensor& self, + at::IntArrayRef size, + at::IntArrayRef stride, + c10::optional storage_offset); + + static at::Tensor asin(const at::Tensor& self); + + static at::Tensor& asin_(at::Tensor& self); + + static at::Tensor asinh(const at::Tensor& self); + + static at::Tensor& asinh_(at::Tensor& self); + + static at::Tensor atan(const at::Tensor& self); + + static at::Tensor atanh(const at::Tensor& self); + + static at::Tensor atan2(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& atan2_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& atan_(at::Tensor& self); + + static at::Tensor& atanh_(at::Tensor& self); + + static at::Tensor avg_pool2d(const at::Tensor& self, + at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, + bool ceil_mode, bool count_include_pad, + c10::optional divisor_override); + + static at::Tensor avg_pool2d_backward( + const at::Tensor& grad_output, const at::Tensor& self, + at::IntArrayRef kernel_size, at::IntArrayRef stride, + at::IntArrayRef padding, bool ceil_mode, bool count_include_pad, + c10::optional divisor_override); + + static at::Tensor avg_pool3d(const at::Tensor& self, + at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, + bool ceil_mode, bool count_include_pad, + c10::optional divisor_override); + + static at::Tensor avg_pool3d_backward( + const at::Tensor& grad_output, const at::Tensor& self, + at::IntArrayRef kernel_size, at::IntArrayRef stride, + at::IntArrayRef padding, bool ceil_mode, bool count_include_pad, + c10::optional divisor_override); + + static at::Tensor baddbmm(const at::Tensor& self, const at::Tensor& batch1, + const at::Tensor& batch2, const at::Scalar& beta, + const at::Scalar& alpha); + static at::Tensor& baddbmm_(at::Tensor& self, const at::Tensor& batch1, + const at::Tensor& batch2, const at::Scalar& beta, + const at::Scalar& alpha); + + static at::Tensor bernoulli(const at::Tensor& self, 
+ c10::optional generator); + static at::Tensor& bernoulli_(at::Tensor& self, double p, + c10::optional generator); + static at::Tensor& bernoulli_(at::Tensor& self, const at::Tensor& p, + c10::optional generator); + + // binary_cross_entropy is still in PyTorch TH legacy, which means both are + // overrideable for XLA. But overriding one set is sufficient. Currently + // binary_cross_entropy and its backward are used instead of the _out version. + // When it's moved to Aten in the future, we should only keep one set here. + static at::Tensor binary_cross_entropy( + const at::Tensor& self, const at::Tensor& target, + const c10::optional& weight, int64_t reduction); + static at::Tensor binary_cross_entropy_backward( + const at::Tensor& grad_output, const at::Tensor& self, + const at::Tensor& target, const c10::optional& weight, + int64_t reduction); + + static at::Tensor binary_cross_entropy_with_logits( + const at::Tensor& self, const at::Tensor& target, + const c10::optional& weight, + const c10::optional& pos_weight, int64_t reduction); + + static at::Tensor& bitwise_and_out(const at::Tensor& self, + const at::Tensor& other, at::Tensor& out); + + static at::Tensor& bitwise_and_out(const at::Tensor& self, + const at::Scalar& other, at::Tensor& out); + + static at::Tensor& bitwise_not_out(const at::Tensor& self, at::Tensor& out); + + static at::Tensor& bitwise_or_out(const at::Tensor& self, + const at::Tensor& other, at::Tensor& out); + + static at::Tensor& bitwise_or_out(const at::Tensor& self, + const at::Scalar& other, at::Tensor& out); + + static at::Tensor& bitwise_xor_out(const at::Tensor& self, + const at::Scalar& other, at::Tensor& out); + + static at::Tensor& bitwise_xor_out(const at::Tensor& self, + const at::Tensor& other, at::Tensor& out); + + static at::Tensor bmm(const at::Tensor& self, const at::Tensor& mat2); + + static at::Tensor cat(at::TensorList tensors, int64_t dim); + + static at::Tensor ceil(const at::Tensor& self); + + static at::Tensor& ceil_(at::Tensor& self); + + static at::Tensor cholesky(const at::Tensor& self, bool upper); + + static at::Tensor clamp(const at::Tensor& self, + const c10::optional& min, + const c10::optional& max); + + static at::Tensor clamp(const at::Tensor& self, + const c10::optional& min, + const c10::optional& max); + + static at::Tensor& clamp_(at::Tensor& self, + const c10::optional& min, + const c10::optional& max); + + static at::Tensor clamp_max(const at::Tensor& self, const at::Scalar& max); + + static at::Tensor& clamp_max_(at::Tensor& self, const at::Scalar& max); + + static at::Tensor& clamp_max_out(const at::Tensor& self, + const at::Tensor& max, at::Tensor& out); + + static at::Tensor clamp_min(const at::Tensor& self, const at::Scalar& min); + + static at::Tensor& clamp_min_(at::Tensor& self, const at::Scalar& min); + + static at::Tensor& clamp_min_out(const at::Tensor& self, + const at::Tensor& min, at::Tensor& out); + + static at::Tensor clone(const at::Tensor& self, + c10::optional memory_format); + + static at::Tensor constant_pad_nd(const at::Tensor& self, at::IntArrayRef pad, + const at::Scalar& value); + + static std::tuple + convolution_backward_overrideable( + const at::Tensor& grad_output, const at::Tensor& input, + const at::Tensor& weight, at::IntArrayRef stride, at::IntArrayRef padding, + at::IntArrayRef dilation, bool transposed, at::IntArrayRef output_padding, + int64_t groups, std::array output_mask); + + static at::Tensor convolution_overrideable( + const at::Tensor& input, const at::Tensor& weight, + const 
c10::optional& bias, at::IntArrayRef stride, + at::IntArrayRef padding, at::IntArrayRef dilation, bool transposed, + at::IntArrayRef output_padding, int64_t groups); + + static at::Tensor cos(const at::Tensor& self); + + static at::Tensor& cos_(at::Tensor& self); + + static at::Tensor cosh(const at::Tensor& self); + + static at::Tensor& cosh_(at::Tensor& self); + + static at::Tensor cross(const at::Tensor& self, const at::Tensor& other, + c10::optional dim); + + static at::Tensor cumprod(const at::Tensor& self, int64_t dim, + c10::optional dtype); + + static at::Tensor cumsum(const at::Tensor& self, int64_t dim, + c10::optional dtype); + + static at::Tensor diag(const at::Tensor& self, int64_t diagonal); + + static at::Tensor diagonal(const at::Tensor& self, int64_t offset, + int64_t dim1, int64_t dim2); + + static at::Tensor div(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor div(const at::Tensor& self, const at::Tensor& other, + c10::optional rounding_mode); + + static at::Tensor div(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor& div_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& div_(at::Tensor& self, const at::Tensor& other, + c10::optional rounding_mode); + + static at::Tensor& div_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor dot(const at::Tensor& self, const at::Tensor& tensor); + + static at::Tensor elu(const at::Tensor& self, const at::Scalar& alpha, + const at::Scalar& scale, const at::Scalar& input_scale); + + static at::Tensor& elu_(at::Tensor& self, const at::Scalar& alpha, + const at::Scalar& scale, + const at::Scalar& input_scale); + + static at::Tensor elu_backward(const at::Tensor& grad_output, + const at::Scalar& alpha, + const at::Scalar& scale, + const at::Scalar& input_scale, bool self, + const at::Tensor& self_or_result); + + static at::Tensor embedding(const at::Tensor& weight, + const at::Tensor& indices, int64_t padding_idx, + bool scale_grad_by_freq, bool sparse); + + static at::Tensor embedding_dense_backward(const at::Tensor& grad_output, + const at::Tensor& indices, + int64_t num_weights, + int64_t padding_idx, + bool scale_grad_by_freq); + + static at::Tensor empty(at::IntArrayRef size, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory, + c10::optional memory_format); + + static at::Tensor empty_strided(at::IntArrayRef size, at::IntArrayRef stride, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory); + + static at::Tensor eq(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor eq(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& eq_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& eq_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor erf(const at::Tensor& self); + + static at::Tensor& erf_(at::Tensor& self); + + static at::Tensor erfc(const at::Tensor& self); + + static at::Tensor& erfc_(at::Tensor& self); + + static at::Tensor erfinv(const at::Tensor& self); + + static at::Tensor& erfinv_(at::Tensor& self); + + static at::Tensor exp(const at::Tensor& self); + + static at::Tensor& exp_(at::Tensor& self); + + static at::Tensor expand(const at::Tensor& self, at::IntArrayRef size, + bool implicit); + + static at::Tensor expm1(const at::Tensor& self); + + static at::Tensor& expm1_(at::Tensor& self); + + static at::Tensor& exponential_(at::Tensor& self, double lambd, + c10::optional generator); + + static 
at::Tensor& eye_out(int64_t n, at::Tensor& out); + + static at::Tensor& eye_out(int64_t n, int64_t m, at::Tensor& out); + + static at::Tensor& fill_(at::Tensor& self, const at::Scalar& value); + + static at::Tensor& fill_(at::Tensor& self, const at::Tensor& value); + + static at::Tensor flip(const at::Tensor& self, at::IntArrayRef dims); + + static at::Tensor floor(const at::Tensor& self); + + static at::Tensor& floor_(at::Tensor& self); + + static at::Tensor fmod(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor fmod(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor& fmod_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& fmod_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor frac(const at::Tensor& self); + + static at::Tensor& frac_(at::Tensor& self); + + static at::Tensor gather(const at::Tensor& self, int64_t dim, + const at::Tensor& index, bool sparse_grad); + + static at::Tensor ge(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor ge(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& ge_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& ge_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor gelu(const at::Tensor& self); + + static at::Tensor gelu_backward(const at::Tensor& grad, + const at::Tensor& self); + + static at::Tensor ger(const at::Tensor& self, const at::Tensor& vec2); + + static at::Tensor gt(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor gt(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& gt_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& gt_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor hardshrink(const at::Tensor& self, + const at::Scalar& lambda); + + static at::Tensor hardshrink_backward(const at::Tensor& grad_out, + const at::Tensor& self, + const at::Scalar& lambda); + + static at::Tensor hardsigmoid(const at::Tensor& self); + + static at::Tensor& hardsigmoid_(at::Tensor& self); + + static at::Tensor hardsigmoid_backward(const at::Tensor& grad_output, + const at::Tensor& self); + + static at::Tensor hardtanh(const at::Tensor& self, const at::Scalar& min_val, + const at::Scalar& max_val); + + static at::Tensor& hardtanh_(at::Tensor& self, const at::Scalar& min_val, + const at::Scalar& max_val); + + static at::Tensor hardtanh_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Scalar& min_val, + const at::Scalar& max_val); + + static at::Tensor index(const at::Tensor& self, + const c10::List>& indices); + + static at::Tensor& index_add_(at::Tensor& self, int64_t dim, + const at::Tensor& index, + const at::Tensor& source); + + static at::Tensor& index_copy_(at::Tensor& self, int64_t dim, + const at::Tensor& index, + const at::Tensor& source); + + static at::Tensor& index_fill_(at::Tensor& self, int64_t dim, + const at::Tensor& index, + const at::Scalar& value); + + static at::Tensor& index_fill_(at::Tensor& self, int64_t dim, + const at::Tensor& index, + const at::Tensor& value); + + static at::Tensor& index_put_( + at::Tensor& self, const c10::List>& indices, + const at::Tensor& values, bool accumulate); + + static at::Tensor index_select(const at::Tensor& self, int64_t dim, + const at::Tensor& index); + + static at::Tensor inverse(const at::Tensor& self); + + static at::Tensor kl_div(const at::Tensor& self, const at::Tensor& target, + int64_t reduction, bool log_target); + + static at::Tensor 
kl_div_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, int64_t reduction, + bool log_target); + + static std::tuple kthvalue(const at::Tensor& self, + int64_t k, int64_t dim, + bool keepdim); + + static at::Tensor l1_loss(const at::Tensor& self, const at::Tensor& target, + int64_t reduction); + + static at::Tensor l1_loss_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, + int64_t reduction); + + static at::Tensor le(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor le(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& le_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& le_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor leaky_relu(const at::Tensor& self, + const at::Scalar& negative_slope); + + static at::Tensor& leaky_relu_(at::Tensor& self, + const at::Scalar& negative_slope); + + static at::Tensor leaky_relu_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Scalar& negative_slope, + bool self_is_result); + + static at::Tensor log(const at::Tensor& self); + + static at::Tensor log10(const at::Tensor& self); + + static at::Tensor& log10_(at::Tensor& self); + + static at::Tensor log1p(const at::Tensor& self); + + static at::Tensor& log1p_(at::Tensor& self); + + static at::Tensor log2(const at::Tensor& self); + + static at::Tensor& log2_(at::Tensor& self); + + static at::Tensor& log_(at::Tensor& self); + + static at::Tensor log_sigmoid_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& buffer); + + static std::tuple log_sigmoid_forward( + const at::Tensor& self); + + static at::Tensor logdet(const at::Tensor& self); + + static at::Tensor logsumexp(const at::Tensor& self, at::IntArrayRef dim, + bool keepdim); + + static at::Tensor lt(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor lt(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& lt_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& lt_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& masked_fill_(at::Tensor& self, const at::Tensor& mask, + const at::Scalar& value); + + static at::Tensor& masked_fill_(at::Tensor& self, const at::Tensor& mask, + const at::Tensor& value); + + static at::Tensor& masked_scatter_(at::Tensor& self, const at::Tensor& mask, + const at::Tensor& source); + + static at::Tensor masked_select(const at::Tensor& self, + const at::Tensor& mask); + + static at::Tensor max(const at::Tensor& self); + + static std::tuple max(const at::Tensor& self, + int64_t dim, bool keepdim); + + static at::Tensor maximum(const at::Tensor& self, const at::Tensor& other); + + static std::tuple max_out(const at::Tensor& self, + int64_t dim, bool keepdim, + at::Tensor& max, + at::Tensor& max_values); + + static at::Tensor max_pool2d(const at::Tensor& self, + at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, + at::IntArrayRef dilation, bool ceil_mode); + + static std::tuple max_pool2d_with_indices( + const at::Tensor& self, at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, at::IntArrayRef dilation, + bool ceil_mode); + + static at::Tensor max_pool2d_with_indices_backward( + const at::Tensor& grad_output, const at::Tensor& self, + at::IntArrayRef kernel_size, at::IntArrayRef stride, + at::IntArrayRef padding, at::IntArrayRef dilation, bool ceil_mode, + const at::Tensor& 
indices); + + static at::Tensor max_pool3d(const at::Tensor& self, + at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, + at::IntArrayRef dilation, bool ceil_mode); + + static std::tuple max_pool3d_with_indices( + const at::Tensor& self, at::IntArrayRef kernel_size, + at::IntArrayRef stride, at::IntArrayRef padding, at::IntArrayRef dilation, + bool ceil_mode); + + static at::Tensor max_pool3d_with_indices_backward( + const at::Tensor& grad_output, const at::Tensor& self, + at::IntArrayRef kernel_size, at::IntArrayRef stride, + at::IntArrayRef padding, at::IntArrayRef dilation, bool ceil_mode, + const at::Tensor& indices); + + static at::Tensor max_unpool2d(const at::Tensor& self, + const at::Tensor& indices, + at::IntArrayRef output_size); + + static at::Tensor max_unpool2d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& indices, + at::IntArrayRef output_size); + + static at::Tensor max_unpool3d(const at::Tensor& self, + const at::Tensor& indices, + at::IntArrayRef output_size, + at::IntArrayRef stride, + at::IntArrayRef padding); + + static at::Tensor max_unpool3d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& indices, + at::IntArrayRef output_size, + at::IntArrayRef stride, + at::IntArrayRef padding); + + static at::Tensor mean(const at::Tensor& self, + c10::optional dtype); + + static at::Tensor mean(const at::Tensor& self, at::IntArrayRef dim, + bool keepdim, c10::optional dtype); + + static at::Tensor min(const at::Tensor& self); + + static std::tuple min(const at::Tensor& self, + int64_t dim, bool keepdim); + + static at::Tensor minimum(const at::Tensor& self, const at::Tensor& other); + + static std::tuple min_out(const at::Tensor& self, + int64_t dim, bool keepdim, + at::Tensor& min, + at::Tensor& min_indices); + + static at::Tensor mm(const at::Tensor& self, const at::Tensor& mat2); + + static at::Tensor mse_loss(const at::Tensor& self, const at::Tensor& target, + int64_t reduction); + + static at::Tensor mse_loss_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, + int64_t reduction); + + static at::Tensor mul(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor mul(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor& mul_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& mul_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor mv(const at::Tensor& self, const at::Tensor& vec); + + static at::Tensor& mv_out(const at::Tensor& self, const at::Tensor& vec, + at::Tensor& out); + + static std::tuple native_batch_norm( + const at::Tensor& input, const c10::optional& weight, + const c10::optional& bias, + const c10::optional& running_mean, + const c10::optional& running_var, bool training, + double momentum, double eps); + + static std::tuple + native_batch_norm_backward(const at::Tensor& grad_out, + const at::Tensor& input, + const c10::optional& weight, + const c10::optional& running_mean, + const c10::optional& running_var, + const c10::optional& save_mean, + const c10::optional& save_invstd, + bool train, double eps, + std::array output_mask); + + static at::Tensor ne(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor ne(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor& ne_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor& ne_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor neg(const 
at::Tensor& self); + + static at::Tensor& neg_(at::Tensor& self); + + static at::Tensor nll_loss2d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, + const c10::optional& weight, + int64_t reduction, int64_t ignore_index, + const at::Tensor& total_weight); + + static std::tuple nll_loss2d_forward( + const at::Tensor& self, const at::Tensor& target, + const c10::optional& weight, int64_t reduction, + int64_t ignore_index); + + static at::Tensor nll_loss_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, + const c10::optional& weight, + int64_t reduction, int64_t ignore_index, + const at::Tensor& total_weight); + + static std::tuple nll_loss_forward( + const at::Tensor& self, const at::Tensor& target, + const c10::optional& weight, int64_t reduction, + int64_t ignore_index); + + static at::Tensor nonzero(const at::Tensor& self); + + static at::Tensor norm(const at::Tensor& self, + const c10::optional& p, + at::ScalarType dtype); + + static at::Tensor norm(const at::Tensor& self, const at::Scalar& p); + + static at::Tensor norm(const at::Tensor& self, + const c10::optional& p, + at::IntArrayRef dim, bool keepdim, + at::ScalarType dtype); + + static at::Tensor norm(const at::Tensor& self, + const c10::optional& p, + at::IntArrayRef dim, bool keepdim); + + static at::Tensor normal(const at::Tensor& mean, double std, + c10::optional generator); + + static at::Tensor normal(double mean, const at::Tensor& std, + c10::optional generator); + + static at::Tensor normal(const at::Tensor& mean, const at::Tensor& std, + c10::optional generator); + + static at::Tensor& normal_(at::Tensor& self, double mean, double std, + c10::optional generator); + + static at::Tensor permute(const at::Tensor& self, at::IntArrayRef dims); + + static at::Tensor pow(const at::Tensor& self, const at::Scalar& exponent); + + static at::Tensor pow(const at::Tensor& self, const at::Tensor& exponent); + + static at::Tensor pow(const at::Scalar& self, const at::Tensor& exponent); + + static at::Tensor& pow_(at::Tensor& self, const at::Scalar& exponent); + + static at::Tensor& pow_(at::Tensor& self, const at::Tensor& exponent); + + static at::Tensor prod(const at::Tensor& self, + c10::optional dtype); + + static at::Tensor prod(const at::Tensor& self, int64_t dim, bool keepdim, + c10::optional dtype); + + static at::Tensor& put_(at::Tensor& self, const at::Tensor& index, + const at::Tensor& source, bool accumulate); + + static std::tuple qr(const at::Tensor& self, + bool some); + + static at::Tensor& random_(at::Tensor& self, int64_t from, + c10::optional to, + c10::optional generator); + + static at::Tensor& random_(at::Tensor& self, int64_t to, + c10::optional generator); + + static at::Tensor& random_(at::Tensor& self, + c10::optional generator); + + static at::Tensor reciprocal(const at::Tensor& self); + + static at::Tensor& reciprocal_(at::Tensor& self); + + static at::Tensor reflection_pad2d(const at::Tensor& self, + at::IntArrayRef padding); + + static at::Tensor reflection_pad2d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + at::IntArrayRef padding); + + static at::Tensor relu(const at::Tensor& self); + + static at::Tensor& relu_(at::Tensor& self); + + static at::Tensor remainder(const at::Tensor& self, const at::Tensor& other); + + static at::Tensor remainder(const at::Tensor& self, const at::Scalar& other); + + static at::Tensor& remainder_(at::Tensor& self, const at::Tensor& other); + + static at::Tensor& 
remainder_(at::Tensor& self, const at::Scalar& other); + + static at::Tensor repeat(const at::Tensor& self, at::IntArrayRef repeats); + + static at::Tensor replication_pad1d(const at::Tensor& self, + at::IntArrayRef padding); + static at::Tensor replication_pad1d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + at::IntArrayRef padding); + + static at::Tensor replication_pad2d(const at::Tensor& self, + at::IntArrayRef padding); + static at::Tensor replication_pad2d_backward(const at::Tensor& grad_output, + const at::Tensor& self, + at::IntArrayRef padding); + + static const at::Tensor& resize_( + const at::Tensor& self, at::IntArrayRef size, + c10::optional memory_format); + + static at::Tensor round(const at::Tensor& self); + + static at::Tensor& round_(at::Tensor& self); + + static at::Tensor rrelu_with_noise(const at::Tensor& self, + const at::Tensor& noise, + const at::Scalar& lower, + const at::Scalar& upper, bool training, + c10::optional generator); + + static at::Tensor rrelu_with_noise_backward( + const at::Tensor& grad_output, const at::Tensor& self, + const at::Tensor& noise, const at::Scalar& lower, const at::Scalar& upper, + bool training, bool self_is_result); + + static at::Tensor rsqrt(const at::Tensor& self); + + static at::Tensor& rsqrt_(at::Tensor& self); + + static at::Tensor rsub(const at::Tensor& self, const at::Tensor& other, + const at::Scalar& alpha); + + static at::Tensor rsub(const at::Tensor& self, const at::Scalar& other, + const at::Scalar& alpha); + + static at::Tensor& scatter_(at::Tensor& self, int64_t dim, + const at::Tensor& index, const at::Tensor& src); + + static at::Tensor& scatter_(at::Tensor& self, int64_t dim, + const at::Tensor& index, const at::Scalar& value); + + static at::Tensor& scatter_add_(at::Tensor& self, int64_t dim, + const at::Tensor& index, + const at::Tensor& src); + + static at::Tensor select(const at::Tensor& self, int64_t dim, int64_t index); + + static at::Tensor& silu_out(const at::Tensor& self, at::Tensor& out); + + static at::Tensor sigmoid(const at::Tensor& self); + + static at::Tensor& sigmoid_(at::Tensor& self); + + static at::Tensor sigmoid_backward(const at::Tensor& grad_output, + const at::Tensor& output); + + static at::Tensor sign(const at::Tensor& self); + + static at::Tensor& sign_(at::Tensor& self); + + static at::Tensor sin(const at::Tensor& self); + + static at::Tensor& sin_(at::Tensor& self); + + static at::Tensor sinh(const at::Tensor& self); + + static at::Tensor& sinh_(at::Tensor& self); + + static at::Tensor slice(const at::Tensor& self, int64_t dim, + c10::optional start, + c10::optional end, int64_t step); + + static at::Tensor smooth_l1_loss(const at::Tensor& self, + const at::Tensor& target, int64_t reduction, + double beta); + + static at::Tensor smooth_l1_loss_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Tensor& target, + int64_t reduction, double beta); + + static at::Tensor softplus(const at::Tensor& self, const at::Scalar& beta, + const at::Scalar& threshold); + + static at::Tensor softplus_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Scalar& beta, + const at::Scalar& threshold, + const at::Tensor& output); + + static at::Tensor softshrink(const at::Tensor& self, + const at::Scalar& lambda); + + static at::Tensor softshrink_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Scalar& lambda); + + static std::tuple sort(const at::Tensor& self, + int64_t dim, bool descending); + + static 
std::vector split(const at::Tensor& self, + int64_t split_size, int64_t dim); + + static std::vector split_with_sizes(const at::Tensor& self, + at::IntArrayRef split_sizes, + int64_t dim); + + static at::Tensor sqrt(const at::Tensor& self); + + static at::Tensor& sqrt_(at::Tensor& self); + + static at::Tensor squeeze(const at::Tensor& self); + + static at::Tensor squeeze(const at::Tensor& self, int64_t dim); + + static at::Tensor& squeeze_(at::Tensor& self); + + static at::Tensor& squeeze_(at::Tensor& self, int64_t dim); + + static at::Tensor stack(at::TensorList tensors, int64_t dim); + + static at::Tensor std(const at::Tensor& self, bool unbiased); + + static at::Tensor std(const at::Tensor& self, at::IntArrayRef dim, + bool unbiased, bool keepdim); + + static at::Tensor std(const at::Tensor& self, + c10::optional dim, + c10::optional correction, bool keepdim); + + static at::Tensor sub(const at::Tensor& self, const at::Tensor& other, + const at::Scalar& alpha); + + static at::Tensor sub(const at::Tensor& self, const at::Scalar& other, + const at::Scalar& alpha); + + static at::Tensor& sub_(at::Tensor& self, const at::Tensor& other, + const at::Scalar& alpha); + + static at::Tensor& sub_(at::Tensor& self, const at::Scalar& other, + const at::Scalar& alpha); + + static at::Tensor sum(const at::Tensor& self, + c10::optional dtype); + + static at::Tensor sum(const at::Tensor& self, at::IntArrayRef dim, + bool keepdim, c10::optional dtype); + + static std::tuple svd( + const at::Tensor& self, bool some, bool compute_uv); + + static std::tuple symeig(const at::Tensor& self, + bool eigenvectors, + bool upper); + + static at::Tensor t(const at::Tensor& self); + + static at::Tensor& t_(at::Tensor& self); + + static at::Tensor take(const at::Tensor& self, const at::Tensor& index); + + static at::Tensor tan(const at::Tensor& self); + + static at::Tensor& tan_(at::Tensor& self); + + static at::Tensor tanh(const at::Tensor& self); + + static at::Tensor& tanh_(at::Tensor& self); + + static at::Tensor tanh_backward(const at::Tensor& grad_output, + const at::Tensor& output); + + static at::Tensor threshold(const at::Tensor& self, + const at::Scalar& threshold, + const at::Scalar& value); + + static at::Tensor& threshold_(at::Tensor& self, const at::Scalar& threshold, + const at::Scalar& value); + + static at::Tensor threshold_backward(const at::Tensor& grad_output, + const at::Tensor& self, + const at::Scalar& threshold); + + static std::tuple topk(const at::Tensor& self, + int64_t k, int64_t dim, + bool largest, bool sorted); + + static at::Tensor trace(const at::Tensor& self); + + static std::tuple triangular_solve( + const at::Tensor& b, const at::Tensor& A, bool upper, bool transpose, + bool unitriangular); + + static at::Tensor transpose(const at::Tensor& self, int64_t dim0, + int64_t dim1); + + static at::Tensor& transpose_(at::Tensor& self, int64_t dim0, int64_t dim1); + + static at::Tensor tril(const at::Tensor& self, int64_t diagonal); + + static at::Tensor& tril_(at::Tensor& self, int64_t diagonal); + + static at::Tensor triu(const at::Tensor& self, int64_t diagonal); + + static at::Tensor& triu_(at::Tensor& self, int64_t diagonal); + + static at::Tensor trunc(const at::Tensor& self); + + static at::Tensor& trunc_(at::Tensor& self); + + static std::vector unbind(const at::Tensor& self, int64_t dim); + + static at::Tensor& uniform_(at::Tensor& self, double from, double to, + c10::optional generator); + + static at::Tensor unsqueeze(const at::Tensor& self, int64_t dim); + + static at::Tensor& 
unsqueeze_(at::Tensor& self, int64_t dim); + + static at::Tensor upsample_bilinear2d(const at::Tensor& self, + at::IntArrayRef output_size, + bool align_corners, + c10::optional scales_h, + c10::optional scales_w); + + static at::Tensor upsample_bilinear2d_backward( + const at::Tensor& grad_output, at::IntArrayRef output_size, + at::IntArrayRef input_size, bool align_corners, + c10::optional scales_h, c10::optional scales_w); + + static at::Tensor upsample_nearest2d( + const at::Tensor& input, c10::optional output_size, + c10::optional> scale_factors); + + static at::Tensor upsample_nearest2d_backward( + const at::Tensor& grad_output, c10::optional output_size, + at::IntArrayRef input_size, + c10::optional> scale_factors); + + static at::Tensor upsample_nearest2d(const at::Tensor& self, + at::IntArrayRef output_size, + c10::optional scales_h, + c10::optional scales_w); + + static at::Tensor upsample_nearest2d_backward(const at::Tensor& grad_output, + at::IntArrayRef output_size, + at::IntArrayRef input_size, + c10::optional scales_h, + c10::optional scales_w); + + static at::Tensor var(const at::Tensor& self, bool unbiased); + + static at::Tensor var(const at::Tensor& self, at::IntArrayRef dim, + bool unbiased, bool keepdim); + + static at::Tensor var(const at::Tensor& self, + c10::optional dim, + c10::optional correction, bool keepdim); + + static at::Tensor view(const at::Tensor& self, at::IntArrayRef size); + + static at::Tensor& zero_(at::Tensor& self); + + static at::Scalar _local_scalar_dense(const at::Tensor& self); +}; + +} // namespace torch_xla diff --git a/xla_native_functions.yaml b/xla_native_functions.yaml deleted file mode 100644 index 37cbb05df26..00000000000 --- a/xla_native_functions.yaml +++ /dev/null @@ -1,379 +0,0 @@ -backend: XLA -cpp_namespace: torch_xla -supported: - - abs - - abs_ - - acos - - acos_ - - add.Tensor - - add_.Tensor - - add.Scalar - - add_.Scalar - - all.dim - - any.dim - - arange.start_out - - argmax - - argmin - - acosh - - acosh_ - - asinh - - asinh_ - - atanh - - atanh_ - - as_strided - - as_strided_ - - asin - - asin_ - - atan - - atan_ - - baddbmm - - baddbmm_ - - bernoulli - - bernoulli_.Tensor - - bernoulli_.float - - binary_cross_entropy - - binary_cross_entropy_backward - - binary_cross_entropy_with_logits - - bitwise_not.out - - bmm - - cat - - ceil - - ceil_ - - clamp - - clamp.Tensor - - clamp_ - - clamp_max - - clamp_max.Tensor_out - - clamp_max_ - - clamp_min - - clamp_min.Tensor_out - - clamp_min_ - - constant_pad_nd - - convolution_overrideable - - convolution_backward_overrideable - - _copy_from - - cos - - cos_ - - cosh - - cosh_ - - cumprod - - cumsum - - diagonal - - div.Tensor - - div_.Tensor - - div.Tensor_mode - - div_.Tensor_mode - - div.Scalar - - div_.Scalar - - dot - - embedding - - embedding_dense_backward - - empty.memory_format - - resize_ - - empty_strided - - erf - - erf_ - - erfc - - erfc_ - - exp - - exp_ - - expm1 - - expm1_ - - expand - - eye.out - - eye.m_out - - fill_.Scalar - - fill_.Tensor - - floor - - floor_ - - frac - - frac_ - - index.Tensor - - index_copy_ - - index_put_ - - _index_put_impl_ - - inverse - - kl_div - - kl_div_backward - - kthvalue - - log - - log_ - - log10 - - log10_ - - log1p - - log1p_ - - log2 - - log2_ - - logdet - - _log_softmax - - _log_softmax_backward_data - - logsumexp - - max.dim - - max.dim_max - - mean - - mean.dim - - min.dim - - min.dim_min - - mm - - mul.Tensor - - mul_.Tensor - - mul.Scalar - - mul_.Scalar - - mv - - mv.out - - native_batch_norm - - 
native_batch_norm_backward - - permute - - reciprocal - - reciprocal_ - - neg - - neg_ - - repeat - - round - - round_ - - relu - - relu_ - - gelu - - gelu_backward - - hardshrink - - hardshrink_backward - - rsqrt - - rsqrt_ - - select.int - - silu.out - - sigmoid - - sigmoid_ - - sin - - sin_ - - sinh - - sinh_ - - slice.Tensor - - _softmax - - _softmax_backward_data - - split.Tensor - - split_with_sizes - - squeeze - - squeeze.dim - - squeeze_ - - squeeze_.dim - - stack - - sum - - sum.dim_IntList - - sqrt - - sqrt_ - - std - - std.dim - - std.correction - - prod - - prod.dim_int - - t - - t_ - - tan - - tan_ - - tanh - - tanh_ - - threshold - - threshold_ - - threshold_backward - - transpose.int - - transpose_ - - flip - - _trilinear - - trunc - - trunc_ - - _unsafe_view - - unsqueeze - - unsqueeze_ - - var - - var.dim - - var.correction - - _s_where - - norm.ScalarOpt_dtype - - norm.Scalar - - norm.ScalarOpt_dim_dtype - - norm.ScalarOpt_dim - - clone - - zero_ - - sub.Tensor - - sub_.Tensor - - sub.Scalar - - sub_.Scalar - - rsub.Tensor - - rsub.Scalar - - addmm - - unbind.int - - _local_scalar_dense - - _pack_padded_sequence - - masked_fill_.Scalar - - masked_fill_.Tensor - - masked_scatter_ - - view - - put_ - - index_add_ - - index_fill_.int_Scalar - - index_fill_.int_Tensor - - scatter_.src - - scatter_.value - - scatter_add_ - - eq_.Scalar - - eq_.Tensor - - bitwise_and.Tensor_out - - bitwise_and.Scalar_out - - bitwise_or.Tensor_out - - bitwise_or.Scalar_out - - bitwise_xor.Tensor_out - - bitwise_xor.Scalar_out - - __lshift__.Scalar - - __lshift__.Tensor - - __ilshift__.Scalar - - __ilshift__.Tensor - - __rshift__.Scalar - - __rshift__.Tensor - - __irshift__.Scalar - - __irshift__.Tensor - - atan2_ - - tril_ - - triu_ - - fmod_.Scalar - - fmod_.Tensor - - remainder_.Scalar - - remainder_.Tensor - - addcdiv_ - - random_.from - - random_.to - - random_ - - uniform_ - - exponential_ - - diag - - cross - - triu - - tril - - trace - - ne.Scalar - - ne.Tensor - - ne_.Scalar - - ne_.Tensor - - eq.Scalar - - eq.Tensor - - ge.Scalar - - ge.Tensor - - ge_.Scalar - - ge_.Tensor - - le.Scalar - - le.Tensor - - le_.Scalar - - le_.Tensor - - gt.Scalar - - gt.Tensor - - gt_.Scalar - - gt_.Tensor - - lt.Scalar - - lt.Tensor - - lt_.Scalar - - lt_.Tensor - - take - - index_select - - masked_select - - nonzero - - gather - - addcmul - - addcmul_ - - addcdiv - - triangular_solve - - symeig - - svd - - cholesky - - qr - - erfinv - - erfinv_ - - sign - - sign_ - - atan2 - - fmod.Scalar - - fmod.Tensor - - remainder.Scalar - - remainder.Tensor - - min - - max - - maximum - - minimum - - sort - - topk - - all - - any - - pow.Tensor_Tensor - - pow.Scalar - - pow.Tensor_Scalar - - pow_.Scalar - - pow_.Tensor - - normal_ - - normal.Tensor_float - - normal.float_Tensor - - normal.Tensor_Tensor - - alias - - _amp_foreach_non_finite_check_and_unscale_ - - _amp_update_scale_ - - mse_loss - - mse_loss_backward - - l1_loss - - l1_loss_backward - - nll_loss_forward - - nll_loss_backward - - nll_loss2d_forward - - nll_loss2d_backward - - smooth_l1_loss - - smooth_l1_loss_backward - - elu - - elu_backward - - elu_ - - hardsigmoid - - hardsigmoid_ - - hardsigmoid_backward - - hardtanh - - hardtanh_backward - - hardtanh_ - - leaky_relu - - leaky_relu_backward - - leaky_relu_ - - log_sigmoid_forward - - log_sigmoid_backward - - rrelu_with_noise - - rrelu_with_noise_backward - - softplus - - softplus_backward - - softshrink - - softshrink_backward - - _adaptive_avg_pool2d - - _adaptive_avg_pool2d_backward - - 
_adaptive_avg_pool3d - - _adaptive_avg_pool3d_backward - - avg_pool2d - - avg_pool2d_backward - - avg_pool3d - - avg_pool3d_backward - - max_pool2d_with_indices - - max_pool2d_with_indices_backward - - max_pool3d_with_indices - - max_pool3d_with_indices_backward - - max_unpool2d - - max_unpool2d_backward - - max_unpool3d - - max_unpool3d_backward - - reflection_pad2d - - reflection_pad2d_backward - - replication_pad1d - - replication_pad1d_backward - - replication_pad2d - - replication_pad2d_backward - - upsample_nearest2d.vec - - upsample_nearest2d_backward.vec - - upsample_bilinear2d - - upsample_bilinear2d_backward - - upsample_nearest2d - - upsample_nearest2d_backward - - sigmoid_backward - - tanh_backward - - ger -autograd: - - max_pool2d - - max_pool3d
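
For readers following the OP_LOWERING_GUIDE.md changes above, the sketch below illustrates the lowering flow the guide describes: an `AtenXlaType` kernel in `aten_xla_type.cpp` converts the incoming `at::Tensor` to an `XLATensor`, the `XLATensor` method in `tensor_methods.cpp` builds the IR node, and the result is converted back to an `at::Tensor`. This is a minimal sketch for a hypothetical unary op `my_op`; the helper names (`bridge::GetXlaTensor`, `bridge::AtenFromXlaTensor`, `ir::MakeNode`, `ir::ops::MyOp`) are assumptions modeled on the pattern the guide describes, not code taken from this diff.

```cpp
// Minimal sketch (assumed helper names, not from this diff) of the lowering
// flow described in OP_LOWERING_GUIDE.md for a hypothetical op "my_op".

// aten_xla_type.cpp -- PyTorch entry point into the pytorch_xla world.
at::Tensor AtenXlaType::my_op(const at::Tensor& self) {
  // Wrap the at::Tensor as an XLATensor, dispatch to the XLATensor-level
  // implementation, then convert the result back to an at::Tensor.
  return bridge::AtenFromXlaTensor(
      XLATensor::my_op(bridge::GetXlaTensor(self)));
}

// tensor_methods.cpp -- build the corresponding IR node from the input's
// ir::Value and wrap it in a new XLATensor.
XLATensor XLATensor::my_op(const XLATensor& input) {
  return input.CreateFrom(ir::MakeNode<ir::ops::MyOp>(input.GetIrValue()));
}
```

The corresponding `ir::ops::MyOp` node would live under `ops/`, inherit from `ir::ops::Node`, and provide the lowering of its input `ir::Value` to a sequence of `XlaOp`.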