
PyTorch NNAPI integration prototype #46780

Closed
wants to merge 3 commits

Conversation

Contributor

@dreiss commented Oct 23, 2020

Stack from ghstack:

Summary:
This is in prototype status, but pretty functional. There are two major
parts.

  • Model converter. This is a pure Python component that consumes a
    model in TorchScript format, converts the operations into NNAPI
    semantics, and serializes the model in a custom format. It then wraps
    the result in a new TorchScript model that can invoke NNAPI under the
    hood.
  • Runtime. This is a TorchBind object that deserializes the model and
    sends the result to NNAPI. This is fairly simple since the serialized
    format is basically just a list of NNAPI calls to make, so most of the
    code is spent on bounds checking.

A few notes on the design.

  • Currently, all tensor sizes need to be fixed, and those fixed sizes
    are burned directly into the serialized model. This will probably
    need to change. NNAPI supports variable-sized tensors, but the
    important hardware backends do not. However, we're seeing use cases
    crop up where the input size is not known until around the time that
    the model is loaded (for example, it might depend on the camera aspect
    ratio). I think the proper fix here is to remove the code in the
    converter that eagerly calculates the sizes of the intermediate
    tensors and replace it with a code generator that will generate some
    TorchScript code that will perform those calculations at model load
    time. This way, we will be able to support models that have
    variable-sized inputs while still only showing fixed-sized operands to
    NNAPI.
  • The important hardware backends want operands to be in NHWC order, but
    PyTorch natively represents all tensors as NCHW. The strategy for
    this is to keep NCHW during most of the conversion process, but track
    an additional value per operand representing the "dimension order".
    The dimension order gets propagated through convolutions and pointwise
    ops. When we're ready to serialize the model, we reorder the
    dimensions for "channels last" operands to NHWC. (A small sketch of
    this reorder follows this list.)
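
For illustration only (hypothetical helper name, not the converter's actual code), the NHWC reorder for a 4-D operand amounts to:

import torch

def nchw_to_nhwc(t: torch.Tensor) -> torch.Tensor:
    # Permute a "channels last" operand from PyTorch's native NCHW layout
    # to the NHWC layout that NNAPI hardware backends prefer.
    return t.permute(0, 2, 3, 1).contiguous()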

Test Plan:
Some local testing with FB prod models. I'll need to add some examples
and automated tests.

Differential Revision: D24574040

@dr-ci

dr-ci bot commented Oct 23, 2020

💊 CI failures summary and remediations

As of commit c323076 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


codecov.io: 1 failed



int32_t zeroPoint;
} ANeuralNetworksOperandType;

#endif // MINIMAL_NEURAL_NETWORKS_H
Contributor

Okay, so this file is mainly from NNAPI's source tree, i.e. enum and struct definitions.

if fmt == 0:
    fixed_args.append(args[idx].contiguous())
elif fmt == 1:
    fixed_args.append(args[idx].permute(0, 2, 3, 1).contiguous())
Contributor

Okay, this is where the NHWC handling you mentioned earlier happens.

return struct.pack("i" * len(ints), *ints)


ADDER_MAP = {
Contributor

Okay, so these are the ATen operators supported for NNAPI execution. I'm guessing this list is dictated by the operation codes for the operators from https://developer.android.com/ndk/reference/group/neural-networks#operationcode

Contributor Author

Yeah, this maps from the internal TorchScript operator name to the function that needs to handle that node. In some cases (like processing constants), there's no NNAPI operation generated.

Contributor

I'm assuming this map could grow as we support more and more NNAPI operations?

Contributor Author

Yes. Though I'd like to replace the explicit map with a decorator.
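
Something like this, as a rough sketch (names are hypothetical, not the current code):

ADDER_MAP = {}

def register_adder(op_name):
    # Record the handler for a TorchScript operator name at definition time,
    # replacing a manually maintained entry in the explicit map.
    def wrap(fn):
        ADDER_MAP[op_name] = fn
        return fn
    return wrap

@register_adder("aten::relu")
def add_relu(self, node):
    ...  # emit the corresponding NNAPI operation here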

if config is None:
    config = {}

self.solid_weights = config.get("solid_weights", False)
Contributor

Q. What is solid_weights?

Contributor Author

Oh, this is a vestigial feature. I'll delete it. Currently, we get each weight from a separate tensor. "Solid weights" was to let all of the weights be bundled as a single blob, to make it possible to deploy without using PyTorch or Caffe2 as a wrapper.

Contributor

Thanks! Now I understand the intent, but not the details, so I shan't mull over them for now.

self.modules = {}
self.constants = {}
self.jitval_operand_map = {}
self.cached_immediates = {}
Contributor

Okay - cached_immediates is a set of all literal constant values that show up in this model, de-duped.

return abs(lhs - rhs) <= tolerance * min(lhs, rhs)


def tensor_size(op_type, dims):
Contributor

Okay so this is the tensor size in bytes.
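
Roughly (a sketch with assumed per-type element sizes, not the exact code from the diff):

import functools
import operator

ELEMENT_SIZES = {
    "TENSOR_FLOAT32": 4,
    "TENSOR_INT32": 4,
    "TENSOR_QUANT8_ASYMM": 1,
}

def tensor_size_bytes(op_type, dims):
    # Size in bytes = number of elements times the per-element size of the operand type.
    numel = functools.reduce(operator.mul, dims, 1)
    return numel * ELEMENT_SIZES[op_type]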

Comment on lines 235 to 241
if len(s1) > len(s2):
    # s2 = [1] * (len(s1) - len(s2)) + s2
    raise Exception("Non-equal-rank broadcast is too dangerous because XXX.")
if len(s2) > len(s1):
    # s3 = [1] * (len(s2) - len(s1)) + s1
    raise Exception("Non-equal-rank broadcast is too dangerous because XXX.")
Contributor

@dreiss Maybe this is not relevant here, but https://pytorch.org/docs/stable/notes/broadcasting.html suggests that broadcasting can account for non-equal shapes of tensors. Is this something that would be a limitation of this specific API?

Contributor Author

I need to clean this code up a bit. In theory, PyTorch and NNAPI have the same broadcasting semantics (back to front, extend the front with 1s). However, when using "nnapi_nhwc", it gets a bit wonky, because PyTorch uses an implicit NHWC representation based on strides, so the tensors still have NCHW semantics, and broadcasting order is still W,H,C,N. NNAPI (and TF and C2) use explicit NHWC, so the broadcast order is C,W,H,N. The trickiest bits come into play when we try to broadcast a known NHWC tensor with a constant (where we don't necessarily know the user's intention). I haven't worked through all the cases yet, so I decided not to support non-equal-rank broadcast for now.
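
For reference, standard PyTorch broadcasting aligns shapes back to front and pads the shorter shape with leading 1s:

import torch

a = torch.ones(2, 3, 4)   # shape (2, 3, 4)
b = torch.ones(3, 1)      # treated as (1, 3, 1) after front-padding
c = a + b                 # broadcasts to shape (2, 3, 4)
assert c.shape == (2, 3, 4)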

Contributor

Thanks! I understand now that there is additional complexity when you're dealing with different orders of dimensions, and that broadcasting only works when the dimensions are ordered the same. At a high level I gather that you can't make that assertion here, so you've left this open. If you somehow knew the order of dimensions, then you could have supported the padding-with-1s behaviour. Got it!

Comment on lines 539 to 580
header = struct.pack(
    "iiiiii",
    version,
    len(self.operands),
    len(self.values),
    len(self.operations),
    len(self.inputs),
    len(self.outputs),
)
model.append(header)

serialized_values, serialized_value_data = self.serialize_values()

model.extend(struct.pack("iifi", t, len(d), s, z) for (t, d, _m, s, z) in self.operands)
model.extend(serialized_values)
model.extend(struct.pack("iii", *x) for x in self.operations)
model.extend(self.serialize_ints(fix_shape(dims, mf)) for (_, dims, mf, _, _) in self.operands)
model.extend(serialized_value_data)
model.append(self.serialize_ints(self.operation_args))
model.append(self.serialize_ints(self.inputs))
model.append(self.serialize_ints(self.outputs))
Contributor

Okay, so the serialized model is basically a file with a header specifying the version, num(operands), num(values), etc., where the operands, values, etc. are fixed-size structures, and the deserializer reads them off as such because it knows how many of each to expect.
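
The read side then conceptually does the reverse (a sketch only; the real loader is the C++ in nnapi_model_loader):

import struct

HEADER_FORMAT = "iiiiii"  # version, #operands, #values, #operations, #inputs, #outputs

def read_header(buf):
    # Fixed-size header at the front of the serialized model.
    size = struct.calcsize(HEADER_FORMAT)
    return struct.unpack(HEADER_FORMAT, buf[:size])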

self.outputs = []

self.modules = {}
self.constants = {}
Contributor

Maybe I'm missing something, but I don't see any place that constants is being set (or updated). I only see queries.

Contributor

nvm I see it now as self.constants[key].

Comment on lines +443 to +461
def add_constant_value(self, jitval, ctype, value):
    assert jitval not in self.constants
    self.constants[jitval] = (ctype, value)
Contributor

Q. Where can I find out more about the permissible values in jitval and what they mean?

Contributor Author

This is a "Value" in the JIT IR. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/ir/ir.h#L148 It represents a local variable or the result of a TorchScript expression.

Contributor

I don't see a conversion from struct Value to std::string or const char* in the file csrc/jit/ir/ir.h. How does one use jitval as a string in this context?

dreiss added a commit to dreiss/pytorch that referenced this pull request Oct 27, 2020
ghstack-source-id: e1fa978af170d4d00c5270c52b9d4cb63843e7d2
Pull Request resolved: pytorch#46780
}
""")
.replace("__DEFINE_CHECK_FUNCTIONS__", "\n".join(define_checks))
.replace("__LOAD_FUNCTIONS__", "\n".join(load_functions))
Contributor

When we find a symbol with dlsym, do we build it into the PyTorch binary? Would that introduce some binary size?

Contributor Author

The wrapper obviously has some size in the binary, but dlsym happens at runtime.

Contributor

Got it. So we assume that libneuralnetworks.so is available on-device.
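
Purely to illustrate the runtime-lookup idea (the actual wrapper is generated C++ using dlopen/dlsym, not Python):

import ctypes

# Resolve the NNAPI library and a symbol at runtime instead of linking against it.
nnapi = ctypes.CDLL("libneuralnetworks.so")         # analogous to dlopen
model_create = nnapi.ANeuralNetworksModel_create    # analogous to dlsym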

TENSOR_QUANT16_ASYMM = 12


class NNAPI_OperationCode(object):
Contributor

Does the NNAPI OperationCode have optional or default args like PT operators? For example, if pt::conv_2d() handles 5 args and the last one is optional, there could be calls with either 4 or 5 args, and both are valid. When converted to NNAPI's CONV_2D, would it also cover both the 4-arg and 5-arg situations?

Contributor Author

Each NNAPI op defines its own semantics. For example, Conv2D has two forms, one with 10 args and one with 13: https://android.googlesource.com/platform/frameworks/ml/+/refs/tags/android-11.0.0_r8/nn/runtime/include/NeuralNetworks.h#401 . But if you look closer, some of those args are "Available since API level 29", so there are also a 7-arg version and a different 10-arg version. I believe the two different 10-arg versions are distinguished by the types of the arguments.

Contributor

I see. We may need to find the best schema match for Conv2D manually when converting a conv2d node to nnapi::Conv2D.

value = getattr(obj, name)
output = node.outputsAt(0)
ctype = output.type()
self.add_constant_value(output, ctype, value)
Contributor

Good idea to freeze the model! So in NNAPI models, there's no concept of "parameters"? The weights and biases are all "constants"? Does that mean it does not have training capability?

Contributor Author

NNAPI is only for inference.

Contributor

Would this model only be used on platforms with NNAPI available (Android)?

Yes. (Though it is possible in theory to build the NNAPI CPU implementation for Linux and run the model directly, which is how I've been testing.)

Does that mean we may keep multiple versions of a model?
For Android, we deploy an NNAPI model.
For iOS, I'm wondering if there is a corresponding API layer (CoreML?)

Contributor

@iseeyuan for Metal, I believe it's integrated as a separate backend instead. I'm also curious to know why NNAPI isn't implemented as a separate backend. One thought I had is that PyTorch represents image Tensors as NCHW whereas NNAPI (based on @dreiss's diff summary) requires them to be in NHWC format, so implementing it as a separate backend may have required the conversion on every operator entry/exit? Not sure.

    self.add_getattr(node),
"prim::Constant": lambda self, node:
    self.add_constant_node(node),
"prim::ListConstruct": lambda self, node:
Contributor

Could NNAPI have corresponding prim::DictConstruct, prim::TupleConstruct, etc.? Do you have any plan to handle those nodes?

Contributor Author

NNAPI has no concepts of lists and dicts. It only operates on tensors and scalars: https://android.googlesource.com/platform/frameworks/ml/+/refs/tags/android-11.0.0_r8/nn/runtime/include/NeuralNetworks.h#68

Contributor

Like how you handled prim::ListConstruct, we could also write code to handle DictConstruct and TupleConstruct on the TorchScript side?

Contributor Author

Oh, yes, I suppose so.



def serialize_model(module, inputs, config=None):
    return _NnapiSerializer(config).serialize_model(module, inputs)
Contributor

Is inputs used inside _NnapiSerializer?

Contributor Author

Yes. We use it to determine the input shape.

ANEURALNETWORKS_PREFER_LOW_POWER = 0,
ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER = 1,
ANEURALNETWORKS_PREFER_SUSTAINED_SPEED = 2,
} PreferenceCode;
Contributor

It might be useful to guide the build with PreferenceCode. Should it be specified per model? Where could I find the usage?

Contributor Author

Yeah, I would like to expose this. I haven't figured out a good API for it yet.

@iseeyuan
Contributor

iseeyuan commented Oct 28, 2020

Awesome work on the model serialization and NNAPI binding! In addition to the embedded comments, I have some general questions:

  1. Looks like the serializer converts the TorchScript graph to an NNAPI model (a torch.nn.Module) and scripts it. Would this model only be used on platforms with NNAPI available (Android)?
  2. If we deploy this NNAPI model to production, do you estimate any blockers to converting it to bytecode?
  3. Is there a test covering serialization -> loading -> running, verifying that the results match the original TorchScript model?

@dreiss
Contributor Author

dreiss commented Oct 28, 2020

Would this model only be used on platforms with NNAPI available (Android)?

Yes. (Though it is possible in theory to build the NNAPI CPU implementation for Linux and run the model directly, which is how I've been testing.)

If we deploy this NNAPI model to production, do you estimate any blockers to converting it to bytecode?

I have tested with the lite interpreter. There are some minor blockers, but I think I have appropriate fixes ready.

Is there a test covering serialization -> loading -> running, verifying that the results match the original TorchScript model?

Unfortunately, it is currently only possible to run these models on Android. I will push harder on getting a host build of NNAPI so we can automate this.

@dhruvbird
Contributor

Regarding unit tests: @dreiss Is the NNAPI simulator API written by you or available in open source? If so, could it be checked in and tests run against that library? One option could be to have a differently code-generated copy of nnapi_wrapper.cpp which just calls into the emulation layer methods instead of going through the function pointers.

I have reviewed the code related to the general serialization and deserialization of the model in the NNAPI format and it seems fine. I checked if the case statements have some copy-paste issue, and it doesn't seem that they do. I would love to see some test coverage as @iseeyuan mentioned, but I guess there are practical limitations? Specifically, nnapi_wrapper, nnapi_model_loader, codegen, and parts of serializer seem fine.

I haven't reviewed the code which actually translates the call from PyTorch into the NNAPI backend, since it requires some understanding of how the translation takes place, and I'm not certain if I have that understanding yet. This includes most code in serializer, which does the actual translation from pt->nnapi.

@dreiss
Contributor Author

dreiss commented Oct 29, 2020

Is the NNAPI simulator API written by you or available in open source?

No. I have a heavily hacked-up version of the Android source tree that is able to build libneuralnetworks.so on Linux. I've asked the Android team about getting official support for this, but it is quite difficult. I will continue to ask them.

Comment on lines 48 to 51
if fmt == 0:
    fixed_args.append(args[idx].contiguous())
elif fmt == 1:
    fixed_args.append(args[idx].permute(0, 2, 3, 1).contiguous())
Contributor

Would it be feasible to add a comment here saying that fmt == 0 is NCHW (i.e. channels first) and fmt == 1 is NHWC, or maybe use an enum for this?

Contributor Author

Enum might be tricky since this needs to be TorchScript-compatible, but I can certainly add a comment.
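
Something along these lines (a sketch of the comment, reusing the snippet above):

# fmt encodes the memory format of the corresponding argument:
#   0 -> NCHW, PyTorch's native contiguous layout
#   1 -> NHWC ("channels last"), the layout NNAPI backends expect
if fmt == 0:
    fixed_args.append(args[idx].contiguous())
elif fmt == 1:
    fixed_args.append(args[idx].permute(0, 2, 3, 1).contiguous())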

Comment on lines +132 to +149
def convert_model_to_nnapi(model, inputs):
    model = torch.jit.freeze(model)
@dhruvbird (Contributor), Oct 30, 2020

It's interesting that converting a model to NNAPI requires providing inputs and running the model? Am I reading this correctly?

Seems like it based on https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690

i.e. model(input) ends up calling model.forward(input).

Contributor Author

Currently, we only run the model in order to get the number, shape, dtype, and qparams of the outputs. We could also do this without running the model.
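
Conceptually something like this (a sketch, not the actual helper; model and inputs as in the snippet above):

import torch

with torch.no_grad():
    outs = model(*inputs)
if isinstance(outs, torch.Tensor):
    outs = (outs,)
# Keep only "templates" that carry the count, shape, and dtype of each output.
out_templates = [torch.zeros(o.shape, dtype=o.dtype) for o in outs]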

out_mem_fmts: List[int],
out_templates: List[torch.Tensor]):
    super().__init__()
    self.ser_model = ser_model
Contributor

@dreiss Okay I think I now understand why you did this. Your serialized model (for NNAPI) is just a Tensor represented as a regular member of the original model so that it will get naturally serialized when the model is serialized to torchscript. Is that right?

Contributor Author

Yes.
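
Roughly (a sketch; the exact construction in the converter may differ):

import torch

ser_bytes = b"..."  # placeholder for the serializer's output
# Stored as a uint8 tensor attribute so it rides along with normal
# TorchScript serialization, with no side-channel file needed.
ser_model = torch.tensor(list(ser_bytes), dtype=torch.uint8)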

Comment on lines +69 to +75
TORCH_CHECK(serialized_model_tensor.is_contiguous());
c10::ArrayRef<uint8_t> ser_model = {
    serialized_model_tensor.data_ptr<uint8_t>(),
    serialized_model_tensor.nbytes()
};
Contributor

Okay this is how it's read in.

if (len % 4 == 0) {
  return len;
}
return len + 4 - (phys % 4);
Contributor

Unlikely that this will result in an overflow, but it could if your model has size 4G - 3 bytes or more.

Contributor Author

True, but I think we're a long way away from shipping a 4G model on NNAPI.

Comment on lines 463 to 480
def expand_sizes(self, size):
    return [s.item() for s in size]
Contributor

This method seems to be unused.

Contributor Author

Probably left-over from the old Caffe2 implementation.


# Pad with 0 bytes out to a multiple of 4 for alignment.
physical_length = ((source_length - 1) | 0x3) + 1
padded_data = data + (b"\0" * (physical_length - source_length))
Contributor

@dreiss Q. Does this assume a certain endianness?

Contributor Author

I don't think so. I assume little-endian somewhere else, though.

Comment on lines 76 to 84
@torch.jit.export
def __getstate__(self):
    return self.nnapi_module

@torch.jit.export
def __setstate__(self, nnapi_module):
    self.training = False
    self.nnapi_module = nnapi_module
    self.nnapi_module.init()
Contributor

Seems like get/set-state is just a way to set arbitrary metadata on a model. It's not clear to me if this method should get/set members that were previously set in the __init__ method, but seems like that is the intent.

#20242

Contributor Author

The purpose here is to make sure that init gets called when the model is loaded, without the user having to intervene.
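
So a plain save/load round trip is enough to re-initialize the NNAPI side (illustrative; wrapped_module and example_input are placeholders):

import torch

torch.jit.save(wrapped_module, "model_nnapi.pt")
loaded = torch.jit.load("model_nnapi.pt")   # __setstate__ runs here and calls init()
out = loaded(example_input)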

Comment on lines +1037 to +1337
    weight_permutation = (0, 2, 3, 1)
elif args.group == in_c:
    # Depthwise convolution
    depthwise = True
    weight_permutation = (1, 2, 3, 0)
else:
    raise Exception("Group convolution not supported yet.")

# TODO: Transform at load time to share weights with CPU model.
nnapi_weight_tensor = weight_tensor.permute(*weight_permutation).contiguous()
weight_id = self.add_tensor_operand_for_weight(nnapi_weight_tensor)
weight_oper = self.operands[weight_id]
Contributor

I'm wondering if you could add some helper methods like toNCHW and toNHWC, since this seems like a repeating pattern (i.e. permute the dimensions of the Tensor).

Comment on lines +522 to +540
for idx, node in enumerate(model.graph.nodes()):
    LOG.debug("Processing node #%d: %r", idx, node)
    self.add_node(node)
Contributor

Okay, this is the main graph that gets iterated over, and each node is a JIT node.

Comment on lines 515 to 556
self_jitval = next(model.graph.inputs())
self.add_constant_value(self_jitval, self_jitval.type(), model)

for input_value, input_tensor in zip(list(model.graph.inputs())[1:], inputs):
    op_id = self.add_tensor_operand_for_input(input_value, input_tensor)
    inp_dim_orders.append(self.operands[op_id].dim_order.value)

for idx, node in enumerate(model.graph.nodes()):
    LOG.debug("Processing node #%d: %r", idx, node)
    self.add_node(node)

retn = model.graph.return_node()
assert retn.inputsSize() == 1
assert retn.outputsSize() == 0
# TODO: Make outputs a local variable?
# TODO: Handle tuple-of-tensor return
for idx in range(1):
    op_id = self.jitval_operand_map[retn.inputsAt(0)]
    self.outputs.append(op_id)
    out_dim_orders.append(self.operands[op_id].dim_order.value)

Contributor

Okay so this is where the magic happens (it seems). i.e. once the model is run, somehow the JIT (runtime/interpreter) caches all intermediate state, and makes it available to anyone who wishes to inspect the JIT state -- which this code does and converts it into the custom NNAPI serialized Tensor (well serialized model).

@dhruvbird
Contributor

This change itself looks good (based on the stuff I understand and was able to review - verified parts of it with OSS documentation, and everything seems to check out). The actual operator conversion to NNAPI operators isn't reviewed for correctness.

A few things you could consider adding are:

  1. When serializing every individual type of entity, put some magic constants so that if some change is made, the deserializer can fail loudly if it expects something different. Currently, it seems like it can fail at a different point in time based on size checks or even worse, if 2 types of structs with the same size change positions, no one will know.
  2. Sprinkle the code with some documentation for key parts which are non-trivial or not super obvious.

@iseeyuan (Contributor) left a comment

LGTM in general. Is there guidance on how to convert and deploy an NNAPI model, with an example? That may help me run the code and dive a little deeper. I think that would be good for this first version in prototype status.

@codecov

codecov bot commented Nov 5, 2020

Codecov Report

Merging #46780 into gh/dreiss/77/base will decrease coverage by 0.32%.
The diff coverage is 0.35%.

@@                  Coverage Diff                  @@
##           gh/dreiss/77/base   #46780      +/-   ##
=====================================================
- Coverage              60.85%   60.52%   -0.33%     
=====================================================
  Files                   2751     2756       +5     
  Lines                 254447   255838    +1391     
=====================================================
+ Hits                  154849   154857       +8     
- Misses                 99598   100981    +1383     

@facebook-github-bot
Contributor

@dreiss merged this pull request in 9a9383e.

@facebook-github-bot deleted the gh/dreiss/77/head branch November 9, 2020 15:17