Summary: Initial CMSIS-NN Add Op #13296
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13296
Note: Links to docs will display an error until the docs builds have completed. ❌ 1 New Failure as of commit 7262dda with merge base dfc387b. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 23666ac to b166a2e (Compare)
backends/cortex_m/ops/op_add.cpp (Outdated)
ET_LOG(Info, "Input1 dtype: %d, Input2 dtype: %d, Output dtype: %d",
    static_cast<int>(input1_dtype), static_cast<int>(input2_dtype), static_cast<int>(out_dtype));

// Stub for now
extern "C" { | ||
#include "Include/arm_nnfunctions.h" | ||
} |
    Tensor& out) {
  ET_LOG(Info, "add_Tensor kernel called");

  // Ensure input is char type
"other.scalar_type() %" PRId8 " is not char type", | ||
static_cast<int8_t>(other.scalar_type())); | ||
|
||
// Stub for now |
// 🔧 FIX: Use template types that ExecuTorch definitely provides
// Use to<int64_t>() and to<double>(), which are commonly instantiated
int64_t zp1 = input1_zero_point.to<int64_t>();
  ET_LOG(Error, "quantized_add_out: arm_elementwise_add_s8 failed with status [%d]", status);
  std::memset(out.mutable_data_ptr<int8_t>(), 0, out.nbytes());
} else {
  ET_LOG(Info, "quantized_add_out: Successfully completed with AoT-computed parameters! 🎯");
ET_LOG(Info, "quantized_add: input1_int8.sizes() = %zu", input1_int8.sizes().size()); | ||
return const_cast<Tensor&>(input1_int8); // to make compiler happy |
backends/cortex_m/ops/op_add.cpp (Outdated)
@@ -0,0 +1,44 @@
/*
backends/cortex_m/ops/operators.py (Outdated)
# Call the backend kernel that writes into 'out'
return exir_ops.edge.cortex_m.add.out(self, other, alpha, out)
backends/cortex_m/ops/operators.py (Outdated)
    output_multiplier: int,
    output_shift: int,
) -> torch.Tensor:
    return torch.empty_like(self, dtype=torch.int8)
backends/cortex_m/ops/operators.py (Outdated)
# For now, convert back to float, add, and quantize (as placeholder)

# Dequantize inputs using multiplier/shift
self_fp = (self.float() - self_zero_point) * (self_multiplier / (1 << (31 - self_shift)))
backends/cortex_m/ops/operators.py (Outdated)
    out: torch.Tensor,
) -> torch.Tensor:
    # Validate shape compatibility if needed
    assert out.shape == self.shape, "Output shape must match input shape"
backends/cortex_m/ops/operators.yaml (Outdated)
  kernels:
    - arg_meta: null
      kernel_name: cortex_m::dequantize_per_tensor_out
- func: aten::add.Tensor(Tensor input1, Tensor input2, *, Tensor(a!) out) -> Tensor(a!)
    - arg_meta: null
      kernel_name: cortex_m::add_out

- func: cortex_m::quantized_add(Tensor self, Scalar self_zero_point, Scalar self_multiplier, Scalar self_shift, Tensor other, Scalar other_zero_point, Scalar other_multiplier, Scalar other_shift, Scalar output_zero_point, Scalar output_multiplier, Scalar output_shift) -> Tensor
import torch
import math

class QuantizedAddFusionPass(ExportPass):
        and dequant_node2.target == exir_ops.edge.cortex_m.dequantize_per_tensor.default):
    continue

print("✅ Found complete cortex_m Q/DQ + add pattern!")
Can we simplify this by using subgraph_rewriter? I am thinking of this as a common pattern for most of these ops. See https://docs.pytorch.org/executorch/stable/compiler-custom-compiler-passes.html#level-2
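For illustration, a rough sketch of what that could look like with torch.fx's subgraph_rewriter. The pattern/replacement functions, the argument order of the cortex_m dequantize/quantize ops, and the placeholder multiplier/shift constants below are assumptions, not the actual pass:

import torch
from torch.fx import subgraph_rewriter
from executorch.exir.dialects._ops import ops as exir_ops

def pattern(x, x_scale, x_zp, y, y_scale, y_zp, out_scale, out_zp):
    # dq(x) + dq(y) followed by q(result): the shape the pass currently matches by hand.
    x_fp = exir_ops.edge.cortex_m.dequantize_per_tensor.default(x, x_scale, x_zp, -128, 127, torch.int8)
    y_fp = exir_ops.edge.cortex_m.dequantize_per_tensor.default(y, y_scale, y_zp, -128, 127, torch.int8)
    out_fp = exir_ops.edge.aten.add.Tensor(x_fp, y_fp)
    return exir_ops.edge.cortex_m.quantize_per_tensor.default(out_fp, out_scale, out_zp, -128, 127, torch.int8)

def replacement(x, x_scale, x_zp, y, y_scale, y_zp, out_scale, out_zp):
    # Argument order follows the operators.yaml signature above; the multiplier/shift
    # values (1 << 30, 0) are placeholders, since the real pass computes them AoT from the scales.
    return exir_ops.edge.cortex_m.quantized_add.default(
        x, x_zp, 1 << 30, 0, y, y_zp, 1 << 30, 0, out_zp, 1 << 30, 0)

# matches = subgraph_rewriter.replace_pattern(graph_module, pattern, replacement)

One wrinkle is that quantized_add consumes AoT-computed multipliers/shifts rather than raw scales, so a plain pattern/replacement rewrite still needs a follow-up step over the returned matches to fill those in, which may be why this did not work out of the box.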
Agree, I am looking to replace this with subgraph_rewriter. I haven't successfully managed to do it so far. Maybe a follow-up PR, if you are ok with it.
create an issue?
All CMSIS-NN ops are quantized so adding one pass per op for fusing them in this way will result in a lot of duplicate code. In the arm backend we "fold" all q/dq-ops into quantized ops like this: https://github.com/pytorch/executorch/blob/main/backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py.
Could we reuse that here? And then have one single pass for mapping edge ops with correct dtypes to arm ops perhaps?
Nice! Let me look into this pass and see if I can reuse it for this purpose
Reviewed the fold_qdq pass; I think it fundamentally serves a different purpose. As far as I can tell, that pass only removes Q/DQ nodes and stores the quantization parameters as node metadata. The current cortex_m pass, on the other hand, replaces the operation itself with custom CMSIS-NN operators (like cortex_m::quantized_add.out) and computes the CMSIS-NN-specific multipliers and shifts.
Moreover, with the refactored changes (latest version), quantized_op_fusion_pass.py is already properly extensible: we can easily add more operations by extending the SUPPORTED_OPS_MAPPING dictionary without creating separate passes for each operation.
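For context, the multiplier/shift computation mentioned here generally follows the standard CMSIS-NN/TFLite fixed-point scheme; a minimal sketch (not necessarily the exact helper used in this PR):

import math

def quantize_multiplier(scale: float) -> tuple[int, int]:
    # Decompose a float scale into a Q31 fixed-point multiplier and a base-2
    # exponent (shift), the representation CMSIS-NN kernels such as
    # arm_elementwise_add_s8 consume.
    if scale == 0.0:
        return 0, 0
    mantissa, exponent = math.frexp(scale)  # scale = mantissa * 2**exponent, 0.5 <= mantissa < 1
    multiplier = round(mantissa * (1 << 31))
    if multiplier == (1 << 31):  # rounding pushed the mantissa up to 1.0
        multiplier //= 2
        exponent += 1
    return multiplier, exponent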
@digantdesai Here is the issue to track subgraph_rewriter: #13627
Great improvement with the generalized pass! FWIW this is closer to what we started out with in the Arm backend and we moved to folding the Q/DQ ops separately from all other lowering logic to simplify development as the backend grew more complex. This backend is smaller though so it might not be as big of an issue, we'll see.
# Step 2: Apply fusion pass
fusion_pass = QuantizedAddFusionPass()
final_program = intermediate_program.transform([fusion_pass])
You can pass these as a list; no need to call transform twice.
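Something like the following (a sketch reusing the pass names from this PR, assuming transform accepts the list directly, as in the snippet above):

# Run both passes in a single transform() call instead of two.
passes = [ReplaceQuantNodesPass(), QuantizedAddFusionPass()]
final_program = intermediate_program.transform(passes)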
Done
# Check fusion occurred
check_count(transformed_graph, exir_ops.edge.cortex_m.quantized_add.default, 1)

# Verify numerical equivalence
why are these checks commented out?
OK, the test case was doing the wrong thing previously, in the sense that it was comparing fp32 output with quantized int8 output. I fixed this numerical-equivalence check in the latest iteration and it passes as expected.
examples/arm/aot_arm_compiler.py (Outdated)
gm = edge.exported_program().graph_module

logging.debug(">>> Lowered GraphModule code <<<")
logging.debug(gm.code)  # Python-style source of the graph
logging.debug(">>> Lowered GraphModule nodes <<<")
for node in gm.graph.nodes:
    logging.debug(f"Node: {node.target}, args={node.args}, kwargs={node.kwargs}")
logging.debug("==== Graph after quantization ====")
do we need this?
I just left it, as it proved very useful while debugging this pass.
Generally we do not keep debugging code around upstream unless it is not overly verbose and there is a really good reason for it, so I would prefer to see this removed.
examples/arm/run.sh (Outdated)
"--delegate --quantize" # 4 qadd2 | ||
"--delegate --quantize" # 5 qops | ||
"--delegate --quantize" # 6 mv2 | ||
"--quantize" # 5 qadd2 (quantize only) |
Why is this test added here?
This test case, qadd2 (no delegation, quantize only), serves as a good e2e test case on the FVP simulator to validate the fused quantized node flow with CMSIS-NN integration.
I like that we are testing this, but adding these to the default list of models to run in run.sh will scale very poorly as we add more ops, so I do not think it is the best place for it. Could it be added as a test suite in https://github.com/pytorch/executorch/blob/main/backends/arm/test/test_arm_baremetal.sh perhaps (or even have a separate test_cortex_m.sh in this backend)?
examples/arm/aot_arm_compiler.py (Outdated)
edge = edge.transform([ReplaceQuantNodesPass()])
# Instantiate the pass
replace_quant_pass = ReplaceQuantNodesPass()
quantized_add_fusion_pass = QuantizedAddFusionPass()
Maybe not in this PR but the Cortex-M backend will need a more structured way of running passes soon
)

# Link directly to the CMSIS-NN static library file
target_link_libraries(
Why is it done this way instead of directly adding the library target?
The CMSIS-NN library is linked by specifying the static library file path directly because CMSIS-NN is brought in via FetchContent and does not define a CMake target in the build.
using Error = executorch::runtime::Error;

// Basic tensor type / layout validation and dimension order checking
inline void validate_quantized_tensor_types_and_dim_order(
Most ops will require channels-last dim_order and there are multiple dtypes available so this name might be confusing in the future.
    output_shift_val);

// Call CMSIS-NN kernel with precomputed parameters
arm_cmsis_nn_status status = arm_elementwise_add_s8(
This could potentially be added as a pass as well: https://github.com/pytorch/executorch/blob/main/backends/arm/_passes/broadcast_args_pass.py. But long term the ideal solution would be to add broadcast support to CMSIS-NN to get it accelerated w/o memcopies.
inline void validate_quantization_params(
    const Scalar& zero_point1,
    const Scalar& multiplier1,
    const Scalar& shift1,
    const Scalar& zero_point2,
    const Scalar& multiplier2,
    const Scalar& shift2,
    const Scalar& output_zero_point,
    const Scalar& output_multiplier,
    const Scalar& output_shift,
    Tensor& output) {
Make it a util that checks a single set of quant params. Suggested change:

inline void validate_quantization_params(
    const Scalar& zero_point,
    const Scalar& multiplier,
    const Scalar& shift) {
      out_shift_val);
}

inline Error resize_to_broadcast_target_size_quantized(
why quantization in the name?
Removed
// Initialize shapes with 1s for padding
for (int i = 0; i < max_dim; i++) {
  inp1_shape[i] = 1;
  inp2_shape[i] = 1;
  out_shape[i] = 1;
}

int offset_inp1 = max_dim - input1.dim();
int offset_inp2 = max_dim - input2.dim();
int offset_out = max_dim - output.dim();

for (int i = 0; i < input1.dim(); i++) {
  inp1_shape[i + offset_inp1] = input1.size(i);
}
for (int i = 0; i < input2.dim(); i++) {
  inp2_shape[i + offset_inp2] = input2.size(i);
}
for (int i = 0; i < output.dim(); i++) {
  out_shape[i + offset_out] = output.size(i);
}
Do we need all this logic if we are going to call the get_broadcast_target_size util?
Not needed anymore, removed it
// In the pass we are calling quantized_add's default variant
// but ExecuTorch's kernel dispatch mechanism will end up calling the out
// variant. This stub is to make sure that compiler doesn't complain.
Tensor quantized_add(
remove this?
      Error,
      "quantized_add_out: arm_elementwise_add_s8 failed with status [%d]",
      status);
  std::memset(out.mutable_data_ptr<int8_t>(), 0, out.nbytes());
Why do we need this? Set a failure flag with the appropriate Error in ctx (ctx.fail(err)), so the runtime doesn't keep going with wrong data.
Done
Thanks @psiddh, left some nit comments.
# Copy essential fields
for field in ["val", "tensor_meta", "stack_trace"]:
    if field in source_node.meta:
        new_node.meta[field] = source_node.meta[field]
Wouldn't val here be fp32?
Yes, the val field in the node metadata is fp32, from the original unquantized computation. This was done for debugging; I removed it in the latest iteration.
run code formatting?
examples/arm/aot_arm_compiler.py (Outdated)
logging.debug(">>> Lowered GraphModule code <<<") | ||
logging.debug(gm.code) # Python‐style source of the graph | ||
logging.debug(">>> Lowered GraphModule nodes <<<") | ||
for node in gm.graph.nodes: | ||
logging.debug(f"Node: {node.target}, args={node.args}, kwargs={node.kwargs}") | ||
logging.debug("==== Graph after quantization ====") | ||
|
Nit: remove?
Can you convert these instructions to CI? Could be a follow-up PR.
I believe we should be able to run it on our AWS Graviton instances.
Looks like there is test_arm_baremetal.sh running; either extend that or create new tests to exercise the code you just added, please.
Test Plan:
examples/arm/run.sh - No regressions ==> Ok
examples/arm/run.sh now runs 'qadd2' in quantize only mode ==> Ok
python -m unittest test_replace_quant_nodes.py ==> Ok
python -m unittest test_quantize_op_fusion_pass.py ==> Ok
# Add dependency to ensure CMSIS-NN builds before we try to link. Use the
# actual CMSIS-NN target name (usually 'cmsis-nn')
add_dependencies(cortex_m_kernels cmsis-nn)
Is there no C++ testing?

if(CORTEX_M_BUILD_TESTS)
  enable_testing()
  add_subdirectory(tests)
endif()
@mergennachin Created a task to track a dedicated test suite for cortex_m ops (it should cover Python tests, C++ unit tests & E2E tests): #13739 (and the initial work on this: PR #1357).
FYI, we can't run Cortex-M op unit tests without the FVP, but for C++-only utils, sure.
target_link_libraries(cortex_m_kernels PRIVATE executorch)
target_compile_options(cortex_m_kernels PUBLIC ${_common_compile_options})

# Include directories for cortex_m_kernels
Besides _common_compile_options, don't you also need Cortex-M-specific compile options?
comes from the toolchain file.
Three follow-up issues identified to make further improvements:
Force-pushed from d9b48d3 to 5768f06 (Compare)
Test Plan:
a) Set up the Arm FVP and run examples/arm/run.sh (check no regressions in e2e test scenarios)
b) Then add another run.sh iteration of qadd with only the --quantize flag and see that the quantized add op is called
c) cd backends/cortex_m/test/; python test_quantize_op_fusion_pass.py
   Ran 9 tests in 11.128s
   OK
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218