@@ -31,19 +31,16 @@ BEGIN_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon
# - r12 = quantization_params
LDR r12, [sp, 12]

-PUSH {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+PUSH {r0, r3, r4, r5, r6, r7, r8, r9, r10, r11, lr}
VPUSH {d8-d15}

-STR r0, [sp, #-8]
Contributor:
Aren't we clobbering d8, unintentionally?
This is done elsewhere too, unfortunately; do you want to take a stab at the other kernels too?

Author:
> Aren't we clobbering d8, unintentionally?

No: the r* registers are pushed first and then the d* registers, all above the SP. The read offsets for r0 and r3 are adjusted accordingly.

> This is done elsewhere too, unfortunately; do you want to take a stab at the other kernels too?

We didn't find additional issues in our QNNPACK version, in files such as 8x8-aarch64-neon.S: unlike STR r0, [sp, #-8] (which does not modify sp), instructions such as STP d15, d14, [sp, -16]! pre-index sp with writeback, so the stored values end up at or above the stack pointer and are kept safe.
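The offset arithmetic behind this reply can be sanity-checked with a short sketch (hypothetical helper name; assumes 4-byte r-registers, 8-byte d-registers, and a full-descending stack per AAPCS32):

```python
R_SIZE, D_SIZE = 4, 8  # bytes per aarch32 r-register / d-register

def arg_offset(old_offset: int, old_r_count: int, new_r_count: int) -> int:
    # Stack arguments sit above everything the prologue pushes, so adding
    # registers to the PUSH list shifts every argument's sp-relative
    # offset up by the extra bytes saved.
    return old_offset + (new_r_count - old_r_count) * R_SIZE

# The old prologue saved 9 r-registers (r4-r11, lr); the new one saves 11
# (r0, r3, r4-r11, lr), i.e. 8 extra bytes below the same arguments.
print(arg_offset(100, 9, 11))  # output:           [sp, 100] -> [sp, 108]
print(arg_offset(104, 9, 11))  # input_stride:     [sp, 104] -> [sp, 112]
print(arg_offset(108, 9, 11))  # output_increment: [sp, 108] -> [sp, 116]

# PUSH stores lower-numbered registers at lower addresses, and the later
# VPUSH {d8-d15} lowers sp by another 8 * D_SIZE bytes, so after the
# prologue r0 and r3 are found at:
print(8 * D_SIZE + 0 * R_SIZE)  # r0 -> [sp, 64]
print(8 * D_SIZE + 1 * R_SIZE)  # r3 -> [sp, 68]
```

The printed values match the new offsets in the diff below.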

-STR r3, [sp, #-4]

# Load the address zero_point array.
# For depth wise kernels the array is of single element.
LDR r5, [r12], 4

# Load o:
# - lr = o = output
-LDR lr, [sp, 100]
+LDR lr, [sp, 108]

# Load kernel zero point:
# - d31 = vkernel_zero_point
@@ -90,11 +87,11 @@ BEGIN_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon
0:
# Load input stride
# - r3 = input_stride
-LDR r3, [sp, 104]
+LDR r3, [sp, 112]

# Load c:
# - r0 = c = channels
-LDR r0, [sp, #-8]
+LDR r0, [sp, 64]

# Load i0, i1, i2, i3, i4, i5, i6, i7, i8
# - r4 = i0
@@ -117,7 +114,7 @@ BEGIN_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon

# Load w:
# - r3 = w = weights
-LDR r3, [sp, #-4]
+LDR r3, [sp, 68]

BLO 2f

@@ -394,7 +391,7 @@ BEGIN_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon
5:
# Load output increment
# - r3 = output_increment
-LDR r3, [sp, 108]
+LDR r3, [sp, 116]

# Decrement output width
SUBS r1, r1, 1
@@ -406,7 +403,7 @@ BEGIN_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon
BNE 0b

VPOP {d8-d15}
-POP {r4, r5, r6, r7, r8, r9, r10, r11, pc}
+POP {r0, r3, r4, r5, r6, r7, r8, r9, r10, r11, pc}
END_FUNCTION pytorch_q8dwconv_ukernel_up8x9__aarch32_neon

#ifdef __ELF__
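As a footnote to the addressing-mode point in the review thread, a minimal sketch (hypothetical helper, Python standing in for the hardware) of why a plain negative-offset store leaves its value below sp while a pre-indexed store with writeback does not:

```python
def stored_below_sp(sp_before: int, offset: int, writeback: bool) -> bool:
    # A store to [sp, offset] writes at sp_before + offset; with writeback
    # ('!' in A64 syntax) sp itself moves down to that address, so the
    # value ends up at or above the new sp. Data below sp may be
    # clobbered at any time (e.g. by a signal handler), which is why the
    # old STR-below-sp pattern was unsafe.
    addr = sp_before + offset
    sp_after = addr if writeback else sp_before
    return addr < sp_after

print(stored_below_sp(0x1000, -8, writeback=False))  # STR r0, [sp, #-8]        -> True  (unsafe)
print(stored_below_sp(0x1000, -16, writeback=True))  # STP d15, d14, [sp, -16]! -> False (safe)
```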