Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coroutine/arm64/Context.S: Support PAC/BTI #9306

Merged
merged 4 commits into from Dec 22, 2023

Conversation

kateinoigakukun
Copy link
Member

@kateinoigakukun kateinoigakukun marked this pull request as ready for review December 20, 2023 17:42
@ioquatix
Copy link
Member

My understanding is that this is insufficient.

What you are asking for, is for a shadow stack for call/return/jump instructions, but I don't think this metadata solves that problem. IIUC, this metadata change indicates that the code implements BTI/PAC without actually implementing it in the coroutine. Feel free to correct me if I'm wrong.

@kateinoigakukun
Copy link
Member Author

Ah, good point. I thought that this is a leaf function and we don't need to care about the protection stuff. But on second thought, the function reads the return address from the shadow stack, so we still need to sign the pointer. Also apparently we also need BTI landing pad. Will push some followup fixes.

@kateinoigakukun
Copy link
Member Author

kateinoigakukun commented Dec 21, 2023

@ioquatix Just curious: why x30 (LR) is saved twice in (str x30, [sp, 0xa0] and stp x29, x30, [sp, 0x90])? IIUC we can skip the second store/load and use ret instead of ret x4.

If we stick to use the return address stored in [sp, 0xa0], we need some extra instructions like:

-       # Load return address into x4
-       ldr x4, [sp, 0xa0]
+       # Load return address into x17
+       ldr x17, [sp, 0xa0]

        # Pop stack frame
        add sp, sp, 0xb0
+       mov x16, sp
+
+       autia1716

-       # Jump to return address (in x4)
-       ret x4
+       # Jump to return address (in x17)
+       ret x17

If we can just use [sp, 0x98] return register loaded to x30, we can simplify it much more:

        # Load return address into x4
-       ldr x4, [sp, 0xa0]

        # Pop stack frame
        add sp, sp, 0xb0

+       autiasp

-       # Jump to return address (in x4)
-       ret x4
+       # Jump to return address (in x30)
+       ret

Also for the case if the target doesn't need ISA backward compatibility, we can combine autiasp and ret into retaa.

@kateinoigakukun kateinoigakukun force-pushed the katei/arm64-pac-bti branch 2 times, most recently from 4918ee0 to d73ec62 Compare December 21, 2023 06:53
@ggardet
Copy link

ggardet commented Dec 21, 2023

I tested current patch, but unfortunately, the problem persists.

@kateinoigakukun
Copy link
Member Author

kateinoigakukun commented Dec 21, 2023

@ggardet Which configure option did you use? I tested ./configure cppflags=-mbranch-protection=standard cflags=-mbranch-protection=standard ASFLAGS=-mbranch-protection=standard LDFLAGS="-mbranch-protection=standard -z force-bti" on
aarch64 openSUSE Tumbleweed with GCC 13.


Side note: The configure command I proposed uses cppflags to append the option. It works but it's semantically wrong. This is because configure automatically append -mbranch-protection=pac-ret after user-provided CFLAGS. Ideally we can change configure to respect user-provided CFLAGS by prepending auto-guessed options before them. But we have a new version release soon and we have a workaround by cppflags, so I don't want to change build stuff at this moment.

@ggardet
Copy link

ggardet commented Dec 21, 2023

Here is my configure command:

[   12s] + export 'CFLAGS=-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fno-strict-aliasing -std=gnu99'
[   12s] + CFLAGS='-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fno-strict-aliasing -std=gnu99'
[   12s] + export 'LDFLAGS= -z force-bti '
[   12s] + LDFLAGS=' -z force-bti '
[   12s] + CONFIG_SHELL=/usr/bin/bash
[   12s] + export CONFIG_SHELL
[   12s] + CFLAGS='-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fno-strict-aliasing -std=gnu99'
[   12s] + export CFLAGS
[   12s] + CXXFLAGS='-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g'
[   12s] + export CXXFLAGS
[   12s] + FFLAGS='-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g '
[   12s] + export FFLAGS
[   12s] + FCFLAGS='-mbranch-protection=standard -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g '
[   12s] + export FCFLAGS
[   12s] + LDFLAGS=' -z force-bti '
[   12s] + export LDFLAGS
[   12s] + ./configure --host=aarch64-suse-linux-gnu --build=aarch64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-baseruby=/usr/bin/ruby --enable-pie --with-compress-debug-sections=none --program-suffix=.ruby3.2 --with-soname=ruby3.2 --target=aarch64-suse-linux --with-valgrind --with-mantype=man --enable-shared --disable-static --disable-rpath

I added -z force-bti to LDFLAGS to get the BTI warning and this should not be used for production/release.
And yes, this is on aarch64 openSUSE Tumbleweed with GCC 13.

EDIT: -mbranch-protection=pac-ret added by configure script may overwrite -mbranch-protection=standard which would enable only PAC and not PAC+BTI.

@kateinoigakukun
Copy link
Member Author

@ggardet

EDIT: -mbranch-protection=pac-ret added by configure script may overwrite -mbranch-protection=standard which would enable only PAC and not PAC+BTI.

Right, that's why I added cppflags=-mbranch-protection=standard

@kateinoigakukun
Copy link
Member Author

@ggardet Also please note that I added ASFLAGS=-mbranch-protection=standard too

@ggardet
Copy link

ggardet commented Dec 21, 2023

@ggardet

EDIT: -mbranch-protection=pac-ret added by configure script may overwrite -mbranch-protection=standard which would enable only PAC and not PAC+BTI.

Right, that's why I added cppflags=-mbranch-protection=standard

@ggardet Also please note that I added ASFLAGS=-mbranch-protection=standard too

Ok, that works now.
I still have few warnings, such as:

[  153s] gcc -I. -I.ext/include/aarch64-linux-gnu -I./include -I.   -DONIG_ENC_REGISTER=rb_enc_register -fPIC -O3 -fno-fast-math -ggdb3 -Wall -Wextra -Wdeprecated-declarations -Wdiv-by-zero -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation -Wpointer-arith -Wwrite-strings -Wold-style-definition -Wimplicit-fallthrough=0 -Wmissing-noreturn -Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wunused-variable -Wundef -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fno-strict-aliasing -std=gnu99 -fPIC  -o enc/trans/iso2022.o -c ./enc/trans/iso2022.c
[  153s] gcc -shared -o .ext/aarch64-linux-gnu/enc/encdb.so enc/encdb.o -L. -L. -L.  -mbranch-protection=standard -z force-bti  -fstack-protector-strong -rdynamic -Wl,-export-dynamic -Wl,--no-as-needed -mbranch-protection=standard -z force-bti  -Wl,--compress-debug-sections=none    -lruby3.2 -lm -lpthread  
[  153s] gcc -I. -I.ext/include/aarch64-linux-gnu -I./include -I.   -DONIG_ENC_REGISTER=rb_enc_register -fPIC -O3 -fno-fast-math -ggdb3 -Wall -Wextra -Wdeprecated-declarations -Wdiv-by-zero -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation -Wpointer-arith -Wwrite-strings -Wold-style-definition -Wimplicit-fallthrough=0 -Wmissing-noreturn -Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wunused-variable -Wundef -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fno-strict-aliasing -std=gnu99 -fPIC  -o enc/trans/japanese.o -c ./enc/trans/japanese.c
[  153s] /usr/lib64/gcc/aarch64-suse-linux/13/../../../../aarch64-suse-linux/bin/ld: /tmp/ccM1RRpJ.ltrans0.ltrans.o: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section.
[  153s] /usr/lib64/gcc/aarch64-suse-linux/13/../../../../aarch64-suse-linux/bin/ld: /tmp/ccjJtYJo.debug.temp.o: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section.

It looks like some gcc command lines are missing the -mbranch-protection=standard.

@kateinoigakukun
Copy link
Member Author

@ggardet I guess you are setting CFLAGS to append -mbranch-protection=standard but I use cflags (lower case version). Could you try with this again?

coroutine/arm64/Context.h Outdated Show resolved Hide resolved
@ioquatix
Copy link
Member

I read the documentation, thanks for the links. I was under the impression we'd need to implement a shadow stack like how it works with CET. However, it seems like this is sufficient. I'm a little cautious about merging this so soon before 3.3.0 - what are your thoughts on trying to get in before the next release?

@ioquatix
Copy link
Member

Just curious: why x30 (LR) is saved twice in (str x30, [sp, 0xa0] and stp x29, x30, [sp, 0x90])? IIUC we can skip the second store/load and use ret instead of ret x4.

Probably because I was not an expert at the assembler and just did my best to get something that looked good enough. Feel free to improve it as you see fit :)

@kateinoigakukun
Copy link
Member Author

@ioquatix

I'm a little cautious about merging this so soon before 3.3.0 - what are your thoughts on trying to get in before the next release?

I think it's relatively low-risk to get this in 3.3.0 release, let's merge soon after some fixup commits.

Probably because I was not an expert at the assembler and just did my best to get something that looked good enough. Feel free to improve it as you see fit :)

Okay, thank you for clarifying!

@kateinoigakukun
Copy link
Member Author

The RJIT workflow failure is unrelated to this change, it's also failing on master. https://github.com/ruby/ruby/actions/runs/7294360546/job/19879066887

We don't need to save/restore x30 twice, and we can just use `ret`,
which uses x30 as return address register instead of explicit `ret <reg>`
instruction. This also allows us to use `autiasp` instead of `autia1716`
and we can skip setting SP/LR to x16/x17.

Also the size of register save area is shrunk by 16 bytes due to the
removal of extra x30 save/restore.
Copy link
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kateinoigakukun
Copy link
Member Author

Thanks!

@kateinoigakukun kateinoigakukun merged commit fa0f752 into ruby:master Dec 22, 2023
97 checks passed
@kateinoigakukun kateinoigakukun deleted the katei/arm64-pac-bti branch December 22, 2023 07:36
@ggardet
Copy link

ggardet commented Dec 22, 2023

@ggardet I guess you are setting CFLAGS to append -mbranch-protection=standard but I use cflags (lower case version). Could you try with this again?

Ok, that works now.
But why cflags (lower case version) instead of being consistent across the code and use CFLAGS (upper case) everywhere?

@kateinoigakukun
Copy link
Member Author

CFLAGS and cflags are different variables having different semantics and both are used in Ruby. CFLAGS can be configured by user and configure.ac appends guessed flags. cflags is appended after CFLAGS.

@kateinoigakukun kateinoigakukun changed the title coroutine/arm64/Context.S: Append PAC/BTI note section if needed coroutine/arm64/Context.S: Support PAC/BTI Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants