Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#50: Add architecture detection, and ARMv8 support, to zwo_fixer's PLT hooking functionality #51

Merged
merged 9 commits into from
Apr 3, 2021

Conversation

jgottula
Copy link
Collaborator

@jgottula jgottula commented Jul 5, 2020

I had a sudden realization after thinking about issue #50 for a little while, that the timeouts may actually have been happening due to libzwo_fixer screwing things up, because apparently it was under the very incorrect presumption that everything was detected properly and that it was running on an AMD64 system, and was therefore patching in AMD64 hook code at undoubtedly incorrect/random addresses in the libASICamera2.so ARMv8 library!

So, the first thing that needed correcting was to do proper architecture detection. Previously there was fairly rigorous detection of the ASI SDK version; however, nothing actually stopped the library from blissfully doing completely wrong things if it was running on a non-AMD64 machine. So now, there are preprocessor checks to categorize the machine the library is being compiled on, and either allow compilation to continue, or annoy the user with an error/warning message telling them that they're attempting to build for an architecture that doesn't actually have the necessary code support built in (including the actual offsets in the relevant platform's variation of the libASICamera2.so binary).

After that, I took a look at how the PLT is handled by the runtime dynamic linker on AArch64 vs how it's done on AMD64, and it's actually pretty similar. And given that it was trivial to locate the necessary offsets in the ARMv8 libASICamera2.so binary, I figured I would just go ahead and implement some multi-arch support into the library. So now it supports both AMD64 and ARMv8. (But not i386, ARMv5, ARMv6, ARMv7, or the Mac or Windows versions. The former four could probably be done pretty easily if the desire ever arose.)

Making the code simpler / more unified / more architecture-generic steered me toward altering the way I had been doing the PLT hooks. Previously I was doing a frankly somewhat boneheaded thing where I wrote a custom AMD64 asm stub over the PLT entry itself. I could have taken the same approach for ARMv8, but I'd have to do a separate asm implementation, and the ARM ISA instruction encoding is a gigantic pain in the ass (particularly because something ostensibly easy like "hey let's just load up a 64-bit immediate value into a register and then jump to it" requires like 7 steps because you can't fit very many bits into a fixed-size 32-bit instruction word). And there were other downsides too: the way I was doing it required me to flip the page protection flags on the memory whenever the hook was enabled or disabled, because the PLT stub itself is a code section with R-X permissions, and modifying the code requires RW-. (Oh and on ARM, unlike x86, doing self-modifying-code-type-stuff doesn't "just work", you have to actually explicitly make sure the L1 I$ and D$ synchronize properly and so forth.) Plus, I was doing some questionable things anyway like using memcpy to overwrite code, which is potentially problematic because there's no guarantee that memcpy isn't going to copy one byte at a time and temporarily put the code into a partially-overwritten state.

So now, I just take advantage of the fact that the PLT entries are already set to just do an indirect jump via their GOT PLT slot (which is in a data section that is already RW-), and I literally just do a single 64-bit atomic exchange operation to overwrite the pointer that's used for the indirect jump. I probably should have been doing this originally, but I think there were plausible reasons why I didn't want to do it. In any case, I think the new way is much simpler and less dumb overall.

@jgottula jgottula requested a review from bgottula July 5, 2020 02:46
@jgottula jgottula added the enhancement New feature or request label Jul 5, 2020
@jgottula
Copy link
Collaborator Author

jgottula commented Jul 5, 2020

@bgottula Make sure to give this a test run for me to make sure I didn't cause any runtime regressions. I did a basic "run the compiler on it" sort of "test", but that's about as far as I could go, given my lack of camera hardware here. (I suppose I could have run a stub program that loads libASICamera2.so but doesn't do actual camera operations, just to see if the fixer library inits properly and doesn't crash it, but meh.)

(Also, though I do have GCC ARM cross-compilers handy, I don't know of any convenient hardware off the top of my head to test run it on, so I couldn't really actually test that the ARMv8 stuff works properly either.)

@jgottula
Copy link
Collaborator Author

jgottula commented Jul 5, 2020

🦾 <-- Hey look, it's an ARM.

@bgottula
Copy link
Collaborator

bgottula commented Jul 5, 2020

Didn't the OP on #49 want arm7 rather than arm8 support?

@jgottula
Copy link
Collaborator Author

jgottula commented Jul 5, 2020 via email

@jgottula
Copy link
Collaborator Author

jgottula commented Jul 5, 2020 via email

@VigibotDev
Copy link

I can test the armv7 module on a PI 4 if you want.

@bgottula
Copy link
Collaborator

bgottula commented Jul 7, 2020

@serveurperso that would be great.

@bgottula
Copy link
Collaborator

@jgottula what do you want to do with this one?

@serveurperso do you ever plan to test this, or did @jgottula do all this work for nothing?

@jgottula
Copy link
Collaborator Author

@jgottula what do you want to do with this one?

Actually yesterday, among other things, I was looking into expanding the apt package support to include the various ARM binaries.

Didn't get around to it just yet because it was lower priority; but it is on my radar.

@bgottula
Copy link
Collaborator

Okay. Well I don't have any current plans to use ARM so I'll leave it up to you. I guess I'll leave this open then?

@jgottula
Copy link
Collaborator Author

Rebased ARMv7 and ARMv8 support onto SDK version 1.16.3.

Comment on lines +57 to 75
ifeq ($(PLATFORM), armv5)
# CC = arm-none-linux-gnueabi-g++
CFLAGS += -march=armv5
endif

ifeq ($(PLATFORM), armv6)
# CC = arm-linux-gnueabi-g++
CFLAGS += -march=armv6
endif

ifeq ($(PLATFORM), armv7)
# CC = arm-linux-gnueabihf-g++
CFLAGS += -march=armv7 -mcpu=cortex-m3 -mthumb
endif

ifeq ($(PLATFORM), armv8)
# CC = aarch64-linux-gnu-g++
endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on with the commented-out CC lines?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, those are based (slightly loosely) on what's in the original demo makefile here: https://github.com/seeing-things/zwo/blob/master/1.16.3/linux_sdk/demo/Makefile#L53-L85

In principle they would be necessary if you were, for example, cross-compiling the capture program on your Intel box for your ARM box with your gcc cross-compiler for ARM. Because in that situation, the main/non-prefixed gcc is the one that builds binaries targeted for your Intel box. And the prefixed version is the one that builds binaries targeted for the ARM box.

When building for the ARM box on the ARM box itself, you don't actually really need the toolchain prefix, because the main/non-prefixed gcc in that situation would be the one targeting the ARM box itself.

(Fun exercise: I bet that on any of your AMD64 boxes, you can run regular gcc via just gcc but also via x86_64-pc-linux-gnu-gcc or similar. But because the host and target arch are the same, you normally just use gcc to build stuff.)

So... yeah, these prefixes are only really needed for cross-compiling... but somebody might want them... I dunno... in any event, I halfheartedly left them in, in commented-out form, as a sort of "whatever" compromise.

@jgottula jgottula force-pushed the add_arm_support branch 2 times, most recently from d92d3c5 to e6ba4c2 Compare March 31, 2021 07:31
@bgottula
Copy link
Collaborator

bgottula commented Apr 1, 2021

I can build from this branch, but the program crashes:

(venv) rgottula@nix:~/dev/zwo/capture$ make
make -C ../zwo_fixer all
make[1]: Entering directory '/home/rgottula/dev/zwo/zwo_fixer'
g++ -std=c++17 -fno-exceptions -fno-strict-aliasing -D_GLIBCXX_USE_CXX11_ABI=0 -fdiagnostics-color=always -Wall -Wno-unused-variable -Wno-unused-function -O1 -fuse-linker-plugin -fvisibility=hidden -fvisibility-inlines-hidden -g3 -gdwarf-4 -fvar-tracking-assignments -fno-omit-frame-pointer -fuse-ld=gold -shared -fPIC -fno-plt -fno-gnu-unique -rdynamic -Wl,--no-gc-sections -Wl,--no-undefined -Wl,-z,defs -lbsd -ldl -lelf -lusb-1.0 -o libzwo_fixer.so zwo_fixer.cpp
make[1]: Leaving directory '/home/rgottula/dev/zwo/zwo_fixer'
mkdir -p bin
g++ src/Frame.cpp src/SERFile.cpp src/agc.cpp src/disk.cpp src/preview.cpp src/camera.cpp src/capture.cpp -o bin/capture -std=c++17 -Wall -g -D_LIN -D_GNU_SOURCE -I./include -pthread -DGLIBC_20 -O3 -m64 -lrt -lbsd -lopencv_core -lopencv_highgui -lopencv_imgproc -I/usr/include/opencv4 -I../1.16.3/linux_sdk/include -L../1.16.3/linux_sdk/lib/x64 -l:libASICamera2.so.1.16.3 -I../zwo_fixer -Wl,-rpath,../zwo_fixer -Wl,-rpath,/home/rgottula/dev/zwo/zwo_fixer -L../zwo_fixer -l:libzwo_fixer.so
mkdir -p bin
g++ write_benchmark.cpp -o bin/write_benchmark -std=c++17 -Wall -g -D_LIN -D_GNU_SOURCE -I./include -pthread -DGLIBC_20 -O3 -m64 -lrt -lbsd
(venv) rgottula@nix:~/dev/zwo/capture$ ./bin/capture 
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

git bisect tells me this commit introduced the bug:

8dd8b6d822dcd0ad2e788e513e48d54faab9d035 is the first bad commit
commit 8dd8b6d822dcd0ad2e788e513e48d54faab9d035
Author: Justin Gottula <justin@jgottula.com>
Date:   Sat Jul 4 19:20:53 2020 -0700

    #50: Rewrite PLTHook to be more generic (just overwrite .got.plt slot)

 zwo_fixer/internal.hpp  | 97 ++++++++++++++++++++++---------------------------
 zwo_fixer/zwo_fixer.cpp |  2 +-
 2 files changed, 45 insertions(+), 54 deletions(-)

Here's a stack trace:

(gdb) run
Starting program: /home/rgottula/dev/zwo/capture/bin/capture 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6caa859 in __GI_abort () at abort.c:79
#2  0x00007ffff70a2951 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff70ae47c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff70ae4e7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff70ae799 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff70a53be in std::__throw_out_of_range(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff71e93a7 in std::__detail::_Map_base<std::string, std::pair<std::string const, unsigned long>, std::allocator<std::pair<std::string const, unsigned long> >, std::__detail::_Select1st, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::at (Python Exception <class 'gdb.error'> No type named class std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep.: 
__k=, this=0x7ffff71ee660 <g_Offsets_v1_16_3>) at internal.hpp:337
#8  std::unordered_map<std::string, unsigned long, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, unsigned long> > >::at (Python Exception <class 'gdb.error'> No type named class std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep.: 
__k=, this=0x7ffff71ee660 <g_Offsets_v1_16_3>) at /usr/include/c++/9/bits/unordered_map.h:1006
#9  GetAddr (Python Exception <class 'gdb.error'> No type named class std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep.: 
name=) at internal.hpp:348
#10 0x00007ffff71ea3b3 in __static_initialization_and_destruction_0 (__initialize_p=__initialize_p@entry=1, 
    __priority=__priority@entry=65535) at /usr/include/c++/9/ext/new_allocator.h:80
#11 0x00007ffff71ea9e4 in _GLOBAL__sub_I_zwo_fixer.cpp(void) () at zwo_fixer.cpp:96
#12 0x00007ffff7fe0b8a in ?? () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff7fe0c91 in ?? () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff7fd013a in ?? () from /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in ?? ()
#16 0x00007fffffffe669 in ?? ()
#17 0x0000000000000000 in ?? ()

@jgottula
Copy link
Collaborator Author

jgottula commented Apr 1, 2021

Oh I know what went wrong. This is one of those cases where git's fancy automatic merging did me a disservice by assuming some things were unrelated when they actually are interrelated and should have been a conflict of sorts. Will fix it up.

ARM Thumb mode is the most evil thing ever conceived from a reverse
engineering point of view. Not only do you (and your disassembler) never
have a goddamn clue when the code is in Thumb mode or not (leading to
confusion and wrongly-disassembled instructions); but then if you want
to hack around with this code and actually interface with it, good luck
with that, because probably half the stuff will be in regular ARM ISA
and the other half will be in Thumb and it's just a giant inconsistent
mess.

You have no idea how happy I am that ARM made the highly intelligent
decision to NOT support the Thumb instruction set(s) in AArch64.
Dropping that shit was the best decision they ever made.
This Makefile is frankly an abomination. It's still originally based on
the crusty ugly Makefile provided with the demo programs in the ASK SDK.

The necessity to manually adjustment of the PLATFORM variable in the
Makefile is idiotic and really should not need to be there, except maybe
in cases of cross-compilation.
Use strlcpy instead of strncpy; and ensure full buffer zeroing.
Whoever invented the SER header was an idiot and decided that all the 4-
and 8-byte fields should be 2-byte-aligned. Do these people understand
how computers work? Do they think memory buses are magic? Sigh...

This solution to the problem is nifty because it uses std::tie stuff.
But ultimately it ended up dumber than intended, because it didn't end
up being possible to do std::tie directly to the struct members for
exactly the same reason that taking pointers to them generated warnings:
they aren't aligned. So the troublesome part ended up just needing an
intermediate temporary assignment anyway, and the niftiness factor was
sadly substantially decreased.
@jgottula
Copy link
Collaborator Author

jgottula commented Apr 1, 2021

Force-pushed a fix.

@bgottula
Copy link
Collaborator

bgottula commented Apr 1, 2021

Confirm, that fixed it.

@bgottula
Copy link
Collaborator

bgottula commented Apr 1, 2021

@jgottula approved, but might make sense to wait to merge until we get more results from @vic4hub

@bgottula bgottula merged commit cf35f26 into master Apr 3, 2021
@bgottula bgottula deleted the add_arm_support branch April 3, 2021 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants