Skip to content

UCX: Add support for RISC-V #9168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 24, 2023
Merged

Conversation

rbradford
Copy link
Contributor

@rbradford rbradford commented Jun 28, 2023

What

Adds support for compiling and running UCX on RISC-V (tested under RV64 QEMU.)

Why ?

Architecture specific code was missing for RISC-V (this adds that) and this
also addresses non RISC-V specific issues (although architecture dependent)
around the ELF symbol address and patching the PLT which are needed for correct
functioning.

How ?

Tested with UCX gtests and UCC gtests (patch required for UCC) on Linux under
RV64 QEMU.

  • UCS/ARCH/RV64: Add RISC-V support
  • UCM: Fix RELOC patching method for shared libraries
  • UCM/UTIL/RELOC: Don't rely on RTLD_NEXT for lookup

@rbradford rbradford marked this pull request as ready for review June 28, 2023 10:37
@rbradford rbradford force-pushed the 20230628-riscv-enabling branch 2 times, most recently from 00da96b to 3ac64f4 Compare June 29, 2023 21:39
@rbradford rbradford marked this pull request as draft June 30, 2023 00:36
@rbradford rbradford force-pushed the 20230628-riscv-enabling branch 2 times, most recently from e9833c9 to cac50b1 Compare June 30, 2023 10:32
@rbradford rbradford marked this pull request as ready for review June 30, 2023 16:41
@rbradford
Copy link
Contributor Author

@yosefe The PR is passing the CI and is ready for review. Please take a look when you can. Thanks!

Comment on lines 368 to 370
if (symtab < (void*)dl_info->start) {
symtab = (char*)symtab + dlpi_addr;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this doing?
maybe add a helper function to reduce code duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague @evangreen wrote this patch - he is the best person to respond. IIRC the issue is that the offsets can't just be used as pointers since on RISC-V the binary isn't loaded at 0 in the address space.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, parsing the headers to get the DT_* value doesn't necessarily get you the final VA of that entity if the library wasn't loaded at 0. So we sanity check to see "is this pointer at least beyond the lowest loaded address we found for this image?", and if not, we add the start to compensate for the offset the binary was actually loaded at. Basically a poor man's relocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but on other architectures the libraries are also not loaded at 0, and this code works.. is there some different with RISC-V here?
In any case, can use UCS_PTR_BYTE_OFFSET macro to add pointer to offset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this fix was to specifically address =reloc mode I would propose that this commit be dropped from this PR and I open another PR specifically about this incorporating some of the feedback and identifying the exact cause of the offset requirement (and why it isn't needed elsewhere.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but on other architectures the libraries are also not loaded at 0, and this code works.. is there some different with RISC-V here? In any case, can use UCS_PTR_BYTE_OFFSET macro to add pointer to offset

@yosefe I was able to identify the root cause of the address vs offset situation which I documented in #9283 and addressed in #9284 taking in the feedback to use UCS_PTR_BYTE_OFFSET and consolidate where the addition of the base address needs to happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove it from this PR then?

dl = dlopen(info.dli_fname, flags);
if (dl != NULL) {
(void)dlerror();
func_ptr = dlsym(dl, symbol);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes the behavior: before the change the symbol was searched in the caller library (that could be libucm_cuda.so for example) and now it's always searched in libucm.so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of RTLD_NEXT seems to have caught other folks out in the past (https://optumsoft.com/dangers-of-using-dlsym-with-rtld_next/).

If I look at the order of the libraries in the ucc_gtest binary:

/ # /lib/ld-linux-riscv64-lp64d.so.1 --list bin/ucc_gtest
	linux-vdso.so.1 (0x00007fff8e757000)
	libucc.so.1 => /nix/store/37nj6mzi1n7lgdppk1zp51q07g1cvfi3-ucc-riscv64-unknown-linux-gnu-1.2.0-gdb5711b/lib/libucc.so.1 (0x00007fff8e726000)
	libucs.so.0 => /nix/store/07fvbv49gwvv24lls4nv1hb5zdizrcvd-ucx-riscv64-unknown-linux-gnu-1.16.0-g10c71f9/lib/libucs.so.0 (0x00007fff8e6c6000)
	librt.so.1 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/librt.so.1 (0x00007fff8e6c3000)
	libstdc++.so.6 => /nix/store/021k38q6dk02cbhd2688853qaxbaf6pg-riscv64-unknown-linux-gnu-stage-final-gcc-debug-12.1.0-g043c545-lib/riscv64-unknown-linu
x-gnu/lib/libstdc++.so.6 (0x00007fff8e4e7000)
	libm.so.6 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libm.so.6 (0x00007fff8e477000)
	libgcc_s.so.1 => /nix/store/021k38q6dk02cbhd2688853qaxbaf6pg-riscv64-unknown-linux-gnu-stage-final-gcc-debug-12.1.0-g043c545-lib/riscv64-unknown-linux
-gnu/lib/libgcc_s.so.1 (0x00007fff8e463000)
	libpthread.so.0 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libpthread.so.0 (0x00007fff8e460000)
	libc.so.6 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libc.so.6 (0x00007fff8e32b000)
	/nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/ld-linux-riscv64-lp64d.so.1 => /lib/ld-linux-riscv64-lp6
4d.so.1 (0x00007fff8e759000)
	libucm.so.0 => /nix/store/07fvbv49gwvv24lls4nv1hb5zdizrcvd-ucx-riscv64-unknown-linux-gnu-1.16.0-g10c71f9/lib/libucm.so.0 (0x00007fff8e30f000)
	libbfd-2.40.0.20230213.so => /nix/store/crsgk85y7i6440xa1x331zzigvc34pbd-binutils-riscv64-unknown-linux-gnu-2.40.0-g3ac11f8-lib/lib/libbfd-2.40.0.2023
0213.so (0x00007fff8dd67000)
	libdl.so.2 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libdl.so.2 (0x00007fff8dd64000)
	libz.so.1 => /nix/store/r5mpd0cxlw2sj6a23gwwfs2assjsvzwf-zlib-riscv64-unknown-linux-gnu-1.2.13/lib/libz.so.1 (0x00007fff8dd4f000)
	libsframe.so.0 => /nix/store/crsgk85y7i6440xa1x331zzigvc34pbd-binutils-riscv64-unknown-linux-gnu-2.40.0-g3ac11f8-lib/lib/libsframe.so.0 (0x00007fff8dd
49000)

I can see that libucm comes after libc which means that RTLD_NEXT won't find it.

I was also worried about the affect of this change but all the tests on the CI pass with it - so I am fairly confident that this is the best solution (but open to alternatives!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fixing a failure that happens with RISC-V? what was that change needed?
I'm worried it may break other use cases..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand - since this was specifically seen in with the UCC gtest (where RTLD_NEXT fell through to patching the PLT) i'm going to propose moving this to a separate PR/GitHub issue where we can discuss the behaviour (i'd also like to see if that fallthough to patching the PLT also occurs with the UCC gtest on x86-64.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove it from this PR then?

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch from cac50b1 to 95202dd Compare July 19, 2023 12:04
Copy link
Contributor Author

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe @tvegas1 Thank you for your responses. I've added additional patches to the PR to respond to your points (I understand you prefer patches only to be squashed when the review is concluded.)

Comment on lines 368 to 370
if (symtab < (void*)dl_info->start) {
symtab = (char*)symtab + dlpi_addr;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague @evangreen wrote this patch - he is the best person to respond. IIRC the issue is that the offsets can't just be used as pointers since on RISC-V the binary isn't loaded at 0 in the address space.

dl = dlopen(info.dli_fname, flags);
if (dl != NULL) {
(void)dlerror();
func_ptr = dlsym(dl, symbol);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of RTLD_NEXT seems to have caught other folks out in the past (https://optumsoft.com/dangers-of-using-dlsym-with-rtld_next/).

If I look at the order of the libraries in the ucc_gtest binary:

/ # /lib/ld-linux-riscv64-lp64d.so.1 --list bin/ucc_gtest
	linux-vdso.so.1 (0x00007fff8e757000)
	libucc.so.1 => /nix/store/37nj6mzi1n7lgdppk1zp51q07g1cvfi3-ucc-riscv64-unknown-linux-gnu-1.2.0-gdb5711b/lib/libucc.so.1 (0x00007fff8e726000)
	libucs.so.0 => /nix/store/07fvbv49gwvv24lls4nv1hb5zdizrcvd-ucx-riscv64-unknown-linux-gnu-1.16.0-g10c71f9/lib/libucs.so.0 (0x00007fff8e6c6000)
	librt.so.1 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/librt.so.1 (0x00007fff8e6c3000)
	libstdc++.so.6 => /nix/store/021k38q6dk02cbhd2688853qaxbaf6pg-riscv64-unknown-linux-gnu-stage-final-gcc-debug-12.1.0-g043c545-lib/riscv64-unknown-linu
x-gnu/lib/libstdc++.so.6 (0x00007fff8e4e7000)
	libm.so.6 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libm.so.6 (0x00007fff8e477000)
	libgcc_s.so.1 => /nix/store/021k38q6dk02cbhd2688853qaxbaf6pg-riscv64-unknown-linux-gnu-stage-final-gcc-debug-12.1.0-g043c545-lib/riscv64-unknown-linux
-gnu/lib/libgcc_s.so.1 (0x00007fff8e463000)
	libpthread.so.0 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libpthread.so.0 (0x00007fff8e460000)
	libc.so.6 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libc.so.6 (0x00007fff8e32b000)
	/nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/ld-linux-riscv64-lp64d.so.1 => /lib/ld-linux-riscv64-lp6
4d.so.1 (0x00007fff8e759000)
	libucm.so.0 => /nix/store/07fvbv49gwvv24lls4nv1hb5zdizrcvd-ucx-riscv64-unknown-linux-gnu-1.16.0-g10c71f9/lib/libucm.so.0 (0x00007fff8e30f000)
	libbfd-2.40.0.20230213.so => /nix/store/crsgk85y7i6440xa1x331zzigvc34pbd-binutils-riscv64-unknown-linux-gnu-2.40.0-g3ac11f8-lib/lib/libbfd-2.40.0.2023
0213.so (0x00007fff8dd67000)
	libdl.so.2 => /nix/store/wzxw87zd8195qpvr9qy1j4da2q2glcir-glibc-riscv64-unknown-linux-gnu-2.37-g4c09d4a/lib/libdl.so.2 (0x00007fff8dd64000)
	libz.so.1 => /nix/store/r5mpd0cxlw2sj6a23gwwfs2assjsvzwf-zlib-riscv64-unknown-linux-gnu-1.2.13/lib/libz.so.1 (0x00007fff8dd4f000)
	libsframe.so.0 => /nix/store/crsgk85y7i6440xa1x331zzigvc34pbd-binutils-riscv64-unknown-linux-gnu-2.40.0-g3ac11f8-lib/lib/libsframe.so.0 (0x00007fff8dd
49000)

I can see that libucm comes after libc which means that RTLD_NEXT won't find it.

I was also worried about the affect of this change but all the tests on the CI pass with it - so I am fairly confident that this is the best solution (but open to alternatives!)

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch from 61aac39 to 95202dd Compare July 20, 2023 16:56
@ct-clmsn
Copy link

ct-clmsn commented Jul 26, 2023

@rbradford please look at the prior PR for RISCV support. @yosefe thank you for bringing both efforts together!

#8246

@rbradford
Copy link
Contributor Author

@rbradford please look at the prior PR for RISCV support. @yosefe thank you for bringing both efforts together!

#8246

Yes, this code is derived from that PR and fixed for coding style and bug fixes - I see you've updated it between when I created this PR and now - I thought your original PR was abandoned.

I would appreciate it if you would let me review the code before you finalise it as I think there might still be some minor issues.

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch 2 times, most recently from e662599 to 1af06d0 Compare July 27, 2023 16:25
@rbradford
Copy link
Contributor Author

@yosefe I've updated this PR based on your feedback - the CI issues seem unrelated to the changes included in it.

@ct-clmsn
Copy link

ct-clmsn commented Jul 28, 2023

@rbradford great work; UCX has changed a lot since #8246 was completed, so much so that I consider #8246 OBE (overcome by events).

@yosefe I'd encourage merging this PR and closing out #8246.

This PR has been successfully tested on a sifive unmatched and a pine64/star64 (starfive chip) sbc.

@rbradford I've made a PR on your branch for these files, feel free to ignore. The PR was supposed to update these files with compile guards around OpenMP pragmas; git decided to do add a bunch of other changes that weren't intended (maybe it was a sync against ucx-master/main).

./src/tools/perf/perftest.c
./src/tools/perf/lib/libperf.c
./test/apps/test_init_mt.c

@ct-clmsn
Copy link

ct-clmsn commented Jul 28, 2023

@rbradford if possible, could you compare #8246's ucm_bistro_patch here between lines 80 and 92 against the version in this PR?

The implementation in this PR is using an older version of code from #8246 that has issues with address randomization. #8246 has been updated recently with a better version.

Thanks!

@rbradford
Copy link
Contributor Author

@ct-clmsn Thank you for testing - I have integrated your OpenMP related patch. Also to avoid the sign extension problem I have switched to a different method for making the jump. By using AUIPC we can store the absolute 64-bit address in the instruction stream as a constant and then load it and jump to it. This avoids all sorts of issues with the sign extension of immediates. If you can - I would appreciate you giving it another test.

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch 2 times, most recently from 2fdb7c8 to b17bcb7 Compare August 1, 2023 08:14
@ct-clmsn
Copy link

ct-clmsn commented Aug 1, 2023

@rbradford will give it a test today. We made an effort to work with AUIPC awhile back - as you mentioned its the preferred approach. We ended up opting for JALR because we were getting some jump with no return program faults. In hindsight, that may have been associated with sign-extension issues in our address calculation. Expect a comment later today with an update!

@ct-clmsn
Copy link

ct-clmsn commented Aug 1, 2023

@rbradford all is clear on this end; successfully completed several loopback tests on a star64 over ethernet.

dl = dlopen(info.dli_fname, flags);
if (dl != NULL) {
(void)dlerror();
func_ptr = dlsym(dl, symbol);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fixing a failure that happens with RISC-V? what was that change needed?
I'm worried it may break other use cases..

Comment on lines 368 to 370
if (symtab < (void*)dl_info->start) {
symtab = (char*)symtab + dlpi_addr;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but on other architectures the libraries are also not loaded at 0, and this code works.. is there some different with RISC-V here?
In any case, can use UCS_PTR_BYTE_OFFSET macro to add pointer to offset

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch from b17bcb7 to 8628a4a Compare August 7, 2023 15:20
Copy link
Contributor Author

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yosefe - i'm proposing to drop the two parts of this PR that are not purely RISC-V enabling and put them in separate PRs for more focussed discussion.

dl = dlopen(info.dli_fname, flags);
if (dl != NULL) {
(void)dlerror();
func_ptr = dlsym(dl, symbol);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand - since this was specifically seen in with the UCC gtest (where RTLD_NEXT fell through to patching the PLT) i'm going to propose moving this to a separate PR/GitHub issue where we can discuss the behaviour (i'd also like to see if that fallthough to patching the PLT also occurs with the UCC gtest on x86-64.)

Comment on lines 368 to 370
if (symtab < (void*)dl_info->start) {
symtab = (char*)symtab + dlpi_addr;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this fix was to specifically address =reloc mode I would propose that this commit be dropped from this PR and I open another PR specifically about this incorporating some of the feedback and identifying the exact cause of the offset requirement (and why it isn't needed elsewhere.)

@rbradford
Copy link
Contributor Author

Hi @yosefe - i'm proposing to drop the two parts of this PR that are not purely RISC-V enabling and put them in separate PRs for more focussed discussion.

I'd also squash the remaining RISC-V enabling commits into just one.

@yosefe
Copy link
Contributor

yosefe commented Aug 7, 2023

@rbradford per https://github.com/openucx/ucx/wiki/Guidance-for-contributors, please do not force push code review changes since it's not possible to track the diff vs. the previous review round.
Please rebase the branch on the previously reviewed revision and force-push to fix it

@rbradford
Copy link
Contributor Author

@rbradford per https://github.com/openucx/ucx/wiki/Guidance-for-contributors, please do not force push code review changes since it's not possible to track the diff vs. the previous review round. Please rebase the branch on the previously reviewed revision and force-push to fix it

Okay - I only did so to remove the already merged patch.

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch from 8628a4a to f60b36c Compare August 7, 2023 15:54
@rbradford
Copy link
Contributor Author

@rbradford per https://github.com/openucx/ucx/wiki/Guidance-for-contributors, please do not force push code review changes since it's not possible to track the diff vs. the previous review round. Please rebase the branch on the previously reviewed revision and force-push to fix it

Done!

@yosefe
Copy link
Contributor

yosefe commented Aug 13, 2023

@rbradford pls also note the merge conflict

@rbradford
Copy link
Contributor Author

@rbradford pls also note the merge conflict

Apologies for the delay in response i've been away. @yosefe How would you like me to resolve this? Ordinarily I would rebase onto master since the problem comes from a conflict with a fix that has been landed into master, however you asked me not to rebase. Do you prefer a merge commit from master into this branch (I couldn't find anything in the UCX developer documentation about this.)

@yosefe
Copy link
Contributor

yosefe commented Aug 22, 2023

@rbradford pls also note the merge conflict

Apologies for the delay in response i've been away. @yosefe How would you like me to resolve this? Ordinarily I would rebase onto master since the problem comes from a conflict with a fix that has been landed into master, however you asked me not to rebase. Do you prefer a merge commit from master into this branch (I couldn't find anything in the UCX developer documentation about this.)

@rbradford please push a merge commit, thanks!

@rbradford
Copy link
Contributor Author

@rbradford pls also note the merge conflict

Apologies for the delay in response i've been away. @yosefe How would you like me to resolve this? Ordinarily I would rebase onto master since the problem comes from a conflict with a fix that has been landed into master, however you asked me not to rebase. Do you prefer a merge commit from master into this branch (I couldn't find anything in the UCX developer documentation about this.)

@rbradford please push a merge commit, thanks!

Done - I also took the patch from #9284 and adapted it remove the heuristic and instead make a pointer from the offset if built with RISC-V.

I know that the dlsym related change is controversial (but the CI runs through with it fine) so i'm happy to move that to a separate PR and issue to discuss as it's only required for running with UCC.

@rbradford rbradford force-pushed the 20230628-riscv-enabling branch from af54648 to 84f7eb0 Compare August 23, 2023 07:56
@rbradford
Copy link
Contributor Author

@yosefe I had to force push since the commit title check doesn't like "Revert" commits (but it is happy with "Merge" ones)

Copy link
Contributor

@yosefe yosefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls squash

rbradford and others added 3 commits August 23, 2023 12:10
This patch is the derived from the work done by Chris Taylor and John
Leidel (from this PR: https://github.com/openucx/ucx/pull/8246/commits)

The ways in which this patch differs from that PR are:

* Use __clear_cache() for i-cache updates with correct syscall fallback
(need to pass 0 for all CPUs) (myself)
* Use appropriate fence instructions for IO and CPU fences (myself)
* Coding standard fixes (myself)
* Removal of unused rv64/atomic.h (myself)
* Header guard fixes (myself)
* Removal of unused UCS_CPU_MODEL_RV64IMAFDC (myself)
* Use generic time functions (Evan Green)
* Calculate ilog2 correctly using clz (myself)
* Use AUIPC+JALR rather than immediate calculations for patch (myself)
* Support for atomic bistro patching (myself)

Original-author: Chris Taylor <ctaylor@tactcomplabs.com>
Original-author: John Leidel <john.leidel@gmail.com>
Co-authored-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Rather than relying on RTLD_NEXT to return the symbol and falling back
to RTLD_DEFAULT instead lookup the symbol from the current shared
library directly.

This resolves an issue where on RISC-V in some circumstances the symbol
returned by RTLD_NEXT is null and RTLD_DEFAULT then provides a pointer
into the PLT stub. Since the PLT stub is too small to be patched by
directly resolving the symbol here the PLT is avoided and the C library
function can be directly patched.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
On some combinations of C library implementation and architecture the
entries in the PT_DYNAMIC table are not relocated and stored as
pointers. Instead the values are offsets from the address that the
library is loaded at. RISC-V with glibc is one such example.

Unfortunately since the internal glibc headers that expose whether the
value is relocated on not are not available the best solution is to make
this architecturally conditional. As this behaviour is now established
as part of the ABI it will not change in the future.

Since this should only be done for the values from the table that are
treated like pointers create a new function ucm_reloc_get_pointer() that
applies the architecture check was added

This issue was discovered when using UCX_MEM_MMAP_HOOK_MODE=reloc on RISC-V.

Fixes: openucx#9283

Co-developed-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
@rbradford
Copy link
Contributor Author

pls squash

Thanks! Consolidated into three logical commits - unfortunately it looks like there is some unrelated CI problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants