Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FMV] Runtime Resolver Function #74

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

BeMg
Copy link
Contributor

@BeMg BeMg commented Apr 19, 2024

This PR proposes a runtime resolver function that retrieves the environment information. Since this resolver function is expected to be available and interchangeable for both libgcc and compiler-rt, a formal specification for the resolver function interface is necessary.


When generating the resolver function for function multiversioning, a mechanism is necessary to obtain the environment information.

To achieve this goal, several steps need to be taken:

  1. Collect the required extensions for a particular function.
  2. Transform these required extensions into a platform-dependent form.
  3. Query whether the environment fulfills these requirements during runtime.

Step 1 is handled by the compiler, while step 3 must follow the necessary steps from the platform during runtime.

This RFC aims to propose how the compiler and runtime function can tackle step 2.

Here is a example

__attribute__((target_clones("default", "arch=rv64gcv"))) int bar() {
    return 1;
}

In this example, there are two versions of function bar. One for default, another for "rv64gcv".

If the environment meets the requirements, then bar can utilize the arch=rv64gcv version. Otherwise, it will invoke the default version.

This process be controlled by the ifunc resolver function.

ptr bar.resolver() {
   if (isFulFill(...))
      return "bar.arch=rv64gcv";
   return bar.default;
}

The isFulFill should available during the program runtime.

The version arch=rv64gcv require

i, m, a, f, d, c, v, zicsr, zifencei, zve32f, zve32x, zve64d, zve64f, zve64x, zvl128b, zvl32b, zvl64b,

The problem 2 is about where to maintain the relationship between extension names and platform-dependent probe forms.

Here are three possible approach to achieve goal.

  1. Encode all required extensions into a string format, then let the platform implement its own probe approach based on the string inside the runtime function. This approach maintains the relationship between extension names and platform-dependent probe forms inside the runtime function.
ptr bar.resolver() {
   if (isFulFill("i_m_a_f_d_c_v_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b"))
      return bar.arch=rv64gcv;
   return bar.default;
}

bool isFulFill(char *ReqExts) {
    if (isLinux())
       return doLinuxRISCVExtensionProbe(ReqExts);
    if (isFreeBSD())
       return doFreeBSDRISCVExtensionProbe(ReqExts);
    // Other platform
    ....
    return false;
}
  • Pros
    • Human readable
    • Relatively high portability
    • Provides a uniform interface for all platforms
  • Cons
    • Requires extra effort for string processing in the runtime function.
  1. Encode all required extensions into a compiler-defined key, then let the platform implement its own probe approach inside the runtime. This approach maintains the relationship between the compiler-defined key for extensions and the platform-dependent probe form inside the runtime function.
// Assume compiler define
// i -> 1
// m -> 2
...

ptr bar.resolver() {
   if (isFulFill([1, 2, 3, 8, ...], length))
      return bar.arch=rv64gcv;
   return bar.default;
}

bool isFulFill(int *ReqExts, length) {
    if (isLinux())
       return doLinuxRISCVExtensionProbe(ReqExts, length);
    if (isFreeBSD())
       return doFreeBSDRISCVExtensionProbe(ReqExts, length);
    // Other platform
    ....
    return false;
}
  • Pros
    • Doesn't require string processing during runtime
    • Provides a uniform interface for all platforms
  • Cons
    • Requires maintaining the relationship between the compiler-defined key for extensions and the concrete extension names inside runtime function.
  1. Define a different runtime function for each platform and construct any necessary information during compilation time if necessary for the platform. This approach maintains the relationship between extension names and platform-dependent probe forms inside the compiler.
// If compiler compile for linux, then use bar.resolver.linux
ptr bar.resolver.linux() {
   if (isFulFillLinux(LinuxProbeObject))
      return bar.arch=rv64gcv;
   return bar.default;
}

ptr bar.resolver.freebsd() {
   if (isFulFillFreeBSD(FreeBSDProbeObject))
      return bar.arch=rv64gcv;
   return bar.default;
}

// Other platform bar.resolver
...

bool isFulFillLinux(LinuxProbeObject Obj) {
   return doLinuxProbe(Obj);
}

bool isFulFillFreeBSD(FreeBSDProbeObject Obj) {
   return doFreeBSDProbe(Obj);
}

// Other platform isFulFill
...

  • Pros
    • Relatively simple implementation for the runtime function
  • Cons
    • Does not provide a uniform interface for all platforms

@BeMg
Copy link
Contributor Author

BeMg commented Apr 19, 2024

@BeMg
Copy link
Contributor Author

BeMg commented Apr 19, 2024

cc @kito-cheng

@topperc
Copy link
Contributor

topperc commented Apr 19, 2024

Is two word "FullFill" supposed to be the single word "Fulfill"?

@topperc
Copy link
Contributor

topperc commented Apr 20, 2024

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

@BeMg
Copy link
Contributor Author

BeMg commented Apr 22, 2024

Is two word "FullFill" supposed to be the single word "Fulfill"?

Oops, I think there is a typo here. Updated.

@BeMg
Copy link
Contributor Author

BeMg commented Apr 22, 2024

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

If we only allow one extension each time. Does it provide a reasonably fast query mechanism? Or must it be some kind of bit operation to determine support?

For example, compiler generate this resolver function base on __builtin_cpu_supports. And compiler-rt/libgcc use the method 1 to implement __builtin_cpu_supports.

ptr bar.resolver() {
   if (__builtin_cpu_supports("i") && 
       __builtin_cpu_supports("m") && 
       __builtin_cpu_supports("a") && 
       __builtin_cpu_supports("f") && 
       __builtin_cpu_supports("d") && 
       __builtin_cpu_supports("c") && 
       __builtin_cpu_supports("v") && 
       __builtin_cpu_supports("zicsr") && 
...
       __builtin_cpu_supports("zvl64b"))
      return bar.arch=rv64gcv;
   return bar.default;
}

@topperc
Copy link
Contributor

topperc commented Apr 22, 2024

Do we intend to support __builtin_cpu_supports which is built on the same interface as function multiversioning on other targets like X86? That will require a reasonably fast query mechanism. String processing may be too much for that.

If we only allow one extension each time. Does it provide a reasonably fast query mechanism? Or must it be some kind of bit operation to determine support?

For example, compiler generate this resolver function base on __builtin_cpu_supports. And compiler-rt/libgcc use the method 1 to implement __builtin_cpu_supports.


ptr bar.resolver() {

   if (__builtin_cpu_supports("i") && 

       __builtin_cpu_supports("m") && 

       __builtin_cpu_supports("a") && 

       __builtin_cpu_supports("f") && 

       __builtin_cpu_supports("d") && 

       __builtin_cpu_supports("c") && 

       __builtin_cpu_supports("v") && 

       __builtin_cpu_supports("zicsr") && 

...

       __builtin_cpu_supports("zvl64b"))

      return bar.arch=rv64gcv;

   return bar.default;

}

My concern is that each time you pass a string into the compiler-rt interface, it will need to execute multiple strcmps to compare the input string against every extension name the library knows about to figure out which extension is being asked for. That gets expensive if called very often.

On x86, builtin_cpu_supports calls the library the first time to update some global variables. After the first time it is a load and a bit test

@jrtc27
Copy link

jrtc27 commented May 8, 2024

If you use a sensible data structure like a trie you can do it linearly in the length of the input string

@BeMg
Copy link
Contributor Author

BeMg commented May 24, 2024

To enhance both the performance(compare to string base) and portability(compare to hwprobe base), I have updated the runtime interface with a new layer for each queryable extension. This approach is similar to approach 2 described in the PR's description. This comment aims to explain it with a concrete example using the IFUNC resolver function and __builtin_cpu_supports.

Two structures are defined in the runtime library to store the status of hardware-enabled extensions:

Each queryable extension has a unique position inside the structure bit to represent whether it is enabled. For example: extension m enable bit could be stored inside __riscv_feature_bit.features[0] & (1 << 5)

struct {
	unsigned length;
    unsigned long long features[MAXLENGTH];
} __riscv_feature_bit;

struct {
    unsigned vendorID;
    unsigned length;
    unsigned long long features[MAXLENGTH];
} __riscv_vendor_feature_bit;

Additionally, there is a function to initialize these two structures using a system-provided mechanism:

void __init_riscv_features_bit();

In summary, this approach uses __riscv_feature_bit and __riscv_vendor_feature_bit to represent whether an extension is enabled. They are initialized by __init_riscv_features_bit. Both structures are defined in compiler-rt/libgcc.


When the compiler emits the IFUNC resolver function, it can use these structures to check whether all extension requirements are fulfilled.

Here is a simple example for a resolver:

; -target-feature +i
__attribute__((target_clones("default", "arch=rv64im"))) int foo1(void) {
  return 1;
}
func_ptr foo1.resolver() {
	__init_riscv_features_bit();
	if (MAX_QUERY_LENGTH > __riscv_feature_bits.length)
		raise_error();

    // Try arch=rv64im
	unsigned long long rv64im_require_feature_0 = constant_build_during_compiation_time();
	unsigned long long rv64im_require_feature_1 = constant_build_during_compiation_time();
	...
	if (
	((rv64im_require_feature_0 & __riscv_feature_bits.features[0]) == rv64im_require_feature_0) &&
	((rv64im_require_feature_1 & __riscv_feature_bits.features[1]) == rv64im_require_feature_1) &&
	...)
		return foo1.rv64im;

	return foo1.default;
}

@jrtc27
Copy link

jrtc27 commented May 24, 2024

Who's specifying which bit is what?

@BeMg
Copy link
Contributor Author

BeMg commented May 24, 2024

My idea is that bit is only meaningful for runtime function and compiler that using __riscv_feature_bits. For function multiversioning, I will allocate non-colliding bits for extensions and remain unchanged. If there is new extension, allocate the available bit or extend the __riscv_feature_bits.features size when it be used by function multiversioning. Vendor extension is guarded by vendorID, so it can be allocated by vendor itself without collosion with other vendor extension.

The remaining problem is how to synchronize the extension bitmask across LLVM, compiler-rt, GCC, and libgcc. I don't have a solution for this yet.

@kito-cheng Any ideas on how we can achieve this synchronization?

@BeMg
Copy link
Contributor Author

BeMg commented Jun 3, 2024

Update: add the extension groupid/bitmask definitions for synchronization across LLVM, compiler-rt, GCC, and libgcc.


cc @kito-cheng @topperc

@kito-cheng
Copy link
Collaborator

This proposal got positive feedback from RISC-V GNU community :)

BeMg added a commit to BeMg/llvm-project that referenced this pull request Jun 5, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jun 11, 2024
…ature_bits/__init_riscv_features_bit

Base on riscv-non-isa/riscv-c-api-doc#74, this patch defines the __riscv_feature_bits and __riscv_vendor_feature_bits structures to store the enabled feature bits at runtime.

It also introduces the __init_riscv_features_bit function to update these structures based on the platform query mechanism.

Additionally, the groupid/bitmask definitions from riscv-non-isa/riscv-c-api-doc#74 are declared and used to update the __riscv_feature_bits and __riscv_vendor_feature_bits structures.
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jun 11, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jun 17, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jun 17, 2024
…ature_bits/__init_riscv_features_bit

Base on riscv-non-isa/riscv-c-api-doc#74, this patch defines the __riscv_feature_bits and __riscv_vendor_feature_bits structures to store the enabled feature bits at runtime.

It also introduces the __init_riscv_features_bit function to update these structures based on the platform query mechanism.

Additionally, the groupid/bitmask definitions from riscv-non-isa/riscv-c-api-doc#74 are declared and used to update the __riscv_feature_bits and __riscv_vendor_feature_bits structures.
@BeMg BeMg requested a review from 4vtomat June 18, 2024 07:03
Copy link
Collaborator

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment :)

riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
@palmer-dabbelt
Copy link
Contributor

IMO it's way simpler to just have the resolver call hwprobe directly, rather than trying to introduce this intermediate format and the associated library helper functions. We don't even need to specify anything here: the compiler could just generate the hwprobe calls directly and then call into the VDSO via the provided argument to the IFUNC resolver.

That said: this is essentially just duplicating one of the early hwprobe designs, and thus has a bunch of design flaws we spent a few versions sorting out. So if you want to go with it, probably best to sort out things like:

  • It's a bitmask, so we can only represent single-bit features.
  • There's no way to have a sparse bitmask in this format, so callers have to allocate all the earlier-defined 0s for new extensions.
  • There's no support for heterogeneous-ISA systems or extensions from multiple vendors.
  • The bit definitions are imprecise.
  • This relies on RVI's binary compatibility story, which we've gotten burned by a bunch of times.
  • Performance features aren't defined, and don't naturally fit into a bitmask.
  • Resolving global symbols during IFUNCs has a long tail of problems.
  • What's going to call that initialization function?

So I'd recommend doing basically nothing here: we already have all the tools we need to implement FMV at the binary/library level, we just need to mark the multi-target attributes as legal so we can implement them.

@topperc
Copy link
Contributor

topperc commented Jun 25, 2024

IMO it's way simpler to just have the resolver call hwprobe directly, rather than trying to introduce this intermediate format and the associated library helper functions. We don't even need to specify anything here: the compiler could just generate the hwprobe calls directly and then call into the VDSO via the provided argument to the IFUNC resolver.

The resolver isn't the only use of this. I'm assuming we should support __builtin_cpu_supports like other targets?

@topperc
Copy link
Contributor

topperc commented Jun 25, 2024

  • What's going to call that initialization function?

On X86, it's called by the resolver function. Only the first call does anything real, the other calls early out if its already been done.

I suggested we should cache the information rather than doing a syscall of hwprobe for every multiversion function.

@kito-cheng
Copy link
Collaborator

I am not sure if compiler can generate code to invoke vDSO direct, but this part is like optimization on reducing the overhead of query the capability of host machine, I am kinda less concern around this since current proposal can cache that when first call __init_riscv_features_bit.

For other concern:

We intend to add extension first, and we believe bit mask is enough for now, and our goal is reach same capability as IFUNC in glibc, which we don't intend to address heterogeneous-ISA systems or extensions from multiple vendors yet, and we may extend the syntax on future if needed.

And for IFUNC...I believe there are few security issue around that, but I don't see we have other choice for short-term, both LLVM and GCC are didn't provide such infrastructure without IFUNC, and I am not sure it worth to spend another half year to doing that is worth, also we don't document down we use IFUNC, so we can change the implementation to get rid of IFUNC stuffs in future if we think it's necessary, and the __init_riscv_features_bit and __riscv_feature_bits still can be used once we switch to different implementation.

@BeMg
Copy link
Contributor Author

BeMg commented Jul 11, 2024

Remove the bitmask that can't be query by hwprobe directly. And update Bitpos base on current support extension alphabetical order.

BeMg added a commit to BeMg/llvm-project that referenced this pull request Jul 11, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jul 12, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jul 16, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
riscv-c-api.md Outdated Show resolved Hide resolved
BeMg added a commit to BeMg/llvm-project that referenced this pull request Jul 18, 2024
Base on riscv-non-isa/riscv-c-api-doc#74.

This patch defines the groupid/bitmask in RISCVFeatures.td and generates the corresponding table in RISCVTargetParserDef.inc.

The groupid/bitmask of extensions provides an abstraction layer between the compiler and runtime functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants