-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[libc] New version of the mem* framework
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms. Codegen can be checked here https://godbolt.org/z/x19zvE59v Differential Revision: https://reviews.llvm.org/D135134
- Loading branch information
Showing
26 changed files
with
1,740 additions
and
1,895 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
# The mem* framework | ||
|
||
The framework handles the following mem* functions: | ||
- `memcpy` | ||
- `memmove` | ||
- `memset` | ||
- `bzero` | ||
- `bcmp` | ||
- `memcmp` | ||
|
||
## Building blocks | ||
|
||
These functions can be built out of a set of lower-level operations: | ||
- **`block`** : operates on a block of `SIZE` bytes. | ||
- **`tail`** : operates on the last `SIZE` bytes of the buffer (e.g., `[dst + count - SIZE, dst + count]`) | ||
- **`head_tail`** : operates on the first and last `SIZE` bytes. This is the same as calling `block` and `tail`. | ||
- **`loop_and_tail`** : calls `block` in a loop to consume as much as possible of the `count` bytes and handle the remaining bytes with a `tail` operation. | ||
|
||
As an illustration, let's take the example of a trivial `memset` implementation: | ||
|
||
```C++ | ||
extern "C" void memset(const char* dst, int value, size_t count) { | ||
if (count == 0) return; | ||
if (count == 1) return Memset<1>::block(dst, value); | ||
if (count == 2) return Memset<2>::block(dst, value); | ||
if (count == 3) return Memset<3>::block(dst, value); | ||
if (count <= 8) return Memset<4>::head_tail(dst, value, count); // Note that 0 to 4 bytes are written twice. | ||
if (count <= 16) return Memset<8>::head_tail(dst, value, count); // Same here. | ||
return Memset<16>::loop_and_tail(dst, value, count); | ||
} | ||
``` | ||
Now let's have a look into the `Memset` structure: | ||
```C++ | ||
template <size_t Size> | ||
struct Memset { | ||
static constexpr size_t SIZE = Size; | ||
static inline void block(Ptr dst, uint8_t value) { | ||
// Implement me | ||
} | ||
static inline void tail(Ptr dst, uint8_t value, size_t count) { | ||
block(dst + count - SIZE, value); | ||
} | ||
static inline void head_tail(Ptr dst, uint8_t value, size_t count) { | ||
block(dst, value); | ||
tail(dst, value, count); | ||
} | ||
static inline void loop_and_tail(Ptr dst, uint8_t value, size_t count) { | ||
size_t offset = 0; | ||
do { | ||
block(dst + offset, value); | ||
offset += SIZE; | ||
} while (offset < count - SIZE); | ||
tail(dst, value, count); | ||
} | ||
}; | ||
``` | ||
|
||
As you can see, the `tail`, `head_tail` and `loop_and_tail` are higher order functions that build on each others. Only `block` really needs to be implemented. | ||
In earlier designs we were implementing these higher order functions with templated functions but it appears that it is more readable to have the implementation explicitly stated. | ||
**This design is useful because it provides customization points**. For instance, for `bcmp` on `aarch64` we can provide a better implementation of `head_tail` using vector reduction intrinsics. | ||
|
||
## Scoped specializations | ||
|
||
We can have several specializations of the `Memset` structure. Depending on the target requirements we can use one or several scopes for the same implementation. | ||
|
||
In the following example we use the `generic` implementation for the small sizes but use the `x86` implementation for the loop. | ||
```C++ | ||
extern "C" void memset(const char* dst, int value, size_t count) { | ||
if (count == 0) return; | ||
if (count == 1) return generic::Memset<1>::block(dst, value); | ||
if (count == 2) return generic::Memset<2>::block(dst, value); | ||
if (count == 3) return generic::Memset<3>::block(dst, value); | ||
if (count <= 8) return generic::Memset<4>::head_tail(dst, value, count); | ||
if (count <= 16) return generic::Memset<8>::head_tail(dst, value, count); | ||
return x86::Memset<16>::loop_and_tail(dst, value, count); | ||
} | ||
``` | ||
### The `builtin` scope | ||
Ultimately we would like the compiler to provide the code for the `block` function. For this we rely on dedicated builtins available in Clang (e.g., [`__builtin_memset_inline`](https://clang.llvm.org/docs/LanguageExtensions.html#guaranteed-inlined-memset)) | ||
### The `generic` scope | ||
In this scope we define pure C++ implementations using native integral types and clang vector extensions. | ||
### The arch specific scopes | ||
Then comes implementations that are using specific architectures or microarchitectures features (e.g., `rep;movsb` for `x86` or `dc zva` for `aarch64`). | ||
The purpose here is to rely on builtins as much as possible and fallback to `asm volatile` as a last resort. |
Oops, something went wrong.