Contributor

fowles commented Aug 4, 2022

Important: this integration must be merged, not squashed!

fowles merged commit fb8edd2 into protocolbuffers:main on Aug 4, 2022
ilatypov commented Sep 8, 2022

This change replaced the compile-time loop unrolling with a run-time loop over fixed-size blocks: 16 bytes where __uint128_t is available, 8 bytes otherwise (thank you!). The old code relied on recursive template instantiation to generate a sequence of inline SwapBlock() calls for chunks of 8, 4, 2, and 1 bytes, chosen by the size remaining at compile time. The new version:

// Swaps two blocks of memory of size kSize:
template <size_t kSize>
void memswap(char* a, char* b) {
#if __SIZEOF_INT128__
  using Buffer = __uint128_t;
#else
  using Buffer = uint64_t;
#endif
  constexpr size_t kBlockSize = sizeof(Buffer);
  Buffer buf;
  for (size_t i = 0; i < kSize / kBlockSize; ++i) {
    memcpy(&buf, a, kBlockSize);
    memcpy(a, b, kBlockSize);
    memcpy(b, &buf, kBlockSize);
    a += kBlockSize;
    b += kBlockSize;
  }
  // Swap the leftover bytes, could be zero.
  memcpy(&buf, a, kSize % kBlockSize);
  memcpy(a, b, kSize % kBlockSize);
  memcpy(b, &buf, kSize % kBlockSize);
}
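
As a quick sanity check of the loop above, here is a minimal sketch that is not part of the PR. It copies the run-time loop under the hypothetical name memswap_loop and adds a hypothetical helper, SwapsCorrectly, which verifies that exactly kSize bytes are exchanged and that the byte just past the range is left alone. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy of the run-time loop quoted above, renamed (hypothetical name) so it
// cannot clash with the real protobuf function.
template <std::size_t kSize>
void memswap_loop(char* a, char* b) {
#ifdef __SIZEOF_INT128__
  using Buffer = __uint128_t;
#else
  using Buffer = std::uint64_t;
#endif
  constexpr std::size_t kBlockSize = sizeof(Buffer);
  Buffer buf;
  for (std::size_t i = 0; i < kSize / kBlockSize; ++i) {
    std::memcpy(&buf, a, kBlockSize);
    std::memcpy(a, b, kBlockSize);
    std::memcpy(b, &buf, kBlockSize);
    a += kBlockSize;
    b += kBlockSize;
  }
  // Swap the leftover bytes; the count may be zero.
  std::memcpy(&buf, a, kSize % kBlockSize);
  std::memcpy(a, b, kSize % kBlockSize);
  std::memcpy(b, &buf, kSize % kBlockSize);
}

// Hypothetical helper: fills two buffers with different patterns, swaps the
// first kSize bytes, and checks both the swapped range and a sentinel byte
// placed right after it.
template <std::size_t kSize>
bool SwapsCorrectly() {
  char a[kSize + 1], b[kSize + 1], a0[kSize + 1], b0[kSize + 1];
  for (std::size_t i = 0; i <= kSize; ++i) {
    a[i] = a0[i] = static_cast<char>('A' + i % 26);
    b[i] = b0[i] = static_cast<char>('a' + i % 26);
  }
  memswap_loop<kSize>(a, b);
  return std::memcmp(a, b0, kSize) == 0 && std::memcmp(b, a0, kSize) == 0 &&
         a[kSize] == a0[kSize] && b[kSize] == b0[kSize];
}

// Exercises an empty range, a tail-only size, an exact block multiple on
// either Buffer width, and a size with both full blocks and a tail.
inline void CheckMemswapLoop() {
  assert(SwapsCorrectly<0>());
  assert(SwapsCorrectly<9>());
  assert(SwapsCorrectly<16>());
  assert(SwapsCorrectly<37>());
}
```

Calling CheckMemswapLoop() from a test or from main should pass silently on both 64-bit and 128-bit Buffer configurations.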

For comparison, here is my reproduction of the old approach from repeated_field.h in v3.20.0.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>  // std::enable_if

namespace googpb {

template <int kSize>
inline typename std::enable_if<(kSize == 0), void>::type memswap(char*, char*) {
}

template <typename T>
inline void SwapBlock(char* p, char* q) {
  T tmp;
  memcpy(&tmp, p, sizeof(T));
  memcpy(p, q, sizeof(T));
  memcpy(q, &tmp, sizeof(T));
}

#define PROTO_MEMSWAP_DEF_SIZE(reg_type, max_size)                           \
  template <int kSize>                                                       \
  typename std::enable_if<(kSize >= sizeof(reg_type) && kSize < (max_size)), \
                          void>::type                                        \
  memswap(char* p, char* q) {                                                \
    SwapBlock<reg_type>(p, q);                                               \
    memswap<kSize - sizeof(reg_type)>(p + sizeof(reg_type),                  \
                                      q + sizeof(reg_type));                 \
  }

PROTO_MEMSWAP_DEF_SIZE(uint8_t, 2)
PROTO_MEMSWAP_DEF_SIZE(uint16_t, 4)
PROTO_MEMSWAP_DEF_SIZE(uint32_t, 8)
PROTO_MEMSWAP_DEF_SIZE(uint64_t, (1u << 31))
}  // namespace googpb

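void InternalSwap(void* other) {
  char state[1000];
  // 9 bytes unrolls into one uint64_t swap followed by one uint8_t swap.
  // Pass the buffer that `other` points to, not the address of the pointer.
  ::googpb::memswap<9>(state, static_cast<char*>(other));
}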
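
To make the unrolling concrete, the following sketch computes how many SwapBlock instantiations of each width the macro chain above would generate for a given size. BlockPlan and PlanFor are hypothetical names introduced only for this illustration. The arithmetic follows the enable_if ranges in PROTO_MEMSWAP_DEF_SIZE: uint64_t blocks repeat while at least 8 bytes remain, then at most one block each of 4, 2, and 1 bytes.

```cpp
#include <cstddef>

// Hypothetical illustration type: number of SwapBlock<T> calls per width.
struct BlockPlan {
  std::size_t n8;  // SwapBlock<uint64_t> calls
  std::size_t n4;  // SwapBlock<uint32_t> calls
  std::size_t n2;  // SwapBlock<uint16_t> calls
  std::size_t n1;  // SwapBlock<uint8_t> calls
};

// Mirrors the recursion: sizes >= 8 peel off 8 bytes at a time, then the
// remainder (< 8) is covered by at most one 4-, one 2-, and one 1-byte block.
constexpr BlockPlan PlanFor(std::size_t size) {
  return BlockPlan{size / 8, (size % 8) / 4, (size % 4) / 2, size % 2};
}

// memswap<9> from the reproduction: one 8-byte swap plus one 1-byte swap.
static_assert(PlanFor(9).n8 == 1 && PlanFor(9).n4 == 0 &&
                  PlanFor(9).n2 == 0 && PlanFor(9).n1 == 1,
              "9 bytes = 8 + 1");
```

Under this scheme a large message expands into size / 8 inlined block swaps, which is the code growth that the run-time loop avoids.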

bithium pushed a commit to bithium/protobuf that referenced this pull request Sep 4, 2023
Integrate from Piper for C++, Java, and Python