Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc] Export the RPC interface from libc #71432

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Nov 6, 2023

Summary:
This patch adds new extensions that allow us to export the RPC interface
from libc to other programs. This should allow external users of the
GPU libc to interface with the RPC client (which more or less behaves
like syscalls in this context). This is done by wrapping the interface
into a C-style function call.

Obviously, this approach is far less safe than the carefully crafted C++
interface. For example, we now expose the internal packet buffer, and it
is theoretically possible to open a single port with conflicting opcodes
and break the whole interface. So, extra care will be needed when
interacting with this. However, the usage is very similar as shown by
the new test.

This somewhat stretches the concept of libc just doing libc things,
but I think this is important enough to justify it. It is difficult to
split this out, as we do not want to have the situation where we have
multiple RPC clients running at one time, so for now it makes sense to
just leave it in libc.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 6, 2023

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch adds new extensions that allow us to export the RPC interface
from libc to other programs. This should allow external users of the
GPU libc to interface with the RPC client (which more or less behaves
like syscalls in this context). This is done by wrapping the interface
into a C-style function call.

Obviously, this approach is far less safe than the carefully crafted C++
interface. For example, we now expose the internal packet buffer, and it
is theoretically possible to open a single port with conflicting opcodes
and break the whole interface. So, extra care will be needed when
interacting with this. However, the usage is very similar as shown by
the new test.

I was unsure whether or not to use function pointers to give a send / recv interface that is more true to what the C++ version offers.
However, I elected to simply return the buffer (moving a wait for
ownership inside of this interface) as function pointers on the GPU can
be somewhat difficult to work with. The downside is now that the
interface can hang if it is not ordered correctly, i.e., calling
rpc_get_buffer before a send call and after a recv call. I am open
to changing this if others have opinions.

This somewhat stretches the concept of libc just doing libc things,
but I think this is important enough to justify it. It is difficult to
split this out, as we do not want to have the situation where we have
multiple RPC clients running at one time, so for now it makes sense to
just leave it in libc.


Patch is 21.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71432.diff

20 Files Affected:

  • (modified) libc/config/gpu/entrypoints.txt (+5)
  • (modified) libc/include/CMakeLists.txt (+1)
  • (modified) libc/include/gpu/rpc.h.def (+3)
  • (modified) libc/include/llvm-libc-types/CMakeLists.txt (+1)
  • (added) libc/include/llvm-libc-types/rpc_port_t.h (+21)
  • (modified) libc/spec/gpu_ext.td (+31-1)
  • (modified) libc/src/__support/RPC/rpc.h (+15-3)
  • (modified) libc/src/gpu/CMakeLists.txt (+55)
  • (added) libc/src/gpu/rpc_close_port.cpp (+25)
  • (added) libc/src/gpu/rpc_close_port.h (+20)
  • (added) libc/src/gpu/rpc_get_buffer.cpp (+25)
  • (added) libc/src/gpu/rpc_get_buffer.h (+20)
  • (added) libc/src/gpu/rpc_open_port.cpp (+26)
  • (added) libc/src/gpu/rpc_open_port.h (+20)
  • (added) libc/src/gpu/rpc_recv.cpp (+25)
  • (added) libc/src/gpu/rpc_recv.h (+20)
  • (added) libc/src/gpu/rpc_send.cpp (+25)
  • (added) libc/src/gpu/rpc_send.h (+20)
  • (modified) libc/test/integration/startup/gpu/CMakeLists.txt (+13)
  • (added) libc/test/integration/startup/gpu/rpc_external_interface_test.cpp (+99)
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 5dda6aa71d36f81..47bcb36db49d55b 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -139,6 +139,11 @@ set(TARGET_LIBC_ENTRYPOINTS
 
     # gpu/rpc.h entrypoints
     libc.src.gpu.rpc_host_call
+    libc.src.gpu.rpc_open_port
+    libc.src.gpu.rpc_get_buffer
+    libc.src.gpu.rpc_send
+    libc.src.gpu.rpc_recv
+    libc.src.gpu.rpc_close_port
 )
 
 set(TARGET_LIBM_ENTRYPOINTS
diff --git a/libc/include/CMakeLists.txt b/libc/include/CMakeLists.txt
index 9d170603ffa45cd..0f044ff84f28983 100644
--- a/libc/include/CMakeLists.txt
+++ b/libc/include/CMakeLists.txt
@@ -515,6 +515,7 @@ if(LIBC_TARGET_ARCHITECTURE_IS_GPU)
     DEPENDS
       .llvm_libc_common_h
       .llvm-libc-types.rpc_opcodes_t
+      .llvm-libc-types.rpc_port_t
   )
 endif()
 
diff --git a/libc/include/gpu/rpc.h.def b/libc/include/gpu/rpc.h.def
index 0438cd65e7be22e..f178007ee7083f5 100644
--- a/libc/include/gpu/rpc.h.def
+++ b/libc/include/gpu/rpc.h.def
@@ -11,7 +11,10 @@
 
 #include <__llvm-libc-common.h>
 
+#include <llvm-libc-types/size_t.h>
+
 #include <llvm-libc-types/rpc_opcodes_t.h>
+#include <llvm-libc-types/rpc_port_t.h>
 
 %%public_api()
 
diff --git a/libc/include/llvm-libc-types/CMakeLists.txt b/libc/include/llvm-libc-types/CMakeLists.txt
index 3c0cc7bbc71dacb..8e0bf6a50c22bdd 100644
--- a/libc/include/llvm-libc-types/CMakeLists.txt
+++ b/libc/include/llvm-libc-types/CMakeLists.txt
@@ -91,3 +91,4 @@ add_header(wint_t HDR wint_t.h)
 add_header(sa_family_t HDR sa_family_t.h)
 add_header(struct_sockaddr HDR struct_sockaddr.h)
 add_header(rpc_opcodes_t HDR rpc_opcodes_t.h)
+add_header(rpc_port_t HDR rpc_port_t.h)
diff --git a/libc/include/llvm-libc-types/rpc_port_t.h b/libc/include/llvm-libc-types/rpc_port_t.h
new file mode 100644
index 000000000000000..b0e6669a9f88a53
--- /dev/null
+++ b/libc/include/llvm-libc-types/rpc_port_t.h
@@ -0,0 +1,21 @@
+//===-- Definition of type rpc_port_t -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __LLVM_LIBC_TYPES_RPC_PORT_T_H__
+#define __LLVM_LIBC_TYPES_RPC_PORT_T_H__
+
+typedef struct {
+  __UINT8_TYPE__ reserved[32];
+} rpc_port_t;
+
+typedef struct {
+  __UINT64_TYPE__ data[8];
+} rpc_buffer_t;
+
+
+#endif // __LLVM_LIBC_TYPES_RPC_PORT_T_H__
diff --git a/libc/spec/gpu_ext.td b/libc/spec/gpu_ext.td
index dce81ff7786203b..56094d4d50d3d16 100644
--- a/libc/spec/gpu_ext.td
+++ b/libc/spec/gpu_ext.td
@@ -1,8 +1,13 @@
+def RPCPortT : NamedType<"rpc_port_t">;
+def RPCPortPtrT : PtrType<RPCPortT>;
+def RPCBufferT : NamedType<"rpc_buffer_t">;
+def RPCBufferPtrT : PtrType<RPCBufferT>;
+
 def GPUExtensions : StandardSpec<"GPUExtensions"> {
   HeaderSpec RPC = HeaderSpec<
     "gpu/rpc.h",
     [], // Macros
-    [], // Types
+    [RPCPortT, RPCBufferT, SizeTType], // Types
     [], // Enumerations
     [
         FunctionSpec<
@@ -10,6 +15,31 @@ def GPUExtensions : StandardSpec<"GPUExtensions"> {
             RetValSpec<VoidType>,
             [ArgSpec<VoidPtr>, ArgSpec<VoidPtr>, ArgSpec<SizeTType>]
         >,
+        FunctionSpec<
+            "rpc_open_port",
+            RetValSpec<RPCPortT>,
+            [ArgSpec<UnsignedIntType>]
+        >,
+        FunctionSpec<
+            "rpc_get_buffer",
+            RetValSpec<RPCBufferPtrT>,
+            [ArgSpec<RPCPortPtrT>]
+        >,
+        FunctionSpec<
+            "rpc_send",
+            RetValSpec<VoidType>,
+            [ArgSpec<RPCPortPtrT>]
+        >,
+        FunctionSpec<
+            "rpc_recv",
+            RetValSpec<VoidType>,
+            [ArgSpec<RPCPortPtrT>]
+        >,
+        FunctionSpec<
+            "rpc_close_port",
+            RetValSpec<VoidType>,
+            [ArgSpec<RPCPortPtrT>]
+        >,
     ]
   >;
   let Headers = [
diff --git a/libc/src/__support/RPC/rpc.h b/libc/src/__support/RPC/rpc.h
index 08c1dfd10d6d7f3..66c9fb297767780 100644
--- a/libc/src/__support/RPC/rpc.h
+++ b/libc/src/__support/RPC/rpc.h
@@ -319,6 +319,14 @@ template <bool T, typename S> struct Port {
 
   LIBC_INLINE uint16_t get_index() const { return index; }
 
+  /// Waits until this port owns the buffer and returns a pointer to it.
+  LIBC_INLINE Buffer *get_buffer() const {
+    uint32_t in = owns_buffer ? out ^ T : process.load_inbox(lane_mask, index);
+    process.wait_for_ownership(lane_mask, index, out, in);
+
+    return &process.packet[index].payload.slot[gpu::get_lane_id()];
+  }
+
   LIBC_INLINE void close() {
     // The server is passive, if it own the buffer when it closes we need to
     // give ownership back to the client.
@@ -347,7 +355,9 @@ struct Client {
       : process(port_count, buffer) {}
 
   using Port = rpc::Port<false, Packet<gpu::LANE_SIZE>>;
-  template <uint16_t opcode> LIBC_INLINE Port open();
+
+  template <uint16_t opcode> LIBC_INLINE Port open() { return open(opcode); }
+  LIBC_INLINE Port open(const uint16_t opcode);
 
 private:
   Process<false, Packet<gpu::LANE_SIZE>> process;
@@ -510,8 +520,10 @@ LIBC_INLINE void Port<T, S>::recv_n(void **dst, uint64_t *size, A &&alloc) {
 /// only open a port if we find an index that is in a valid sending state. That
 /// is, there are send operations pending that haven't been serviced on this
 /// port. Each port instance uses an associated \p opcode to tell the server
-/// what to do.
-template <uint16_t opcode> LIBC_INLINE Client::Port Client::open() {
+/// what to do. It is very important that the \p opcode value is identical
+/// across each lane. Use the template version of this unless absolutely
+/// necessary.
+LIBC_INLINE Client::Port Client::open(const uint16_t opcode) {
   // Repeatedly perform a naive linear scan for a port that can be opened to
   // send data.
   for (uint32_t index = 0;; ++index) {
diff --git a/libc/src/gpu/CMakeLists.txt b/libc/src/gpu/CMakeLists.txt
index e20228516b5112d..822b7ba2f89fe62 100644
--- a/libc/src/gpu/CMakeLists.txt
+++ b/libc/src/gpu/CMakeLists.txt
@@ -8,3 +8,58 @@ add_entrypoint_object(
     libc.src.__support.RPC.rpc_client
     libc.src.__support.GPU.utils
 )
+
+add_entrypoint_object(
+  rpc_open_port
+  SRCS
+    rpc_open_port.cpp
+  HDRS
+    rpc_open_port.h
+  DEPENDS
+    libc.src.__support.RPC.rpc_client
+    libc.src.__support.GPU.utils
+)
+
+add_entrypoint_object(
+  rpc_close_port
+  SRCS
+    rpc_close_port.cpp
+  HDRS
+    rpc_close_port.h
+  DEPENDS
+    libc.src.__support.RPC.rpc_client
+    libc.src.__support.GPU.utils
+)
+
+add_entrypoint_object(
+  rpc_get_buffer
+  SRCS
+    rpc_get_buffer.cpp
+  HDRS
+    rpc_get_buffer.h
+  DEPENDS
+    libc.src.__support.RPC.rpc_client
+    libc.src.__support.GPU.utils
+)
+
+add_entrypoint_object(
+  rpc_send
+  SRCS
+    rpc_send.cpp
+  HDRS
+    rpc_send.h
+  DEPENDS
+    libc.src.__support.RPC.rpc_client
+    libc.src.__support.GPU.utils
+)
+
+add_entrypoint_object(
+  rpc_recv
+  SRCS
+    rpc_recv.cpp
+  HDRS
+    rpc_recv.h
+  DEPENDS
+    libc.src.__support.RPC.rpc_client
+    libc.src.__support.GPU.utils
+)
diff --git a/libc/src/gpu/rpc_close_port.cpp b/libc/src/gpu/rpc_close_port.cpp
new file mode 100644
index 000000000000000..f4d9f4b194a4b7b
--- /dev/null
+++ b/libc/src/gpu/rpc_close_port.cpp
@@ -0,0 +1,25 @@
+//===---------- GPU implementation of the external RPC port interface -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/gpu/rpc_close_port.h"
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
+static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");
+
+LLVM_LIBC_FUNCTION(void, rpc_close_port, (rpc_port_t * handle)) {
+  rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
+  port->close();
+}
+
+} // namespace LIBC_NAMESPACE
diff --git a/libc/src/gpu/rpc_close_port.h b/libc/src/gpu/rpc_close_port.h
new file mode 100644
index 000000000000000..2be39ce3698a384
--- /dev/null
+++ b/libc/src/gpu/rpc_close_port.h
@@ -0,0 +1,20 @@
+//===-- Implementation header for RPC functions -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_GPU_RPC_CLOSE_PORT_H
+#define LLVM_LIBC_SRC_GPU_RPC_CLOSE_PORT_H
+
+#include <gpu/rpc.h>
+
+namespace LIBC_NAMESPACE {
+
+void rpc_close_port(rpc_port_t *handle);
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_GPU_RPC_CLOSE_PORT_H
diff --git a/libc/src/gpu/rpc_get_buffer.cpp b/libc/src/gpu/rpc_get_buffer.cpp
new file mode 100644
index 000000000000000..10097b404c81c46
--- /dev/null
+++ b/libc/src/gpu/rpc_get_buffer.cpp
@@ -0,0 +1,25 @@
+//===---------- GPU implementation of the external RPC port interface -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/gpu/rpc_get_buffer.h"
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
+static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");
+
+LLVM_LIBC_FUNCTION(rpc_buffer_t *, rpc_get_buffer, (rpc_port_t * handle)) {
+  rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
+  return reinterpret_cast<rpc_buffer_t *>(port->get_buffer());
+}
+
+} // namespace LIBC_NAMESPACE
diff --git a/libc/src/gpu/rpc_get_buffer.h b/libc/src/gpu/rpc_get_buffer.h
new file mode 100644
index 000000000000000..5a07db1a5470085
--- /dev/null
+++ b/libc/src/gpu/rpc_get_buffer.h
@@ -0,0 +1,20 @@
+//===-- Implementation header for RPC functions -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_GPU_RPC_GET_BUFFER_H
+#define LLVM_LIBC_SRC_GPU_RPC_GET_BUFFER_H
+
+#include <gpu/rpc.h>
+
+namespace LIBC_NAMESPACE {
+
+rpc_buffer_t *rpc_get_buffer(rpc_port_t *handle);
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_GPU_RPC_GET_BUFFER_H
diff --git a/libc/src/gpu/rpc_open_port.cpp b/libc/src/gpu/rpc_open_port.cpp
new file mode 100644
index 000000000000000..73dab5066fa1bfb
--- /dev/null
+++ b/libc/src/gpu/rpc_open_port.cpp
@@ -0,0 +1,26 @@
+//===---------- GPU implementation of the external RPC port interface -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/gpu/rpc_open_port.h"
+
+#include "src/__support/CPP/bit.h"
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
+static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");
+
+LLVM_LIBC_FUNCTION(rpc_port_t, rpc_open_port, (unsigned opcode)) {
+  rpc::Client::Port port = rpc::client.open(static_cast<uint16_t>(opcode));
+  return cpp::bit_cast<rpc_port_t>(port);
+}
+
+} // namespace LIBC_NAMESPACE
diff --git a/libc/src/gpu/rpc_open_port.h b/libc/src/gpu/rpc_open_port.h
new file mode 100644
index 000000000000000..553fa59120aaf68
--- /dev/null
+++ b/libc/src/gpu/rpc_open_port.h
@@ -0,0 +1,20 @@
+//===-- Implementation header for RPC functions -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_GPU_RPC_OPEN_PORT_H
+#define LLVM_LIBC_SRC_GPU_RPC_OPEN_PORT_H
+
+#include <gpu/rpc.h>
+
+namespace LIBC_NAMESPACE {
+
+rpc_port_t rpc_open_port(unsigned opcode);
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_GPU_RPC_OPEN_PORT_H
diff --git a/libc/src/gpu/rpc_recv.cpp b/libc/src/gpu/rpc_recv.cpp
new file mode 100644
index 000000000000000..96086dc2b17a345
--- /dev/null
+++ b/libc/src/gpu/rpc_recv.cpp
@@ -0,0 +1,25 @@
+//===---------- GPU implementation of the external RPC port interface -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/gpu/rpc_recv.h"
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
+static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");
+
+LLVM_LIBC_FUNCTION(void, rpc_recv, (rpc_port_t * handle)) {
+  rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
+  port->recv([](rpc::Buffer *) { /* Filled by rpc_get_buffer() */ });
+}
+
+} // namespace LIBC_NAMESPACE
diff --git a/libc/src/gpu/rpc_recv.h b/libc/src/gpu/rpc_recv.h
new file mode 100644
index 000000000000000..926565c3b0c16a0
--- /dev/null
+++ b/libc/src/gpu/rpc_recv.h
@@ -0,0 +1,20 @@
+//===-- Implementation header for RPC functions -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_GPU_RPC_RECV_H
+#define LLVM_LIBC_SRC_GPU_RPC_RECV_H
+
+#include <gpu/rpc.h>
+
+namespace LIBC_NAMESPACE {
+
+void rpc_recv(rpc_port_t *handle);
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_GPU_RPC_RECV_H
diff --git a/libc/src/gpu/rpc_send.cpp b/libc/src/gpu/rpc_send.cpp
new file mode 100644
index 000000000000000..4f4f5bcc2a5eb80
--- /dev/null
+++ b/libc/src/gpu/rpc_send.cpp
@@ -0,0 +1,25 @@
+//===---------- GPU implementation of the external RPC port interface -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/gpu/rpc_send.h"
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
+static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");
+
+LLVM_LIBC_FUNCTION(void, rpc_send, (rpc_port_t * handle)) {
+  rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
+  port->send([](rpc::Buffer *) { /* Filled by rpc_get_buffer() */ });
+}
+
+} // namespace LIBC_NAMESPACE
diff --git a/libc/src/gpu/rpc_send.h b/libc/src/gpu/rpc_send.h
new file mode 100644
index 000000000000000..e367514181f2ba8
--- /dev/null
+++ b/libc/src/gpu/rpc_send.h
@@ -0,0 +1,20 @@
+//===-- Implementation header for RPC functions -----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_GPU_RPC_SEND_H
+#define LLVM_LIBC_SRC_GPU_RPC_SEND_H
+
+#include <gpu/rpc.h>
+
+namespace LIBC_NAMESPACE {
+
+void rpc_send(rpc_port_t *handle);
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_GPU_RPC_SEND_H
diff --git a/libc/test/integration/startup/gpu/CMakeLists.txt b/libc/test/integration/startup/gpu/CMakeLists.txt
index 7555986b16df452..18764ad2bcaf0b0 100644
--- a/libc/test/integration/startup/gpu/CMakeLists.txt
+++ b/libc/test/integration/startup/gpu/CMakeLists.txt
@@ -44,6 +44,19 @@ add_integration_test(
    rpc_interface_test.cpp
 )
 
+add_integration_test(
+  startup_rpc_external_interface_test
+  SUITE libc-startup-tests
+  SRCS
+    rpc_external_interface_test.cpp
+  DEPENDS
+    libc.src.gpu.rpc_open_port
+    libc.src.gpu.rpc_get_buffer
+    libc.src.gpu.rpc_send
+    libc.src.gpu.rpc_recv
+    libc.src.gpu.rpc_close_port
+)
+
 add_integration_test(
   startup_rpc_stream_test
   SUITE libc-startup-tests
diff --git a/libc/test/integration/startup/gpu/rpc_external_interface_test.cpp b/libc/test/integration/startup/gpu/rpc_external_interface_test.cpp
new file mode 100644
index 000000000000000..ae5221f58fc901d
--- /dev/null
+++ b/libc/test/integration/startup/gpu/rpc_external_interface_test.cpp
@@ -0,0 +1,99 @@
+//===-- Loader test to check the external RPC interface with the loader ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include <gpu/rpc.h>
+
+#include "src/gpu/rpc_close_port.h"
+#include "src/gpu/rpc_get_buffer.h"
+#include "src/gpu/rpc_open_port.h"
+#include "src/gpu/rpc_recv.h"
+#include "src/gpu/rpc_send.h"
+
+#include "include/llvm-libc-types/test_rpc_opcodes_t.h"
+#include "src/__support/GPU/utils.h"
+#include "src/__support/RPC/rpc_client.h"
+#include "test/IntegrationTest/test.h"
+
+using namespace LIBC_NAMESPACE;
+
+// Test to ensure that we can use aribtrary combinations of sends and recieves
+// as long as they are mirrored using the external interface.
+static void test_interface(bool end_with_send) {
+  uint64_t cnt = 0;
+  // rpc::Client::Port port = rpc::client.open<RPC_TEST_INTERFACE>();
+  rpc_port_t port = LIBC_NAMESPACE::rpc_open_port(RPC_TEST_INTERFACE);
+
+  // port.send([&](rpc::Buffer *buffer) { buffer->data[0] = end_with_send; });
+  rpc_buffer_t *buffer = LIBC_NAMESPACE::rpc_get_buffer(&port);
+  buffer->data[0] = end_with_send;
+  LIBC_NAMESPACE::rpc_send(&port);
+
+  // port.send([&](rpc::Buffer *buffer) { buffer->data[0] = cnt = cnt + 1; });
+  buffer = LIBC_NAMESPACE::rpc_get_buffer(&port);
+  buffer->data[0] = cnt = cnt + 1;
+  LIBC_NAMESPACE::rpc_send(&port);
+
+  // port.recv([&](rpc::Buffer *buffer) { cnt = buffer->data[0]; });
+  LIBC_NAMESPACE::rpc_recv(&port);
+  buffer = LIBC_NAMESPACE::rpc_get_buffer(&port);
+  cnt = buffer->data[0];
+
+  // port.send([&](rpc::Buffer *buffer) { buffer->data[0] = cnt = cnt + 1; });
+  buffer = LIBC_NAMESPACE::rpc_get_buffer(&port);
+  buffer->data[0] = cnt = cnt + 1;
+  LIBC_NAMESPACE::rpc_send(&port);
+
+  // port.recv([&](rpc::Buffer *buffer) { cnt = buffer->data[0]; });
+  LIBC_NAMESPACE::rpc_recv(&port);
+  buffer = LIBC_NAMESPACE::rpc_get_buffer(&port);
+  cnt = buffer->data[0];
+
+  // port.send([&](rpc::Buff...
[truncated]

Copy link

github-actions bot commented Nov 6, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

/// what to do.
template <uint16_t opcode> LIBC_INLINE Client::Port Client::open() {
/// what to do. It is very important that the \p opcode value is identical
/// across each lane. Use the template version of this unless absolutely
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see how the template version is different than the version with the argument, other than the opcode being locked in at compile time. Could you explain a bit more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is some GPU specific magic. Basically GPUs use so-called "SIMT" execution where we have execution units that are 32 or 64 lanes wide that all execute at once. These are called "warps" or "wavefronts" depending on the platform. This mean that some "threads" are effectively held hostage by other threads if they are in the same warp. So, the unit of least parallelism is actually a warp / wavefront here, which is why the RPC interface handles a whole warp / wavefront at a time. So, given a size of 32 and some code like,

int value = thread_id % 2 ? 0 : 1;
some_fn(value);

Will result in a warp with alternating values, but the whole warp / wavefront is still "convergent" that means that all "threads" are still active. If we apply this logic to opening a port, we could have the situation where a single warp tries to handle two separate opcodes, which is a break of the asusmption that each RPC call handles a single warp / wavefront. We can avoid this by forcing it to be a compile time constant via the template.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thank you for explaining

@@ -0,0 +1,26 @@
//===---------- GPU implementation of the external RPC port interface -----===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: Your cpp files have the title too far to the right in the first line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out.


namespace LIBC_NAMESPACE {

static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to have these checks in a more centralized place? Maybe in rpc_client.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that we can't export the original C++ definitions, this was something Siva was adamant about because we didn't want details from libc leaking from the headers. So basically rpc_client.h is exported, but in the implementation .cpp file we make sure that the definitions are compatible with what we expect. That ensures that if we change the interface this doens't silently break.

>,
FunctionSpec<
"rpc_send",
RetValSpec<VoidType>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

design question: Would it make sense to give send, recv, and close return values of some sort? Probably an int so it can return various error codes or 0 on success to match other libc functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary, there shouldn't be any failure modes once a port has been opened. Because we are executing in a unified address space, we don't need to worry about network effects like gRPC or similar. Internally we pretty much just spin until memory happens and then continue. We could potentially expand the internface to allow the lambda we're copying to fail, but I don't think there's any utility in that. If it's for consistently I could make it return an int and just hardwire it to zero.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error codes. It is fundamentally crucial that these functions do not and can not fail.

This permits implementing arbitrary functions that execute across the two machines with exactly the interface they have on one machine. No extra integer value to mean "network failed", no replacing double return types with a pair of double and boolean, no extra global variable named rpcerrno that people are supposed to check.

This trickery relative to the usual RPC problem of messing up the interfaces works because communication on a shared memory machine only fails when the whole machine has fallen over, which is non-recoverable, and each function does not dynamically allocate a resource that can fail.

At least by design - in practice we may currently be undersizing the ports structure to guarantee progress in the worst case scheduling, but if so that's fixable by making N large enough for a given machine. The cost is order of a few megabytes of memory which in the scheme of things doesn't matter for a singleton.

Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function pointers are a bad thing. We should be exporting the C++ interface as C++. not doing this dance through a C shim in exchange for widening the scope for programmer error and performance cost, plus further rolling the die with the compiler correctness.

The implementation has reinterpret cast all over the place and shouldn't. Declare the port as something like
struct port
{
union {
uint64_t d;
void* p;
};
};

or just as struct {uint64_t}, and generally if people decide to read or write that integer themselves they get to keep all the pieces.

@@ -17,4 +17,7 @@ typedef struct {
__UINT64_TYPE__ data[8];
} rpc_buffer_t;

typedef void (*rpc_callback_t)(rpc_buffer_t *buffer, void *data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't call these things uint32_t, we probably can't call them rpc_buffer_t either. Presumably __LIBC_RPC_BUFFER_T or whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are macros provided by GCC and clang that pretty much expand to the same thing, don't think we need to follow the naming convention.

@@ -17,9 +17,12 @@ namespace LIBC_NAMESPACE {
static_assert(sizeof(rpc_port_t) == sizeof(rpc::Client::Port), "ABI mismatch");
static_assert(sizeof(rpc_buffer_t) == sizeof(rpc::Buffer), "ABI mismatch");

LLVM_LIBC_FUNCTION(void, rpc_recv, (rpc_port_t * handle)) {
LLVM_LIBC_FUNCTION(void, rpc_recv,
(rpc_port_t * handle, rpc_callback_t callback, void *data)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would these be reinterpret casts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rpc_port_t is more or less a type-erased buffer. We convert it back to the real type since we know what it is internally. To the user, it's just like 32 bytes.

LIBC_NAMESPACE::rpc_recv(
&port,
[](rpc_buffer_t *buffer, void *data, uint32_t) {
*reinterpret_cast<uint64_t *>(data) = buffer->data[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, why is there type punning happening here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the input data for a function pointer interface needs to be void and the implementation defines what it is. Same story with the numerous HSA callbacks or pthreads.

@JonChesterfield
Copy link
Collaborator

There's a fundamental layering violation in what we've done here which is probably the root cause of the API grief.

Libc depends on syscall, libc does not contain syscall. We should aspire to put the RPC machinery somewhere else, leave the implementation of fopen and similar in libc, but have the templated state machine exist independent of the exported API constructs of libc. It's a good candidate for some interpretations of llvm/offload.

Suggest we take the bits of rpc that would be nice to export in type safe, non-function-pointery fashion and make those usable in a header only sense, not wire it up into libc's interface, and #include them from within openmp or whatever with total disregard to the exported C interface until such time as they have a new home.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Nov 9, 2023

The implementation has reinterpret cast all over the place and shouldn't. Declare the port as something like

Exporting an opaque handle is a pretty common idiom. The interface is completely hidden from the user, they just need to know it's data. Doing bit casts should be legal here.

There's a fundamental layering violation in what we've done here which is probably the root cause of the API grief.

Libc depends on syscall, libc does not contain syscall. We should aspire to put the RPC machinery somewhere else, leave the implementation of fopen and similar in libc, but have the templated state machine exist independent of the exported API constructs of libc. It's a good candidate for some interpretations of llvm/offload.

Somewhat agree in the long term. The general issue is then who's running the show in the case of a separate implementation. Maybe we could just make the actual client weak_odr and merge all references to it from its implementation in some .cpp file somewhere. Though it would be somewhat painful to link an external project in the tests.

Suggest we take the bits of rpc that would be nice to export in type safe, non-function-pointery fashion and make those usable in a header only sense, not wire it up into libc's interface, and #include them from within openmp or whatever with total disregard to the exported C interface until such time as they have a new home.

This whole interface could be exported as send_n and recv_n if nothing else. That would require the user providing their own buffer. Could go that route instead for the time being. As it stands it's much more work to do any other restructuring.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Mar 11, 2024

I have updated this to only export the send_n and recv_n interface. This removes all the complexity with how to handle the internal buffer and such. I think it's perfectly reasonable to give people an interface to just stream N bytes back and forth because it can be used to create the other interface. It's less optimal in typical usage, but should be optimized sufficiently well from my investigations.

@jhuber6 jhuber6 force-pushed the ExternalRPC branch 2 times, most recently from 7ab42ef to 6820c7a Compare March 11, 2024 17:53
Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reinterpret_casts I cautioned against in November are still here, this time with clear cut undefined behaviour.

rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
port-> bang

Handle is 32 bytes. It's not an instance of port. Totally invalid C++, you need to handle aliasing with far more paranoia than this to be correct. Even if you're going with "UB is fine, clang isn't currently breaking me", that's a misaligned load and I doubt GPUs support that gracefully.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Mar 15, 2024

The reinterpret_casts I cautioned against in November are still here, this time with clear cut undefined behaviour.

rpc::Client::Port *port = reinterpret_cast<rpc::Client::Port *>(handle);
port-> bang

Handle is 32 bytes. It's not an instance of port. Totally invalid C++, you need to handle aliasing with far more paranoia than this to be correct. Even if you're going with "UB is fine, clang isn't currently breaking me", that's a misaligned load and I doubt GPUs support that gracefully.

Could you explain a bit further into what's undefined here? Basically we are doing this

struct Port { ... }
struct port_t { uint8_t data[sizeof(Port)]; }

port_t p = bit_cast<port_t>(Port());
void *ptr = &p
Port &original = *reinterpret_cast<Port>(ptr);

The sketchy part may be the original bitcast, because the base type is not trivially constructable. But because we already created it, shouldn't it be valid to do an in-memory bitcast? After that it's pretty much just typical type erasure since it's referring to the same memory.

@jhuber6 jhuber6 force-pushed the ExternalRPC branch 2 times, most recently from 369b63b to 09abb45 Compare March 18, 2024 19:38
Summary:
This patch adds new extensions that allow us to export the RPC interface
from `libc` to other programs. This should allow external users of the
GPU `libc` to interface with the RPC client (which more or less behaves
like syscalls in this context). This is done by wrapping the interface
into a C-style function call.

Obviously, this approach is far less safe than the carefully crafted C++
interface. For example, we now expose the internal packet buffer, and it
is theoretically possible to open a single port with conflicting opcodes
and break the whole interface. So, extra care will be needed when
interacting with this. However, the usage is very similar as shown by
the new test.

This somewhat stretches the concept of `libc` just doing `libc` things,
but I think this is important enough to justify it. It is difficult to
split this out, as we do not want to have the situation where we have
multiple RPC clients running at one time, so for now it makes sense to
just leave it in `libc`.
template <typename To, typename From>
LIBC_INLINE constexpr cpp::enable_if_t<
(sizeof(To) == sizeof(From)) &&
// Implementation of bit_cast that cannot use the compiler builtin must be
// trivially-constructible To, to avoid UB in the implementation.
#if !LIBC_HAS_BUILTIN(__builtin_bit_cast)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this would be the case - even if we had to DIY a bitcast from scratch, wouldn't default constructible suffice?

Also aren't these things all trivially constructible anyway, in which case I don't know why there's an #if around this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write this code, but it's likely going off of what the documentation says https://en.cppreference.com/w/cpp/numeric/bit_cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Port is not trivially constructable because it uses a reference instead of a pointer.

/// port using the platform's returned value.
template <uint16_t opcode>
[[clang::convergent]] LIBC_INLINE Client::Port Client::open() {
/// port using the platform's returned value. It is required that \p opcode is
Copy link
Collaborator

@JonChesterfield JonChesterfield Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loss of invariant is not great - what used to be a language sema enforced invariant is now a comment.

It seems relatively likely that someone would try to call a different function on different lanes of the wave by passing a variable for opcode, since it's an internal implementation quirk that we do not support that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not the best solution but not strictly required for correctness. This code isn't nullable, so there's no way for it to fail. Otherwise we could just return some null value if the ballot isn't all equal to the first lane or something. Regardless, it's not ideal.

LLVM_LIBC_FUNCTION(void, rpc_close_port, (rpc_port_t * handle)) {
rpc::Client::Port port = cpp::bit_cast<rpc::Client::Port>(*handle);
port.close();
*handle = cpp::bit_cast<rpc_port_t>(port);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be returning the port after it has been closed? Presumably the caller can do nothing with that value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same restriction the C++ interface has, we just require that it be closed by the user. I remember toying around with destructors but it was just too much overhead and pain for what we should just require as a part of using it.


namespace LIBC_NAMESPACE {

void rpc_close_port(rpc_port_t *handle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be passing the struct around by value instead?

LLVM_LIBC_FUNCTION(void, rpc_recv_n,
(rpc_port_t * handle, void *dst, size_t *size)) {
rpc::Client::Port port = cpp::bit_cast<rpc::Client::Port>(*handle);
port.recv_n(dst, reinterpret_cast<uint64_t *>(size));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just pick one of size_t and uint64_t and use it consistently?

I think there are 32 bit nvptx things out there based on needing to install gcc multilib to use it and there's a credible risk size_t is 32 bit on 32 bit platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32-bit NVPTX is officially removed since CUDA 11.2 I believe.

Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left various comments inline.

I think exporting RPC from libc by wrapping the C++ in a layer of C is fundamentally the wrong thing to be doing and the signal to noise ratio of this patch somewhat supports that.

The RPC layer benefits from being in C++. We'd like it to be easily usable outside of libc, e.g. from the openmp runtime. The llvm libc is implemented in C++.

Instead of doing this force-through-C dance with the associated unsafety and complexity, and given that the libc project doesn't want to export C++ interfaces, we should move RPC to some other subproject. The new offload one seems an obvious fit. Then libc uses the C++ version directly from that subproject.

The analogy of libc depending on syscall, and rpc being an implementation of syscall, seems self evident. It's a historical quirk that we didn't have anywhere better to put the text files in the repo.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Mar 19, 2024

Left various comments inline.

I think exporting RPC from libc by wrapping the C++ in a layer of C is fundamentally the wrong thing to be doing and the signal to noise ratio of this patch somewhat supports that.

The RPC layer benefits from being in C++. We'd like it to be easily usable outside of libc, e.g. from the openmp runtime. The llvm libc is implemented in C++.

Instead of doing this force-through-C dance with the associated unsafety and complexity, and given that the libc project doesn't want to export C++ interfaces, we should move RPC to some other subproject. The new offload one seems an obvious fit. Then libc uses the C++ version directly from that subproject.

The analogy of libc depending on syscall, and rpc being an implementation of syscall, seems self evident. It's a historical quirk that we didn't have anywhere better to put the text files in the repo.

I more or less agree with the fact that this should eventually be moved somewhere else. What I'm imagining is that all the GPU specific utils, like wrappers around builtins and stuff, and this should all be in a header, likely in the offload project. The problem is that we don't have that project yet. But I've asked around and including from other projects is fine generally. The real difficulty is how to handle the RPC server portion. For example, with printf we like being able to use the LLVM libc printf interface as it lets us do custom parsing and handling of the string. Throwing libraries around a bunch of different CMake invocations is a huge pain in general so I'd need to think about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants