erts: Decouple use of single-mapped memory from perf support #6340

Merged

Contributor

@frej frej commented Oct 1, 2022

Running the JIT with single-mapped RWX memory for JIT:ed machine code is needed for perf support. Since it is also useful when running under QEMU user mode emulation [1], this patch adds a new flag, +JMsingle bool, which controls the use of single-mapped RWX memory without triggering the output of perf-related metadata.

The naming of the flag is intended to follow the (perceived) pattern introduced by the +JPperf flag: J marks a JIT-related option and M stands for memory-related things.
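To illustrate the flag's shape only, here is a minimal C sketch of interpreting a `+JMsingle bool` pair on a command line. The helpers `parse_bool_flag` and `jit_single_map_from_args` are hypothetical names chosen for this example. They are not taken from erl_init.c, and the real emulator's argument parsing may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the OTP sources): interpret the
 * argument of a boolean flag. Returns 1 for "true", 0 for "false",
 * and -1 for anything else. */
int parse_bool_flag(const char *arg)
{
    if (arg == NULL)
        return -1;
    if (strcmp(arg, "true") == 0)
        return 1;
    if (strcmp(arg, "false") == 0)
        return 0;
    return -1;
}

/* Hypothetical helper: scan argv for "+JMsingle <bool>".
 * Returns 1 or 0 when the flag is present and valid, `fallback` when
 * the flag is absent, and -1 when its argument is missing or is not
 * "true"/"false". If the flag is repeated, the last one wins. */
int jit_single_map_from_args(int argc, char **argv, int fallback)
{
    int result = fallback;
    int i;

    assert(argv != NULL);
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "+JMsingle") != 0)
            continue;
        if (i + 1 >= argc)
            return -1;          /* flag given without an argument */
        result = parse_bool_flag(argv[i + 1]);
        if (result < 0)
            return -1;          /* argument is not a boolean */
        i++;                    /* skip the consumed argument */
    }
    return result;
}
```

With this sketch, an argument vector such as `erl +JMsingle true -noshell` yields 1, and a vector without the flag yields whatever fallback the caller passes in.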

Personally, I use the additional patch below so that I can use the same file system image for development on an amd64 build host and for deployment to an aarch64 target board. The patch detects when the VM is running under QEMU and, if so, enables single-mapped memory. As the patch only works on Linux and doesn't try to identify the version of QEMU (in case the bug gets fixed), I don't think it belongs in OTP master. But as it is nevertheless useful, I include it here:

diff --git a/erts/emulator/beam/erl_init.c b/erts/emulator/beam/erl_init.c
index 42216da0b2..6f11c7586b 100644
--- a/erts/emulator/beam/erl_init.c
+++ b/erts/emulator/beam/erl_init.c
@@ -2408,6 +2408,40 @@ erl_start(int argc, char **argv)
 	erts_usage();
     }

+
+#if defined(HAVE_LINUX_PERF_SUPPORT)
+    /* We want to detect if we are running under QEMU user mode
+       emulation in order to activate single-mapped RWX memory for the
+       JIT:ed code. This only works on Linux, so we use
+       HAVE_LINUX_PERF_SUPPORT to conditionally compile this piece of
+       detection code.
+
+       To detect QEMU user mode we make use of the fact that our
+       parent process will always be QEMU, we detect that by looking
+       up our parent process and then checking if the final path
+       component of the symlink /proc/<parent>/exe starts with
+       "qemu-".
+    */
+    {
+        char symlink_buf[21 + 11]; /* space for any 64 bit integer
+                                      + /proc/%d/exe and terminator */
+        char target_buf[MAXPATHLEN];
+        ssize_t l;
+        erts_snprintf(symlink_buf, sizeof(symlink_buf),
+                      "/proc/%d/exe", getppid());
+        l = readlink(symlink_buf, target_buf, sizeof(target_buf));
+        if (l > 0 && l != sizeof(target_buf)) {
+            char *last_path_separator;
+
+            target_buf[l] = 0;
+            last_path_separator = memrchr(target_buf, '/', l);
+            if (last_path_separator && strncmp(last_path_separator + 1,
+                                               "qemu-", 5) == 0)
+                erts_jit_single_map = 1;
+        }
+    }
+#endif
+
 /* Output format on windows for sprintf defaults to three exponents.
  * We use two-exponent to mimic normal sprintf behaviour.
  */

[1] There is a bug in QEMU; see https://gitlab.com/qemu-project/qemu/-/issues/1034
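The detection idea in the patch above (resolve /proc/<parent>/exe and check whether the final path component starts with "qemu-") can be sketched as a standalone C program fragment. This is an illustration, not the patch itself: the function names are hypothetical, `strrchr()` stands in for the GNU-specific `memrchr()`, and a path without any '/' is treated as a bare file name, which the patch does not do.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for readlink() and getppid() */
#endif
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096             /* fallback, assumed large enough */
#endif

/* Returns 1 if the final component of `path` starts with "qemu-",
 * otherwise 0. */
int exe_path_looks_like_qemu(const char *path)
{
    const char *name;

    assert(path != NULL);
    name = strrchr(path, '/');
    name = name ? name + 1 : path;
    return strncmp(name, "qemu-", 5) == 0;
}

/* Linux only: returns 1 if the parent's executable looks like a QEMU
 * user mode binary, 0 otherwise (including when /proc is unreadable
 * or the link target may have been truncated). */
int parent_is_qemu(void)
{
    char link_path[64];
    char target[PATH_MAX];
    ssize_t n;

    snprintf(link_path, sizeof link_path, "/proc/%ld/exe",
             (long)getppid());
    n = readlink(link_path, target, sizeof target - 1);
    if (n <= 0 || (size_t)n >= sizeof target - 1)
        return 0;
    target[n] = '\0';             /* readlink() does not terminate */
    return exe_path_looks_like_qemu(target);
}
```

Keeping the string check separate from the /proc lookup makes it possible to exercise the former without actually running under QEMU.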

@github-actions
Contributor

github-actions bot commented Oct 1, 2022

CT Test Results

3 files, 132 suites, 44m 35s ⏱️
1 499 tests: 1 447 ✔️ passed, 51 💤 skipped, 1 failed
1 893 runs: 1 823 ✔️ passed, 69 💤 skipped, 1 failed

For more details on these failures, see this check.

Results for commit 8da7a93.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@okeuday
Contributor

okeuday commented Oct 2, 2022

@frej The current QEMU bug discussion indicated it wasn't fixable at that level (i.e., it is related to the Linux kernel instead of QEMU). So, the +JMsingle bool flag is likely necessary for other virtualization using Erlang/OTP + JIT too, right? (not only a QEMU problem)

@dmorneau

dmorneau commented Oct 2, 2022

What is the downside / tradeoff of enabling the flag? It could be useful to mention in the doc. This looks really useful!

@frej
Contributor Author

frej commented Oct 2, 2022

@okeuday My takeaway from the bug discussion was that the current translation strategy in QEMU for user mode emulation doesn't consider the case where a physical page is mapped into more than one place in the emulated process's address space. When the JIT updates a page using the RW mapping, QEMU fails to update the executable mapping, and when we then branch into the new code, we crash.

As the memory is (at least on Linux) allocated by acquiring an fd to an anonymous file (or POSIX SHM) and then mmap:ing it with different permissions, you could track that in QEMU, but it would probably be costly and only be needed for JITs. The information is also available in /proc/ if your process has the CAP_SYS_ADMIN capability, which is why I don't consider it unfixable, even if it realistically never will be fixed.
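For readers unfamiliar with the dual-mapping scheme described above, a minimal Linux-only C sketch follows. It assumes `memfd_create()` is available (glibc 2.27 or later) and uses hypothetical names (`struct dual_map`, `dual_map_create`, `dual_map_destroy`). It is not the allocator the JIT actually uses, only an illustration of one anonymous file mapped twice with different permissions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* for memfd_create() */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two views of the same physical pages. */
struct dual_map {
    void *rw;                     /* writable view, used to emit code */
    void *rx;                     /* executable view, used to run it  */
    size_t len;
};

/* Returns 0 on success and -1 on failure. */
int dual_map_create(struct dual_map *m, size_t len)
{
    int fd;

    assert(m != NULL && len > 0);
    fd = memfd_create("jit-demo", 0);     /* anonymous in-memory file */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED makes both views refer to the file's pages. */
    m->rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    m->rx = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);                    /* the mappings keep the file alive */
    if (m->rw == MAP_FAILED || m->rx == MAP_FAILED) {
        if (m->rw != MAP_FAILED)
            munmap(m->rw, len);
        if (m->rx != MAP_FAILED)
            munmap(m->rx, len);
        return -1;
    }
    m->len = len;
    return 0;
}

void dual_map_destroy(struct dual_map *m)
{
    assert(m != NULL);
    munmap(m->rw, m->len);
    munmap(m->rx, m->len);
}
```

Bytes written through `rw` become visible through `rx` at a different virtual address. That aliasing is what the translation strategy described above does not account for. With single-mapped memory, one RWX mapping replaces both views.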

> So, the +JMsingle bool flag is likely necessary for other virtualization using Erlang/OTP + JIT too, right? (not only a QEMU problem)

That depends. If the virtualization implementation does dynamic translation of the JIT:ed code (and doesn't consider multiple mappings), it will be needed. If it does not, and the implementation doesn't screw up multiple mmap:ings of a single file, it won't be needed.

@frej
Contributor Author

frej commented Oct 2, 2022

> What is the downside / tradeoff of enabling the flag? It could be useful to mention in the doc. This looks really useful!

The downside of enabling the flag is that you are using RWX pages; the upside is that BEAM runs without segfaulting. The case for not using RWX pages is too complicated to cover in the OTP documentation; see what SpiderMonkey says about it.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Oct 3, 2022
Contributor

@bjorng bjorng left a comment

Looks good to me, but I suggest you add a test case with a smoke test of the new option (that is, that the emulator can be started when the option is given).

@@ -149,7 +155,6 @@ static JitAllocator *pick_allocator() {
"memory. Either allow this or disable the "
"'+JPperf' option.");
Contributor

Shouldn't the +JMsingle option be mentioned as well?

Contributor Author

True, will fix.

@frej
Contributor Author

frej commented Oct 3, 2022

@bjorng I will add a smoke test, but probably not until the weekend as this is fallout from a hobby project.

@frej frej force-pushed the frej/decouple-use-of-single-mapped-memory branch from 6affacd to 18b5c6f Compare October 4, 2022 18:16
@frej
Contributor Author

frej commented Oct 4, 2022

The review suggestions are now incorporated.

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Oct 5, 2022
@bjorng
Contributor

bjorng commented Oct 5, 2022

Thanks! Added to our daily builds.

@bjorng
Contributor

bjorng commented Oct 6, 2022

The test case fails on an Apple Silicon Mac because the emulator crashes with +JMsingle true.

Apparently, on Apple Macs it is not allowed to have memory that is both writable and executable, but that is not detected when initializing the allocator. The crash occurs later, when attempting to write into the allocated memory, which turns out not to be writable.

I have pushed a suggestion for a fix as a fixup commit.
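To make the failure mode concrete, here is a hedged C sketch of deciding up front whether a single RWX mapping is usable, so that the problem surfaces when the allocator is set up and not at the first write. It is not the fixup commit. The function name `rwx_single_map_usable` is hypothetical, and the compile-time refusal on Apple Silicon is an assumption drawn from the behaviour described above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Returns 1 if a single RWX mapping can be created and written to,
 * otherwise 0. */
int rwx_single_map_usable(size_t len)
{
#if defined(__APPLE__) && defined(__aarch64__)
    /* Assumption based on the discussion: the mapping may be created
     * but is not writable, and probing it with a write would crash.
     * Refuse without touching memory. */
    (void)len;
    return 0;
#else
    unsigned char *p;
    int ok;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;                 /* the OS rejected RWX outright */
    p[0] = 0xC3;                  /* probe write; the value is arbitrary */
    ok = (p[0] == 0xC3);
    munmap(p, len);
    return ok;
#endif
}
```

A caller could use such a check to print an error and exit cleanly when single-mapped memory was requested but cannot work on the current platform.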

@bjorng bjorng removed the testing currently being tested, tag is used by OTP internal CI label Oct 6, 2022
@@ -180,6 +180,48 @@ annotate(Config) ->
[Symbol, Anno])
end.

run_jmsingle_test(Param, ExpectSuccess, ErrorMsg) ->
    Cmd = "erl +JMsingle " ++ Param ++ " -noshell " ++
        "-eval 'io:format(\"All is well~n\"),erlang:halt(0).'",
Contributor

This will most likely not work on Windows. As far as I know, single quotes are not supported on Windows. The following is more likely to work:

Suggested change
- "-eval 'io:format(\"All is well~n\"),erlang:halt(0).'",
+ "-eval erlang:display(all_is_well),erlang:halt(0).",

Contributor Author

I have pushed a fixup for this. I will wait for clarification from @garazdawi before dealing with the NetBSD issue.

Comment on lines 211 to 212
%% +JMsingle true does not work on macOS running
%% on Apple Silicon computers.
Contributor

os:type() == {unix,netbsd} has the same issue.

Contributor Author

Does the allocator error out at creation time, or do we have to check for the OS in JitAllocator::create_allocator(), just like on the Mac?

Contributor

It errors out at creation time, so the only thing we need to fix is the test case.

@frej
Contributor Author

frej commented Oct 7, 2022

@garazdawi, @bjorng: I have pushed a fixup for the NetBSD issue. Should I wait for results from more platforms, or is it time to squash the fixups (if the CI passes)?

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Oct 9, 2022
@bjorng
Contributor

bjorng commented Oct 10, 2022

You can squash the commits now. Tests have been run successfully on all platforms.

Running the JIT with single-mapped RWX memory for JIT:ed machine code
is needed for `perf` support. But as it is also useful for running
under QEMU user mode emulation [1] this patch adds a new flag
`+JMsingle bool` which controls the use of single-mapped RWX memory
without triggering the output of perf-related metadata.

The naming of the flag is intended to follow the (perceived) pattern
introduced by the `+JPperf`-flag, that is, `J` is a JIT-related option
and `M` stands for memory related things.

Thanks to Björn Gustavsson <bjorn@erlang.org> for suggesting how to
deal with Apple-silicon aarch64 and how to make the smoke test work on
Windows.

@frej frej force-pushed the frej/decouple-use-of-single-mapped-memory branch from e7e5c44 to 8da7a93 Compare October 10, 2022 17:38
@bjorng bjorng merged commit 1ea968e into erlang:master Oct 11, 2022
@bjorng
Contributor

bjorng commented Oct 11, 2022

Thanks!

pletcher added a commit to AjaxMultiCommentary/ajmc-multicommentary that referenced this pull request Nov 15, 2023