Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit compiler fence when pushing frames to ensure safe profiling #11036

Merged
merged 1 commit into from
Jul 3, 2024

Conversation

ivoanjo
Copy link
Contributor

@ivoanjo ivoanjo commented Jun 21, 2024

What does this PR do?

This PR tweaks the vm_push_frame function to add an explicit compiler fence (atomic_signal_fence) to ensure profilers that use signals to interrupt applications (stackprof, vernier, pf2, Datadog profiler) can safely sample from the signal handler.

Motivation:

The vm_push_frame function was specifically tweaked in #3296 to initialize the a frame before updating the cfp pointer.

But since there's nothing stopping the compiler from reordering the initialization of a frame (*cfp =) with the update of the cfp pointer (ec->cfp = cfp) we've been hesitant to rely on this on the Datadog profiler.

In practice, after some experimentation + talking to folks, this reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them not to do the reordering (atomic_signal_fence), this seems even better.

I've actually extracted vm_push_frame into the "Compiler Explorer" website, which you can use to see the assembly output of this function across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:

  1. The compilers are not reordering the writes
  2. The barrier does not change the generated assembly output (== has no cost in practice)

Additional Notes:

The checks added in configure.ac define two new macros:

  • HAVE_STDATOMIC_H
  • HAVE_DECL_ATOMIC_SIGNAL_FENCE

Since Ruby generates an arch-specific config.h header with these macros upon installation, this can be used by profilers and other libraries to test if Ruby was compiled with the fence enabled.

How to test the change?

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K to confirm the compiled output of vm_push_frame does not change in most compilers (at least all that I've checked on that site).

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
@ko1 ko1 merged commit 64fef3b into ruby:master Jul 3, 2024
107 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/ensure-order-in-vm-push-frame branch July 3, 2024 09:48
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 3, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
k0kubun pushed a commit that referenced this pull request Jul 3, 2024
…ensure safe profiling (#11090)

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of #11036 to Ruby 3.3 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 4, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 5, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 5, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 5, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
ivoanjo added a commit to DataDog/ruby that referenced this pull request Jul 5, 2024
… frames to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of ruby#11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
ruby#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
nagachika pushed a commit that referenced this pull request Jul 6, 2024
…mes to ensure safe profiling

**What does this PR do?**

This PR tweaks the `vm_push_frame` function to add an explicit compiler
fence (`atomic_signal_fence`) to ensure profilers that use signals
to interrupt applications (stackprof, vernier, pf2, Datadog profiler)
can safely sample from the signal handler.

This is a backport of #11036 to Ruby 3.2 .

**Motivation:**

The `vm_push_frame` was specifically tweaked in
#3296 to initialize the a frame
before updating the `cfp` pointer.

But since there's nothing stopping the compiler from reordering
the initialization of a frame (`*cfp =`) with the update of the cfp
pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on
the Datadog profiler.

In practice, after some experimentation + talking to folks, this
reordering does not seem to happen.

But since modern compilers have a way for us to exactly tell them
not to do the reordering (`atomic_signal_fence`), this seems even
better.

I've actually extracted `vm_push_frame` into the "Compiler Explorer"
website, which you can use to see the assembly output of this function
across many compilers and architectures: https://godbolt.org/z/3oxd1446K

On that link you can observe two things across many compilers:
1. The compilers are not reordering the writes
2. The barrier does not change the generated assembly output
   (== has no cost in practice)

**Additional Notes:**

The checks added in `configure.ac` define two new macros:
* `HAVE_STDATOMIC_H`
* `HAVE_DECL_ATOMIC_SIGNAL_FENCE`

Since Ruby generates an arch-specific `config.h` header with
these macros upon installation, this can be used by profilers
and other libraries to test if Ruby was compiled with the fence enabled.

**How to test the change?**

As I mentioned above, you can check https://godbolt.org/z/3oxd1446K
to confirm the compiled output of `vm_push_frame` does not change
in most compilers (at least all that I've checked on that site).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants