
Mark JIT code as writeable / executable depending on the situation #5032

Merged

tenderlove merged 2 commits into ruby:master from lock-unlock-jit-code on Dec 1, 2021

Conversation

@tenderlove (Member)

Some platforms don't want memory to be marked as writeable and
executable at the same time. This commit introduces two functions for
marking code blocks as either writeable or executable, depending on the
situation. When we need to write to memory, we call cb_mark_writeable,
and when we're done writing, we call cb_mark_executable.

This is an initial stab at implementing memory protection for JIT code.
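
For a sense of what that looks like, here is a minimal sketch of such a pair of functions built on mprotect(2). The struct fields and error handling are illustrative, not the actual yjit_asm.c code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

// Illustrative code block struct; the real codeblock_t has more fields.
typedef struct {
    uint8_t *mem_block; // start of the JIT buffer, page aligned
    uint32_t mem_size;  // size of the JIT buffer in bytes
} codeblock_t;

// Allow writing to the JIT buffer (and drop execute permission).
static void cb_mark_writeable(codeblock_t *cb)
{
    if (mprotect(cb->mem_block, cb->mem_size, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        exit(EXIT_FAILURE);
    }
}

// Allow executing the JIT buffer (and drop write permission).
static void cb_mark_executable(codeblock_t *cb)
{
    if (mprotect(cb->mem_block, cb->mem_size, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        exit(EXIT_FAILURE);
    }
}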

@jeremyevans (Contributor)

With this change and a reversion of 119626d, ruby --yjit runs on OpenBSD/amd64. Startup time goes from 0.25s to 3.59s with --yjit, so it appears to be doing something rather than being ignored (mapping memory, maybe?).

@tenderlove (Member Author)

Startup time goes from 0.25s to 3.59s with --yjit,

Yikes. I'll keep investigating

@maximecb (Contributor)

The problem might be that it's not fine-grained enough. I read in an online thread that the running time of mprotect is O(n_pages). You could try changing cb_mark_writeable and cb_mark_executable to do mprotect on a single page instead. That wouldn't be correct, but it would allow you to benchmark startup time doing mprotect on a single page vs many pages.

@tenderlove (Member Author)

I need to find an OpenBSD machine to test on. This change seems to have no impact on boot time on either macOS or Ubuntu (AMD).

The test I used is this:

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("startup") do
    system "./miniruby --yjit -e 0"
  end
end

Results on macOS:

Master:

$ ruby test.rb
Warming up --------------------------------------
             startup     1.000  i/100ms
Calculating -------------------------------------
             startup      6.071  (± 0.0%) i/s -     31.000  in   5.107563s

This branch:

$ ruby test.rb
Warming up --------------------------------------
             startup     1.000  i/100ms
Calculating -------------------------------------
             startup      6.282  (± 0.0%) i/s -     32.000  in   5.096036s

Ubuntu master:

ruby test.rb
Warming up --------------------------------------
             startup     1.000  i/100ms
Calculating -------------------------------------
             startup     13.390  (± 0.0%) i/s -     67.000  in   5.004164s

This branch:

ruby test.rb
Warming up --------------------------------------
             startup     1.000  i/100ms
Calculating -------------------------------------
             startup     13.214  (± 0.0%) i/s -     67.000  in   5.070824s

@jeremyevans (Contributor)

I'm not sure this change would slow anything down on other operating systems. The 0.25s (no yjit) to 3.59s (yjit) may just be the overhead of using yjit on OpenBSD. You cannot test on OpenBSD without this change, because before the change, yjit doesn't run on OpenBSD.

@tenderlove (Member Author)

You cannot test on OpenBSD without this change, because before the change, yjit doesn't run on OpenBSD.

I mean, I want to test @maximecb's theory that mprotect is slow.

@maximecb (Contributor)

I don't see any reason why YJIT would just be slow on OpenBSD specifically, besides mprotect. It's still an x86 CPU.

I think mprotect might be slow when you try to apply it to many pages at once but would be faster if touching fewer pages. Interesting if it's still fast on macOS and Ubuntu when doing mprotect on large memory regions at once 🤔

@tenderlove (Member Author)

I think mprotect might be slow when you try to apply it to many pages at once but would be faster if touching fewer pages. Interesting if it's still fast on macOS and Ubuntu when doing mprotect on large memory regions at once 🤔

That's what I was thinking. But maybe it's just one big mapping and the OS is able to do something more efficient? 🤷🏻‍♀️

yjit_asm.h (outdated review thread, resolved)
@maximecb (Contributor) commented Oct 28, 2021

@tenderlove if you have time, do you think you could make a microbenchmark for mprotect on BSD?

Say, allocate some large number N of pages. Then, in a loop, mark the whole thing executable, then writable but not executable. And benchmark this with N=1 and with N=10000.

We could benchmark this on both BSD and macOS, and it would help us understand if the number of pages matters or not.

If you don't have time maybe I can write the code and you can run it.

@tenderlove (Member Author)

We could benchmark this on both BSD and macOS, and it would help us understand if the number of pages matters or not.

If you don't have time maybe I can write the code and you can run it.

Ya, I'll do that today. Looks like I've got access to a BSD machine now.

@tenderlove (Member Author)

I made a repository here that has a benchmark for mprotect. I tried to make the allocation similar to what we do in YJIT. The benchmark makes allocations in multiples of PAGE_SIZE, then changes protection on the region in a loop.
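
Roughly, the benchmark does something like the following (a sketch reconstructed from the description and the output below, not the exact code in the repository; PAGE_MULTIPLE and ITERATIONS come from the environment):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    const char *pm = getenv("PAGE_MULTIPLE");
    const char *it = getenv("ITERATIONS");
    long multiple  = pm ? atol(pm) : 1;
    long iters     = it ? atol(it) : 1000000;
    size_t size    = (size_t)page_size * (size_t)multiple;

    printf("Allocating page size %zu\n", size);
    printf("Iterating %ld times\n", iters);

    // Map an RX region, like the JIT buffer at rest.
    uint8_t *mem = mmap(NULL, size, PROT_READ | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    // Flip the whole region to writeable and back to executable in a loop.
    for (long i = 0; i < iters; i++) {
        if (mprotect(mem, size, PROT_READ | PROT_WRITE) != 0) { perror("mprotect"); return 1; }
        if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0)  { perror("mprotect"); return 1; }
    }

    return 0;
}

(Note that this version never writes to the region before changing protection, which, as it turns out further down, matters a lot.)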

On the CI server, the output is like this:

rubyci-openbsd$ ITERATIONS=1000000 make benchmark
for number in 1 100 1000 10000 100000 ; do  export PAGE_MULTIPLE=$number;  time ./mprotect_perf;  done
Allocating page size 4096
Iterating 1000000 times
    0m03.15s real     0m00.96s user     0m02.13s system
Allocating page size 409600
Iterating 1000000 times
    0m05.01s real     0m01.35s user     0m03.51s system
Allocating page size 4096000
Iterating 1000000 times
    0m04.91s real     0m01.53s user     0m03.33s system
Allocating page size 40960000
Iterating 1000000 times
    0m05.14s real     0m01.46s user     0m03.62s system
Allocating page size 409600000
Iterating 1000000 times
    0m06.86s real     0m01.25s user     0m05.60s system

It seems like setting the protection on the region is pretty fast regardless of the size. Even with a 409MB region we're still seeing 145772 iterations per second.

As @jeremyevans said, we've never been able to run YJIT on BSD before, so we don't know that mprotect is the bottleneck. My guess is that it's the memset we do on YJIT boot, but probably perf would know better.

@maximecb (Contributor)

Hmmm. Could you avoid doing the memset to see if it has an impact?

@tenderlove (Member Author)

I was able to reproduce a slow down in boot time:

rubyci-openbsd$ time ./miniruby -v -e' '        
ruby 3.1.0dev (2021-10-29) [x86_64-openbsd7.0]
    0m00.03s real     0m00.01s user     0m00.01s system
rubyci-openbsd$ time ./miniruby --yjit -v -e' ' 
ruby 3.1.0dev (2021-10-29) +YJIT [x86_64-openbsd7.0]
    0m00.43s real     0m00.13s user     0m00.25s system

Seems like the boot time issue is the memset call. If I comment out memset, boot is like this:

rubyci-openbsd$ time ./miniruby --yjit -v -e' ' 
ruby 3.1.0dev (2021-10-29) +YJIT [x86_64-openbsd7.0]
    0m00.04s real     0m00.01s user     0m00.02s system

@tenderlove (Member Author)

Specifically this memset call. I left the mprotect calls. The change I applied is this:

diff --git a/yjit_asm.c b/yjit_asm.c
index b5e3bd12c7..6a33d3f3fa 100644
--- a/yjit_asm.c
+++ b/yjit_asm.c
@@ -218,7 +218,7 @@ uint8_t *alloc_exec_mem(uint32_t mem_size)
     // Fill the executable memory with INT3 (0xCC) so that
     // executing uninitialized memory will fault
     cb_mark_writeable(cb);
-    memset(mem_block, 0xCC, mem_size);
+    //memset(mem_block, 0xCC, mem_size);
     cb_mark_executable(cb);
 
     return mem_block;

@maximecb (Contributor)

Very surprising. Does it take this long if you have a dummy program that just does memset and mprotect? :O

@tenderlove tenderlove force-pushed the lock-unlock-jit-code branch 4 times, most recently from 77179be to 47114c2 Compare November 18, 2021 21:53
@tenderlove (Member Author)

I found these tests were timing out in CI. I tracked down the culprit, and it was unfortunately the mprotect calls. The initial benchmarks showed that mprotect was not a problem, but that's because I got the benchmark wrong. The original benchmark would just mmap some memory, then test changing protection on that memory.

Of course, if you mmap some memory, the OS gives you the pages lazily, so in order to demonstrate the performance issue I changed the benchmark to write to the JIT pages before attempting to change protection.

The numbers I got back showed an essentially linear relationship between the number of pages mapped and the time it takes to call mprotect (IOW, the more pages we have, the slower it gets). However, changing the protection for just one page is still quite fast.

I didn't want to do bookkeeping to keep track of every page we've written to, so @jhawthorn and I came up with a scheme. We changed the function that writes to the JIT buffer so that it checks the code block to see whether the last write was on the same page: if it was, we just write to the code block; otherwise we unlock the page first, then write.

Since it only keeps track of the "last written" page, when we're done writing we just mark the whole JIT buffer as executable again (relying on the OS to figure out which pages actually changed protection).
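
A rough sketch of that write path (illustrative only; ALIGNED_WRITE_POS_NONE is a made-up sentinel, and the real yjit_asm.c differs in detail):

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define ALIGNED_WRITE_POS_NONE UINT32_MAX // hypothetical "no page unlocked yet" sentinel

typedef struct {
    uint8_t *mem_block;                 // page-aligned JIT buffer
    uint32_t mem_size;                  // buffer size in bytes
    uint32_t write_pos;                 // offset of the next byte to write
    uint32_t current_aligned_write_pos; // page-aligned offset of the last write
} codeblock_t;

// Write one byte, unlocking the target page only when the write lands on a
// different page than the previous write did.
static void cb_write_byte(codeblock_t *cb, uint8_t byte)
{
    uint32_t page_size   = (uint32_t)sysconf(_SC_PAGESIZE);
    uint32_t aligned_pos = cb->write_pos & ~(page_size - 1);

    if (aligned_pos != cb->current_aligned_write_pos) {
        mprotect(cb->mem_block + aligned_pos, page_size, PROT_READ | PROT_WRITE);
        cb->current_aligned_write_pos = aligned_pos;
    }

    cb->mem_block[cb->write_pos++] = byte;
}

// When code generation is done, flip the entire buffer back to executable and
// let the OS figure out which pages actually changed.
static void cb_mark_all_executable(codeblock_t *cb)
{
    cb->current_aligned_write_pos = ALIGNED_WRITE_POS_NONE;
    mprotect(cb->mem_block, cb->mem_size, PROT_READ | PROT_EXEC);
}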

This strategy seems to bring boot time down a bunch. It's still slower than "no JIT", but faster than it was.

Before the change

time ./ruby -v --yjit --yjit-call-threshold=1 --yjit-max-versions=1 -e'0'
ruby 3.1.0dev (2021-11-18T18:19:46Z lock-unlock-jit-code 88deb89e5b) +YJIT [x86_64-linux]
last_commit=Call rb_vm_barrier whenever we change code to "writeable"
-e:1: warning: possibly useless use of a literal in void context

________________________________________________________
Executed in    1.72 secs   fish           external 
   usr time   60.40 millis  414.00 micros   59.99 millis 
   sys time  1655.84 millis  155.00 micros  1655.68 millis 

After the change

time ./ruby -v --yjit --yjit-call-threshold=1 --yjit-max-versions=1 -e'0'
ruby 3.1.0dev (2021-11-18T21:29:48Z lock-unlock-jit-code f05248cfae) +YJIT [x86_64-linux]
last_commit=Change strategy for unlocking pages in the JIT buffer
-e:1: warning: possibly useless use of a literal in void context

________________________________________________________
Executed in  110.34 millis    fish           external 
   usr time   58.85 millis  358.00 micros   58.50 millis 
   sys time   50.83 millis  133.00 micros   50.70 millis

No JIT

time ./ruby -v -e'0'
ruby 3.1.0dev (2021-11-18T21:29:48Z lock-unlock-jit-code f05248cfae) [x86_64-linux]
last_commit=Change strategy for unlocking pages in the JIT buffer
-e:1: warning: possibly useless use of a literal in void context

________________________________________________________
Executed in   43.76 millis    fish           external 
   usr time   43.65 millis  337.00 micros   43.31 millis 
   sys time    0.13 millis  126.00 micros    0.00 millis

@tenderlove tenderlove force-pushed the lock-unlock-jit-code branch 3 times, most recently from 78c0ca9 to c066f84 Compare November 19, 2021 17:14
@tenderlove tenderlove marked this pull request as ready for review November 22, 2021 22:08
@tenderlove (Member Author)

I think this is ready for review. I was able to get it running on the BSD CI server and my results are like this:

rubyci-openbsd$ time ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --yjit -v -e'require "rubygems"'
ruby 3.1.0dev (2021-11-19) +YJIT [x86_64-openbsd7.0]
    0m00.58s real     0m00.14s user     0m00.35s system
rubyci-openbsd$ time ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- -v -e'require "rubygems"'        
ruby 3.1.0dev (2021-11-19) [x86_64-openbsd7.0]
    0m00.14s real     0m00.10s user     0m00.04s system
rubyci-openbsd$ 

Seems like around ~40ms overhead was added versus the interpreter. We should test to see how this impacts benchmarks too though.

Final thought: should we try to conditionally enable this on platforms that require these protections? That way platforms that don't care don't take the speed hit? Thoughts @maximecb @jeremyevans ?

@jeremyevans (Contributor)

I think it should be enabled everywhere by default, with maybe a flag if you want to switch to the less-secure-but-faster mode. It's important to understand that the main reason for this change is it improves security (memory that is both writable and executable makes exploitation much easier). That it is also more portable to operating systems that enforce the restriction is a bonus.

@maximecb (Contributor)

Would be curious how this affects startup time and railsbench performance on Linux. We have some easy-to-run benchmarks in yjit-bench.

@eregon (Member) commented Nov 23, 2021

If I read the numbers above correctly, it's 0.58s vs 0.14s, so that would be a 440ms difference, or ~4.14x slower for startup and loading RubyGems.

@jeremyevans (Contributor)

@eregon It looks like 0.58s vs 0.14s is the difference between yjit and no yjit, not the difference between yjit with this patch and yjit without this patch. I think the decision on whether to enable this protection by default when using yjit is going to depend on the difference between yjit with this patch and yjit without this patch.

@XrXr (Member) left a comment

Clever approach to let the OS figure out which pages need to change protection. If it's fast enough then we can probably enable it for all systems. It would be nice if we could avoid introducing a new option for this.

+1 for benchmarking on Linux-based systems to measure perf degradation. Maybe we want to add a benchmark for ruby --disable-gems -e 'require "rubygems"' to https://github.com/Shopify/yjit-bench

If necessary, we could do bookkeeping for our pages. It would also allow us to fill with int3 on the first write, which helps boot perf. One of my machines spends >100ms on boot memsetting 256MiB with 0xcc, which isn't ideal.
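
As a sketch of that idea (hypothetical; cb_unlock_page and page_filled are made-up names, and none of this is in the PR):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

// Hypothetical bookkeeping: one flag per page, all false at boot.
typedef struct {
    uint8_t *mem_block; // page-aligned JIT buffer
    bool *page_filled;  // has this page been filled with INT3 yet?
} codeblock_t;

// Unlock a page for writing and, the first time only, fill it with INT3
// (0xCC) so executing uninitialized memory still faults, without paying for
// a memset of the whole buffer at boot.
static void cb_unlock_page(codeblock_t *cb, uint32_t aligned_pos, size_t page_size)
{
    mprotect(cb->mem_block + aligned_pos, page_size, PROT_READ | PROT_WRITE);

    size_t page_index = aligned_pos / page_size;
    if (!cb->page_filled[page_index]) {
        memset(cb->mem_block + aligned_pos, 0xCC, page_size);
        cb->page_filled[page_index] = true;
    }
}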

}

if (ocb) {
cb_mark_all_executable(ocb);
@XrXr (Member) commented Nov 23, 2021

This callback is invoked per iseq, so it's going to make this large mprotect call O(ISEQ_COUNT) times. That seems like it'd be bad for compaction perf. Maybe we need to touch gc.c so we make everything executable when exiting the GC....

@tenderlove (Member Author)

I'm not 100% sure what to do about this, but I feel like we can postpone fixing this situation.

Contributor

Yes we can postpone since we'll no longer have global cb/ocb next year with the code pages.

yjit_iface.c (outdated)
cb_mark_position_writeable(cb, offset_to_value);

// Object could cross a page boundary, so unlock there as well
cb_mark_position_writeable(cb, offset_to_value + SIZEOF_VALUE);
Member

I think this is an off-by-one error. If it so happens that we are writing exactly the last SIZEOF_VALUE bytes of a page, this changes protection on two pages.

@tenderlove (Member Author)

Ya, I thought so too (wrt the off by one), but I was worried about fixing the build first. Unlocking too much memory shouldn't make anything fail. I'll fix the OB1!

@maximecb (Contributor)

Two system calls instead of one, though? Could that have performance implications?

@tenderlove (Member Author)

@maximecb As long as the VALUE doesn't span multiple pages, we'll only do one system call. I linked to it in another comment, but as long as offset_to_value + SIZEOF_VALUE - 1 falls on the same page as the previous call, we'll only do one system call. This second call is there on the off chance that the pointer happens to span two pages (and in that case we would need two system calls).
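
To illustrate the boundary math (made-up numbers, assuming 4096-byte pages and an 8-byte VALUE):

#include <stdbool.h>
#include <stdint.h>

// Does a value_size-byte write starting at `offset` cross a page boundary?
// With offset = 4088, value_size = 8, page_size = 4096: the last byte (4095)
// is still on page 0, so only one page needs unlocking. Using the
// one-past-the-end offset (4096) would unlock page 1 for no reason, which is
// the off-by-one discussed above.
static bool value_spans_two_pages(uint32_t offset, uint32_t value_size, uint32_t page_size)
{
    return (offset / page_size) != ((offset + value_size - 1) / page_size);
}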

@tenderlove (Member Author)

Here are the perf results from my AMD machine on this branch:

ruby_version="ruby 3.1.0dev (2021-11-29T17:59:12Z lock-unlock-jit-code dfb721f5ea) [x86_64-linux] last_commit=fix OB1"
git_branch="lock-unlock-jit-code"
git_commit="dfb721f5ea"

-------------  -----------  ----------  ---------  ----------  -----------  ------------
bench          interp (ms)  stddev (%)  yjit (ms)  stddev (%)  interp/yjit  yjit 1st itr
30k_ifelse     1557.0       0.0         198.2      0.1         7.86         3.39        
30k_methods    4233.9       0.0         470.4      0.1         9.00         7.79        
activerecord   119.3        0.2         81.8       0.2         1.46         1.37        
binarytrees    287.9        2.1         212.1      2.5         1.36         1.34        
cfunc_itself   76.0         0.5         29.5       0.4         2.58         2.58        
erubi          337.7        0.6         336.0      0.7         1.01         1.00        
erubi_rails    24.3         3.0         19.9       9.6         1.22         1.00        
fannkuchredux  5656.9       0.1         5591.5     0.4         1.01         1.01        
fib            162.0        2.9         40.8       1.4         3.97         3.93        
getivar        88.9         0.8         23.5       0.3         3.78         0.98        
hexapdf        2258.3       1.1         1635.3     1.2         1.38         1.32        
jekyll         6764.4       1.9         5729.7     2.0         1.18         1.17        
keyword_args   194.6        1.9         36.6       1.1         5.32         5.29        
lee            915.7        0.7         626.7      1.2         1.46         1.49        
liquid-render  146.1        0.9         88.4       1.6         1.65         1.54        
mail           127.5        0.3         105.9      0.2         1.20         1.00        
nbody          95.9         0.6         63.2       0.4         1.52         1.54        
optcarrot      4539.0       0.9         2182.6     0.6         2.08         2.08        
psych-load     1793.6       0.1         1332.2     0.1         1.35         1.35        
railsbench     1889.5       1.1         1483.8     1.6         1.27         1.23        
respond_to     206.0        0.5         136.3      1.4         1.51         1.52        
rubykon        9371.5       0.4         4322.6     0.4         2.17         2.17        
setivar        61.0         0.8         25.2       4.3         2.42         1.08        
-------------  -----------  ----------  ---------  ----------  -----------  ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.

Here it is from the master branch

ruby_version="ruby 3.1.0dev (2021-11-29T16:29:34Z master af59d35570) [x86_64-linux]"
git_branch="master"
git_commit="af59d35570"

-------------  -----------  ----------  ---------  ----------  -----------  ------------
bench          interp (ms)  stddev (%)  yjit (ms)  stddev (%)  interp/yjit  yjit 1st itr
30k_ifelse     1558.0       0.0         197.3      0.1         7.90         5.52        
30k_methods    4232.7       0.0         452.7      0.1         9.35         8.71        
activerecord   119.5        0.2         82.3       0.2         1.45         1.42        
binarytrees    292.6        2.2         222.0      2.8         1.32         1.31        
cfunc_itself   73.8         0.5         30.7       0.2         2.40         2.45        
erubi          353.5        0.6         349.7      0.7         1.01         1.00        
erubi_rails    23.9         2.7         20.1       9.4         1.19         1.26        
fannkuchredux  6202.2       0.3         6187.6     0.1         1.00         1.00        
fib            160.6        0.8         40.8       0.4         3.94         3.91        
getivar        88.1         0.5         24.7       2.4         3.56         0.97        
hexapdf        2267.1       1.0         1647.1     1.1         1.38         1.35        
jekyll         6812.0       1.7         5734.9     2.3         1.19         1.18        
keyword_args   192.8        2.2         39.0       0.2         4.95         4.93        
lee            913.0        0.8         616.7      1.0         1.48         1.50        
liquid-render  146.3        0.8         89.6       1.5         1.63         1.60        
mail           127.5        0.4         107.2      0.2         1.19         1.06        
nbody          95.2         1.2         63.5       0.3         1.50         1.47        
optcarrot      4438.2       0.9         2145.6     0.6         2.07         2.06        
psych-load     1807.2       0.0         1368.8     0.1         1.32         1.33        
railsbench     1897.8       1.0         1503.4     1.5         1.26         1.25        
respond_to     194.6        1.7         132.1      1.3         1.47         1.49        
rubykon        9282.9       0.3         4245.4     0.5         2.19         2.19        
setivar        58.8         0.8         25.1       5.1         2.34         1.00        
-------------  -----------  ----------  ---------  ----------  -----------  ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.

Looks like yjit (ms) is basically the same (maybe slightly slower on 30k_methods?).

@maximecb (Contributor)

Looks pretty good. Can you also measure the boot time on Linux?

Otherwise, are we ready to merge?

Comment on lines +58 to +60
// Keep track of the current aligned write position.
// Used for changing protection when writing to the JIT buffer
uint32_t current_aligned_write_pos;
Contributor

What's the aligned write position? What does it mean???

@tenderlove (Member Author)

This variable is similar to the write_pos variable but it's aligned on multiples of the OS page size. That way when we write to the buffer, we can compare and only call the mprotect system call when we're on a "new" page. It allows us to amortize the cost of an mprotect system call.

I can add all of ^^^ as a comment if it helps (the same info is in the commit message for this code)

@tenderlove (Member Author)

Looks pretty good. Can you also measure the boot time on Linux?

Here's boot time on my Linux machine.

Without the patch:

aaron@whiteclaw ~/g/ruby (master)> time ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --yjit -v -e'require "rubygems"'
ruby 3.1.0dev (2021-11-29T16:29:34Z master af59d35570) +YJIT [x86_64-linux]

________________________________________________________
Executed in  116.77 millis    fish           external 
   usr time   44.31 millis  532.00 micros   43.77 millis 
   sys time   71.81 millis  174.00 micros   71.63 millis 

With the patch:

> time ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --yjit -v -e'require "rubygems"'
ruby 3.1.0dev (2021-11-29T17:59:12Z lock-unlock-jit-code dfb721f5ea) +YJIT [x86_64-linux]
last_commit=fix OB1

________________________________________________________
Executed in  122.55 millis    fish           external 
   usr time   47.39 millis  526.00 micros   46.87 millis 
   sys time   74.37 millis  167.00 micros   74.21 millis 

It seems extremely close on Linux.

Otherwise, are we ready to merge?

I think so! 😄

@XrXr (Member) commented Nov 29, 2021

Here's boot time on my Linux machine.

Sorry, could you try after make install to be end-to-end?

@tenderlove (Member Author)

Sorry, could you try after make install to be end-to-end?

Sure. Here it is with this PR:

aaron@whiteclaw ~/g/ruby (lock-unlock-jit-code)> which ruby
/home/aaron/.rubies/ruby-yjit/bin/ruby
aaron@whiteclaw ~/g/ruby (lock-unlock-jit-code)> time ruby --yjit -v -e'require "rubygems"'
ruby 3.1.0dev (2021-11-29T17:59:12Z lock-unlock-jit-code dfb721f5ea) +YJIT [x86_64-linux]
last_commit=fix OB1

________________________________________________________
Executed in  112.75 millis    fish           external 
   usr time   44.55 millis  527.00 micros   44.02 millis 
   sys time   68.21 millis  169.00 micros   68.04 millis 

aaron@whiteclaw ~/g/ruby (lock-unlock-jit-code)> 

Here it is without the PR:

aaron@whiteclaw ~/g/ruby (master)> which ruby
/home/aaron/.rubies/ruby-yjit/bin/ruby
aaron@whiteclaw ~/g/ruby (master)> time ruby --yjit -v -e'require "rubygems"'
ruby 3.1.0dev (2021-11-29T16:29:34Z master af59d35570) +YJIT [x86_64-linux]

________________________________________________________
Executed in  110.65 millis    fish           external 
   usr time   45.23 millis  517.00 micros   44.72 millis 
   sys time   65.21 millis  165.00 micros   65.04 millis 

@maximecb (Contributor) left a comment

Thanks Aaron for completing this, for being patient with Alan's and my questioning, and for producing benchmark results.

I'm sufficiently convinced that the impact on Linux platforms is minimal.

tenderlove and others added 2 commits December 1, 2021 11:24
Some platforms don't want memory to be marked as writeable and
executable at the same time. When we write to the code block, we
calculate the OS page that the buffer position maps to.  Then we call
`mprotect` to allow writes on that particular page.  As an optimization,
we cache the "last written" aligned page which allows us to amortize the
cost of the `mprotect` call.  In other words, sequential writes to the
same page will only call `mprotect` on the page once.

When we're done writing, we call `mprotect` on the entire JIT buffer.
This means we don't need to keep track of which pages were marked as
writeable, we let the OS take care of that.

Co-authored-by: John Hawthorn <john@hawthorn.email>
If YJIT isn't enabled, or hasn't finished booting, cb / ocb could be
null. This commit just checks to make sure they're available before
marking them as executable.

Co-Authored-By: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-Authored-By: Kevin Newton <kddnewton@gmail.com>
@tenderlove tenderlove merged commit 4079f0d into ruby:master Dec 1, 2021
@tenderlove tenderlove deleted the lock-unlock-jit-code branch December 1, 2021 20:46
@tenderlove (Member Author)

@jeremyevans I merged this PR so I've reverted 119626d

@jeremyevans (Contributor)

@jeremyevans I merged this PR so I've reverted 119626d

Awesome, thank you so much. I tested and the startup impact of --yjit on OpenBSD/amd64 is much smaller now:

$ time ruby -e ''
    0m00.24s real     0m00.24s user     0m00.01s system
$ time ruby --yjit -e ''
    0m00.54s real     0m00.26s user     0m00.19s system

@tenderlove (Member Author)

excellent!
