Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize performance of area draw methods #75

Open
AndyObtiva opened this issue Sep 28, 2023 · 11 comments
Open

Optimize performance of area draw methods #75

AndyObtiva opened this issue Sep 28, 2023 · 11 comments

Comments

@AndyObtiva
Copy link
Collaborator

AndyObtiva commented Sep 28, 2023

Hi,

I am opening an issue here that is related to the following issues in other projects:

The work that needs to be done was mentioned in these comments on the libui-ng issue:

@kojix2 : kou measured the time required to call a C function from Ruby with FFI and native C extension and concluded that FFI consumes about 10 to 100 times longer.

@cody271 : I would suggest a native C extension just for the uiDraw calls, the rest of the bindings for controls can stay FFI.

As you already saw on Twitter, I announced that a 2-hour workshop that I proposed for RubyConf 2023 was accepted for November 13, 2023 (Workshop Title: How To Build Desktop Applications in Ruby): https://rubyconf-2023.sessionize.com/session/531448

Just like how I announced a new release of glimmer-dsl-libui in RubyConf 2022 that you helped contribute to in a new release of LibUI, it would be great if I could announce in RubyConf 2023 a new release of glimmer-dsl-libui that includes optimized area drawing via an optimized LibUI Ruby binding with a C extension for the uiDraw calls only (LibUI.draw_* methods like draw_path_arc_to, draw_path_line_to, and draw_path_add_rectangle).

If that is not possible by November 13, 2023, then if you could make a new release by then to resolve other simpler issues or expose new behavior from libui-ng, that would be great too (unless you have already exposed all the latest features in libui-ng).

Otherwise, it would still be useful to have this optimization implemented eventually as it would be great for a variety of graphical applications like games, live diagramming applications with thousands of nodes, and analytical applications.

@kojix2
Copy link
Owner

kojix2 commented Oct 5, 2023

I admit that there is a problem with the slow drawing of LibUI.

It is suggested that a switch from Fiddle to C extension may be necessary to solve this problem. I think this problem is too big to address all at once. I will try to think of a way to break the problem up and address it little by little.

@kojix2
Copy link
Owner

kojix2 commented Oct 23, 2023

The reason I hesitated to take on this issue was because I thought I would need to rewrite many structs from Fiddle::Struct to TypedData Objects. This means redoing the entire LibUI.

While I've written several C extension Gems before, most of rb_data_type_t were copy & paste & modified code from other projects, and I don't have a good understanding of TypedData Objects.

In fact, we don't need to make major changes to deal with this issue. Just get the address from a Fiddle::Pointer object and pass it to the libui function. I've learned from kou how to implement a method in a C extension that takes FFI::Pointer as an argument. You can also write C extensions that take Fiddle::Pointer as arguments in a similar manner.
The idea is simple, but I didn't realize it.

@kojix2
Copy link
Owner

kojix2 commented Nov 29, 2023

I created a branch named native and added some draw-related methods written in C, mostly by AI coding assistants. This attempt is just getting started, but I have a feeling it won't be fast. We need to investigate what is the bottleneck.

@AndyObtiva
Copy link
Collaborator Author

I got accepted to present at RubyConf again in 2024. I will conduct a 2-hour workshop just like last year. I will teach "How To Build Basic Desktop Applications in Ruby".

It would be nice if the libui performance optimization for area draw methods was ready by November 13 when RubyConf 2024 begins.

@kojix2
Copy link
Owner

kojix2 commented Sep 14, 2024

Unfortunately, I don’t think we’ll find a solution soon.

Since it’s been a while, I wanted to share my thoughts. I don’t think the performance issues are because of libffi. This is just a guess, but when I tested ruby-htslib’s performance, libffi didn’t seem to cause any big slowdowns.

I’m starting to think the real problem is with Ruby’s callbacks, which are closures. In C, callbacks are just functions, but Fiddle and Ruby-FFI create callback functions as closures. This is a big difference, and it might be causing the performance problems. Maybe creating and copying these closures is what’s slowing things down.

Honestly, I feel a bit stuck this time, but I’ll keep investigating.

@kojix2
Copy link
Owner

kojix2 commented Sep 19, 2024

Some Basic Understandings

  • Closures cannot be used in C.
  • Instead, C APIs typically accept function pointers and user data as arguments.
  • C's libui also provides such functions.
  • However, Ruby's LibUI does not use this mechanism to create closures.
  • Fiddle and Ruby-FFI utilize libffi.
  • libffi provides functionality for handling closures.
  • LibUI uses closures via libffi through Fiddle.
  • libffi can dynamically generate anonymous functions.
  • Dynamically generating functions in C involves placing data in memory according to the ABI to function as a callable function.

Current Hypotheses

  • Benchmarks suggest that anonymous function creation is time-consuming.
    • These anonymous functions are likely dynamically created by libffi.
    • There seems to be some overhead in creating closures.

@AndyObtiva
Copy link
Collaborator Author

AndyObtiva commented Oct 9, 2024

It might still be a useful idea to try the DragonRuby Game Toolkit automatic generator of Ruby C Extensions based on C Headers like the libui-ng header.

It wouldn't hurt to try it and see what performance it offers even if you might decide not to rely on it long-term or not to rely on it for all libui methods (like by keeping some bound with ffi). I am sure trying the DragonRuby Game Toolkit automatic C extension generator is a good learning exercise regardless.

@kojix2
Copy link
Owner

kojix2 commented Oct 9, 2024

I do not have a complete understanding of this matter. I have written what I think at this point in time in Japanese and had ChatGPT summarize it in English. Of course, what I have written here is not everything, but this is what I believe to be the most important aspect of this issue at this time.


I must apologize for not being able to dedicate enough time and effort to this problem. However, I believe that simply replacing FFI with a C extension will not solve the issue.

Issues with Using Closures as C Callbacks

In general, C functions cannot take closures (functions that keep track of external variables) as arguments. Instead, C functions often use a data argument to pass a pointer to external information. This allows the function to access data outside of its own scope, simulating the behavior of a closure.

In Ruby, closures are represented by blocks, and a block can reference variables outside of itself. On the other hand, Ruby’s methods cannot reference external variables. This is a key difference between C functions and Ruby’s blocks or methods.

Using Closures in Crystal

To understand this better, let's look at Crystal, a language similar to Ruby. In Crystal, you can use the Box class to pass closures to C functions. Here is an example:

Crystal Official Documentation: Passing a Closure to a C Function

module Ticker
  @@box : Pointer(Void)?

  def self.on_tick(&callback : Int32 ->)
    boxed_data = Box.box(callback)
    @@box = boxed_data

    LibTicker.on_tick(->(tick, data) {
      data_as_callback = Box(typeof(callback)).unbox(data)
      data_as_callback.call(tick)
    }, boxed_data)
  end
end

In this example, the Box class stores the closure in memory, and its pointer is passed as the data argument to the C function. This allows us to simulate closures within the C function’s context.

Limitations When There Is No data Argument

However, if a C function does not have a data argument, this technique will not work. In Crystal, when a closure references variables outside its own scope and is passed to such a function, it will raise an error at runtime. This behavior is unusual for a static language and can cause unexpected terminations, instead of typical segmentation faults in C.

FFI’s Approach to Implementing Closures

With FFI, dynamic C functions are generated using libffi, which allows us to simulate closures even without a data argument. In Ruby, we can use Fiddle::Closure::BlockCaller to create a C function pointer that behaves like a Ruby block. This method involves some performance overhead because it requires dynamically generating a function that wraps the Ruby block.

The area_handler Structure

The area_handler structure, which is causing the issue (benchmarking suggests that), looks like this:

struct uiAreaHandler {
    void (*Draw)(uiAreaHandler *, uiArea *, uiAreaDrawParams *);
    void (*MouseEvent)(uiAreaHandler *, uiArea *, uiAreaMouseEvent *);
    void (*MouseCrossed)(uiAreaHandler *, uiArea *, int left);
    void (*DragBroken)(uiAreaHandler *, uiArea *);
    int (*KeyEvent)(uiAreaHandler *, uiArea *, uiAreaKeyEvent *);
};

We can assign Ruby closures to these function pointers using Fiddle::Closure::BlockCaller like this:

do_nothing = Fiddle::Closure::BlockCaller.new(0, [0]) {}
key_event  = Fiddle::Closure::BlockCaller.new(1, [0]) { 0 }

handler.Draw         = handler_draw_event
handler.MouseEvent   = do_nothing
handler.MouseCrossed = do_nothing
handler.DragBroken   = do_nothing
handler.KeyEvent     = key_event

In this setup, BlockCaller transforms Ruby blocks into C function pointers. However, since this process involves libffi, it may introduce some performance overhead. Also, if a C function does not have a data argument, it is difficult to use closures effectively.

Conclusion

For structures like area_handler, using Fiddle::Closure::BlockCaller allows us to register Ruby blocks as C function pointers. However, because there is no data argument, it is not possible to pass closures directly. Rewriting the code as a C extension will not solve this issue, as the same limitation still applies.

To improve performance in such cases, it might be possible to use a method (Fiddle::Closure) instead, so that it can be passed as a simple function instead of a closure. However, this approach will limit the flexibility of the implementation.

The current performance issue is likely not due to the overhead of using FFI itself, but rather because of the added complexity of supporting closures—a feature not originally available in C—in order to make them usable from Ruby.

This is my current understanding of the issue. I hope this explanation helps clarify the situation, and I look forward to exploring more efficient solutions in the future.


Example: Using qsort in Ruby

Here is a concrete example using the qsort function in Ruby with Fiddle::Closure:

require 'fiddle'
include Fiddle

class Compare < Fiddle::Closure
  def call(x, y)
    x.to_s(1) <=> y.to_s(1)
  end
end

libc = Fiddle.dlopen("/lib/libc.so.6")
qs = Fiddle::Function.new(libc["qsort"], [TYPE_VOIDP, TYPE_INT, TYPE_INT, TYPE_VOIDP], TYPE_VOID)
s = "7x0cba(Uq)"
qs.call(s, s.size, 1, Compare.new(TYPE_INT, [TYPE_VOIDP, TYPE_VOIDP]))
p s # =>  "()07Uabcqx"

In this example, the Compare class inherits from Fiddle::Closure and defines the call method, allowing it to be used as a C function pointer in qsort. This enables us to write a custom comparison function in Ruby and pass it to the C library.

@AndyObtiva
Copy link
Collaborator Author

AndyObtiva commented Oct 10, 2024

Thank you for the summary.

Using methods in place of block closures should be good enough. If you can provide an example of that working without performance issues, that would be great.

Rewriting the code as a C extension would not solve the problem because the same limitations apply

This can only be confirmed 100% by trying to build a C extension even if in theory, this might be true. If the other option of using methods doesn’t work out, I think trying to build a C extension would be worth it even if it ends up failing in resolving the draw slowdown issue.

@kojix2
Copy link
Owner

kojix2 commented Oct 14, 2024

I'm going to work on this problem this week, but please don't get your hopes up too much.
This issue is more difficult for me to solve than the others.

@kojix2
Copy link
Owner

kojix2 commented Oct 20, 2024

This week, I made some progress in my thinking.
I realized that some of my previous ideas were wrong. As usual, I wrote a notebook for the record and translated it into English using ChatGPT.


It all started when I began using per to measure performance and Hotspot to visualize it. Although I don’t have a formal background in computing, I knew that Hotspot makes it easy for anyone to visualize how much time native functions consume.

AndyObtiva/glimmer-dsl-libui#55

image

From the screenshots, I noticed that some functions, marked with ?? [??] ?? [??] ?? [??] ?? [??], were consuming a lot of time. However, the exact function names were missing, which was puzzling. (This makes the problem difficult since it’s unclear what’s actually taking up time.)

At first, I thought the missing function names were because libui-ng wasn’t built with debug mode enabled. But even after rebuilding it with debug mode, nothing changed. Although libui-ng functions were recorded, the unknown function names remained hidden.

Next, I suspected that these were dynamically generated anonymous functions from libffi. Since performance slowed down significantly with larger inputs (e.g., 1000 or 2000 elements) and much time was spent inside these unknown functions, I thought memory copying might be involved.

I also wondered if FFI itself was responsible for copying memory, but this turned out to be a misunderstanding.

The following blog explains how FFI calls back functions:
Cast a Closure to a Function Pointer: How libffi closure works

According to the blog, if *data is not passed as an argument, FFI places the pointer to *data at a fixed offset, which is referred to as a "trampoline." This trampoline only involves pointers and does not copy memory. Therefore, using the trampoline instead of passing a closure through *data does not add overhead. This means my initial assumption that FFI was copying memory was wrong.

Furthermore, both FFI::Closure and FFI::Closure::BlockCaller use the same trampoline mechanism, meaning there is little practical difference between them. Thus, switching from FFI::Closure::BlockCaller to FFI::Closure will not solve the issue.

However, memory copying does happen, but it occurs inside Ruby, not FFI.
A Proc object contains a structure called rb_env_t, which holds a stack frame. This environment is copied using env_clone. New memory is allocated with ALLOC_N(VALUE, env->env_size), and local variables are copied with MEMCPY(new_body, env->env, VALUE, env->env_size). The pointer ep refers to the position of the current stack frame. The MEMCPY macro ensures that copying is skipped if env_size meets certain conditions.

That’s what I have figured out so far...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants