Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toolkit-agnostic Core->GUI Communication (Draft B) #1765

Draft
wants to merge 140 commits into
base: master
Choose a base branch
from

Conversation

umlaeute
Copy link
Contributor

@umlaeute umlaeute commented Sep 21, 2022

this is just another draft, in addition to #1763 to resolve #1695

i started with a simple GUI-object (namely cnv, as it doesn't have any iolets), and implemented:

  • a common framework for how to create "widgets" on the GUI side
  • this mostly establishes a pseudo-API for common tasks, like creating/deleting/moving/selecting/reconfiguring widgets
  • an implementation of cnv that lives entirely on the GUI side
  • simplify g_mycanvas.c to do that...

internal state

so far this adds a bit of state to the GUI, namely:

  • an object knows which canvas(es) it is attached to. therefore we no longer need to send the canvas along when "interacting" with the object
  • cnv caches it's main color on the GUI, so selecting an object just tells the object that it is now selected (or not), and we don't have to send along the selected/unselected color (EDIT: actually in this case we can just make the 'unselected' rectangle indicator invisible)

state of the draft

very early, apparently.
so early, that it is likely to break unrelated functionality (e.g. i started replacing parts of the common iemgui C-side implementation, without worrying whether this works for all iemgui objects so far...)

regressions

  • radio: setting both the number and the color in one go, forgets the active radiobutton

since we are now reading the actual position back from the canvas, the zoom
is already applied, and we must not re-apply it.
and send an extra message for object-specific configurations.
LATER move the common code into g_all_guis.c
rather than using the background color.

and rename to activation proc from 'flash' to 'activate'
(it's a re-usable name for all iemguis)
@umlaeute
Copy link
Contributor Author

umlaeute commented Oct 12, 2022

yes. from Pd-core's POV, there is somewhat less work to do, but the GUI now has to do some more lifting (and there's a couple of indirections built into my current system, e.g. the command dispatcher).

now I dearly love the command dispatcher (e.g. allowing us to use a single ::pdwidget::displace $obj 10 0 to move any object, without knowing its implementation details) and would hate to replace it with something more direct.

@umlaeute
Copy link
Contributor Author

so time for some profiling.
the test-patch is a stripped down version of the one mentioned above, containing 32*24 numberboxes without iolets.
the test setup just draws these numberboxes entirely on the GUI side (the Pd-core doesn't even know about it), basically replaying what Pd would send and timing the replay.
This is on Linux, i7-7700 (amd64), Tcl/Tk-8.6.12

reference

test time (ms) notes
reference 21 this just mimicks how Pd draws the nbx by issuing the raw Tk calls

21ms for 768 objects is not that bad.

this implementation (with some hacks)

test time (ms) notes
rewrite 1066 current state of this PR
condensed 619 instead of multiple calls to ::pdwidget::config, merge them all into the ::pdwidget::create call
no-label 334 drop the (useless) setup of the gatom-label (as the test patch doesn't have any labels)
no-label + no-select 251 also drop the proc that tells newly created objects that they are "not-selected".

the "no-label" and "no-select" options are only here for comparision. in the end, we cannot just omit them (although there is room for optimization. in any case: text is obviously expensive)

there are two obvious ways to optimize the current implementation:

  1. on the Tcl side: reduce the number of function indirections ("getting rid" of the behaviour dispatcher)
  2. on the Tk side: simplify the tagging (the current implementation uses orthoginal tags, e.g. -tags [list ${obj} text] and then joins the tags (e.g. ${obj}&&text) to filter items; Pd uses explicit tags (e.g. -tags ${obj}T) which should make the filtering much simpler; i have no idea how Tcl/Tk implements tagging, but i guess checking the hash of a single tag is always simpler as checking two tags and then adding some boolean algebra on top of it).

bypassing the widgetbehavior dispatcher

test time (ms)
condensed 505
no-label 211
no-label + no-select 206

using explicit tags

test time (ms)
rewrite 726
condensed 368
no-label 229
no-label + no-select 179

using explicit tags and bypassing the widgetbehavior dispatcher

test time (ms)
condensed 283
no-labels 138
no-label + no-select 130

as can be seen there's plenty of potential to optimize, but even in the best case (condense the configuration into the constructor; explicit tags; bypassing the proc dispatcher), we are still about 10 times slower than what is currently happening...

@Ant1r
Copy link
Contributor

Ant1r commented Oct 13, 2022

Hey, these are pretty bad news...
So it seems TclTk is indeed very slow... OTOH this confirms that it's a good idea to find a way to get rid of it in the end!...

Maybe there's a possibility to continue in this way (which is to make pd core agnostic from the GUI toolkit, while keeping TclTk at first, before people can try implementing the GUI with another toolkit):

how about writing a temporary C (or C++...) gateway between the core and the GUI, that would do all the widget abstraction layer, and would send only digested commands to TclTk, which would only have to do the final display job? Then we should expect similar performance as the "reference" Pd.

Maybe we would need to run the gateway in a secondary thread, to avoid impacting the real-time requirements with GUI-related processing and dynamic memory allocation.

Thanks a lot for all your time on this anyway!

@giuliomoro
Copy link
Contributor

I tried to use the tcl profiler package and got some results that I am unsure about how to interpret (I don't have a neat way of visualising them yet), but they seem to indicate a lot of time is spent in _do_create_iolets (even if the iolets are all hidden).

Sample output for that function

Profiling information for ::pdwidget::_do_create_iolets
============================================================
            Total calls:  2057
    Caller distribution:
  ::pdwidget::_create_inlets:  2
  ::pdwidget::_create_outlets:  3
  ::pdwidget::_refresh_iolets:  2052
           Compile time:  183209
          Total runtime:  795693929
        Average runtime:  386822
          Runtime StDev:  272634
         Runtime cov(%):  70.5
  Total descendant time:  34299887
Average descendant time:  16674
Descendants:
  ::pd::canvas::get_zoom:  2057
  ::pdwidget::base_tag:  2057 

Full output:
pd-profiler.txt


steps to reproduce

Use this Pd patch (32*24 floatatoms without iolets plus a loadbang-delayed-message):
many-gatoms.pd.zip

Apply these changes:

diff --git a/src/m_glob.c b/src/m_glob.c
index a2fa0120..936d6808 100644
--- a/src/m_glob.c
+++ b/src/m_glob.c
@@ -34,6 +34,7 @@ void glob_start_startup_dialog(t_pd *dummy, t_floatarg flongform);
 void glob_startup_dialog(t_pd *dummy, t_symbol *s, int argc, t_atom *argv);
 void glob_ping(t_pd *dummy);
 void glob_plugindispatch(t_pd *dummy, t_symbol *s, int argc, t_atom *argv);
+void glob_guidispatch(t_pd *dummy, t_symbol *s, int argc, t_atom *argv);
 void glob_watchdog(t_pd *dummy);
 void glob_loadpreferences(t_pd *dummy, t_symbol *s);
 void glob_savepreferences(t_pd *dummy, t_symbol *s);
@@ -104,6 +105,12 @@ void max_default(t_pd *x, t_symbol *s, int argc, t_atom *argv)
     endpost();
 }

+void glob_guidispatch(t_pd *dummy, t_symbol *s, int argc, t_atom *argv)
+{
+	if(1 == argc && argv[0].a_type == A_SYMBOL) //surely can be done better
+		pdgui_vmess(0, "r", argv[0].a_w.w_symbol->s_name);
+}
+
 void glob_plugindispatch(t_pd *dummy, t_symbol *s, int argc, t_atom *argv)
 {
     pdgui_vmess("pdtk_plugin_dispatch", "a", argc, argv);
@@ -179,6 +186,8 @@ void glob_init(void)
         gensym("perf"), A_FLOAT, 0);
     class_addmethod(glob_pdobject, (t_method)glob_compatibility,
         gensym("compatibility"), A_FLOAT, 0);
+    class_addmethod(glob_pdobject, (t_method)glob_guidispatch,
+        gensym("gui-dispatch"), A_GIMME, 0);
     class_addmethod(glob_pdobject, (t_method)glob_plugindispatch,
         gensym("plugin-dispatch"), A_GIMME, 0);
     class_addmethod(glob_pdobject, (t_method)glob_helpintro,
diff --git a/tcl/pd-gui.tcl b/tcl/pd-gui.tcl
index bcaca58a..a98cc1ad 100755
--- a/tcl/pd-gui.tcl
+++ b/tcl/pd-gui.tcl
@@ -8,6 +8,9 @@
 # "." automatically gets a window, we don't want it.  Withdraw it before doing
 # anything else, so that we don't get the automatic window flashing for a
 # second while pd loads.
+package require profiler
+::profiler::init
+
 if { [catch {wm withdraw .} fid] } { exit 2 }

 # This is mainly for OSX as older versions only
@@ -829,6 +832,17 @@ proc load_startup_plugins {} {
     }
 }

+proc do_dump_profile {} {
+	# dump all kind of stuff, although data is repeated
+	# machine-readable
+	puts [::profiler::dump]
+	# human-readable
+	puts [::profiler::print]
+	# sorted by runtime. Last entry will be the more busy one.
+	# consider using other keys instead of totalRuntime
+	puts [::profiler::sortFunctions totalRuntime]
+}
+
 # ------------------------------------------------------------------------------
 # main
 proc main {argc argv} {

Note: the Wish.app 8.6 that is installed in my macos didn't have the profiler package and the wish8.6 I installed via brew required me to manually edit tcllib1.20/profiler/pkgIndex.tcl to fix the error Error in startup script: attempt to provide package profiler 0.4 failed: package profiler 0.5 provided instead

@giuliomoro
Copy link
Contributor

giuliomoro commented Oct 13, 2022

I notice here on macos that if I remove the

cnv lower "${tag}&&${iotag}" "${tag}&&iolets"
cnv raise "${tag}&&${iotag}" "${tag}&&iolets"

from the bottom of _do_create_iolets(), the function's runtime as reported by the profiler decrease greatly, so my results are surely affected by the above mentioned overall sloppiness of modern tk on macos. Hopefully on your system you'll get more meaningful results.

@umlaeute
Copy link
Contributor Author

umlaeute commented Oct 13, 2022

@Ant1r just for the record, i don't think all is lost. it was pretty easy sailing so far, but now the sea is getting rougher. i'm sure we will come up with some acceptable solution.

@giuliomoro thanks for the additional tests. i have profiler (0.6) available in my Tcl/Tk-8.6 installation (on Debian), so that's not a problem. I already used it a bit (to get a quick overview), but found a simple time invocation to be more meaningful for now.
i'm not sure i understand the need for glob_guidispatch in your patch (is this just a leftover that accidentally made it into the patch?)

as for _do_create_iolets: yes that's one of these uses of boolean tag compounds that i mentioned (and which i started to replace with "explicit tags" (e.g. ${tag}_iolets instead of ${tag}&&iolets). as you can see in my timing tables, this reduced the time by a factor of ~1.5.

i've also completely disabled the call to ::pdwidget::refresh_iolets for now, as this is a performance hog (and my numberboxes don't have iolets on purpose)

@giuliomoro
Copy link
Contributor

It looks like we cannot patch second inlets/outlets in this branch. This is because in _getclosest() in src/g_editor.c

static int _getclosest(int numiolets, int posX, int x1, int x2)
{
    int width = x2-x1, closest;
    if(!width)
        return 0;

    switch(numiolets) {
    case 1:
        closest = 0;
        break;
    default:
        closest = ((posX-x1) * (numiolets-1) + width/2)/width;
        break;
    }
    if(closest >= numiolets)
        closest = numiolets - 1;
    if(closest < 0)
        closest = 0;
    return closest;
}

width turns out to be negative which in turns makes closest always 0. Here's a suggested fix:
giuliomoro@8088029

@giuliomoro
Copy link
Contributor

giuliomoro commented Mar 16, 2023

Here's something we've been working on for a few months: a web interface to Pd, which leverages the refactored communication protocol of this branch: https://github.com/BelaPlatform/pure-data-web-GUI

It comprises of three components:

  • pd
  • the "shim": a very thin layer of websocket that forwards messages to/from Pd and the browser
  • the "frontend": the actual HTML5 stuff, written using the svelte framework.

For now, you can try it out with docker following the instructions included in the repo. The easiest way is to run it with the Pd it comes with (which is run inside docker and thus doesn't have audio/MIDI I/O capabilities), but you can easily connect it to your own Pd server instance (assuming it comes from this branch with giuliomoro@8088029).

It is by no mean complete or perfect, but I hope it shows that there is a tcl-tk-compatible, non-tcl-tk future possible for Pd. It even allows (well, with a lot of effort on the user side, but very little effort was put into it from the dev side) to patch on a touchscreen. Feedback welcome.

Also, @umlaeute wondering what you think the next steps for this branch should be. We would be happy to contribute time and effort to it if needed, but need your guidance to put the effort where it's most needed.

@umlaeute
Copy link
Contributor Author

Also, @umlaeute wondering what you think the next steps for this branch should be.

that's a good question.
we need to find some solution for the performance degradation.

i think the benchmark is the current implementation: the performance of this rewrite must about the same (or faster obviously).

this most likely means that the API will need to be a little less abstract (i guess the part that eats most resources is the iolet/connection thingy. so we probably have to ditch the idea for now, that the GUI can handle these on its own).

i also have the feeling that the proc calls (and proc calls from these) add overhead, which shouldn't be there (afaik,Tcl byte-compiles procs, which should make calling a proc that calls multiple TclTk commands faster than calling those commands manually; but it seems this is not the case here)

@Spacechild1
Copy link
Contributor

Spacechild1 commented Mar 17, 2023

Optimization rule nr. 1: measure, measure, measure = run a profiler on the core and GUI process to find the actual hotspots.

@umlaeute Do you have a test patch that exemplifies the performance problems?

afaik,Tcl byte-compiles procs, which should make calling a proc that calls multiple TclTk commands faster than calling those commands manually; but it seems this is not the case here

Yes, this seems really strange to me. I would also expect performance to be better, since we send fewer messages to the GUI. We have to profile to find the actual problem.

@umlaeute
Copy link
Contributor Author

@umlaeute Do you have a test patch that exemplifies the performance problems?

i guess that would be #1765 (comment)

@Spacechild1
Copy link
Contributor

Spacechild1 commented Mar 17, 2023

i guess that would be #1765 (comment)

Thanks, will give it a try.


One thing that came to my mind: I would be useful if there was a way to designate GUI transactions. Currently, we send lots of individual drawing commands, even if they just form one big operation (e.g. redrawing a canvas). If we could (optionally) wrap such commands in some kind of "begin" and "end" messages, this would enable huge optimization opportunities for the GUI backend, such as batching requests, temporarily disabling redrawing, etc.

For example, in canvas_map:

pdgui_vmess("::pd::canvas::begin", "c", x);
for (y = x->gl_list; y; y = y->g_next)
    gobj_vis(y, x, 1);
pdgui_vmess("::pd::canvas::end", "c", x);

This is a pretty common pattern. Two examples that spring to my mind:
https://github.com/pac-dev/protoplug/blob/master/Frameworks/vstsdk2.4_minimal/pluginterfaces/vst2.x/aeffectx.h#L305-L306
https://steinbergmedia.github.io/vst3_doc/vstinterfaces/classSteinberg_1_1Vst_1_1IComponentHandler.html#a8456ad739430267a12dda11a53fe9223.

@Spacechild1
Copy link
Contributor

Spacechild1 commented Mar 17, 2023

To expand on the previous post:

In naive canvas implementations, every change to the canvas may trigger a full redraw. (I think that Tk belongs to this category.) If you issue many such commands in a row, every command will implicitly repeat all previous commands, leading to quadratic complexity. Theoretically, we could patch the Tk library and add commands to temporarily disable any redrawing. Together with ::canvas::begin and ::canvas::end messages, this could improve patch loading times dramatically. But this is just wild speculation :-)

Now, imagine a less naive implementation where a command does not immediately trigger a redraw, but rather schedules a redraw event. Even in this case, ::canvas::begin and ::canvas::end would be huge benefit because we could batch commands, so that several commands only cause a single redraw event.

Finally, the core communicates with the GUI over a TCP socket; although commands are sent consecutively, they might arrive with delays and consequently get scattered over several event loop ticks, again leading to redundant redraws. ::canvas::begin and ::canvas::end would allow the GUI backend to wait for a sufficient amount of commands before dispatching them.

@umlaeute
Copy link
Contributor Author

This is a pretty common pattern. Two examples that spring to my mind

that's really just one example (the "Steinway").

i'm not opposed though...

@Spacechild1
Copy link
Contributor

Spacechild1 commented Mar 17, 2023

that's really just one example (the "Steinway").

Well, you can really find this pattern everywhere: database transactions, builder pattern, OSC bundles, openGL (glBegin / glEnd), you name it. The particular motivation might be different (data consistency, performance, convenience), but the underlying concept is the same: communicate that a sequence of actions forms an atomic unit.

i'm not opposed though...

+1

@Spacechild1
Copy link
Contributor

Here's another example from Pd itself: canvas_suspend_dsp and canvas_resume_dsp are used to enclose multiple actions that might update the DSP graph. The purpose is exactly to avoid redundant calculations and quadratic complexity. ::canvas::begin and ::canvas::end would apply the same idea to GUI rendering.

@giuliomoro
Copy link
Contributor

Just found a bug on this branch: load a patch with a ticked checkbox and it won't show the checkmark upon loading. As expected, the state is actually stored, so you need to click on it twice to see the tick appear. Example patch:

#N canvas 219 208 977 392 12;
#X obj 157 133 tgl 19 1 empty empty empty 0 -10 0 12 #fcfcfc #000000 #000000 1 1;

@umlaeute
Copy link
Contributor Author

I think this is what is described in the top post under the heading "regression"

@giuliomoro
Copy link
Contributor

In naive canvas implementations, every change to the canvas may trigger a full redraw. (I think that Tk belongs to this category.) If you issue many such commands in a row, every command will implicitly repeat all previous commands, leading to quadratic complexity. Theoretically, we could patch the Tk library and add commands to temporarily disable any redrawing. Together with ::canvas::begin and ::canvas::end messages, this could improve patch loading times dramatically. But this is just wild speculation :-)

My understanding* is that tcl/tk always defers drawing. So maybe it's just a matter of queueing messages received after ::canvas::begin and only process the queue when ::canvas::end is received? Also wondering if tricks such as
after x / after idle and update / update idle could be useful, however they are often not.

* hint that contributed to this understanding: the update man page says

Most display updates are performed as idle callbacks, so update idletasks will cause them to run. However, there are some kinds of updates that only happen in response to events, such as those triggered by window size changes; these updates will not occur in update idletasks.

@Spacechild1
Copy link
Contributor

Spacechild1 commented Mar 17, 2023

My understanding* is that tcl/tk always defers drawing.

Ok, so it rather falls into the second category described in #1765 (comment)

So maybe it's just a matter of queueing messages received after ::canvas::begin and only process the queue when ::canvas::end is received?

Yes, that's what I was thinking.

@danomatika
Copy link
Contributor

Maybe doing all the command parsing and string handling to draw on the Tcl side is simply slower because the original implementation just sent the command directly to Tk as opposed to additional Tcl proc layers. I feel like that could be a bottleneck and could use some finer-grained profiling.

@danomatika
Copy link
Contributor

If calling procs and string handling is the slowdown on the Tcl/TK side, one approach could be to write a Tcl plugin to do it in C then send it back to Tk.

@umlaeute
Copy link
Contributor Author

umlaeute commented Aug 31, 2023

i always assume that the tcl guys know what they are doing (language wise): after all interpreting strings is at the core of the language itself, and afaik there's even some jit-compiler hidden somewhere.

maybe we are bypassing this when reading the commands from the socket.

@danomatika
Copy link
Contributor

danomatika commented Aug 31, 2023

i always assume that the tcl guys know what they are doing (language wise): after all interpreting strings is at the core of the language itself, and afaik there's even some jit-compiler hidden somewhere.

maybe we are bypassing this by when reading the commands from the socket.

Yeah, I did too, otherwise it never would have been usable on the original hardware it was developed for in the 90s. The only thing I can think of is some sort of memory allocation on the fly, but that shouldn't be anything we have to deal with directly. Strange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants