Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/tree: hitting 'select all' hangs the application #300

Closed
erco77 opened this issue Nov 27, 2021 · 34 comments
Closed

test/tree: hitting 'select all' hangs the application #300

erco77 opened this issue Nov 27, 2021 · 34 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@erco77
Copy link
Contributor

erco77 commented Nov 27, 2021

Investigating..

@erco77
Copy link
Contributor Author

erco77 commented Nov 27, 2021

Hmm, it does recover after about 30 or more seconds.

Seems to be the Fl_Simple_Terminal() code taking a really long time to handle all the info print messages triggered by select all (there's 500 items in one of the tree's children) Still, printing 500 lines shouldn't be a problem.

Possibly platform specific; this is linux, and according to gdb, apparently xft calls are involved during the printf() operations, probably doing label measurements. Still investigating.

@erco77
Copy link
Contributor Author

erco77 commented Nov 27, 2021

Yikes, yeah, this is actually a problem in Fl_Simple_Terminal.

I modified examples/simple-terminal.cxx to just print 300 dots and a crlf every second, e.g.

    for ( int t=0; t<300; t++ ) G_tty->printf("."); 
    G_tty->printf("\n");

..and that's causing really large delays in e.g. resizing and such.

This doesn't seem to be xft specific; even rebuilding FLTK with --disable-xft still causes severe slowness.

@erco77
Copy link
Contributor Author

erco77 commented Nov 27, 2021

Hmm, I copied Fl_Simple_Terminal's code over to 1.3.x (this widget is new in 1.4.x), and am /not/ seeing slowness, so perhaps something has changed in FLTK 1.4.x that's slowing down drawing text.

I checked OSX + 1.4.x, and I see the problem there too with test/tree, but to a lesser extent. On an old mac mini, I'm seeing a ~10 sec hang when hitting the "Select All" button.

So this seems to be a cross platform issue.

Might need help with this one -- has anything changed in 1.4.x recently that might affect text drawing speed? I'll try to determine if this is a regression with bisect.. might take a while.

@erco77 erco77 self-assigned this Nov 27, 2021
@erco77 erco77 added the help wanted Extra attention is needed label Nov 27, 2021
@erco77
Copy link
Contributor Author

erco77 commented Nov 27, 2021

Yeesh, I've gone back to a commit from late 2017 and still seeing slowness with test/tree "Select All" so it's not a recent thing. But it's still more efficient somehow in 1.3.x?

Decided not to go back any further in 1.4.x and see if I can optimize Fl_Simple_Terminal's operation. It inherits from Fl_Text_Display which is a touchy beast, so looking to design around Fl_Text_Display's behavior.

@ManoloFLTK
Copy link
Contributor

ManoloFLTK commented Nov 28, 2021

Adding
tty->wrap_mode(Fl_Simple_Terminal::WRAP_NONE, 0);
after line #2069 of file test/tree.cxx (after it was generated by fluid),
has the slowness disappear.

Thus, the slowness seems connected to computations that support line wrapping
repeated each time a line is written to the terminal.
This gives at least a temporary fix for the issue because the wrap mode can be turned off
at the beginning of 'Select All' and set back to WRAP_AT_BOUNDS at its end
(and similarly for 'Deselect All').

Still, it's very strange that the same Simple-Terminal code is fast under 1.3,
since wrap support is presumably the same as in 1.4.

@ManoloFLTK
Copy link
Contributor

ManoloFLTK commented Nov 28, 2021

I would suggest the culprit is in changes in the Fl_Text_Display widget between 1.3 and 1.4,
because if you copy this widget's 1.4 code to 1.3 [this copy requires some tweaking
to remove uses of Fl::screen_driver()], test/tree's 'Select All' and 'Deselect All'
become slow also with 1.3.

@ManoloFLTK
Copy link
Contributor

ManoloFLTK commented Nov 29, 2021

OK. The culprit is eb772d0.

Luckily, this is a one-liner : add
oldTAWidth = - 1;
after line #494 of src/Fl_Text_Display.cxx (as of eb772d0) in function Fl_Text_Display::resize().

This is visible as follows :
git reset --hard eb772d0,
make a copy of src/Fl_Text_Display.cxx and FL/Fl_Text_Display.H
git pull -ff
copy the two saved files to src/ and FL/, respectively.
Build and run test/tree: 'SelectAll' and 'Deselect All' are slow and become slower and slower
each time they are used because the Simple Terminal contains more and more text.

Comment out the single line #495 in src/Fl_Text_Display.cxx to get
// oldTAWidth = - 1;
This undoes eb772d0.
Build and run test/tree: 'SelectAll' and 'Deselect All' run quickly, and their run-time doesn't increase.

Greg: I propose to stop here and let you continue the analysis of this issue, if you agree.

@Albrecht-S
Copy link
Member

@ManoloFLTK Thank you very much, these are great findings. Since that particular commit was done for a reason (STR 3412 which is still open ("pending")) we need to find a better solution to address both issues.

FTR: As the commit message of eb772d0 (svn 12526) mentions this commit disabled "a part of the optimization" introduced in commit 7e94d76 (svn r11208) which obviously causes the observed slowness.

Sigh. :-(

@erco77 I'm too busy to look into this further right now and I can't promise that this may happen soon. I just wanted to give feedback since the mentioned commits were my work.

@erco77
Copy link
Contributor Author

erco77 commented Nov 29, 2021

Yes, thanks Manolo, I found there's a few issues that I can maybe optimize in Fl_Simple_Terminal too; there's three calls that seem to use a lot of CPU, and are called each time its printf() method is called:

  • buffer->append() -- appends the string to the buffer, causing a recalc
  • scroll(newpos) -- forces scrollbar to bottom, but ALSO causes an entire buffer recalc
  • A remove() operation to enforce the screen history size (ALSO causes a buffer recalc)

I'm not sure how I can optimize all that though.. it'd probably be easier to implement the terminal code myself rather than try to fix Fl_Text_Editor lol.

@erco77
Copy link
Contributor Author

erco77 commented Nov 29, 2021

FWIW I'm attaching a profiler report for running test/tree, hitting "Select All", then exiting to show what's taking so long.

tree-master.gprof.txt

To use the profiler:

  • Add -pg to the g++ compile/link commands, rebuild the app
  • Running the app and exiting generates a gmon.out file
  • Run gprof [appname] to view the profile report

@erco77
Copy link
Contributor Author

erco77 commented Nov 29, 2021

@Albrecht-S yes, we want to avoid a "bug fix oscillation" where one bug's fix is another bug's bug.
There's a few times in FLTK where I've seen such oscillations span a decade or more, where it goes back and forth - so much time between the fixes, no one remembers the back-and-forth.

Luckily the commits show the related bugs clearly.

This is going to be really annoying to fix, but the penalty in time is huge; just doing 500 string appends, let alone the calls to scroll() and remove(). We've known for a while Fl_Text_Xxx can get really slow just because of the scrollbar calculations. There must really be a way to do this better, keeping a running maximum for the width and height so these recalcs don't involve so much cpu.

@wcout
Copy link
Contributor

wcout commented Dec 5, 2021

One way out of the dilemma would be to defer buffer updates in Fl_Text_Display.

Following works:
textdisplay_trigger_recalc_display.diff.txt

@Albrecht-S
Copy link
Member

@wcout Great idea, thank you very much for the patch. I can't test it right now but I'm sure Greg (@erco77) will do.

I have one question: wouldn't it make sense to just set a flag in trigger_recalc_display() to delay all the calculations until draw() is called?

Disclaimer: I only "read" the patch, I didn't look at the underlying code and don't know if this is feasible. Maybe you, Chris, know better?

@wcout
Copy link
Contributor

wcout commented Dec 5, 2021

I have one question: wouldn't it make sense to just set a flag in trigger_recalc_display() to delay all the calculations until draw() is called?

Wow! That's a great idea!

And it works:
textdisplay_trigger_recalc_display_v2.diff.txt

Note: Just changed my patch to test it quickly. But maybe it can now be written still easier. I'll leave this to someone else (@erco77 ?).

@Albrecht-S
Copy link
Member

Wow, that's a quick reaction! Thanks again, Chris.

@erco77
Copy link
Contributor Author

erco77 commented Dec 14, 2021

Sure, I'll give this a spin. Anything to avoid actually fixing Fl_Text_Xxx, lol

@wcout
Copy link
Contributor

wcout commented Aug 21, 2022

A similar phenomenon can be observed with the test/table demo. When setting a larger number of columns/rows (e.g. 5000/5000) then changes of Col Width or Row Height lead to considerable hangs. This is because these changes are triggering callbacks for each cell that lead to many log messages on the Fl_Simple_Terminal.

Again these lags are cured with the above patch of Fl_Text_Display.

@erco77
Copy link
Contributor Author

erco77 commented Aug 22, 2022

Lost my momentum on this, but am taking a look now.
The technique seems sound, will do some tests.

@erco77
Copy link
Contributor Author

erco77 commented Aug 22, 2022

Delaying recalc seems a fine solution, but it can have some obvious side effects; if recalc is deferred, apps (and derived classes) that depended on the calculations of scrollbar sizes be updated will now not see these calculations done until the first draw().

An example I can think of is an app that repeatedly append()s text to the widget, and tries to keep the screen scrolled to show the last line of text based on the scrollbar position, which now with this new change wouldn't be calculated until the next draw().

We can't be certain existing apps (and derived classes) won't break in interesting ways if these calculations are suddenly deferred unconditionally.

I'd suggest for backwards compatibility this be considered an "option" that is off by default, and can be turned on by apps (and derived classes like Fl_Simple_Terminal) that need it.

We could add some docs to the Fl_Text_Display/Editor intro that warns if editor operations seem slow during repeat calls to methods that manipulate the editor, to recommend turning on this "deferred recalc" option, and reference the benefits and drawbacks.

I'm not really sure what to call such an option.. whatever we come up with should be forward thinking, in case we want other complex widgets to have a similar flag.

deferred_recalc() sounds a bit nerdy and unobvious to an app developer.. maybe delayed_recalc_display(true|false) or recalc_display_on_draw(true|false)?

@erco77
Copy link
Contributor Author

erco77 commented Aug 22, 2022

Attaching a v2 patch: textdisplay_trigger_recalc_display_v2.diff.txt
This has only small mods to wcout's last patch:

  • Small name change: needs_recalc_display(true|false)
  • Made this function public/virtual, made variable protected/bool
  • Added some doxygen docs for new public method

TODO: Awaiting comments on if we need to make this behavior optional/default off, and if so, will do code implementation + docs.

@wcout
Copy link
Contributor

wcout commented Aug 22, 2022

Perhaps changing Fl_Text_Display is not required at all to solve these issues here - thus avoiding incompatibility woes with Fl_Text_Display-derived classes:

I've noted now, that Fl_Text_Display::recalc_display()is a virtual function already.
As the problem lies with the slowness of Fl_Simple_Terminal, which is inherited from Fl_Text_Display it would be possible to overload recalc_display() in Fl_Simple_Terminal and use the defer logic just there!
The resulting change would be much simpler too:

diff --git a/FL/Fl_Simple_Terminal.H b/FL/Fl_Simple_Terminal.H
index 158bf7d48..ac3d9c539 100644
--- a/FL/Fl_Simple_Terminal.H
+++ b/FL/Fl_Simple_Terminal.H
@@ -138,6 +138,7 @@ private:
   int stable_size_;         // active style table size (in bytes)
   int normal_style_index_;  // "normal" style used by "\033[0m" reset sequence
   int current_style_index_; // current style used for drawing text
+  bool recalc_display_needed_; // true if recalc of display is needed
 
 public:
   Fl_Simple_Terminal(int X,int Y,int W,int H,const char *l=0);
@@ -181,6 +182,7 @@ private:
 protected:
   // Fltk
   virtual void draw();
+  virtual void recalc_display();
 
   // Internal methods
   void enforce_stay_at_bottom();
diff --git a/src/Fl_Simple_Terminal.cxx b/src/Fl_Simple_Terminal.cxx
index 3d776f2d8..b3c28042a 100644
--- a/src/Fl_Simple_Terminal.cxx
+++ b/src/Fl_Simple_Terminal.cxx
@@ -110,6 +110,7 @@ Fl_Simple_Terminal::Fl_Simple_Terminal(int X,int Y,int W,int H,const char *l) :
   orig_vscroll_cb = mVScrollBar->callback();
   orig_vscroll_data = mVScrollBar->user_data();
   mVScrollBar->callback(vscroll_cb, (void*)this);
+  recalc_display_needed_ = false;
 }
 
 /**
@@ -732,6 +733,10 @@ void Fl_Simple_Terminal::draw() {
   //
 #define LEFT_MARGIN 3
 #define RIGHT_MARGIN 3
+  if (recalc_display_needed_) {
+    recalc_display_needed_ = false;
+    Fl_Text_Display::recalc_display();
+  }
   int buflen = buf->length();
   // Force cursor to EOF so it doesn't draw at user's last left-click
   insert_position(buflen);
@@ -746,3 +751,8 @@ void Fl_Simple_Terminal::draw() {
   if (position_to_xy(buflen, &X, &Y)) draw_cursor(X, Y);
   fl_pop_clip();
 }
+
+void Fl_Simple_Terminal::recalc_display() {
+  recalc_display_needed_ = true; // set flag only
+  redraw(); // ensure draw() gets called
+}

I don't think many folks have based their classes on Fl_Simple_Terminal, so a change there will be less intrusive.

@Albrecht-S
Copy link
Member

@wcout Thanks, Chris, this is a smart idea. However, although it would likely solve the issue at hand, i.e. in Fl_Tree and others using Fl_Simple_Terminal, it wouldn't fix the root cause which is in Fl_Text_Display. Recently I used a test program that loads a text file with about four million lines or even more into an Fl_Text_Display widget and I could observe its slowness if wrap mode is on which is the main cause of the slowness. I'm going to test this with either Greg's patch or my mods (WIP, see below) to see how much it can be improved.

FTR: I'm also working on a patch based on Greg's (@erco77's) latest version of your (@wcout's) patch. I'm thinking about the option Greg mentioned to switch the deferred calculation on (and off) in the user program because I'm concerned about backwards compatibility of unmodified programs - which means that the default would be off (non-deferred, or "direct"). I'll post more in a follow-up, but please don't hold your breath, I need some more time for testing.

@Albrecht-S
Copy link
Member

@erco77 Great findings, delaying calculation of internals until the widget gets drawn can have unwanted effects. As you may know I'm working on Fl_Flex (committed but not yet finished) and Fl_Grid (beta stadium, not yet committed). For both widgets I'd like to implement delayed layout calculation until draw() is called but we definitely need to calculate the positions and sizes at least on request so the user program can inspect the widget sizes and positions etc.. I'm doing this now in a method called layout() which calculates the layout of its children. This method is currently called on each resize() operation but I'd like to delay it until either the user calls layout() on the widget or draw() is executing.

Using layout() as a method name could become a standard for FLTK widgets, maybe...

@erco77
Copy link
Contributor Author

erco77 commented Aug 22, 2022

Ya, I'd guess if layout() might be like redraw() in that it schedules updating the internals until draw(), if someone wanted to force a recalculation /now/, they could call recalc_layout() or maybe update_layout().

@Albrecht-S
Copy link
Member

OK, here's some [bad] news. As it stands, recalc_display() can't be called within draw() because it calls redraw() near the end of recalc_display(). This is ignored because the damage() flags are cleared after draw(). Hence the simple patch(es) posted above can't work as expected. So far my tests with my modified patch show that moving the scrollbar does no longer update the display. This may be caused by calling and ignoring redraw(), I'm not sure yet. There may be other side effects as well, and this may be changed so it works, hopefully.

While I'm at it I'm also trying to profile the code and I found some interesting hotspots. For instance, with my current test file (480,000 lines) there are some code sequences that use lots of CPU time, e.g. display_insert() uses 1.8 seconds (!) CPU time in one execution although a comment states "the efficiency of this routine is not as important to the overall performance of the text display" which I'd say is clearly wrong. Imagine what happens if the user resizes the window.

I'm investigating further...

@wcout
Copy link
Contributor

wcout commented Aug 23, 2022

@Albrecht-S So far my tests with my modified patch show that moving the scrollbar does no longer update the display

There was a flaw in my original patch (and @erco77's modified patch). redraw() must be called in trigger_recalc_display() (or needs_recalc_display as it is now called)! I realized this some days ago, but was wondering why it worked without too in the cases of these issue...

void Fl_Text_Display::needs_recalc_display() {
  needs_recalc_display_ = true;
  redraw(); // ensure draw() gets called
}

With this change moving the scrollbar works in editor.
Of course there is still the question about the redraw in recalc_display that is now moot, when called from draw(). In this aspect my original timer version of the patch was more compatibel.

@Albrecht-S
Copy link
Member

@wcout Thanks again for your support. I was going to test with your/Greg's original patches but I can now save this time. I'll add this to my patch (or see how it will be fixed otherwise).

Correction: my observation posted above that redraw() inside draw() does not work is moot because recalc_display() is called at the very beginning of draw(). Thus everything should be fine because redraw() is unnecessary.

@erco77
Copy link
Contributor Author

erco77 commented Dec 18, 2022

So uh, did we come to a consensus on how to fix this?
Albrecht, I think you were the last to mention a patch, can you provide it here?
Would like to see if any of this improves the test application behavior, because it's still pretty slow when there's a large text history in the buffer, and a few hundred lines are added, freezing up the app for 5 or 10 seconds.

@Albrecht-S
Copy link
Member

@erco77 Sorry, I can't remember what the current status is/was and what patch I mentioned above. I'd need some time to come back to this because I'm currently concentrating on other stuff.

Maybe @MatthiasWM will find improvements in #596?

@wcout
Copy link
Contributor

wcout commented Dec 19, 2022

I feel so free to update @erco77 's v2 patch with my change of adding redraw() to Fl_Text_Display::needs_recalc_display() to fix the dragging scrollbar issue @Albrecht-S mentioned above and to latest git:

textdisplay_trigger_recalc_display_v3.diff.txt

Try tree demo:

  • Select tag 'Root'
  • Button "Add 20.000"
  • Button "Select All"

before: hangs "for ever" (had to kill the demo)
after: ~2 seconds

@ManoloFLTK
Copy link
Contributor

See also issue #617 that reports the same issue.

@erco77
Copy link
Contributor Author

erco77 commented Nov 6, 2023

This issue will surely be solved when the new Fl_Terminal code is pulled in from my fork.

@MatthiasWM
Copy link
Contributor

@erco77 can this be closed?

@erco77
Copy link
Contributor Author

erco77 commented Dec 10, 2023

Yep, closed

@erco77 erco77 closed this as completed Dec 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants