PR: Improve performance of code folding and indent guides #16446

mrclary · 2021-09-18T21:38:35Z

Description of Changes

Add jellyfish package which provides optimized algorithms for textdistance
Change algorithm from jaccard to hamming
1000x performance improvement for code folding and indent guides

Issue(s) Resolved

Affirmation

By submitting this Pull Request or typing my (user)name below,
I affirm the Developer Certificate of Origin
with respect to all commits and content included in this PR,
and understand I am releasing the same under Spyder's MIT (Expat) license.

I certify the above statement is true and correct:
@mrclary

pep8speaks · 2021-09-18T21:40:04Z

Hello @mrclary! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-28 14:12:03 UTC

mrclary · 2021-09-18T21:45:23Z

@ccordoba12, jellyfish is only available on conda-forge, so we may need to do something there.

ccordoba12 · 2021-09-18T22:44:19Z

Thanks for your help with this @mrclary! @andfoy will take a look at it next week to see if we can use the hamming algorithm instead of jaccard.

andfoy · 2021-09-23T15:46:51Z

Thanks for the improvement @mrclary, as for the difference between the algorithms, the Jaccard one refers to the Jaccard similarity, which is simply the token-wise intersection over the union, that is, the sum of the characters that match per position over the total number of character, whereas the Hamming distance computes the total number of edits (additions, subtractions) that are required to transform a string into another.

We were using the Jaccard distance because it resulted faster from the previous distance (Jaro-Wrinkler, an edit distance as well) because it just relied on local information rather on character neighborhoods. Since this PR improves the runtime again, I think there is no problem, because we're looking for a distance between code blocks that might tell us if they are both similar or not, the only caveat is that heuristic threshold that we use to decide if two strings are the same or not, which we set to 0.8. Unless users complain about code folding mismatches, I think we can leave it as it is, otherwise we might need to increase it.

mrclary · 2021-09-23T17:02:19Z

@andfoy, sounds good. We can always adjust the threshold if needed.
@ccordoba12, what should we do about jellyfish on conda-forge? I remember we had some issues dealing with that when we were upgrading the rtree version.

ccordoba12 · 2021-09-27T02:37:41Z

@mrclary, please rebase on top of the latest 5.x because I moved our tests to use conda-forge in PR #16492. That will solve the failures you're seeing here.

spyder/dependencies.py

ccordoba12 · 2021-09-27T02:44:12Z

spyder/plugins/editor/widgets/codeeditor.py

-        6500: 1800
-    }
+    # linting results arrive
+    SYNC_SYMBOLS_AND_FOLDING_TIMEOUT = 500  # milliseconds


As the constant's name says, this controls updating folding and symbols. So removing the timeouts we have for different file sizes will introduce sluggishness when people have the outline pane visible.

So please revert this change and the others related to it.

I don't find any affect when the outline pane is visible. Updating the outline explorer pane seems to take less than 150ms when viewing the codeeditor.py (5000+ lines, 250+ definitions) and did not affect typing latency in the editor in my tests. I can revert, but I don't think it's necessary. If there is concern for older, slower systems, I think keeping a single timeout may still be better (simpler) and just increasing the timeout (1000ms?).

This is not correct on my tests. In the gifs below I used our codeeditor.py file, and I disabled folding and only have the outline visible so we can have a fair comparison between 5.x and this PR:

This PR

5.x

mrclary · 2021-09-27T21:47:33Z

@ccordoba12, I could not tell what your issue was from the gifs that you posted. If you were experiencing latency, then I could not distinguish it from your typing cadence. Nevertheless, for the sake of getting this PR through, I reverted the tiered timeout.

I must however, point out that this tiered timeout paradigm does not really solve the problem. If you are experiencing any latency due to symbols being updated, then that latency will always be experienced when typing at some small time after the timeout expires but before the symbols update is complete. The real issue, then, is the performance of updating the symbols. I don't know what this is on your system, but I indicated <150ms for me on my system when testing one of our largest files. If the performance cannot be improved, then perhaps we should explore using a Qthread for it, so it does not interrupt the Editor.

ccordoba12 · 2021-09-28T02:06:36Z

I could not tell what your issue was from the gifs that you posted. If you were experiencing latency, then I could not distinguish it from your typing cadence

The pauses you can see in the first gif are caused by latency in the editor, not because I paused while writing. I'm not the best typist, but I think it's clear that in the second gif I'm typing at a higher speed and with less errors. But that was because I was not seeing almost any delay on the text I was writing that distracted me.

I must however, point out that this tiered timeout paradigm does not really solve the problem.

We are perfectly aware of that. This is just the best we can do for now (see below).

The real issue, then, is the performance of updating the symbols. I don't know what this is on your system, but I indicated <150ms for me on my system when testing one of our largest files.

According to usability studies. if actions take longer than 175ms, people think something is broken or malfunctioning with your program. If we apply the same reasoning here, if people see delays closer to that, they'll probably think something isn't working properly.

If the performance cannot be improved, then perhaps we should explore using a Qthread for it, so it does not interrupt the Editor.

Ryan, I know you have the best intentions, but this is getting a bit tiresome because lately it seems you are just criticizing what we have without even asking what we've done to get here. Edgar and I have put a lot of time and effort thinking about how to solve this problem and trying different solutions (QThread's, regular Python threads, and Cython and Rust bindings).

In short, there's no thread nor binding-based solution that works in this case because of Python's GIL. Since our operations are CPU-bound (not IO-bound, where asyncio would be perfect), we need to develop a client-server architecture to process those operations in the server and only display the results in the client (i.e. the Spyder interface). That way our CPU intensive problem would become an IO intensive one. But that's something we don't have time for right now.

ccordoba12 · 2021-09-28T02:17:01Z

According to usability studies. if actions take longer than 175ms, people think something is broken or malfunctioning with your program. If we apply the same reasoning here, if people see delays closer to that, they'll probably think something isn't working properly.

I know this is contradictory because the timeouts we have in place right now are huge compared to 175 ms for big files. However, I think this mostly impacts indent guides. That's because it takes you 1-2 seconds to move the cursor to the folding panel and we show a spinner to tell people when symbols are being updated.

mrclary · 2021-09-28T06:16:50Z

Ryan, I know you have the best intentions, but this is getting a bit tiresome because lately it seems you are just criticizing what we have without even asking what we've done to get here. Edgar and I have put a lot of time and effort thinking about how to solve this problem and trying different solutions

Carlos and Edgar, I sincerely apologize if I seem critical of any of the work that you have done. Thank you for recognizing my good intentions, because I certainly did not intend to criticize anyone's work. You are correct that I did not inquire about what work has already been done; I apologize for that as well. Than you for your explanation; that helps me understand the problem better.

I will try to improve my communication with you and others in the Spyder team in the future so that I can better contribute to the project and avoid any misunderstandings. I certainly don't want to create any tension or bad feelings.

…istance * Change algorithm from jaccard to hamming * 1000x performance improvement for code folding and indent guides

This reverts commit ab9bd39.

ccordoba12 · 2021-09-28T16:29:13Z

Carlos and Edgar, I sincerely apologize if I seem critical of any of the work that you have done. Thank you for recognizing my good intentions, because I certainly did not intend to criticize anyone's work.

Thanks for your words Ryan! I really value your contributions and the effort you've put into make Spyder better in a lot of fronts. For instance, this PR is a huge improvement over what we had before.

I will try to improve my communication with you and others in the Spyder team in the future so that I can better contribute to the project and avoid any misunderstandings

Thanks! I think just asking or inquiring first, before assuming things up front, will make our conversations more fluid and pleasant.

ccordoba12

Thanks @mrclary!

Fixes #14823

mrclary added this to the v5.2.0 milestone Sep 18, 2021

mrclary requested a review from ccordoba12 September 18, 2021 21:38

mrclary changed the base branch from master to 5.x September 18, 2021 21:39

mrclary force-pushed the editor-latency branch from dea7a37 to b97fb9d Compare September 18, 2021 21:40

mrclary force-pushed the editor-latency branch from b97fb9d to bbd72a6 Compare September 18, 2021 21:41

mrclary force-pushed the editor-latency branch from bbd72a6 to 1ae3f01 Compare September 18, 2021 22:13

ccordoba12 requested a review from andfoy September 18, 2021 22:42

ccordoba12 reviewed Sep 27, 2021

View reviewed changes

mrclary force-pushed the editor-latency branch from 1ae3f01 to 067b439 Compare September 27, 2021 15:28

mrclary requested a review from ccordoba12 September 28, 2021 00:22

mrclary added 4 commits September 28, 2021 07:11

* Add jellyfish package which provides optimized algorithms for textd…

0622b03

…istance * Change algorithm from jaccard to hamming * 1000x performance improvement for code folding and indent guides

* Set single timeout for syncing symbols and folding

80b9d45

* Address review comment

4719770

Revert "* Set single timeout for syncing symbols and folding"

410ef6c

This reverts commit ab9bd39.

mrclary force-pushed the editor-latency branch from d647768 to 410ef6c Compare September 28, 2021 14:11

ccordoba12 approved these changes Sep 28, 2021

View reviewed changes

ccordoba12 assigned mrclary Sep 28, 2021

ccordoba12 merged commit d8d1696 into spyder-ide:5.x Sep 28, 2021

ccordoba12 added a commit that referenced this pull request Sep 28, 2021

Merge from 5.x: PR #16446

60b6d92

Fixes #14823

mrclary deleted the editor-latency branch September 28, 2021 22:07

mrclary mentioned this pull request Dec 1, 2021

Spyder Mac app laggy in 5.2.0 #16910

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR: Improve performance of code folding and indent guides #16446

PR: Improve performance of code folding and indent guides #16446

mrclary commented Sep 18, 2021

pep8speaks commented Sep 18, 2021 •

edited

Loading

mrclary commented Sep 18, 2021

ccordoba12 commented Sep 18, 2021

andfoy commented Sep 23, 2021 •

edited

Loading

mrclary commented Sep 23, 2021

ccordoba12 commented Sep 27, 2021

ccordoba12 Sep 27, 2021 •

edited

Loading

mrclary Sep 27, 2021

ccordoba12 Sep 27, 2021 •

edited

Loading

mrclary commented Sep 27, 2021

ccordoba12 commented Sep 28, 2021 •

edited

Loading

ccordoba12 commented Sep 28, 2021 •

edited

Loading

mrclary commented Sep 28, 2021

ccordoba12 commented Sep 28, 2021 •

edited

Loading

ccordoba12 left a comment

PR: Improve performance of code folding and indent guides #16446

PR: Improve performance of code folding and indent guides #16446

Conversation

mrclary commented Sep 18, 2021

Description of Changes

Issue(s) Resolved

Affirmation

pep8speaks commented Sep 18, 2021 • edited Loading

Comment last updated at 2021-09-28 14:12:03 UTC

mrclary commented Sep 18, 2021

ccordoba12 commented Sep 18, 2021

andfoy commented Sep 23, 2021 • edited Loading

mrclary commented Sep 23, 2021

ccordoba12 commented Sep 27, 2021

ccordoba12 Sep 27, 2021 • edited Loading

Choose a reason for hiding this comment

mrclary Sep 27, 2021

Choose a reason for hiding this comment

ccordoba12 Sep 27, 2021 • edited Loading

Choose a reason for hiding this comment

mrclary commented Sep 27, 2021

ccordoba12 commented Sep 28, 2021 • edited Loading

ccordoba12 commented Sep 28, 2021 • edited Loading

mrclary commented Sep 28, 2021

ccordoba12 commented Sep 28, 2021 • edited Loading

ccordoba12 left a comment

Choose a reason for hiding this comment

pep8speaks commented Sep 18, 2021 •

edited

Loading

andfoy commented Sep 23, 2021 •

edited

Loading

ccordoba12 Sep 27, 2021 •

edited

Loading

ccordoba12 Sep 27, 2021 •

edited

Loading

ccordoba12 commented Sep 28, 2021 •

edited

Loading

ccordoba12 commented Sep 28, 2021 •

edited

Loading

ccordoba12 commented Sep 28, 2021 •

edited

Loading