Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: Improve performance of code folding and indent guides #16446

Merged
merged 4 commits into from
Sep 28, 2021

Conversation

mrclary
Copy link
Contributor

@mrclary mrclary commented Sep 18, 2021

Description of Changes

  • Add jellyfish package which provides optimized algorithms for textdistance
  • Change algorithm from jaccard to hamming
  • 1000x performance improvement for code folding and indent guides

Issue(s) Resolved

Fixes #14823

Affirmation

By submitting this Pull Request or typing my (user)name below,
I affirm the Developer Certificate of Origin
with respect to all commits and content included in this PR,
and understand I am releasing the same under Spyder's MIT (Expat) license.

I certify the above statement is true and correct:
@mrclary

@mrclary mrclary added this to the v5.2.0 milestone Sep 18, 2021
@mrclary mrclary changed the base branch from master to 5.x September 18, 2021 21:39
@pep8speaks
Copy link

pep8speaks commented Sep 18, 2021

Hello @mrclary! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-28 14:12:03 UTC

@mrclary
Copy link
Contributor Author

mrclary commented Sep 18, 2021

@ccordoba12, jellyfish is only available on conda-forge, so we may need to do something there.

@ccordoba12
Copy link
Member

Thanks for your help with this @mrclary! @andfoy will take a look at it next week to see if we can use the hamming algorithm instead of jaccard.

@andfoy
Copy link
Member

andfoy commented Sep 23, 2021

Thanks for the improvement @mrclary, as for the difference between the algorithms, the Jaccard one refers to the Jaccard similarity, which is simply the token-wise intersection over the union, that is, the sum of the characters that match per position over the total number of character, whereas the Hamming distance computes the total number of edits (additions, subtractions) that are required to transform a string into another.

We were using the Jaccard distance because it resulted faster from the previous distance (Jaro-Wrinkler, an edit distance as well) because it just relied on local information rather on character neighborhoods. Since this PR improves the runtime again, I think there is no problem, because we're looking for a distance between code blocks that might tell us if they are both similar or not, the only caveat is that heuristic threshold that we use to decide if two strings are the same or not, which we set to 0.8. Unless users complain about code folding mismatches, I think we can leave it as it is, otherwise we might need to increase it.

@mrclary
Copy link
Contributor Author

mrclary commented Sep 23, 2021

@andfoy, sounds good. We can always adjust the threshold if needed.
@ccordoba12, what should we do about jellyfish on conda-forge? I remember we had some issues dealing with that when we were upgrading the rtree version.

@ccordoba12
Copy link
Member

@mrclary, please rebase on top of the latest 5.x because I moved our tests to use conda-forge in PR #16492. That will solve the failures you're seeing here.

spyder/dependencies.py Outdated Show resolved Hide resolved
6500: 1800
}
# linting results arrive
SYNC_SYMBOLS_AND_FOLDING_TIMEOUT = 500 # milliseconds
Copy link
Member

@ccordoba12 ccordoba12 Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the constant's name says, this controls updating folding and symbols. So removing the timeouts we have for different file sizes will introduce sluggishness when people have the outline pane visible.

So please revert this change and the others related to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find any affect when the outline pane is visible. Updating the outline explorer pane seems to take less than 150ms when viewing the codeeditor.py (5000+ lines, 250+ definitions) and did not affect typing latency in the editor in my tests. I can revert, but I don't think it's necessary. If there is concern for older, slower systems, I think keeping a single timeout may still be better (simpler) and just increasing the timeout (1000ms?).

Copy link
Member

@ccordoba12 ccordoba12 Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct on my tests. In the gifs below I used our codeeditor.py file, and I disabled folding and only have the outline visible so we can have a fair comparison between 5.x and this PR:

This PR

slow

5.x

fast

@mrclary
Copy link
Contributor Author

mrclary commented Sep 27, 2021

@ccordoba12, I could not tell what your issue was from the gifs that you posted. If you were experiencing latency, then I could not distinguish it from your typing cadence. Nevertheless, for the sake of getting this PR through, I reverted the tiered timeout.

I must however, point out that this tiered timeout paradigm does not really solve the problem. If you are experiencing any latency due to symbols being updated, then that latency will always be experienced when typing at some small time after the timeout expires but before the symbols update is complete. The real issue, then, is the performance of updating the symbols. I don't know what this is on your system, but I indicated <150ms for me on my system when testing one of our largest files. If the performance cannot be improved, then perhaps we should explore using a Qthread for it, so it does not interrupt the Editor.

@ccordoba12
Copy link
Member

ccordoba12 commented Sep 28, 2021

I could not tell what your issue was from the gifs that you posted. If you were experiencing latency, then I could not distinguish it from your typing cadence

The pauses you can see in the first gif are caused by latency in the editor, not because I paused while writing. I'm not the best typist, but I think it's clear that in the second gif I'm typing at a higher speed and with less errors. But that was because I was not seeing almost any delay on the text I was writing that distracted me.

I must however, point out that this tiered timeout paradigm does not really solve the problem.

We are perfectly aware of that. This is just the best we can do for now (see below).

The real issue, then, is the performance of updating the symbols. I don't know what this is on your system, but I indicated <150ms for me on my system when testing one of our largest files.

According to usability studies. if actions take longer than 175ms, people think something is broken or malfunctioning with your program. If we apply the same reasoning here, if people see delays closer to that, they'll probably think something isn't working properly.

If the performance cannot be improved, then perhaps we should explore using a Qthread for it, so it does not interrupt the Editor.

Ryan, I know you have the best intentions, but this is getting a bit tiresome because lately it seems you are just criticizing what we have without even asking what we've done to get here. Edgar and I have put a lot of time and effort thinking about how to solve this problem and trying different solutions (QThread's, regular Python threads, and Cython and Rust bindings).

In short, there's no thread nor binding-based solution that works in this case because of Python's GIL. Since our operations are CPU-bound (not IO-bound, where asyncio would be perfect), we need to develop a client-server architecture to process those operations in the server and only display the results in the client (i.e. the Spyder interface). That way our CPU intensive problem would become an IO intensive one. But that's something we don't have time for right now.

@ccordoba12
Copy link
Member

ccordoba12 commented Sep 28, 2021

According to usability studies. if actions take longer than 175ms, people think something is broken or malfunctioning with your program. If we apply the same reasoning here, if people see delays closer to that, they'll probably think something isn't working properly.

I know this is contradictory because the timeouts we have in place right now are huge compared to 175 ms for big files. However, I think this mostly impacts indent guides. That's because it takes you 1-2 seconds to move the cursor to the folding panel and we show a spinner to tell people when symbols are being updated.

@mrclary
Copy link
Contributor Author

mrclary commented Sep 28, 2021

Ryan, I know you have the best intentions, but this is getting a bit tiresome because lately it seems you are just criticizing what we have without even asking what we've done to get here. Edgar and I have put a lot of time and effort thinking about how to solve this problem and trying different solutions

Carlos and Edgar, I sincerely apologize if I seem critical of any of the work that you have done. Thank you for recognizing my good intentions, because I certainly did not intend to criticize anyone's work. You are correct that I did not inquire about what work has already been done; I apologize for that as well. Than you for your explanation; that helps me understand the problem better.

I will try to improve my communication with you and others in the Spyder team in the future so that I can better contribute to the project and avoid any misunderstandings. I certainly don't want to create any tension or bad feelings.

@ccordoba12
Copy link
Member

ccordoba12 commented Sep 28, 2021

Carlos and Edgar, I sincerely apologize if I seem critical of any of the work that you have done. Thank you for recognizing my good intentions, because I certainly did not intend to criticize anyone's work.

Thanks for your words Ryan! I really value your contributions and the effort you've put into make Spyder better in a lot of fronts. For instance, this PR is a huge improvement over what we had before.

I will try to improve my communication with you and others in the Spyder team in the future so that I can better contribute to the project and avoid any misunderstandings

Thanks! I think just asking or inquiring first, before assuming things up front, will make our conversations more fluid and pleasant.

Copy link
Member

@ccordoba12 ccordoba12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrclary!

@ccordoba12 ccordoba12 merged commit d8d1696 into spyder-ide:5.x Sep 28, 2021
ccordoba12 added a commit that referenced this pull request Sep 28, 2021
@mrclary mrclary deleted the editor-latency branch September 28, 2021 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Typing latency in the editor due to folding
4 participants