More performance optimization for the Remove Empty Lines command #12544
@ArkadiuszMichalski
@donho On #12548 I'm trying to change the behavior of this command (i.e. how to treat the last empty line when there is a selection, how to set the selection after the operation, etc.). That change could be accepted separately (if the new approach is better than the current one), so that if any problems appear in the future it would be easier to reject. The performance improvement from this PR would be preserved either way.
Could you provide a test (file + instructions) proving that this PR improves performance?
But #12548 is related to another command.
This PR does not have a dedicated issue. There was one, #12462, but you accepted PR #12512 too quickly (while the discussion/work was still ongoing). As for tests: for the first improvement you can experiment with https://regex101.com/ (PCRE or PCRE2) and the test file regex.txt. You can see that the new regexes are shorter and faster (they take less time and fewer steps). For empty lines:
For empty lines with whitespace:
For the second improvement:
This can be observed on larger files. We no longer scan all the data (all lines) again just to remove a potential trailing empty line; we check only the last line directly, so it is basically instant.
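A minimal sketch of the tail-only check, assuming the document is a plain std::string rather than a Scintilla buffer. `removeLastLineIfEmpty` is a hypothetical helper, not the code from this PR. It inspects only the end of the buffer, so its cost depends on the length of the last line, not on how many lines the document has.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Notepad++ code): removes the last line of `doc` if it
// is empty, or blank (spaces/tabs only) when `blankCountsAsEmpty` is true.
// Only the tail of the buffer is examined. A single-line document is left unchanged.
bool removeLastLineIfEmpty(std::string& doc, bool blankCountsAsEmpty)
{
    // Walk back over trailing blanks of the last line, if they count as "empty".
    std::size_t pos = doc.size();
    if (blankCountsAsEmpty) {
        while (pos > 0 && (doc[pos - 1] == ' ' || doc[pos - 1] == '\t'))
            --pos;
    }

    // The last line is empty only if an EOL (LF, CR or CRLF) ends exactly here.
    if (pos == 0 || (doc[pos - 1] != '\n' && doc[pos - 1] != '\r'))
        return false;

    // Step back over the previous line's EOL, treating CRLF as one unit.
    std::size_t eolStart = pos - 1;
    if (doc[eolStart] == '\n' && eolStart > 0 && doc[eolStart - 1] == '\r')
        --eolStart;

    // Delete that EOL plus any blanks after it: a single range at the end.
    doc.erase(eolStart);
    return true;
}
```

In Notepad++ the same idea would presumably map to Scintilla messages such as SCI_GETLINECOUNT, SCI_POSITIONFROMLINE, SCI_GETLINEENDPOSITION and SCI_DELETERANGE applied to the last one or two lines; that mapping is an assumption, not taken from the PR.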
Sorry, I meant "close #12535"
We cannot rely on https://regex101.com/ because we don't know their PCRE implementation. Even if they use the same version of Boost PCRE in their engine, it's not the same binary code as in Notepad++. So as long as I don't have tangible proof that this PR improves performance, the PR cannot be accepted.
I know this is not the same implementation as in Notepad++, but it makes it easier to show why the new regexes are more efficient. We don't use a capturing group (avoiding one is always more efficient), and we grab more data in a single match. Hypothetical simple test case:
This new regex:
In the worst case, when consecutive empty lines differ (they contain different whitespace, for example as above), the previously accepted PR doesn't improve anything, but this one does.
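To illustrate why matching a whole run of blank lines at once needs fewer matches than matching one blank line at a time, here is a sketch with two illustrative patterns. They are not the regexes from this PR or from Notepad++. The sketch uses std::regex, a different engine from the one in Notepad++, so it demonstrates match counts rather than timings; it assumes LF line endings and does not handle a blank line at the very start of the buffer.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Illustrative per-line pattern: each match is one blank line
// (the EOL before it plus its spaces/tabs), via a lookahead for the next EOL.
const std::regex kPerLine(R"(\n[ \t]*(?=\n))");

// Illustrative run pattern: one match swallows a whole run of consecutive
// blank lines, even if their whitespace differs, using only a non-capturing group.
const std::regex kRun(R"(\n(?:[ \t]*\n)+)");

// Number of non-overlapping matches of `re` in `text`.
std::size_t countMatches(const std::string& text, const std::regex& re)
{
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re), std::sregex_iterator()));
}

// Removes blank lines one match per line.
std::string removeBlankLinesPerLine(const std::string& text)
{
    return std::regex_replace(text, kPerLine, "");
}

// Removes blank lines one match per run, keeping a single EOL.
std::string removeBlankLinesRun(const std::string& text)
{
    return std::regex_replace(text, kRun, "\n");
}
```

On "a\n  \n\t\nb" (two consecutive blank lines with different whitespace) both functions return "a\nb", but the per-line pattern needs two matches while the run pattern needs one; the gap widens with longer runs.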
Well, is it really necessary to make a file with 1 million lines (and add timing logs) to prove it? Please consider this practical case (using only Notepad++):
The old one (4 matches):
OK, I have bigger files for testing:
I can also make a clearer example of the improvement.
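If a large test file and timing logs are wanted, both can be produced programmatically. A minimal sketch with two hypothetical helpers: `makeTestDocument` builds an LF-terminated buffer that mixes text lines with consecutive blank lines containing different whitespace, and `timeOperationMs` times one operation on a copy of that buffer with std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: every third line is text; the other lines are blank,
// cycling through different whitespace so consecutive blank lines differ.
std::string makeTestDocument(std::size_t lineCount)
{
    static const char* const blanks[] = {"", " ", "\t", "  \t"};
    std::string doc;
    for (std::size_t i = 0; i < lineCount; ++i) {
        if (i % 3 == 0)
            doc += "text line " + std::to_string(i);
        else
            doc += blanks[i % 4];
        doc += '\n';
    }
    return doc;
}

// Hypothetical helper: runs `op` once on a copy of `doc` (the copy is made
// outside the timed region) and returns the elapsed time in milliseconds.
double timeOperationMs(const std::string& doc,
                       const std::function<void(std::string&)>& op)
{
    std::string copy = doc;
    const auto start = std::chrono::steady_clock::now();
    op(copy);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, calling timeOperationMs(makeTestDocument(1000000), op) with the old and new implementations of `op` gives comparable numbers on the same machine; a single run is noisy, so repeating each measurement a few times is advisable.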
May I ask why?
I didn't say he does not do a good job.
#12535 started as an optimization PR but turned into fixing some issues in the "Line Operations" commands. Thank you.
Thank you for your explanation, @Yaron10.
@donho
#12544 (this PR) is only a performance optimization for these two commands (better than the recently approved PR).
#12535 changes the selection and last-empty-line behavior of these two commands. I don't know whether this change will be accepted (whether it is wanted), so I did it separately. It includes the optimizations from #12544 because I assume those will be accepted after all. If not, I will revert to the previous approach, keeping the selection and last-empty-line changes made following the discussion.
What's wrong with the title? It all depends on what will be done with this PR.
I added a file to test #12544 (comment), so if you need anything else (additional files), let me know.
I found regular expressions that are even more efficient (and simpler) than the previous ones.
This can be checked with https://regex101.com/ (PCRE) and the test file regex.txt.
I also experimented with this part of the code on a big file:
// remove the last line if it's an empty line.
There is no point in checking the entire file when we only want to remove the last blank line. We only need to check the last two (or one) lines, which can be done directly with Scintilla commands. From now on, removing this last empty line will not slow down as the document grows.
This PR also covers two more cases:
This is just a performance improvement. In particular, it doesn't change how the last EOL is treated, with or without a selection; it works as before. I suggest discussing any change to this behavior elsewhere (#12545, #12535).