Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

app-crash after find and replace in a large file #14630

Closed
JeroenNienhuis opened this issue Jan 24, 2024 · 26 comments
Closed

app-crash after find and replace in a large file #14630

JeroenNienhuis opened this issue Jan 24, 2024 · 26 comments
Assignees
Labels
crash issue causing N++ to crash performance issue

Comments

@JeroenNienhuis
Copy link

Description of the Issue

I have a csv file with over 500,000 lines (file size 50MB) and 20 comma's on each line. After a find and replace commas to semicolons NPP crashes after 5 minutes
(Windows notepad did the trick in 5 seconds)

Debug Information

Notepad++ v8.6.2 (32-bit)
Build time : Jan 14 2024 - 02:18:41
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Command Line : "D:\user\logins.txt"
Admin mode : ON
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 7 Professional (64-bit)
OS Build : 7601.0
Current ANSI codepage : 1252
Plugins :
ComparePlugin (2)
DSpellCheck (1.4.6)
JSMinNPP (1.1903)
mimeTools (3)
NppConverter (4.5)
NppExport (0.4)
NppTextFX (0.2.6)
PoorMansTSqlFormatterNppPlugin (1.6.10.1279)
XMLTools (2.4.11)

@alankilborn
Copy link
Contributor

Have you tried it with plugins removed?

@JeroenNienhuis
Copy link
Author

No

@JeroenNienhuis
Copy link
Author

on another computer it took little over 10 minutes to complete the find and replace without crash. The crash on my computer is probably an issue on my machine, but it would be an improvement if the replacement could be a lot faster.

debug info on the other computer
Notepad++ v8.5.4 (64-bit)
Build time : Jun 17 2023 - 20:42:45
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line : "K:\uitpostgres.csv"
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 22H2
OS Build : 19045.3930
Current ANSI codepage : 1252
Plugins :
mimeTools (2.9)
NppConverter (4.5)
NppExport (0.4)

@alankilborn
Copy link
Contributor

The crash on my computer is probably an issue on my machine

As plugins are different on the two machines (and non-crashing machine has only the plugins that ship with Notepad++), it would be a valuable experiment to remove plugins from the crashing machine and see if crash repeats or not.


over 10 minutes to complete the find and replace

I wonder if Change History could be having an impact. Is it enabled? If so perhaps turn it off: Display Change History checkbox is in the Settings > Preferences > Margins/Border/Edge


What are the find/replace text, and the search mode?

@JeroenNienhuis
Copy link
Author

On the crashing machine
settings for find and replace:
afbeelding

Just tried it again twice without removing plugins and with the change history disabled. The first time it succeeded in less then 6 minutes. After running a second time without exiting the program first I again got a crash. Perhaps it is a memory issue.

On the non-crashing machine (also without removing plugins and with the change history disabled):
it was faster (more or less twice as fast), but not a very large improvement (Windows notepad is still 60 times as fast).

@JeroenNienhuis
Copy link
Author

another test on the crashing machine:
after program restart I dit 2 trials: the first was OK, then I freed memory before running the second trial. Now the second trial was also OK.
So the crash was caused by memory problems. A solution might be an occasional memory check when running an "expensive" operation to prevent a crash.

@alankilborn
Copy link
Contributor

then I freed memory

What did you do to achive this?

@JeroenNienhuis
Copy link
Author

close another program

@xomx
Copy link
Contributor

xomx commented Jan 24, 2024

I can confirm that this is a N++ issue, I tried:

  • fresh v8.6.2 x64 portable N++ has this problem
  • also some older N++ versions on my machine show this (even my oldest N++ in use x86 v6.9.2)
  • and the equivalent (N++ v8.6.2) Scintilla SciTe does not suffer from it, it is slower than the simple Windows Notepad, but it completes the task in a reasonable amount of time, not like N++

Here are some free csv-files for testing:
csv-files.zip

My N++ does not crash on them on my PC, but I had to terminate it from the task manager because it took too long to finish the replacement op.

Display Change History feature has a negative influence but it is not the main culprit here.
Here is a typical callstack when I "Break All" in the MSVS debugger during the issue (with Display Change History OFF):

npp-14630-callstack

@nyamatongwe
Copy link

The crash in the initial report is likely memory exhaustion from undo data. A 32-bit version of Notepad++ is used so there is probably a limit of 2 or 3 GB of address space that can be used.

@JeroenNienhuis: 500,000 lines (file size 50MB) and 20 comma's on each line

Thus, there are 10 million replaces with each being a deletion of a comma then an insertion of a semi-colon so 2 undo actions. A single basic 32-bit undo action is around 32 bytes so a total of around 640 MB is used which is a significant fraction of available address space. On top of that was the memory used for the change history feature. Windows Notepad implements undo differently and can only undo a single action although replace all is considered a single action.

For this example, the 64-bit version of Notepad++ is less likely to run into memory problems.

@xomx: had to terminate it from the task manager because it took too long to finish the replacement op. ... Here is a typical callstack

Notepad++, unlike SciTE, leaves most notifications enabled so each replace operation will send 5 notifications [BeforeDelete, DeleteText, InsertCheck, BeforeInsert, InsertText]. Command events are also enabled which sends a WM_COMMAND:EN_CHANGE for each notification. These may not all be needed but that depends on what features of Notepad++ use notifications. If extensions can receive notifications, then it is more difficult to turn them off with SCI_SETMODEVENTMASK for notifications and SCI_SETCOMMANDEVENTS for command events.

Using document map or document peeker creates secondary views attached to the document and each notification is sent to each view, doubling or tripling the number of notifications.

@JeroenNienhuis
Copy link
Author

memory exhaustion is clear.
It is possible to somehow speed up the find-and-replace or do I have to use Windows Notepad in case of e.g. a simple replacement in a huge file?

@xomx
Copy link
Contributor

xomx commented Jan 25, 2024

@JeroenNienhuis

It is possible to somehow speed up the find-and-replace

I am not conversant with that part of the N++ code, but at first look it seems ok to me:

case ProcessReplaceAll:
{
intptr_t replacedLength;
if (isRegExp)
replacedLength = pEditView->replaceTargetRegExMode(pTextReplace);
else
replacedLength = pEditView->replaceTarget(pTextReplace);
replaceDelta = replacedLength - foundTextLen;
break;
}

intptr_t ScintillaEditView::replaceTarget(const TCHAR * str2replace, intptr_t fromTargetPos, intptr_t toTargetPos) const
{
if (fromTargetPos != -1 || toTargetPos != -1)
{
execute(SCI_SETTARGETRANGE, fromTargetPos, toTargetPos);
}
WcharMbcsConvertor& wmc = WcharMbcsConvertor::getInstance();
size_t cp = execute(SCI_GETCODEPAGE);
const char *str2replaceA = wmc.wchar2char(str2replace, cp);
return execute(SCI_REPLACETARGET, static_cast<WPARAM>(-1), reinterpret_cast<LPARAM>(str2replaceA));
}

@donho
What do you think, is it possible by a some small change in the N++ Find, Replace, Mark, etc. design to help here?
I mean using some predefined size threshold for this kind of bulk ops, and if you exceeding it, temporarily disable the above mentioned Scintilla stuff before such bulk-op and reenable after...

@mpheath
Copy link
Contributor

mpheath commented Jan 25, 2024

This PythonScript 3 script disables undo collection, emptys undo buffer and disables change history.

from Npp import notepad

# Disable undo to prevent memory exhaustion.
if editor.getUndoCollection():
    editor.setUndoCollection(False)
    editor.setChangeHistory(0)
    editor.emptyUndoBuffer()
    notepad.messageBox('Undo is off')
else:
    editor.setUndoCollection(True)
    editor.setChangeHistory(3)
    notepad.messageBox('Undo is on')

Notepad++ memory (private working set) starts up at 77 MB with largest csv file loaded that @xomx provided.

Not using the script: 77,220 to 687,920
Using the script: 77,304 to 77,284

Still takes awhile to do the replacements though the memory saving is like no extra memory usage as it does not increase at all.

@JeroenNienhuis
Copy link
Author

If it is clear how a find and replace action can be greatly accelerated (not just twice. I prefer NPP, but I'm not going to wait for minutes if another program does it at least 10 times as fast).
An option might be to put a checkbox "fast replace" or "fast replace (without restore)" on the "replace" window. If checked specific functionality can be overruled for as long as the find and replace runs.

@donho donho self-assigned this Jan 25, 2024
@donho
Copy link
Member

donho commented Jan 25, 2024

@nyamatongwe
Thank you for checking into the code and the very detailed explantion.

@xomx

is it possible by a some small change in the N++ Find, Replace, Mark, etc. design to help here?
I mean using some predefined size threshold for this kind of bulk ops...

By bulk ops do you mean undo & change history or the notifications?

@xomx
Copy link
Contributor

xomx commented Jan 25, 2024

@donho

By bulk ops do you mean undo & change history or the notifications?

I mainly meant the notifications.
From the yesterday's debugging, it seems to me that the main culprit here are the excessive notifications.

For the change history feature might be enough a mention/switch in the N++ Preferences > Performance.

And about the undo - if it will be possible not to do it for every single replace-op, but at once, that will also be great.

@donho
Copy link
Member

donho commented Jan 25, 2024

@xomx

From the yesterday's debugging, it seems to me that the main culprit here are the excessive notifications.

Sounds good.
Let's solve the excessive notifications firstly. BTW, if the notification is really "excessive", why do we need the threshold ?
We might consider undo or change history, let's see in another PR.
If you want to work on it, I'll assign this issue to you. Otherwise I'll work on it.

@xomx
Copy link
Contributor

xomx commented Jan 25, 2024

@donho

Let's solve the excessive notifications firstly.

Ok.

if the notification is really "excessive", why do we need the threshold ?

Maybe I did not use the right words.
I am referring to this info:

Notepad++, unlike SciTE, leaves most notifications enabled so each replace operation will send 5 notifications [BeforeDelete, DeleteText, InsertCheck, BeforeInsert, InsertText]. Command events are also enabled which sends a WM_COMMAND:EN_CHANGE for each notification. These may not all be needed but that depends on what features of Notepad++ use notifications. If extensions can receive notifications, then it is more difficult to turn them off with SCI_SETMODEVENTMASK for notifications and SCI_SETCOMMANDEVENTS for command events.

Using document map or document peeker creates secondary views attached to the document and each notification is sent to each view, doubling or tripling the number of notifications.

For smaller files, let's say up to 500 KB, there is no need to worry about a limitation of that N++ rich notification functionality (in fact I have no idea if that rich notification functionality is really needed or what exactly will happen if we temporarily limit that, as I did not write / study relevant N++ code parts).

But for larger files, for such multirepetitive ops, they should be at least temporarily disabled if possible.

We might consider undo or change history, let's see in another PR.

Ok.

If you want to work on it, I'll assign this issue to you. Otherwise I'll work on it.

If you could, please assign yourself to this. Unfortunately I am short of time, I am not even able to finish my coding for other N++ issues, sorry about that :-(

@mpheath
Copy link
Contributor

mpheath commented Feb 4, 2024

Done some more testing with PythonScript using the same 26 MB CSV test file that xomx provided.

from Npp import editor, notepad

# Disable undo to prevent memory exhaustion.
if editor.getUndoCollection():
    editor.setUndoCollection(False)
    editor.emptyUndoBuffer()
    editor.setChangeHistory(0)
    notepad.messageBox('Undo is off')

    ModEventMask = editor.getModEventMask()
    # SC_MOD_NONE = 0
    editor.setModEventMask(0)
    editor.setCommandEvents(False)
    notepad.messageBox('Commands+EventMask is off')
else:
    editor.setUndoCollection(True)
    editor.setChangeHistory(3)
    notepad.messageBox('Undo is on')

    # Restore Mask (from getsetModEventMask 7FFFFF = SC_MODEVENTMASKALL)
    editor.setModEventMask(ModEventMask)
    editor.setCommandEvents(True)
    notepad.messageBox('Commands+EventMask is on')

Comment some code to get the variations.

undo ch events time mm:ss memory difference
1 1 1 1:23 610696
1 0 1 1:09 338228
1 1 0 0:18 610816
1 0 0 0:06 337220
0 0 0 0:05 0

1 = on, 0 = off

@xomx
Copy link
Contributor

xomx commented Feb 4, 2024

@mpheath

1 | 1 | 0 | 0:18 | 610816

So it is confirmed that the notifications are causing the biggest lag, right?

@mpheath
Copy link
Contributor

mpheath commented Feb 5, 2024

@xomx Correct! Notifications (aka events in the table header) are causing the biggest lag in speed. Turning notifications off makes testing more differcult as it can finish quicker then I get to note the time from the systray so I had to be quick to check.

What you quoted is what SciTE uses during Find and Replace which takes 18 seconds and uses all extra memory needed for Undo and Change History. The 2nd last row is what you proposed for large files in performance settings by disabling Change History which takes 6 seconds and halves the extra memory needed.

Disabling Undo for large files could be an option to disable though may make editing the files quite risky though memory use is 0, so is the most efficient. Super large files might want an Undo option to be disabled to prevent the larger extra memory needed.

donho added a commit to donho/notepad-plus-plus that referenced this issue Feb 6, 2024
@donho
Copy link
Member

donho commented Feb 6, 2024

@JeroenNienhuis
Could you test PR #14685 to see if it fixes your crash?

@chcg chcg added performance issue crash issue causing N++ to crash labels Feb 6, 2024
@JeroenNienhuis
Copy link
Author

where can I find the Windows build?

@alankilborn
Copy link
Contributor

where can I find the Windows build?

https://github.com/notepad-plus-plus/notepad-plus-plus/actions/runs/7804509442

@JeroenNienhuis
Copy link
Author

If I use the Win32 version without closing another program first, there is still an app crash (just like the current production version).
After freeing memory bij closing a program, my job (11M replacements) is finished in 1.5 minutes. Without saving the result and without closing N++ executing another replacement job on the same file gives as a result 11M replacements in just 15 secs. Checking the file shows that only the first 8% of the items were replaced.

The x64 behaves as expected. All items are replaces, also the second time. It is now much faster.
(next time I will install the x64 version instead of the default update)

@donho
Copy link
Member

donho commented Feb 8, 2024

@JeroenNienhuis Thank you for the confirmation.

@donho donho closed this as completed in 044296e Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crash issue causing N++ to crash performance issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants