Out of memory #106

Open
hershkoy opened this issue Jul 7, 2021 · 16 comments

Comments

@hershkoy

hershkoy commented Jul 7, 2021

Very weird error, happening only in the browser, not in the CLI.
Using v0.8.23.

[Wed Jul 07 14:25:39.585044 2021] [:error] [pid 2736:tid 2684] [client 172.18.15.20:57062] PHP Fatal error:  Out of memory (allocated 58982400) (tried to allocate 34821771 bytes) in .....\\vendor\\shuchkin\\simplexlsx\\src\\SimpleXLSX.php on line 537

I checked the PHP memory_limit; it is 128M for both PHP CLI and PHP in the browser.
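(A minimal check, in case it helps anyone else hitting this: the CLI and the web SAPI often read different php.ini files, so the effective limit can differ. A tiny script like the one below, standard PHP only and nothing SimpleXLSX-specific, shows the limit the current SAPI actually runs with.)

<?php
// Prints which SAPI is running and the memory_limit it actually uses.
echo php_sapi_name(), ': memory_limit = ', ini_get('memory_limit'), PHP_EOL;
// CLI check from a shell: php -r "echo ini_get('memory_limit'), PHP_EOL;"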

The computer has 16 GB of RAM, about 25% of which is in use.
I downloaded the xlsx, ran it on my computer, and it worked even in the browser. The xlsx file is 2.5 MB, has about 45,000 rows and only one sheet.

As far as I can understand, this is the line where it happens:

$entry_xml = preg_replace( '/xmlns[^=]*="[^"]*"/i', '', $entry_xml ); // remove namespaces

I printed the variable to a file right before the crash, and it produced a 17 MB file, all on one line, so the preg_replace above is operating on that huge string when it crashes.
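(For anyone wanting to reproduce the effect without the sensitive file: a rough sketch with synthetic data, not the real worksheet, showing that preg_replace() on a ~17 MB single-line string needs roughly another copy of that string, which can push a low limit over the edge once the rest of the parse is already in memory.)

<?php
// Build ~17 MB of XML-like text on one line, similar in size to the dumped variable.
$entry_xml = str_repeat('<row xmlns:x="urn:x"><c>1</c></row>', 500000);

echo 'peak before: ', memory_get_peak_usage(true), " bytes\n";

// The same namespace-stripping call as in SimpleXLSX.php; it allocates a second large string.
$entry_xml = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $entry_xml);

echo 'peak after: ', memory_get_peak_usage(true), " bytes\n";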

I don't know why it works on my computer but not on the other one, or why on that machine it works in the CLI but not in the browser.

Unfortunately the xlsx file contains sensitive info, so I can't share it, and right now I don't have a reproducible file to offer.

@shuchkin
Owner

shuchkin commented Jul 7, 2021

I've tested a "long string" vs. an "array" version of the XML cleanup code; there is a 3x memory overhead in the array version.

@hershkoy
Author

hershkoy commented Jul 8, 2021

For what I needed to do, I ended up asking the client to use a CSV file instead of XLSX, which solves my immediate problem, so I don't strictly need this fixed.
I just wonder what the cause of the issue is. On my computer, where it didn't crash, I watched Task Manager while the code was running and barely saw an increase in RAM usage. In addition, both my computer and the server have 16 GB of RAM, but my computer runs at over 90% RAM usage most of the time, while the server sits at 25%.
I think the issue is a bug in preg_replace, maybe something to do with the PHP version (7.2 on my computer, 5.6 on the server).

I don't know whether you want to debug it further. We can close this issue if you wish.

@nikolayganovski1

nikolayganovski1 commented Oct 16, 2021

Same problem here:
Fatal error: Out of memory (allocated 183738368) (tried to allocate 36137192 bytes) in [hidden]/vendor/shuchkin/simplexlsx/src/SimpleXLSX.php on line 538
The file is in xlsm format and is around 20 MB in size (4-5 sheets of data).
PHP 7.4, running with a 2 GB memory limit.

@shuchkin
Owner

Check your memory_limit; it is probably not 2 GB there.

@nikolayganovski1

@shuchkin it is correct; memory_limit in phpinfo() shows 2G.

@nikolayganovski1

Okay, after more testing I've found that with 39,000 rows it crashes, and when I cut 5,000 rows, so 34,000 rows, it works fine.
Is there some way to chunk-parse the rows or something?

@shuchkin
Owner

I think the garbage collector is not called after preg_replace.

try this version https://github.com/shuchkin/simplexlsx/blob/master/src/SimpleXLSX.php

@nikolayganovski1

nikolayganovski1 commented Oct 16, 2021

Nope, still the same errors:
Fatal error: Out of memory (allocated 110964736) (tried to allocate 20480 bytes) in [hidden]/src/SimpleXLSX.php on line 650
Fatal error: Out of memory (allocated 110964736) (tried to allocate 20480 bytes) in [hidden]/wp-includes/functions.php on line 4068
The second error is WP trying to fill up the $errors[] variable.
It looks like this line is where it breaks:
$rows[ $curR ][ $curC ] = $this->value( $c );
I also printed the peak memory usage (before my loop foreach ($xlsx->rows(0) as $value) { ):

not real: 205.25686645508 MiB
real: 187.19140625 MiB
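(Presumably the two figures above come from memory_get_peak_usage() with and without the real_usage flag; for reference, something like:)

echo 'not real: ', memory_get_peak_usage(false) / 1048576, " MiB\n";
echo 'real: ', memory_get_peak_usage(true) / 1048576, " MiB\n";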

@nikolayganovski1

@shuchkin any ideas?

@shuchkin
Owner

Try 1.0.12:
foreach( $xlsx->readRows() as $r ) {
//
}
readRows() & readRowsEx() return a Generator, to read huge files in iteration mode.
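(A slightly fuller sketch of that iteration mode, assuming the Composer autoloader and the namespaced Shuchkin\SimpleXLSX class of the 1.x releases; the file name big.xlsx is just a placeholder.)

<?php
require 'vendor/autoload.php';

use Shuchkin\SimpleXLSX;

if ( $xlsx = SimpleXLSX::parse('big.xlsx') ) {
    // readRows() yields one row at a time instead of building the full $rows array.
    foreach ( $xlsx->readRows() as $r ) {
        // process $r here, e.g. write it to CSV or a database
    }
} else {
    echo SimpleXLSX::parseError();
}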

@fordero21

Apologies for chiming in late, but I'm experiencing the same issue and have done for a while now. From what I've researched, the issue is how Excel determines the final row of the spreadsheet. Excel seems to add additional blank rows without logic or reason.

If you open the spreadsheet that fails and highlight the worksheet (CTRL+A or CMD+A), you'll notice that the select-all continues way past any data. Look at the scroll bar for reference as you scroll! If you open the file in BBEdit or similar, you can see thousands of empty rows in the content.

When you open the spreadsheet for processing, the code throws a fatal error at:

Line 608 - getEntryData - $entry['data'] = gzinflate($entry['data']);

Using the attached spreadsheet as an example, strlen($entry['data']) hits 18174506! So when you try to inflate the content, the server runs out of memory (128 MB). FYI, the spreadsheet file is clearly nowhere near 18.2 MB given the small amount of data in it.

The workaround I've used, which works 100% of the time, is to manually highlight just the data cells and copy them into a brand new spreadsheet.

The only way I can think of fixing this in code is to pre-parse the spreadsheet, reading in a binary stream of bytes and removing the empty rows before loading the file into memory...?

See: https://stackoverflow.com/questions/11265914/how-can-i-extract-or-uncompress-gzip-file-using-php

example-out-of-memory.xlsx
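(One way to detect that situation before SimpleXLSX inflates anything, sketched with PHP's built-in ZipArchive; an .xlsx is a ZIP archive, and the entry name and 20 MB threshold below are only illustrative.)

<?php
$zip = new ZipArchive();
if ( $zip->open('example-out-of-memory.xlsx') === true ) {
    // statName() reports the uncompressed size of the worksheet XML without inflating it.
    $stat = $zip->statName('xl/worksheets/sheet1.xml');
    $zip->close();

    if ( $stat !== false && $stat['size'] > 20 * 1024 * 1024 ) {
        exit('Worksheet XML inflates to ' . $stat['size'] . " bytes - probably trailing blank rows.\n");
    }
}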

@Vladimir1247

Vladimir1247 commented Jul 13, 2022

Hi! I have a similar problem on the server: "Allowed memory size of 134217728 bytes exhausted (tried to allocate 67510970 bytes) at vendor/shuchkin/simplexlsx/src/SimpleXLSX.php: 608". Locally everything is fine, but local memory is not limited.

@shuchkin
Owner

memory_limit

@Vladimir1247

Vladimir1247 commented Jul 14, 2022

It is a VDS/VPS; 128 MB is the entire memory.
[upd]
I was able to get an additional 128 MB of memory; as a result, another out-of-memory problem came up (screenshot attached).

@shuchkin
Owner

Try commenting out this line, but the results can be unpredictable.

@Vladimir1247

Vladimir1247 commented Jul 14, 2022

Of course, I already use readRows(), because the files can be very large (up to ~40 MB); all the results above are with it :/
[upd]
Is it possible to avoid parsing the whole workbook, and instead get the number of sheets and retrieve them separately?
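(A hedged sketch of that idea, assuming sheetNames() is available and that readRows() accepts an optional worksheet index the way rows() does; only one sheet's rows are then streamed at a time.)

<?php
require 'vendor/autoload.php';

use Shuchkin\SimpleXLSX;

if ( $xlsx = SimpleXLSX::parse('big.xlsx') ) {
    // List the sheets first, then stream each one separately.
    foreach ( $xlsx->sheetNames() as $index => $name ) {
        echo 'Sheet ', $index, ': ', $name, PHP_EOL;
        foreach ( $xlsx->readRows($index) as $row ) {
            // handle one row of this sheet
        }
    }
} else {
    echo SimpleXLSX::parseError();
}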
