Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java.lang.OutOfMemoryError: Java heap space #24

Closed
lbyzx123 opened this issue Jul 16, 2020 · 14 comments
Closed

java.lang.OutOfMemoryError: Java heap space #24

lbyzx123 opened this issue Jul 16, 2020 · 14 comments

Comments

@lbyzx123
Copy link

Maven dependency:

com.github.miachm.sods
SODS
1.2.2

Memory overflow when the memory is less than 280m. Stack info:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.github.miachm.sods.Range.getValues(Range.java:231)
at com.ams.module.transfer.util.ODSParserUtils.getHeaderFromSheet(ODSParserUtils.java:92)
at com.ams.module.transfer.util.ODSParserUtils.getDataFromSheet(ODSParserUtils.java:111)
at com.ams.module.transfer.util.ODSParserUtils.getDataFromSpreadsheet(ODSParserUtils.java:68)
at com.ams.module.transfer.util.ODSParserUtils.main(ODSParserUtils.java:145)

ODS file size is only 25KB, only 50 lines of data. Why does it take so much memory to read files.

@lbyzx123
Copy link
Author

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.github.miachm.sods.Sheet.checkLazyColumnLoading(Sheet.java:318)
at com.github.miachm.sods.Sheet.getCell(Sheet.java:297)
at com.github.miachm.sods.Range.iterateRange(Range.java:625)
at com.github.miachm.sods.Range.getValues(Range.java:232)
at com.ams.test.ODSParserHelper.getDataFromSheet(ODSParserHelper.java:58)
at com.ams.test.ODSParserHelper.getDataFromSpreadsheet(ODSParserHelper.java:85)
at com.ams.test.ODSParserHelper.main(ODSParserHelper.java:96)

@lbyzx123
Copy link
Author

My ODS file download link:https://github.com/lbyzx123/SODS/blob/master/alarm_config_rule_office.ods,edited by office 2010.

@lbyzx123
Copy link
Author

Memory overflow when VM arguments :-Xms270m -Xmx270m, it works well when VM arguments :-Xms280m -Xmx280m

@miachm
Copy link
Owner

miachm commented Jul 16, 2020

Why does it take so much memory to read files.

SODS stores the sheets in a matrix. That's the simplest way of store a sheet and it's pretty efficient since they are usually quite compact.

For some reason, Microsoft Excel declares ridiculous amount of rows/columns for his sheets. Since there is not an easy (and efficient) way to determine which it is the right size, it is trimmed to a considered safe value (columns are trimmed to 2000, rows allocations greater than 10,000 are trimmed to a more reasonable value).

This Spreadsheet might look quite simple, but SODS is storing three sheets of a size ~400x2000. There are like half million values (400x2000), plus the 400 rows properties, plus the 2000 column properties multiplied by the 3 sheets of the Spreadsheet. So his memory footprint can take up to 100-200 MB.

I know this can be an inconvenience. But there is not an easy way to solve this problem without increasing the loading time or changing the sheet's data architecture.

@TheCoder42
Copy link

TheCoder42 commented Jan 5, 2021

@miachm I have similar problems. I want to load this file. It is about 10 MBs large. (Note, the link is not stable, if it breaks the file can also be found here.) The file has four sheets. The first sheet combines the data from the last three sheets and has more than 35000 rows. Combining the rows of all four sheets this makes more than 70000 rows. The content.xml is about 400 MB large.

new SpreadSheet(new File("my path.ods"))

used in my tests 6,1 GB. This makes it hard to use it in production projects.

I have a few ideas how memory usage could be reduced:

  • When loading the spreadsheet allow to specify which sheet should be loaded. This would probably solve my issue, because instead of loading 70000 rows this could be cut down to about 17000 rows. I would then process the sheets one by one. For example new SpreadSheet(new File("my path.ods"), "sheetname")
  • I profiled heap usage and the Style class used 43 % of the used head. Followed by 36 % of instances of Cell. So reducing memory usage in those class would be most efficient.
    • in my use case I am even not interested in any styles. So for people like me who want to process large files a flag to disable reading styles would be nice.
    • Another possibility could be caching. Instead of instantiating a style for each cell, a style could be reused and when style for a cell is change a new style is created in cache only when there isn't a style like this present. (But then care must be taken, that the cache is freed when the style is not needed anymore.) I think this approach is most challenging.
    • Introduce a static "default" style in the Style class and let cells use this default style when there is not really a style used. I noticed that the api offers a clone() method. I don't know how frequent this method is used and if that defeats this idea.

Anyway, I am neither into the internals of your library nor into the ods file format. My problem is, that I don't see any alternative to using SODS. And actually I like your api. It is simple and straight forward.

Cheers

EDIT: I posted here even though I don't think that the cause here is a different one (= I am not sure, if that file was edited in excel)

@miachm
Copy link
Owner

miachm commented Jan 7, 2021

Thanks for your comment! I am in my mobile phone, so I have not been able to check the file yet.

Currently, SODS is storing a sheet as a matrix of Cells. Every cell contains a value and a Style. A Style contains many information (bold, underline, font size...).

This design is pretty simple and straightforward, but it looks really ineffective in memory usage. Specially if you have many rows of plain data, with not meaningful style associated.

A redesign to a new architecture is in the road map, but still I can not find a new one.

About the workarounds you are proposing:

  • That could help you with you use case. But it's a obscure feature... Also, it's not really a solution since your memory footprint would be 3GB now, better but still high
  • Loading only the values could decrease the loading time, but I don't think it's going to help you with the memory usage. Style classes are still instanciated, in case you might need to set the styles.
  • I liked it, still need some testing in order to know how much memory could be saved.
  • Catching the default style is a good idea as well. Although it can be implemented with the previous idea.

I have to think about it 🤔

Greetings,
Miguel

@leonhung
Copy link

leonhung commented Jun 30, 2021

hi miachm ,
i have similar problem "java.lang.OutOfMemoryError: GC overhead limit exceeded"...
when I call :
SpreadSheet spread = new SpreadSheet(new File("Base.ods"));

my ods file is here it about 10MB

please tell me what problem is my file ?
or just file too big to parse.
( I tried delete some sheet, it's work!)

Thank s for your kind cooperation !

@miachm
Copy link
Owner

miachm commented Jun 30, 2021

@leonhung The link doesn´t work (permissions).

Anyway, the ods file was generated by Microsoft Excel?

@leonhung
Copy link

leonhung commented Jul 1, 2021

@miachm
I'm sorry that I can't provide this file now,
but the file was created by Libreoffice 7.xx
if I can hide some sensititive data, I'll provide this file ,
thank you.

@miachm
Copy link
Owner

miachm commented Jul 4, 2021

It seems like caching the default style as singletone reduces signficantly the memory usage. For the first sheet, it reduces the memory usage by 40%.

Even though, the memory usage is too high for my taste.

miachm added a commit that referenced this issue Jul 4, 2021
Reduces the memory usage for issue #24
miachm added a commit that referenced this issue Jul 4, 2021
@miachm
Copy link
Owner

miachm commented Jul 4, 2021

@leonhung @lbyzx123 @TheCoder42 should take significantly less memory now (release v1.4.0). In the case of example:
https://www.gov.uk/government/publications/the-uk-sanctions-list

It reduces his memory footprint around 75%. Let me know if you find more edge cases.

@miachm miachm removed the help wanted label Jul 4, 2021
@leonhung
Copy link

leonhung commented Jul 7, 2021

@miachm
In my case it still throw exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.github.miachm.sods.Style.clone(Style.java:274)
at com.github.miachm.sods.Cell.setStyle(Cell.java:57)
at com.github.miachm.sods.Range.lambda$setStyle$31(Range.java:743)
at com.github.miachm.sods.Range$$Lambda$1/801197928.call(Unknown Source)
...
but it's different from previous edition 1.3 is ,1.4 will take about 10 minutes to read file and throw exception.
thanks for your revision !!!

@miachm
Copy link
Owner

miachm commented Jul 7, 2021 via email

@leonhung
Copy link

@miachm
file is here, please try it, thank you

@miachm miachm closed this as completed Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants