Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringTreeBuilder would be improved if it didn't need to hold an entire copy of the file in memory #106

Closed
frizbog opened this issue Jul 2, 2016 · 6 comments
Assignees

Comments

@frizbog
Copy link
Owner

frizbog commented Jul 2, 2016

The makeStringTreeFromFlatLines() method of StringTreeBuilder take as a parameter a List containing all the lines of the file being read. This consumes a lot of memory unnecessarily (at least temporarily) while the file is parsed. If GedcomFileParser could support returning a BufferedReader object (in addition to getting the full list of file lines as Strings) that would return a line of the file at a time, StringTreeBuilder could be modified to use that reader and only a line at a time would need to be in temporary memory instead of the entire file's contents....which would make a huge difference in heap consumption, obviously.

@frizbog frizbog self-assigned this Jul 2, 2016
frizbog pushed a commit that referenced this issue Jul 3, 2016
frizbog pushed a commit that referenced this issue Jul 3, 2016
frizbog pushed a commit that referenced this issue Jul 3, 2016
frizbog pushed a commit that referenced this issue Jul 3, 2016
frizbog pushed a commit that referenced this issue Jul 3, 2016
The parser can read a line at a time now and add it to the StringTree
being built in StringTreeBuilder, and does not require an
ArrayList<String> holding the entire contents of the file in memory
(even temporarily).
frizbog pushed a commit that referenced this issue Jul 3, 2016
And now GedcomFileReader no longer offers an option to read the whole
file into an ArrayList of strings - if you want that, build your own
ArrayList.
@frizbog
Copy link
Owner Author

frizbog commented Jul 3, 2016

v2.3.1-SNAPSHOT at 2016-07-03T12:12:33-04:00 includes this code.

@haralduna
Copy link

haralduna commented Jul 4, 2016

Thank you, I tried your latest version 6d73abd and it works very well.
I am attaching two diagrams showing memory usage over time while reading my 8000 person file. One showing the clean 2.3.0 and one showing your latest 2.3.1-SNAPSHOT. The steeper curve is the parsing part. The last smooth slowly increasing curve is my post processing. The operation takes about 8 second longer on your latest version (I am not sure that is a big issue, avoiding peek allocation is more important.)

2 3 0-clean
2 3 1-snapshot-6d73abd

@frizbog
Copy link
Owner Author

frizbog commented Jul 4, 2016

Harald - thanks so much for the confirmation and the excellent graphs!

@haralduna
Copy link

haralduna commented Jul 6, 2016

Found a little flaw in the parser rate handling. A patch is attached. (patch removed)

@haralduna
Copy link

Same patch with fixed test cases.
0001-fixed-parser-progress-rate.patch.txt

frizbog pushed a commit that referenced this issue Jul 7, 2016
Thanks to Harald Undander for the patch
frizbog pushed a commit that referenced this issue Jul 7, 2016
frizbog pushed a commit that referenced this issue Jul 7, 2016
Thanks to Harald Undander for the patch
frizbog pushed a commit that referenced this issue Jul 7, 2016
@frizbog
Copy link
Owner Author

frizbog commented Jul 8, 2016

Released in v2.3.1

@frizbog frizbog closed this as completed Jul 8, 2016
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
The parser can read a line at a time now and add it to the StringTree
being built in StringTreeBuilder, and does not require an
ArrayList<String> holding the entire contents of the file in memory
(even temporarily).
haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017
…ings

And now GedcomFileReader no longer offers an option to read the whole
file into an ArrayList of strings - if you want that, build your own
ArrayList.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants