StringTreeBuilder would be improved if it didn't need to hold an entire copy of the file in memory #106

frizbog · 2016-07-02T15:14:51Z

The makeStringTreeFromFlatLines() method of StringTreeBuilder takes as a parameter a List containing all the lines of the file being read. This consumes a lot of memory unnecessarily (at least temporarily) while the file is parsed. If GedcomFileParser could also return a BufferedReader that yields one line of the file at a time (in addition to providing the full list of file lines as Strings), StringTreeBuilder could be modified to use that reader. Only one line at a time would then need to be in temporary memory instead of the entire file's contents, which would make a huge difference in heap consumption.

The parser can read a line at a time now and add it to the StringTree being built in StringTreeBuilder, and does not require an ArrayList<String> holding the entire contents of the file in memory (even temporarily).
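To illustrate the line-at-a-time idea, here is a minimal sketch in plain Java. It is not the gedcom4j implementation: `StreamingTreeSketch`, `Node`, and `build` are hypothetical names, and the input is assumed to be simple `level text` lines. The point is only that each line is attached to the tree as soon as it is read, so no list of all lines is ever held.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamingTreeSketch {

    /** Hypothetical stand-in for a tree node; NOT the gedcom4j StringTree class. */
    static class Node {
        final int level;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    /** Reads one line at a time and attaches it to the tree immediately. */
    static Node build(BufferedReader reader) {
        Node root = new Node(-1, "");
        Node last = root;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumed line shape: "<level> <rest of line>"
                int space = line.indexOf(' ');
                int level = Integer.parseInt(space < 0 ? line : line.substring(0, space));
                String text = space < 0 ? "" : line.substring(space + 1);

                // Walk up from the previous node until we find a shallower level.
                Node parent = last;
                while (parent.level >= level) {
                    parent = parent.parent;
                }
                Node node = new Node(level, text);
                node.parent = parent;
                parent.children.add(node);
                last = node;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        String sample = "0 HEAD\n1 SOUR demo\n0 @I1@ INDI\n1 NAME John /Doe/\n0 TRLR\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            Node root = build(reader);
            System.out.println(root.children.size()); // prints 3
        }
    }
}
```

Only the current line and the tree built so far are live at any point; the raw text of earlier lines can be garbage collected once it has been copied into a node.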

And now GedcomFileReader no longer offers an option to read the whole file into an ArrayList of strings - if you want that, build your own ArrayList.
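For anyone who still wants every line in memory, building your own list is a few lines of standard Java. This sketch uses only `java.io.BufferedReader`; the class and method names are hypothetical, and it makes no assumption about what GedcomFileReader exposes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadAllLinesSketch {

    /** Collects every line from the reader into a list (the old, memory-hungry behaviour). */
    static List<String> readAllLines(BufferedReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader("0 HEAD\n0 TRLR\n"))) {
            System.out.println(readAllLines(reader));
        }
    }
}
```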

frizbog · 2016-07-03T16:12:58Z

v2.3.1-SNAPSHOT at 2016-07-03T12:12:33-04:00 includes this code.

haralduna · 2016-07-04T12:07:45Z

Thank you, I tried your latest version 6d73abd and it works very well.
I am attaching two diagrams showing memory usage over time while reading my 8000-person file: one for the clean 2.3.0 and one for your latest 2.3.1-SNAPSHOT. The steeper curve is the parsing part. The last smooth, slowly increasing curve is my post-processing. The operation takes about 8 seconds longer on your latest version (I am not sure that is a big issue; avoiding peak allocation is more important).

frizbog · 2016-07-04T18:13:13Z

Harald - thanks so much for the confirmation and the excellent graphs!

haralduna · 2016-07-06T17:04:24Z

Found a little flaw in the parser rate handling. A patch is attached. (patch removed)

haralduna · 2016-07-07T08:20:50Z

Same patch with fixed test cases.
0001-fixed-parser-progress-rate.patch.txt

frizbog · 2016-07-08T12:23:48Z

Released in v2.3.1

frizbog added the enhancement label Jul 2, 2016

frizbog self-assigned this Jul 2, 2016

frizbog pushed a commit that referenced this issue Jul 2, 2016

Trying to further cut down on memory consumption. Opening issue #106.

44ce70d

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 - First chop at AnselReader

3dfa8f5

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 - First chop at AsciiReader

197638c

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 - checkpoint - clean compile, but tests don't pass yet

3a74a0c

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106: Fixing tests, code

e6ce913

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 - all but one test passes

d66c9e6

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106: All encoding specific readers read lines now, tests pass

587936a

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 StringTreeBuilder now has/uses a method to add a single line

2ed14d6

frizbog pushed a commit that referenced this issue Jul 3, 2016

Issue #106 - GedcomFileReader doesn't return ArrayLists of Strings

ddaff5a

And now GedcomFileReader no longer offers an option to read the whole file into an ArrayList of strings - if you want that, build your own ArrayList.

frizbog added the fixed_pending_release label Jul 3, 2016

frizbog mentioned this issue Jul 3, 2016

Added progress reporting for parsing/LinePieces #107

Closed

frizbog pushed a commit that referenced this issue Jul 7, 2016

Issue #106 - corrected notification rate handling

4367875

Thanks to Harald Undander for the patch

frizbog pushed a commit that referenced this issue Jul 7, 2016

Revert "Issue #106 - corrected notification rate handling"

868b7f0

This reverts commit 4367875.

frizbog pushed a commit that referenced this issue Jul 7, 2016

Issue #106 - correcting notification handling, under proper account

af4357e

frizbog pushed a commit that referenced this issue Jul 7, 2016

Issue #106 - corrected notification rate handling

953bf3b

Thanks to Harald Undander for the patch

frizbog pushed a commit that referenced this issue Jul 7, 2016

Revert "Issue #106 - corrected notification rate handling"

24e7a74

This reverts commit 4367875.

frizbog pushed a commit that referenced this issue Jul 7, 2016

Issue #106 - correcting notification handling, under proper account

5c9f89c

frizbog closed this as completed Jul 8, 2016

frizbog removed the fixed_pending_release label Jul 8, 2016

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Trying to further cut down on memory consumption. Opening issue frizb…

c309f22

…og#106.

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106 - First chop at AnselReader

702ba8c

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106 - First chop at AsciiReader

a12f836

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106 - checkpoint - clean compile, but tests don't pass yet

abaccde

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106: Fixing tests, code

6f67381

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106 - all but one test passes

c3a143b

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106: All encoding specific readers read lines now, test…

2356243

…s pass

haralduna pushed a commit to haralduna/gedcom4j that referenced this issue Jun 11, 2017

Issue frizbog#106 StringTreeBuilder now has/uses a method to add a si…

536a0cb

…ngle line
