
Memory/performance #65

Open
mlooney opened this issue May 26, 2015 · 24 comments

@mlooney

mlooney commented May 26, 2015

Hi there,

I've been doing some benchmarking and am puzzled by the memory footprint I'm seeing. Do you expect MRI to use up 2.5+ GB of RAM to parse (not pp) a 9 MB SC832 doc, with run times well over 4 minutes?

Seems a little... excessive. Is there a trick I'm missing?

@kputnam
Owner

kputnam commented May 27, 2015

Hi McClain,

Yes that does seem strange. It has been quite a while since I've been involved with X12 and I won't have time to investigate, but one thing that comes to mind is perhaps you have more than one SC832 in your file.

If that is the case, I think there is a way to avoid building the entire tree at once. I think there is some sample code (examples/split.rb iirc) that might help with that.

Otherwise if all 9MB are for a single transaction set, we'd probably need to look at a profiler to see what's going on. It's been long enough that I've forgotten what input sizes were "normal" in my usage of the library.

-- Kyle


@mlooney
Author

mlooney commented Jun 2, 2015

It is indeed a document with about 20 832s containing about 5000 line items for a large manufacturer (all SKUs for an entire line), but it is actual data I will have to deal with some day.

Turning off GC brings the runtime down to a minute or so, but the heap usage clearly pays the price, so I'd hazard a guess that it's heap thrash generally.
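
For anyone wanting to reproduce the GC observation in isolation, here is a generic sketch that times the same allocation-heavy work with the collector on and off. The workload here is synthetic (string/array churn), not an actual stupidedi parse:

```ruby
require "benchmark"

# Time an allocation-heavy workload with GC optionally disabled.
# A synthetic stand-in for the parse described above, not stupidedi code.
def time_workload(disable_gc:)
  GC.start                      # start from a freshly collected heap
  GC.disable if disable_gc
  elapsed = Benchmark.realtime do
    1_000.times { Array.new(1_000) { |i| "item-#{i}" } }
  end
  GC.enable
  elapsed
end

with_gc    = time_workload(disable_gc: false)
without_gc = time_workload(disable_gc: true)
puts format("GC on: %.3fs, GC off: %.3fs", with_gc, without_gc)
```

With a heap-thrashing workload, the GC-off run typically comes out faster at the cost of a much larger resident heap, which matches the behavior reported above.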

Having dealt with other EDI parsing libs for a minute, it seems that parsing large docs into huge object graphs is a common technique, but it doesn't scale particularly well, even with objectively tiny inputs (9 MB?!)

Thanks for the pointer to split.rb, I'll take a look!

@johnslattery

Hi,

I'm seeing something I think is at least similar to what McClain is seeing. Parsing a nearly 5 MB 837 file eventually consumes over 800 MB of RAM. What's more concerning is what I see when I parse 90 or so 837s, varying in size from a few KB up to 4 or 5 MB, in one go. The RAM consumed grows fairly steadily to nearly 2 GB by the time all is said and done.

I took a look at notes/split.rb but don't think the approach is going to be helpful, as you must parse a file before you can split it, and parsing seems to be where all of the RAM is going.

Thanks.

John

@kputnam
Owner

kputnam commented Jun 24, 2016

Yeah, now that I look at split.rb, you are right. It parses the whole file before splitting it up! If either of you (@johnslattery or @mlooney) are able, please email me the file that's causing the problem so I can try to reproduce it.

I'm hoping there is an obvious memory leak that I can find with a profiler. One possible cause is creating new versions of sections of the parse tree as they are being updated; perhaps these are not being garbage collected. Otherwise, I think we might need a different API to enable incremental consumption, rather than build the entire parse tree; building that could take a while.

In the meantime, one chintzy hack you could try is to split large files up using regular expressions. You'll need the output to begin and end with the original header (ISA, GS, etc) and footer (GE, IEA) but you can then parse each separate ST ... SE section separately. Sorry, it sucks :-\
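
The regex hack above can be sketched roughly like this. This is a hypothetical helper, not part of stupidedi; it assumes `~` segment terminators, `*` element separators, and a standard ISA/GS ... GE/IEA envelope. Real interchanges may use other delimiters, which you would need to read from the ISA segment first:

```ruby
# Hypothetical sketch: split a multi-transaction X12 string into
# separately parseable chunks, one ST..SE set per chunk, each
# re-wrapped in the original ISA/GS header and GE/IEA footer.
def split_transaction_sets(x12)
  segments = x12.split("~").map(&:strip).reject(&:empty?)

  header = segments.take_while { |s| !s.start_with?("ST*") }
  footer = segments.drop_while { |s| !s.start_with?("GE*") }

  sets, current = [], nil
  segments.each do |s|
    if s.start_with?("ST*")
      current = [s]                       # begin a new transaction set
    elsif s.start_with?("SE*")
      sets << (current << s) if current   # close it out
      current = nil
    elsif current
      current << s
    end
  end

  # Each chunk duplicates the envelope so it parses on its own
  sets.map { |set| (header + set + footer).join("~") + "~" }
end
```

Note the duplicated envelopes mean segment counts in SE/GE/IEA will no longer match, so any strict envelope validation would need to be relaxed or the control totals rewritten.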

@johnslattery

Thanks for looking at this, Kyle. I'm not able to provide any files; all I
have are production 837s. Sorry.


@kputnam
Owner

kputnam commented Jul 6, 2016

Ah, I understand. You could replace identifying information with randomized text, but that might be a lot of work for a large file, and it might even take a bit of work to review which elements should be randomized.

I'll leave the issue open; if you or anyone has some test files at a later date, we can work on it!

@chrismv48

Hi, has anyone come up with a solution for this issue? The runtime for our process has ballooned to 30+ hours and we suspect this is the culprit, since we are processing a very large dataset. Is the only solution to break it up via regex as suggested by @kputnam?

@kputnam
Owner

kputnam commented Dec 8, 2018

While this issue still bugs me, personally the payoff is pretty low compared to the amount of work involved, so I'm not likely to work on this. However, if someone can provide diagnostic information it might pique my (or others') interest enough to get the ball rolling.

I've added a tool in bin/edi-obfuscate that should help anyone who wants to submit a large example that causes this issue. This script will obfuscate sensitive data by replacing any free-form text fields [AN], dates [DT], and times [TM], leaving only numbers [N and R] and identifiers [ID]. Note that "identifiers" aren't sensitive data; they're just the name the X12 committee gave to those short 2- or 3-letter qualifier codes.

@RomanKapitonov
Contributor

Well, I've played a bit with this one. I'm not 100% sure, but it seems IO is a bottleneck in this case. Consider the following script (nothing special, it just generates an 832 sample):

require "stringio"

base = <<-X12
ISA*00* *00* *14*004321519IBMP *ZZ*7777777777 *071011*0239*U*00400*000409939* *P*>~
GS*SC*004321519*7777777777*20071011*0239*409939*X*004010~
ST*832*0001~
BCT*PC*FISHER 10/10/07****CATALOG 0000001~
DTM*009*20071010*1632~
N1*ST*SPECIAL LABS*91*999999001~
N3*1000 PENNSYLVANIA AVE~
N4*PITTSBURGH*PA*15222~
X12

prod1 = <<-X12
LIN*000001*VN*1234321~
PID*F*08***LENS PAPER BOOK 12BK/PK~
PO4*12**EA~
CTP**CON*14.80*1*PK~
X12

prod2 = <<-X12
LIN*000002*VN*2345432~
PID*F*08***LENS PAPER BOOK 24BK/PK~
PO4*12**EA~
CTP**CON*20.80*1*PK~
X12

footer = <<-X12
CTT*2~
SE*16*0001~
GE*1*409939~
IEA*1*000409939~
X12

buffer = StringIO.new

buffer.write(base)
1000.times do
  buffer.write(prod1)
end
1000.times do
  buffer.write(prod2)
end

buffer.write(footer)

File.open('832.x12', 'w') do |f|
  f.write(buffer.string)
end

Here is a sample output from the profiler:

==================================
  Mode: cpu(1000)
  Samples: 2804 (0.21% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1631  (58.2%)        1631  (58.2%)     Stupidedi::Reader::FileInput#at
       693  (24.7%)         650  (23.2%)     Stupidedi::Reader::FileInput#drop
        96   (3.4%)          96   (3.4%)     Stupidedi::Schema::RepeatCount::Bounded#include?
        63   (2.2%)          63   (2.2%)     #<Module:0x00007fc45300ddf8>#cons
        43   (1.5%)          43   (1.5%)     Stupidedi::Reader::FileInput#initialize
        79   (2.8%)          42   (1.5%)     Kernel#require
        24   (0.9%)          24   (0.9%)     #<Module:0x00007fc45484be88>#blank?
        24   (0.9%)          24   (0.9%)     Stupidedi::Reader::Position#initialize
        24   (0.9%)          24   (0.9%)     Stupidedi::Values::SimpleElementVal#initialize
        18   (0.6%)          18   (0.6%)     Stupidedi::Builder::LoopState#initialize
        16   (0.6%)          16   (0.6%)     Stupidedi::Reader::Success#initialize
        16   (0.6%)          16   (0.6%)     #<Module:0x00007fc45484be88>#blank?
       181   (6.5%)          14   (0.5%)     Stupidedi::Builder::AbstractState.mksegment
        11   (0.4%)          11   (0.4%)     #<Module:0x00007fc45300ddf8>#cons
         9   (0.3%)           9   (0.3%)     #<Module:0x00007fc45484be88>#blank?
      1715  (61.2%)           9   (0.3%)     Stupidedi::Reader::TokenReader#read_elements
         9   (0.3%)           9   (0.3%)     Stupidedi::Builder::TableState#initialize
         8   (0.3%)           8   (0.3%)     Stupidedi::Reader::TokenReader#initialize
         7   (0.2%)           7   (0.2%)     Stupidedi::Zipper::Hole#initialize
         7   (0.2%)           6   (0.2%)     Cantor::AbsoluteSet.build
         6   (0.2%)           6   (0.2%)     Stupidedi::Builder::TransactionSetState#initialize
         6   (0.2%)           6   (0.2%)     Stupidedi::Values::LoopVal#initialize
       577  (20.6%)           6   (0.2%)     Stupidedi::Reader::TokenReader#read_delimiter
         6   (0.2%)           6   (0.2%)     Stupidedi::Zipper::EditedCursor#initialize
         4   (0.1%)           4   (0.1%)     Stupidedi::Reader::FileInput#defined_at?
         3   (0.1%)           3   (0.1%)     Stupidedi::Reader::SegmentTok#initialize
         3   (0.1%)           3   (0.1%)     MonitorMixin#mon_enter
       218   (7.8%)           3   (0.1%)     Stupidedi::Reader::TokenReader#read_segment_id
         5   (0.2%)           3   (0.1%)     Stupidedi::Builder::InstructionTable::NonEmpty#matches
         3   (0.1%)           3   (0.1%)     Stupidedi::Schema::SimpleElementUse#simple?

If we take a look at what #at is actually doing:

      # (see AbstractInput#at)
      # @return [String]
      def at(n)
        raise ArgumentError, "n must be positive" unless n >= 0

        @io.seek(@offset + n)
        @io.read(1)
      end

It seeks to @offset + n, then reads a single byte. Please note I'm using an SSD, so random IO is much faster than on an HDD, yet seek is still an expensive operation. Those are just basic thoughts, but they might be a good starting point to follow. Not sure where the memory consumption comes from, though.
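
To get a feel for the cost, here is a small standalone comparison (not stupidedi code) between the seek-and-read-one-byte pattern used by FileInput#at and plain String indexing over data that is already in memory:

```ruby
require "benchmark"
require "tempfile"

# Compare reading one byte at a time via seek + read against
# indexing a String held in memory. Purely illustrative.
data = "A" * 1_000_000

file = Tempfile.new("bench")
file.write(data)
file.flush

io = File.open(file.path)

Benchmark.bm(12) do |b|
  b.report("seek+read") do
    100_000.times { |n| io.seek(n); io.read(1) }   # syscall-heavy path
  end
  b.report("string[n]") do
    100_000.times { |n| data[n] }                  # pure in-memory path
  end
end

io.close
```

Each seek+read pair crosses into the IO layer, so even on an SSD the in-memory indexing wins by a wide margin, which lines up with the profile above.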

@RomanKapitonov
Contributor

RomanKapitonov commented Jan 14, 2019

I've also compared FileIO vs StringIO (I had to do some tweaks to the input class), which obviously gives a boost, being memory-backed rather than file-backed (I believe HDD vs memory would show an even bigger difference; unfortunately I do not have an HDD-based computer on hand to prove that):

➜  stupidedi git:(master) ✗ bin/edi-pp 832.x12
8012 segments
1.674 seconds # StringIO
➜  stupidedi git:(master) ✗ bin/edi-pp 832.x12
8012 segments
3.318 seconds # FileIO

The code profile differs as well. StringIO:

==================================
  Mode: cpu(1000)
  Samples: 1646 (0.24% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       491  (29.8%)         491  (29.8%)     Stupidedi::Reader::FileInput#initialize
       150   (9.1%)         150   (9.1%)     Stupidedi::Schema::RepeatCount::Bounded#include?
        93   (5.7%)          93   (5.7%)     #<Module:0x00007fd9401c5b80>#cons
        87   (5.3%)          87   (5.3%)     Stupidedi::Reader::Position#initialize
        69   (4.2%)          69   (4.2%)     Stupidedi::Reader::Success#initialize
       248  (15.1%)          57   (3.5%)     Stupidedi::Reader::TokenReader#read_delimiter
        56   (3.4%)          56   (3.4%)     #<Module:0x00007fd9401c5b80>#cons
        43   (2.6%)          43   (2.6%)     #<Module:0x00007fd94017cc28>#blank?
        79   (4.8%)          41   (2.5%)     Kernel#require
        41   (2.5%)          41   (2.5%)     Stupidedi::Reader::FileInput#defined_at?
        37   (2.2%)          37   (2.2%)     Stupidedi::Builder::LoopState#initialize
        35   (2.1%)          35   (2.1%)     Stupidedi::Values::SimpleElementVal#initialize
       801  (48.7%)          34   (2.1%)     Stupidedi::Reader::TokenReader#read_elements
        28   (1.7%)          28   (1.7%)     #<Module:0x00007fd940197be0>#tail
       268  (16.3%)          26   (1.6%)     Stupidedi::Builder::AbstractState.mksegment
        25   (1.5%)          25   (1.5%)     Stupidedi::Reader::TokenReader#initialize
        95   (5.8%)          23   (1.4%)     Stupidedi::Reader::TokenReader#read_segment_id
        22   (1.3%)          22   (1.3%)     Stupidedi::Builder::TransactionSetState#initialize
        20   (1.2%)          20   (1.2%)     #<StringIO:0x00007fd941259c58>.path
        19   (1.2%)          19   (1.2%)     #<Module:0x00007fd94017cc28>#blank?
        19   (1.2%)          19   (1.2%)     Stupidedi::Reader::FileInput#at
      1079  (65.6%)          19   (1.2%)     Stupidedi::Reader::TokenReader#read_segment
       509  (30.9%)          18   (1.1%)     Stupidedi::Reader::FileInput#drop
        18   (1.1%)          18   (1.1%)     #<Module:0x00007fd94017cc28>#blank?
        16   (1.0%)          15   (0.9%)     Stupidedi::Builder::InstructionTable::NonEmpty#matches
        12   (0.7%)          12   (0.7%)     Stupidedi::Reader::SimpleElementTok#initialize
        11   (0.7%)          11   (0.7%)     Stupidedi::Zipper::Hole#initialize
        11   (0.7%)          11   (0.7%)     Stupidedi::Reader::SegmentDict::Constants#defined_at?
         9   (0.5%)           9   (0.5%)     Stupidedi::Builder::TableState#initialize
         8   (0.5%)           8   (0.5%)     Stupidedi::Zipper::EditedCursor#initialize

Versus FileIO:

==================================
  Mode: cpu(1000)
  Samples: 2763 (0.14% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      1630  (59.0%)        1630  (59.0%)     Stupidedi::Reader::FileInput#at
       666  (24.1%)         637  (23.1%)     Stupidedi::Reader::FileInput#drop
        87   (3.1%)          87   (3.1%)     Stupidedi::Schema::RepeatCount::Bounded#include?
        62   (2.2%)          62   (2.2%)     #<Module:0x00007ffaa323d4d0>#cons
        78   (2.8%)          50   (1.8%)     Kernel#require
        29   (1.0%)          29   (1.0%)     Stupidedi::Reader::FileInput#initialize
        22   (0.8%)          22   (0.8%)     Stupidedi::Reader::Position#initialize
        20   (0.7%)          20   (0.7%)     #<Module:0x00007ffaa31f4938>#blank?
        20   (0.7%)          20   (0.7%)     Stupidedi::Builder::LoopState#initialize
        18   (0.7%)          18   (0.7%)     Stupidedi::Values::SimpleElementVal#initialize
        17   (0.6%)          17   (0.6%)     Stupidedi::Reader::Success#initialize
       174   (6.3%)          16   (0.6%)     Stupidedi::Builder::AbstractState.mksegment
        15   (0.5%)          15   (0.5%)     #<Module:0x00007ffaa31f4938>#blank?
        14   (0.5%)          14   (0.5%)     #<Module:0x00007ffaa31f4938>#blank?
         9   (0.3%)           9   (0.3%)     Stupidedi::Builder::TransactionSetState#initialize
         7   (0.3%)           7   (0.3%)     Stupidedi::Zipper::EditedCursor#initialize
      1684  (60.9%)           6   (0.2%)     Stupidedi::Reader::TokenReader#read_elements
         6   (0.2%)           6   (0.2%)     Stupidedi::Reader::TokenReader#initialize
         6   (0.2%)           6   (0.2%)     Stupidedi::Builder::TableState#initialize
         6   (0.2%)           6   (0.2%)     Stupidedi::Reader::FileInput#defined_at?
         6   (0.2%)           6   (0.2%)     Stupidedi::Zipper::Hole#initialize
       598  (21.6%)           6   (0.2%)     Stupidedi::Reader::TokenReader#read_delimiter
         6   (0.2%)           6   (0.2%)     #<Module:0x00007ffaa323d4d0>#cons
         5   (0.2%)           4   (0.1%)     Stupidedi::Builder::InstructionTable::NonEmpty#matches
         4   (0.1%)           4   (0.1%)     #<Module:0x00007ffaa320fd00>#tail
         8   (0.3%)           4   (0.1%)     Stupidedi::Builder::AbstractState.mkcomposite
         4   (0.1%)           4   (0.1%)     Stupidedi::Reader::SimpleElementTok#initialize
         4   (0.1%)           4   (0.1%)     Cantor::AbsoluteSet.build
         4   (0.1%)           4   (0.1%)     Stupidedi::Reader::SegmentTok#initialize
       371  (13.4%)           3   (0.1%)     Stupidedi::Builder::Generation#insert

The output above is from https://github.com/tmm1/stackprof

@irobayna
Collaborator

@RomanKapitonov Thank you! Very good info.

@kputnam What do you think?

@kputnam
Owner

kputnam commented Jan 14, 2019

@RomanKapitonov thanks for that! I knew reading from an IO would be slower, but I hadn't compared in a long time and didn't realize it was that big of a difference. Also, thanks for the tip about stackprof... looks like that will come in handy. Sometime in the future I would like to review the tokenizer (Reader), because I think there is some low-hanging fruit for speed gains.

The bigger concern for me is the memory usage, though. I've heard reports that stupidedi (possibly in combination with other in-process things) can take multiple gigabytes of memory to read fairly small files (<< 1GB). I know of a couple places that use excess memory, but without a memory profile I would be poking in the dark trying to fix it.

@RomanKapitonov
Contributor

RomanKapitonov commented Jan 14, 2019

@kputnam, @irobayna for the case of 5000 copies of prod1 and prod2 from the script above, I get the following results (40012 segments). The amount of memory used is pretty huge.

Used tool: https://github.com/SamSaffron/memory_profiler
Ruby version: ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin15]

GC is disabled, here are some observations (sizes are in bytes):

Total allocated: 1762444159 bytes (15308349 objects)
Total retained:  485760653 bytes (1536745 objects)

allocated memory by gem
-----------------------------------
1752831889  stupidedi/lib
   5123694  rubygems
   2640704  bigdecimal
   1847872  cantor-1.2.1

allocated memory by file
-----------------------------------
 827706808  /Users/rkapitonov/repos/ruby/stupidedi/lib/ruby/object.rb
 320829344  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/input/file_input.rb
 129639176  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/token_reader.rb
  84052344  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/schema/element_use.rb
  51215200  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/result.rb
  48970504  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/builder/states/abstract_state.rb
  38412800  /Users/rkapitonov/repos/ruby/stupidedi/lib/ruby/blank.rb
  34167800  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/zipper/edited_cursor.rb
  32333752  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/either.rb
  30250400  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/position.rb
  23603984  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/values/simple_element_val.rb
  18405424  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/builder/generation.rb
  14347072  /Users/rkapitonov/repos/ruby/stupidedi/lib/ruby/array.rb
  13629566  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/versions/functional_groups/004010/element_types/identifier_val.rb
  12005632  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/tokens/simple_element_tok.rb
  10402184  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/builder/states/loop_state.rb
  10002320  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/zipper/abstract_cursor.rb
...

Or when grouped by file location:

allocated memory by location
-----------------------------------
 825685576  /Users/rkapitonov/repos/ruby/stupidedi/lib/ruby/object.rb:18
 119075328  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/input/file_input.rb:105
  81608760  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/schema/element_use.rb:83
  49614800  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/input/file_input.rb:21
  49614720  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/input/file_input.rb:149
  47216960  /Users/rkapitonov/repos/ruby/stupidedi/lib/stupidedi/reader/input/file_input.rb:64
...

The top consumer seems to be: https://github.com/irobayna/stupidedi/blob/f3ca1a27b6cc5e25038086c18374076a40a4fbdd/lib/ruby/object.rb#L17-L19

Also, a huge number of strings are retained. I believe this is because they are allocated during the parse phase. A complete profile is available through this gist: https://gist.github.com/RomanKapitonov/3c0399fc7989867bac184084ca9292e4

The results are obtained by running

bin/edi-pp 832.x12

With the following section wrapped in the profiler:

    report = MemoryProfiler.report do
      parser, _result = parser.read(Stupidedi::Reader.build(io))
    end

    report.pretty_print

@irobayna
Collaborator

@RomanKapitonov Thanks Roman, a mountain of very valuable data to look at. Definitely room for improvement.

@RomanKapitonov
Contributor

RomanKapitonov commented Jan 15, 2019

@irobayna @kputnam I've had a little time today to take a closer look at this. Based on the observations from the memory profiler, I've tried switching from the "immutability first" approach to mutating the objects in one case.

I've switched to the following definitions for Object#cons and Object#snoc:

      def cons(array = [])
        array.unshift(self)
      end

      def snoc(array = [])
        array.push(self)
      end

Consider the following output from two test runs. Original Object#cons:

➜  stupidedi git:(master) ✗ ruby 832.rb && bin/edi-pp 832.x12
{:T_ARRAY=>"1038.81 Mb", :TOTAL=>"1761.57 Mb"}
40012 segments
17.333 seconds

Mutating Object#cons:

➜  stupidedi git:(master) ✗ ruby 832.rb && bin/edi-pp 832.x12
{:T_ARRAY=>"257.14 Mb", :TOTAL=>"979.71 Mb"}
80024 segments
15.518 seconds

The reported segment count breaks for some reason, but I've checked whether the generated tree is the same, and it seems to be. So disregard the doubled segment count in the second case; it is still 40012 segments. The interesting part is that memory usage for T_ARRAY is reduced four-fold.

According to the docs, Array#+ allocates a new array each time, while Array#unshift mutates in place.
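
The allocation difference is easy to demonstrate directly with ObjectSpace. Exact counts vary by Ruby version, but the copying version allocates roughly two arrays per iteration while the mutating one reuses its receiver:

```ruby
# Count T_ARRAY heap slots allocated inside a block. GC is disabled
# during the measurement so counts only grow.
def count_array_allocations
  GC.disable
  before = ObjectSpace.count_objects[:T_ARRAY]
  yield
  after = ObjectSpace.count_objects[:T_ARRAY]
  GC.enable
  after - before
end

copying = count_array_allocations do
  xs = []
  1_000.times { xs = [0] + xs }   # allocates [0] and the sum each time
end

mutating = count_array_allocations do
  xs = []
  1_000.times { xs.unshift(0) }   # one array, grown in place
end

puts "copying:  #{copying} T_ARRAY slots"
puts "mutating: #{mutating} T_ARRAY slots"
```

This is the same effect the T_ARRAY numbers above show at the scale of a full parse.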

InputFile#copy is yet another candidate, but I didn't manage to switch away from creating a new object on each FileInput#drop yet.

@irobayna
Collaborator

irobayna commented Jan 15, 2019

Interesting findings, in particular the memory usage. This could be a case where speed becomes secondary for the sake of keeping memory low.

BTW, I thought unshift was supposed to be way faster (at least on Ruby 2.2).

[image: fast-ruby benchmark comparing Array#unshift against alternatives]

Source: https://github.com/JuanitoFatas/fast-ruby

@kputnam
Owner

kputnam commented Jan 15, 2019

Hey @RomanKapitonov, thanks so much. This is really helpful! I will start poking at this next week, once I've finished some other things I have started (#164). Don't spend too much time working out how to optimize the tokenizer (FileInput), because I have some plans to rework that stuff soon.

@kputnam
Owner

kputnam commented Jan 16, 2019

@RomanKapitonov, can you paste the code you used to generate this output?

➜  stupidedi git:(master) ✗ ruby 832.rb && bin/edi-pp 832.x12
{:T_ARRAY=>"257.14 Mb", :TOTAL=>"979.71 Mb"}
80024 segments
15.518 seconds

@kputnam
Owner

kputnam commented Jan 18, 2019

Looks like we'll be changing the implementation of Object#cons and Object#snoc. Before pushing these changes, I'm going to wait until other bigger changes have been released. Then I'll try to write a larger benchmark suite that will hopefully catch any performance regressions in the library over time.

But in the meantime, here is some benchmark data comparing 4 implementations of cons and 3 implementations of snoc: https://gist.github.com/kputnam/015f6287dea48471baea9f419bbe274d.

The [{:data=>"Array", :count=>60000000}] lines display what I believe is how much memory is allocated to Array objects. I'm not sure what the results mean for ruby 2.5 and 2.6, since it doesn't show any allocations for [0, *xs] -- perhaps there are none, which might explain the speed increase.
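
For readers who don't follow the gist link, the variants under comparison presumably look something like the following. These are hypothetical reconstructions, not the gist's exact code; only the last one mutates its argument:

```ruby
require "benchmark"

# Four plausible cons implementations; lambdas stand in for Object#cons.
CONS_VARIANTS = {
  "Array#+"     => ->(x, xs) { [x] + xs },
  "splat"       => ->(x, xs) { [x, *xs] },
  "dup+unshift" => ->(x, xs) { xs.dup.unshift(x) },
  "unshift!"    => ->(x, xs) { xs.unshift(x) },   # mutates xs
}

xs = (1..100).to_a
Benchmark.bm(12) do |b|
  CONS_VARIANTS.each do |name, cons|
    # dup inside the loop keeps the mutating variant from growing xs
    b.report(name) { 100_000.times { cons.call(0, xs.dup) } }
  end
end
```

All four produce the same list; they differ only in how many intermediate arrays get allocated per call.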

Thanks @RomanKapitonov for pointing me in that direction!

@kputnam
Owner

kputnam commented Jan 18, 2019

Also, note the bottom line of each file shows a huge increase in memory consumption on ruby 2.4... I'm not sure if that's a general issue, or something specific to my benchmark? I'm guessing it's changes internal to the ruby VM.

ruby-2.2.10: Process: 177 MiB
ruby-2.3.8:  Process: 229 MiB
ruby-2.4.5:  Process: 916 MiB
ruby-2.5.3:  Process: 352 MiB
ruby-2.6.0:  Process: 388 MiB

@RomanKapitonov
Contributor

RomanKapitonov commented Jan 18, 2019

@kputnam Do you still need the code sample for generating my output? I assume not, since you've crafted a nice benchmark covering more cases. But let me know just in case; mine is basically one of the cases covered by yours.

@dobrynin

dobrynin commented Apr 3, 2019

Hello, I've run into a performance issue with stupidedi. I'm not sure if it relates to what has been discussed here previously. For now I'll likely pursue splitting the file prior to parsing, but I am confused about what is causing the slowness.

I've got a file with 841 PO850s (purchase orders). The file has around 90,000 lines, and I find that if I remove the last ~10,000, then stupidedi is able to parse the remaining 80,000 lines very quickly (~1-2 seconds). If I parse just the last ~10,000 lines, parsing is slow but reasonable (~5-10 seconds). All 90,000 lines taken together, however, take a very long time (maybe an hour? I haven't measured).

I used profile.rb in an attempt to find out what was going on for the last ~10,000 lines, and I think it also shows that cons is the most likely suspect.
results.txt

Here is the schema I've defined. I'm not sure if there is some ambiguity that may be slowing down the parser.

        d::TransactionSetDef.build('PO', '850',
          'Purchase Order',
          d::TableDef.header('Table 1 - Header',
            s:: ST.use(10, r::Mandatory, d::RepeatCount.bounded(1)),
            s::BEG.use(20, r::Mandatory, d::RepeatCount.bounded(1)),
            s::REF.use(50, r::Optional,  d::RepeatCount.unbounded),
            s::PER.use(60, r::Optional,  d::RepeatCount.bounded(3)),
            s::FOB.use(80, r::Optional,  d::RepeatCount.unbounded),
            s::ITD.use(130, r::Optional, d::RepeatCount.unbounded),
            s::DTM.use(150, r::Optional, d::RepeatCount.bounded(10)),
            s::TD5.use(240, r::Optional, d::RepeatCount.bounded(12)),
            s::PCT.use(276, r::Optional, d::RepeatCount.unbounded),
            d::LoopDef.build('N9', d::RepeatCount.bounded(1000),
              s::N9.use(295, r::Optional, d::RepeatCount.bounded(1)),
              s::MSG.use(300, r::Optional, d::RepeatCount.bounded(1000))),
            d::LoopDef.build('N1', d::RepeatCount.bounded(200),
              s::N1.use(310, r::Optional,  d::RepeatCount.bounded(1)),
              s::N4.use(340, r::Optional,  d::RepeatCount.unbounded))),
          d::TableDef.detail('Table 2 - Detail',
            d::LoopDef.build('PO1', d::RepeatCount.bounded(100_000),
              s::PO1.use(10, r::Mandatory, d::RepeatCount.bounded(1)),
              d::LoopDef.build('PID', d::RepeatCount.bounded(1000),
                s::PID.use(50, r::Optional, d::RepeatCount.bounded(1))),
              d::LoopDef.build('PO4', d::RepeatCount.bounded(1),
                s::PO4.use(90, r::Optional, d::RepeatCount.unbounded)),
              d::LoopDef.build('SAC', d::RepeatCount.bounded(25),
                s::SAC.use(130, r::Optional, d::RepeatCount.bounded(1))),
              d::LoopDef.build('SDQ', d::RepeatCount.bounded(1),
                s::SDQ.use(190, r::Optional, d::RepeatCount.bounded(500))),
              d::LoopDef.build('TC2', d::RepeatCount.bounded(1),
                s::TC2.use(235, r::Optional, d::RepeatCount.unbounded)),
              d::LoopDef.build('SLN', d::RepeatCount.bounded(1000),
                s::SLN.use(470, r::Optional, d::RepeatCount.bounded(1)),
                s::PID.use(490, r::Optional, d::RepeatCount.bounded(1000)),
                s::CTP.use(516, r::Optional, d::RepeatCount.bounded(25))))),
          d::TableDef.summary('Table 3 - Summary',
            d::LoopDef.build('CTT', d::RepeatCount.bounded(1),
              s::CTT.use(10, r::Optional, d::RepeatCount.bounded(1)),
              s:: SE.use(30, r::Mandatory, d::RepeatCount.bounded(1)))))

Would appreciate any insight, thanks!

@kputnam
Owner

kputnam commented Apr 4, 2019

@dobrynin thanks for reporting! I suspect it's a huge garbage collection taking place, and I'm betting most performance issues with stupidedi are caused by excessive memory consumption that causes long garbage collections.

You could get a better sense of whether this is happening by monitoring the memory usage with some kind of process monitor, watching the consumption increase until there's a long pause at 100% CPU, after which memory drops suddenly. That's what I saw happening on synthetically generated large files.

Would you be willing to run your file through bin/edi-obfuscate and send it to me? The obfuscator will remove any strings, dates, numbers, times, and only leave segment names and qualifier elements that are required for parsing. I'm also curious if you have the same results parsing the obfuscated file.

I'm (slowly) working on a benchmark suite to keep track of performance over time, and to make sure once this is fixed that it doesn't regress. But having real-world examples like yours will really help make sure any performance measurements are tied to reality.

@dobrynin

dobrynin commented Apr 4, 2019

@kputnam thanks for the quick response. I can get you those things, but I've got a couple other things on my plate today; I'll try to get to it soon.

One quick comment about the file size. I am somewhat skeptical that the file size itself is responsible, as we've had successful runs with files that are substantially larger than the one causing problems (I think the problematic one is ~1 MB while the biggest we've done is ~7 MB, will have to check). So that leads me to believe that there is something about the content/structure of this particular PO850 message that is causing the slowness.

casconed added a commit to origami-medical/stupidedi that referenced this issue Oct 8, 2023