Large increase in memory usage when upgrading to Turnip 4.4.0 #251

addersuk · 2022-11-09T16:42:47Z

In a private repo, we have seen a massive increase in the memory usage of our app, which uses Turnip, since we upgraded to Turnip 4.4.0.

Before the upgrade, running two Ruby threads on our CI server:

Tasks: 144 total,   3 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 98.5 us,  1.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3924.0 total,    139.2 free,   1747.1 used,   2037.7 buff/cache
MiB Swap:   4340.0 total,   4336.0 free,      4.0 used.   1893.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   4788 **+  20   0  578064 541888  13236 R  99.0  13.5   0:52.55 bundle
   4787 **+  20   0  783668 747956  13192 R  98.3  18.6   0:52.12 bundle

After the upgrade to 4.4.0, with two threads:

top - 16:39:42 up 36 min,  1 user,  load average: 2.87, 2.86, 2.41
Tasks: 133 total,   6 running, 127 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us, 22.5 sy,  0.0 ni, 12.4 id, 48.9 wa,  0.0 hi,  4.4 si,  6.1 st
MiB Mem :   3924.0 total,    108.0 free,   3730.1 used,     85.9 buff/cache
MiB Swap:   4340.0 total,     47.3 free,   4292.7 used.     33.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6232 **+  20   0 3741972   1.7g   3132 R  18.7  43.7  11:20.64 bundle
   6231 **+  20   0 3914640   1.7g   1136 R  15.7  44.4  11:34.07 bundle

I'll try to create a test case to illustrate this, as it is happening in a private repo that makes extensive use of Gherkin feature files.

We suspect this is due to the move to cuke_modeler.

leoc · 2022-11-10T08:44:31Z

Thanks for the report! I am looking forward to a test case that helps reproduce the issue by comparing 4.3 to 4.4.
I just had a brief look at cuke_modeler and it seemed like a good abstraction layer.

In any case, if I read the comparison correctly, memory usage increased by a factor of ~3.
But the CPU usage is much lower. Could that be right? Is some kind of trade-off happening here?
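
As a rough check of that factor, the RES columns from the two top snapshots above can be compared directly. This is only a sketch: it assumes top's plain RES numbers are KiB and that "1.7g" means 1.7 GiB, and the variable names are made up for illustration. Resident size likely understates the growth, since swap use went from 4.0 MiB to 4292.7 MiB between the two snapshots.

```ruby
# Assumption: top reports RES in KiB unless a suffix such as "g" (GiB) is shown.
KIB_PER_GIB = 1024 * 1024

# RES of the two bundle processes before the upgrade (from the first snapshot).
before_kib = [541_888, 747_956]

# RES of the two bundle processes after the upgrade ("1.7g" each).
after_kib = [1.7 * KIB_PER_GIB, 1.7 * KIB_PER_GIB]

# Ratio of combined resident memory, after versus before.
ratio = after_kib.sum / before_kib.sum

puts format('combined RES grew by a factor of about %.1f', ratio)
```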

@enkessler As you created cuke_modeler and #244, do you have an idea what may be happening here?

enkessler · 2022-11-10T18:03:14Z

...a massive increase in memory usage...extensive usage of Gherkin feature files...

Yeah, that's probably due to cuke_modeler. When CukeModeler makes a model tree, you get everything: models for directories, their feature files, the comments within those files, any random @wip tags lying around on tests, all of it. CukeModeler doesn't know what the models are needed for, so it isn't going to pick and choose what information to keep or leave out. All of those model objects hang around in memory until Ruby decides that they can be cleaned up, the same as any other object. If it turns out that a directory containing 10,000 tests was modeled in order to find the one test that Turnip actually wants to run, or something like that, then yeah, that's 10,000+ objects in memory until either the process ends or the references to that model tree get thrown away.

So I'm guessing that it's a fixable problem. We just have to have a look at how long the Turnip internals need to hang on to the objects or maybe have it prune off parts of the tree that it knows that it doesn't need.

enkessler · 2022-11-10T21:22:33Z

Additionally, cucumber-gherkin provides a lot more information than cuke_modeler bothers to model. So as not to be a limiting factor, all of that underlying, non-abstracted data is still made available to the user via <some_model_class>#parsing_data. For example, before CukeModeler had models for the comments in feature files, a user could still grab the parsed data contained in the feature model and find the comments in there.

Because of all that, the CukeModeler models are 'bigger' than the Gherkin objects that they replaced. But we could, at the very least, throw out that extra data after the models are created inside of the Turnip code.
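
To give a feel for what dropping that extra data could save, here is a self-contained sketch. It does not use CukeModeler at all: FakeModel, each_model, raw_size, and total_raw_size are hypothetical stand-ins, and the only idea taken from the discussion is that each model carries raw parser output under parsing_data, which can be set to nil once the model exists. The byte counts come from ObjectSpace.memsize_of and are approximate.

```ruby
require 'objspace'

# Hypothetical stand-in for a model: a name, child models, and the raw
# parser output that the model was built from.
FakeModel = Struct.new(:name, :children, :parsing_data) do
  # Yield this model and every model beneath it, depth first.
  def each_model(&block)
    block.call(self)
    children.each { |child| child.each_model(&block) }
  end
end

# Approximate size in bytes of a nested Hash/Array/String structure.
def raw_size(value)
  case value
  when Hash
    ObjectSpace.memsize_of(value) + value.sum { |key, val| raw_size(key) + raw_size(val) }
  when Array
    ObjectSpace.memsize_of(value) + value.sum { |item| raw_size(item) }
  else
    ObjectSpace.memsize_of(value)
  end
end

# Sum the raw data held by every model in the tree.
def total_raw_size(root)
  total = 0
  root.each_model { |model| total += raw_size(model.parsing_data) }
  total
end

# Build a fake feature with 1,000 scenarios, each holding some raw text.
scenarios = Array.new(1_000) do |i|
  FakeModel.new("Scenario #{i}", [], { text: 'Given a step ' * 50, line: i })
end
feature = FakeModel.new('Feature', scenarios, { text: 'Feature: demo ' * 50 })

model_count = 0
feature.each_model { model_count += 1 }

before = total_raw_size(feature)
# Drop the raw data but keep the models themselves.
feature.each_model { |model| model.parsing_data = nil }
after = total_raw_size(feature)

puts "#{model_count} models, raw data before: #{before} bytes, after: #{after} bytes"
```

The models stay usable after the raw data is dropped; only the part nothing reads any more becomes eligible for garbage collection.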

enkessler · 2022-11-11T17:53:45Z

@addersuk A test or sample project that reproduces the issue would be helpful but, in the meantime, stick this monkey patch in somewhere and see if it affects the numbers.

module Turnip
  class Builder
    def self.build(feature_file)
      feature_file = CukeModeler::FeatureFile.new(feature_file)

      # We don't need any of the raw data from Gherkin, so save memory by getting rid of it.
      feature_file.each do |model|
        model.parsing_data = nil
      end

      return nil unless feature_file.feature
      Node::Feature.new(feature_file.feature)
    end
  end
end

leoc assigned addersuk Nov 15, 2022
