Large increase in memory usage when upgrading to Turnip 4.4.0 #251

Open
addersuk opened this issue Nov 9, 2022 · 4 comments

addersuk commented Nov 9, 2022

In a private repo, we have seen a massive increase in the memory usage of our app, which uses Turnip, since we upgraded to Turnip 4.4.0.

Before the upgrade, with two Ruby threads running on our CI server:

Tasks: 144 total,   3 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 98.5 us,  1.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3924.0 total,    139.2 free,   1747.1 used,   2037.7 buff/cache
MiB Swap:   4340.0 total,   4336.0 free,      4.0 used.   1893.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   4788 **+  20   0  578064 541888  13236 R  99.0  13.5   0:52.55 bundle
   4787 **+  20   0  783668 747956  13192 R  98.3  18.6   0:52.12 bundle

After the upgrade to 4.4.0, with two threads:

top - 16:39:42 up 36 min,  1 user,  load average: 2.87, 2.86, 2.41
Tasks: 133 total,   6 running, 127 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us, 22.5 sy,  0.0 ni, 12.4 id, 48.9 wa,  0.0 hi,  4.4 si,  6.1 st
MiB Mem :   3924.0 total,    108.0 free,   3730.1 used,     85.9 buff/cache
MiB Swap:   4340.0 total,     47.3 free,   4292.7 used.     33.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6232 **+  20   0 3741972   1.7g   3132 R  18.7  43.7  11:20.64 bundle
   6231 **+  20   0 3914640   1.7g   1136 R  15.7  44.4  11:34.07 bundle

I'll try to create a test case to illustrate this, since it is happening in a private repo that makes extensive use of Gherkin feature files.

We suspect this is due to the move to cuke_modeler.

Collaborator

leoc commented Nov 10, 2022

Thanks for the report! I am looking forward to a test case that helps reproduce the issue by comparing 4.3 to 4.4.
I just had a brief look at cuke_modeler and it seemed like a good abstraction layer.

In any case, if I read the comparison correctly, memory usage has increased by a factor of ~3.
But the CPU usage is much lower; could that be right? Is some kind of trade-off happening here?
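
As a rough check of that factor, the RES columns from the two `top` snapshots can be compared directly. This is only a sketch: it assumes the "1.7g" values are GiB, and because swap is almost exhausted in the second snapshot, resident memory probably understates the real footprint.

```ruby
# Resident set sizes (RES column) copied from the two `top` snapshots, in KiB.
KIB_PER_GIB = 1024 * 1024

before_kib = [541_888, 747_956]
# `top` printed the post-upgrade values as "1.7g"; assumed here to mean 1.7 GiB each.
after_kib = [1.7 * KIB_PER_GIB, 1.7 * KIB_PER_GIB]

# Ratio of combined resident memory after the upgrade to combined memory before it.
def growth_factor(before, after)
  after.sum / before.sum.to_f
end

factor = growth_factor(before_kib, after_kib)
puts format('Combined RES grew by a factor of about %.1f', factor)
```

The combined figure lands a little under 3, so "~3" is a fair reading of the resident numbers alone.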

@enkessler As you created cuke_modeler and #244, do you have an idea what may be happening here?

Contributor

enkessler commented Nov 10, 2022

...a massive increase in memory usage...extensive usage of Gherkin feature files...

Yeah, that's probably due to cuke_modeler. When CukeModeler makes a model tree, you're getting everything: models for directories, their feature files, the comments within those files, any random @wip tags lying around on tests; all of it. CukeModeler doesn't know what the models are needed for, so it's not going to pick and choose what information to keep or leave out. All of those model objects are going to hang around in memory until Ruby thinks that they can be cleaned up, the same as any other object. If it turns out that a directory containing 10,000 tests was modeled in order to find the one test that Turnip actually wants to run, or something like that, then, yeah, that's 10,000+ objects in memory until either the process ends or the references to that model tree get thrown away.

So I'm guessing that it's a fixable problem. We just have to have a look at how long the Turnip internals need to hang on to the objects or maybe have it prune off parts of the tree that it knows that it doesn't need.
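
To make the pruning idea concrete, here is a toy sketch. `ToyModel` and its methods are hypothetical stand-ins, not CukeModeler's classes. It shows that holding on to one node keeps the whole tree reachable through its parent link, and that cutting the link leaves only that node reachable.

```ruby
# Toy stand-in for a model tree; this class is hypothetical, not CukeModeler's.
class ToyModel
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @children = []
    @parent = nil
  end

  # Attaches a child and records the back-reference to its parent.
  def add(child)
    child.parent = self
    @children << child
    child
  end

  # Yields this model and every model beneath it.
  def each_model(&block)
    return enum_for(:each_model) unless block

    yield self
    @children.each { |child| child.each_model(&block) }
  end

  # Cuts the link to the rest of the tree so that holding this model
  # no longer keeps its ancestors and siblings reachable.
  def detach!
    @parent.children.delete(self) if @parent
    @parent = nil
    self
  end

  def root
    parent ? parent.root : self
  end
end

directory = ToyModel.new('features')
10_000.times { |i| directory.add(ToyModel.new("test_#{i}")) }

wanted = directory.each_model.find { |model| model.name == 'test_42' }
reachable_before = wanted.root.each_model.count # whole tree, via the parent link

wanted.detach!
directory = nil # drop the only other reference to the tree
reachable_after = wanted.root.each_model.count

puts "reachable before pruning: #{reachable_before}, after: #{reachable_after}"
```

Once nothing else references the tree, Ruby's garbage collector is free to reclaim the other nodes; when it actually does so is up to the GC.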

Contributor

enkessler commented Nov 10, 2022

Additionally, cucumber-gherkin provides a lot more information than cuke_modeler bothers to model but, in order to not be a limiting factor, all of that underlying, non-abstracted data is still made available to the user via <some_model_class>#parsing_data. For example, before CukeModeler had models for the comments in feature files, a user would have been able to still grab the parsed data contained in the feature model and find the comments in there.

Because of all that, the CukeModeler models are 'bigger' than the Gherkin objects that they replaced. But we could, at the very least, throw out that extra data after the models are created inside of the Turnip code.

@enkessler
Contributor

@addersuk A test or sample project that reproduces the issue would be helpful but, in the meantime, stick this monkey patch in somewhere and see if it affects the numbers.

module Turnip
  class Builder
    def self.build(feature_file)
      feature_file = CukeModeler::FeatureFile.new(feature_file)

      # We don't need any of the raw data from Gherkin, so save memory by getting rid of it.
      feature_file.each do |model|
        model.parsing_data = nil
      end

      return nil unless feature_file.feature
      Node::Feature.new(feature_file.feature)
    end
  end
end
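
For anyone wanting a quick in-process number to go with the `top` output, a small helper along these lines might do. It is only a sketch: `measure_heap` is a hypothetical name and not part of Turnip or CukeModeler, and `ObjectSpace.memsize_of_all` only covers memory Ruby tracks for its own objects, so RES from `top` stays the reference figure.

```ruby
require 'objspace'

# Runs the block, then reports how many objects it allocated and how many bytes
# Ruby's live objects occupy afterwards. Hypothetical helper, not part of Turnip.
def measure_heap
  GC.start
  allocated_before = GC.stat(:total_allocated_objects)
  result = yield
  allocated = GC.stat(:total_allocated_objects) - allocated_before
  GC.start
  { result: result, allocated_objects: allocated, live_bytes: ObjectSpace.memsize_of_all }
end

# Stand-in workload; replace the block body with the real call being compared,
# for example building the same feature file with and without the patch.
report = measure_heap { Array.new(10_000) { |i| "scenario #{i}" } }
puts "allocated objects: #{report[:allocated_objects]}"
puts "live heap bytes:   #{report[:live_bytes]}"
```

Running the same workload under 4.3, under 4.4, and under 4.4 with the patch should show whether the raw parsing data accounts for most of the difference.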
