Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autoproj is killed by OOM Exception after upgrade to 2.15.1 #368

Closed
m0rsch1 opened this issue Feb 16, 2022 · 17 comments · Fixed by #371
Closed

autoproj is killed by OOM Exception after upgrade to 2.15.1 #368

m0rsch1 opened this issue Feb 16, 2022 · 17 comments · Fixed by #371
Labels

Comments

@m0rsch1
Copy link

m0rsch1 commented Feb 16, 2022

On Ubuntu 20.04 with ruby2.7 autoproj osdeps and others gets killed by the OS due to an Out Of Memory (OOM) Exception.
Downgrading to version 2.15.0 solves this issue.

autoproj version
autoproj version: 2.15.1

autoproj osdeps --verbose --debug
[...]
Killed
[600269.465614] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=ruby2.7,pid=383303,uid=1000
[600269.465622] Out of memory: Killed process 383303 (ruby2.7) total-vm:8317176kB, anon-rss:4878364kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9800kB oom_score_adj:0
[600269.591413] oom_reaper: reaped process 383303 (ruby2.7), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
@m0rsch1 m0rsch1 added the bug label Feb 16, 2022
@doudou
Copy link
Member

doudou commented Feb 16, 2022

My wild guess is that this has something to do with the refactoring of dependency resolution (#335). Maybe an infinite loop somewhere in there.

@m0rsch1 it would be difficult to debug this without a reproducing use-case. Do you think you could try to track down where in the code autoproj gets stuck ?

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

I will try

@annaborn
Copy link
Contributor

Have same issue. It happens in the buildconf with over 20 package_sets

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

So far i have traced it back to configuration.rb and function sort_package_sets_by_import_order.
From there i traced it down to PackageSetHierarchy.initialize and finally to the check unless @dag.acyclic?.
Since this is part of a different package, the bug seems to be in RGL and not in autoproj.
But we should remove the check then.

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

Maybe the graph is cyclic. But shouldn't autoproj deal with that case?
See also this interesting piece of software https://github.com/PaulPauls/cyclic-toposort

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

The culprit seams to be the function @dag.cycles

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

Found an additional problem here:
In the first pass, the dependency graph is set up (including the root package set).
In the second pass artifical dependencies between the dependent package sets of the root package set are inserted:
These can create cycles which are actually not there in reality thus preventing the update.
Is this intended behaviour?

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 17, 2022

@2maz @doudou

@doudou
Copy link
Member

doudou commented Feb 17, 2022

Hi @m0rsch1, could you make a reproducing test case, possibly with empty package sets, that reproduce the dependencies you have setup on your system ? That would be great.

The additional dependencies are there to enforce the manifest order to avoid confusions (why is package X which is after Y in the manifest loaded before ?). We do detect cycles and should be producing a proper error when they happen.

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 18, 2022

@doudou I have traced the OOM to @dag.cycles so this specific error has been found.
Then it will raise the exception that a cycle has been detected.

It is another issue to setup a correct dependency graph (who is importing who) but then introducing artificial dependencies to enforce an ordering ... This will create cycles which have not been there and will break an otherwise proper setup.

I think that enforcing import order can not be mixed with dependency resolution. But maybe I am wrong. Anyways it's a different issue.

@planthaber
Copy link
Member

You can reproduce it when putting a package_set into the autoproj/manifest files after it is imported by another package set:

rock-core/rock-package_set imports rock-core/package_set in it's source.yml:

When you use this order in manifest file order you get the infinite loop resulting in the OOM error:

package_sets:
   - github: rock-core/rock-package_set
   - github: rock-core/package_set

where this works fine:

package_sets:
   - github: rock-core/package_set
   - github: rock-core/rock-package_set

@moooeeeep
Copy link
Contributor

moooeeeep commented Feb 18, 2022

Seems to be related to this issue then: #359

@doudou
Copy link
Member

doudou commented Feb 24, 2022

#371 fixes the reporting (the graph algorithm was never at fault, it was the formatting of the error, which displayed the objects in all their naked-ness)

@doudou
Copy link
Member

doudou commented Feb 24, 2022

For what it's worth, I never had an OOM condition doing this. Which version of ruby / amount of RAM were you running on ?

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 24, 2022

ruby 2.7.0
RAM: 8GB

@m0rsch1
Copy link
Author

m0rsch1 commented Feb 24, 2022

#371 fixes the reporting (the graph algorithm was never at fault, it was the formatting of the error, which displayed the objects in all their naked-ness)

That seems to be the case. However, the algorithm to resolve the cycles seems to be inefficient (O(N^4))
as the following comment in mutable.rb in the RGL library suggests:

# Returns an array of all minimum cycles in a graph
#
# This is not an efficient implementation O(n^4) and could
# be done using Minimum Spanning Trees. Hint. Hint.
#
def cycles

@doudou
Copy link
Member

doudou commented Feb 25, 2022

I'm not sure what to do with that comment to be honest. Are you suggesting that the N^4 caused the OOM ? (I can assure you that the issue was probably the creation of a gigantic string). Or that this will cause performance issues in the long run ?

I'll go with the second one ... because it would be a good teachable moment.

We all know the "premature optimization is the root of all evil" mantra. Now that we have established that the worst case grows real bad, the question should be "how does it perform with a reasonable working set"

So, 20 package sets and 50 dependencies (not counting the ones that exist because of the manifest) is solved in under a second, with a few thousand cycles. That's so much bigger than a reasonable working set that I think I'll stick with the already integrated N^4 algorithm.

Moreover, #cycles is called only if the graph has cycles, which is determined with a topsort. Which is linear in the number of vertices and edges.

In conclusion, for this to perform well, we would need a rather humongous set of package sets and inter-dependencies, full of cycles. And one will have had to manage to create this while never running autoproj, since autoproj will refuse to work with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants