[resotocore] Update process interrupted with large graph updates #636
Comments
Cool 👍
Thanks a lot! That seemed to work... somewhat. We're no longer running into the 30 second timeout, so I'm closing this issue. However, we are now back to getting the import process OOM killed again. Found the OOM message in
For completeness, here is how an OOM kill looks from the perspective of userland logs. Not immediately helpful as-is.
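For anyone chasing the same thing, a minimal sketch (the worker is a hypothetical stand-in, not resoto code) of how a parent process can detect that a child was SIGKILLed, which on Linux under memory pressure is typically the kernel OOM killer:

```python
import multiprocessing as mp
import signal

def worker():
    # hypothetical stand-in for the import process; a real OOM kill comes
    # from the kernel under genuine memory pressure
    pass

if __name__ == "__main__":
    p = mp.Process(target=worker)
    p.start()
    p.join()
    # a negative exitcode is the number of the signal that killed the child;
    # -signal.SIGKILL (i.e. -9) is what the kernel OOM killer sends
    if p.exitcode == -signal.SIGKILL:
        print("child was SIGKILLed - on Linux, likely the OOM killer")
    else:
        print(f"child exited with {p.exitcode}")
    # in Kubernetes/GKE the same event usually surfaces as exit code 137
    # (128 + 9) and an OOMKilled reason on the pod
```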
We do run some feature flags that were only introduced with a later version, a change we picked up implicitly when moving to v2.0.0a16 (via
Description
When collecting a large number of accounts (*1), the graph merge fails with any version newer than v2.0.0a13.

Using #608 (on a v2.0.0a15 image) to dump the collected graph and model: trying the steps @lloesche documented, I was unable to reproduce the behaviour locally (macOS, current main, graphdb in Docker). The graph imports fine, albeit very slowly (which might just be due to the graphdb being undersized etc.).
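As a rough sanity check on the dump, something like the sketch below can count node vs. edge lines; the field names ("id" for nodes, "from"/"to" for edges) are an assumption about the ndjson format and may differ between versions:

```python
import json

# Count node vs. edge lines in the dumped graph. The field names
# ("id" for nodes, "from"/"to" for edges) are assumptions about the
# ndjson merge-graph format and may differ between resoto versions.
nodes = edges = other = 0
with open("resoto-graph.ndjson") as fh:
    for line in fh:
        doc = json.loads(line)
        if "from" in doc and "to" in doc:
            edges += 1
        elif "id" in doc:
            nodes += 1
        else:
            other += 1
print(f"nodes={nodes} edges={edges} other={other}")
```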
Trying to understand what's going on from the provided stack trace: the first timestamp is 19:49:07, the last is 19:49:40, 33 seconds later. This leads me to believe that Queue#get(True, 30) hits its 30 second timeout.

Without diving too deep yet, the underlying locking mechanism seems like it might be platform/OS dependent, which would explain why I'm not seeing the same behaviour locally (see the sketch below).
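A minimal sketch of the suspected failure mode; the timeout value mirrors Queue#get(True, 30) from the stack trace, everything else is illustrative:

```python
import multiprocessing as mp
import queue

if __name__ == "__main__":
    q = mp.Queue()
    try:
        # block for at most 30 s, mirroring Queue#get(True, 30); raises
        # queue.Empty when nothing arrives in time, which would line up
        # with the ~33 s gap between the two timestamps above
        q.get(True, 30)
    except queue.Empty:
        print("no item within 30 s - the producer stalled or died")

    # the start method (and the lock/pipe machinery behind mp.Queue)
    # differs by platform: fork on Linux, spawn on macOS since Python 3.8,
    # one possible reason the issue does not reproduce locally
    print("start method:", mp.get_start_method())
```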
Unfortunately I won't be able to supply the collected graph for debugging.
Version
v2.0.0a14+
Environment
Linux, Resoto one-for-all container image running on GKE
Steps to Reproduce
Logs
Additional Context
*1 - about 2000-ish GCP projects with only a few select resource types, resulting in a 306M resoto-graph .ndjson file