-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve graceful shutdown and surrounding cleanup #35
Conversation
Change debugger to polling loop so it can notice shutdown requests from the main thread. Make the main thread notify the debugger to shut down when quitting so shutdown completes instead of hanging indefinitely. Do not use fileinput because it blocks and might potentially read input from command line arguments as well. Make the debugger correctly revert stdout and stderr to their previous values.
Introduce an explicit submit backlog replacing the OneAnalysis plugin. This backlog still has the task of avoiding analysing the same sample multiple times in parallel. Instead it keeps duplicate samples in the backlog until one analysis has completed. Then it submits the others in the expectation of them being recognised as known and being analysed much faster and without a roundtrip into cuckoo. The explicit backlog avoids one instance of resubmission of samples to the job queue from within the ruleset. It also introduces a first idea of a sample state which could possibly lead to a more explicit sample state handling in general. As a side effect a lot of expensive sequential searching in the connection map is avoided at the price of a hash and a lock keeping reference of in-flight samples in the job queue. Remove dead code from ConnectionMap.
Refactor virtually identical code into the Cuckoo base class. Give the CuckooServer twisted reactor protocol a reference to its owning cuckoo object so it can call back into it.
Convert the JobQueue and ConnectionMap global Singletons to standard classes of which we create one instance each and hand them as references to all other objects employing them. The main reason is to highlight architectural breakage and impose serious pain on any programmer intent on creating any. As a minor side-effect this avoids importing the modules everywhere because user objects can just use them without knowing anything about them other than what method to call. As specific improvement, move the logic to create the workers into the JobQueue now that we govern the time when it's instantiated.
Enhance the shutdown mechanisms in various ways: First, make the signal handler do nothing but call back into some registered objects interested in shutdown requests. They in turn also do nothing but remember that a shutdown action is requested in keeping with the recommendation that a signal handler should not do lengthy work. Move the job queue and worker shutdown logic into the job queue. With this change the job queue is now totally governing the worker life cycle from creation to scheduling to shutdown. The cuckoo classes are extended as well to notice shutdown requests. This is particularly beneficial for embedded mode because before it would simply make the process exit on any error without giving it any chance for cleanup. Also, looking into this releaved that twisted is by default also installing signal handlers for SIGINT, SIGTERM and SIGCHLD potentially interfering in unforeseen ways and causing confusion when debugging problems.
Move acquiring the cuckoo report from the CuckooReport into the Cuckoo classes. This eliminates a need for explicitly distinguishing the mode again. Incidentally this fixes that the mode wasn't switched in api mode and CuckooReport would still access the files in the storage directory instead of downloading the report via the API. Do not copy the files but reserialise the loaded report as json. Drop copying the HTML report for now. If we need it, it can be downloaded via the REST API as well. The problem here is whether we would load it into memory or create a temporary directory for it. Both options aren't all that appealing.
Having a global configuration object accessible from everywhere obscures who is using which configuration data and what is actually configurable and what isn't. Addtionally, in the case of the database connection and cuckoo wrapper object it was abused as basically a global variable, allowing arbitrary objects to be accessible everywhere, obscuring architectural boundaries. Remove the global config object. Have just one instance in the main daemon. Do not hand it to anyone as a whole. Instead hand individual configuration items as parameters to objects via their constructor, overriding their internal defaults, thus providing a very clear and visible documentation of what is adjustable from the outside and what isn't. As with the global Singletons it is the declared intention of this change to highlight architectural breakage by imposing pain on the programmer wishing to access some config item in the very last corner of the code base. One special case is currently the Sample class which needs so many parameters that we resort to basically a mini Sample-Configuration class called SampleFactory holding all the items needed by but identical for all samples. The goal here is to eventually simplify this or refactor at least some of those out of Sample.
Make rule functions check and possibly warn if their config doesn't exist. Alternatively they can choose to fall back on defaults. Make PeekabooRulesetConfiguration actually provide some helpful abstraction of the ruleset configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f92958b
I like what you did there and I agree on the idea to simply restart cuckoo a number of times before giving up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The highest priority for this feature was to (at all cost) avoid mixing data or allowing clues about other samples/mails.
I will perform some more tests and get back after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
I was trying to figure out why Peekaboo always hung on shutdown and needed kill -9 to get rid of. This turned out to be the blocking input loop of the debugger.
While looking into this I noticed that embedded Cuckoo would tear down Peekaboo using sys.exit() without any shutdown if anything happened to the Cuckoo child. I changed this to an orderly shutdown interacting gracefully with an improved signal handler. We should now be all set to consider if we should actually respawn Cuckoo if it dies.
From there on, the situation degraded into basically an arbitrary set of cleanups, mostly driven by a desire to get rid of global objects and reestablish clear interfaces and relationships between different parts of the code. First victims are the ConnectionMap, JobQueue and _config global Singletons.
One particular showcase how their availability is seducing is the OneAnalysis plugin which from inside the ruleset detects (at high computational cost) that identical samples are currently in flight via the ConnectionMap, dropping the current sample from processing if so. Upon finishing of each sample it would reevaluate if identical samples were still lingering in the ConnectionMap marked as pending and resubmit them to the JobQueue. I believe this should be a function of the JobQueue itself and the replacement implementation does not need to rely on the ConnectionMap at all, instead introducing an idea of samples being in-flight and done to the JobQueue.
Also, I noticed some needless complexity in the Cuckoo interface, particularly the report resubmit and report acquisition and streamlined those.
The individual commits give additional details and should be self-contained and -explanatory, although somewhat interleaved in area of improvement. Some review of this, particularly reasoning and back story of some of the criticised constructs, would be highly appreciated.