Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve graceful shutdown and surrounding cleanup #35

Merged
merged 8 commits into from
Sep 21, 2018

Conversation

michaelweiser
Copy link
Contributor

I was trying to figure out why Peekaboo always hung on shutdown and needed kill -9 to get rid of. This turned out to be the blocking input loop of the debugger.

While looking into this I noticed that embedded Cuckoo would tear down Peekaboo using sys.exit() without any shutdown if anything happened to the Cuckoo child. I changed this to an orderly shutdown interacting gracefully with an improved signal handler. We should now be all set to consider if we should actually respawn Cuckoo if it dies.

From there on, the situation degraded into basically an arbitrary set of cleanups, mostly driven by a desire to get rid of global objects and reestablish clear interfaces and relationships between different parts of the code. First victims are the ConnectionMap, JobQueue and _config global Singletons.

One particular showcase how their availability is seducing is the OneAnalysis plugin which from inside the ruleset detects (at high computational cost) that identical samples are currently in flight via the ConnectionMap, dropping the current sample from processing if so. Upon finishing of each sample it would reevaluate if identical samples were still lingering in the ConnectionMap marked as pending and resubmit them to the JobQueue. I believe this should be a function of the JobQueue itself and the replacement implementation does not need to rely on the ConnectionMap at all, instead introducing an idea of samples being in-flight and done to the JobQueue.

Also, I noticed some needless complexity in the Cuckoo interface, particularly the report resubmit and report acquisition and streamlined those.

The individual commits give additional details and should be self-contained and -explanatory, although somewhat interleaved in area of improvement. Some review of this, particularly reasoning and back story of some of the criticised constructs, would be highly appreciated.

Change debugger to polling loop so it can notice shutdown requests from
the main thread. Make the main thread notify the debugger to shut down
when quitting so shutdown completes instead of hanging indefinitely. Do
not use fileinput because it blocks and might potentially read input
from command line arguments as well. Make the debugger correctly revert
stdout and stderr to their previous values.
Introduce an explicit submit backlog replacing the OneAnalysis plugin.

This backlog still has the task of avoiding analysing the same sample
multiple times in parallel. Instead it keeps duplicate samples in the
backlog until one analysis has completed. Then it submits the others in
the expectation of them being recognised as known and being analysed
much faster and without a roundtrip into cuckoo.

The explicit backlog avoids one instance of resubmission of samples to
the job queue from within the ruleset. It also introduces a first idea
of a sample state which could possibly lead to a more explicit sample
state handling in general. As a side effect a lot of expensive
sequential searching in the connection map is avoided at the price of a
hash and a lock keeping reference of in-flight samples in the job queue.

Remove dead code from ConnectionMap.
Refactor virtually identical code into the Cuckoo base class. Give the
CuckooServer twisted reactor protocol a reference to its owning cuckoo
object so it can call back into it.
Convert the JobQueue and ConnectionMap global Singletons to standard
classes of which we create one instance each and hand them as references
to all other objects employing them. The main reason is to highlight
architectural breakage and impose serious pain on any programmer intent
on creating any. As a minor side-effect this avoids importing the
modules everywhere because user objects can just use them without
knowing anything about them other than what method to call.

As specific improvement, move the logic to create the workers into the
JobQueue now that we govern the time when it's instantiated.
Enhance the shutdown mechanisms in various ways: First, make the signal
handler do nothing but call back into some registered objects interested
in shutdown requests. They in turn also do nothing but remember that a
shutdown action is requested in keeping with the recommendation that a
signal handler should not do lengthy work.

Move the job queue and worker shutdown logic into the job queue. With
this change the job queue is now totally governing the worker life cycle
from creation to scheduling to shutdown.

The cuckoo classes are extended as well to notice shutdown requests.
This is particularly beneficial for embedded mode because before it
would simply make the process exit on any error without giving it any
chance for cleanup. Also, looking into this releaved that twisted is
by default also installing signal handlers for SIGINT, SIGTERM and
SIGCHLD potentially interfering in unforeseen ways and causing confusion
when debugging problems.
Move acquiring the cuckoo report from the CuckooReport into the Cuckoo
classes. This eliminates a need for explicitly distinguishing the mode
again. Incidentally this fixes that the mode wasn't switched in api mode
and CuckooReport would still access the files in the storage directory
instead of downloading the report via the API.

Do not copy the files but reserialise the loaded report as json. Drop
copying the HTML report for now. If we need it, it can be downloaded via
the REST API as well. The problem here is whether we would load it into
memory or create a temporary directory for it. Both options aren't all
that appealing.
Having a global configuration object accessible from everywhere obscures
who is using which configuration data and what is actually configurable
and what isn't. Addtionally, in the case of the database connection and
cuckoo wrapper object it was abused as basically a global variable,
allowing arbitrary objects to be accessible everywhere, obscuring
architectural boundaries.

Remove the global config object. Have just one instance in the main
daemon. Do not hand it to anyone as a whole. Instead hand individual
configuration items as parameters to objects via their constructor,
overriding their internal defaults, thus providing a very clear and
visible documentation of what is adjustable from the outside and what
isn't.

As with the global Singletons it is the declared intention of this
change to highlight architectural breakage by imposing pain on the
programmer wishing to access some config item in the very last corner of
the code base.

One special case is currently the Sample class which needs so many
parameters that we resort to basically a mini Sample-Configuration class
called SampleFactory holding all the items needed by but identical for
all samples. The goal here is to eventually simplify this or refactor at
least some of those out of Sample.
Make rule functions check and possibly warn if their config doesn't
exist. Alternatively they can choose to fall back on defaults.

Make PeekabooRulesetConfiguration actually provide some helpful
abstraction of the ruleset configuration.
@michaelweiser michaelweiser self-assigned this Sep 6, 2018
@michaelweiser michaelweiser added this to the 1.7 milestone Sep 6, 2018
Copy link
Member

@Jack28 Jack28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f92958b
I like what you did there and I agree on the idea to simply restart cuckoo a number of times before giving up

Copy link
Member

@Jack28 Jack28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

705de17

The highest priority for this feature was to (at all cost) avoid mixing data or allowing clues about other samples/mails.

@Jack28
Copy link
Member

Jack28 commented Sep 6, 2018

I will perform some more tests and get back after

Copy link

@SebastianDeiss SebastianDeiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Jack28
Copy link
Member

Jack28 commented Sep 18, 2018

LGTM

@michaelweiser michaelweiser changed the title WIP: Improve graceful shutdown and surrounding cleanup Improve graceful shutdown and surrounding cleanup Sep 21, 2018
@michaelweiser michaelweiser merged commit 37811f6 into scVENUS:master Sep 21, 2018
@michaelweiser michaelweiser modified the milestones: 1.7, 1.6.2 Nov 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants