MIG Agent Architecture

1   Initialization process

The agent tries to be as autonomous as possible. One of the goals is to minimize any sort of reliance on configuration management tools to install a working agent. Therefore, the agent generally attempts to install itself as a service on the system when it is executed and also supports optional automatic upgrades via the mig-loader companion program.

As a portable binary, the agent needs to detect the type of operating system and init method that is used by an endpoint. Depending on the endpoint, different initialization methods are used.

If the installservice flag is set in the agent configuration file, the agent makes the necessary changes to the platform to install itself as a service. This is supported on Linux (systemd, upstart, SysV init), Darwin (launchd), and Windows (SCM, the Service Control Manager).

In this scenario, executing the agent directly causes it to check whether a mig-agent service already exists, add the service if it is missing, and start it. The process that was executed then exits, leaving the daemonized mig-agent process running under the service manager.
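
The detection step can be pictured with the following minimal sketch. It is not the agent's actual code: the function name detectInit, the returned labels, and the probe paths are assumptions made for illustration. The /run/systemd/system check follows the convention documented for sd_booted; the /sbin/initctl check for upstart is only a heuristic.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// detectInit guesses the init method for an operating system name.
// exists reports whether a path is present on the endpoint; it is injected
// so the decision logic can be exercised without touching the filesystem.
// All names and probe paths here are illustrative assumptions.
func detectInit(goos string, exists func(string) bool) string {
	switch goos {
	case "darwin":
		return "launchd"
	case "windows":
		return "scm"
	case "linux":
		switch {
		case exists("/run/systemd/system"): // directory present when systemd is PID 1
			return "systemd"
		case exists("/sbin/initctl"): // heuristic for upstart
			return "upstart"
		default:
			return "sysvinit"
		}
	}
	return "unknown"
}

// pathExists is the real-filesystem probe used outside of tests.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	fmt.Println("detected init method:", detectInit(runtime.GOOS, pathExists))
}
```

Injecting the probe keeps the selection logic separate from the filesystem, so each platform branch can be checked on any development machine.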

1.1   Registration process

The initialization process goes through several environment detection steps which are used to select the proper init method. Once started, the agent will send a heartbeat to the public relay, and also store that heartbeat in its run directory. The location of the run directory is platform specific.

  • windows: C:\mig
  • darwin: /Library/Preferences/mig/
  • linux: /var/lib/mig/

Below is a sample heartbeat message from a Linux agent, stored in /var/lib/mig/mig-agent.ok.

{
        "destructiontime": "0001-01-01T00:00:00Z",
        "environment": {
                "arch": "amd64",
                "ident": "Red Hat Enterprise Linux Server release 6.5 (Santiago)",
                "init": "upstart"
        },
        "heartbeatts": "2014-07-31T14:00:20.00442837-07:00",
        "name": "someserver.example.net",
        "os": "linux",
        "pid": 26256,
        "queueloc": "linux.someserver.example.net.5hsa811oda",
        "starttime": "2014-07-30T21:34:48.525449401-07:00",
        "version": "201407310027+bcbdd94.prod"
}
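
As an illustration of how such a file could be consumed, the sketch below decodes the sample heartbeat with the Go standard library. The heartbeat struct and the parseHeartbeat helper are hypothetical and are derived only from the fields visible in the sample; the agent's own types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// heartbeat mirrors the fields visible in the sample message.
// It is an illustrative type, not the agent's own definition.
type heartbeat struct {
	Name            string    `json:"name"`
	OS              string    `json:"os"`
	PID             int       `json:"pid"`
	QueueLoc        string    `json:"queueloc"`
	Version         string    `json:"version"`
	StartTime       time.Time `json:"starttime"`
	HeartbeatTS     time.Time `json:"heartbeatts"`
	DestructionTime time.Time `json:"destructiontime"`
	Environment     struct {
		Arch  string `json:"arch"`
		Ident string `json:"ident"`
		Init  string `json:"init"`
	} `json:"environment"`
}

// sample is the heartbeat message shown above.
const sample = `{
  "destructiontime": "0001-01-01T00:00:00Z",
  "environment": {
    "arch": "amd64",
    "ident": "Red Hat Enterprise Linux Server release 6.5 (Santiago)",
    "init": "upstart"
  },
  "heartbeatts": "2014-07-31T14:00:20.00442837-07:00",
  "name": "someserver.example.net",
  "os": "linux",
  "pid": 26256,
  "queueloc": "linux.someserver.example.net.5hsa811oda",
  "starttime": "2014-07-30T21:34:48.525449401-07:00",
  "version": "201407310027+bcbdd94.prod"
}`

// parseHeartbeat decodes a heartbeat document into the struct above.
func parseHeartbeat(data []byte) (heartbeat, error) {
	var hb heartbeat
	err := json.Unmarshal(data, &hb)
	return hb, err
}

func main() {
	hb, err := parseHeartbeat([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s runs %s with %s init, pid %d\n", hb.Name, hb.OS, hb.Environment.Init, hb.PID)
	// The zero timestamp in the sample decodes to Go's zero time.
	fmt.Println("destruction time set:", !hb.DestructionTime.IsZero())
}
```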

The agent periodically sends information about the OS configuration and its environment to the scheduler. This includes the hostname of the system it is running on, the IP addresses assigned, AWS instance related information, and more. This information can change on an endpoint while the agent is running; for example, a new IP address could be assigned via DHCP. The agent periodically checks the system, and if changes to the environment are detected, the heartbeat message is automatically updated to include them. The frequency of these environment checks is controlled through the refreshenv option in the agent configuration file, or the REFRESHENV variable in the agent built-in configuration.
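
The refresh logic amounts to collecting a new snapshot and comparing it with the previous one. The sketch below shows that comparison under simplified assumptions: the environment type, its two fields, and the refreshEnv helper are hypothetical, and a real agent would drive the check from a timer set to the refreshenv interval.

```go
package main

import (
	"fmt"
	"reflect"
)

// environment is a simplified, hypothetical snapshot of the endpoint.
type environment struct {
	Hostname  string
	Addresses []string
}

// refreshEnv collects a fresh snapshot and reports whether it differs
// from the current one. The caller would update the heartbeat on change.
func refreshEnv(current environment, collect func() environment) (environment, bool) {
	fresh := collect()
	if reflect.DeepEqual(current, fresh) {
		return current, false
	}
	return fresh, true
}

func main() {
	current := environment{Hostname: "someserver.example.net", Addresses: []string{"10.0.0.5"}}
	// Simulate a DHCP lease that hands out a different address.
	collect := func() environment {
		return environment{Hostname: "someserver.example.net", Addresses: []string{"10.0.0.9"}}
	}
	current, changed := refreshEnv(current, collect)
	fmt.Println("environment changed:", changed, "addresses:", current.Addresses)
}
```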

1.2   Check-In mode

On infrastructure where running the agent as a permanent process is not acceptable, it is possible to run the agent as a cron job. When started with the flag -m agent-checkin, the agent connects to the configured relay, retrieves and runs outstanding commands, and exits after 10 seconds of inactivity.

Check-in mode can also be used by enabling the checkin configuration value in the agent configuration file.

2   Communication with modules

Upon processing of an action, the scheduler will retrieve a list of agents to send the action to. One action is then derived into multiple commands and sent to agents.

An agent receives a command from the scheduler on its personal AMQP queue (1). It parses the command (2) and extracts all of the operations to perform. Operations are passed to modules and executed in parallel (3). Rather than maintaining a state of the running command, the agent creates a goroutine and a channel tasked with receiving the results from the modules. Each module publishes its results inside that channel (4). The result parsing goroutine receives them and, once it has received all of them, populates the results array of the command with the results from each module (5) and sends the command back to the scheduler (6).

While running, each module is executed as a child process that communicates with the agent over a pipe.

When the agent is done running the command, both the channel and the goroutine are destroyed.

             +-------+   [ - - - - - - A G E N T - - - - - - - - - - - - ]
             |command|+---->(listener)
             +-------+          |(2)
               ^                V
               |(1)         (parser)
               |               +       [ m o d u l e s ]
+---------+    |            (3)|----------> op1 +----------------+
|SCHEDULER|+---+               |------------> op2 +--------------|
|         |<---+               |--------------> op3 +------------|
+---------+    |               +----------------> op4 +----------+
               |                                                 V(4)
               |(6)                                         (receiver)
               |                                                 |
               |                                                 V(5)
               +                                             (publisher)
             +-------+                                           /
             |results|<-----------------------------------------'
             +-------+

The command received by the agent is composed of a copy of the action described previously, but signed with the private key of a trusted investigator. It also contains additional parameters that are specific to the targeted agent, such as command processing timestamps, the name of the agent queue on the message broker, the action and command unique IDs, and the status and results of the command. Below is a command derived from the root password checking action, and run on the host named 'host1.example.net'.

{
  "id": 1.427392971126604e+18,
  "action": { ... SIGNED COPY OF THE ACTION ... },
  "agent": {
        "id": 1.4271760437936648e+18,
        "name": "host1.example.net",
        "queueloc": "linux.host1.example.net.981alsd19aos1984",
        "mode": "daemon",
        "version": "20150324+0d0f88c.prod"
  },
  "status": "success",
  "results": [
        {
          "foundanything": true,
          "success": true,
          "elements": {
                "root_passwd_hashed_or_disabled": [
                  {
                        "file": "/etc/shadow",
                        "fileinfo": {
                          "lastmodified": "2015-02-07 01:51:07.17850601 +0000 UTC",
                          "mode": "----------",
                          "size": 1684
                        },
                        "search": {
                          "contents": [
                                "root:(\\*|!|\\$(1|2a|5|6)\\$).+"
                          ],
                          "options": {
                                "matchall": false,
                                "matchlimit": 0,
                                "maxdepth": 0
                          },
                          "paths": [
                                "/etc"
                          ]
                        }
                  }
                ]
          },
          "statistics": {
                "exectime": "2.017849ms",
                "filescount": 1,
                "openfailed": 0,
                "totalhits": 1
          },
          "errors": null
        }
  ],
  "starttime": "2015-03-26T18:02:51.126605Z",
  "finishtime": "2015-03-26T18:03:00.671232Z"
}

The results of the command show that the file '/etc/shadow' matched, and thus "foundanything" is set to true.

The invocation of the file module has completed successfully, which is represented by results->0->success=true. In our example, there is only one operation in the action->operations array, so only one result is present. When multiple operations are performed, each has its results listed in a corresponding entry of the results array (operations[0] is in results[0], operations[1] in results[1], etc...).

Finally, the agent has performed all operations in the operations array successfully, and returned status=success. Had a failure occurred in the agent, the returned status would be one of "failed", "timeout" or "cancelled".
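
A consumer of such a command could read the overall status and the per-operation flags as sketched below. The command struct and the summarize helper are hypothetical and cover only a few of the fields shown in the example; the input is a trimmed version of that example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// command models only the fields needed for this illustration.
type command struct {
	Status  string `json:"status"`
	Results []struct {
		FoundAnything bool     `json:"foundanything"`
		Success       bool     `json:"success"`
		Errors        []string `json:"errors"`
	} `json:"results"`
}

// summarize returns one line for the command status followed by one line
// per result; results[i] corresponds to operations[i] of the action.
func summarize(data []byte) ([]string, error) {
	var cmd command
	if err := json.Unmarshal(data, &cmd); err != nil {
		return nil, err
	}
	lines := []string{"status=" + cmd.Status}
	for i, r := range cmd.Results {
		lines = append(lines, fmt.Sprintf("operation %d: success=%t foundanything=%t errors=%d",
			i, r.Success, r.FoundAnything, len(r.Errors)))
	}
	return lines, nil
}

func main() {
	trimmed := `{"status":"success","results":[{"foundanything":true,"success":true,"errors":null}]}`
	lines, err := summarize([]byte(trimmed))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```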

2.1   Command expiration & timeouts

To prevent abuse of resources, agents will kill long-running modules after a given period of time. That timeout can be configured in the agent configuration file using the moduletimeout option.

The timeout represents the maximum execution time of a single operation. If an action contains 3 operations, each operation gets its own timeout. But because operations run in parallel in the agent, the maximum runtime of an action should be very close to the value of moduletimeout.

In a typical deployment, it is safe to increase moduletimeout to allow for longer operations. A value of 20 minutes is usual. Make sure to fine-tune this to your environment, and get the approval of your ops team, because mig-agent may end up consuming resources (but never more than 50% of the CPU available on a system).

Oftentimes, an investigator will want a timeout that is much shorter than the value of moduletimeout. In the MIG command line, the flag -e controls the expiration. It defaults to 5 minutes but can be set to 30 seconds for simple investigations. When that happens, the agent will calculate an appropriate expiration for the operations being run. If the expiration set on the action is set to 30 seconds, the agent will kill operations that run for more than 30 seconds.

If the expiration is larger than the value of moduletimeout (for example, 2 hours), then moduletimeout is used. Setting a long expiration may be useful to allow agents that only check in periodically to pick up actions long after they are launched. For example, an action can be created with a 24 hour validity time; when an agent comes online, it will receive the action, see that it is still valid, execute it using moduletimeout as the maximum timeout value, and return the results. This is useful to target an action at a group of agents that may not all be online at the same time.

2.2   Agent/Modules message format

When running as a module, the agent accepts different classes of inputs on stdin, as one-line JSON objects. The most common one is the parameters class, but it can also receive a stop input, which indicates that the module should stop its execution immediately. The format of module input messages is defined by modules.Message.

// Message defines the input messages received by modules.
type Message struct {
        Class      string      // represent the type of message being passed to the module
        Parameters interface{} // for `parameters` class, this interface contains the module parameters
}

// Message classes. They share the type of Message.Class so that the
// constants can be assigned to it directly.
const (
        MsgClassParameters string = "parameters"
        MsgClassStop       string = "stop"
        MsgClassPing       string = "ping"
        MsgClassLog        string = "log"
        MsgClassRegister   string = "register"
        MsgClassConfig     string = "config"
)

When the agent receives a command to pass to a module for execution, it extracts the operation parameters from Command.Action.Operations[N].Parameters and copies them into Message.Parameters. It then sets Message.Class to modules.MsgClassParameters, marshals the struct into JSON, and passes the resulting []byte to the module as an IO stream.
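
The following sketch shows how such a message could be serialized into a one-line JSON object. The lowercase JSON field names set through struct tags are an assumption, since the definition above does not show any tags, and encodeMessage is a hypothetical helper rather than a function of the modules package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message follows the definition above; the JSON tags are assumed.
type Message struct {
	Class      string      `json:"class"`
	Parameters interface{} `json:"parameters,omitempty"`
}

const MsgClassParameters string = "parameters"

// encodeMessage marshals a message and terminates it with a newline,
// so that the module can read it from stdin as a single line.
func encodeMessage(class string, params interface{}) ([]byte, error) {
	buf, err := json.Marshal(Message{Class: class, Parameters: params})
	if err != nil {
		return nil, err
	}
	return append(buf, '\n'), nil
}

func main() {
	// Parameters would normally come from Command.Action.Operations[N].Parameters.
	params := map[string]interface{}{"paths": []string{"/etc"}}
	line, err := encodeMessage(MsgClassParameters, params)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Print(string(line))
}
```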

3   Agent upgrade process via mig-loader

MIG supports upgrading agents in the wild through the use of the companion program mig-loader. Using mig-loader is optional; you don't need to use mig-loader in your environment if you want to upgrade agents yourself.

The following is a high-level diagram of how the loader interacts with the API and the agent during the upgrade process. Note that this diagram focuses on the agent being upgraded, but the same applies to any file in the manifest, such as the certificates, the agent configuration, or the loader itself. In all cases, a change to any of these files results in the loader respawning any running agent.

/------ Endpoint ---------\
Agent                Loader              API
+---+                +----+             +--+
|                    |                     |
|                    | 1. request manifest |
|                    |-------------------->|------+
|                    |                     |      | 2. update loader
| 3. valid  +--------|                     |      | record in database
| manifest  |        |                     |<-----+
| sig?      +------->|                     |
|                    |                     |
| 4. does   +--------|                     |
| current   |        |                     |
| agent     |        |                     |
| match?    +------->|                     |
|                    |                     |
|                    | 5. fetch new agent  |
|                    |    or other files   |
|                    |    from manifest    |
|                    |    that don't match |
|                    |-------------------->|
|                    |                     |
| 6. stage  +--------|                     |
| agent on  |        |                     |
| disk      +------->|                     |
|                    |                     |
| 7. agent  +--------|                     |
| SHA256    |        |                     |
| matches   |        |                     |
| manifest? +------->|                     |
|                    |                     |
|  8. install agent  |                     |
|<-------------------|                     |
|                    |                     |
|  9. stop old agent |                     |
|<-------------------|                     |
|                    |                     |
| 10. start new      |                     |
|<-------------------|                     |
|                    |                     |

For more information on how MIG loader can be used see the relevant documentation in MIG Loader.