= Mozpool API Documentation =
== REST API ==
Mozpool exposes a simple REST API via HTTP.
All resources are accessed under paths starting with /api/.
=== MozPool ===
This is the central, public interface to request, return, and operate on
devices.
==== Top-Level ====
/api/version/
* GET returns a JSON response with a 'version' key giving the Mozpool version.
==== Devices ====
/api/device/list/
* GET returns a JSON response body whose "devices" key
contains an array of the names of devices known to the system.
Device names can be passed as the id in the following device APIs.
/api/device/list/?details=1
* GET returns a JSON response body whose "devices" key
contains an array of objects, each representing a single device.
The objects have keys id, name, fqdn, inventory_id, mac_address,
imaging_server, relay_info, comments, environment, image, last_pxe_config,
and request_id.
/api/device/{id}/request/
* POST requests the given device. {id} may be "any" to let MozPool choose an
unassigned device. The body must be a JSON object with at least the keys
"requester", "duration", and "image". The value for "requester" is an
email address, for human users, or a hostname, for machine users. "duration"
is the length of the request, in seconds (a request can be renewed; see
below).
"image" specifies low-level configuration that should be done on the device
by mozpool. The supported images can be obtained with the /api/image/list/
API call, as documented below in the "Information" section. Some image types
will require additional parameters. Currently the only supported value is
"b2g", for which a "b2gbase" key must also be present. The value of
"b2gbase" must be a URL to a b2g build directory containing boot, system,
and userdata tarballs. If supplied, the "environment" key limits
the available devices to those in the given environment; the default is
'any', which can also be supplied explicitly.
If successful, returns 200 OK with a JSON object with the key "request".
The value of "request" is an object detailing the request, with the keys
"assigned_device" (which is blank if mozpool is still attempting to find
a device), "assignee", "boot_config", "expires", "id", "requested_device",
and "url". The "url" attribute contains a partial URL
for the request object and should be used in request calls, as detailed
below. If 'any' device was requested, always returns 200 OK, since it will
retry a few times if no devices are free. If a specific device is requested
but is already assigned, returns 409 Conflict; otherwise, returns 200 OK.
If a 200 OK code is returned, the client should then poll for the request's
state, using the value of request["url"] returned in the JSON object with
"status/" appended. See below for a description of the request states.
==== Requests ====
/api/request/list/[?include_closed=1]
* GET returns a JSON response body whose "requests" key contains an array of
objects representing all current requests. The objects have the keys id,
assignee, assigned_device, boot_config, device_status, expires,
imaging_server, requested_device, and state. "assigned_device" and
"device_status" will be blank if no suitable free device has been found.
"expires" is given in UTC. By default, closed requests are omitted. They
can be included by giving the "include_closed" argument (with any value).
Once a request is fulfilled using the "request" API above, all further
actions related to the requested device should be done using the "url" value
returned by that call, which includes up to "/api/request/{id}/". This
ensures that only one server handles any given request. Attempts to access
that request ID on a different server will result in a 302 Found redirection
to the correct server.
The full paths of request APIs are presented below for clarity.
Note that a request will be automatically terminated once it expires. The
"renew" call should be used to extend the request lifetime.
/api/request/{id}/status/
* GET returns a JSON response body with keys "log" and "state". Log objects
contain "message", "source", and "timestamp" keys. "state" is the name of
the current state, "ready" being the state in which it is safe to use the
device.
/api/request/{id}/details/
* GET returns a JSON response body whose "request" key contains an object
representing the given request with the keys id, device_id, assignee,
expires, and status. The expires field is given as an ISO-formatted time.
/api/request/{id}/renew/
* POST requests that the request's lifetime be updated. The request body
should be a JSON object with the key "duration", the value of which is the
*new* remaining time, in seconds, of the request. Returns 204 No Content.
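As a sketch of how the per-request calls can be addressed, the helper below appends an action name to a request URL, and a second helper issues the renew call. How the partial "url" value is resolved against a server is not specified here, so both functions assume they receive an already-resolved URL ending at "/api/request/{id}/". The function names and the Content-Type header are hypothetical.

```python
import json
import urllib.request


def request_action_url(request_url, action):
    """Append an action such as "status" or "renew" to a request URL."""
    return "%s/%s/" % (request_url.rstrip("/"), action)


def renew_request(request_url, duration):
    """POST the new remaining lifetime in seconds; return the HTTP status."""
    req = urllib.request.Request(
        request_action_url(request_url, "renew"),
        data=json.dumps({"duration": duration}).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 204 No Content is documented for success
```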
/api/request/{id}/return/
* POST returns the device to the pool and deletes the request. Returns
204 No Content.
/api/request/{id}/event/{event}/
* POST to communicate a state-machine event to this request. This is only
used for internal communication in the current version of Mozpool. The
request body contains a JSON object which is transmitted to the state
machine's event method. Most methods ignore their arguments.
* GET is identical to a POST with an empty request body.
200 OK is returned.
=== LifeGuard ===
These are requests to a particular LifeGuard server to start low-level BMM
operations. These should *not* be called directly by anything other than
a MozPool server or the device itself.
/api/device/{id}/event/{event}
* POST to communicate a state-machine event to this device. This may come
from external sources, a higher level (mozpool), or from the device itself
to indicate its state has changed. The request body contains a JSON object
which is transmitted to the state machine's event method. Most methods
ignore their arguments.
* GET is identical to a POST with an empty request body.
200 OK is returned.
/api/device/{id}/state-change/{old_state}/to/{new_state}/
* POST to conditionally set the lifeguard state of a device from old_state to
new_state. If the current state is not old_state, the request will fail.
The POST body is ignored.
200 OK is returned on success; on failure, 409 Conflict.
/api/device/{id}/status/
* GET returns a JSON response body whose "state" key contains
a short string describing the current state of the device, and whose "log"
key contains an array of recent log entries for the device.
/api/device/{id}/state/
* GET returns a JSON response similar to `/api/device/{id}/status/`, but
without the `log` key.
/api/device/{id}/state/?cache=1
* Same as `/api/device/{id}/state/`, but cached (on the order of seconds).
This is intended for use by monitoring tools like Nagios to avoid pounding
the backend database.
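A monitoring check built on the cached endpoint might look like the sketch below. The exit codes are the standard Nagios plugin values; mapping every `failed_` state to CRITICAL and everything else to OK is an assumed policy, and the function names are hypothetical.

```python
import json
import urllib.request

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def nagios_status(state):
    """Map a device state to a Nagios exit code (assumed policy)."""
    return CRITICAL if state.startswith("failed_") else OK


def fetch_cached_state(base_url, device_id):
    """GET the cached state of a device and return its "state" string."""
    url = "%s/api/device/%s/state/?cache=1" % (base_url.rstrip("/"), device_id)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]
```

A check script would typically call `sys.exit(nagios_status(fetch_cached_state(base_url, device_id)))`.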
=== Black Mobile Magic ===
Black Mobile Magic handles the hardware directly: power control, network boot
configuration, and pings.
==== Operations ====
These low-level BMM operations are useful for diagnostics and repairs, but
using them on devices that are managed by Lifeguard may cause undesirable
results, since lifeguard expects to be controlling the devices.
/api/device/{id}/power-cycle/
* POST to initiate a power-cycle of this device. The POST body is a JSON object,
with optional keys `pxe_config` and `boot_config`. If `pxe_config` is
specified, then the device is configured to boot with that PXE config;
otherwise, the device boots from its internal storage. If `boot_config` is
supplied (as a string), it is stored for later use by the device via
`/api/device/{id}/bootconfig/`.
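The power-cycle body can be assembled as in the following sketch. Both keys are optional, so an empty object is a valid body; the helper name is hypothetical, and rejecting a non-string `boot_config` is a choice made here to match the "as a string" requirement.

```python
import json


def power_cycle_body(pxe_config=None, boot_config=None):
    """Return the JSON text for POST /api/device/{id}/power-cycle/."""
    body = {}
    if pxe_config is not None:
        body["pxe_config"] = pxe_config  # omit to boot from internal storage
    if boot_config is not None:
        if not isinstance(boot_config, str):
            raise TypeError("boot_config must be supplied as a string")
        body["boot_config"] = boot_config
    return json.dumps(body)
```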
/api/device/{id}/power-off/
* GET to initiate a power-off of this device. Use the power-cycle API to
turn power back on.
/api/device/{id}/clear-pxe/
* POST to clear the PXE configuration for the device. Call this after a
power-cycle operation with a `pxe_config` argument has been successful, so
that any subsequent device-initiated reboots will not PXE boot.
/api/device/{id}/ping/
* GET to ping this device. Returns a JSON object with a `success` key, and
value true or false. The ping happens synchronously, and takes around a
half-second.
/api/relay/{id}/test/
* GET to test two-way communication with this relay board. Returns a JSON
object with a `success` key, and value true or false. The test happens
synchronously per relay board, and times out after about 10 seconds.
==== Information ====
/api/device/{id}/log/
* GET to get a list of all log lines for this device. The return value has
a 'log' key containing a list of objects representing log lines.
If the query parameter 'timeperiod={secs}' is added, only log entries from
the last {secs} seconds will be included. If the query parameter
'limit={count}' is added, only the last {count} log entries will be
included.
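The two optional query parameters can be combined, as this sketch of a URL builder shows. The function name and the base URL argument are hypothetical; only the path and the `timeperiod` and `limit` parameters come from the description above.

```python
from urllib.parse import urlencode


def device_log_url(base_url, device_id, timeperiod=None, limit=None):
    """Build the log URL for a device, adding only the filters given."""
    params = {}
    if timeperiod is not None:
        params["timeperiod"] = timeperiod  # seconds
    if limit is not None:
        params["limit"] = limit            # number of entries
    url = "%s/api/device/%s/log/" % (base_url.rstrip("/"), device_id)
    if params:
        url += "?" + urlencode(params)
    return url
```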
/api/device/{id}/bootconfig/
* GET to get the boot configuration string set for this device.
/api/device/{id}/set-comments/
* POST to set the comments for a device. The body should contain a 'comments'
key.
/api/device/{id}/set-environment/
* POST to set the environment for a device. The body should contain an
'environment' key.
/api/environment/list/
* GET to get a list of all environments containing one or more devices.
Returns an object with an 'environments' key.
/api/image/list/
* GET to get details on all images. Returns an object with an 'images'
key, whose value is a list of image objects. Each image object has
keys 'id', 'name', 'boot_config_keys', 'can_reuse', 'hidden', and
'has_sut_agent'.
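As an illustration of consuming the image list, the sketch below picks out the names of images that are not hidden. It assumes the 'hidden' value is a boolean (or otherwise truthy/falsy); the function name is hypothetical.

```python
def visible_image_names(response):
    """Return the names of images whose 'hidden' flag is false.

    `response` is the decoded JSON object returned by /api/image/list/.
    """
    return [image["name"] for image in response["images"] if not image["hidden"]]
```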
==== PXE Configs ====
/api/bmm/pxe_config/list/
* GET returns a JSON response body whose "pxe_configs" key
contains an array of the names of PXE configs known to the system.
PXE config names can be passed as the id in the following PXE config APIs.
With `?active_only=1` appended, this will return only active PXE configs.
/api/bmm/pxe_config/{id}/details/
* GET returns a JSON response body whose "details" key contains
an object that provides information about this PXE config.
The keys of this object are: "name", "version", "description" and
"content".
== Machine States ==
Mozpool models both devices and requests as state machines and exposes their
current state via the REST API above. The states for each state machine are
also considered part of the API.
Machines in an unrecognized state should be treated as in an undefined state.
These states are transient, and only relevant internally. However, the states
documented here can provide useful information, and any change to their meaning
from version to version will be called out specifically and treated as a
compatibility-breaking change.
== Mozpool ==
Each request is modeled as a state machine by Mozpool. Each request will go
through a sequence of transient states before entering either the `ready`
state or one of the failure states. From the `ready` state it will go to the
`closed` state when it is explicitly returned or when the duration expires.
If the duration expires before the device becomes ready, then it may never
enter the `ready` state.
After submitting a request, clients should poll the request status waiting for
`ready` or `closed` (in case a request expires before it is ready) or one of
the failed states below.
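The polling loop described above can be sketched as follows. The state is obtained through a caller-supplied function so the loop itself makes no network assumptions; `fetch_state` is one possible implementation. The function names, the polling interval, and the timeout are all assumptions. Failure is detected by the `failed_` prefix, which all four request failure states documented below share.

```python
import json
import time
import urllib.request


def fetch_state(status_url):
    """GET a request's status URL and return its "state" string."""
    with urllib.request.urlopen(status_url) as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]


def wait_for_request(get_state, interval=10.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the request is ready, closed, or failed.

    Returns the final state, or raises TimeoutError if none is reached
    within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in ("ready", "closed") or state.startswith("failed_"):
            return state
        if clock() >= deadline:
            raise TimeoutError("request still in state %r" % state)
        sleep(interval)
```

A caller would pass something like `lambda: fetch_state(status_url)` and use the device only if the returned state is `ready`.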
=== Failed States ===
`failed_device_not_found`
* No suitable, unassigned device could be found. This indicates that after
trying several devices, Mozpool could not satisfy the request.
`failed_bad_image`
* The requested image appears to be bad. This state indicates that attempts
to install the image failed in a way that implicates the image, rather than
the device. It is probably fruitless to attempt to continue to try to
install the image on other devices.
`failed_bad_device`
* The requested device appears to be bad. This state only occurs for requests
for a specific device.
`failed_device_busy`
* The specific requested device is not available. While Mozpool will retry
the device several times, it eventually gives up.
== Lifeguard ==
Each device's state is modeled separately. The normal states below represent
phases of normal operation, while the failed states represent permanent
failures that must be remedied by a human.
=== Normal States ===
`new`
* Newly-added devices show up in this state. Mozpool will not allocate these.
They will automatically be self-tested, and either enter the `ready` state or
be marked failed.
`ready`
* The device is functional and not being manipulated by Lifeguard. It may or
may not be assigned to a request. The Mozpool layer selects un-assigned
devices in the ready state to satisfy new requests.
`maintenance_mode`
* The device is booted into maintenance mode, with an open SSH shell allowing
user maintenance. Devices in this state can be re-imaged or power-cycled to
return to the ready state.
`locked_out`
* Lifeguard will not touch devices in this state. It is used for devices
which are in Lifeguard's list, but are managed by other mechanisms. The
'please' requests don't work in this state; the device must first be forced
back to another normal state.
=== Failed States ===
Sometimes devices go bad. Bad device! Lifeguard will generally determine this
when a device repeatedly misbehaves. When this occurs, it will assign the
device to a failed state.
These states can be recognized by the `failed_` prefix. Clients should not
attempt to interpret the remainder of the state name, as that may change from
version to version of Mozpool.
A device in a failed state will remain in that state until it is addressed by a
human. To detect a device recovery, poll the device's status (using a long
timer) and wait for it to reach the `ready` state.