Cleanup README and documentation. Add docs for Prometheus Metrics. (#422)

* Cleanup README and documentation. Add docs for Prometheus Metrics.

* Fix link

* rename OpenWhisk to IBM Cloud

* cluster paragraph first

* fix typo in clustering diagram

* add info about CE
mthenw committed May 14, 2018
1 parent 517f1ec commit e5138aa
Showing 8 changed files with 224 additions and 179 deletions.
202 changes: 49 additions & 153 deletions README.md

Large diffs are not rendered by default.

13 changes: 11 additions & 2 deletions docs/api.md
@@ -1,6 +1,7 @@
# API documentation
# API

This document contains the API documentation for the Events API and the Configuration API in the Event Gateway. You can also find links to OpenAPI specs for these APIs.
The Event Gateway has two APIs: the Configuration API for registering functions and subscriptions, and the runtime Events API for sending events into the Event Gateway.
This document contains the API documentation for both the Events API and the Configuration API. You can also find links to the OpenAPI specs for these APIs.

## Contents

@@ -463,6 +464,14 @@ JSON object:
* `path` - `string` - optional, in case of `http` event, path that accepts requests
* `cors` - `object` - optional, in case of `http` event, CORS configuration

### Prometheus Metrics

Endpoint exposing [Prometheus metrics](./prometheus-metrics.md).

**Endpoint**

`GET <Configuration API URL>/metrics`

### Status

Dummy endpoint (always returning `200 OK` status code) for checking if the event gateway instance is running.
33 changes: 33 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,33 @@
# Architecture

```
                              ┌──────────────┐
                              │              │
                              │    Client    │
                              │              │
                              └──────────────┘
                                    Event
        ┌───────────────────────────────────────────────────────────┐
        │                                                           │
        │                   Event Gateway Cluster                   │
        │                                                           │
        └───────────────────────────────────────────────────────────┘
        ┌─────────────────────────────┼─────────────────────────────┐
        │                             │                             │
        │                             │                             │
        ▼                             ▼                             ▼
┌───────────────┐             ┌───────────────┐             ┌───────────────┐
│  AWS Lambda   │             │ Google Cloud  │             │Azure Function │
│   Function    │             │   Function    │             │               │
│               │             │               │             │    Region:    │
│    Region:    │             │    Region:    │             │    West US    │
│   us-east-1   │             │  us-central1  │             │               │
└───────────────┘             └───────────────┘             └───────────────┘
```
51 changes: 51 additions & 0 deletions docs/clustering.md
@@ -0,0 +1,51 @@
# Clustering

The Event Gateway is a horizontally scalable system. It can be scaled by adding instances to the cluster. A cluster is
a group of instances sharing the same database. A cluster can be created in one cloud region, across multiple regions,
across multiple cloud providers, or even across both cloud and on-premises data centers.

The Event Gateway instances use a strongly consistent, subscribable DB (initially [etcd](https://coreos.com/etcd),
with support for Consul and ZooKeeper planned) to store and broadcast configuration. Each instance locally
caches the configuration to drive low-latency event routing. The local cache is built asynchronously from
events emitted by the backing DB.
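
The asynchronous cache described above can be pictured with a minimal sketch. It assumes a hypothetical `ConfigEvent` change notification and `LocalCache` type; neither name comes from the Event Gateway code base, and a plain Go channel stands in for the backing DB's watch stream.

```go
package main

import (
	"fmt"
	"sync"
)

// ConfigEvent is a hypothetical change notification from the backing DB.
type ConfigEvent struct {
	Key     string
	Value   string
	Deleted bool
}

// LocalCache is a hypothetical in-memory copy of the shared configuration.
type LocalCache struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewLocalCache returns an empty cache.
func NewLocalCache() *LocalCache {
	return &LocalCache{data: make(map[string]string)}
}

// Run applies change events to the cache until the channel is closed.
func (c *LocalCache) Run(events <-chan ConfigEvent) {
	for ev := range events {
		c.mu.Lock()
		if ev.Deleted {
			delete(c.data, ev.Key)
		} else {
			c.data[ev.Key] = ev.Value
		}
		c.mu.Unlock()
	}
}

// Get reads from the local cache only; it never contacts the backing DB.
func (c *LocalCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func main() {
	events := make(chan ConfigEvent)
	cache := NewLocalCache()
	done := make(chan struct{})
	go func() {
		cache.Run(events)
		close(done)
	}()

	// Simulated change notifications; the keys and values are made up.
	events <- ConfigEvent{Key: "functions/hello", Value: "registered"}
	events <- ConfigEvent{Key: "functions/old", Value: "registered"}
	events <- ConfigEvent{Key: "functions/old", Deleted: true}
	close(events)
	<-done

	v, ok := cache.Get("functions/hello")
	fmt.Println(v, ok)
}
```

Because routing reads only the local copy, a lookup never waits on the database; the trade-off is that an instance may briefly route with configuration that is slightly out of date.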

The Event Gateway is a stateless service, and there is no direct communication between instances. All
configuration data is shared through the backing DB. If an instance in region 1 needs to call a function in region 2, the
invocation is not routed through the instance in region 2. The instance in region 1 invokes the function in region 2
directly.

```
┌─────────────────────────────────────────────Event Gateway Cluster──────────────────────────────────────────────┐
│ │
│ │
│ Cloud Region 1───────┐ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │ │ │ │
│ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│─ ▶│etcd cluster │◀ ┼ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ │ │ │ │ │ │
│ │ │ └─────────────┘ │ │
│ │ ▲ │ │ │
│ │ │ │ │
│ Cloud Region 2───────┐ │ │ │ Cloud Region 3───────┐ │
│ │ │ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │ ▼ │ │
│ │ ┌───────────────┐ │ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ Event Gateway │ │ │ │Event Gateway │ │ │ │Event Gateway │ │ │
│ │ │ instance │◀┼──────────┐ │ │ instance │◀─┼──────────┐ │ │ instance │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ └───────────────┘ │ │ │ └──────────────┘ │ │ │ └──────────────┘ │ │
│ │ ▲ │ │ │ ▲ │ │ │ ▲ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │
│ │ ▼ │ │ │ ▼ │ │ │ ▼ │ │
│ │ ┌───┐ │ │ │ ┌───┐ │ │ │ ┌───┐ │ │
│ │ │ λ ├┐ │ └───┼───────▶│ λ ├┐ │ └────┼───────▶│ λ ├┐ │ │
│ │ └┬──┘│ │ │ └┬──┘│ │ │ └┬──┘│ │ │
│ │ └───┘ │ │ └───┘ │ │ └───┘ │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
│ │
│ │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
24 changes: 0 additions & 24 deletions docs/plugins.md

This file was deleted.

22 changes: 22 additions & 0 deletions docs/prometheus-metrics.md
@@ -0,0 +1,22 @@
# Prometheus Metrics

Both the Events API and the Configuration API expose Prometheus metrics. The metrics are accessible via the `/v1/metrics` endpoint.

## Events API Metrics

| Metric Name | Description | Type | Labels |
| --------------------------------- | ------------------------------------------------------------ | ------- | ---------------- |
| `gateway_events_received_total` | Total of events received. | Counter | `space`, `type` |
| `gateway_events_processed_total` | Total of processed events. | Counter | `space`, `type` |
| `gateway_events_dropped_total` | Total of events dropped due to insufficient processing power. | Counter | `space`, `type` |
| `gateway_events_backlog` | Gauge of asynchronous events count waiting to be processed. | Gauge | |
| `gateway_events_custom_processing_seconds` | Bucketed histogram of processing duration of an event. From receiving the asynchronous custom event to calling a function. | Histogram | |

## Configuration API Metrics

| Metric Name | Description | Type | Labels |
| ----------------------------------------- | ------------------------------------------------------------ | --------- | --------------------------------- |
| `gateway_functions_total` | Gauge of registered functions count. | Gauge | `space` |
| `gateway_subscriptions_total` | Gauge of created subscriptions count. | Gauge | `space` |
| `gateway_config_requests_total` | Total of Config API requests. | Counter | `space`, `resource`, `operation` |
| `gateway_config_request_duration_seconds` | Bucketed histogram of request duration of Config API requests. | Histogram | |
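
Counters such as `gateway_events_received_total` are reported once per label combination, so a total across all spaces and event types has to be summed by the consumer. The sketch below is one way to do that, assuming the metrics text has already been fetched and follows the Prometheus text exposition format. The `sumSamples` helper and the sample label values are illustrative only and not part of the Event Gateway.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumSamples adds up every sample of the named metric across all label sets.
func sumSamples(exposition, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' or space.
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		// Drop the label block, if any; the value is the next field.
		rest := line[end:]
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("no value in line %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	// Hypothetical scrape output; the label values are invented.
	sample := `# TYPE gateway_events_received_total counter
gateway_events_received_total{space="default",type="user.created"} 12
gateway_events_received_total{space="default",type="http"} 30
gateway_events_backlog 3
`
	total, err := sumSamples(sample, "gateway_events_received_total")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("events received:", total)
}
```

In a real deployment this aggregation is normally left to Prometheus itself (for example with `sum()` in a query); the sketch only shows how the labelled samples relate to the totals in the tables.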
11 changes: 11 additions & 0 deletions docs/reliability-guarantees.md
@@ -0,0 +1,11 @@
# Reliability Guarantees

## Events are not durable

An event received by the Event Gateway is stored only in memory; it is not persisted to disk before processing. This means that in case of a hardware failure or software crash, the event may not be delivered to the subscriber. For a synchronous subscription (`http` or `invoke` event), this can manifest as an error message returned to the requester. For an asynchronous custom event with multiple subscribers, it means that the event may not be delivered to all of the subscribers.

## Events are delivered _at most once_

The Event Gateway attempts to deliver an event only once, so any event successfully received by the Event Gateway is guaranteed to be received by the subscriber _at most once_. That said, a provider implementation may retry under specific circumstances, but these retries should not cause the same event to be delivered multiple times. For example, providers for AWS services that use the AWS SDK are subject to the automatic retry logic built into the SDK ([AWS documentation on API retries](https://docs.aws.amazon.com/general/latest/gr/api-retries.html)).

The AWS Lambda provider uses the `RequestResponse` invocation type, which means that the retry logic for asynchronous AWS events doesn't apply here. Among other things, this means that failed deliveries of custom events are not sent to a dead-letter queue (DLQ). For more information, see the "Synchronous invocation" section of [Understanding Retry Behavior](https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html).
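
A minimal sketch of what at-most-once delivery means in practice. The `Subscriber` type, the `deliverAtMostOnce` helper, and `ErrUnavailable` are hypothetical names made up for this example, not part of the Event Gateway code base. Each subscriber is called exactly once, and a failed call is counted rather than retried or queued.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable is a hypothetical delivery error used by the example.
var ErrUnavailable = errors.New("subscriber unavailable")

// Subscriber is a hypothetical delivery target.
type Subscriber func(payload string) error

// deliverAtMostOnce calls every subscriber exactly once.
// Failures are counted, never retried and never queued.
func deliverAtMostOnce(payload string, subs []Subscriber) (delivered, failed int) {
	for _, s := range subs {
		if err := s(payload); err != nil {
			failed++
			continue
		}
		delivered++
	}
	return delivered, failed
}

func main() {
	calls := 0
	healthy := func(string) error { return nil }
	flaky := func(string) error {
		calls++
		return ErrUnavailable
	}

	delivered, failed := deliverAtMostOnce("user.created", []Subscriber{healthy, flaky})
	fmt.Printf("delivered=%d failed=%d flaky calls=%d\n", delivered, failed, calls)
}
```

A subscriber that must not miss events therefore needs its own recovery mechanism, since the failed call in this model is simply lost.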
47 changes: 47 additions & 0 deletions docs/system-events-and-plugin-system.md
@@ -0,0 +1,47 @@
# System Events and Plugin System

## System Events

System Events are a special type of event emitted internally by the Event Gateway instance. They are emitted at each stage
of the event processing flow, from receiving an event to the end of the function invocation. These events are:

* `gateway.event.received` - emitted when an event is received by the Events API. Data fields:
  * `event` - event payload
  * `path` - Events API path
  * `headers` - HTTP request headers
* `gateway.function.invoking` - emitted before a function is invoked. Data fields:
  * `event` - event payload
  * `functionId` - registered function ID
* `gateway.function.invoked` - emitted after a successful function invocation. Data fields:
  * `event` - event payload
  * `functionId` - registered function ID
  * `result` - function response
* `gateway.function.invocationFailed` - emitted after a failed function invocation. Data fields:
  * `event` - event payload
  * `functionId` - registered function ID
  * `error` - invocation error

## Plugin System

The Event Gateway is built with extensibility in mind. The built-in plugin system allows plugins to react to system events and
to manipulate how an event is processed by the Event Gateway.

_The current implementation only supports plugins written in Go. We plan to support other languages in the future._

The plugin system is based on [go-plugin](https://github.com/hashicorp/go-plugin). A plugin needs to implement the following
interface:

```go
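type Reacter interface {
	// Subscriptions lists the system events the plugin subscribes to.
	Subscriptions() []Subscription
	// React is called for every system event the plugin subscribed to.
	React(event event.Event) error
}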
```

The `Subscription` model specifies the event that a plugin subscribes to and the subscription type. A subscription can be either
sync or async. With a sync (blocking) subscription, if the `React` method returns an error, the event is not
processed any further by the Event Gateway.

The `React` method is called for every system event that the plugin has subscribed to.

For more details, see [the example plugin](../plugin/example).
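
As a rough illustration of the interface above, the following self-contained sketch implements a sync plugin that blocks the invocation of selected functions. The `Event`, `Subscription`, and `SubscriptionType` types, the `Sync`/`Async` constants, and the `BlockList` plugin are simplified stand-ins invented for this example. The real definitions live in the Event Gateway source and may differ, and the go-plugin wiring is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified stand-in (assumption) for the gateway's event type.
type Event struct {
	Type string
	Data map[string]interface{}
}

// SubscriptionType is a stand-in for the sync/async subscription type.
type SubscriptionType int

const (
	Async SubscriptionType = iota
	Sync
)

// Subscription is a stand-in for the model described above.
type Subscription struct {
	EventType string
	Type      SubscriptionType
}

// Reacter mirrors the interface from the documentation, using the stand-in types.
type Reacter interface {
	Subscriptions() []Subscription
	React(event Event) error
}

// BlockList is a hypothetical plugin that rejects invocations of listed function IDs.
type BlockList struct {
	Blocked map[string]bool
}

// Subscriptions asks for a sync subscription so that an error stops processing.
func (b BlockList) Subscriptions() []Subscription {
	return []Subscription{{EventType: "gateway.function.invoking", Type: Sync}}
}

// React returns an error when the function about to be invoked is blocked.
func (b BlockList) React(ev Event) error {
	id, _ := ev.Data["functionId"].(string)
	if b.Blocked[id] {
		return errors.New("function " + id + " is blocked")
	}
	return nil
}

func main() {
	var p Reacter = BlockList{Blocked: map[string]bool{"legacy": true}}
	ev := Event{
		Type: "gateway.function.invoking",
		Data: map[string]interface{}{"functionId": "legacy"},
	}
	if err := p.React(ev); err != nil {
		fmt.Println("stopping event processing:", err)
	}
}
```

The plugin subscribes to `gateway.function.invoking` rather than `gateway.function.invoked` because only an error raised before the invocation can prevent it.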
