Skip to content

Latest commit

 

History

History
573 lines (417 loc) · 24.5 KB

input-cel.asciidoc

File metadata and controls

573 lines (417 loc) · 24.5 KB

Common Expression Language input

experimental[]

CEL

Use the cel input to read messages from a file path or HTTP API with a variety of payloads using the Common Expression Language (CEL) and the mito CEL extension libraries.

CEL is a non-Turing complete language that can perform evaluation of expression in inputs, which can include file and API endpoints using the mito extension library. The cel input periodically runs a CEL program that is given an execution environment that may be configured by the user, and publishes the set of events that result from the program evaluation. Optionally the CEL program may return cursor states that will be provided to the next execution of the CEL program. The cursor states may be used to control the behaviour of the program.

This input supports:

  • Auth

    • Basic

    • OAuth2

  • Retrieval at a configurable interval

  • Pagination

  • Retries

  • Rate limiting

  • Proxying

Example configurations:

filebeat.inputs:
# Fetch your public IP every minute.
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    bytes(get(state.url).Body).as(body, {
        "events": [body.decode_json()]
    })
filebeat.inputs:
- type: cel
  resource.url: http://localhost:9200/_search
  state:
    scroll: 5m
  program: |
    (
        !has(state.cursor) || !has(state.cursor.scroll_id) ?
            post(state.url+"?scroll=5m", "", "")
        :
            post(
                state.url+"/scroll?"+{"scroll_id": [state.cursor.scroll_id]}.format_query(),
                "application/json",
                {"scroll": state.scroll}.encode_json()
            )
    ).as(resp, bytes(resp.Body).decode_json().as(body, {
            "events": body.hits.hits,
            "cursor": {"scroll_id": body._scroll_id},
    }))

Execution

The execution environment provided for the input includes includes the function, macros and global variables provided by the mito and ext.Strings libraries. A single JSON object is provided as an input accessible through a state variable. state contains a string url field and may contain arbitrary other fields configured via the input’s state configuration. If the CEL program saves cursor states between executions of the program, the configured state.cursor value will be replaced by the saved cursor prior to execution.

On start the state is will be something like this:

{
    "url": ,
    "cursor": { ... },
    ...
}

The state.url field will be present and may be an HTTP end-point or a file path. It is the responsibility of the CEL program to handle removing the scheme from a file URL if it is present. The state.url field may be mutated during execution of the program, but the mutated state will not be persisted between restarts The state.url field must be present in the returned value to ensure that it is available in the next evaluation unless the program has the resource address hard-coded in or it is available from the cursor.

Additional fields may be present at the root of the object and if the program tolerates it, the cursor value may be absent. Only the cursor is persisted over restarts, but all fields in state are retained between iterations of the processing loop except for the produced events array, see below.

If the cursor is present the program should perform and process requests based on its value. If cursor is not present the program must have alternative logic to determine what requests to make.

After completion of a program’s execution it should return a single object with a structure looking like this:

{
    "events": [ <1>
        {...},
        ...
    ],
    "cursor": [ <2>
        {...},
        ...
    ],
    "url": ,
    "status_code": ,
    "header": ,
    "rate_limit": , <3>
    "want_more": false <4>
}
  1. The events field must be present, but may be empty or null. If it is not empty, it must only have objects as elements. The field should be an array, but in the case of an error condition in the CEL program it is acceptable to return a single object instead of an array; this will will be wrapped as an array for publication and an error will be logged.

  2. If cursor is present it must be either be a single object or an array with the same length as events; each element i of the cursor will be the details for obtaining the events at and beyond event i in the events array. If the cursor is a single object it is will be the details for obtaining events after the last event in the events array and will only be retained on successful publication of all the events in the events array.

  3. If rate_limit is present it must be a map with numeric fields rate and burst. The rate_limit field may also have a string error field and other fields which will be logged. If it has an error field, the rate and burst will not be used to set rate limit behavior. The Limit, and Okta Rate Limit policy and Draft Rate Limit policy documentation show how to construct this field.

  4. The evaluation is repeated with the new state, after removing the events field, if the "want_more" field is present and true, and a non-zero events array is returned.

The status_code, header and rate_limit values may be omitted if the program is not interacting with an HTTP API end-point and so will not be needed to contribute to program control.

Debug state logging

The CEL input will log the complete state after evaluation when logging at the DEBUG level. This will include any sensitive or secret information kept in the state object, and so DEBUG level logging should not be used in production when sensitive information is retained in the state object.

CEL extension libraries

As noted above the cel input provides function, macro and global variables to extend the language.

In addition to the extensions provided in the packages listed above, a global variable useragent is also provided which gives the user CEL program access to the {beatname_lc} user-agent string.

Additionally, it supports authentication via Basic auth, HTTP Headers or oauth2.

Example configurations with authentication:

filebeat.inputs:
- type: cel
  resource.url: http://localhost
  request.transforms:
    - set:
        target: header.Authorization
        value: 'Basic aGVsbG86d29ybGQ='
filebeat.inputs:
- type: cel
  auth.oauth2:
    client.id: 12345678901234567890abcdef
    client.secret: abcdef12345678901234567890
    token_url: http://localhost/oauth2/token
  resource.url: http://localhost
filebeat.inputs:
- type: cel
  auth.oauth2:
    client.id: 12345678901234567890abcdef
    client.secret: abcdef12345678901234567890
    token_url: http://localhost/oauth2/token
    user: user@domain.tld
    password: P@$$W0₹D
  resource.url: http://localhost

Input state

The cel input keeps a runtime state between requests. This state can be accessed by the CEL program and may contain arbitrary objects.

The state must contain a url string and may contain any object the user wishes to store in it.

All objects are stored at runtime, except cursor, which has values that are persisted between restarts.

Configuration options

The cel input supports the following configuration options plus the [{beatname_lc}-input-cel-common-options] described later.

interval

Duration between repeated requests. It may make additional pagination requests in response to the initial request if pagination is enabled. Default: 60s.

auth.basic.enabled

When set to false, disables the basic auth configuration. Default: true.

Note
Basic auth settings are disabled if either enabled is set to false or the auth.basic section is missing.

auth.basic.user

The user to authenticate with.

auth.basic.password

The password to use.

auth.oauth2.enabled

When set to false, disables the oauth2 configuration. Default: true.

Note
OAuth2 settings are disabled if either enabled is set to false or the auth.oauth2 section is missing.

auth.oauth2.provider

Used to configure supported oauth2 providers. Each supported provider will require specific settings. It is not set by default. Supported providers are: azure, google.

auth.oauth2.client.id

The client ID used as part of the authentication flow. It is always required except if using google as provider. Required for providers: default, azure.

auth.oauth2.client.secret

The client secret used as part of the authentication flow. It is always required except if using google as provider. Required for providers: default, azure.

auth.oauth2.user

The user used as part of the authentication flow. It is required for authentication - grant type password. It is only available for provider default.

auth.oauth2.password

The password used as part of the authentication flow. It is required for authentication - grant type password. It is only available for provider default.

Note
user and password are required for grant_type password. If user and password is not used then it will automatically use the token_url and client credential method.

auth.oauth2.scopes

A list of scopes that will be requested during the oauth2 flow. It is optional for all providers.

auth.oauth2.token_url

The endpoint that will be used to generate the tokens during the oauth2 flow. It is required if no provider is specified.

Note
For azure provider either token_url or azure.tenant_id is required.

auth.oauth2.endpoint_params

Set of values that will be sent on each request to the token_url. Each param key can have multiple values. Can be set for all providers except google.

- type: cel
  auth.oauth2:
    endpoint_params:
      Param1:
        - ValueA
        - ValueB
      Param2:
        - Value

auth.oauth2.azure.tenant_id

Used for authentication when using azure provider. Since it is used in the process to generate the token_url, it can’t be used in combination with it. It is not required.

auth.oauth2.azure.resource

The accessed WebAPI resource when using azure provider. It is not required.

auth.oauth2.google.credentials_file

The credentials file for Google.

Note
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.

auth.oauth2.google.credentials_json

Your credentials information as raw JSON.

Note
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.

auth.oauth2.google.jwt_file

The JWT Account Key file for Google.

Note
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.

auth.oauth2.google.jwt_json

The JWT Account Key file as raw JSON.

Note
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.

auth.oauth2.google.delegated_account

Email of the delegated account used to create the credentials (usually an admin). Used in combination with auth.oauth2.google.jwt_file or auth.oauth2.google.jwt_json.

resource.url

The URL of the HTTP API. Required.

The API endpoint may be accessed via unix socket and Windows named pipes by adding +unix or +npipe to the URL scheme, for example, http+unix:///var/socket/.

resource.timeout

Duration before declaring that the HTTP client connection has timed out. Valid time units are ns, us, ms, s, m, h. Default: 30s.

resource.ssl

This specifies SSL/TLS configuration. If the ssl section is missing, the host’s CAs are used for HTTPS connections. See [configuration-ssl] for more information.

resource.keep_alive.disable

This specifies whether to disable keep-alives for HTTP end-points. Default: true.

resource.keep_alive.max_idle_connections

The maximum number of idle connections across all hosts. Zero means no limit. Default: 0.

resource.keep_alive.max_idle_connections_per_host

The maximum idle connections to keep per-host. If zero, defaults to two. Default: 0.

resource.keep_alive.idle_connection_timeout

The maximum amount of time an idle connection will remain idle before closing itself. Valid time units are ns, us, ms, s, m, h. Zero means no limit. Default: 0s.

resource.retry.max_attempts

The maximum number of retries for the HTTP client. Default: 5.

resource.retry.wait_min

The minimum time to wait before a retry is attempted. Default: 1s.

resource.retry.wait_max

The maximum time to wait before a retry is attempted. Default: 60s.

resource.redirect.forward_headers

When set to true request headers are forwarded in case of a redirect. Default: false.

resource.redirect.headers_ban_list

When redirect.forward_headers is set to true, all headers except the ones defined in this list will be forwarded. Default: [].

resource.redirect.max_redirects

The maximum number of redirects to follow for a request. Default: 10.

resource.rate_limit.limit

The value of the response that specifies the maximum overall resource request rate.

resource.rate_limit.burst

The maximum burst size. Burst is the maximum number of resource requests that can be made above the overall rate limit.

resource.tracer.filename

It is possible to log HTTP requests and responses in a CEL program to a local file-system for debugging configurations. This option is enabled by setting the resource.tracer.filename value. Additional options are available to tune log rotation behavior.

Enabling this option compromises security and should only be used for debugging.

resource.tracer.maxsize

This value sets the maximum size, in megabytes, the log file will reach before it is rotated. By default logs are allowed to reach 1MB before rotation.

resource.tracer.maxage

This specifies the number days to retain rotated log files. If it is not set, log files are retained indefinitely.

resource.tracer.maxbackups

The number of old logs to retain. If it is not set all old logs are retained subject to the resource.tracer.maxage setting.

resource.tracer.localtime

Whether to use the host’s local time rather that UTC for timestamping rotated log file names.

resource.tracer.compress

This determines whether rotated logs should be gzip compressed.

cursor

Cursor is an object available as state.cursor where arbitrary values may be stored. Cursor state is kept between input restarts and updated after each event of a request has been published. When a cursor is used the CEL program must create a cursor state for each event that is returned by the program.

filebeat.inputs:
# Fetch your public IP every minute and note when the last request was made.
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    bytes(get(state.url).Body).as(body, {
        "events": [body.decode_json().with({
            "last_requested_at": has(state.cursor) && has(state.cursor.last_requested_at) ?
                state.cursor.last_requested_at
            :
                now
        })],
        "cursor": {"last_requested_at": now}
    })

redact.fields

This specifies fields in the state to be redacted prior to debug logging. Fields listed in this array will be either replaced with a * or deleted entirely from messages sent to debug logs.

redact.delete

This specifies whether fields should be replaced with a * or deleted entirely from messages sent to debug logs. If delete is true, fields will be deleted rather than replaced.