[Elastic Agent] Allowlist / Blocklist for inputs an Agent supports with Fleet #21000

ruflin · 2020-09-07T06:47:15Z

When an Agent is run with Fleet, today any input that is supported by a package can be run. But there are cases where the administrator of the Agent itself would like to limit the type of inputs that can be run. Few examples:

An agent should only be used for heartbeat pings
An agent should not be allowed to run endpoint
An endpoint should only be used to collect system metrics but not logs

The Agent needs a way to allow / block list certain inputs when it is run so Fleet cannot send down any configs which contain these inputs or the not allowed inputs are ignored. Which inputs are supported or not supported should also be sent up to Fleet to make Fleet smart around what can be configured and what not.

elasticmachine · 2020-09-07T06:47:16Z

Pinging @elastic/ingest-management (Team:Ingest Management)

ph · 2020-12-18T14:53:50Z

A few questions that need to be answered which will affect the design and implementation.

Agents:

It is an enroll time only configuration
It is part of the agent configuration
Once set can a user change it?

Kibana side is elastic/kibana#76841

Looping PM @mostlyjason

michalpristas · 2021-01-13T10:03:43Z

we were talking with @ph about this feature and question arised, aside design where should implementation take place, would it be fleet server responsible for specifying these based on some metadata
or should it be implemented fleet side

ruflin · 2021-01-13T10:06:42Z

@michalpristas Not sure I follow, this should be an Elastic Agent. This feature should also work in standalone even though I don't see a use case there. When enrolled into fleet, the Elastic Agent must report its capabilities.

michalpristas · 2021-01-13T10:13:47Z

i think the question comes to what @ph said a comment above: at what point this should be defined?
should it be enroll time, part of config etc.
Eventually it will be stored as part of the configuration but what should be the source of these restrictions (system detection/user specified config locally/user configuration fleet side/generated based on metadata on fleet server-side).

ruflin · 2021-01-13T10:21:37Z

As this is a restriction that should be applied independent of how the agent is run, I don't think it should happen on enroll time. The config seems like a natural place but that becomes a challenge when enrolling as it is overwritten.

For designing the feature I would take fleet out of the equation to keep it an independent feature.

ph · 2021-01-13T15:44:43Z

Agree on decoupling the Fleet in elastic/kibana#76841 and Agent implementation.
When the Agent implementation would be more advance would be good to sync with @jfsiii to figure a plan to expose it in fleet.

@michalpristas Do you have a proposal how a user would configure the restriction in the YAML?

mostlyjason · 2021-01-14T18:01:09Z

If we put in the config that means its set when the agent is started (or restarted)? It could change more often than enrollment since its a runtime setting?

Also, we could put it in a section of agent config that is not overwritten by fleet. For example, it could be in elastic-agent.yml. I think a file should be fine at least for a first version since I imagine its the operator persona who needs this feature, and they'll have access to deploy to the filesystem. We could introduce it in Fleet later once we have RBAC but we don't have that today. Without RBAC a service owner could override these settings and that's not the behavior we want to allow on Elastic Cloud, again from an operator perspective.

Also, I imagine we'd want way to report whether inputs are disabled due to a blocklist, perhaps in our status information to ES? This could be similar to how we report input conditions.

ph · 2021-01-14T19:04:23Z

@mostlyjason what you are proposing I think this is exactly what Ruflin and I had in mind but not explained as good as you.

For reporting it could either be status information... or even metadata on the agent?

ph · 2021-01-14T19:06:11Z

Including @jfsiii for awareness.

michalpristas · 2021-01-19T10:53:36Z

here's what i've been thinking about, feedback more than welcome
i could not come up with proper name so i just refer to ruleset as rules here

Rule structure

rule: allow|deny 
name: identifying_name 
input: input_name
when: eql_formula [optional]

i was considering a bunch of different options but this one seems most straightforward and leaves space for enhancements.
rule just specifies whether we're allowing or denying
input specifies input we're targeting this one's mandatory and allows * wildcard not full regex
when clause is then a full eql formula allowing access to stream and input fields

Everything is allowed by default with empty ruleset
As soon as rules are present, first matching rule wins.

I was also thinking option that will deny everything as soon as any allow rule is in the list. But would make it not that clear when combining allow/deny for same level (e.g allow system/logs deny system/* would result in allowing system/logs and nothing more not even */http)

with config

inputs:
	- type: logfile
	  streams:
	  	- paths: []{a.error,b.access}
	  	- paths: []{c,d}
	- type: system/logs
	  streams:
	  	- paths: []{a,b}
	  	- paths: []{c,d}
	- type: system/metrics
	  streams:
	  	- metricset: cpu
	  	- metricset: memory
	- endpoint

Agent is allowed everything

nothing needed

An agent should only be used for memory collection

rules:
- rule: allow
  input: system/metrics
  when: metricset == memory
- rule: deny
  input: *

An agent should not be allowed to run endpoint

rules:
- rule: deny
  input: endpoint
- rule: allow
  input: *

An agent should only be used to collect system metrics but not logs

rules:
- rule: deny
  input: system/logs

An agent should only be used to collect system information

rules:
- rule: allow
  input: system/*
- rule: deny
  input: *

An agent should collect only hb pings

rules:
- rule: allow
  input: */http
  when: type == icmp
- rule: deny
  input: *

Allow system/metrics but deny all other system inputs

rules:
- rule: allow
  input: system/metrics
- rule: deny
  input: system/*

I would split implementation into 3 stages possibly

1st one: rules support on input level
2nd one: support for when clause
3rd: more detailed filtering [optional]

What i mean by more detailed filtering is enabling subset of the stream to be applied
e.g with config

inputs:
	- type: logfile
	  streams:
                - paths: []{ other }
	  	- paths: []{a,b,c,d}

we would enable specifying rule in a way that it not only allows/denies stream based on a when condition but also filters it

after applying

rules:
- rule: allow
  input: logfile
  filter: 
     paths: hasString(paths, c) or hasString(paths, d)

where filter is a KV with key specifying path to object we're applying filter for
it would result in

inputs:
	- type: logfile
	  streams:
	  	- paths: []{c,d}

after applying

rules:
- rule: deny
  input: logfile
  filter: 
     paths: hasString(paths, c)

it would result in

inputs:
	- type: logfile
	  streams:
	  	- paths: []{a,b,d}

after applying

rules:
- rule: deny
  input: logfile
  when: hasString(paths,c)
  filter: 
     paths: hasString(paths, c)

it would result in

inputs:
	- type: logfile
	  streams:
	  	- paths: []{ other }
	  	- paths: []{a,b,d}

other is kept as scope of filter is reduced by when condition

ph · 2021-01-19T14:44:43Z

@urso @blakerouse You might be interested in this?

This is an interesting syntax, and it's built on top of what we have.

I don't want to increase the scope but I certainly see this useful in other places:

Program spec: if It can only be run on Windows.
Input: this input require root privileges. (run on port < 1024)

Do you see us standardizing on the same syntax or at least same code path?

blakerouse · 2021-01-19T15:30:41Z

I like this I think its very flexible. From this it seems like the end of the ruleset always has the following appended:

- rule: allow
  input: *

At first I was confused on the need for filter. But now I understand that is a way to still allow an input but isolated it to specific parameters in the input definition. This is very powerful, but is it needed is the question. Do we have a use case for this?

michalpristas · 2021-01-19T15:34:20Z

Filter is something which came to mind while i was thinking about options and structure and which might be powerful tool in our toolbox but I did not planed to include that into this release it's just something i wanted to raise as a stretch goal with very low priority

ph · 2021-01-19T15:51:34Z

I've played with the syntax in theory you also limit what output is used with something like this:

-rule: deny
 output: logstash

-rule: deny
 output: logstash
 when: ${ssl} != true

Concerning the filter its super powerful but I don't think we need it in the short term, maybe something we can investigate or let the users drive the requirements feature. @mostlyjason

@michalpristas I like that you have created an implementation plant in phase, keeping it at the "1st one: rules support on input level" for the first iteration look like a good idea.

ph · 2021-01-19T15:53:31Z

Maybe we should consider adding an optional or even mandatory name to a rule, I would probably want to have it display in the log while trying to debug it.

ruflin · 2021-01-20T09:26:48Z

One thing I would like to see is to be able to use this system also for outputs or for example upgrades, see #21096 Taking the proposal, we might be able to extend it like the following:

rules:
- type: output
  condition: output.name != elasticsearch
  rule: deny

- type: input
  condition: input.name == logfile
  rule: allow
  
- type: input
  condition: *
  rule: deny

- type: upgrade
  upgrade.enabled: true
  upgrade.bugfix_release: true
  upgrade.major_release: false

The above would potentially allow that each rule type could have its own independent implementation and additional params passed in like in the case of upgrade. Not happy with the proposal there yet but maybe something to iterate on?

Is there a reason to use when instead of condition like we have in the other parts of the config? Maybe we don't even need to rule part as it is just based on the output of the condition (true or false)?

@michalpristas Where will these rules be specified to make sure they also still apply after enrolling?

@michalpristas Is the rule under An agent should not be allowed to run endpoint correct?

michalpristas · 2021-01-20T09:43:07Z

i was going for when as we have that in spec and it also reads nicely
e.g rule: allow system/* when paths contains *.error reminds a bit more natural language
this is also the reason why rule and when condition is there as it is more clear what we want to deny/allow denying if when condition is false seems cumbersome and maybe more error prone.

regarding input/output/upgrade i like what @ph suggested above, it allows you to combine rule e.g

rule: deny
output: logstash
input: *
when: ${ssl} != true

which would disable all inputs/outputs not using ssl.

upgrade could be defined as (to allow bugfix

rule: allow
upgrade: *
when: ${version} LIKE 7.12.x

to allow minor

rule: allow
upgrade: *
when: ${version} LIKE 7.x.x

or update one major up

rule: allow
upgrade: *
when: ${version} LIKE 8.x.x

edit: endpoint example was indeed other way around

ruflin · 2021-01-20T12:10:30Z

I missed the part that we might want to add multiple conditions from different types together like your input / output example above. But I wonder if this applies in practice or is just theoretical?

On condition vs when: The when is in our internal spec file I assume and condition in the public config. I would prefer to not expose multiple terms to our users with the same syntax. I assume it will use the same code as condition?

@michalpristas Where will this be specified to work in standalone but also when enrolled into Fleet?

michalpristas · 2021-01-20T12:38:25Z

oh yeah,

i planned to have it elastic-agent.yml for standalone
in case of enroll it depends we can keep it here OR if we plan this to be configurable from fleet which i guess so it would move to fleet.yml after enroll

combining is theoretical
i think you're right about condition let's rename it then

michalpristas · 2021-01-20T12:45:47Z

i think we just need to be careful to evaluate rules before eql gets evaluated as it uses condition as a reserved key and removes all keys where condition is evaluated as false

ruflin · 2021-01-20T12:46:18Z

This should NOT be configurable through Fleet. This is important as the person setting up the Elastic Agent puts in these restrictions and Fleet should not be able to overwrite these.

ph · 2021-01-20T13:05:41Z

For an implementation stand point, not combining is probably easier to implement, validates and reason about.

urso · 2021-01-20T13:56:58Z

Configuration storage

This should NOT be configurable through Fleet. This is important as the person setting up the Elastic Agent puts in these restrictions and Fleet should not be able to overwrite these.

+100

i planned to have it elastic-agent.yml for standalone

In case of automated provisioning users might want to have a single file prepared that can be shared. But I presume one can run agent with multiple config files in that case?

In standalone case an administrator might even want to maintain the rules in a separate read-only config file, such that users that have write access to the elastic-agent.yml config can not overwrite rules.

Condition vs when

On condition vs when: The when is in our internal spec file I assume and condition in the public config. I would prefer to not expose multiple terms to our users with the same syntax. I assume it will use the same code as condition?

i think we just need to be careful to evaluate rules before eql gets evaluated as it uses condition as a reserved key and removes all keys where condition is evaluated as false

+1 on using condition. Having two similar keywords in different places can become confusing. Users don't care about nitty details why evaluation order forces us to use different keywords. If there is a potential issue, we need to fix that.

Filter

I would say: No. If users provide an invalid configuration according to the configured rules, reject it. Be strict! And report to logs or Fleet UI that some configuration has been rejected (let users see and maybe react).

Although filter can be very flexible, it also makes it more difficult to see which configuration is actually used to start the Beat. This can make debugging more difficult. filter adds an additional configuration modification layer.

In some places we use regular expressions and glob filters. For example the log input. As these match a "regular language", you will have a hard time to write a good filter expression in order to exclude some files/directories. In the filter use-case users might not want to modify the glob pattern, but want to add patterns to exclude_files (always).

michalpristas · 2021-01-20T14:10:40Z

i like having separate file for this. i can easily imagine using inspect command to see effects of different rule files on provided config.
naming wise, do you have any ideas how to name this. if in separate file elastic-agent-filter.yml elastic-agent-sieve.yml elastic-agent-ruleset.yml comes to mind but i'm completely open to any ideas because i dont like a single one of the ones i came up with

blakerouse · 2021-01-20T14:46:32Z

i was going for when as we have that in spec and it also reads nicely
e.g rule: allow system/* when paths contains *.error reminds a bit more natural language
this is also the reason why rule and when condition is there as it is more clear what we want to deny/allow denying if when condition is false seems cumbersome and maybe more error prone.

regarding input/output/upgrade i like what @ph suggested above, it allows you to combine rule e.g
rule: deny
output: logstash
input: *
when: ${ssl} != true
which would disable all inputs/outputs not using ssl.

When combining output with input in the same rule it feels to me this would mean that all inputs are disabled for the output logstash if ssl is not enabled. So other inputs would be enabled as long as the output is not logstash.

I also think that ${ssl} there should have a prefix of the reference first. So if your using say an environment variable to satisfy the condition it would be ${env.MY_VAR} == 'on'. To use the value of the evaluated input I think it would be ${input.ssl} == true.

blakerouse · 2021-01-20T14:47:23Z

i like having separate file for this. i can easily imagine using inspect command to see effects of different rule files on provided config.
naming wise, do you have any ideas how to name this. if in separate file elastic-agent-filter.yml elastic-agent-sieve.yml elastic-agent-ruleset.yml comes to mind but i'm completely open to any ideas because i dont like a single one of the ones i came up with

As for the rules, I also think a separate file is better. I would be okay with just rules.yml. Don't think it needs to be prefixed with elastic-agent.

ruflin · 2021-01-20T15:49:36Z

The way I think of this feature is capabilities that the Elastic Agent supports, much less of rules that define what can be in the configuration. From an implementation perspective, we will use "rules" to filter out config blocks which are not supported but this is an implementation detail.

Explaining this to a user, I would explain it the way that this Elastic Agent only supports the Elasticsearch output, or that the capabilities this Elastic Agent has is only input type apm and nothing else.

So on my end, I would land on capabilities.yml or something similar, not rules.

For the combination: Already our conversation above shows it is not easy to explain. Lets stay away from it until we have a specific requirement for it.

ph · 2021-01-21T12:30:10Z

@michalpristas Can you take 2-3 mins, add a summary comment with the plan this will help reduce any miscommunication.)

mostlyjason · 2021-01-21T20:01:11Z

report to logs or Fleet UI that some configuration has been rejected (let users see and maybe react)

+1 please consider how we track that an input is blocked. It'd be great to be able to see this in the agent details UI so a user can understand why no data is coming in or why the agent is unhealthy, then fix it. That means we should record it in a way that we can programmatically show the status in the UI

michalpristas · 2021-01-25T09:02:20Z

updated summary

Rule structure

rule: allow|deny 
name: identifying_name 
input: input_glob
condition: eql_formula [optional]

rule: allow|deny 
name: identifying_name 
output: output_glob
condition: eql_formula [optional]

rule: allow|deny 
name: identifying_name 
upgrade: ${version} LIKE 8.x.x

Everything is allowed by default with empty ruleset
As soon as rules are present, first matching rule wins.

with config

inputs:
	- type: logfile
	  streams:
	  	- paths: []{a.error,b.access}
	  	- paths: []{c,d}
	- type: system/logs
	  streams:
	  	- paths: []{a,b}
	  	- paths: []{c,d}
	- type: system/metrics
	  streams:
	  	- metricset: cpu
	  	- metricset: memory
	- endpoint

Agent is allowed everything

nothing needed

An agent should only be used for memory collection

rules:
- rule: allow
  input: system/metrics
  condition: metricset == memory
- rule: deny
  input: *

An agent should not be allowed to run endpoint

rules:
- rule: deny
  input: endpoint
- rule: allow
  input: *

An agent should only be used to collect system metrics but not logs

rules:
- rule: deny
  input: system/logs

An agent should only be used to collect system information

rules:
- rule: allow
  input: system/*
- rule: deny
  input: *

An agent should collect only hb pings

rules:
- rule: allow
  input: */http
  condition: type == icmp
- rule: deny
  input: *

Allow system/metrics but deny all other system inputs

rules:
- rule: allow
  input: system/metrics
- rule: deny
  input: system/*

I would split implementation into 3 stages possibly

1st one: rules support on input level
2nd one: support for condition clause

I still need to think about representation of blocked, one thing is that information will be in logs with sufficient details.
But i'm need to think how to simplify it for reporting purposes ideally as part of checkin.
If it would have been only for inputs it would be fine. but input can be partially allowed.
Would it be ok to use some kind of ID and information about REVOKED or PARTIALLY_REVOKED e.g

If input/output/upgrade contains ID this will be returned otherwise as: ${rule_name}-input/output/upgrade-${type}
e.g:

rule1-input-system/metrics: REVOKED
rule2-input-system/logs: PARTIALLY_REVOKED
Then user could lookup information in logs based on ID to see what precisely happened

urso · 2021-01-25T13:57:20Z

Would it be ok to use some kind of ID and information about REVOKED or PARTIALLY_REVOKED e.g

Would it make sense to have this as part of the status reporting API?

ruflin · 2021-01-25T15:59:48Z

To keep it simple, lets only focus on input for now so we can also circumvent the partial problem.

bradenlpreston · 2021-01-25T16:16:11Z

Question - are we settled on the name "Allowlist" and "Blocklist" ? - I worry that these names may cause confusion with blocklists and allowlists for other use cases (blocklist of file hashes on endpoint, blocklist of IPs, etc.)

Perhaps somethng like Available Integrations/Unavailable Integrations

@caitlinbetz , @kevinlog , @crowens

michalpristas · 2021-01-26T08:04:59Z

@bradenlpreston aside from rule: allow|deny it wont be part of naming.
i will try to move near capabilites for names.

ruflin · 2021-01-26T08:25:12Z

@bradenlpreston Nit: This is about inputs not integrations, just to make sure we are talking about the same thing. In the case of endpoing, the input is the integration but in most cases there are multiple inputs per integration.

bradenlpreston · 2021-01-26T14:27:29Z

thank you for the clarification @ruflin and @michalpristas

ruflin added the Team:Ingest Management label Sep 7, 2020

ruflin mentioned this issue Sep 7, 2020

[Fleet] Show agent as Unhealthy if agent reports an error about incompatible input(s) elastic/kibana#76841

Closed

ruflin mentioned this issue Sep 15, 2020

[Elastic Agent] Capabilities feature #21096

Closed

ph added the v7.12.0 label Oct 14, 2020

ph assigned michalpristas Dec 18, 2020

michalpristas mentioned this issue Feb 4, 2021

[Ingest Management] Agent supports capabilities definition #23848

Merged

6 tasks

michalpristas mentioned this issue Feb 15, 2021

Cherry-pick #23848 to 7.x: Agent supports capabilities definition #24037

Merged

6 tasks

michalpristas closed this as completed Feb 16, 2021

mostlyjason mentioned this issue May 10, 2021

[Heartbeat][Agent] Synthetics deps and Elastic Agent #22932

Closed

[Elastic Agent] Allowlist / Blocklist for inputs an Agent supports with Fleet #21000

[Elastic Agent] Allowlist / Blocklist for inputs an Agent supports with Fleet #21000

Comments

ruflin commented Sep 7, 2020

elasticmachine commented Sep 7, 2020

ph commented Dec 18, 2020

michalpristas commented Jan 13, 2021

ruflin commented Jan 13, 2021

michalpristas commented Jan 13, 2021

ruflin commented Jan 13, 2021

ph commented Jan 13, 2021

mostlyjason commented Jan 14, 2021 • edited Loading

ph commented Jan 14, 2021

ph commented Jan 14, 2021

michalpristas commented Jan 19, 2021 • edited Loading

Agent is allowed everything

An agent should only be used for memory collection

An agent should not be allowed to run endpoint

An agent should only be used to collect system metrics but not logs

An agent should only be used to collect system information

An agent should collect only hb pings

Allow system/metrics but deny all other system inputs

ph commented Jan 19, 2021

blakerouse commented Jan 19, 2021

michalpristas commented Jan 19, 2021

ph commented Jan 19, 2021

ph commented Jan 19, 2021 • edited Loading

ruflin commented Jan 20, 2021

michalpristas commented Jan 20, 2021 • edited Loading

ruflin commented Jan 20, 2021

michalpristas commented Jan 20, 2021 • edited Loading

michalpristas commented Jan 20, 2021

ruflin commented Jan 20, 2021

ph commented Jan 20, 2021

urso commented Jan 20, 2021 • edited Loading

Configuration storage

Condition vs when

Filter

michalpristas commented Jan 20, 2021

blakerouse commented Jan 20, 2021

blakerouse commented Jan 20, 2021

ruflin commented Jan 20, 2021

ph commented Jan 21, 2021

mostlyjason commented Jan 21, 2021 • edited Loading

michalpristas commented Jan 25, 2021 • edited Loading

Agent is allowed everything

An agent should only be used for memory collection

An agent should not be allowed to run endpoint

An agent should only be used to collect system metrics but not logs

An agent should only be used to collect system information

An agent should collect only hb pings

Allow system/metrics but deny all other system inputs

urso commented Jan 25, 2021

ruflin commented Jan 25, 2021

bradenlpreston commented Jan 25, 2021

michalpristas commented Jan 26, 2021

ruflin commented Jan 26, 2021

bradenlpreston commented Jan 26, 2021

mostlyjason commented Jan 14, 2021 •

edited

Loading

michalpristas commented Jan 19, 2021 •

edited

Loading

ph commented Jan 19, 2021 •

edited

Loading

michalpristas commented Jan 20, 2021 •

edited

Loading

michalpristas commented Jan 20, 2021 •

edited

Loading

urso commented Jan 20, 2021 •

edited

Loading

mostlyjason commented Jan 21, 2021 •

edited

Loading

michalpristas commented Jan 25, 2021 •

edited

Loading