Add support for environment variable injection in logstash plugin configuration #3944

Closed
fbaligand opened this issue Sep 20, 2015 · 29 comments

@fbaligand
Contributor

It would be great to support environment variable injection in logstash configuration, like this:

input {
  tcp {
    port => "${TCP_PORT}"
  }
}

This would make a logstash configuration independent of its environment, so the same configuration could be used across different environments (dev, test, prod, ...).

Numerous frameworks, such as spring and log4j, support this kind of feature.

@jordansissel
Contributor

As a workaround, you can use a template tool (m4, sed, etc) to achieve
this. Run it on your config before starting logstash. This is what I've
done in the past with good success. Hope this helps :)
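
A minimal sketch of this templating workaround, written in Python instead of m4 or sed. It assumes the template uses ${NAME} placeholders; the `render_config` helper and the sample template are hypothetical and only illustrate the idea of rendering the config before starting logstash.

```python
import os
from string import Template


def render_config(template_text, variables=None):
    """Replace ${NAME} placeholders in a config template with concrete values.

    Raises KeyError if a placeholder has no value, so a missing variable
    is caught before logstash is started.
    """
    values = os.environ if variables is None else variables
    return Template(template_text).substitute(values)


# Hypothetical template; in practice this would be read from a file.
template = 'input {\n  tcp {\n    port => ${TCP_PORT}\n  }\n}\n'
rendered = render_config(template, {"TCP_PORT": "5000"})
print(rendered)
```

Note that `string.Template` treats every `$` as special, so a literal `$` elsewhere in the config (for example in a regular expression) would need to be written as `$$` in the template.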

@fbaligand
Contributor Author

Thanks for your reply @jordansissel.
I know this workaround because I have read your answer in some forums and issues ;)
But it would be really great to support it dynamically in logstash!
It avoids having to generate an instance configuration from a template configuration each time we start logstash.

@cstockton

Given the robustness of logstash configuration files, I am a bit surprised there is not a cleaner way to do this. All of our logstash instances are containerized with Docker, and our containers talk to each other via Docker's container linking, which provides environment variables so that containers can find each other.

That said, even if something like m4 were suitable for preprocessing configuration in a grammar as expressive as logstash's (it isn't), it would be difficult (messy at best) to hook the preprocessor into the various container deployment and orchestration systems. Containers are expected to be built ahead of time and to be environment agnostic, typically getting all the context they need at runtime from environment variables.

@fbaligand The solution I have today uses the environment filter, but it is not ideal because it includes all of the environment vars in the message. Maybe you found a better way?

filter {
  ...
  environment {
    add_field_from_env => {
      "MY_ENV_ADDR" => "MY_ENV_PROD_PORT_12285_TCP_ADDR"
      "MY_ENV_PORT" => "MY_ENV_PROD_PORT_12285_TCP_PORT"
    }
  }
}
...
output {
  http {
    codec => "json"
    http_method => "post"
    url => "http://%{MY_ENV_ADDR}:%{MY_ENV_PORT}/..."
  }
}

Is there a way to specify local fields, i.e. fields that are not included in the final output but are used in transit through the pipeline? Or maybe there is a syntax for namespacing fields? There might be a way to do what I am trying to do in the logstash world; I just haven't found it.

I did notice that grok has a more robust declaration grammar for property access, i.e. %{SYNTAX:SEMANTIC:CAST}. I could see something like %{[NAMESPACE :]FIELD_NAME} being pretty useful while maintaining backwards compatibility. It could provide a few default namespaces like ENV and LOCAL, and anything else useful.

Example below:

filter {
  ...
  mutate {
    add_field => {
      "LOCAL:SCHEMA" => "https" # Not included in output
      "LOCAL:VAR_NAME" => "LOCAL_VALUE" # Not included in output
      "GLOBAL_ENV_ADDR" => "%{ENV:ADD_TO_ALL}" # Included in output
      "GLOBAL_NAME" => "GLOBAL_VALUE" # Included in output
    }
  }
}
...
output {
  if [LOCAL:VAR_NAME] ... {

    http {
      codec => "json"
      http_method => "post"
      url => "%{LOCAL:SCHEMA}://%{ENV:MY_ENV_PROD_PORT_12285_TCP_ADDR}:%{ENV:MY_ENV_PROD_PORT_12285_TCP_PORT}/..."
    }
  }
}

Just food for thought. Thanks.

-Chris

@untergeek
Member

@cstockton this is an argument in favor of having the environment filter put all of those variables into the @metadata field by default. Then they wouldn't show up in the output.

@jordansissel
Contributor

@cstockton and @fbaligand - Could you review logstash-plugins/logstash-filter-environment#5 and let us know what you think?

@untergeek
Member

@cstockton and @fbaligand - I think @jordansissel meant logstash-plugins/logstash-filter-environment#5 (the pull request)

@jordansissel
Contributor

lol, failure on my part. Good catch, @untergeek!

@fbaligand
Contributor Author

The environment plugin is useful in some cases, but not in all.
In the sample in my issue, I inject an env variable into an input plugin. This can't be done using the environment plugin.
Nor can the environment plugin be used to set int config properties or static config properties (those that do not process %{...}).

That's why native env variable injection, as a pre-processing step in logstash, would be very welcome!

@fbaligand
Contributor Author

Regarding the environment plugin enhancement (using @metadata), this sounds like an excellent idea to me!
In most cases, the environment variables provided by this plugin are used to set output config properties, not to be part of the event itself.

@cstockton

Thanks a lot @jordansissel, this certainly fixes my use case. I also left a little feedback at logstash-plugins/logstash-filter-environment#5 (comment)

@fbaligand
Contributor Author

Thanks for the improvement to the environment plugin, @untergeek!

That said, this issue is still relevant for all the cases not covered by the environment plugin (input plugin properties, int plugin properties, and all plugin properties that do not support dynamic field injection).

@cstockton

@fbaligand That is a good point. I could see input plugins needing access to environment variables for setting up listeners. I think it would be pretty clean to treat @'identifier' as a special kind of proxy field (which seems to be what @metadata sort of is) and to add @env, e.g.:

input {
  tcp {
    port => [@env][DOCKER_PROVIDED_PORT_12285_TCP_PORT]
  }
}

@fbaligand
Contributor Author

That's an interesting option!
@jordansissel, which would be simpler to implement: [@env][MYVAR] or "${MYVAR}"?

@jordansissel
Contributor

The [@env] syntax feels weird because it uses what is called field reference syntax. Events are the only thing in logstash that have fields, and events don't really have "environment variables". Further, there's no event available for inputs, so having field reference syntax there would be really confusing ;P

I'm still not really in favor of this yet, since m4/puppet/etc seems so simple to me. I don't want to dump this as something we force ops folks to solve, but I'm also not sure about the added burden of additional syntax in the config file.

Bear in mind that we are working on clustering and other concepts for logstash that will outlive the lifetime of a single logstash process, so in the clustered world, configuring via environment variables feels quite weird. I think it's weird because I really want configuration with one interface, not multiple. With clustering, the configuration comes from some central authority, and using environment variables complicates that: who evaluates the env vars? Each node? Just the central authority? If each node, now you have two sources of configuration instead of just one.

Thoughts?

@fbaligand
Contributor Author

@jordansissel
In a clustered world, the ${MYVAR} concept can be extended to support various sources: environment variables, but also cluster variables defined in your management console (for example).

For example, spring and log4j 2 both support multiple sources when resolving ${MYVAR}: env variables, java system properties, java JNDI variables, ...

@cstockton

I see where you are going with that, @jordansissel. I am not sure what kind of architecture would keep up as logstash grows. The first thing to pop into my head would be some sort of "variable/field" providers. Then you could create an etcd, environment, or even a swagger api provider. Given that variable providers would be a form of input, maybe it could be done via inputs. I don't know anything about the plugin system's API, but maybe it is already robust enough to do this through plugins today.

Below is an example of something that would be pretty nice for me, with the disclaimer that I've only been working with logstash for a few weeks, so it may not be idiomatic or align at all with logstash's long-term goals/vision. Sorry in advance if it's some sort of butchery :p

For example:

input {
  env { }
  etcd { prefix => "_ETCD", url => "http://%{ETCD_ADDR}:%{ETCD_PORT}/v2/keys", ttl => 60 } # ETCD_ADDR is from environment, no prefix
  syslog { port => "%{_ETCD_SYSLOG_LISTEN_PORT}" }
  tcp { port => "%{_ETCD_FOO_LISTEN_PORT}" }
}

The main thing is that "properties / variables" throughout logstash's configuration seem to always be bound to the event. However, users like me (incorrectly, perhaps?) are trying to get static attributes that are not bound to an event resolved at compile time, while foreseeing cases where attributes resolved at call time feel inevitable. There are of course other ways to achieve both of those things. It all depends on whether you think it is dirty and wrong, or glorious, for someone to do this:

output {
  if [@metadata][product] in [_ETCD_ENABLED_PRODUCTS] {
    elasticsearch { ... } } }

@untergeek
Member

I like the idea of having an env { } block within inputs (rather than as its own input):

input {
  tcp {
    env => true
    port => @env["_ETCD_FOO_LISTEN_PORT"]
  }
}

Not sure how easy it would be to add this, but to me it's either this, or make the @env ivar available across all of Logstash (I know, ivars and having to extend each plugin to support a given ivar...).

@fbaligand
Contributor Author

In my opinion, it is much easier for logstash core itself to pre-process ${MYVAR} tokens just before injecting the resulting value into the plugin property (using the configuration format from the example in this issue).
In particular, it should not be the plugin's responsibility to interpret environment variable references.
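
A rough sketch of what such core-level pre-processing could look like, assuming the plugin settings have already been parsed into plain strings, lists and dicts. The `resolve_refs` helper and the sample settings are hypothetical; this is an illustration in Python, not logstash's actual code.

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable identifier.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_refs(value, env):
    """Recursively replace ${NAME} references in strings, lists and dicts."""
    if isinstance(value, str):
        # Unknown names raise KeyError rather than silently becoming empty.
        return ENV_REF.sub(lambda m: env[m.group(1)], value)
    if isinstance(value, list):
        return [resolve_refs(item, env) for item in value]
    if isinstance(value, dict):
        return {key: resolve_refs(item, env) for key, item in value.items()}
    return value  # numbers, booleans, etc. pass through unchanged


# Hypothetical parsed settings for a tcp input, before the plugin sees them.
raw_params = {"port": "${TCP_PORT}", "host": "0.0.0.0", "tags": ["${ENV_NAME}", "tcp"]}
params = resolve_refs(raw_params, {"TCP_PORT": "5000", "ENV_NAME": "prod"})
print(params)
```

In this sketch the port is still a string after substitution; converting it to the type the plugin declares would be a separate validation step. Passing `os.environ` as `env` would read the real process environment.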

@fbaligand
Contributor Author

@suyograo @jordansissel @acchen97 @cstockton @untergeek
It would be great to support the same mechanism as beats:
elastic/beats#715

I very much like its ability to have a default value and to support array values.

@suyograo removed the v2.3.0 label Feb 3, 2016
@fbaligand added a commit to fbaligand/logstash that referenced this issue Feb 21, 2016

@mikeholczer

Chiming in to support the idea of implementing this similar to beats.

@fbaligand
Contributor Author

@mikeholczer
I totally agree with you.
If you look at my pull request, it is exactly what I did.

@jordansissel
Contributor

> It avoids having to generate an instance configuration from a template configuration each time we start logstash.

I understand, but it feels like this assumes your method for starting logstash is immutable, which isn't true. Your "start logstash" procedure can include generating the config before executing bin/logstash.

My general concern here is that I don't think environment variables are a good solution for this, especially when I consider the Logstash roadmap.

Environment variables can only be communicated once to a given process, at the start. They can never be altered outside the process. This immutability of runtime-configuration is counter to one existing feature as well as one future feature.

The existing feature is automatic configuration reloads - with reloading, you cannot ever change the environment variable even though the configuration files themselves can be changed. If you try to use environment variables for configuration settings, you will be required to do a full restart of Logstash in order to change these values.

The future feature is centralized configuration (with the ability to change the configuration after startup). For the same reason as with the config reloading we have today, I feel environment variable immutability will make this not valuable, or at the very least, not something most users would use (I won't speak for all users, though).

Logstash is intended to be a long-running process, and with our progress towards making Logstash more configurable while running, introducing partial immutability (environment variables) feels like a step backwards.

@ip2k

ip2k commented Feb 27, 2016

+1; in my use-case I'm baking an AMI separately from where the instance will actually run, so the env I need to inject ( EC2_INSTANCE_ID ) would be different in my bake phase (which generates 1 AMI) vs where I'm running Logstash (multiple instances). Because of this, I currently use a script that runs once when the instance first boots. Being able to reference env vars from within the LS config itself would be massively helpful in this case :)

@fbaligand
Contributor Author

Hi @jordansissel,

First of all, environment variables are a really common and standard way to parameterize servers, tools, ...
Numerous people use them and need them.
You can see in this issue that:

  • @cstockton needs it in a docker context
  • @ip2k needs it in an amazon cloud context
  • and personally, I need it because I want to have the same logstash configuration files in my dev environment and my prod environment.
  • I also need it because in production, I have a logstash cluster and each logstash instance has its own tcp input port. So having the same logstash configuration for each instance, with just a different environment variable set at startup, would be very helpful.
    logstash central configuration is not standard and could not address all cases, such as docker image instantiation.

Secondly, numerous frameworks support environment variable injection; this is a common feature.
I can cite log4j 2, logback and spring.
These 3 frameworks also support hot configuration reload, and they don't consider these features to be in conflict.

Thirdly, elasticsearch itself has had this feature for a long time (https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html#node-name).
And Beats will have this feature in its release 2.0.0 (https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html#node-name).
I don't understand why it would not be relevant for logstash.

Fourth, as a first step, we could have only environment variable injection.
But when logstash central configuration becomes available, we could add "logstash central variable injection".
And why not "java system property injection".
All of that using the same mechanism.
For example, spring supports such a mechanism and I can guarantee you that it is very helpful!
Concretely, when a user references a variable, logstash-core searches first in "logstash central configuration", then, if not found, in environment variables, and, if still not found, replaces the reference with an empty string.
And this is really easy to add in mixin.rb
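
The lookup order described above can be sketched as a chain of sources. This is only an illustration in Python, not the proposed mixin.rb change; the `make_resolver` helper and both sample sources are hypothetical, with `central` standing in for a central configuration that does not exist yet.

```python
from collections import ChainMap


def make_resolver(*sources):
    """Build a lookup that tries each source in order, first match wins."""
    chain = ChainMap(*sources)

    def resolve(name):
        # Fall back to an empty string when no source defines the name.
        return chain.get(name, "")

    return resolve


# Hypothetical sources: "central" stands in for a future central configuration.
central = {"ES_HOST": "es.central.example"}
environment = {"ES_HOST": "localhost", "TCP_PORT": "5000"}

resolve = make_resolver(central, environment)
print(resolve("ES_HOST"))   # central wins over the environment
print(resolve("TCP_PORT"))  # only defined in the environment
print(repr(resolve("UNDEFINED")))  # no source defines it
```

Adding another source, such as system properties, would only mean passing one more mapping to `make_resolver`; `os.environ` could be used in place of the sample `environment` dict.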

Finally, in my opinion, for all these reasons, this is really a key feature and absolutely not a step backwards.

@samcday

samcday commented Mar 2, 2016

I stumbled on this issue because I need to do exactly what @ip2k mentioned - I want to use the EC2 instance ID as a parameter for a Logstash plugin. It would be awesome if the config file supported environment var interpolations.

@jordansissel
Contributor

#4710 is merged and supports this feature.
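
For readers curious how a reference with a default value, written as ${VAR:default}, might be resolved, here is a rough sketch in Python. It illustrates the idea only and is not the code merged in #4710; the regular expression, the `expand` helper, and the choice to raise an error for an undefined variable without a default are all assumptions.

```python
import os
import re

# ${NAME} or ${NAME:default}; the default may be empty and may not contain "}".
REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand(text, env=None):
    """Substitute ${NAME} and ${NAME:default} references in text."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"undefined variable with no default: {name}")

    return REF.sub(replace, text)


# The default is used only when the variable is not set.
print(expand('port => "${TCP_PORT:54321}"', {}))
print(expand('port => "${TCP_PORT:54321}"', {"TCP_PORT": "5000"}))
```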

suyograo pushed a commit that referenced this issue Mar 7, 2016
suyograo pushed a commit that referenced this issue Mar 7, 2016
breml pushed a commit to u-s-p/logstash that referenced this issue Mar 12, 2016
Based on the implementation for environment variable injection (elastic#3944),
the delimiter for the default value in sprintf interpolation is changed
to ":" (KEY_NODE_DEFAULT_DELIMITER).
@nvtkaszpir

works with jdbc input plugin like a charm :D

@fbaligand
Contributor Author

Nice :)

10 participants