Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@timestamp erroneously added to events in pipeline-to-pipeline communication #13333

Open
breml opened this issue Oct 18, 2021 · 4 comments
Open

Comments

@breml
Copy link
Contributor

breml commented Oct 18, 2021

Logstash information:

Please include the following information:

Logstash 7.15 from official Docker image (FROM docker.elastic.co/logstash/logstash:7.15.0)

Plugins installed: (bin/logstash-plugin list --verbose)

only default plugins

JVM (e.g. java -version):

JVM from official Docker image (FROM docker.elastic.co/logstash/logstash:7.15.0)

Description of the problem including expected versus actual behavior:

Given a logstash configuration with 2 connected pipelines, where in the first pipeline the @timestamp field is removed (e.g. with filter { mutate { remove_field => [ "@timestamp" ] } }), it is re-added when the event is passed to the second pipeline.

Steps to reproduce:

Dockerfile to reproduce:

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.15.0

USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/pipeline.conf"
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test.yml
input_plugin: "input"
testcases:
  - input:
      - Test
    expected:
      - message: Test
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/pipeline.conf
input { stdin { id => "input" } }
filter { mutate { remove_field => [ "@timestamp" ] } }
output { pipeline { send_to => [ pipeline2 ] } }
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline2/pipeline.conf
input { pipeline { address => pipeline2 } }
output { stdout { } }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Running directly through Logstash"
echo "Test" | logstash &
sleep 30
pkill logstash

EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

Build and run the above Dockerfile with:

DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test

For the above test case, the expected output would be:

{
          "host" => "a6d6c8cfdd7b",
      "@version" => "1",
       "message" => "Test"
}

The actual output is:

{
          "host" => "a6d6c8cfdd7b",
      "@version" => "1",
    "@timestamp" => 2021-10-18T19:53:27.878Z,
       "message" => "Test"
}

See also: magnusbaeck/logstash-filter-verifier#151

@breml
Copy link
Contributor Author

breml commented Nov 26, 2021

Polite ping!

@yaauie
Copy link
Member

yaauie commented Nov 26, 2021

Thank you for the report and the minimal reproduction.

The timestamp is added as part of the event instantiation and because we do not have copy-on-write events, the entire event is cloned when being sent to the other pipeline (this prevents mutation issues where we either have multiple downstream pipelines or we have multiple outputs and a downstream pipeline begins work on an event before it has finished its way through all of the upstream pipelines).

I don't see an immediate way around this issue.

@breml
Copy link
Contributor Author

breml commented Nov 26, 2021

@yaauie Thanks for your reply. The part about

the entire event is cloned

got me thinking, because one could argue, that this is not really true. One does not get a real clone of the source event, because whenever a new event is created, this always gets a timestamp.

I tried also the clone filter and it is true there as well:

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.15.0

USER logstash
RUN <<EOF
mkdir pipeline/pipeline1

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/pipeline.conf
input { stdin { id => "input" } }
filter { mutate { remove_field => [ "@timestamp" ] } clone { clones => [ clone ] } }
output { stdout { } }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Running directly through Logstash"
echo "Test" | logstash &
sleep 30
pkill logstash

EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

I see following options:

  1. Extend the event instantiation such that adding the timestamp is optional and can be controlled by a parameter. In most cases the current behavior is kept. The two exceptions would be: input/output pipeline and filter clone. In both cases an existing event is cloned and therefore the timestamp should be taken from the original event and if there is no timestamp, also the clone must not have a timestamp.
    Alternatively, the code that performs the cloning (in output pipeline and filter clone) compensates for the timestamp, if the original event does not have this field.

To support this argument, consider the following test case, where an event is cloned, which has the timestamp field, but due to the sleep filter, the clone happens 1 second after the original event is created. But still, both events have the same timestamp:

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.15.0

USER logstash
RUN <<EOF
mkdir pipeline/pipeline1

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/pipeline.conf
input { stdin { id => "input" } }
filter { sleep { time => 1 } clone { clones => [ clone ] } }
output { stdout { } }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Running directly through Logstash"
echo "Test" | logstash &
sleep 30
pkill logstash

EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

produces the following output:

{
    "@timestamp" => 2021-11-26T19:57:05.973Z,
      "@version" => "1",
       "message" => "Test",
          "host" => "f0471db64e45"
}
{
    "@timestamp" => 2021-11-26T19:57:05.973Z,
          "type" => "clone",
      "@version" => "1",
       "message" => "Test",
          "host" => "f0471db64e45"
}
  1. Mark this issue as accepted bug / won't fix. (which would be unfortunate for me)

@breml
Copy link
Contributor Author

breml commented Dec 20, 2021

Polite ping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants