[BUG] Data Prepper can not write to Opensearch Datastream #2037

Open
rospeelman opened this issue Nov 24, 2022 · 11 comments
Labels
bug Something isn't working question Further information is requested

Comments

@rospeelman

rospeelman commented Nov 24, 2022

Describe the bug
Cannot send logs to an OpenSearch data stream.

To Reproduce
Steps to reproduce the behavior:

  1. Create an index template for a data stream in OpenSearch.
  2. Use Data Prepper to send logs to an index name that matches the index pattern of the data stream.
  3. Get the error "WARN com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink - Document [org.opensearch.client.opensearch.core.bulk.BulkOperation@c4aecd] has failure: java.lang.RuntimeException: only write ops with an op_type of create are allowed in data streams"
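To make step 1 concrete, here is a minimal Python sketch that only builds the REST request for such a template (no HTTP client is assumed). The template name `my-logs-template` and the pattern `my-logs-*` are hypothetical. The empty `data_stream` object is what marks matching names as data streams, and by default their documents need an `@timestamp` field. The `fnmatch` check is only a rough stand-in for how OpenSearch matches `*` wildcards.

```python
import fnmatch
import json

# Hypothetical names, for illustration only.
TEMPLATE_NAME = "my-logs-template"
INDEX_PATTERNS = ["my-logs-*"]


def data_stream_template_request(name, patterns, priority=100):
    """Return (method, path, body) for an index template that enables data streams."""
    body = {
        "index_patterns": patterns,
        # An empty object enables data streams for names matching the patterns.
        "data_stream": {},
        "priority": priority,
    }
    return "PUT", f"/_index_template/{name}", json.dumps(body)


def matches_template(index_name, patterns):
    """Rough check that an index name falls under one of the '*' patterns."""
    return any(fnmatch.fnmatchcase(index_name, p) for p in patterns)


if __name__ == "__main__":
    method, path, body = data_stream_template_request(TEMPLATE_NAME, INDEX_PATTERNS)
    print(method, path)
    print(body)
    print(matches_template("my-logs-app", INDEX_PATTERNS))
```

Step 2 then amounts to pointing the sink's `index` at a name for which `matches_template` would be true.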

Expected behavior
Logs are written to the data stream.

Environment (please complete the following information):

  • Running in Docker (Docker Desktop 4.10.1)
  • Using the latest version of OpenSearch
  • Using the latest version of Data Prepper

Comment
I didn't find an option to set the op_type.

@rospeelman rospeelman added bug Something isn't working untriaged labels Nov 24, 2022
@sshivanii
Contributor

Hi @rospeelman

You encountered that error because the OpenSearch sink currently has no configuration options that specifically support data streams. I would encourage you to update this issue as a feature request or enhancement.

@cmanning09
Contributor

@rospeelman, have you tried manually creating the data stream before running Data Prepper? The error you provided appears to occur when the data stream is created. And as @sshivanii pointed out, the current plugin does not support a data stream configuration option. Ingesting documents should work the same way across indices, so creating the stream beforehand may be a workaround until we support creating streams through the plugin.
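Creating the stream by hand, once a matching data-stream template exists, is a single REST call. A minimal Python sketch that only builds the requests (no HTTP client is assumed), using a hypothetical stream name `my-logs-app`:

```python
# Hypothetical data stream name; it must match the index_patterns of a
# template that has "data_stream" enabled.
STREAM_NAME = "my-logs-app"


def create_data_stream_request(name):
    """Return (method, path) that creates the data stream explicitly."""
    return "PUT", f"/_data_stream/{name}"


def get_data_stream_request(name):
    """Return (method, path) that shows the stream and its backing indices."""
    return "GET", f"/_data_stream/{name}"


if __name__ == "__main__":
    for method, path in (create_data_stream_request(STREAM_NAME),
                         get_data_stream_request(STREAM_NAME)):
        print(method, path)
```

Pre-creating the stream does not change which bulk action Data Prepper uses, so on its own it may not avoid the op_type error.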

@rospeelman
Author

I tested it again. Creating the data stream before starting Data Prepper gives me the same result: as soon as I try to ingest data, I get the same error.

@dlvenable
Member

@rospeelman, have you tried disabling index management through the index_type option in Data Prepper's opensearch plugin?

As @cmanning09 pointed out, we do not currently support Data Streams. However, Data Prepper can ingest data into OpenSearch without attempting to manage the indices.

I believe the following two steps may work:

  • Create the Data Stream outside Data Prepper
  • Configure the opensearch plugin in Data Prepper to use index_type: management_disabled.
pipeline:
  ...
  sink:
    opensearch:
      hosts: ["https://localhost:9200"]
      index_type: management_disabled

There are additional details in this issue: #1051

Configuration reference: https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/opensearch#configuration

@dlvenable dlvenable added question Further information is requested and removed untriaged labels Jan 28, 2023
@PW999

PW999 commented Jun 7, 2023

I'm having the same issue right now and I can confirm that disabling index management doesn't work.

log-pipeline:
  source:
    otel_logs_source:
      port: 2021
      ssl: false
      authentication:
        unauthenticated:
  processor:
  sink:
    - opensearch:
        hosts: [ "https://logs.sandbox.system.com" ]
        username: "test"
        password: "123!"
        index_type: management_disabled
        index: system-dev-ecs

We would like to use data streams since we're going to ingest logs from different sources (Java on ECS Fargate, Python/Node on AWS Lambdas, plain log files through Fluent Bit), and we've decided to use OTLP as the protocol for all of those applications.
We need an easy way to delete old logs from OpenSearch, so we like that data streams automatically roll over indices and that old indices can be deleted automatically.

//edit: small update after I found this ticket: #854. It turns out that index names are formatted (it would be nice if this were documented), so this actually works: index: system-dev-ecs-%{YYYY-MM-dd}. We can then apply an ISM template to it. We still need to see whether it actually deletes the data, but I'm happy we have something working.
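For illustration, here is a minimal Python sketch of the kind of substitution described above, where a `%{...}` date pattern in the index name is replaced by the current date. It is not Data Prepper's implementation: the token table (a small hypothetical subset) and the use of UTC are assumptions.

```python
import re
from datetime import datetime, timezone

# Hypothetical subset of date tokens mapped to strftime codes; illustration only.
_TOKENS = {"YYYY": "%Y", "yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}
_PLACEHOLDER = re.compile(r"%\{([^}]+)\}")


def expand_index_name(template, now=None):
    """Replace every %{...} placeholder in an index name with the formatted date."""
    if now is None:
        now = datetime.now(timezone.utc)

    def render(match):
        pattern = match.group(1)
        # Substituted values are digits, so later tokens cannot match inside them.
        for token, code in _TOKENS.items():
            pattern = pattern.replace(token, now.strftime(code))
        return pattern

    return _PLACEHOLDER.sub(render, template)


if __name__ == "__main__":
    day = datetime(2023, 6, 7, tzinfo=timezone.utc)
    print(expand_index_name("system-dev-ecs-%{YYYY-MM-dd}", day))
```

With a pattern like this, each calendar day yields a different concrete index name, so an ISM template would need to match the whole family (for example `system-dev-ecs-*`) rather than a single index.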

//edit2: a couple of months later, it appears that using index: system-dev-ecs-%{YYYY-MM-dd} in combination with ISM policies works at first sight but became a real PITA after a single day. So we're going to remove the %{YYYY-MM-dd}, let ISM deal with everything behind the scenes, and see how that goes.

@cameronattard

The issue here is not related to creating a data stream; it is the action used when sending requests to OpenSearch (it is set to index and cannot be changed to create). To solve this, Data Prepper should implement an action attribute on the OpenSearch sink, similar to what Logstash has done here: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-action
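The difference is visible in the bulk request body. A minimal Python sketch, assuming the standard newline-delimited `_bulk` format in which each document is preceded by an action line; per the error in this issue, data streams reject the `index` action and accept only `create`. The target name and document are hypothetical.

```python
import json

# Bulk actions this sketch supports.
SUPPORTED_ACTIONS = ("index", "create")


def bulk_body(target, docs, action="index"):
    """Build an NDJSON _bulk body: one action line, then one source line per doc."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    lines = []
    for doc in docs:
        lines.append(json.dumps({action: {"_index": target}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def allowed_in_data_stream(action):
    """Data streams only accept write ops with an op_type of create."""
    return action == "create"


if __name__ == "__main__":
    docs = [{"@timestamp": "2023-06-07T12:00:00Z", "message": "hello"}]
    print(bulk_body("my-logs-app", docs, action="create"), end="")
```

An action setting on the sink would let users choose `create` when the target is a data stream while keeping `index` as the default for regular indices.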

@k8ieone

k8ieone commented Jul 27, 2023

@cameronattard Exactly this. As soon as I saw this error message, I went looking through the options for the OpenSearch sink, expecting to find one that configures the op_type Data Prepper sends. To my surprise, no such option is available at this time.

@StefanSa

Hi there,
For us it would also be important for OTel as a source and data streams as a destination to work together.

If I understand correctly, the OTel "opensearch exporter" currently supports only traces, not logs or metrics.
Data Prepper, on the other hand, handles traces, logs, and metrics from OTel, but cannot write to OpenSearch data streams.

@dlvenable
The current situation is not really helpful.

@StefanSa

Hi @dlvenable @cmanning09
Any progress or time frame here ?
The ticket has been open for almost a year and nothing has moved so far, even though this feature is badly needed.

@mmehrten

Would also like to see DataPrepper support for data streams. With many tools (such as AWS OpenSearch Ingestion) moving to DataPrepper based ingestion stacks, and Data Stream based analytics stacks (e.g. OpenSearch Integrations), it would be great to see support for data streams in DataPrepper.

@Alankarsharma

I was able to write data to a data stream by setting action to create and index_type to custom. I created the data stream manually.
Sample config:

log-pipeline:
  source:
    otel_logs_source:
      port: 2021
      ssl: false
      authentication:
        unauthenticated:
  processor:
  sink:
    - opensearch:
        hosts: [ "https://logs.sandbox.system.com" ]
        username: "test"
        password: "123!"
        index_type: custom
        index: system-dev-ecs
        action: create

10 participants