Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadPool signature breaks Logstash elastic_integration plugin build with 8.12.x version(s). #104959

Open
mashhurs opened this issue Jan 31, 2024 · 7 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@mashhurs
Copy link
Contributor

mashhurs commented Jan 31, 2024

Description

Logstash recently developed elastic_integration plugin. When building the plugin, we internally use Elasticsearch sources. Source changes in Elasticsearch may affect the plugin if the plugin is using changed APIs/classes.
Recently, ES changed its ThreadPool constructor signature and ThreadPool was referenced by the elastic_integration plugin source. If we apply constructor changes, we no longer can build plugin against current stack release 8.12.x versions and validate its working state.
We can still build plugin against ES main branch and use with 8.12.x versions. However, since the plugin is not a default plugin yet, manual installation always fetch the latest version, may introduce abnormal states.

Acceptance Criteria

We understand that it is not ES breaking change boundary and ES team may encourage to always sync with latest source. However, as much as possible, we would like to mitigate risks with the plugin and keep 🟢 building against ES current release, snapshot and main branches. So, we ask a collaboration!

Possible approaches

  1. [EDITED] Keep OLD constructor set it deprecated and separate recent interface
public ThreadPool(Settings settings, MeterRegistry meterRegistry, ExecutorBuilder<?>... customBuilders) {
    this(settings, customBuilders);
    this.executors.forEach((k, v) -> {
        this.instruments.put(k, setupMetrics(meterRegistry, k, v)); // instruments is not final
    });
}
  1. OR another approach would be OLD constructor uses no-op metric by default (need to confirm with ES team if default is a desired behavior) on top of current ES change:
public ThreadPool(Settings settings, ExecutorBuilder<?>... customBuilders) {
    this(settings, MeterRegistry.NOOP, customBuilders);
}
  1. TODO: We have a bridge between plugin and ES, checking possibility.
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jan 31, 2024
@yaauie
Copy link
Member

yaauie commented Jan 31, 2024

The option (1) doesn't seem to make sense to me. The new constructor is clearly the path that the ES team is preferring, so it doesn't makes sense to deprecate its use, or to re-factor it to route a portion of its functionality through another constructor


For Option (2), we could restore the constructor with interface ThreadPool(Settings, ExecutorBuilder<?>... customBuilders) present up to and including 8.12, keeping it deprecated to discourage additional adoption:

diff --git a/server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java b/server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
index fef0d93ec86..5a7befdd40a 100644
--- a/server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
+++ b/server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
@@ -193,6 +193,11 @@ public class ThreadPool implements ReportingService<ThreadPoolInfo>, Scheduler {
         Setting.Property.NodeScope
     );
 
+    @Deprecated(since="8.13.0")
+    public ThreadPool(final Settings settings, final ExecutorBuilder<?>... customBuilders) {
+        this(settings, MeterRegistry.NOOP, customBuilders);
+    }
+
     @SuppressWarnings({ "rawtypes", "unchecked" })
     public ThreadPool(final Settings settings, MeterRegistry meterRegistry, final ExecutorBuilder<?>... customBuilders) {
         assert Node.NODE_NAME_SETTING.exists(settings);

For option (3), we would simply add a static method to the existing LogstashInternalBridge, and migrate the Logstash ElasticIntegration Filter's construction of the ThreadPool to use the bridge. This would ensure the external-to-this-project dependency on the ThreadPool's constructor would be made apparent, preventing future surprises.

The down-side of this approach is that we would need to backport it to the 8.12 series so that the ElasticIntegration builds would work against both Elasticsearch 8.12 and 8.13:

  • Targeting main/8.13:
    diff --git a/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java b/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    index ce79a36da8c..10c42b3898c 100644
    --- a/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    +++ b/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    @@ -8,6 +8,11 @@
     
     package org.elasticsearch.ingest;
     
    +import org.elasticsearch.common.settings.Settings;
    +import org.elasticsearch.telemetry.metric.MeterRegistry;
    +import org.elasticsearch.threadpool.ThreadPool;
    +import org.elasticsearch.threadpool.ExecutorBuilder;
    +
     /**
      * This Elastic-internal API bridge class exposes package-private components of Ingest
      * in a way that can be consumed by Logstash's Elastic Integration Filter without
    @@ -33,4 +38,13 @@ public class LogstashInternalBridge {
         public static void resetReroute(final IngestDocument ingestDocument) {
             ingestDocument.resetReroute();
         }
    +
    +    /**
    +     * @param settings
    +     * @param customBuilders
    +     * @return a new {@link ThreadPool} with a noop {@link MeterRegistry}
    +     */
    +    public static ThreadPool createThreadPool(final Settings settings, final ExecutorBuilder<?>... customBuilders) {
    +        return new ThreadPool(settings, MeterRegistry.NOOP, customBuilders);
    +    }
     }
  • Backport to 8.12:
    diff --git a/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java b/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    --- a/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    +++ b/server/src/main/java/org/elasticsearch/ingest/LogstashInternalBridge.java
    @@ -8,6 +8,10 @@
     
     package org.elasticsearch.ingest;
     
    +import org.elasticsearch.common.settings.Settings;
    +import org.elasticsearch.threadpool.ThreadPool;
    +import org.elasticsearch.threadpool.ExecutorBuilder;
    +
     /**
      * This Elastic-internal API bridge class exposes package-private components of Ingest
      * in a way that can be consumed by Logstash's Elastic Integration Filter without
    @@ -33,4 +37,13 @@ public class LogstashInternalBridge {
         public static void resetReroute(final IngestDocument ingestDocument) {
             ingestDocument.resetReroute();
         }
    +
    +    /**
    +     * @param settings
    +     * @param customBuilders
    +     * @return a new {@link ThreadPool}
    +     */
    +    public static ThreadPool createThreadPool(final Settings settings, final ExecutorBuilder<?>... customBuilders) {
    +        return new ThreadPool(settings, customBuilders);
    +    }
     }

@DaveCTurner
Copy link
Contributor

When building the plugin, we internally use Elasticsearch sources.

Can you explain this a bit more clearly? From a quick read through the code, my understanding is that this is a Logstash plugin which imports and depends on a substantial fraction of the Elasticsearch codebase. If so, I think the problem is more fundamental than just the ThreadPool constructor change mentioned here. This code is not intended to run outside of Elasticsearch like this.

I don't think it's going to work to simply add things to the ES codebase that appear to ES developers to be dead code. I fully expect it will get spotted and refactored away at some point, especially if it ends up being in the way of some future change.

If we want to be able to run ingest pipelines outside of ES, we're going to need to make some bigger changes to support that use case properly. I'm going to ask @elastic/es-core-infra for comment too, and also ping @elastic/es-data-management who would likely end up taking on the bulk of any such changes since they own the ingest pipeline subsystem.

@DaveCTurner DaveCTurner added :Core/Infra/Core Core issues without another label and removed needs:triage Requires assignment of a team area label labels Jan 31, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jan 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@roaksoax
Copy link

cc @tylerperk @flexitrev @jsvd @dakrone

@jsvd
Copy link
Member

jsvd commented Jan 31, 2024

@DaveCTurner the logstash-filter-elastic_integration plugin arose from a growing request of users with large deployments wanting to avoid the execution of ingest pipelines in Elasticsearch for multiple reasons:

  1. facilitating upgrade from beats modules (where most of the transformation happened at the edge) to agent integration without a very significant increase in cluster capacity
  2. allow Logstash-based post processing of data from integrations after it has been "schematized"/processed by the integration's pipelines
  3. allow said data to be sent to other destinations alongside ES (similar to what a vendor agnostic otel collector would do)

For this purpose we did an initial RFC for the feature back in Oct 2022 and included the Data Management team from the start, with the first - 0.0.1 - release in April 2023 and a few more since then.
During development we've hit multiple challenges in making the code work outside of ES like you have mentioned. Some of these have required introduction of code in ES to better support this different behaviour, such as the LogstashInternalBridge.java in #96958.

I hope the above brings some more context as to why and how we got here, happy to chat more through this or other medium.

@rjernst rjernst added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Jan 31, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jan 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@yaauie
Copy link
Member

yaauie commented Feb 5, 2024

I've opened two PR's:

@joegallo joegallo assigned joegallo and unassigned joegallo Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team
Projects
None yet
Development

No branches or pull requests

8 participants