[DISCUSS] Enable pure-Java plugins #7986

jordansissel · 2017-08-11T22:53:11Z

Goal: Design the API for pure Java plugins

Code Status - all code is work-in-progress and subject to commentary :)

Plugin Interfaces: Java API: Add interfaces for Inputs, Processors (filters), and Outputs. #8227
Runtime plugin configuration: Java API: Add mechanism for configuring an input/processor/output with a confg map #8240
Supporting ruby plugins: Java API: Allow ruby filters and outputs to appear as Java processors and outputs. #8230 (depends on Java API: Add interfaces for Inputs, Processors (filters), and Outputs. #8227)

Design/Discussion issues:

Plugin loading Java API: Design extension loading for new Java plugins #8375

WORK IN PROGRESS

Background: Logstash plugins still require at least some Ruby component(s) in order to load. We have several plugins now (beats, dissect, date, geoip, etc) that are written mostly in Java but still have some ruby parts.

Greenfield Opportunities:

✅ Remove filter plugin flush concept
❌ Merge filter and output plugin concepts into a unifying processor type
✅ Provide major categories of settings (SSLContext, HTTPClient) that can be shared
❔ Replace filter plugin filter_matched with something that is less a burden on the plugin author.
❔ Simplify decorate concept for inputs/codecs
✅ Move to rely on Java features where possible.

Key Objectives:

Plugin contains no Ruby code
Plugin tests contain no Ruby code
Stretch goal: Plugin build tools contain no Ruby code

Areas for design:

Plugin discovery and loading.
Plugin API.

Overview of Plugin API proposed:

This section is relevant to plugin authors.

Plugin Configuration

Today, we have the config class method on ruby Plugin classes:

class MyPlugin < Input
  config :foo, :validate => :string, ...
end

Ideally I would like to keep a similar model. The goal of the config method above is to make it simple for the author to define settings and have Logstash do the hard part of parsing, validating, and providing feedback to the user.

Accepted implementation (so far):

Have plugin instances accept a configuration object to configure themselves with.

This is what Tealess and Elasticsearch do.

Rejected implementation options:

Copy what Ingest Node does (Processor.Factory) (I don't like this, all validation/setup would be hand-coded by the author)
Factory pattern like PluginLoader.newInput(name, configuration)

Inputs

public interface Input {
  public void run(Consumer<Batch> consumer) {
    // Builds a Batch and calls consumer.accept(batch) when events are ready
    // Terminating conditions:
    //    * when Thread.isInterrupted() is true
    //    * when InterruptedException is received
    //    * if the input has a normal terminating condition (like Elasticsearch query is done, etc)
  }
}

filters and outputs

public interface Processor {
  public process(Batch batch) {
    // must abort if the thread is interrupted.
  }
}

optional new concept: triggers

Watcher has triggers that start inputs. Currently there is only scheduled triggers.

It is common for our input plugins to support cron schedules (jdbc, http_poller, etc)
other activity could be a trigger, such as an output plugin could trigger a job on an input.

I don't want to go into much detail here. This is optional and best discussed in a separate issue, but if we do want to explore this, I want the new API to be open to it.

(other notes here)

Base plugins

I would like to remove this concept. Currently we rely somewhat on subclassing, and maybe we don't' need this anymore.

Existing key Ruby methods:

Plugin#close - optional method for a plugin author to use to perform any tasks during shutdown.
Plugin#reloadable? - indicator of a plugin that supports reloading.

Proposals:

I think we can get rid of reloadable? by requiring all plugins be reloadable.
I think we should try to get rid of close.

Input plugins

Replace `stop` with `Thread.interrupt()`

Inputs are basically java.lang.Runnable that, today, can be commanded to terminate if their stop method is invoked from another thread. This is done today with an AtomicBoolean. In Java, all threads already have such a concept which we may be able to use for this which is to have stop call Thread.interrupt() and should we stop? be Thread.isInterrupted().

remove methods

I would also like to remove the register method. I'd also like to see if we can relieve plugin authors from having to invoke decorate themselves.

Filter plugins

The text was updated successfully, but these errors were encountered:

jakelandis · 2017-08-11T23:57:46Z

Re: Plugin Discovery

Have you considered an annotation driven approach ?

For example, I really like the way JSR-330 is implemented. This is the specification for Dependency Injection that is the brain child of Guice and Spring founders. They cleanly separated the specification from the implementation by simply providing well documented annotations and single generic interface.

The entire code for JSR-330 is here:
https://github.com/javax-inject/javax-inject/tree/master/src/javax/inject

Implementations, such as Guice, Spring, Weld, etc. all use the annotations but provide their own implementation. I assume that all implementations are scanning the class path on start up looking for these annotations. (Scanning your entire class path for annonations using reflections at startup is not a scary as it sounds).

I think there is an analogy to be drawn here with Logstash Plugin's. We could have a Plugin interface with @Input @Output @Filter, etc. annotations, or with parameters such as @Input(name="tcp"). At startup (or pipeline reload), LS could re-scan for new plugins.

jakelandis · 2017-08-12T00:09:39Z

Re: Plugin Configuration

Have plugin instances accept a configuration object to configure themselves with

+1

and have Logstash do the hard part of parsing, validating, and providing feedback to the user.

+1

The Builder pattern tends to lend itself nicely to extensible configuration options.

jakelandis · 2017-08-12T00:31:30Z

Re: Inputs and Outputs

I think we should try to minimize the amount of thought that developer will need with respect to a thread terminating. Perhaps providing optional lifecycle hooks (start, stop, reload) from the API. If needed we could document some guarantees on how long we will wait for the reload/stop methods to execute before simply killing the thread it is running on.

I am not sure I understand the need for the run method for the Input, won't the input be responsible for obtaining the events off the wire/file/etc. ? It would seem that the contract would be the lifecycle events and what to do with the event after it is has accepted it.

If we combine the filters and output to a singular unit (a Processor) , does it still makes sense within that unit to separate transform and sink ?

jordansissel · 2017-08-12T04:06:32Z

I am not sure I understand the need for the run method for the Input

This is the way Logstash Inputs work today. We can certainly do differently, and I am open to discussion.

Today, every input plugin has a run method. The pipeline uses this by doing basically this:

plugin = LogStash::Input::Stdin.new(config)
Thread.new { plugin.run(queue) }

I think this mechanism (which pretty much emulates Java's Runnable with small modifications) has served us well, even for inputs that do their own work management (kafka, rabbitmq, etc).

Having a start+stop, to me, puts the burden of task management on the plugin author where, for most plugins, there is generally only one while (true) { /* do work */ }.

If we combine the filters and output to a singular unit (a Processor) , does it still makes sense within that unit to separate transform and sink ?

Yes. As we have today an elasticsearch filter and elasticsearch output which perform very different tasks, we'll still have this in whatever future we build. The reason for unification, for me, is that both filter and output have the same characteristics:

filter and output are both handed an event
currently they are implemented with the same basic function signature but just different names

simply killing the thread

One does not simply walk into Mordor .. err .. kill threads ;)

Thread termination in Java requires cooperation on the part of the thread being killed.

I'm open to exploring start/stop lifecycles, though.

BTW, "reloadable" is more for the pipeline's concept of "reload" which means to read the config file, build a new pipeline, stop the old pipeline, and start a new one. Fundamentally, "reloadable" (I think?) really means "will this plugin terminate if I ask it to do so"

jordansissel · 2017-08-12T04:13:19Z

The entire code for JSR-330 is here:

I'm in favor of reusing an existing library or pattern. I read the docs on javax.inject and it's not clear to me the purpose of the @Inject annotation. I'll probably need to study it more to understand the purpose. Maybe we can chat next week and you can fill me in -- my confusion is mostly that the annotation alone seems to do nothing at all? Were you referring to the @Named annotation instead?

jordansissel · 2017-08-12T04:15:50Z

Regarding JSR-330's Provider<T>, this same interface is available by a different name in Java 8 as Supplier<T>.

jakelandis · 2017-08-12T13:23:45Z

I may have confused matters by bringing up JSR-330. I only meant that as an example of an API implemented multiple times, driven almost soley by annotations. (though using formal DI may be useful, that wasn't my point)

Annonations by themselves don't carry any functionality, it just a development time decision to semantically mark classes/methods with the expectation that the framework (in this case LS) will read those annotations and do something useful with them.

Thread termination in Java requires cooperation on the part of the thread being killed.

Only for a clean shut down :)

"reloadable" is more for the pipeline's concept

I assume that some actions (such as reseting the metrics) may be desirable for a plugin at reload time.

I can put together some sample code to better illustrate discovery via annotations and reflections.

jordansissel · 2017-08-12T16:02:55Z

Ahh yeah we annotations can be a way for plugin authors to describe properties of the plugin.

…

On Sat, Aug 12, 2017 at 6:24 AM Jake Landis ***@***.***> wrote: I may have confused matters by bringing up JSR-330. I only meant that as an example of an API implemented multiple times, driven almost soley by annotations. *(though using formal DI may be useful, that wasn't my point)* Annonations by themselves don't carry any functionality, it just a development time decision to semantically mark classes/methods with the expectation that the framework (in this case LS) will read those annotations and do something useful with them. Thread termination in Java requires cooperation on the part of the thread being killed. Only for a clean shut down :) "reloadable" is more for the pipeline's concept I assume that some actions (such as reseting the metrics) may be desirable for a plugin at reload time. I can put together some sample code to better illustrate discovery via annotations and reflections. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#7986 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAIC6oRkCGNtqWddcbuzLFEcGrpVl_s_ks5sXadvgaJpZM4O1Nif> .

jordansissel · 2017-08-12T17:00:50Z

Something like:

@Name("stdin")
public class Stdin implements Input {
  // ...
}

?

jordansissel · 2017-08-14T17:02:44Z

I added two more options for plugin discovery/registration (copy elasticsearch's model, or try to use java.util.ServiceLoader)

guyboertje · 2017-08-16T09:12:54Z

I am working on a EventEmbellisher Java and JRuby ext to inject into a plugin that does the type, add_field, add_tag, remove_field and remove_tag functions. It has a simple API: a Constructor that takes all the data args expect type, isEmpty, withType(fluent) and apply(event). The Jruby ext exposes the same API (for use in Rubyland) with Ruby -> Java conversions as necessary and a Javaland getEmbellisher call to return the actual object to give to the plugin Java code.

For pure java plugins the EventEmbellisher instance can be accessed via the proposed Configuration class.

The JRuby extensions are still of use while we have the Ruby plugin base class and the input_base etc.

jakelandis · 2017-08-20T05:54:25Z

I took a swipe at a POC with the following goals:

Expose as little as possible to plugins. For example, don't expose the whole queue.
Try to keep the API small and in domain familiar terms.
Try to keep as much of the threading/looping complexity hidden from the plugin.
Clearly define a life-cycle.
Custom isolated class loaders
- Allow for a drop-in style plugin (or even http downloaded plugins)
- Allow different versions of transitive dependencies
- First step towards more secure sandboxed security managed plugins

The project is here: https://github.com/jakelandis/skunkstash

The project is FAR from production quality (and never will be)...but it has enough there to emulate the workflow of Logstash, with an Event generated by an input, passed to a queue, run through a filter, then sent to an output. This project can be used to help flush out ideas, in particular the API, isolated class loading, and threading model for plugins.

jordansissel · 2017-08-21T06:22:49Z

I haven't had time to review your repo (it's the weekend for me) but I am not sure what you mean about threading model -- I don't think I am proposing any threading model changes?

…

On Sat, Aug 19, 2017 at 10:54 PM Jake Landis ***@***.***> wrote: I took a swipe at a POC with the following goals: - Expose as little as possible to plugins. For example, don't expose the whole queue. - Try to keep the API small and in domain familiar terms. - Try to keep as much of the threading/looping complexity hidden from the plugin. - Clearly define a life-cycle. - Custom isolated class loaders - Allow for a drop-in style plugin (or even http downloaded plugins) - Allow different versions of transitive dependencies - First step towards more secure sandboxed security managed plugins The project is here: https://github.com/jakelandis/skunkstash The project is *FAR* from production quality (and never will be)...but it has enough there to emulate the workflow of Logstash, with an Event generated by an input, passed to a queue, run through a filter, then sent to an output. This project can be used to help flush out ideas, in particular the API, isolated class loading, and threading model for plugins. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#7986 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAIC6uJpGQb9isZq0ydAO4zHe_cozxOWks5sZ8ojgaJpZM4O1Nif> .

jakelandis · 2017-08-21T15:05:16Z

Threading model is probably a poor choice of words. I am not trying to propose a change to the Threading strategy.

I really meant the Threading implementation (Executor service, Thread Factory, Runnable, Callable etc.) and Threading contract with the plugin itself (lifecycle management, state to/from thread, interrupt expectations, etc.)

The the POC is surely not a perfect replication, but hopefully close enough to help flush out ideas.

jordansissel · 2017-09-08T20:06:17Z

Current status:

I have a light interface defined for plugins: Input, Processor,(filter), Output
Also wrappers for Ruby filters and outputs
Some work on plugin entry point with a model similar to Elasticsearch and the existing Logstash "universal" plugin: https://github.com/jordansissel/logstash/blob/feature/java-plugin-api/logstash-core/src/main/java/org/logstash/plugin/Plugin.java
I spent today working on the plugin configuration (making it so that plugin authors can define config settings, validation, etc).

I talked with @talevy and others a few weeks back about plugin construction. We both aren't a fan of how IngestNode does the plugin factory things. In our discussion @talevy pointed me at ConstructingObjectParser which is very similar to what I was hoping we could accomplish for plugins in Logstash.

Unfortunately the ConstructingObjectParser code is pretty deeply embedded with Elasticsearch, so we'll write our own. It'll look very similar with the exception that our inputs will be config values instead of XContent and XContentParser.

jordansissel · 2017-09-08T20:07:45Z

I'm working in a personal branch right now as I figure some of these things out and experiment. Once I get to a good place, I intend to create a few separate PRs:

defining the plugin interfaces and api
for the ConstructingObjectParser and related code
for plugin building, discovery, and loading.

jordansissel · 2017-09-11T08:34:02Z

Been thinking about discovery, and I think Elasticsearch's model seems pretty good:

plugins install to {ES_HOME}/plugins/<name>/...
Any jars go in that directory.
also includes a plugin-descriptor.properties file which is scanned by Elasticsearch in order to know about the plugin: the main Class entrypoint, the human name for the plugin, version, etc.

For #7986

guyboertje · 2017-09-20T14:16:09Z

I thought about this while on holiday.

Do we need a listener/observer interface for plugins that allows for plugin authors to "trigger" certain activities in a loosely coupled way. It would add flexibility to the plugin callback mechanisms. An example of this would be filter_matched. I know that these are usually void methods that would operate on the event arg but I'm not sure how to do this with an encode activity that should produce an encoded version of the event as a string.

These listener/observer activities would be fired during an API call from the pipeline into the plugin and plugins are not required to fire any if they are not needed.

It may also be a way for the plugin author to influence plugin support building during initialisation - say in an input if the author needs to preprocess some config options then fire a setup activity with a 'properties' arg. By "plugin support" I mean the building of any kind of Consumer chain that describes the exact set of event manipulation functions that need to be applied to an event for the active LS config.

For #7986

danhermann · 2019-02-27T23:20:37Z

Closing in favor of #9215

jordansissel mentioned this issue Aug 11, 2017

Java projects overview #7900

Open

jordansissel mentioned this issue Sep 12, 2017

[WIP] Make it possible to write plugins entirely in Java. #8225

Closed

jordansissel added a commit that referenced this issue Sep 12, 2017

Add interfaces for Inputs, Processors (filters), and Outputs.

c267989

For #7986

jordansissel mentioned this issue Sep 12, 2017

Java API: Add interfaces for Inputs, Processors (filters), and Outputs. #8227

Closed

2 tasks

jordansissel added a commit that referenced this issue Sep 13, 2017

Add interfaces for Inputs, Processors (filters), and Outputs.

edbca89

For #7986

suyograo added high hanging fruit Java API v6.2.0 labels Sep 18, 2017

jordansissel mentioned this issue Sep 22, 2017

Java API: Design extension loading for new Java plugins #8375

Open

jordansissel added a commit that referenced this issue Sep 27, 2017

Add interfaces for Inputs, Processors (filters), and Outputs.

a71eadb

For #7986

jakelandis mentioned this issue Nov 29, 2017

Deploy to maven central. logstash-plugins/logstash-input-beats#271

Closed

yaauie self-assigned this Jan 8, 2018

guyboertje mentioned this issue Jun 12, 2018

Ensure that support for custom config setting validation is included in Java API design. #9733

Open

danhermann closed this as completed Feb 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DISCUSS] Enable pure-Java plugins #7986

[DISCUSS] Enable pure-Java plugins #7986

jordansissel commented Aug 11, 2017 •

edited

Loading

jakelandis commented Aug 11, 2017

jakelandis commented Aug 12, 2017

jakelandis commented Aug 12, 2017

jordansissel commented Aug 12, 2017 •

edited

Loading

jordansissel commented Aug 12, 2017

jordansissel commented Aug 12, 2017

jakelandis commented Aug 12, 2017 •

edited

Loading

jordansissel commented Aug 12, 2017 via email

jordansissel commented Aug 12, 2017

jordansissel commented Aug 14, 2017

guyboertje commented Aug 16, 2017

jakelandis commented Aug 20, 2017

jordansissel commented Aug 21, 2017 via email

jakelandis commented Aug 21, 2017

jordansissel commented Sep 8, 2017

jordansissel commented Sep 8, 2017

jordansissel commented Sep 11, 2017

guyboertje commented Sep 20, 2017

danhermann commented Feb 27, 2019

[DISCUSS] Enable pure-Java plugins #7986

[DISCUSS] Enable pure-Java plugins #7986

Comments

jordansissel commented Aug 11, 2017 • edited Loading

WORK IN PROGRESS

Overview of Plugin API proposed:

Plugin Configuration

Inputs

filters and outputs

optional new concept: triggers

Base plugins

Input plugins

Replace stop with Thread.interrupt()

remove methods

Filter plugins

jakelandis commented Aug 11, 2017

jakelandis commented Aug 12, 2017

jakelandis commented Aug 12, 2017

jordansissel commented Aug 12, 2017 • edited Loading

jordansissel commented Aug 12, 2017

jordansissel commented Aug 12, 2017

jakelandis commented Aug 12, 2017 • edited Loading

jordansissel commented Aug 12, 2017 via email

jordansissel commented Aug 12, 2017

jordansissel commented Aug 14, 2017

guyboertje commented Aug 16, 2017

jakelandis commented Aug 20, 2017

jordansissel commented Aug 21, 2017 via email

jakelandis commented Aug 21, 2017

jordansissel commented Sep 8, 2017

jordansissel commented Sep 8, 2017

jordansissel commented Sep 11, 2017

guyboertje commented Sep 20, 2017

danhermann commented Feb 27, 2019

jordansissel commented Aug 11, 2017 •

edited

Loading

Replace `stop` with `Thread.interrupt()`

jordansissel commented Aug 12, 2017 •

edited

Loading

jakelandis commented Aug 12, 2017 •

edited

Loading