
Java API Proposal #9137

Closed

Conversation

original-brownbear (Member)

@andrewvc @danhermann @yaauie

Full suggestion for an API with:

  • Discovery of plugins via annotation
  • Definition of filter, input and output including sequence numbering for E2E acks
    • Could add one for codecs too, but it didn't seem essential at this point. A codec is simply a mapping of InputStream to Iterator<Event>, or better yet Iterator<Map<String, Object>> imo (see the sketch after this list)
  • Low level Queue write and read clients
    • We could obviously provide slower high-level wrappers here, but the suggested version is the fastest interface I could think of for now. IMO we should go for the fastest possible interface in step 0 and provide wrappers on top of it, instead of optimizing for direct ease of use and wasting an opportunity for significant gains for plugins that want to go the extra mile for performance
  • Hope the JavaDoc is enough to get the general idea :)
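
To make the codec idea above concrete, a minimal sketch (the Codec name and decode signature are illustrative only, not part of the proposal's code):

    import java.io.InputStream;
    import java.util.Iterator;
    import java.util.Map;

    /**
     * Illustrative only: a codec as a pure mapping from raw bytes to
     * decoded key/value maps that core could turn into Events.
     */
    public interface Codec {
        Iterator<Map<String, Object>> decode(InputStream input);
    }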

@danhermann (Contributor)

++ on the simplicity of the API and auto-discovery of plugins!

public final class DiscoverPlugins {

    public static void main(final String... args) throws NoSuchMethodException {
Contributor

What are your thoughts on isolated class loaders per plugin?

Member Author

@jakelandis I think I'd stay away from that for the time being (meaning, for as long as JRuby is around, probably ...). I do think it's something we can work towards once we've gotten rid of JRuby types and the Event only contains standard JDK types. But as long as Event holds RubyString etc., isolated class loaders introduce a lot of potentially complex constraints around the JRuby version that we'd have to enforce on plugins, and I'm not even sure it's possible in principle, since JRuby sets up its own classloader for the JRuby classes.

Contributor

I agree with your assessment that this introduces a lot of complexity. However, if a security-managed LS is in the near future, it may be worth tackling that complexity sooner rather than later, since it's usually easier to bake things like that in than to layer them on later. Also, if a security-managed LS is in the future, then maybe Java 9 modules provide some wins there without isolated class loading (just speculation, I haven't researched it).

Contributor

I don't think a security-managed LS is that close. I'd rather bite off that complexity a bit later.

@jordansissel (Contributor)

A codec is simply a mapping of InputStream to Iterator

A codec is not what you describe. Codecs today operate on more than just streams. For example, the Redis input provides a fixed-size payload for each LPOP with no stream concept. I'm not sure if you're using InputStream as a metaphor or an actual implementation detail; as a metaphor it might work, but the implementation detail does not (client libraries like Redis, Kafka, etc., do not provide anything that looks like an InputStream).

for (final Class<?> cls : annotated) {
    System.out.println(cls.getName());
    System.out.println(((LogstashPlugin) cls.getAnnotations()[0]).name());
    final Constructor<?> ctor = cls.getConstructor(LsConfiguration.class);
Contributor

There is a small handful of things we want each plugin to have beyond its configuration.

  • Access to the DLQ
  • Access to the scoped metric
  • Settings (maybe)
  • Logger (maybe)
  • Possibly x-cutting things in the future, like an abstracted TLS implementation or common JDBC abstractions.

What are your thoughts on handling the pushing of things other than Configuration to the plugins?

Member Author

@jakelandis good point! We should have a second ctor argument that holds references to all this stuff (I'd call it LogstashContext, I guess). Will add it soon :)

Member Author

added a "sketch" of this

@original-brownbear (Member Author)

@jordansissel

A codec is not what you describe. Codecs today operate on more than just streams. For example, the Redis input provides a fixed-size payload for each LPOP with no stream concept. I'm not sure if you're using InputStream as a metaphor or an actual implementation detail; as a metaphor it might work, but the implementation detail does not (client libraries like Redis, Kafka, etc., do not provide anything that looks like an InputStream).

Yea, see it as a metaphor for now. I guess the best 1:1 Java equivalent of what the current API gets as an input is Stream<String>. We could optimize a little here, though, and offer some overloads on the Codec interface where applicable, like Avro, where we want an InputStream and would build one from a bunch of individual chunks of data.

@andrewvc (Contributor) commented Feb 13, 2018

EDIT: I don't think this approach will work. After speaking with @jordansissel, it doesn't map well to the file input, where we need stream identity on top of the input stream in a non-blocking fashion.

@jordansissel @original-brownbear Thinking about codecs, I think the 'real' signature is: Stream<InputStream> (or Iterator<InputStream> if you prefer). An InputStream encapsulates something with a beginning and possibly an end. The outer stream maps to message boundaries.

So, the stdin input would have a stream that only ever yields a single InputStream that never closes. The Redis input would yield a new InputStream for each message. This fixes a problem today where handling message boundaries is a bit weird. I prefer an InputStream to String because it is cleaner than having Stream<Stream<String>> and a bit faster as well. It is also more suitable for binary formats. The codec author can choose to decode bytes to strings if they wish; we can even provide a convenience hook for that.
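
A quick sketch of that signature, purely to pin down the shape being discussed (hypothetical, not code from this PR):

    import java.io.InputStream;
    import java.util.Iterator;
    import java.util.stream.Stream;

    // the outer Stream maps to message boundaries; each InputStream is one
    // message's bytes (possibly never-ending, as with stdin)
    public interface Codec {
        Iterator<Event> decode(Stream<InputStream> messages);
    }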

@jakelandis (Contributor)

@original-brownbear (pulled from an inline comment... thanks, GitHub)

How does core know what is valid/invalid configuration and type?

For example, i configure:

mutate {
  junk => { "foo" => "bar" }
}

How does Logstash know to error?

Does this require an external file to describe the valid configuration? Also, where would deprecation warnings, obsoletion warnings, and validation happen?

If it is an external file, can you provide an example of what that may look like?

@danhermann (Contributor)

How does core know what is valid/invalid configuration and type? ... How does Logstash know to error? ... Does this require an external file to describe the valid configuration? Also, where would deprecation warnings, obsoletion warnings, and validation happen?

I believe this validation happens now in each plugin's register method, which is called at pipeline creation time, and Logstash refuses to start if any plugin fails to register. I think the same validation could occur in the Java plugin's constructor, which would also be invoked at pipeline creation time, so Logstash could error at the same stage of the startup process if any plugin had invalid configuration information.
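
A minimal sketch of that idea, reusing the Mutate constructor shape from this PR (the required-field check and the error message are made up):

    public Mutate(final LsConfiguration configuration, final LsContext context) {
        final String field = configuration.getString("field");
        if (field == null || field.isEmpty()) {
            // throwing at construction fails the pipeline at creation time,
            // mirroring how a failing Ruby register call stops startup
            throw new IllegalArgumentException("mutate: 'field' is required");
        }
        this.field = field;
        this.value = configuration.getString("value");
    }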

@jakelandis (Contributor)

I believe this validation happens now in each plugin's register method, which is called at pipeline creation time, and Logstash refuses to start if any plugin fails to register.

I agree with this strategy for custom-level validation. But for common validation, such as valid configuration options, type/file/other common validation, and deprecation/obsoletion messaging, we would be pushing a lot onto the plugin author vs. the Ruby way, where it is already handled.

@andrewvc (Contributor) left a comment

Thanks for the excellent effort here. I've left a number of questions / concerns inline.

WRT codecs, I think we should leave them out of V1 of this proposal. Not every input needs them, and we can add them later.

public interface Filter extends AutoCloseable {

    QueueReader filter(QueueReader reader);
Contributor

The QueueReader isn't always a queue; obviously, it could just be a previous filter. What about PipelineReader as a name?

Member Author

@andrewvc I have no opinion here tbh. We can go with PipelineReader :) Or maybe just EventIterator?

Contributor

I like the latter, EventIterator sounds good.


    QueueReader filter(QueueReader reader);

    void flush(boolean isShutdown);
Contributor

I don't think we want to support flushing in the future; can we remove this from V1 here?

Member Author

@andrewvc :D I'm always happy to remove flush, why do you even ask :) Will remove it.

@Override
public long poll(final Event event) {
    final long seq = reader.poll(event);
    if (seq > -1L) {
Contributor

When would the sequence be negative?

Member Author

@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-cf259dc78531f006beca5e143b529554R13 ... -1 indicates a failure to poll an event (think interrupts, timeouts).

Contributor

Ah, of course :)

public QueueReader filter(final QueueReader reader) {
    return new QueueReader() {
        @Override
        public long poll(final Event event) {
Contributor

We really do want the ability for filtering to be done batch-wise. This is the only way to do things like Elasticsearch lookups efficiently. We kind of get away with not doing this in the JDBC filters, but that's only possible due to caching. Even with that, we can't exploit the ability to search multiple values in a single query for the cache-seeding portion.

Contributor

Does this API let you create new events? I understand we can drop events with Event.cancel; how would the split filter work?

Member Author

@andrewvc the API doesn't prevent you from first collecting N events, doing something to them in a batched way, and then returning another Reader that wraps the batched results (see the sketch at the end of this comment) :) That's the beauty here imo ... if your plugin needs batches, you can do batching; if it doesn't, it can just save a bunch of memory by going one by one.

Also, there's nothing preventing you from creating new events here. Simply overwrite the given Event with the new event, don't increment the sequence number, and don't call poll on the QueueReader, and you've created a new Event :)
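
A rough sketch of that batching pattern on top of the proposed QueueReader (the batch size of 128, the processBatch hook, and the public Event constructor are assumptions, not part of the PR):

    public QueueReader filter(final QueueReader reader) {
        return new QueueReader() {
            private final Deque<Event> batch = new ArrayDeque<>();
            private final Deque<Long> seqs = new ArrayDeque<>();

            @Override
            public long poll(final Event event) {
                if (batch.isEmpty()) {
                    // drain up to 128 events from upstream, then process them
                    // in one go, e.g. a single bulk Elasticsearch lookup
                    for (int i = 0; i < 128; ++i) {
                        final Event buffered = new Event();
                        final long seq = reader.poll(buffered);
                        if (seq == -1L) {
                            break;
                        }
                        batch.add(buffered);
                        seqs.add(seq);
                    }
                    processBatch(batch);
                }
                if (batch.isEmpty()) {
                    return -1L;
                }
                // hand out the buffered results one by one, preserving each
                // event's original sequence number for E2E acks
                event.overwrite(batch.removeFirst());
                return seqs.removeFirst();
            }
        };
    }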

Contributor

Ah, I see! While I do like that, I feel we should provide higher-level default implementations here for real-world plugins. The problem with patterns is that they're an extra place for bugs and code duplication. Previously, each output would do its own batching, and there'd be occasional bugs and differences between them.

Additionally, the batch size is set at the Logstash level. We don't expose that option to the plugin, and each plugin would have to know what that size is to batch as the user is expecting it to. A default implementation could do that correctly.

Can we add the ability for plugin authors to instead define different callbacks as shown below? I think keeping the low-level API is fine, but I don't think it should be the default.

Something like...

// for the common case of writing a filter that takes one event and modifies (or cancels) it
public void filterSingle(Event event);

// for the common case of filters that take a batch
public void filterBatch(Collection<Event> events);

Contributor

WRT creating new events, I realize that's relatively rare, but I think some sugar would be nice. Implementing the split filter would be awkward as-is, because you need to extract stuff out of the event, then queue it while waiting for poll to be invoked again.

Where do these Event objects come from in this API? Are they from an Event pool? Would there be a way to simply ask for a new event from the pool instead without waiting for a new poll?

Contributor

@original-brownbear so, in this example, I can get a new event by invoking what? It'd help me to see a short version of the clone filter written out here if you have the chance.

Member Author

@andrewvc I think this is how I'd do clone here:


private Event clone;

private long lastSeq = -1L;

@Override
public long poll(final Event event) {
    if (clone != null) {
        event.overwrite(clone);
        clone = null;
        return lastSeq;
    }
    final long seq = reader.poll(event);
    lastSeq = seq;
    if (seq > -1L) {
        clone = event.clone();
    }
    return seq;
}

Just don't poll upstream if there is a clone to return and don't increment the sequence number since it's still the result of the last "real" input event that was polled from upstream.

Contributor

After meeting with @original-brownbear, we decided to use a filter-instance-per-pipeline pattern, as we do today. This means that there will be one instance used across pipeline worker threads. These instances will not be invoked concurrently, to prevent people from shooting themselves in the foot. There will be an annotation for opting out of this (sketched below), much like the concurrency :shared option we have today.

We won't tackle sugaring the clone/split filter use case initially.
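
A hypothetical sketch of that opt-out (the @SharedConcurrency name is invented; only @LogstashPlugin appears in this proposal):

    @LogstashPlugin(name = "example")
    @SharedConcurrency // invented: opts in to concurrent invocation across workers
    public final class ExampleFilter implements Filter {

        @Override
        public QueueReader filter(final QueueReader reader) {
            return reader; // pass-through, for illustration only
        }

        @Override
        public void close() {
        }
    }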

Contributor

@andrewvc - can you clarify?

we decided to use a filter-instance-per-pipeline pattern as we do today. This means that there will be one instance used across pipeline worker threads.

Those seem to be saying opposite things... do you mean one instance per pipeline worker thread, or do you really mean a single instance across all pipeline worker threads?

Contributor

One instance across all threads, just as we do today. One instance per pipeline, shared across all worker threads.

}

@Override
public void flush(final boolean isShutdown) {
Contributor

No need to include flush since it's an API we'll be deprecating soon.

public Mutate(final LsConfiguration configuration, final LsContext context) {
    this.field = configuration.getString("ls.plugin.mutate.field");
    this.value = configuration.getString("ls.plugin.mutate.value");
Contributor

Why do we need the boilerplate of ls.plugin.mutate.*? Why can't we just have configuration.getString("value")?

Can you go into a bit more detail as to how LsConfiguration would work? Is there a way to declare: "field must be a string" as you can today? This is important for driving a visual builder in the future.

Member Author

@andrewvc you're right, the prefixes aren't necessary :) Will remove them :)

Member Author

@andrewvc I'd suggest that a plugin has to implement a method Map<String, LsConfigTypeInfo> configSchema() that describes each key's expected value (type, deprecation, etc.).
Then we can use its return value for a visual builder and also write validation logic against this map in the ConfigCompiler :) I'll add some code illustrating that approach soon :)

Contributor

I think that's fine for the scope here.

In the future, we should improve this to drive the visual builder. It'd be great to support a nested description as well. I'd love @ycombinator's input here: what do we need to drive a pipeline builder UI? We should create a new issue for that discussion, however. I think parity with the existing API is good for v1 here.

@ycombinator (Contributor) Feb 14, 2018

In a perfect world (likely not v1) the interface of a plugin (properties, their data types, any additional validation rules, short descriptive documentation, etc.) could be described in a language-agnostic file. That file could then be used (perhaps after further "compilation") by LS core as well as the LS UI. I'm deliberately leaving out the specifics of "used".

Alternatively, such an interface definition file could be generated from one of these classes as well, if that's possible.

The crux of the problem I'm trying to solve here is sharing of definitions between core and UI so we don't have to duplicate them in both places, keep them in sync, etc.

Of course, we shouldn't let this concern weigh down v1 unless solving for it later from the v1 design is going to be difficult.

try {
    while (!stopped && inpt.hasNext()) {
        final String message = inpt.next();
        writer.push(Collections.singletonMap("message", message));
Contributor

Can you talk me through why the writer writes Map objects, not Event objects? Was that more efficient given that a Map is more lightweight?

Member Author

@andrewvc yea, that was just an idea for optimizing things a little more. It's kinda annoying that we have all these Events that don't hold any state. For filters and outputs I can see the value in having Event around for serialization and such, but for inputs it's just a waste of memory to create an Event up front when all inputs in fact simply produce a Map<String, Object>.

* Pushes multiple events to the Queue, blocking for the given timeout if the Queue is not
* ready for a write.
* Guarantees that a return {@code != -1} means that all events were pushed to the Queue
* successfully and no partial writes of only a subset of the input events will ever occur.
Contributor

We really can't make this guarantee with E2E ACKs; partial writes may happen.

Member Author

@andrewvc I was under the impression that we want to fix that and force batch-wise writes to the PQ, to fix stuff like the HTTP input timing out?
To me, it seems this is simply something to fix in the Queue implementation(s)?

Contributor

Spoke with @original-brownbear; we decided not to guarantee the absence of partial writes.

Contributor

My mistake, we'll just remove this initially.

@original-brownbear (Member Author)

Added an outline of how we could do config validation + exposing the config schema in cf44a52.

IMO we could just do something like that, where plugins are forced to return a list of config specs and then have the validation happen in the plugin factory.

* @param millis Timeout in millis
* @return Sequence number of the first event or -1 if push failed
*/
long push(Collection<Map<String, Object>> events, long millis);
Contributor

Spoke with @original-brownbear; we'll remove this in the MVP.

@original-brownbear (Member Author)

@andrewvc alright, timeout and batched writes removed for now, as discussed. What else can I do here to move this along so we can get going on implementing on top of it? :)

@jakelandis (Contributor)

@original-brownbear

Added an outline of how we could do config validation + exposing the config schema in cf44a52.

Thanks, I think this pattern will work, but it is a bit verbose.

@Override
public Collection<PluginConfigSpec<?>> configSchema() {
    return Arrays.asList(
        new PluginConfigSpec<>("one", String.class, null, false),
        new PluginConfigSpec<>("two", String.class, "foo", true)
    );
}

Have you considered an annotation + reflection-based approach?

@Config
String one;

@Config
@Deprecated
String two = "foo";

I have a working example over at https://github.com/jakelandis/skunkstash/blob/master/processors/src/main/java/org/logstash/skunk/plugin/processors/Translate.java#L19

@andrewvc (Contributor)

@jakelandis how would you deal with nested hash-like structures, as in the HTTP poller?

I think annotations are the easiest way to deal with most things.

We'd also need to add an @Obsolete annotation for our purposes.

JSON Schema + Jackson would be an easy way to solve this from an implementation standpoint, but it's pretty ugly for plugin authors.

@original-brownbear (Member Author) commented Feb 22, 2018

@jakelandis

Have you considered an annotation + reflection based approach?

Not really, tbh. I really hate that approach, for two reasons:

  • You may make the code a little easier on the eyes with this stuff, but as soon as you have to debug a problem with it, it becomes really involved to figure out what's going on once the reflection magic is in there (think line-debugging cases where the wrong config is used for a plugin that appears twice in your pipeline with a different config each time).
  • The minimal code for bootstrapping this stuff becomes a lot more involved, as can be seen here: https://github.com/jakelandis/skunkstash/blob/master/core/src/main/java/org/logstash/skunk/App.java#L120
    • Also, you still need something like public Collection<PluginConfigSpec<?>> configSchema() for the UI, as far as I understand. So that's going to be another big chunk of magic which, as @andrewvc points out, will be tricky for the nested hashes that come really naturally if you just validate .-separated hard-coded setting names in a flat map (don't worry about settings with a . in their name; we can always escape them when parsing).

Again (as in the poll method's case), I think this is something that can be resolved by providing some wrapping logic: in this case, an abstract base class that implements the Plugin interface's configSchema method (with the annotation magic in it) and whose subclasses then use annotated fields (see the sketch below). I don't think this is something we need to bake in right from the start, and for debugging and test-writing purposes it's also not something you want to bake into your most low-level implementation (imo).
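
A rough sketch of that wrapper idea, assuming the four-argument PluginConfigSpec constructor shown earlier (name, type, default, deprecated) and a runtime-retained @Config marker annotation:

    import java.lang.reflect.Field;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    public abstract class AnnotatedPlugin implements Plugin {

        @SuppressWarnings({"rawtypes", "unchecked"})
        @Override
        public Collection<PluginConfigSpec<?>> configSchema() {
            final List<PluginConfigSpec<?>> specs = new ArrayList<>();
            for (final Field field : getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Config.class)) {
                    // default values would require reading the field off an
                    // instance; elided here for brevity
                    specs.add(new PluginConfigSpec(
                        field.getName(), field.getType(), null,
                        field.isAnnotationPresent(Deprecated.class)
                    ));
                }
            }
            return specs;
        }
    }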


@original-brownbear (Member Author) commented Feb 23, 2018

@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-63d7bea0e5acbcf6cb19e590783f202cR13 and https://github.com/elastic/logstash/pull/9137/files#diff-7336563974b577c2998327d7bbb4581bR74 :)

I didn't fully implement both plugins, but I think I have a sufficient number of cases covered to show the way it works.
Imo, looking at the mutate filter, it's not that much more verbose than the current Ruby API, while (unlike the Ruby one) being 100% crystal clear about how a defined setting translates into the ability to look it up.

@ycombinator (Contributor) commented Feb 23, 2018

@andrewvc @original-brownbear Following-up to our discussion yesterday, here's what the UI would like to know about each plugin, in some sort of structured format (preferably JSON-based):

  • Its version
  • The Logstash core version range that this plugin is compatible with
  • Its type, one of input, filter, output, or codec
  • Its configuration options (aka settings). For each option:
    • Its name
    • Its datatype, one of boolean, number, string, password (or sensitive), filesystem path, array[<datatype>], object{<sub-option>:<datatype>,...}
    • Any value constraints (these are nice to haves):
      • If the value is an enumeration, the enumeration options
      • If the value is a range, the range bounds, noting whether they are inclusive/exclusive
    • Whether the option is required or optional
    • A default value, if one exists

I have no opinion on the exact format of such a spec. As long as all the information above is encoded in the spec, the UI can work with it.
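
For illustration only, one hypothetical JSON encoding of that list (all field names invented):

    {
      "name": "mutate",
      "version": "1.0.0",
      "logstash_version_range": ">=6.0.0 <7.0.0",
      "type": "filter",
      "options": {
        "field": { "type": "string", "required": true },
        "value": { "type": "string", "required": false, "default": "" }
      }
    }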

@original-brownbear (Member Author)

@ycombinator that sounds very doable :) With the exception of the value constraints, we can easily serialize the current ConfigSpec proposal to JSON imo :) For the value constraints, we'd have to find some agreement on what kinds of constraints we want to expose to the UI, I think.

@ycombinator (Contributor)

That's awesome, @original-brownbear! Let's not worry about the value constraints for v1 then. We can add those in later and enhance the UI at that time as well. Thanks!

@ycombinator (Contributor)

@original-brownbear I want to amend my statement in the previous comment :) Let's not worry about the range value constraint. It would be very nice to have enumerations expressed in the JSON spec, though. This would let the UI do things like show a drop-down menu of the options, rather than having the user type something and make a mistake.

@guyboertje (Contributor)

@andrewvc
Please have a look at jdbc_static for complex validation.

I overrode the validate_value(value, validator) method so I can supply my own complex validation classes and have them validate at the same time as the other settings.

Ideally, I would love to have a similar mechanism in the Java stuff.

private static final PluginConfigSpec<Path> CA_CERT_CONFIG =
    LsConfiguration.pathSetting("cacert");

private static final PluginConfigSpec<Map<String, String>> URLS_CONFIG =
Contributor

This doesn't really encapsulate the urls setting in the poller. The value can be either a String, as it is here, or a complex hash: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html. That was my reason for us trying to figure out that case here.

My suspicion is that we'll wind up with Map<String, Object>, and use code to sort out the mess.

I have no idea how to make this programmatic for a UI.

@ycombinator , your thoughts?

@ycombinator (Contributor) Feb 23, 2018

Reading https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html#plugins-inputs-http_poller-urls, it looks like this is always expected to be a hash? The Map<String, String> type in the code on this line seems to mirror that expectation too. I guess I'm not seeing how this could be set as "just" a single String?

I have no idea how to make this programmatic for a UI.

One way to encode this in a spec for the UI would be to use some sort of a special designation for user-defined keys in a hash/object data type. Let me try to elaborate below.

First, consider a setting whose datatype is a hash, but with keys provided by the plugin itself. For example, the schedule setting. In this case, the spec for this setting could be something like the following (stealing from ES mapping syntax for illustration purposes only):

{
  "schedule": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "cron": { "type": "string" },
      "every": { "type": "string" },
      "in": { "type": "string" },
      "at": { "type": "string" }
    }
  }
}

Now, consider the urls setting. The spec for that could be:

{
  "urls": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "__user_defined__": {
        "type": "string",
        "max_count": -1 // to indicate that there could be an unlimited number of URLs
      }
    }
  }
}

Thoughts?

Member Author

@ycombinator @andrewvc my thought was that we could programmatically encapsulate what ES does as PluginConfigSpec<Map<String, PluginConfigSpec>> and build up a nested mapping that way.
For some reason I forgot to add that part ... it will be up soon.

Contributor

@ycombinator it can be either a string or a hash. It's weird that way.

@original-brownbear ah, yes, that makes more sense.

Still, encoding this sort of thing into JSON is inherently strange. It's pretty complex to encode "can be a string or a map", and even after you do encode that, how do you render the UI?

So, here's a thought: maybe there's some sort of 'override' UI spec the plugin can bundle. In this case, we don't need the UI to support the simple syntax, just the full one. We could let the plugin author manually specify what the UI controls should be instead of doing that programmatically.

Contributor

@andrewvc I'm good with that proposal — that the UI gets the "highest fidelity" syntax. We can always figure out optimizations/shortcuts later.

WDYT about my proposal for encoding plugin-specified keys vs. user-defined keys above?

Member Author

@andrewvc here you go, quick sketch but the idea should be clear:

11f2718

I think you can simply encode this as a nesting of LsConfiguration. This way the UI can also render it as such with the ability to add and remove elements (that are essentially the same as the top-level element).

In this case, you could simply add a UI element that allows adding nested "rows" for each Map<String, LsConfiguration>-typed setting.
Hopefully this makes sense; happy to go into more detail, but I'm optimistic we can properly encode this for the UI :)

@original-brownbear (Member Author) Feb 23, 2018

@ycombinator I tried to capture your proposal in some Java code.

I just noticed I actually left out the spot with the flat hash, my bad:

    @SuppressWarnings("unchecked")
    public static <T> PluginConfigSpec<Map<String, T>> requiredFlatHashSetting(
        final String name, Class<T> type) {
        //TODO: enforce subtype
        return new PluginConfigSpec(
            name, Map.class, null, false, true
        );
    }

    @SuppressWarnings("unchecked")
    public static PluginConfigSpec<Map<String, LsConfiguration>> requiredNestedHashSetting(
        final String name, final Collection<PluginConfigSpec<?>> spec) {
        return new PluginConfigSpec(
            name, Map.class, null, false, true, spec
        );
    }

Something like this that captures flat sub-hashes would be doable, though :)
I think we can easily render that in this or a similar ES-style JSON for the UI.

Contributor

@original-brownbear I have to admit, I'm a bit lost :) But that's okay! I mostly just want to know, from a UI POV, if we'll be able to encode the following:

  1. A setting whose data type is a hash. Almost 100% certain this is possible :)

  2. For such settings, there are two cases for the hash's keys:
    a. Plugin-defined, like in the case of the schedule setting, and
    b. User-defined, like in the case of the urls setting

    The UI needs to be able to distinguish between these two cases by looking at the spec so it can decide whether to a) show "static" keys or b) make the key a text box for the user to type something in, and also allow them to add N such rows.

    So my question is, can this sort of distinction be generated from the Java API you are proposing?

Member Author

@ycombinator yes, that's doable and was exactly the plan in the Java snippet above :)

@original-brownbear (Member Author)

@andrewvc what do you have in mind for next steps here? :)

@andrewvc (Contributor) left a comment

Left some further feedback.

}

@Override
public long poll(final Event event, final long millis) {
Contributor

Thinking through this, most people will need to write the same polling code over and over again.

I think we should create some simple default implementations here that people can use. I put two up here: https://github.com/original-brownbear/logstash/pull/1/files

WDYT? Batching is necessary for things like the ES filter, etc. It's pretty complicated to implement in terms of this API (I'm sure there are bugs in my code up there, for instance).

I'd rather we promote those two filter APIs (and make similar ones for the Output class) than have people writing this boilerplate over and over again.

Member Author

@andrewvc

Batching is necessary for things like the ES filter, etc

Yea, maybe, but the efficient way of doing it is also highly plugin-dependent. As of right now, I think it's fair to say that there are no plugins out there that actively make use of the batched interface in a quantifiable way. Otherwise, I wouldn't have found #8428 by mere chance :)

=> I'm not really comfortable with defining a batched API like that (yet). I'd much rather build a few plugins and see what works in the real world than guess at the right approach here now, tbh.

Contributor

I don't see how it's plugin-dependent? The only reason filters don't batch today is that it isn't possible, because they can only get one event at a time. I'd rather not box us in.

Contributor

BTW, how does the efficiency of https://github.com/original-brownbear/logstash/pull/1/files#diff-eb733fe2c6eb811410b0e0a88226263d look? It should be garbage-free, aside from a single new Event[] call per batch, which is pretty cheap given a reasonable batch size.

Also, looking forward to your feedback on the SimpleFilter API.

I'd rather we only expose these APIs than the low-level one here. The low-level one is so implementation-dependent it boxes us in. I'm also not sure who would actually want to use it directly; it's much more finicky, and I'm not sure how much it buys a programmer in any situation I can think of.


private static final PluginConfigSpec<Map<String, LsConfiguration>> URLS_CONFIG =
    LsConfiguration.requiredHashSetting(
        "urls", Arrays.asList(URL_METHOD_CONFIG, USER_CONFIG)
Contributor

I don't see how this encapsulates an HTTP poller configuration. urls can either look like:

urls = {"elastic" => "http://elastic.co"}

or

urls = {"elastic" => {"method" => "get", "url" => "http://elastic.co", "headers" => {...}}}

Contributor

Maybe this is too flexible for Java?

Member Author

@andrewvc hah you're right :) The "or" case that allows for two different schemas isn't covered by my proposal.

Tbh ... I'm tempted to vote for not allowing that in Java plugins. The only downside to making that simplification is that we'd have to adjust the config schema for the HTTP poller if we fully port it to Java (and the HTTP poller probably isn't the only case). Other than that, I only see upsides to simplifying this (especially when thinking about the UI!) ...

Think it's feasible to break with the flexibility of the Ruby implementation here for the sake of type safety?

Contributor

Wow, I never responded here. At any rate, yes, I think it's fine to break with Ruby here.

@andrewvc (Contributor) commented Mar 8, 2018

One further thought: since the plugin objects have state, I take it we will have to instantiate one plugin per worker thread? That's tricky, but OK so long as there's some way to share state across instances of the plugin with the same ID across threads. I think that means we'll need each plugin to be able to share some sort of context object. I think there should be some sort of makeContextObject method that plugins can implement; the context object would be instantiated by the pipeline and passed to each instance per worker.
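
A loose sketch of that idea against the Filter interface from this PR (the makeContextObject name and the plain Object type are placeholders from the comment above, not settled API):

    public interface Filter extends AutoCloseable {

        QueueReader filter(QueueReader reader);

        // invoked once per pipeline by core; the returned object would be
        // handed to every per-worker instance that shares this plugin's ID
        default Object makeContextObject() {
            return null;
        }
    }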

@andrewvc (Contributor) commented Mar 8, 2018

@guyboertje can you take a look at the validators here and let us know what could be improved in this proposal based on your work there?

@original-brownbear (Member Author)

@guyboertje ping :) I'd like to be able to continue here and that part will be quite a bit of code still. Better to speak up before I start moving on this if possible :)
Thanks!

@guyboertje (Contributor)

It seems weird to me that you ask me to spend hours to fully understand what you are proposing and whether it does or does not suit the current custom validation that I cited when it would take you 30 minutes to read the current custom validation and reach the same conclusion.

If after you look at the existing code and you say "we got that covered" then I'm good with that.

@original-brownbear (Member Author)

Just to give this some structure, I'll start linking the dependent PRs that I'm opening/merging that'll get us the necessary compiler logic for the Java plugins :)

For now, waiting on #9320.

@original-brownbear (Member Author)

Closing in favour of continuing in #9342 :)
