
[Feature Branch] Java API #9342

Closed · andrewvc wants to merge 50 commits from feature/java-api

Conversation

andrewvc (Contributor) commented Apr 9, 2018

@original-brownbear just opening this so we can track the deltas on this branch.

andrewvc (Contributor, Author) commented Apr 9, 2018

One thing that occurs to me, since we'll want to publish an artifact: we'll have to stop using the org.logstash package name because we don't actually own that.

We'll have to use net.logstash or co.elastic (or org.elasticsearch) instead.

@Override
public Collection<Event> filter(final Collection<Event> events) {
//TODO: Impl.
return events;
andrewvc (Contributor, Author) commented Apr 9, 2018


I assume this just returns a new collection. Something like:

Collection<Event> cloned = new ArrayList<>(events.size() * 2);
events.forEach(e -> { cloned.add(e); cloned.add(e.clone()); });
return cloned;

Member:

@andrewvc jup will adjust soon :)

original-brownbear self-assigned this Apr 9, 2018
original-brownbear (Member):

@andrewvc yikes on the package change, that'll break at least the date filter. Should we maybe make that move in master independently of this PR to keep the noise level lower here?

elastic deleted a comment from original-brownbear · Apr 12, 2018

original-brownbear (Member):

@danhermann we gotta fix the build here :) Taking a look at the ITs now.

original-brownbear (Member):

@danhermann it's back alive :) Fixed all merge conflicts and at least locally tests were green again.

yaauie (Member) left a comment

In general, I think this is pretty amazing stuff 🎉


I've left some comments in-line, but my primary issues are:

  • mismatch in abstractions between Codec#decode(ByteBuffer) and Codec#encode(OutputStream) feels awkward and a little half-baked. This may be just a side effect of us using "codec" within Logstash to encapsulate two related but different concepts (in which case I'm begrudgingly okay with it)
  • doc blocks in API interface methods could use clarity; separating implementation requirements from client requirements may make things clearer.
  • validation gap; is Plugin#configSchema() used anywhere?


/**
* Decodes events from the specified {@link ByteBuffer} and passes them to the provided
* {@link Consumer}. Clients of the codec are responsible for ensuring that the input buffer
Member:

At my first reading (and second and third, honestly), it looked like we were putting a lot of requirements on the implementation of the Codec, but I now understand that we are attempting to describe the whole relationship to both clients and implementations.


I think this section could benefit from bullet points:

  • The client (typically an {@link Input}) must provide a ByteBuffer that is ready for reading (with {@link ByteBuffer#position} indicating the next position to read and {@link ByteBuffer#limit} indicating the first byte in the buffer that is not safe to read)
  • Implementations of {@link Codec} must ensure that the {@link ByteBuffer#position} indicates the last-read position before returning control.
  • The client must then return the buffer to write mode (typically with {@link ByteBuffer#compact}) before writing more bytes.
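
A minimal sketch of the round trip those bullets describe, assuming a channel-backed input loop. Codec and Event below are placeholder stand-ins for the interfaces under review, with the decode signature shaped after the javadoc in this PR:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.function.Consumer;

// Placeholder stand-ins for the API types under review.
interface Event {}
interface Codec { void decode(ByteBuffer buffer, Consumer<Event> eventConsumer); }

final class DecodeLoopSketch {
    static void readLoop(ReadableByteChannel channel, Codec codec,
                         Consumer<Event> eventConsumer) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // starts in write mode
        while (channel.read(buffer) >= 0) {
            buffer.flip();                       // read mode: position=0, limit=bytes available
            codec.decode(buffer, eventConsumer); // codec advances position past what it consumed
            buffer.compact();                    // back to write mode; unread bytes are preserved
        }
    }
}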

* Decodes events from the specified {@link ByteBuffer} and passes them to the provided
* {@link Consumer}. Clients of the codec are responsible for ensuring that the input buffer
* is in a valid state for reading. Upon completion of {@link Codec#decode}, the codec is
* responsible for ensuring that {@link ByteBuffer#limit} reflects the last point at which
Member:

Do we mean ByteBuffer#position?


I don't think that the "ensuring that {@link ByteBuffer#limit} reflects the last point at which input bytes were decoded to events" requirement is quite aligned with the spirit of Buffer#limit:

A buffer's limit is the index of the first element that should not be read or written
-- java.nio.Buffer

By setting ByteBuffer#limit to the point at which the codec stopped processing, we prevent the client from being able to reliably use ByteBuffer#compact() as we later declare in this comment that it should:

The bytes between the buffer's current position and its limit, if any, are copied to the beginning of the buffer. That is, the byte at index p = position() is copied to index zero, the byte at index p + 1 is copied to index one, and so forth until the byte at index limit() - 1 is copied to index n = limit() - 1 - p. The buffer's position is then set to n+1 and its limit is set to its capacity. The mark, if defined, is discarded.
-- ByteBuffer#compact()

By requiring that implementations set the ByteBuffer#limit to reflect "the last point at which input bytes were decoded to events", we are ensuring that bytes beyond that point are not moved up to the beginning of the byte buffer and setting them up to be overwritten.
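
A toy illustration (not from this PR) of the difference: compact() only carries forward the bytes between position and limit, so a codec that shrinks limit to the last-decoded byte causes the client's compact() to drop a trailing partial event.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompactDemo {
    public static void main(String[] args) {
        byte[] data = "event1\npart".getBytes(StandardCharsets.UTF_8);

        // Codec advances position past the decoded "event1\n" (the intended contract):
        ByteBuffer ok = ByteBuffer.wrap(data.clone());
        ok.position(7);
        ok.compact();
        System.out.println(ok.position()); // 4 -- the partial "part" is carried over

        // Codec instead sets limit to the last-decoded byte (as the javadoc currently reads):
        ByteBuffer bad = ByteBuffer.wrap(data.clone());
        bad.position(7);
        bad.limit(7);
        bad.compact();
        System.out.println(bad.position()); // 0 -- "part" is not carried over and will be overwritten
    }
}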

Contributor:

Yes, the code functions properly using ByteBuffer#position but the comment mistakenly specifies ByteBuffer#limit. I'll fix it.

* @param event The event to encode.
* @param output The stream to which the encoded event should be written.
*/
void encode(Event event, OutputStream output);
Member:

The ByteBuffer/OutputStream mismatch between the encode and decode operations feels a bit awkward, like we're using one class for two distinct things (admittedly: we are). Can we converge on one shared abstraction?

Contributor:

I think the shared abstraction is that decode takes a stream of bytes through the provided ByteBuffer and returns one or more Events, and encode provides the reverse operation by taking an Event and writing a stream of bytes to the provided OutputStream. The logical interface is the same as the Ruby codecs, in which the decode method takes a byte array and yields one or more Events and the encode method takes an Event and yields a string. Am I misunderstanding the shared abstraction that you'd like to see?

Member:

To me, the OutputStream is a higher level of abstraction than ByteBuffer, so it seems unbalanced that the Output gets to take advantage of simpler requirements while the Input has to juggle flipping the buffer between read and write modes.

It would make more sense to me if either both used ByteBuffer or if decode used an InputStream to match encode's OutputStream.

praseodym:

I think it would make sense to use ByteBuffers for both encoding and decoding.

A peculiar thing about the current codec implementation in Logstash is that it does not differentiate between stream-based codecs (e.g. reading from a TCP stream) and event-based codecs (e.g. reading from a Kafka topic). If a single interface is defined for both use cases, I think a ByteBuffer-based one would make the most sense.
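
For illustration, a symmetric ByteBuffer-based shape could look something like the sketch below; this is only a mock-up of the suggestion, not an interface from this PR, and Event is a placeholder:

import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Placeholder for the API's event type.
interface Event {}

// Hypothetical symmetric shape: both directions work against a ByteBuffer
// prepared by the caller (read mode for decode, write mode for encode).
interface ByteBufferCodec {
    // Consume as many complete events as the readable bytes allow,
    // leaving position at the last consumed byte.
    void decode(ByteBuffer input, Consumer<Event> eventConsumer);

    // Serialize one event into the buffer; return false if the remaining
    // space is insufficient so the caller can flush and retry.
    boolean encode(Event event, ByteBuffer output);
}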

Contributor:

@yaauie, @praseodym, I see what you're saying, and I agree that using a ByteBuffer for encode, as decode already does, could make sense. I'm going to work on the other changes first and see if I can get this one in before feature freeze. If not, I'll add it to the requirements list for the beta phase if that's acceptable.

praseodym:

@danhermann, the explicit no_codec is great, I hadn't seen that before. Ensuring that codecs stay optional should be good enough for now.

For the near future, it'd be good to have a way to package 'codec plugins' that do not strictly conform to the Logstash codec interface, so that it would be possible to, e.g., package a custom Netty ChannelInboundHandlerAdapter for use in a Netty ChannelPipeline, or a custom Kafka Serde class. That is related to #9521 and I'm not sure if that's already in scope for this PR.

Member:

The more I look at it, the more I think that java.nio.channels.WritableByteChannel and java.nio.channels.ReadableByteChannel make the most sense for Codec#encode and Codec#decode respectively. There are a lot of usable implementations, including ones that efficiently wrap InputStream and OutputStream to eliminate unnecessary copying, and they allow buffering (or the lack thereof) to be solely a concern of the codecs, which makes sense because the codecs are the things that need to hold onto state (especially while decoding).

I have an 80% WIP to swap that out that I hope to complete in the morning, so keep an eye out.
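
For reference, a hypothetical shape of that channel-based alternative (not what was ultimately merged; Event is again a placeholder):

import java.io.IOException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.util.function.Consumer;

// Placeholder for the API's event type.
interface Event {}

// Hypothetical channel-based codec: buffering and partial-event state live
// entirely inside the codec implementation rather than in the caller.
interface ChannelCodec {
    void decode(ReadableByteChannel input, Consumer<Event> eventConsumer) throws IOException;

    void encode(Event event, WritableByteChannel output) throws IOException;
}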

Contributor:

I can see the point in using ByteBuffer for both encode and decode since ByteBuffer is equally usable for either stream-based or channel-based IO, but I think going all the way to Channel for both encode and decode unnecessarily disadvantages plugins that use stream-based IO. Most Java libraries use stream-based rather than channel-based IO, so I wouldn't want to impose that penalty on them.

Member:

There are the Channels.newChannel(InputStream) and Channels.newChannel(OutputStream) wrappers that allow us to consume streams as channels with very minimal overhead (e.g., no extra memory copying), so I still think it is a "nicer" abstraction to work with than passing ByteBuffers around. I'd like to at least get to a POC before ruling it out.
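
For reference, the JDK adapters mentioned here; stream-oriented plugin code could wrap its streams like this before handing them to a channel-based codec:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class ChannelAdapterExample {
    public static void main(String[] args) {
        // Adapt existing streams to channels via the standard library wrappers.
        ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream("hello".getBytes()));
        WritableByteChannel out = Channels.newChannel(new ByteArrayOutputStream());
        // ... a channel-based codec could then read from `in` and write to `out` ...
    }
}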

But the channel-based approach does diverge from the Ruby API enough that it would require shim implementations to support pipelines that mix plugins from the two APIs, and that is more than I can commit to before the coming feature freeze.


I am okay with merging as-is, with ByteBuffer used in one and OutputStream in the other, IFF we include something like a v0 in the API's package name (e.g., making this co.elastic.logstash.api.v0.Codec) and document that the v0 Java API is not yet stable and requires compiling your plugin against the exact major/minor version pair of Logstash. That would give us the freedom to iterate without breaking expectations.

Contributor:

I thought we always documented features in experimental status as subject to breaking changes?

}

@LogstashPlugin(name = "stream")
final class StreamInput implements Input {
Member:

would it be worth moving these example implementations either to their own files or to the test package?

Contributor:

Definitely. These are examples from the really early days of the Java API and I'll probably just remove them since the example Java plugins in https://github.com/logstash-plugins should take their place.

buffer.compact();
}
} catch (AsynchronousCloseException e2) {
// do nothing -- this happens when stop is called during a pending read
Member:

worth logging?

Contributor:

That exception occurs in response to the user action of either shutting down LS or reloading the pipeline when the input loop is waiting on the input.read(buffer) statement rather than in the codec.decode statements (in practice, it's usually the former). I'm not sure that's useful information to log, but I'm open to reasons why it might be.

Member:

If we have accepted bytes from the wire (e.g., TCP ACK), and those bytes don't end up contributing to events due to a shutdown or pipeline reload, having a clue somewhere in the logs could save a whole lot of debugging.

// do nothing -- this happens when stop is called during a pending read
} catch (IOException e) {
stopRequested = true;
throw new IllegalStateException(e);
Member:

logging here may prove helpful
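
A sketch of what that could look like, assuming a log4j2 logger and the read-loop shape from the hunks above; the class, field, and message text are illustrative, not taken from the PR:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.ReadableByteChannel;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

final class LoggingReadLoopSketch {
    private static final Logger LOGGER = LogManager.getLogger(LoggingReadLoopSketch.class);
    private volatile boolean stopRequested;

    void run(ReadableByteChannel input, ByteBuffer buffer) {
        try {
            while (!stopRequested && input.read(buffer) >= 0) {
                buffer.flip();
                // ... codec.decode(buffer, consumer) ...
                buffer.compact();
            }
        } catch (AsynchronousCloseException e) {
            // stop() closed the channel mid-read; bytes already accepted from the
            // wire will not become events, so leave a breadcrumb for debugging.
            LOGGER.debug("Input channel closed during a pending read; buffered bytes discarded", e);
        } catch (IOException e) {
            stopRequested = true;
            LOGGER.error("I/O error in input read loop; stopping input", e);
            throw new IllegalStateException(e);
        }
    }
}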

}

Stdout(final Configuration configuration, final Context context, OutputStream targetStream) {
printer = new PrintStream(targetStream); // replace this with a codec
Member:

unresolved TODO

}

@SuppressWarnings("unchecked")
public <T> T get(final PluginConfigSpec<T> configSpec) {
Member:

I think we have a gap in validation that could lead to the silent acceptance of bad input that is then ignored.

Because this method is only ever called from the implementations with the PluginConfigSpecs that the particular implementation knows about, I believe we have nothing to validate that plugins correctly report errors when a pipeline configuration attempts to use config directives that the implementation does not know about.

Example:

input {
  stdin {
    codek => typo
  }
} 

I do see that we have a Plugin#configSchema that must return a Collection<PluginConfigSpec<?>>, but I do not see it used anywhere to perform this validation.
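
Purely as an illustration of the missing check, something along these lines could compare a pipeline's settings against the names declared by Plugin#configSchema(); the helper below is hypothetical, with PluginConfigSpec reduced to a bare setting name:

import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical validation helper; not part of this PR.
final class ConfigValidationSketch {
    // Returns the settings present in the pipeline config that the plugin
    // did not declare, e.g. {"codek"} for the stdin example above.
    static Set<String> unknownSettings(Map<String, Object> pipelineConfig,
                                       Collection<String> declaredSettingNames) {
        Set<String> unknown = new HashSet<>(pipelineConfig.keySet());
        unknown.removeAll(declaredSettingNames);
        return unknown;
    }
}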

Contributor:

Agreed. It wasn't explicit that config validation would be part of the beta phase of the API, so I just now added it as a requirement on the Java API meta issue.

import java.util.Map;

/**
* LS Configuration example. Should be implemented like Spark config or Hadoop job config classes.
Member:

is it an example, or is it the implementation?

Contributor:

It's the implementation. I'll update the javadoc accordingly.

danhermann (Contributor):

@yaauie, thanks for the detailed review! I will add some replies and make a number of changes later this afternoon.

danhermann (Contributor):

@yaauie, I think I have addressed all of your review comments except for the change to the codec interface which I might defer to the beta phase if that's ok. I'm working on resolving the many merge conflicts between this PR and the 6.x branch.

yaauie (Member) commented Dec 17, 2018

It looks like we have a couple errors in CI; once these are resolved, I'm 👍 for a merge.

17:49:42     An error occurred while loading ./logstash-core/spec/logstash/filter_delegator_spec.rb.
17:49:42     Failure/Error: Unable to find org/jruby/RubyModule.java to read failed line
17:49:42     
17:49:42     NameError:
17:49:42       uninitialized constant LogStash::FilterDelegator::DELEGATED_METHODS
17:49:42     # ./logstash-core/spec/logstash/filter_delegator_spec.rb:152:in `block in (root)'
17:49:42     # ./logstash-core/spec/logstash/filter_delegator_spec.rb:148:in `block in (root)'
17:49:42     # ./logstash-core/spec/logstash/filter_delegator_spec.rb:8:in `<main>'

danhermann (Contributor):

This work was completed in #10216

danhermann closed this Dec 18, 2018
jsvd deleted the feature/java-api branch October 11, 2019 14:53