
Java API Proposal #9137

Closed

Conversation

original-brownbear (Member)

@andrewvc @danhermann @yaauie

Full suggestion for an API with:

  • Discovery of plugins via annotation
  • Definition of filter, input and output including sequence numbering for E2E acks
    • Could add one for codecs too, but it didn't seem essential at this point. A codec is simply a mapping of InputStream to Iterator<Event>, or better yet Iterator<Map<String, Object>> imo (see the sketch after this list)
  • Low level Queue write and read clients
    • We could obviously provide slower high-level wrappers here, but the suggested version is the fastest interface I could think of for now. IMO we should go for the fastest possible interface in step 0 and provide wrappers on top of it, instead of optimizing for direct ease of use and wasting an opportunity for significant gains for plugins that want to go the extra mile for performance
  • Hope the JavaDoc is enough to get the general idea :)
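
To make the codec idea above concrete, a minimal sketch (the Codec name and decode signature are illustrative only, not part of the proposal's code):

    import java.io.InputStream;
    import java.util.Iterator;
    import java.util.Map;

    /**
     * Illustrative only: a codec as a pure mapping from raw bytes to
     * decoded key/value maps that core could turn into Events.
     */
    public interface Codec {
        Iterator<Map<String, Object>> decode(InputStream input);
    }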

@danhermann (Contributor)

++ on the simplicity of the API and auto-discovery of plugins!

public final class DiscoverPlugins {

    public static void main(final String... args) throws NoSuchMethodException {
Contributor

What are your thoughts on isolated class loaders per plugin?

Member Author

@jakelandis I think I'd stay away from that for the time being (meaning, for as long as JRuby is around, probably ...). I do think it's something we can work towards once we've gotten rid of JRuby types and the Event only contains standard JDK types. But as long as Event holds RubyString etc., isolated class loaders introduce a lot of potentially complex constraints around the JRuby version that we'd have to enforce on plugins, and I'm not even sure it's possible in principle, since JRuby sets up its own classloader for the JRuby classes.

Contributor

I agree with your assessment that this introduces a lot of complexity. However, if a security-managed LS is in the near future, it may be worth tackling that complexity sooner rather than later, since it's usually easier to bake things like that in than to layer them on later. Also, if a security-managed LS is in the future, then maybe Java 9 modules provide some wins there without isolated class loading (just speculation, I haven't researched it).

Contributor

I don't think a security-managed LS is that close. I'd rather bite off that complexity a bit later.

@jordansissel (Contributor)

A codec is simply a mapping of InputStream to Iterator

A codec is not what you describe. Codecs today operate on more than just streams. For example, the Redis input provides a fixed-size payload for each LPOP with no stream concept. I'm not sure if you're using InputStream as a metaphor or an actual implementation detail; as a metaphor it might work, but the implementation detail does not (client libraries like Redis, Kafka, etc., do not provide anything that looks like an InputStream).

for (final Class<?> cls : annotated) {
    System.out.println(cls.getName());
    System.out.println(((LogstashPlugin) cls.getAnnotations()[0]).name());
    final Constructor<?> ctor = cls.getConstructor(LsConfiguration.class);
Contributor

There is a small handful of things we want each plugin to have beyond its configuration.

  • Access to the DLQ
  • Access to the scoped metric
  • Settings (maybe)
  • Logger (maybe)
  • Possibly x-cutting things in the future, like an abstracted TLS implementation or common JDBC abstractions.

What are your thoughts on handling the pushing of things other than Configuration to the plugins?

Member Author

@jakelandis good point! We should have a second ctor argument that holds references to all this stuff (I'd call it LogstashContext, I guess). Will add it soon :)

Member Author

added a "sketch" of this

@original-brownbear (Member Author)

@jordansissel

A codec is not what you describe. Codecs today operate on more than just streams. For example, the Redis input provides a fixed-size payload for each LPOP with no stream concept. I'm not sure if you're using InputStream as a metaphor or an actual implementation detail; as a metaphor it might work, but the implementation detail does not (client libraries like Redis, Kafka, etc., do not provide anything that looks like an InputStream).

Yea, see it as a metaphor for now. I guess the best 1:1 Java equivalent of what the current API gets as an input is Stream<String>. We could optimize a little here, though, and offer some overloads on the Codec interface where applicable, like Avro, where we want an InputStream and would build one from a bunch of individual chunks of data.

@andrewvc (Contributor) commented Feb 13, 2018

EDIT: I don't think this approach will work. After speaking with @jordansissel, it doesn't map well to the file input, where we need stream identity on top of the input stream in a non-blocking fashion.

@jordansissel @original-brownbear Thinking about codecs, I think the 'real' signature is: Stream<InputStream> (or Iterator<InputStream> if you prefer). An InputStream encapsulates something with a beginning and possibly an end. The outer stream maps to message boundaries.

So, the stdin input would have a stream that only ever yields a single InputStream that never closes. The Redis input would yield a new InputStream for each message. This fixes a problem today where handling message boundaries is a bit weird. I prefer an InputStream to String because it is cleaner than having Stream<Stream<String>> and a bit faster as well. It is also more suitable for binary formats. The codec author can choose to decode bytes to strings if they wish; we can even provide a convenience hook for that.
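
A quick sketch of that signature, purely to pin down the shape being discussed (hypothetical, not code from this PR):

    import java.io.InputStream;
    import java.util.Iterator;
    import java.util.stream.Stream;

    // the outer Stream maps to message boundaries; each InputStream is one
    // message's bytes (possibly never-ending, as with stdin)
    public interface Codec {
        Iterator<Event> decode(Stream<InputStream> messages);
    }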

@jakelandis (Contributor)

@original-brownbear (pulled from an inline comment... thanks, GitHub)

How does core know what is valid/invalid configuration and type?

For example, i configure:

mutate {
  junk => { "foo" => "bar" }
}

How does Logstash know to error?

Does this require an external file to describe the valid configuration? Also, where would deprecation warnings, obsoletion warnings, and validation happen?

If it is an external file, can you provide an example of what that may look like?

@danhermann (Contributor)

How does core know what is valid/invalid configuration and type? ... How does Logstash know to error? ... Does this require an external file to describe the valid configuration? Also, where would deprecation warnings, obsoletion warnings, and validation happen?

I believe this validation happens now in each plugin's register method, which is called at pipeline creation time, and Logstash refuses to start if any plugin fails to register. I think the same validation could occur in the Java plugin's constructor, which would also be invoked at pipeline creation time, so Logstash could error at the same stage of the startup process if any plugin had invalid configuration information.
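
A minimal sketch of that idea, reusing the Mutate constructor shape from this PR (the required-field check and the error message are made up):

    public Mutate(final LsConfiguration configuration, final LsContext context) {
        final String field = configuration.getString("field");
        if (field == null || field.isEmpty()) {
            // throwing at construction fails the pipeline at creation time,
            // mirroring how a failing Ruby register call stops startup
            throw new IllegalArgumentException("mutate: 'field' is required");
        }
        this.field = field;
        this.value = configuration.getString("value");
    }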

@jakelandis (Contributor)

I believe this validation happens now in each plugin's register method, which is called at pipeline creation time, and Logstash refuses to start if any plugin fails to register.

I agree with this strategy for custom-level validation. But for common validation, such as valid configuration options, type/file/other common validation, and deprecation/obsoletion messaging, we would be pushing a lot onto the plugin author vs. the Ruby way, where it is already handled.

@andrewvc (Contributor) left a comment

Thanks for the excellent effort here. I've left a number of questions / concerns inline.

WRT codecs, I think we should leave them out of V1 of this proposal. Not every input needs them, and we can add them later.

public interface Filter extends AutoCloseable {

    QueueReader filter(QueueReader reader);
Contributor

The QueueReader isn't always a queue; obviously, it could just be a previous filter. What about PipelineReader as a name?

Member Author

@andrewvc I have no opinion here tbh. We can go with PipelineReader :) Or maybe just EventIterator?

Contributor

I like the latter, EventIterator sounds good.


    QueueReader filter(QueueReader reader);

    void flush(boolean isShutdown);
Contributor

I don't think we want to support flushing in the future; can we remove this from V1 here?

Member Author

@andrewvc :D I'm always happy to remove flush, why do you even ask :) Will remove it.

@Override
public long poll(final Event event) {
    final long seq = reader.poll(event);
    if (seq > -1L) {
Contributor

When would the sequence be negative?

Member Author

@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-cf259dc78531f006beca5e143b529554R13 ... -1 indicates a failure to poll an event (think interrupts, timeouts).

Contributor

Ah, of course :)

public QueueReader filter(final QueueReader reader) {
    return new QueueReader() {
        @Override
        public long poll(final Event event) {
Contributor

We really do want the ability for filtering to be done batch-wise. This is the only way to do things like Elasticsearch lookups efficiently. We kind of get away with not doing this in the JDBC filters, but that's only possible due to caching. Even with that, we can't exploit the ability to search multiple values in a single query for the cache-seeding portion.

Contributor

Does this API let you create new events? I understand we can drop events with Event.cancel; how would the split filter work?

Member Author

@andrewvc the API doesn't prevent you from first collecting N events, doing something to them in a batched way, and then returning another Reader that wraps the batched results (see the sketch at the end of this comment) :) That's the beauty here imo ... if your plugin needs batches, you can do batching; if it doesn't, it can just save a bunch of memory by going one by one.

Also, there's nothing preventing you from creating new events here. Simply overwrite the given Event with the new event, don't increment the sequence number, and don't call poll on the QueueReader, and you've created a new Event :)
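
A rough sketch of that batching pattern on top of the proposed QueueReader (the batch size of 128, the processBatch hook, and the public Event constructor are assumptions, not part of the PR):

    public QueueReader filter(final QueueReader reader) {
        return new QueueReader() {
            private final Deque<Event> batch = new ArrayDeque<>();
            private final Deque<Long> seqs = new ArrayDeque<>();

            @Override
            public long poll(final Event event) {
                if (batch.isEmpty()) {
                    // drain up to 128 events from upstream, then process them
                    // in one go, e.g. a single bulk Elasticsearch lookup
                    for (int i = 0; i < 128; ++i) {
                        final Event buffered = new Event();
                        final long seq = reader.poll(buffered);
                        if (seq == -1L) {
                            break;
                        }
                        batch.add(buffered);
                        seqs.add(seq);
                    }
                    processBatch(batch);
                }
                if (batch.isEmpty()) {
                    return -1L;
                }
                // hand out the buffered results one by one, preserving each
                // event's original sequence number for E2E acks
                event.overwrite(batch.removeFirst());
                return seqs.removeFirst();
            }
        };
    }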

Contributor

Ah, I see! While I do like that, I feel we should provide higher-level default implementations here for real-world plugins. The problem with patterns is that they're an extra place for bugs and code duplication. Previously, each output would do its own batching, and there'd be occasional bugs and differences between them.

Additionally, the batch size is set at the Logstash level. We don't expose that option to the plugin, and each plugin would have to know what that size is to batch as the user is expecting it to. A default implementation could do that correctly.

Can we add the ability for plugin authors to instead define different callbacks as shown below? I think keeping the low-level API is fine, but I don't think it should be the default.

Something like...

// for the common case of writing a filter that takes one event and modifies (or cancels) it
public void filterSingle(Event event);

// for the common case of filters that take a batch
public void filterBatch(Collection<Event> events);

Contributor

WRT creating new events, I realize that's relatively rare, but I think some sugar would be nice. Implementing the split filter would be awkward as-is, because you need to extract stuff out of the event, then queue it while waiting for poll to be invoked again.

Where do these Event objects come from in this API? Are they from an Event pool? Would there be a way to simply ask for a new event from the pool instead without waiting for a new poll?

Contributor

@original-brownbear so, in this example, I can get a new event by invoking what? It'd help me to see a short version of the clone filter written out here if you have the chance.

Member Author

@andrewvc I think this is how I'd do clone here:


private Event clone;

private long lastSeq = -1L;

@Override
public long poll(final Event event) {
    if (clone != null) {
        event.overwrite(clone);
        clone = null;
        return lastSeq;
    }
    final long seq = reader.poll(event);
    lastSeq = seq;
    if (seq > -1L) {
        clone = event.clone();
    }
    return seq;
}

Just don't poll upstream if there is a clone to return and don't increment the sequence number since it's still the result of the last "real" input event that was polled from upstream.

Contributor

After meeting with @original-brownbear, we decided to use a filter-instance-per-pipeline pattern, as we do today. This means that there will be one instance used across pipeline worker threads. These instances will not be invoked concurrently, to prevent people from shooting themselves in the foot. There will be an annotation for opting out of this (sketched below), much like the concurrency :shared option we have today.

We won't tackle sugaring the clone/split filter use case initially.
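
A hypothetical sketch of that opt-out (the @SharedConcurrency name is invented; only @LogstashPlugin appears in this proposal):

    @LogstashPlugin(name = "example")
    @SharedConcurrency // invented: opts in to concurrent invocation across workers
    public final class ExampleFilter implements Filter {

        @Override
        public QueueReader filter(final QueueReader reader) {
            return reader; // pass-through, for illustration only
        }

        @Override
        public void close() {
        }
    }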

Contributor

@andrewvc - can you clarify?

we decided to use a filter-instance-per-pipeline pattern as we do today. This means that there will be one instance used across pipeline worker threads.

Those seem to be saying opposite things... do you mean one instance per pipeline worker thread, or do you really mean a single instance across all pipeline worker threads?

Contributor

One instance across all threads, just as we do today. One instance per pipeline, shared across all worker threads.

}

@Override
public void flush(final boolean isShutdown) {
Contributor

No need to include flush since it's an API we'll be deprecating soon.

public Mutate(final LsConfiguration configuration, final LsContext context) {
    this.field = configuration.getString("ls.plugin.mutate.field");
    this.value = configuration.getString("ls.plugin.mutate.value");
Contributor

Why do we need the boilerplate of ls.plugin.mutate.*? Why can't we just have configuration.getString("value")?

Can you go into a bit more detail as to how LsConfiguration would work? Is there a way to declare: "field must be a string" as you can today? This is important for driving a visual builder in the future.

Member Author

@andrewvc you're right, the prefixes aren't necessary :) Will remove them :)

Member Author

@andrewvc I'd suggest that a plugin has to implement a method Map<String, LsConfigTypeInfo> configSchema() that describes each key's expected value (type, deprecation, etc.).
Then we can use its return value for a visual builder and also write validation logic against this map in the ConfigCompiler :) I'll add some code illustrating that approach soon :)

Contributor

I think that's fine for the scope here.

In the future, we should improve this to drive the visual builder. It'd be great to support a nested description as well. I'd love @ycombinator's input here: what do we need to drive a pipeline builder UI? We should create a new issue for that discussion, however. I think parity with the existing API is good for v1 here.

@ycombinator (Contributor) Feb 14, 2018

In a perfect world (likely not v1) the interface of a plugin (properties, their data types, any additional validation rules, short descriptive documentation, etc.) could be described in a language-agnostic file. That file could then be used (perhaps after further "compilation") by LS core as well as the LS UI. I'm deliberately leaving out the specifics of "used".

Alternatively, such an interface definition file could be generated from one of these classes as well, if that's possible.

The crux of the problem I'm trying to solve here is sharing of definitions between core and UI so we don't have to duplicate them in both places, keep them in sync, etc.

Of course, we shouldn't let this concern weigh down v1 unless solving for it later from the v1 design is going to be difficult.

try {
    while (!stopped && inpt.hasNext()) {
        final String message = inpt.next();
        writer.push(Collections.singletonMap("message", message));
Contributor

Can you talk me through why the writer writes Map objects, not Event objects? Was that more efficient given that a Map is more lightweight?

Member Author

@andrewvc yea, that was just an idea for optimizing things a little more. It's kinda annoying that we have all these Events that don't hold any state. For filters and outputs I can see the value in having Event around for serialization and such, but for inputs it's just a waste of memory to create an Event up front when all inputs in fact simply produce a Map<String, Object>.

* Pushes multiple events to the Queue, blocking for the given timeout if the Queue is not
* ready for a write.
* Guarantees that a return {@code != -1} means that all events were pushed to the Queue
* successfully and no partial writes of only a subset of the input events will ever occur.
Contributor

We really can't make this guarantee with E2E ACKs; partial writes may happen.

Member Author

@andrewvc I was under the impression that we want to fix that and force batch-wise writes to the PQ, to fix stuff like the HTTP input timing out?
To me, it seems this is simply something to fix in the Queue implementation(s)?

Contributor

Spoke with @original-brownbear; we decided not to guarantee the absence of partial writes.

Contributor

My mistake, we'll just remove this initially.

@original-brownbear (Member Author)

Added an outline of how we could do config validation + exposing the config schema in cf44a52.

IMO we could just do something like that, where plugins are forced to return a list of config specs and then have the validation happen in the plugin factory.

* @param millis Timeout in millis
* @return Sequence number of the first event or -1 if push failed
*/
long push(Collection<Map<String, Object>> events, long millis);
Contributor

Spoke with @original-brownbear; we'll remove this in the MVP.

@original-brownbear (Member Author)

@andrewvc alright, timeout and batched writes removed for now, as discussed. What else can I do here to move this along so we can get going on implementing on top of it? :)

@jakelandis (Contributor)

@original-brownbear

Added an outline of how we could do config validation + exposing the config schema in cf44a52.

Thanks, I think this pattern will work, but it is a bit verbose.

@Override
public Collection<PluginConfigSpec<?>> configSchema() {
    return Arrays.asList(
        new PluginConfigSpec<>("one", String.class, null, false),
        new PluginConfigSpec<>("two", String.class, "foo", true)
    );
}

Have you considered an annotation + reflection-based approach?

@Config
String one;

@Config
@Deprecated
String two = "foo";

I have a working example over at https://github.com/jakelandis/skunkstash/blob/master/processors/src/main/java/org/logstash/skunk/plugin/processors/Translate.java#L19

@andrewvc (Contributor)

@jakelandis how would you deal with nested hash-like structures, as in the HTTP poller?

I think annotations are the easiest way to deal with most things.

We'd also need to add an @Obsolete annotation for our purposes.

JSON Schema + Jackson would be an easy way to solve this from an implementation standpoint, but it's pretty ugly for plugin authors.

@original-brownbear (Member Author) commented Feb 22, 2018

@jakelandis

Have you considered an annotation + reflection based approach?

Not really, tbh. I really hate that approach, for two reasons:

  • You may make the code a little easier on the eyes with this stuff, but as soon as you have to debug a problem with it, it becomes really involved to figure out what's going on once the reflection magic is in there (think line-debugging cases where the wrong config is used for a plugin that appears twice in your pipeline with a different config each time).
  • The minimal code for bootstrapping this stuff becomes a lot more involved, as can be seen here: https://github.com/jakelandis/skunkstash/blob/master/core/src/main/java/org/logstash/skunk/App.java#L120
    • Also, you still need something like public Collection<PluginConfigSpec<?>> configSchema() for the UI, as far as I understand. So that's going to be another big chunk of magic which, as @andrewvc points out, will be tricky for the nested hashes that come really naturally if you just validate .-separated hard-coded setting names in a flat map (don't worry about settings with a . in their name; we can always escape them when parsing).

Again (as in the poll method's case), I think this is something that can be resolved by providing some wrapping logic: in this case, an abstract base class that implements the Plugin interface's configSchema method (with the annotation magic in it) and whose subclasses then use annotated fields (see the sketch below). I don't think this is something we need to bake in right from the start, and for debugging and test-writing purposes it's also not something you want to bake into your most low-level implementation (imo).
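
A rough sketch of that wrapper idea, assuming the four-argument PluginConfigSpec constructor shown earlier (name, type, default, deprecated) and a runtime-retained @Config marker annotation:

    import java.lang.reflect.Field;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    public abstract class AnnotatedPlugin implements Plugin {

        @SuppressWarnings({"rawtypes", "unchecked"})
        @Override
        public Collection<PluginConfigSpec<?>> configSchema() {
            final List<PluginConfigSpec<?>> specs = new ArrayList<>();
            for (final Field field : getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Config.class)) {
                    // default values would require reading the field off an
                    // instance; elided here for brevity
                    specs.add(new PluginConfigSpec(
                        field.getName(), field.getType(), null,
                        field.isAnnotationPresent(Deprecated.class)
                    ));
                }
            }
            return specs;
        }
    }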


@original-brownbear (Member Author) commented Feb 23, 2018

@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-63d7bea0e5acbcf6cb19e590783f202cR13 and https://github.com/elastic/logstash/pull/9137/files#diff-7336563974b577c2998327d7bbb4581bR74 :)

I didn't fully implement both plugins, but I think I have a sufficient number of cases covered to show the way it works.
Imo, looking at the mutate filter, it's not that much more verbose than the current Ruby API, while (unlike the Ruby one) being 100% crystal clear about how a defined setting translates into the ability to look it up.

@ycombinator (Contributor) commented Feb 23, 2018

@andrewvc @original-brownbear Following-up to our discussion yesterday, here's what the UI would like to know about each plugin, in some sort of structured format (preferably JSON-based):

  • Its version
  • The Logstash core version range that this plugin is compatible with
  • Its type, one of input, filter, output, or codec
  • Its configuration options (aka settings). For each option:
    • Its name
    • Its datatype, one of boolean, number, string, password (or sensitive), filesystem path, array[<datatype>], object{<sub-option>:<datatype>,...}
    • Any value constraints (these are nice to haves):
      • If the value is an enumeration, the enumeration options
      • If the value is a range, the range bounds, noting whether they are inclusive/exclusive
    • Whether the option is required or optional
    • A default value, if one exists

I have no opinion on the exact format of such a spec. As long as all the information above is encoded in the spec, the UI can work with it.
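
For illustration only, one hypothetical JSON encoding of that list (all field names invented):

    {
      "name": "mutate",
      "version": "1.0.0",
      "logstash_version_range": ">=6.0.0 <7.0.0",
      "type": "filter",
      "options": {
        "field": { "type": "string", "required": true },
        "value": { "type": "string", "required": false, "default": "" }
      }
    }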

@original-brownbear (Member Author)

@ycombinator that sounds very doable :) With the exception of the value constraints, we can easily serialize the current ConfigSpec proposal to JSON imo :) For the value constraints, we'd have to find some agreement on what kinds of constraints we want to expose to the UI, I think.

@ycombinator (Contributor)

That's awesome, @original-brownbear! Let's not worry about the value constraints for v1 then. We can add those in later and enhance the UI at that time as well. Thanks!

@ycombinator (Contributor)

@original-brownbear I want to amend my statement in the previous comment :) Let's not worry about the range value constraint. It would be very nice to have enumerations expressed in the JSON spec, though. This would let the UI do things like show a drop-down menu of the options, rather than having the user type something and make a mistake.

@guyboertje (Contributor)

@andrewvc
Please have a look at jdbc_static for complex validation.

I overrode the validate_value(value, validator) method so I can supply my own complex validation classes and have them validate at the same time as the other settings.

Ideally, I would love to have a similar mechanism in the Java stuff.

private static final PluginConfigSpec<Path> CA_CERT_CONFIG =
    LsConfiguration.pathSetting("cacert");

private static final PluginConfigSpec<Map<String, String>> URLS_CONFIG =
Contributor

This doesn't really encapsulate the urls setting in the poller. The value can be either a String, as it is here, or a complex hash: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html. That was my reason for us trying to figure out that case here.

My suspicion is that we'll wind up with Map<String, Object>, and use code to sort out the mess.

I have no idea how to make this programmatic for a UI.

@ycombinator , your thoughts?

@ycombinator (Contributor) Feb 23, 2018

Reading https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html#plugins-inputs-http_poller-urls, it looks like this is always expected to be a hash? The Map<String, String> type in the code on this line seems to mirror that expectation too. I guess I'm not seeing how this could be set as "just" a single String?

I have no idea how to make this programmatic for a UI.

One way to encode this in a spec for the UI would be to use some sort of a special designation for user-defined keys in a hash/object data type. Let me try to elaborate below.

First, consider a setting whose datatype is a hash, but with keys provided by the plugin itself. For example, the schedule setting. In this case, the spec for this setting could be something like the following (stealing from ES mapping syntax for illustration purposes only):

{
  "schedule": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "cron": { "type": "string" },
      "every": { "type": "string" },
      "in": { "type": "string" },
      "at": { "type": "string" }
    }
  }
}

Now, consider the urls setting. The spec for that could be:

{
  "urls": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "__user_defined__": {
        "type": "string",
        "max_count": -1 // to indicate that there could be an unlimited number of URLs
      }
    }
  }
}

Thoughts?

Member Author

@ycombinator @andrewvc my thought was that we could programmatically encapsulate what ES does as PluginConfigSpec<Map<String, PluginConfigSpec>> and build up a nested mapping that way.
For some reason I forgot to add that part ... it will be up soon.

Contributor

@ycombinator it can be either a string or a hash. It's weird that way.

@original-brownbear ah, yes, that makes more sense.

Still, encoding this sort of thing into JSON is inherently strange. It's pretty complex to encode "can be a string or a map", and even after you do encode that, how do you render the UI?

So, here's a thought: maybe there's some sort of 'override' UI spec the plugin can bundle. In this case, we don't need the UI to support the simple syntax, just the full one. We could let the plugin author manually specify what the UI controls should be instead of doing that programmatically.

Contributor

@andrewvc I'm good with that proposal — that the UI gets the "highest fidelity" syntax. We can always figure out optimizations/shortcuts later.

WDYT about my proposal for encoding plugin-specified keys vs. user-defined keys above?

Member Author

@andrewvc here you go, quick sketch but the idea should be clear:

11f2718

I think you can simply encode this as a nesting of LsConfiguration. This way the UI can also render it as such with the ability to add and remove elements (that are essentially the same as the top-level element).

In this case, you could simply add a UI element that allows adding nested "rows" for each Map<String, LsConfiguration>-typed setting.
Hopefully this makes sense; happy to go into more detail, but I'm optimistic we can properly encode this for the UI :)

@original-brownbear (Member Author) Feb 23, 2018

@ycombinator I tried to capture your proposal in some Java code.

I just noticed I actually left out the spot with the flat hash, my bad:

    @SuppressWarnings("unchecked")
    public static <T> PluginConfigSpec<Map<String, T>> requiredFlatHashSetting(
        final String name, Class<T> type) {
        //TODO: enforce subtype
        return new PluginConfigSpec(
            name, Map.class, null, false, true
        );
    }

    @SuppressWarnings("unchecked")
    public static PluginConfigSpec<Map<String, LsConfiguration>> requiredNestedHashSetting(
        final String name, final Collection<PluginConfigSpec<?>> spec) {
        return new PluginConfigSpec(
            name, Map.class, null, false, true, spec
        );
    }

Something like this that captures flat sub-hashes would be doable, though :)
I think we can easily render that in this or a similar ES-style JSON for the UI.

Contributor

@original-brownbear I have to admit, I'm a bit lost :) But that's okay! I mostly just want to know, from a UI POV, if we'll be able to encode the following:

  1. A setting whose data type is a hash. Almost 100% certain this is possible :)

  2. For such settings, there are two cases for the hash's keys:
    a. Plugin-defined, like in the case of the schedule setting, and
    b. User-defined, like in the case of the urls setting

    The UI needs to be able to distinguish between these two cases by looking at the spec so it can decide whether to a) show "static" keys or b) make the key a text box for the user to type something in, and also allow them to add N such rows.

    So my question is, can this sort of distinction be generated from the Java API you are proposing?

Member Author

@ycombinator yes, that's doable and was exactly the plan in the Java snippet above :)

@original-brownbear (Member Author)

@andrewvc what do you have in mind for next steps here? :)

@andrewvc (Contributor) left a comment

Left some further feedback.

}

@Override
public long poll(final Event event, final long millis) {
Contributor

Thinking through this, most people will need to write the same polling code over and over again.

I think we should create some simple default implementations here that people can use. I put two up here: https://github.com/original-brownbear/logstash/pull/1/files

WDYT? Batching is necessary for things like the ES filter, etc. It's pretty complicated to implement in terms of this API (I'm sure there are bugs in my code up there, for instance).

I'd rather we promote those two filter APIs (and make similar ones for the Output class) than have people writing this boilerplate over and over again.

Member Author

@andrewvc

Batching is necessary for things like the ES filter, etc

Yea, maybe, but the efficient way of doing it is also highly plugin-dependent. As of right now, I think it's fair to say that there are no plugins out there that actively make use of the batched interface in a quantifiable way. Otherwise, I wouldn't have found #8428 by mere chance :)

=> I'm not really comfortable with defining a batched API like that (yet). I'd much rather build a few plugins and see what works in the real world than guess at the right approach here now, tbh.

Contributor

I don't see how it's plugin-dependent? The only reason filters don't batch today is that it isn't possible, because they can only get one event at a time. I'd rather not box us in.

Contributor

BTW, how does the efficiency of https://github.com/original-brownbear/logstash/pull/1/files#diff-eb733fe2c6eb811410b0e0a88226263d look? It should be garbage-free, aside from a single new Event[] call per batch, which is pretty cheap given a reasonable batch size.

Also, looking forward to your feedback on the SimpleFilter API.

I'd rather we only expose these APIs than the low-level one here. The low-level one is so implementation-dependent it boxes us in. I'm also not sure who would actually want to use it directly; it's much more finicky, and I'm not sure how much it buys a programmer in any situation I can think of.


private static final PluginConfigSpec<Map<String, LsConfiguration>> URLS_CONFIG =
    LsConfiguration.requiredHashSetting(
        "urls", Arrays.asList(URL_METHOD_CONFIG, USER_CONFIG)
Contributor

I don't see how this encapsulates an HTTP poller configuration. urls can either look like:

urls = {"elastic" => "http://elastic.co"}

or

urls = {"elastic" => {"method" => "get", "url" => "http://elastic.co", "headers" => {...}}}

Contributor

Maybe this is too flexible for Java?

Member Author

@andrewvc hah you're right :) The "or" case that allows for two different schemas isn't covered by my proposal.

Tbh ... I'm tempted to vote for not allowing that in Java plugins. The only downside to making that simplification is that we'd have to adjust the config schema for the HTTP poller if we fully port it to Java (and the HTTP poller probably isn't the only case). Other than that, I only see upsides to simplifying this (especially when thinking about the UI!) ...

Think it's feasible to break with the flexibility of the Ruby implementation here for the sake of type safety?

Contributor

Wow, I never responded here. At any rate, yes, I think it's fine to break with Ruby here.

@andrewvc (Contributor) commented Mar 8, 2018

One further thought: since the plugin objects have state, I take it we will have to instantiate one plugin per worker thread? That's tricky, but OK so long as there's some way to share state across instances of the plugin with the same ID across threads. I think that means we'll need each plugin to be able to share some sort of context object. I think there should be some sort of makeContextObject method that plugins can implement; the context object would be instantiated by the pipeline and passed to each instance per worker.
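
A loose sketch of that idea against the Filter interface from this PR (the makeContextObject name and the plain Object type are placeholders from the comment above, not settled API):

    public interface Filter extends AutoCloseable {

        QueueReader filter(QueueReader reader);

        // invoked once per pipeline by core; the returned object would be
        // handed to every per-worker instance that shares this plugin's ID
        default Object makeContextObject() {
            return null;
        }
    }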

@andrewvc (Contributor) commented Mar 8, 2018

@guyboertje can you take a look at the validators here and let us know what could be improved in this proposal based on your work there?

@original-brownbear (Member Author)

@guyboertje ping :) I'd like to be able to continue here and that part will be quite a bit of code still. Better to speak up before I start moving on this if possible :)
Thanks!

@guyboertje (Contributor)

It seems weird to me that you ask me to spend hours to fully understand what you are proposing and whether it does or does not suit the current custom validation that I cited when it would take you 30 minutes to read the current custom validation and reach the same conclusion.

If after you look at the existing code and you say "we got that covered" then I'm good with that.

@original-brownbear (Member Author)

Just to give this some structure, I'll start linking the dependent PRs that I'm opening/merging that'll get us the necessary compiler logic for the Java plugins :)

For now, waiting on #9320.

@original-brownbear (Member Author)

Closing in favour of continuing in #9342 :)
