Create end-user facing DSL to configure SpringXD #1

Closed
markpollack opened this Issue May 2, 2013 · 10 comments

Member

markpollack commented May 2, 2013

Introduction

Users configure Spring XD with a high-level DSL that is easy to use and is not XML. This DSL describes how to handle data across the key use cases Spring XD addresses:

  • Distributed Data ingestion from a variety of input sources into big data stores
  • Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
  • Workflow management via batch jobs.
  • Distributed Data export, e.g. from HDFS to an RDBMS or NoSQL database.

This will primarily require translating the high-level DSL into an assembly of Spring Integration and Spring Batch configuration files.

Prototype work

The prototype work that has been done so far is around Spring Integration flows that are linear, aka 'streams'. The syntax is similar to basic UNIX pipes and filters.

http | file

would use the http source and file sink modules, with each module corresponding to a Spring Integration configuration file and chained to the next through channels.

Configuring Modules

Options to configure the module are represented using two dashes:

http --port=8080 | file --dir=/data/

Options are modeled in the corresponding Spring configuration as property-placeholders.
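
To make the shape of the syntax concrete, here is a minimal Python sketch of how such a definition might be split into modules and options, and how an option could then fill a placeholder. This is not the prototype's parser: `parse_stream` is a hypothetical name, the `${port}` template text is made up, and splitting on whitespace and `|` means option values cannot contain spaces or pipes.

```python
from string import Template


def parse_stream(definition):
    """Split a pipe-delimited definition into (module_name, options) pairs."""
    modules = []
    for segment in definition.split("|"):
        tokens = segment.split()
        if not tokens:
            raise ValueError("empty module in stream definition")
        name, options = tokens[0], {}
        for token in tokens[1:]:
            # Every option is expected in the --key=value form shown above.
            if not token.startswith("--") or "=" not in token:
                raise ValueError(f"expected --key=value, got {token!r}")
            key, value = token[2:].split("=", 1)
            options[key] = value
        modules.append((name, options))
    return modules


modules = parse_stream("http --port=8080 | file --dir=/data/")
print(modules)

# Hypothetical placeholder text; a real module definition would be a Spring
# configuration file that uses property placeholders.
http_options = modules[0][1]
print(Template("listening on port ${port}").substitute(http_options))
```

A real implementation would need a proper grammar; the sketch only shows the mapping from `--key=value` pairs to placeholder values.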

Naming of streams

By default the name of the stream is a concatenation of the module names, so http | file would be http2file. Names can also be declared explicitly when creating the stream. In the prototype this was done implicitly via URI properties through a web API, and there isn't yet a way to specify it in the DSL.

Adding Processing steps to the linear flow.

Add another pipe and filter component

http | filter | file

Taps

Taps are a way to get access to data that is already being ingested via an existing flow, useful for collecting metrics that may not be central to the existing data processing flow. Here is an example:

twitter-search --query=Bieber | file 
tap @ twitter-search | fieldvaluecounter --fields=tags
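
As a rough illustration of the idea (not of how Spring Integration implements it), the Python sketch below models a stream whose messages are also handed to any attached taps. The `Stream` class, the message shape with a `tags` list, and the counter are all assumptions made for the example.

```python
from collections import Counter


class Stream:
    """A linear flow whose messages can also be observed by taps."""

    def __init__(self, sink):
        self.sink = sink    # the stream's own downstream module
        self.taps = []      # side consumers attached after creation

    def tap(self, consumer):
        self.taps.append(consumer)

    def emit(self, message):
        # The main flow is served first; each tap then sees the same message.
        self.sink(message)
        for consumer in self.taps:
            consumer(message)


written = []              # stands in for the file sink
tag_counts = Counter()    # stands in for fieldvaluecounter --fields=tags

stream = Stream(sink=written.append)
stream.tap(lambda message: tag_counts.update(message.get("tags", [])))

stream.emit({"text": "first", "tags": ["music", "pop"]})
stream.emit({"text": "second", "tags": ["pop"]})
print(len(written), dict(tag_counts))
```

The point of the design is that the tap is added without touching the original flow: the file sink still receives every message.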

Jobs

The syntax 'jobname @ trigger' was used to reference Spring Batch jobs or a bean that would implement Runnable.

Scheduling jobs

The trigger above could either be a cron expression (e.g. 0 0/10 * * * ?) or a periodic interval in seconds (e.g. 10).
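
One way to tell the two trigger forms apart is sketched below in Python. It assumes that a bare integer always means seconds and that cron expressions have six space-separated fields, as in the example above; `parse_job` and the job name `myjob` are hypothetical.

```python
def parse_job(definition):
    """Split 'jobname @ trigger' and classify the trigger."""
    name, separator, trigger = definition.partition("@")
    name, trigger = name.strip(), trigger.strip()
    if not separator or not name or not trigger:
        raise ValueError(f"expected 'jobname @ trigger', got {definition!r}")
    if trigger.isdigit():
        # A bare number is read as a periodic interval in seconds.
        return name, ("interval", int(trigger))
    if len(trigger.split()) == 6:
        # Six fields are taken to be a cron expression; the fields
        # themselves are not validated here.
        return name, ("cron", trigger)
    raise ValueError(f"unrecognized trigger: {trigger!r}")


print(parse_job("myjob @ 10"))
print(parse_job("myjob @ 0 0/10 * * * ?"))
```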

Non-linear flows

Not all flows will be linear. The running suggestion is to model this case in a style similar to the DOT Graph Language.

We should take the SI Cafe Example as well as some from Spring Batch to experiment with this flow.

Topology

We would like to be able to describe how different modules would be run as independent processes on different machines, also taking into account that multiple instances of a given module may be required.

http | (filter)*2 | file

A general configuration would place and run each module in an independent process. The parentheses indicate two instances of the filter module should be run.

Multiple modules could be grouped together inside the parentheses:

http | (filter | transformer)*2 | file
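
To show what a parser would have to extract from this notation, the Python sketch below turns a definition into deployment groups of (module names, instance count). It handles only the two forms shown above, with no nesting and no module options, and the function names are hypothetical.

```python
import re

# Matches a parenthesized group followed by an instance count, e.g. "(a | b)*2".
GROUP = re.compile(r"^\((?P<body>[^()]*)\)\s*\*\s*(?P<count>\d+)$")


def split_top_level(definition):
    """Split on '|' characters that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in definition:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_topology(definition):
    """Return a list of (module_names, instance_count) deployment groups."""
    groups = []
    for part in split_top_level(definition):
        match = GROUP.match(part)
        if match:
            names = [n.strip() for n in match.group("body").split("|")]
            groups.append((names, int(match.group("count"))))
        else:
            # An ungrouped module runs as a single instance in its own process.
            groups.append(([part], 1))
    return groups


print(parse_topology("http | (filter)*2 | file"))
print(parse_topology("http | (filter | transformer)*2 | file"))
```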

There are probably limitations to how well one can describe the logical scale out of the flow and the underlying infrastructure in one line. There likely needs to be some separate language for the topology that references components of the flow.

http | @processingModule(filter | transformer) | file

A topology DSL could then, in the most simple case, state

processingModule : machine1, machine2

We have also discussed being able to create composite modules, and that could lead to simplifications of the above model:

composite = filter | transformer
http | composite | file

Likewise we may want to support an option for "extending" templates by binding a subset of a module's parameters:

foowebservice = webservice --url=http://foo.com

I'm wondering if we could consider a Groovy DSL for that, instead of inventing one's own parser.

With a Groovy DSL, supporting a syntax similar to the above (with a few minor modifications), you could also get the chance to further configure and program your SpringXD configuration, thanks to loops or some calculations that you may need.

To give you an idea of what it could look like, here's how I'd modify the examples above to follow a Groovy syntax:

// simple or() operator overloading
http | file

// use function named arguments to specify parameters
http(port: 8080) | file(dir: '/data/')

twitter_search(query: "Bieber") | file 
tap(twitter_search) | fieldvaluecounter(fields: 'tags') // or twitter_search.tap perhaps?

// again fine with multiply() operator overloading
http | (filter)*2 | file
http | (filter | transformer)*2 | file

// I'd drop the @ from your example
http | processingModule(filter | transformer) | file

// for the topology, I'd use the familiar assignment syntax, but assigning a list with square brackets
processingModule = [machine1, machine2]

// again with assignments and or() operator overloading
composite = filter | transformer
http | composite | file

foowebservice = webservice(url: 'http://foo.com')

With Groovy 2.1 and beyond, you could also have the option to statically type check such DSLs if needed, to get compile errors if you're making mistakes, like typos, wrong kinds of assignments, etc.

The whole configuration would be inside a script that would really just contain the kind of "sentences" above, without requiring any method / class boilerplate surrounding those sentence definitions.
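
The operator-overloading mechanism itself is not specific to Groovy. As a rough analogue only, and not the Groovy DSL proposed here, the Python sketch below overloads `|`, `*`, and call syntax to build a stream description. The `Module` and `Stream` classes are hypothetical, and `filter_` is spelled with an underscore only to avoid shadowing a Python built-in.

```python
class Stream:
    """An ordered list of modules produced by chaining with '|'."""

    def __init__(self, modules):
        self.modules = list(modules)

    def __or__(self, other):
        # Piping into another stream (a composite) splices its modules in.
        tail = other.modules if isinstance(other, Stream) else [other]
        return Stream(self.modules + tail)


class Module:
    def __init__(self, name, options=None, instances=1):
        self.name = name
        self.options = dict(options or {})
        self.instances = instances

    def __call__(self, **options):
        # http(port=8080) returns a configured copy of the module.
        return Module(self.name, {**self.options, **options}, self.instances)

    def __mul__(self, count):
        # filter_ * 2 asks for two instances of the module.
        return Module(self.name, self.options, count)

    def __or__(self, other):
        return Stream([self]) | other


http, file = Module("http"), Module("file")
filter_, transformer = Module("filter"), Module("transformer")

configured = http(port=8080) | file(dir="/data/")
scaled = http | filter_ * 2 | file
composite = filter_ | transformer
combined = http | composite | file

print([(m.name, m.options) for m in configured.modules])
print([(m.name, m.instances) for m in scaled.modules])
print([m.name for m in combined.modules])
```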

I'm happy to further discuss this idea, if you think it's interesting.

Contributor

aclement commented May 16, 2013

Hey Guillaume. A Groovy DSL is a route we were going to explore, depending on the latest work Mark and Mark are doing on the latest requirements for Spring XD, and taking cues from the syntax proposals here and the DSL route taken for Spring Integration.

Member

markpollack commented May 17, 2013

Is there any way to keep the 'dash dash' syntax? It is nice because in the command line shell we might be able to use the 'dash dash' syntax in a natural way (it wouldn't work with Spring Shell as it is now, though).

xdshell> stream create http --port 8080 | hdfs --outputDir /tmp

Ideally after the 'http' part is typed, tab completion would let you tab your way to options of the http module...

Well, no, that particular syntax isn't valid Groovy syntax, so some adaptation would be needed.

Member

markpollack commented May 17, 2013

Thanks, not saying it is a deal breaker, I'm quite inclined to use Groovy here if we can. Just wanted to have a cup of coffee with a blank sheet of paper to think a bit and didn't find the time yet.

Does adaptation mean it is possible but significantly harder to do? I remember the Rubik's Cube DSL you showed once, seems like anything is possible!

Member

dturanski commented May 21, 2013

Non-linear flows relate to XD-97. Routing semantics should support fan out (recipient list) or conditional (switch) routing. In the context of composing streams, the logic should be coarse. A processor module may expose multiple (named) outputs which emit messages as a result of internal, content-based routing. So we can pipe each output to a module using map semantics:

    [output1: module1, output2:module2,...].  

There should be some way to specify a default or 'otherwise' destination. Also, we need to think about how a stream assembler discovers what outputs are available from a module and what they mean.

A recipient list is associated with a single output and simply requires a list of destinations.

Of course these concepts may be arbitrarily nested & combined:

     foo |  [fooOut1: [module1 | [m1out1:module3 | file1, m1out2: module4 | file2 ], module2], fooOut2: someSink] 

This is a good use case for the composite pattern described above.
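
A minimal Python sketch of these semantics, under the assumption that a module's named outputs can be modeled as a dict of message lists: a single destination is a callable, a recipient list is a list of callables, and `default` plays the role of the 'otherwise' destination. The `route` function and the output names are hypothetical.

```python
def route(outputs, routes, default=None):
    """Deliver each named output's messages to its mapped destination(s)."""
    for output_name, messages in outputs.items():
        destination = routes.get(output_name, default)
        if destination is None:
            raise KeyError(f"no route or default for output {output_name!r}")
        # A list is treated as a recipient list: every target gets each message.
        targets = destination if isinstance(destination, list) else [destination]
        for message in messages:
            for target in targets:
                target(message)


low, high, rest = [], [], []
route(
    outputs={"small": [1, 2], "large": [100], "unknown": [0]},
    routes={"small": low.append, "large": [high.append, rest.append]},
    default=rest.append,
)
print(low, high, rest)
```

The open question of how an assembler discovers which outputs a module exposes is not addressed here; the sketch simply takes the names as given.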

Contributor

jencompgeek commented Jun 3, 2013

Small quirk with the DSL. Seems I can't use spaces in an expression even if I quote it:

curl -X POST -d "http --port=9014 | filter --expression=\"payload == 'foo'\" | log" http://localhost:8080/streams/jenn14

Posting to the resulting stream then produces the following error:

curl -X POST -d "foo" http://localhost:9014

Caused by: org.springframework.core.convert.ConversionFailedException: Failed to convert from type java.lang.String to type java.lang.Boolean for value 'payload == 'foo''; nested exception is java.lang.IllegalArgumentException: Invalid boolean value 'payload == 'foo''

But, this works:

curl -X POST -d "http --port=9015 | filter --expression=payload=='foo' | log" http://localhost:8080/streams/jenn14
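
For comparison, a quote-aware tokenizer would keep the quoted expression together. The Python sketch below uses the standard `shlex` module to show the intended behavior; it is not how the XD parser works, `parse_module` is a hypothetical name, and it only handles a single module (a `|` inside quotes would need the same care when splitting the stream).

```python
import shlex


def parse_module(segment):
    """Tokenize one module definition, honoring shell-style quoting."""
    tokens = shlex.split(segment)
    name, options = tokens[0], {}
    for token in tokens[1:]:
        if not token.startswith("--"):
            raise ValueError(f"expected --key=value, got {token!r}")
        # Split on the first '=' only, so '==' inside the value survives.
        key, _, value = token[2:].partition("=")
        options[key] = value
    return name, options


print(parse_module("""filter --expression="payload == 'foo'" """))
print(parse_module("filter --expression=payload=='foo'"))
```

Note that in the unquoted form the shell-style rules also strip the single quotes around foo, so the two inputs do not yield the same expression; quoting rules would have to be defined carefully for SpEL values.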
Member

garyrussell commented Jun 3, 2013

Yeah, but that's not OK either: expression='new Foo()' doesn't work because it's treated as a SpEL literal, 'new Foo()'. See https://jira.springsource.org/browse/XD-159

markfisher assigned pperalta and unassigned pperalta May 13, 2014

Member

markpollack commented Dec 11, 2014

This is an old issue. The DSL is documented here: https://github.com/spring-projects/spring-xd/wiki/DSL-Reference
