language proposal #8

Closed
nikhan opened this Issue Oct 29, 2014 · 9 comments

Comments

Projects
None yet
2 participants
@nikhan
Contributor

nikhan commented Oct 29, 2014

Streamtools as a Language

The following document intends to illustrate what Streamtools might look like if it was actually a language.

Current Problems

Lack of modularity

Every time we want to do some kind of operation that is not currently afforded by Streamtools we need to write Go, recompile, rerelease. This is especially annoying for situations where we are limited by the grammar of ST. The lack of modularity is also impacts people who may wish to curate their own work environment with blocks that are specific to their work.

Inconsistant Flow Control

There are several problems with how ST currently uses flow for control.

  1. Each block has a queue that is allowed to overflow and drop messages. This destroys the utility blocking as a flow mechanism.
  2. Streams are not "typed" -- that is, anything coming over a stream can be a signal. Currently the combined "out routes"" for each block make it so that it is difficult to understand what kind of signal is being sent by a block that may have multiple possible outputs.
  3. Whether a block needs to triggered with an "emit" "bang" "poll" and whether the block automatically emits a message when various inputs are hit is entirely a mystery and nonstandard.
  4. It is EXTREMELY difficult to have two or more flows accessing the same state due to the above 2 points.
Inconsistant Block API

The inconsistent block API is product of inconsistent use of flow for control as well as inconsistent access to state. For instance, the cache, set, and histogram blocks should be very closely coupled. They are also lacking, in different ways, many operations we expect from a key-value data structure.

Proposal

screen shot 2014-10-24 at 4 08 49 pm

Diagram 1

The above diagram represents a system where two messages enter the + block to produce a sum. This layout is impossible in the current instantiation of ST without some kind of workaround since it is impossible for the 2 input streams to ever arrive at the same time. An example workaround would be to associate the message containing '1' with the message containing '2' with a delay to wait for the message that has yet to be arrived. Another workaround would be something like our join block, where we assume order is correct.

This is undesirable, as it introduces all kinds of issues of synchronicity in the stream or unnecessary metadata embedded in each message. For instance, we could associate the wrong messages with one another or we could require a whole new system to deal with metadata in each message to resolve how the messages should be paired. Either way, this is added complexity that does not help in the development/understanding of a grammar consistent across blocks. This leads us to our first proposal:

1. Core blocks are responsible for ONE synchronous operation

This means that blocks like timeseries which seem to allow asynchronous access (the pushing to the timeseries and the polling of a timeseries) are no longer valid. The funny thing is that this rule actually isn't changing how blocks currently work internally: currently each block does one operation at at time, all this idea does is express the synchronous aspect already internal to the block to the input routes.

What the above means is that all input routes block until all input routes have received a message. Once all routes have been satisfied the operation of the block is triggered.

The above suggestion is a bit at odds with our current instantiation of the idea of "rules" which are used to give parameters to a block. Many of our blocks have "rules" that persist throughout the duration the blocks life and never need to be changed. Which leads us to...

2. "Constant" routes and "Path" routes

route2

Diagram 2

Since our core blocks depend having all inputs satisfied once per block operation, this presents a problem when we have situations where we'd like a constant value to be applied as a parameter for a given block. We don't want to send one message per message in the above addition example if we want to add a constant to a stream.

Given the ability to configure how a route accepts and treats its value fixes this problem. In diagram 2, two modes of configuration are presented. The dialog on top is for a situation where a block is meant to accept a stream and retrieve a specific value from within an input object. In the case of the diagram, the route is configured accept the number for the + block from {foo:{bar:{baz:}}}. This is a "Path" route.

The preserve value checkbox is meant to satisfy the need when a block needs to be modified out-of-step with the other input stream. This means that if you have the left input stream running at 50hz, and you modify the parameter for the right stream at 1hz, that the value parameter is preserved for all messages until the right stream is updated again.

The bottom dialog in Diagram 2 presents an option to input JSON directly into the right stream input, so that it is always constant across operations. This is a "Constant" route. This provides us with 3 types of patterns:

routes3

Diagram 3

The middle pattern in Diagram 3 presents a problem: how can we guarantee two streams appear at that block in a way that is useful to us?

3. blocking and flow control

array_pattern

Diagram 4

Diagram 4 introduces some new blocks:

  1. source: let's consider this some magical place where messages come from.
  2. unpack: given the key-path configured in the input route, emit 1 message per array element found at that key-path location on the left output stream. Before starting a stream of elements, it sends the number of elements that it will be unpacking out its right output.
  3. pack: pack accepts a the length of elements expected to be packed into array in the right input. Once a value is accepted, the right input blocks until the left input has satisfied the correct length. Upon reaching the length, the array is emitted, and the right input becomes unblocked.
  4. set: set accepts a JSON object on the right input. This JSON object serves as a template for which the values (left input) are inserted. If the right input contains the value {foo: ""} (the value can be anything, it will be overwritten), every message emitted will be in the format {foo: value}.
  5. merge: like +, merge requires an input on both left input and right input to commit an operation. merge combines two maps, privileging the right input.

This pattern takes advantage of blocking in order to maintain consistency across events. When a message is broadcast from source, it is immediately sent to merge which blocks source from sending any more messages into the system. Next, a message is sent to unpack, which immediately sends a message to pack. pack then blocks until the array length is satisfied, blocking unpack from sending any more messages downstream. Each message is added to 1 and sent to pack. Once pack emits the array, unpack is unblocked, and the message is set into an object. This object is sent to merge which unblocks source.

NOTE: The right input for pack may seem at odds with proposals 1 & 2. Instead of having a 1:1 relationship with its fellow left input, it has a 1:N relationship. Considering this block as a portal between many messages and a single message, I feel less icky about the situation.

b

4. Shared State

stores

Diagram 5

One of the deficiencies in our current implementation is that complex operations with state are rather difficult. This is because we can only access the state via one block -- a block that has one output. Determining what the output means in regards to the state can be rather tricky -- and has resulted in the inclusion metadata into our outputs.

One way to amend the current situation is to afford multiple outputs that signal various circumstances. Similar to the unpack block in Diagram 4, various output routes can be added to afford the control of flow. For example, if a lookup query returns no result when burying a set, then instead of returning null (which is valid JSON that should be able to be included in a set) or nothing (which is unhelpful), we should send a signal out of a separate output channel, signaling that no message was found.

In addition to multiple outputs, another way to fix the problem of having to worry about how many different state queries work together is to divorce the state from the block. This amounts to having a completely new element in ST -- a data store.

The data store does not participate in message flow. It is a node that is associated with various blocks that act as an API to its contents. Diagram 5 illustrates what some of this blocks may look like. The "association" is illustrated as a route that exists on the side -- this is for illustration purposes only. There is no directionality when associating a block with a store.

This affords us the ability to run multiple control flows with a single data store and affords the construction of complex patterns.

streamtools

Diagram 6

The above diagram illustrates a timeseries. values come in through source, source broadcasts one message to set which puts the value into an object, like {"value":22}. source also sends a message to now which produces a time in epoch ms. This epoch ms is also set into an object, like {"time": 140092818110000}. The two are merged, and then pushed to an array. push returns the new length of the array, and if the new length of the array is greater than our sample size (60), then pop an element off the array. If at any time we would like to view the timeseries, we send a message to dump which does not interfere with our push/pop flow at all.

5. Building Blocks

->map<-

->Diagram 7<-

Building blocks is still not something I've entirely figured out. The above diagram displays how a map would work. It's a lot more work, but the data is much more clearly expressed. One thing you'll see is that merge has 3 inputs. I am of the mind that merge should be a magic block that can allow infinite inputs. I imagine this UI-wise as just dragging the block wider.

Here are some scattered notes on how I think custom blocks should work:

  1. There should be some way to couple the input routes similar in the same way as the way the core blocks are coupled -- i.e., two inputs, no message is processed until both are satisfied. I suppose this could be represented internally to the block as an array.
  2. I am not adverse to custom blocks with tons of routes doing all sorts of things. Not all custom block routes need to be coupled/block one another.
  3. Blocks afford more flow control, so you can do things like only let one message into the block at a time to ensure that you don't mess up your consistency while doing crazy things. This could be signaled by some internal route -- something called "finish" -- that gets hit at the end of your pattern. This could signal that the block is ready for a new message.
  4. Blocks can also afford the ability to do data-store locking. In the circumstance where you have a series of operations that have to act uninterrupted on a store, if you put them all in a single custom block, then that custom block should have uninterrupted access to the store while processing that single message.
  5. Custom blocks should have access to external stores, but external stores should not have access to stores internal to the block.

Errata:
No more DSLs, for anything. All key paths are represented as JSON -- no go-fetch. I realized this at the very end here, but there is no reason to represent a key-path as a string. Just use a JSON object. Just imagine Diagram 2 has {foo:{bar{baz:""}}} instead of .foo.bar.baz.

to sum up, the goals of this proposal are:

  1. remove as much DSL stuff as possible
  2. ensure that all data is accessible to all operations (i.e., we can't operate on individual array elements in current ST)
  3. remove all in-message metadata like we have in ST now
  4. use streams for context of message (send signals, not metadata, i.e. pack/unpack, EOF, that kind of thing)
  5. make a grammar of streams that incorporates blocking/locking
  6. make a grammar that can recreate ST as it stands now
  7. provide a means for some kind modularity, so that people can create their own library without having to rebuild ST/progrma Go
@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 29, 2014

Contributor

Addendum: drop the "Preserve Value" nonsense and only have two routes: constant and key-selector

Contributor

nikhan commented Oct 29, 2014

Addendum: drop the "Preserve Value" nonsense and only have two routes: constant and key-selector

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

A priority queue that removes elements after a specified time difference:
screen shot 2014-10-30 at 11 51 05 am

(updated)

Contributor

nikhan commented Oct 30, 2014

A priority queue that removes elements after a specified time difference:
screen shot 2014-10-30 at 11 51 05 am

(updated)

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

How do messages travel through blocks?
screen shot 2014-10-30 at 12 03 23 pm

Contributor

nikhan commented Oct 30, 2014

How do messages travel through blocks?
screen shot 2014-10-30 at 12 03 23 pm

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

How do "emit-on-bang" blocks work?
screen shot 2014-10-30 at 1 36 40 pm

Contributor

nikhan commented Oct 30, 2014

How do "emit-on-bang" blocks work?
screen shot 2014-10-30 at 1 36 40 pm

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

more possibilities of messages traveling through blocks:
screen shot 2014-10-30 at 1 41 54 pm

Contributor

nikhan commented Oct 30, 2014

more possibilities of messages traveling through blocks:
screen shot 2014-10-30 at 1 41 54 pm

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

are "emit-on-bang" routes different than routes that require a specific key or value, since they don't necessarily need any kind of information (other than a signal) to proceed?

Contributor

nikhan commented Oct 30, 2014

are "emit-on-bang" routes different than routes that require a specific key or value, since they don't necessarily need any kind of information (other than a signal) to proceed?

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

Count Block
screen shot 2014-10-30 at 3 03 11 pm

Contributor

nikhan commented Oct 30, 2014

Count Block
screen shot 2014-10-30 at 3 03 11 pm

@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

Above example presents a new problem:
When constructing a block, how do you give parameters to things downstream? (represented above by the blue arrow coming from the "count window" input to the ">" block.

Proposal:

  1. if you set an input value for your constructed block to a constant value, this acts as a "pusher" to inputs downstream.
  2. if you set an input value for your constructed block to a constant value some magic happens where that input simply access as a proxy to the input downstream.
Contributor

nikhan commented Oct 30, 2014

Above example presents a new problem:
When constructing a block, how do you give parameters to things downstream? (represented above by the blue arrow coming from the "count window" input to the ">" block.

Proposal:

  1. if you set an input value for your constructed block to a constant value, this acts as a "pusher" to inputs downstream.
  2. if you set an input value for your constructed block to a constant value some magic happens where that input simply access as a proxy to the input downstream.
@nikhan

This comment has been minimized.

Show comment
Hide comment
@nikhan

nikhan Oct 30, 2014

Contributor

Another large bit to add to the "problems" list:
Separation of concerns. In addition to the fact that we duplicate common mechanisms across blocks, the relative usability of each block is obscured. There are blocks like Histogram that live along blocks like gaussian. By presenting a low-level vocabulary we:

  • make more explicit what is designed to be used for enthusiasts and what it is designed for a crowd that is expecting something much more higher level
  • makes low-level functions even more modular, so that if a simplexnoise block were to be created, the block itself would be a generic implementation
  • affords the creation of domain-specific vocabularies for particular people, like a twitter parsing block (which would be composed of all sorts of low-level blocks).
Contributor

nikhan commented Oct 30, 2014

Another large bit to add to the "problems" list:
Separation of concerns. In addition to the fact that we duplicate common mechanisms across blocks, the relative usability of each block is obscured. There are blocks like Histogram that live along blocks like gaussian. By presenting a low-level vocabulary we:

  • make more explicit what is designed to be used for enthusiasts and what it is designed for a crowd that is expecting something much more higher level
  • makes low-level functions even more modular, so that if a simplexnoise block were to be created, the block itself would be a generic implementation
  • affords the creation of domain-specific vocabularies for particular people, like a twitter parsing block (which would be composed of all sorts of low-level blocks).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment