-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
language proposal #8
Comments
Addendum: drop the "Preserve Value" nonsense and only have two routes: constant and key-selector |
are "emit-on-bang" routes different than routes that require a specific key or value, since they don't necessarily need any kind of information (other than a signal) to proceed? |
Above example presents a new problem: Proposal:
|
Another large bit to add to the "problems" list:
|
Streamtools as a Language
The following document intends to illustrate what Streamtools might look like if it was actually a language.
Current Problems
Lack of modularity
Every time we want to do some kind of operation that is not currently afforded by Streamtools we need to write Go, recompile, rerelease. This is especially annoying for situations where we are limited by the grammar of ST. The lack of modularity is also impacts people who may wish to curate their own work environment with blocks that are specific to their work.
Inconsistant Flow Control
There are several problems with how ST currently uses flow for control.
Inconsistant Block API
The inconsistent block API is product of inconsistent use of flow for control as well as inconsistent access to state. For instance, the
cache
,set
, andhistogram
blocks should be very closely coupled. They are also lacking, in different ways, many operations we expect from a key-value data structure.Proposal
Diagram 1
The above diagram represents a system where two messages enter the
+
block to produce a sum. This layout is impossible in the current instantiation of ST without some kind of workaround since it is impossible for the 2 input streams to ever arrive at the same time. An example workaround would be to associate the message containing '1' with the message containing '2' with a delay to wait for the message that has yet to be arrived. Another workaround would be something like ourjoin
block, where we assume order is correct.This is undesirable, as it introduces all kinds of issues of synchronicity in the stream or unnecessary metadata embedded in each message. For instance, we could associate the wrong messages with one another or we could require a whole new system to deal with metadata in each message to resolve how the messages should be paired. Either way, this is added complexity that does not help in the development/understanding of a grammar consistent across blocks. This leads us to our first proposal:
1. Core blocks are responsible for ONE synchronous operation
This means that blocks like
timeseries
which seem to allow asynchronous access (the pushing to the timeseries and the polling of a timeseries) are no longer valid. The funny thing is that this rule actually isn't changing how blocks currently work internally: currently each block does one operation at at time, all this idea does is express the synchronous aspect already internal to the block to the input routes.What the above means is that all input routes block until all input routes have received a message. Once all routes have been satisfied the operation of the block is triggered.
The above suggestion is a bit at odds with our current instantiation of the idea of "rules" which are used to give parameters to a block. Many of our blocks have "rules" that persist throughout the duration the blocks life and never need to be changed. Which leads us to...
2. "Constant" routes and "Path" routes
Diagram 2
Since our core blocks depend having all inputs satisfied once per block operation, this presents a problem when we have situations where we'd like a constant value to be applied as a parameter for a given block. We don't want to send one message per message in the above addition example if we want to add a constant to a stream.
Given the ability to configure how a route accepts and treats its value fixes this problem. In diagram 2, two modes of configuration are presented. The dialog on top is for a situation where a block is meant to accept a stream and retrieve a specific value from within an input object. In the case of the diagram, the route is configured accept the number for the
+
block from {foo:{bar:{baz:}}}. This is a "Path" route.The
preserve value
checkbox is meant to satisfy the need when a block needs to be modified out-of-step with the other input stream. This means that if you have the left input stream running at 50hz, and you modify the parameter for the right stream at 1hz, that the value parameter is preserved for all messages until the right stream is updated again.The bottom dialog in Diagram 2 presents an option to input JSON directly into the right stream input, so that it is always constant across operations. This is a "Constant" route. This provides us with 3 types of patterns:
Diagram 3
The middle pattern in Diagram 3 presents a problem: how can we guarantee two streams appear at that block in a way that is useful to us?
3. blocking and flow control
Diagram 4
Diagram 4 introduces some new blocks:
source
: let's consider this some magical place where messages come from.unpack
: given the key-path configured in the input route, emit 1 message per array element found at that key-path location on the left output stream. Before starting a stream of elements, it sends the number of elements that it will be unpacking out its right output.pack
: pack accepts a the length of elements expected to be packed into array in the right input. Once a value is accepted, the right input blocks until the left input has satisfied the correct length. Upon reaching the length, the array is emitted, and the right input becomes unblocked.set
: set accepts a JSON object on the right input. This JSON object serves as a template for which the values (left input) are inserted. If the right input contains the value {foo: ""} (the value can be anything, it will be overwritten), every message emitted will be in the format {foo: value}.merge
: like+
, merge requires an input on both left input and right input to commit an operation.merge
combines two maps, privileging the right input.This pattern takes advantage of blocking in order to maintain consistency across events. When a message is broadcast from
source
, it is immediately sent tomerge
which blockssource
from sending any more messages into the system. Next, a message is sent tounpack
, which immediately sends a message topack
.pack
then blocks until the array length is satisfied, blockingunpack
from sending any more messages downstream. Each message is added to 1 and sent topack
. Oncepack
emits the array,unpack
is unblocked, and the message isset
into an object. This object is sent tomerge
which unblockssource
.NOTE: The right input for
pack
may seem at odds with proposals 1 & 2. Instead of having a 1:1 relationship with its fellow left input, it has a 1:N relationship. Considering this block as a portal between many messages and a single message, I feel less icky about the situation.b
4. Shared State
Diagram 5
One of the deficiencies in our current implementation is that complex operations with state are rather difficult. This is because we can only access the state via one block -- a block that has one output. Determining what the output means in regards to the state can be rather tricky -- and has resulted in the inclusion metadata into our outputs.
One way to amend the current situation is to afford multiple outputs that signal various circumstances. Similar to the
unpack
block in Diagram 4, various output routes can be added to afford the control of flow. For example, if a lookup query returns no result when burying a set, then instead of returning null (which is valid JSON that should be able to be included in a set) or nothing (which is unhelpful), we should send a signal out of a separate output channel, signaling that no message was found.In addition to multiple outputs, another way to fix the problem of having to worry about how many different state queries work together is to divorce the state from the block. This amounts to having a completely new element in ST -- a data store.
The data store does not participate in message flow. It is a node that is associated with various blocks that act as an API to its contents. Diagram 5 illustrates what some of this blocks may look like. The "association" is illustrated as a route that exists on the side -- this is for illustration purposes only. There is no directionality when associating a block with a store.
This affords us the ability to run multiple control flows with a single data store and affords the construction of complex patterns.
Diagram 6
The above diagram illustrates a timeseries. values come in through
source
,source
broadcasts one message toset
which puts the value into an object, like {"value":22}.source
also sends a message tonow
which produces a time in epoch ms. This epoch ms is also set into an object, like {"time": 140092818110000}. The two are merged, and then pushed to an array.push
returns the new length of the array, and if the new length of the array is greater than our sample size (60), then pop an element off the array. If at any time we would like to view the timeseries, we send a message todump
which does not interfere with our push/pop flow at all.5. Building Blocks
-><-
->Diagram 7<-
Building blocks is still not something I've entirely figured out. The above diagram displays how a
map
would work. It's a lot more work, but the data is much more clearly expressed. One thing you'll see is thatmerge
has 3 inputs. I am of the mind thatmerge
should be a magic block that can allow infinite inputs. I imagine this UI-wise as just dragging the block wider.Here are some scattered notes on how I think custom blocks should work:
Errata:
No more DSLs, for anything. All key paths are represented as JSON -- no go-fetch. I realized this at the very end here, but there is no reason to represent a key-path as a string. Just use a JSON object. Just imagine Diagram 2 has {foo:{bar{baz:""}}} instead of .foo.bar.baz.
to sum up, the goals of this proposal are:
The text was updated successfully, but these errors were encountered: