
A review of literature on Datalog, tuple syncing, CRDTs, etc., in relation to Datsync.

Feel free to add a link, and once you've read it, provide a contextual abstract/summary as it pertains to the Datsync project.

Ideas

Eve

From the makers of LightTable. This is an interesting project as they have many overlapping concerns. We should keep ourselves informed.

Eve consists of a runtime that efficiently compiles queries into a language built around Changes:

Changes

//------------------------------------------------------------------------
// Changes
//------------------------------------------------------------------------

// Because Eve's runtime is incremental from the ground up, the primary unit of
// information in the runtime is a Change. The content of a change is in the
// form of "triples," a tuple of entity, attribute, and value (or in the RDF
// world, subject, predicate, object). For example, if we wanted to talk about
// my age, we might have a triple of ("chris", "age", 30). Beyond the content
// of the change, we also want to know who created this change and what
// transaction it came from. This gives us enough information to work out the
// provenance of this information, which is very useful for debugging as well
// as doing clever things around verification and trust. The final two pieces
// of information in a change are the round and count, which are used to help
// us maintain our program incrementally. Because Eve runs all blocks to
// fixedpoint, a single change may cause multiple "rounds" of evaluation which
// introduce more changes. By tracking what round these changes happened in, we
// can do some clever reconciling to handle removal inside recursive rules
// efficiently, which we'll go into more depth later. Count tells us how many
// of these triples we are adding or, if the number is negative, removing from
// the system.

https://github.com/witheve/Eve/blob/master/src/runtime/runtime.ts#L431
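The comment above describes the fields carried on each Change. As a minimal sketch in TypeScript (field names here follow the comment and are illustrative, not necessarily the actual definitions in runtime.ts), a Change might look like:

type Change = {
  e: string | number;   // entity, e.g. "chris"
  a: string;            // attribute, e.g. "age"
  v: string | number;   // value, e.g. 30
  n: string;            // who created the change (provenance)
  transaction: number;  // transaction the change arrived in
  round: number;        // evaluation round within the fixedpoint loop
  count: number;        // +N to assert, -N to retract
};

// Asserting and later retracting "chris is 30" across two transactions:
const assertAge: Change  = {e: "chris", a: "age", v: 30, n: "client-1", transaction: 1, round: 0, count: 1};
const retractAge: Change = {e: "chris", a: "age", v: 30, n: "client-1", transaction: 2, round: 0, count: -1};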

Joins

//------------------------------------------------------------------------
// Joins
//------------------------------------------------------------------------

// Buckle up, we're going for a ride.
//
// Now that we have a change representation, we need to actually do something
// with it. Eve is a relational language, which means the primary action in
// the language is to join tuples together. Unlike in most relational databases
// where we might do joins by looking at full relations pair-wise and joining
// them together, we need to operate on changes and we want to sidestep the
// cost of figuring out a good query plan for the pair-wise joins. Both of
// these properties require us to look at joins very differently than we
// normally would in say Postgres. Instead, we're going to use a magical join
// algorithm called Generic Join [1] and extend it to work on incremental
// changes instead of just fully realized relations.
//
// The core idea behind Generic Join is that instead of breaking a query down
// into a set of binary joins on relations, we look at each unique variable in
// the query and have all of the relations that might say something about that
// variable do an intersection. Let's look at an example:
//
//  people(person-id, name)
//  dogs(person-id, dog-name, dog-age)
//
// Here we have two relations we want to join together: "people" and "dogs".
// The people relation has two fields that are represented by the variables
// "person-id" and "name." The dogs relation has three fields: "person-id",
// "dog-name", and "dog-age." In postgres, we'd take these two relations and do
// a hash or merge join based on the first column of each. In generic join we
// look at all the variables we need to solve for, in this case four of them,
// and then ask each relation which variable they could propose values for.
// These proposals include not just what variable this relation could solve
// for, but also an estimate of how many values the variable would have. In the
// interest of doing the least amount of work possible, we select the proposal
// with the smallest estimate and then for each proposed value of the variable,
// we ask all the other relations if they "accept" the value.  If they do, we
// recursively solve for the rest of the variables in depth-first fashion.
//
// In this algorithm, each relation acts as a constraint on the space of
// valid solutions. We don't just look at every row in the people table or
// every row in the dogs table, but instead look at the unique values per
// variable given some set of already solved variables. We call that
// solved set of variables a "prefix". So when we ask a constraint to propose
// values, we hand it the prefix and ask it which variable it would solve for
// next. We then ask each constraint if they accept the new prefix and continue
// to solve for the rest of the variables. By selecting the proposal with the
// smallest estimate, we can make some interesting guarantees about the upper
// bound [2] of the work we will do to satisfy our join and we side step the
// need for the insanely complex query planners that exist in most commercial
// databases. An interesting aspect of this algorithm is that it's basically
// making planning decisions for every unique value of a variable, which means
// that it is resilient to the high skew you often end up with in real-world
// data.
//
// So the key parts of Generic Join are prefixes, constraints, and proposals,
// which we'll start to layout below. We'll talk more about the ways we have
// to change Generic Join to make it work incrementally later.
//
// [1]: Generic Join is presented in "Skew Strikes Back: New Developments in
//      the Theory of Join Algorithms" https://arxiv.org/abs/1310.3314
// [2]: "Worst-case Optimal Join Algorithms" https://arxiv.org/abs/1203.1952
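As a rough illustration of the propose/accept loop described above (a hedged sketch in TypeScript, not Eve's actual runtime API, and ignoring the incremental extensions the comment defers to later), a non-incremental generic join core might look like this:

type Value = string | number;
type Prefix = Map<string, Value>;                 // variables solved so far
type Proposal = { variable: string; estimate: number; values: Value[] };

// Hypothetical interface: each relation in the query acts as a constraint.
interface Constraint {
  // Which unsolved variable could this relation bind next, given the prefix,
  // and roughly how many values would it propose for it?
  propose(prefix: Prefix): Proposal | null;
  // Does this relation allow the prefix (including the newly bound variable)?
  accepts(prefix: Prefix): boolean;
}

function genericJoin(constraints: Constraint[], numVars: number, prefix: Prefix, out: Prefix[]): void {
  if (prefix.size === numVars) {                  // all variables solved: emit a result
    out.push(new Map(prefix));
    return;
  }
  // Select the proposal with the smallest estimate across all constraints.
  let best: Proposal | null = null;
  for (const c of constraints) {
    const p = c.propose(prefix);
    if (p && (best === null || p.estimate < best.estimate)) best = p;
  }
  if (best === null) return;

  // For each proposed value, check that every constraint accepts the extended
  // prefix, then recurse depth-first on the remaining variables.
  for (const value of best.values) {
    prefix.set(best.variable, value);
    if (constraints.every(c => c.accepts(prefix))) {
      genericJoin(constraints, numVars, prefix, out);
    }
    prefix.delete(best.variable);
  }
}

In the people/dogs example, "person-id", "name", "dog-name", and "dog-age" are the four variables, and the people and dogs relations each supply a Constraint. Because the cheapest proposal is re-selected at every level of the recursion, planning effectively happens per binding rather than once per query, which is what makes the algorithm robust to skew.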

cqrs-server

Doing things similar to Eve, but using Onyx, Kafka, and Datomic.

Dedalus

Dedalus: Datalog in Time and Space [:research]

Om (next)

Differential Dataflow

Differential/Neural Computation

  • Programming with a Differentiable Forth Interpreter [:research] This paper presents a fully numerically differentiable Forth interpreter. It allows the engineer to provide prior data and mark the parts of the code they would like approximated. Using the same techniques used to optimize a neural network (it's built on TensorFlow), the interpreter is able to approximate the specified unit of code. While not directly related to stream processing, this paper sheds some light on the properties one desires in a derivative, namely being able to use the derivative to incorporate prior knowledge to approximate unknown units of code.

  • End-to-End Differentiable Proving [:research] A differentiable technique for representing atoms as dense vectors and chaining them via recursive combination. The result is a structure that represents the relationship between the atoms in a way that is differentiable with respect to the particular representation of that relationship, which allows us to optimize the representation. In the context of streaming data, it may be interesting to apply the same technique to deduce a stream-processing primitive that exposes datoms satisfying the demands of a given query.

Magic Sets

Dealing with a co-NP-complete problem can appear unfeasible, or even crazy in a database setting where input may be very large. However, our present results show that suitable optimization techniques can “localize” the computation and limit the inefficient (co-NP) computation to a very small fragment of the input, obtaining fast query answering, even in a powerful data-integration framework.

Propagators

Other