live joins? #3

dominictarr · 2013-04-27T13:27:10Z

Would it be possible to use level-live-stream on the joins
so that you can get a changes feed of the relations as they are added?

also, is it possible to do open ended joins, perhaps:

db.join([{
  subject: db.v('author1'),
  predicate: 'maintains',
  object: db.v('module'),
},
{
  subject: db.v('module'),
  predicate: 'depends',
  object: db.v('module2')
},
{
  subject: db.v('author2'),
  predicate: 'maintains',
  object: db.v('module2')
}
], function (err, join) {
  console.log(join)
})

which might return:

{author1: 'mcollina', module: 'levelgraph', author2: 'rvagg', module2: 'levelup'}

Is that correct?

mcollina · 2013-04-27T13:38:36Z

These are two questions: 1) live stream joins and 2) open ended joins.

Live stream joins are not currently supported. Doing that is very different from plain old joins, and require some fairly new algorithms that I need to study. Moreover, as stuff are implemented at the moment everything will blow up in a huge memory leak. In the end, that's another library.
Yes. If it's not working it's a bug. Have you tried to pour NPM data into levelgraph? If you do, let me know!! :)

dominictarr · 2013-04-27T14:34:15Z

I have a module for doing joins here that I use here https://github.com/dominictarr/level-inverted-index/blob/map-reduce/index.js#L99-L106

Yes, I have some levelup stuff for working with npm (which is a great dataset) over here https://github.com/dominictarr/npmd

will try doing this in levelgraph and see what happens.

mcollina · 2013-04-27T16:33:34Z

Your module for joins seems more efficient than the current approach, which is quite naive (but it works).
At the moment I am opening a new stream for each of the query condition for each of the possible results.
it is as bad as it sound, but it allow multiple possible solutions to be discovered.
Moreover this implementation limits the subsequent queries only to the 'bounds' variables, i.e. if the first condition finds out that x = 42, then the second condition reuses that result.

Would you like to try an implementation of the join using the your module? I believe it may be possible to get better throughput at the expense of RAM (the big joinhash in your module).
The problem with this approach is that it cannot use the knowledge of the result of the other conditions until the reduce phase. How can you handle binding multiple variables, when your getKey method have to return a string?
Still, it may be very tricky to get it done right, but implementing it will also allow to do live joins.

Ideally these two approach might be complementary, as a proper database can use multiple join strategy. I bet a pluggable system will be nice.

It would also be cool to get some numbers with a real dataset, so we can know which one works better (the NPM one will be super).

dominictarr · 2013-04-28T11:55:32Z

Hmm, so I think using a hash-join you will need to have multiple joins.
I'm not sure it applies in the case of graphs, but if you wanted to join on two values, you could just hash a combination of keys.

If the second condition refuses the result,
does this mean my query will not return modules that I have have written, that I also depend on?

Agree, we need more generic join modules. Also, we can use db.getApproximateSize(range) to estimate which sides of the join is smaller, and thus which should be in memory.

Another thing - leveldb is probably pretty smart about caching - so maybe reading from the database is not so bad.

mcollina · 2013-04-28T12:15:58Z

Hmm, so I think using a hash-join you will need to have multiple joins.
I'm not sure it applies in the case of graphs, but if you wanted to join on two values, you could just hash a combination of keys.

The problem with hashing is that you can't hash what you do not know. In your query, the first condition is about variables author1 and module. So you can produce an hash on these two, but you must ignore module2 and author2.

If the second condition refuses the result,
does this mean my query will not return modules that I have have written, that I also depend on?

I'm not getting this question, but I'll try answering.
It will returns all the modules you have written that you depend on.
The first condition try binding some variables, but it cannot bind all of them. So it emits a "partial" solution, then the second condition search the store for triples matching the pattern and the already bound variables, and then it emits the matching context.

Another thing - leveldb is probably pretty smart about caching - so maybe reading from the database is not so bad.

It seems so, I think that having a real use case that performs badly can drive the optimization.

dominictarr · 2013-04-30T12:22:01Z

Agree about the "real use case that performs badly". I'm gonna change to 0.10, and try sticking npm into levelgraph and running this query. Unfortunately, I have more pressing "real work" but I'll get to this!

mcollina · 2013-04-30T12:33:08Z

No hurry!
Adding 0.8 support should be easy, if you need it just ping me :).

You might also want to use #4, I believe it's stable enough.

dominictarr · 2013-04-30T12:46:04Z

it's time to move to 10 anyway. would have done this on sunday, but I encounterd a few bugs, in my stuff and fixed them instead.

mcollina · 2013-07-02T18:17:12Z

I have been thinking about this live joining. It is doable, albeit not fast and/or optimizable on the first run.

How about something:

db.join([{
  stream: yourStream // note: this will only work for the very first condition.
  subject: db.v('author1'),
  predicate: 'maintains',
  object: db.v('module'),
},
{
  subject: db.v('module'),
  predicate: 'depends',
  object: db.v('module2')
},
{
  subject: db.v('author2'),
  predicate: 'maintains',
  object: db.v('module2')
}
], function (err, join) {
  console.log(join)
})

It should be easy to implement.
@dominictarr if you need it, I can give it a go.

mcollina · 2013-07-02T18:18:26Z

Another possible syntax:

db.join(yourstream, [{
  subject: db.v('author1'),
  predicate: 'maintains',
  object: db.v('module'),
},
{
  subject: db.v('module'),
  predicate: 'depends',
  object: db.v('module2')
},
{
  subject: db.v('author2'),
  predicate: 'maintains',
  object: db.v('module2')
}
], function (err, join) {
  console.log(join)
})

mcollina · 2013-07-02T18:45:03Z

The only take in this solution is that the live-stream will only be matched by the first condition, while the rest will be based on the internal graph's data.

Another solution might be:

db.join([{
  live: true, // works only on the first condition
  subject: db.v('author1'),
  predicate: 'maintains',
  object: db.v('module'),
},
{
  subject: db.v('module'),
  predicate: 'depends',
  object: db.v('module2')
},
{
  subject: db.v('author2'),
  predicate: 'maintains',
  object: db.v('module2')
}
], function (err, join) {
  console.log(join)
})

This is actually what I like most, it will need to depend on sublevel, but it is ok.

The real question is: should this be forwarding all "internal" matches first, or just the "live" results?

dominictarr · 2013-07-02T19:47:55Z

by "internal", you mean it should match the current values in the database?

Most definately - that is extremely useful - I think the most useful is to be able to match the history, the live changes,
and either just the history or live changes.

However, since you are using streams for this, it is pretty straightforward to adapt the idea to both.

mcollina · 2013-07-02T20:12:53Z

By 'internal' I mean the current values of the database. Not sure about how to support the history (if it is not what is currently in the db).

The Join are implemented like a pipeline. An empty solution start, then it is denied or augmented with variable bindings at each step/condition. Implementing a live join will be starting from a live stream of triples matching the first condition. How does it sound?

What about the interface?

mcollina · 2013-07-09T10:58:52Z

Closing in favor of #24

mcollina mentioned this issue Apr 28, 2013

Query planner for JOIN #7

Closed

mcollina mentioned this issue Jul 2, 2013

Live join support #24

Closed

5 tasks

mcollina closed this as completed Jul 9, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

live joins? #3

live joins? #3

dominictarr commented Apr 27, 2013

mcollina commented Apr 27, 2013

dominictarr commented Apr 27, 2013

mcollina commented Apr 27, 2013

dominictarr commented Apr 28, 2013

mcollina commented Apr 28, 2013

dominictarr commented Apr 30, 2013

mcollina commented Apr 30, 2013

dominictarr commented Apr 30, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 2, 2013

dominictarr commented Jul 2, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 9, 2013

live joins? #3

live joins? #3

Comments

dominictarr commented Apr 27, 2013

mcollina commented Apr 27, 2013

dominictarr commented Apr 27, 2013

mcollina commented Apr 27, 2013

dominictarr commented Apr 28, 2013

mcollina commented Apr 28, 2013

dominictarr commented Apr 30, 2013

mcollina commented Apr 30, 2013

dominictarr commented Apr 30, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 2, 2013

dominictarr commented Jul 2, 2013

mcollina commented Jul 2, 2013

mcollina commented Jul 9, 2013