Calling run() multiple times #30

B-Lorentz · 2023-12-27T14:02:21Z

If I understand correctly, ascent is't a 'differential datalog'-style implementation, but is optimized for batch processing.
The API does however allow .run() to be called multiple times, so users will eventually do it.
I did a little experiment:

use std::time::Instant;

use ascent::ascent;

fn main() {
    ascent!{
        #![measure_rule_times]
        relation x(u64,u64);
        relation y(u64,u64);
        y(i,n) <-- x(i,n), x(i,n+100);
        x(i,n+200) <-- y(i,n), if *n < 400000;  
    }
    let mut prog = AscentProgram::default();
    fn time(prog: &mut AscentProgram, title: &str) {
        let start = Instant::now();
        prog.run();
        println!("{:?} {:?}", prog.x.len(), prog.y.len());
        println!("{}", prog.scc_times_summary());
        println!("{}: {:?}",title, start.elapsed());
    }
    prog.x = vec![(0, 1,), (0, 101,)];
    time(&mut prog, "Initial run");

    prog.x.push((1,1,));    
    time(&mut prog,"Add of non-consequential fact");

    prog.x.push((1,101,));
    time(&mut prog,"Add of consequential fact");

    let mut prog = AscentProgram::default();
    prog.x = vec![(0, 1,), (0, 101,), (1,1,), (1,101)];
    time(&mut prog,"Full rerun");
}

The output is:

4002 4001
scc 0 time: 1.436391002s
  some of rule times: 1.42136186s
  rule x <-- y_indices_none_delta, if ⋯
    time: 4.024887ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 2.227919ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 1.415109054s
-----------------------------------------

Initial run: 1.436422348s
4003 4001
scc 0 time: 1.443135223s
  some of rule times: 1.42810157s
  rule x <-- y_indices_none_delta, if ⋯
    time: 5.573965ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 7.418424ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 1.415109181s
-----------------------------------------

Add of non-consequential fact: 9.375828ms
8004 8002
scc 0 time: 11.459918911s
  some of rule times: 11.423892339s
  rule x <-- y_indices_none_delta, if ⋯
    time: 12.857089ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 17.993869ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 11.393041381s
-----------------------------------------

Add of consequential fact: 10.019436059s
8004 8002
scc 0 time: 3.073175727s
  some of rule times: 3.047507618s
  rule x <-- y_indices_none_delta, if ⋯
    time: 7.811237ms
  rule y <-- x_indices_none_delta, x_indices_0_1_total+delta
    time: 4.606486ms
  rule y <-- x_indices_none_total, x_indices_0_1_delta
    time: 3.035089895s
-----------------------------------------

Full rerun: 3.073211278s

It seems to me that:

running after adding to the input relation sometimes (my second run), does a lot less work than a full rerun.
at other times, when a lot of new facts were implied by the new fact, it does more work than a full rerun.

While (as I expected), removing facts from relations isn't supported.

 ascent!{
        struct A;
        relation x(u64);
        relation y(u64);
        y(n) <-- x(n);
    }
    let mut prog = A::default();
    prog.x = vec![(1,), (2,)];
    prog.run();
    println!("{:?}", prog.y);
    prog.x = vec![(1,)];
    prog.run();
    println!("{:?}", prog.y)

produces
[(1,), (2,)]
[(1,), (2,)]

My question is, would it be possible to either:

add some support for incrmentalized computation, so that you could indicate 'I've added facts to this relation' and 'I've removed facts from this relation', causing some of the indices to be recomputed?
prevent the user from calling .run() multiple times, by making run take ownership of the program instead of taking it by mutable reference

The text was updated successfully, but these errors were encountered:

s-arash · 2023-12-27T19:53:26Z

Great observations.

add some support for incrmentalized computation, so that you could indicate 'I've added facts to this relation' and 'I've removed facts from this relation', causing some of the indices to be recomputed?

This is what is done now. Ascent conservatively assumes that all relations may have changed, and recomputes all indices for the program. Here, it seems that this recomputing of indices is very expensive for some reason. This is something that can potentially be addressed. Although I'd have to think about its ramifications for BYODS.

prevent the user from calling .run() multiple times, by making run take ownership of the program instead of taking it by mutable reference.

This is also a possibility. But I prefer the other option.

B-Lorentz · 2023-12-29T20:55:14Z

@s-arash thanks.
But there isn't only a performance issue.
At least in my last example, where input facts are removed, the result isn't consistent after calling run at the second time.

bugeats · 2024-02-15T13:18:05Z

I have yet to run into a case where calling run() multiple times does not cause a subsequent geometric increase in eval time, even when the row counts don't actually change.

My programs are large but not super complex, and I don't think I'm doing anything too off the beaten path (using enum instances for values).

My solution for now is to manage caches myself and break things down into multiple one-off ascent_run! calls.

s-arash · 2024-03-09T18:11:34Z

At least in my last example, where input facts are removed, the result isn't consistent after calling run at the second time.

Right, I can't think of a good way to prevent removing facts from relations. Even replacing the Vec<T> type for relations with an append-only vec type will not prevent removing facts.

I have yet to run into a case where calling run() multiple times does not cause a subsequent geometric increase in eval time.

Ascent wasn't really designed for incremental processing of data.

I can just disallow calling run() multiple times. But I feel like it may prevent a few legitimate use cases.

I have purposefully not hidden the internals of Ascent's relations and indices so they can be tinkered with by those who know what they are doing.

B-Lorentz · 2024-03-11T02:37:01Z

@s-arash
I think if there is one way (calling once) that is the normal and designed for use of the API, and one that is more for power users who really know what they are doing, it may make sense to separate them, like:

fn run(self) -> ResultSet

(where ResultSet provides the contents of the relations after run, and the indices, but can't be used to run again)

and

/// Runs the program and allows it to be run later. Runs in general are not incremental, and may take longer time than rerunning from scratch, or may fail to account for facts manually removed from the input relation after the first run.
fn run_and_keep(&mut self)

Alternatively, there could be some switch attribute on the macro itself, which would generate the fields by default as private, with appropriate getters (handing out shared references), but would allow advanced users to request an 'exposed internals' ascent program.

StarGazerM · 2024-04-09T15:18:16Z

I think mod/rerun will require add provenance (like add fact/remove fact) semantic.
for example in query

edge(1,2).
path(a, b) :- edge(a, b).

changing edge (like remove edge(1,2)) will cause datalog rule not holds logically (some path(1,2), can't be inferred).
with some semantic support like semi-ring should help with incremental computation.

regexident mentioned this issue Oct 10, 2024

Provide user-friendly and fail-safe API by default (while keeping current API as escape hatch for advanced usage) #49

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling run() multiple times #30

Calling run() multiple times #30

B-Lorentz commented Dec 27, 2023 •

edited

Loading

s-arash commented Dec 27, 2023

B-Lorentz commented Dec 29, 2023

bugeats commented Feb 15, 2024

s-arash commented Mar 9, 2024

B-Lorentz commented Mar 11, 2024

StarGazerM commented Apr 9, 2024

Calling run() multiple times #30

Calling run() multiple times #30

Comments

B-Lorentz commented Dec 27, 2023 • edited Loading

s-arash commented Dec 27, 2023

B-Lorentz commented Dec 29, 2023

bugeats commented Feb 15, 2024

s-arash commented Mar 9, 2024

B-Lorentz commented Mar 11, 2024

StarGazerM commented Apr 9, 2024

B-Lorentz commented Dec 27, 2023 •

edited

Loading