Finalise initialisation calculations #110

Merged
merged 4 commits into from Sep 3, 2019
Changes from 1 commit
8 changes: 8 additions & 0 deletions RELEASES.md
Expand Up @@ -29,6 +29,14 @@ Add a CLI option `--dump-liveness-graph` to dump a Graphviz file with a

# polonius-engine


## v0.???.0

- add the initialisation-tracking inputs `parent`, `var_starts_path`,
`initialized_at`, and `moved_out_at`, as well as the new `Atom` `MovePath` to
the type of `AllFacts` to capture move paths.
- remove the `var_maybe_initialized_on_exit` input, as it is now calculated by Polonius.

## v0.9.0

- add the input `var_initialized_on_exit` which indicates if a variable may be
Expand Down
7 changes: 4 additions & 3 deletions inputs/collect-facts.sh
Expand Up @@ -10,8 +10,9 @@ INPUT_FOLDERS=(drop-liveness drop-may-dangle drop-no-may-dangle enum-drop-access

for test_folder in "${INPUT_FOLDERS[@]}";
do
pushd "$test_folder"
rustc +$RUSTC_RELEASE $RUSTC_ARGS -o /dev/null -- *.rs
popd
pushd "$test_folder" || exit
find . -name "*.facts" | xargs -- rm
rustc +$RUSTC_RELEASE $RUSTC_ARGS -- *.rs
popd || exit
done

24 changes: 18 additions & 6 deletions polonius-engine/src/facts.rs
Expand Up @@ -3,7 +3,7 @@ use std::hash::Hash;

/// The "facts" which are the basis of the NLL borrow analysis.
#[derive(Clone, Debug)]
pub struct AllFacts<R: Atom, L: Atom, P: Atom, V: Atom> {
pub struct AllFacts<R: Atom, L: Atom, P: Atom, V: Atom, M: Atom> {
/// `borrow_region(R, B, P)` -- the region R may refer to data
/// from borrow B starting at the point P (this is usually the
/// point *after* a borrow rvalue)
Expand Down Expand Up @@ -43,12 +43,21 @@ pub struct AllFacts<R: Atom, L: Atom, P: Atom, V: Atom> {
/// it when dropping`
pub var_drops_region: Vec<(V, R)>,

/// `var_initialized_on_exit(V, P) when the variable `V` is initialized on
/// exit from point `P` in the program flow.
pub var_initialized_on_exit: Vec<(V, P)>,
/// `child(M1, M2)` when the move path `M1` is the child of `M2`.
Contributor

I think it'd be useful to give an example. For example, I imagine:


child(M1, M2) would be true if M1 represents a.b.c and M2 represents a.b.

Author

That's true. The hand-waviness was because I was not entirely sure whether it was transitive or not. It is, though, which may or may not be what we want.

Contributor

Interesting. The name strongly suggests non-transitive to me. I would have said something like descendant for a transitive child relationship.

Author

True, I'll make a note of fixing this at some point.
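
To make the naming question concrete, here is a minimal sketch of the difference between a direct `child` relation and its transitive `descendant` closure. It assumes move paths are plain `u32` ids and uses a hypothetical helper `descendants`; neither is part of the engine, which is generic over `Atom`.

```rust
use std::collections::BTreeSet;

/// Hypothetical move path id; the real engine is generic over an `Atom`.
type MovePath = u32;

/// Computes the transitive closure of a direct `child(M1, M2)` relation,
/// so `(M1, M2)` is in the result when `M1` is nested anywhere below `M2`.
fn descendants(child: &[(MovePath, MovePath)]) -> BTreeSet<(MovePath, MovePath)> {
    let mut closure: BTreeSet<(MovePath, MovePath)> = child.iter().copied().collect();
    loop {
        let mut added = Vec::new();
        for &(m1, m2) in &closure {
            for &(n1, n2) in &closure {
                // m1 is below m2, and m2 (== n1) is below n2, so m1 is below n2.
                if m2 == n1 && !closure.contains(&(m1, n2)) {
                    added.push((m1, n2));
                }
            }
        }
        if added.is_empty() {
            return closure;
        }
        closure.extend(added);
    }
}

fn main() {
    // 0 = a, 1 = a.b, 2 = a.b.c
    let child = [(1, 0), (2, 1)];
    let closure = descendants(&child);
    assert!(closure.contains(&(2, 1))); // a.b.c is a direct child of a.b
    assert!(closure.contains(&(2, 0))); // a.b.c is a descendant of a
    assert!(!closure.contains(&(0, 2))); // the relation is not symmetric
    println!("{:?}", closure);
}
```

With only the direct facts, `(2, 0)` is absent; it appears once the closure is taken, which is the case the name `descendant` would describe.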

pub child: Vec<(M, M)>,

/// `path_belongs_to_var(M, V)` when the move path `M` starts in variable `V`.
pub path_belongs_to_var: Vec<(M, V)>,
Contributor

Why 'belongs to'? I think maybe path_starts_with_var or something.

Author

I called it that originally, but I got confused and was unsure if it held transitively (it doesn't). I'll change it to something better.

Contributor

Hmm. I'm not sure what "transitive" means in this case?

Author

"Includes descendants"


/// `initialized_at(M, P)` when the move path `M` was initialized at point
/// `P`.
Contributor

Again, I think an example would be good --


If we have some Rust code like:

x.y = 3 // point P

where M1 represents x.y, then we would have initialized_at(M1, P).

Contributor

Reading over the code I had a few questions. Are you assuming that:

a = (22, 44);

would generate:

initialized_at(a);
initialized_at(a.0);
initialized_at(a.1);

?

Contributor

I think my preference would be NOT to assume that -- that is, I think the compiler should give us just initialized_at(a), and if we care to elaborate the closure with respect to child paths, we do it ourselves.

Author

Yes, I had to investigate what the current fact generation actually does, and it only generates the fact for the path being initialized directly. I'll clarify this in the comments and add an example.
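
The elaboration discussed in this thread could be sketched as follows, assuming the compiler emits only the direct `initialized_at(a, P)` fact and that a transitive `descendant` relation is already available. The function name `elaborate_initialized_at` and the `u32` ids are illustrative assumptions, not engine API.

```rust
use std::collections::BTreeSet;

// Hypothetical ids; the real engine is generic over `Atom`.
type MovePath = u32;
type Point = u32;

/// Extends direct `initialized_at(M, P)` facts to every path nested below `M`.
/// `descendant` is assumed to be transitive: `(Mc, M)` means `Mc` is below `M`.
fn elaborate_initialized_at(
    initialized_at: &[(MovePath, Point)],
    descendant: &[(MovePath, MovePath)],
) -> BTreeSet<(MovePath, Point)> {
    let mut out: BTreeSet<(MovePath, Point)> = initialized_at.iter().copied().collect();
    for &(m, p) in initialized_at {
        for &(mc, ancestor) in descendant {
            if ancestor == m {
                // Initializing `m` at `p` also initializes its descendant `mc` at `p`.
                out.insert((mc, p));
            }
        }
    }
    out
}

fn main() {
    // 0 = a, 1 = a.0, 2 = a.1; point 7 stands for `a = (22, 44);`
    let descendant = [(1, 0), (2, 0)];
    let facts = elaborate_initialized_at(&[(0, 7)], &descendant);
    assert_eq!(
        facts.into_iter().collect::<Vec<_>>(),
        vec![(0, 7), (1, 7), (2, 7)]
    );
}
```

The same shape would apply to `moved_out_at` if moves are elaborated downwards in the same way.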

pub initialized_at: Vec<(M, P)>,

/// `moved_out_at(M, P)` when the move path `M` was moved at point `P`.
Contributor

As above.

Contributor

Similarly, are we assuming here that drop(a); would generate moved_out(a.0) and moved_out(a.1) facts?

Contributor

Same as above -- I think we should do the elaboration ourselves.

pub moved_out_at: Vec<(M, P)>,
}

impl<R: Atom, L: Atom, P: Atom, V: Atom> Default for AllFacts<R, L, P, V> {
impl<R: Atom, L: Atom, P: Atom, V: Atom, M: Atom> Default for AllFacts<R, L, P, V, M> {
nikomatsakis marked this conversation as resolved.
fn default() -> Self {
AllFacts {
borrow_region: Vec::default(),
Expand All @@ -63,7 +72,10 @@ impl<R: Atom, L: Atom, P: Atom, V: Atom> Default for AllFacts<R, L, P, V> {
var_drop_used: Vec::default(),
var_uses_region: Vec::default(),
var_drops_region: Vec::default(),
var_initialized_on_exit: Vec::default(),
child: Vec::default(),
path_belongs_to_var: Vec::default(),
initialized_at: Vec::default(),
moved_out_at: Vec::default(),
}
}
}
Expand Down
19 changes: 15 additions & 4 deletions polonius-engine/src/output/datafrog_opt.rs
Expand Up @@ -11,24 +11,35 @@
use std::collections::{BTreeMap, BTreeSet};
use std::time::Instant;

use crate::output::initialization;
use crate::output::liveness;
use crate::output::Output;

use datafrog::{Iteration, Relation, RelationLeaper};
use facts::{AllFacts, Atom};

pub(super) fn compute<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom>(
pub(super) fn compute<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom, MovePath: Atom>(
dump_enabled: bool,
all_facts: AllFacts<Region, Loan, Point, Variable>,
) -> Output<Region, Loan, Point, Variable> {
all_facts: AllFacts<Region, Loan, Point, Variable, MovePath>,
) -> Output<Region, Loan, Point, Variable, MovePath> {
let mut result = Output::new(dump_enabled);

let var_maybe_initialized_on_exit = initialization::init_var_maybe_initialized_on_exit(
all_facts.child,
all_facts.path_belongs_to_var,
all_facts.initialized_at,
all_facts.moved_out_at,
&all_facts.cfg_edge,
&mut result,
);

let region_live_at = liveness::init_region_live_at(
all_facts.var_used,
all_facts.var_drop_used,
all_facts.var_defined,
all_facts.var_uses_region,
all_facts.var_drops_region,
all_facts.var_initialized_on_exit,
var_maybe_initialized_on_exit,
&all_facts.cfg_edge,
all_facts.region_live_at,
all_facts.universal_region,
Expand Down
6 changes: 3 additions & 3 deletions polonius-engine/src/output/hybrid.rs
Expand Up @@ -16,10 +16,10 @@ use crate::output::location_insensitive;
use crate::output::Output;
use facts::{AllFacts, Atom};

pub(super) fn compute<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom>(
pub(super) fn compute<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom, MovePath: Atom>(
dump_enabled: bool,
all_facts: AllFacts<Region, Loan, Point, Variable>,
) -> Output<Region, Loan, Point, Variable> {
all_facts: AllFacts<Region, Loan, Point, Variable, MovePath>,
) -> Output<Region, Loan, Point, Variable, MovePath> {
let lins_output = location_insensitive::compute(dump_enabled, &all_facts);
if lins_output.errors.is_empty() {
lins_output
Expand Down
117 changes: 117 additions & 0 deletions polonius-engine/src/output/initialization.rs
@@ -0,0 +1,117 @@
use std::time::Instant;

use crate::output::Output;
use facts::Atom;

use datafrog::{Iteration, Relation, RelationLeaper};

pub(super) fn init_var_maybe_initialized_on_exit<Region, Loan, Point, Variable, MovePath>(
Contributor

This function definitely wants a comment with some kind of Rust example showing where it's true and not true.

child: Vec<(MovePath, MovePath)>,
path_belongs_to_var: Vec<(MovePath, Variable)>,
initialized_at: Vec<(MovePath, Point)>,
moved_out_at: Vec<(MovePath, Point)>,
cfg_edge: &[(Point, Point)],
output: &mut Output<Region, Loan, Point, Variable, MovePath>,
) -> Vec<(Variable, Point)>
where
Region: Atom,
Loan: Atom,
Point: Atom,
Variable: Atom,
MovePath: Atom,
{
debug!("compute_initialization()");
let computation_start = Instant::now();
let mut iteration = Iteration::new();

// Relations
let child: Relation<(MovePath, MovePath)> = child.into();
let path_belongs_to_var: Relation<(MovePath, Variable)> = path_belongs_to_var.into();
let initialized_at: Relation<(MovePath, Point)> = initialized_at.into();
let moved_out_at: Relation<(MovePath, Point)> = moved_out_at.into();
// FIXME: is there no better way to do this?
nikomatsakis marked this conversation as resolved.
let cfg_edge: Relation<(Point, Point)> = cfg_edge.iter().map(|&(p, q)| (p, q)).collect();

// Variables

// var_maybe_initialized_on_exit(V, P): Upon leaving `P`, at least one part of the
// variable `V` might be initialized for some path through the CFG.
let var_maybe_initialized_on_exit =
iteration.variable::<(Variable, Point)>("var_maybe_initialized_on_exit");

// path_maybe_initialized_on_exit(M, P): Upon leaving `P`, the move path `M`
// might be initialized for some path through the CFG.
let path_maybe_initialized_on_exit =
iteration.variable::<(MovePath, Point)>("path_maybe_initialized_on_exit");

// Initial propagation of static relations

// path_maybe_initialized_on_exit(Path, Point) :- initialized_at(Path,
// Point).
path_maybe_initialized_on_exit.insert(initialized_at);

while iteration.changed() {
// path_maybe_initialized_on_exit(M, Q) :-
// path_maybe_initialized_on_exit(M, P),
// cfg_edge(P, Q),
// !moved_out_at(M, Q).
path_maybe_initialized_on_exit.from_leapjoin(
&path_maybe_initialized_on_exit,
(
cfg_edge.extend_with(|&(_m, p)| p),
moved_out_at.extend_anti(|&(m, _p)| m),
),
|&(m, _p), &q| (m, q),
);

// path_maybe_initialized_on_exit(Mother, P) :-
// path_maybe_initialized_on_exit(Daughter, P),
// child(Daughter, Mother).
Contributor

I'm debating the purpose of this rule. It feels sort of incomplete on its own. It also suggests that the name "maybe initialized" could be refined -- in particular I guess it includes partial initialization? That is, this rule says that if a.b is "maybe initialized", then a is "maybe initialized". There is no corresponding rule saying (for example) that a move from a is also a move from a.b, so I have to assume that moved_out facts are the "transitive closure" over children (and that suggests to me that initialization facts should be the same).

Well, I guess the point of this "maybe partially initialized" relation is precisely to inform the region-live-at computation in NLL, which is probably already imprecise in this manner? I guess I better double check, but I remember us making some decisions like that.

Contributor

Still, if the goal is to ultimately compute this relation, then why do we need the previous rule?

Well, it doesn't seem harmful -- but I guess we could instead do the elaboration at the initial point -- i.e., if some path M is initialized at a point P, then we could make the initial value include that all paths which are either parents or children of M are "maybe initialized" at the point P. And then we don't need this rule to be part of the iteration -- rather, we compute this transitive closure beforehand.

I think this is what Lark did, but I suppose I should go back and try to write up the rules I used there. Maybe a good blog post.

Author

I think this connects with your next comment, so I'll reply to it there.

path_maybe_initialized_on_exit.from_leapjoin(
&path_maybe_initialized_on_exit,
child.extend_with(|&(daughter, _p)| daughter),
|&(_daughter, p), &mother| (mother, p),
);

// var_maybe_initialized_on_exit(V, P) :-
// path_belongs_to_var(M, V),
// path_maybe_initialized_on_exit(M, P).
var_maybe_initialized_on_exit.from_leapjoin(
Contributor

Something is tickling me here. If the goal ultimately is to compute which variables are maybe initialized, why do we have to extend to the parents? That is, it feels like we should track initialization precisely.

Something like this:

  • When you assign to a path M, that also counts as an assignment to each child path Mc (where Mc is a transitive child of M).
  • When you move from M, that also counts as a move from each child path Mc (as above).
  • If M is initialized at P, then it is initialized at each successor Q, unless Q moves from M (as above).

then compute var_maybe_init as we are doing here.

I guess the example where this differs is something like:

a.1 = foo; // initializes a.1 in my version, but also initializes a in your version
drop(a.1); // moves out from a.1 only
// is "a" considered maybe init here? in your version it is, in mine it isn't.

The existing NLL is somewhat approximate here, so maybe this is intentional.

Am I missing something?

Author

I think what you are detecting is that I am computing sort of two different transitive closures. What I want to do is to trace back partial initialisation to a variable for use in the next step to determine if a drop would happen.

Probably, the cleanest design would be to compute the transitive closure over paths downwards, and separately extend the "variable-rootedness" across another transitive closure (tracing all paths back to their variable), and then finally perform the join, but I am also not sure if this is an over-approximation. The more I think about this, the less I feel like I understand it.

I am fairly certain what we did in the original fact generation was this, though: we emitted var_maybe_initialized_at whenever some sub-path of var was (maybe) initialized at that point.
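
The difference between the two versions on the `a.1` example can be sketched with a naive fixpoint over plain sets. This is only an illustration under assumptions: ids are `u32`, the function and the `propagate_to_parents` flag are hypothetical, and the downward elaboration of assignments and moves is omitted because `a.1` has no children in this example.

```rust
use std::collections::BTreeSet;

// Hypothetical ids; the real engine is generic over `Atom`.
type MovePath = u32;
type Point = u32;

/// Naive fixpoint for `path_maybe_initialized_on_exit`.
/// `propagate_to_parents` toggles the child-to-parent rule under discussion.
fn path_maybe_initialized_on_exit(
    child: &[(MovePath, MovePath)],
    initialized_at: &[(MovePath, Point)],
    moved_out_at: &[(MovePath, Point)],
    cfg_edge: &[(Point, Point)],
    propagate_to_parents: bool,
) -> BTreeSet<(MovePath, Point)> {
    let mut facts: BTreeSet<(MovePath, Point)> = initialized_at.iter().copied().collect();
    loop {
        let mut new_facts = Vec::new();
        for &(m, p) in &facts {
            // Flow along CFG edges unless the path is moved out at the target.
            for &(from, q) in cfg_edge {
                if from == p && !moved_out_at.contains(&(m, q)) {
                    new_facts.push((m, q));
                }
            }
            // A maybe-initialized child marks its parent as maybe initialized.
            if propagate_to_parents {
                for &(daughter, mother) in child {
                    if daughter == m {
                        new_facts.push((mother, p));
                    }
                }
            }
        }
        let before = facts.len();
        facts.extend(new_facts);
        if facts.len() == before {
            return facts;
        }
    }
}

fn main() {
    // Paths: 0 = a, 1 = a.1. Points: 0 is `a.1 = foo;`, 1 is `drop(a.1);`, 2 follows.
    let child = [(1, 0)];
    let initialized_at = [(1, 0)];
    let moved_out_at = [(1, 1)];
    let cfg_edge = [(0, 1), (1, 2)];

    let with_parents =
        path_maybe_initialized_on_exit(&child, &initialized_at, &moved_out_at, &cfg_edge, true);
    let precise =
        path_maybe_initialized_on_exit(&child, &initialized_at, &moved_out_at, &cfg_edge, false);

    // With the parent rule, `a` stays maybe initialized after `drop(a.1)`.
    assert!(with_parents.contains(&(0, 1)));
    // Without it, nothing is maybe initialized from the drop onwards.
    assert!(!precise.iter().any(|&(_, p)| p >= 1));
}
```

With the parent rule, `a` becomes maybe initialized at point 0 and nothing ever moves out of `a` itself, so the fact flows past the drop; without the rule, the only fact is `(a.1, 0)`.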

&path_maybe_initialized_on_exit,
path_belongs_to_var.extend_with(|&(m, _p)| m),
|&(_m, p), &v| (v, p),
);
}

let var_maybe_initialized_on_exit = var_maybe_initialized_on_exit.complete();

info!(
"compute_initialization() completed: {} tuples, {:?}",
var_maybe_initialized_on_exit.len(),
computation_start.elapsed()
);

if output.dump_enabled {
let path_maybe_initialized_on_exit = path_maybe_initialized_on_exit.complete();
for &(path, location) in &path_maybe_initialized_on_exit.elements {
output
.path_maybe_initialized_at
.entry(location)
.or_insert_with(Vec::new)
.push(path);
}

for &(var, location) in &var_maybe_initialized_on_exit.elements {
output
.var_maybe_initialized_on_exit
.entry(location)
.or_insert_with(Vec::new)
.push(var);
}
}

var_maybe_initialized_on_exit
.iter()
.map(|&(v, p)| (v, p))
.collect()
}
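
As a reading aid for the first rule in the loop above, here is a plain-Rust sketch of a single application of it, without datafrog. The `u32` ids and the helper name `propagate_once` are assumptions made for illustration; the comments map each step to the corresponding leaper in the diff.

```rust
use std::collections::BTreeSet;

// Hypothetical ids; the real engine is generic over `Atom`.
type MovePath = u32;
type Point = u32;

/// One application of:
///   path_maybe_initialized_on_exit(M, Q) :-
///       path_maybe_initialized_on_exit(M, P), cfg_edge(P, Q), !moved_out_at(M, Q).
fn propagate_once(
    known: &BTreeSet<(MovePath, Point)>,
    cfg_edge: &[(Point, Point)],
    moved_out_at: &BTreeSet<(MovePath, Point)>,
) -> BTreeSet<(MovePath, Point)> {
    let mut derived = BTreeSet::new();
    for &(m, p) in known {
        // Corresponds to `cfg_edge.extend_with(..)`: propose every successor q of p.
        for &(_, q) in cfg_edge.iter().filter(|&&(from, _)| from == p) {
            // Corresponds to `moved_out_at.extend_anti(..)`: reject q if m is moved out there.
            if !moved_out_at.contains(&(m, q)) {
                derived.insert((m, q));
            }
        }
    }
    derived
}

fn main() {
    let known: BTreeSet<(MovePath, Point)> = [(1, 0)].iter().copied().collect();
    let moved_out_at: BTreeSet<(MovePath, Point)> = [(1, 1)].iter().copied().collect();
    // Point 0 branches to points 1 and 2; path 1 is moved out at point 1.
    let derived = propagate_once(&known, &[(0, 1), (0, 2)], &moved_out_at);
    assert_eq!(derived.into_iter().collect::<Vec<_>>(), vec![(1, 2)]);
}
```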
60 changes: 47 additions & 13 deletions polonius-engine/src/output/liveness.rs
Expand Up @@ -18,27 +18,40 @@ use facts::Atom;

use datafrog::{Iteration, Relation, RelationLeaper};

pub(super) fn compute_live_regions<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom>(
pub(super) fn compute_live_regions<Region, Loan, Point, Variable, MovePath>(
var_used: Vec<(Variable, Point)>,
var_drop_used: Vec<(Variable, Point)>,
var_defined: Vec<(Variable, Point)>,
var_uses_region: Vec<(Variable, Region)>,
var_drops_region: Vec<(Variable, Region)>,
cfg_edge: &[(Point, Point)],
var_initialized_on_exit: Vec<(Variable, Point)>,
output: &mut Output<Region, Loan, Point, Variable>,
) -> Vec<(Region, Point)> {
var_maybe_initialized_on_exit: Vec<(Variable, Point)>,
output: &mut Output<Region, Loan, Point, Variable, MovePath>,
) -> Vec<(Region, Point)>
where
Region: Atom,
Loan: Atom,
Point: Atom,
Variable: Atom,
MovePath: Atom,
{
debug!("compute_liveness()");
let computation_start = Instant::now();
let mut iteration = Iteration::new();

// Relations
let var_defined_rel: Relation<(Variable, Point)> = var_defined.into();
let cfg_edge_rel: Relation<(Point, Point)> = cfg_edge.iter().map(|(p, q)| (*p, *q)).collect();
let cfg_edge_reverse_rel: Relation<(Point, Point)> =
cfg_edge.iter().map(|(p, q)| (*q, *p)).collect();
let var_uses_region_rel: Relation<(Variable, Region)> = var_uses_region.into();
let var_drops_region_rel: Relation<(Variable, Region)> = var_drops_region.into();
let var_initialized_on_exit_rel: Relation<(Variable, Point)> = var_initialized_on_exit.into();
let var_maybe_initialized_on_exit_rel: Relation<(Variable, Point)> =
var_maybe_initialized_on_exit.into();
let var_drop_used_rel: Relation<((Variable, Point), ())> = var_drop_used
.into_iter()
.map(|(v, p)| ((v, p), ()))
.collect();

// Variables

Expand All @@ -53,8 +66,23 @@ pub(super) fn compute_live_regions<Region: Atom, Loan: Atom, Point: Atom, Variab
// This propagates the relation `var_live(V, P) :- var_used(V, P)`:
var_live_var.insert(var_used.into());

// This propagates the relation `var_drop_live(V, P) :- var_drop_used(V, P)`:
var_drop_live_var.insert(var_drop_used.into());
// var_maybe_initialized_on_entry(V, Q) :-
// var_maybe_initialized_on_exit(V, P),
// cfg_edge(P, Q).
let var_maybe_initialized_on_entry = Relation::from_leapjoin(
&var_maybe_initialized_on_exit_rel,
cfg_edge_rel.extend_with(|&(_v, p)| p),
|&(v, _p), &q| ((v, q), ()),
);

// var_drop_live(V, P) :-
// var_drop_used(V, P),
// var_maybe_initialized_on_entry(V, P).
var_drop_live_var.insert(Relation::from_join(
&var_drop_used_rel,
&var_maybe_initialized_on_entry,
|&(v, p), &(), &()| (v, p),
));
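
The two rules just above can be read as a join followed by an intersection. A minimal stand-alone sketch, assuming `u32` ids and hypothetical helper names (not the engine's datafrog code):

```rust
use std::collections::BTreeSet;

// Hypothetical ids; the real engine is generic over `Atom`.
type Variable = u32;
type Point = u32;

/// var_maybe_initialized_on_entry(V, Q) :-
///     var_maybe_initialized_on_exit(V, P), cfg_edge(P, Q).
fn maybe_initialized_on_entry(
    on_exit: &[(Variable, Point)],
    cfg_edge: &[(Point, Point)],
) -> BTreeSet<(Variable, Point)> {
    let mut on_entry = BTreeSet::new();
    for &(v, p) in on_exit {
        for &(from, q) in cfg_edge {
            if from == p {
                on_entry.insert((v, q));
            }
        }
    }
    on_entry
}

/// var_drop_live(V, P) :-
///     var_drop_used(V, P), var_maybe_initialized_on_entry(V, P).
fn initial_drop_live(
    var_drop_used: &[(Variable, Point)],
    on_entry: &BTreeSet<(Variable, Point)>,
) -> BTreeSet<(Variable, Point)> {
    var_drop_used
        .iter()
        .copied()
        .filter(|fact| on_entry.contains(fact))
        .collect()
}

fn main() {
    let cfg_edge = [(0, 1), (1, 2)];
    // Variable 0 is maybe initialized on exit from point 0 only; variable 1 never is.
    let on_entry = maybe_initialized_on_entry(&[(0, 0)], &cfg_edge);
    let drop_live = initial_drop_live(&[(0, 1), (1, 1), (0, 2)], &on_entry);
    // Only the drop of variable 0 at point 1 sees a maybe-initialized variable.
    assert_eq!(drop_live.into_iter().collect::<Vec<_>>(), vec![(0, 1)]);
}
```

A drop-use of a variable that cannot be initialized on entry is filtered out, so it does not seed `var_drop_live`.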

while iteration.changed() {
// region_live_at(R, P) :-
Expand Down Expand Up @@ -89,14 +117,14 @@ pub(super) fn compute_live_regions<Region: Atom, Loan: Atom, Point: Atom, Variab
// var_drop_live(V, Q),
// cfg_edge(P, Q),
// !var_defined(V, P),
// var_initialized_on_exit(V, P).
// var_maybe_initialized_on_exit(V, P).
// extend p with v:s from q such that v is not in q, there is an edge from p to q
var_drop_live_var.from_leapjoin(
&var_drop_live_var,
(
var_defined_rel.extend_anti(|&(v, _q)| v),
cfg_edge_reverse_rel.extend_with(|&(_v, q)| q),
var_initialized_on_exit_rel.extend_with(|&(v, _q)| v),
var_maybe_initialized_on_exit_rel.extend_with(|&(v, _q)| v),
),
|&(v, _q), &p| (v, p),
);
Expand Down Expand Up @@ -157,17 +185,23 @@ pub(super) fn make_universal_region_live<Region: Atom, Point: Atom>(
}
}

pub(super) fn init_region_live_at<Region: Atom, Loan: Atom, Point: Atom, Variable: Atom>(
pub(super) fn init_region_live_at<
Region: Atom,
Loan: Atom,
Point: Atom,
Variable: Atom,
MovePath: Atom,
>(
var_used: Vec<(Variable, Point)>,
var_drop_used: Vec<(Variable, Point)>,
var_defined: Vec<(Variable, Point)>,
var_uses_region: Vec<(Variable, Region)>,
var_drops_region: Vec<(Variable, Region)>,
var_initialized_on_exit: Vec<(Variable, Point)>,
var_maybe_initialized_on_exit: Vec<(Variable, Point)>,
cfg_edge: &[(Point, Point)],
region_live_at: Vec<(Region, Point)>,
universal_region: Vec<Region>,
output: &mut Output<Region, Loan, Point, Variable>,
output: &mut Output<Region, Loan, Point, Variable, MovePath>,
) -> Vec<(Region, Point)> {
debug!("init_region_live_at()");
let mut region_live_at = if region_live_at.is_empty() {
Expand All @@ -179,7 +213,7 @@ pub(super) fn init_region_live_at<Region: Atom, Loan: Atom, Point: Atom, Variabl
var_uses_region,
var_drops_region,
cfg_edge,
var_initialized_on_exit,
var_maybe_initialized_on_exit,
output,
)
} else {
Expand Down