Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce even better DOT graphs #3

Merged
merged 1 commit into from Nov 12, 2016
Merged

Produce even better DOT graphs #3

merged 1 commit into from Nov 12, 2016

Conversation

benesch
Copy link
Contributor

@benesch benesch commented Nov 10, 2016

Output DOT graphs without using petgraph's stock formatter so that we
can output each node's configuration using a dense, human-readable,
relational-algebra-like format.

Here's what the main/web graph looks like in this new world:

dot-example

Copy link
Member

@ms705 ms705 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me in principle; I left some minor comments, but I think we should merge this quickly as it will be useful to everyone.

@@ -326,6 +327,35 @@ impl<Q, U, D> FlowGraph<Q, U, D>
(&self.graph, self.source)
}

/// Write a DOT representation of the graph to `f`.
pub fn write_dot<W: io::Write>(&self, f: &mut W) -> io::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor comment, but would it make sense to either

  1. put this implementation into impl Display for FlowGraph<...> (or Debug rather than Display), such that one can simply print a flow graph to get the dot graph? (AIUI, your PR leaves the old behavior of printing a simplistic graph in place?); and/or
  2. change the second argument here to be fmt::Formatter, rather than a generic type that implements io::Write?

I think (1) we should do for sure, but (2) is a minor stylistic point and your code may even be preferable as it is more general and e.g., supports direct writing to a file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do what you think is best! I chose this method because I figured—at least for Debug—a dot graph might not be the most useful output format for immediate debug insight.

Regarding (2), how does one create an instance of fmt::Formatter without calling format!? It seems like you can only get one of those by implementing Display or Debug.

@@ -145,6 +155,12 @@ mod tests {
}

#[test]
fn it_describes() {
let c = setup(true, true);
assert_eq!(c.inner.description(), "Count γ[0, 2]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: might be good to test the SUM case too here, since that produces a somewhat different output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

.flat_map(|(left, rs)| {
rs.against
.iter()
.filter(move |&(right, _)| left < right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this move?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The borrow checker complains otherwise

error: `left` does not live long enough
   --> src/ops/join.rs:449:43
    |
449 |                     .filter(|&(right, _)| left < right)
    |                             ------------- ^^^^ does not live long enough
    |                             |
    |                             capture occurs here
...
457 |             })
    |             - borrowed value only lives until here
458 |             .collect::<Vec<_>>()
    |                                - borrowed value needs to live until here

and I wasn't sure how else to fix it. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see -- yes, that seems fine. (I guess you could technically borrow left in the inside closure, i.e., use &left, but that wouldn't be much prettier.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a somewhat annoying side-effect of using flat_map. move is perfectly acceptable to use here.

@benesch benesch force-pushed the dot-graphs branch 2 times, most recently from 8a8ace7 to 5a3c950 Compare November 11, 2016 05:57
@benesch
Copy link
Contributor Author

benesch commented Nov 11, 2016

Okay, I made this Display and restored the original implementation of Debug. This has the nice side effect of making debug printouts of Node objects more what you'd expect: a near-complete listing of the struct's fields. But let me know if you'd rather I nuke the old Debug completely.

@jonhoo
Copy link
Contributor

jonhoo commented Nov 11, 2016

I don't think I currently have a strong preference for what Debug looks like, as long as it's somewhat readable and has most of the interesting information. Arguably this should not use Display, but rather some new trait like Draw, but since we're not using Display for anything, it's probably fine.

As for the visualization, I'm not entirely sure I understand the top-right notation in each box, but that's probably just due to my own ignorance. Including a legend somewhere would be useful though.

use ops::NodeOp;
let indent = " ";
try!(writeln!(f, "digraph {{"));
try!(writeln!(f, "{}node [shape=record, fontsize=10]", indent));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Rust 1.13, you can now use writeln!(..)? instead if you think that reads better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent seems unnecessary here, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent helps produce more human-readable dot graphs, which is useful for debugging them ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, though the indent seems pretty basic (you could get the same output in vim by typing gg=G), and it makes the code harder to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reformatted the function to use an indent lambda that should make the code easier to read?

@@ -65,4 +65,8 @@ impl NodeOp for Base {
fn is_base(&self) -> bool {
true
}

fn description(&self) -> String {
"𝓕".into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for the choice of 𝓕 to signify base nodes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if so, maybe include a comment somewhere pointing to a resource laying out the "known" operators

Copy link
Contributor Author

@benesch benesch Nov 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F for fact. We could also do 𝓑 for base, or "Base"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "B" for base. I don't think it should be stylized, because that makes me think that is some "well known" operator from relational algebra or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

fn description(&self) -> String {
let op_string = match self.op {
Aggregation::COUNT => "Count".into(),
Aggregation::SUM => format!("Sum({})", self.over),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "Sum" not be 𝛴 ? I don't know if there's an equivalent symbol for "Count"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and I figured min and max are usually spelled out too, so I left it as "Sum" for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I suppose we could use for min and for max, and |*| for count (magnitude bars). I don't have strong feelings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like |*| for count. For min/max, I feel like T (top) and ⊥ (bottom) could work. We could also use ⌈x⌉ (ceiling) and ⌊x⌋ (floor) where x is the column being computed over.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Done!

};
let group_cols = self.group.iter().map(|g| g.to_string())
.collect::<Vec<_>>().join(", ");
format!("{} γ[{}]", op_string, group_cols)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the γ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relational algebra for "group by", so that's how it reads in my head.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, so the γ here actually does stem from some definition. A reference would be useful. It doesn't need to be in the diagram, but at least somewhere in the code. Maybe in the docstring for the description method in the trait definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

let group_cols = group_cols.iter().map(|g| g.to_string())
.collect::<Vec<_>>().join(", ");

format!("Concat({} / \"{}\") γ[{}]", fields, self.separator, group_cols)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think || is often used to signify concatenation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

.flat_map(|(left, rs)| {
rs.against
.iter()
.filter(move |&(right, _)| left < right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a somewhat annoying side-effect of using flat_map. move is perfectly acceptable to use here.

let op = if rs.outer { "⋉" } else { "⋈" };
rs.fields.iter().map(move |&(li, ri)| {
format!("{}:{} {} {}:{}",
left.index(), li, op, right.index(), ri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax reads weirdly to me. Can we come up with something that's easier to visually scan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion? I picked node:col because it's consistent with unions: node:[col, col, col].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I meant more that I don't understand it. Is this showing which columns are emitted, or which columns are joined on? I feel like both should ideally be shown..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

fn description(&self) -> String {
let key_cols = self.key.iter().map(|k| k.to_string())
.collect::<Vec<_>>().join(", ");
format!("Latest γ[{}]", key_cols)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to have us use ⌈ ⌉ here, though its meaning would be somewhat overloaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with the other grouping operators, though I guess latest isn't considered a grouping operator as far as the implementation is concerned. You tell me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm fine with keeping the γ for grouping. Was more wondering if there's a more concise way to express "latest" rather than spelling it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the hourglass? ⧖

@@ -315,7 +333,7 @@ impl Node {

impl Debug for Node {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self.name)
write!(f, "{:?}({:#?})", self.name, *self.inner)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯_(ツ)_/¯ that's just what it used to be: f2c8b42#diff-52cb7ae98bea74840c4725039169373cL318

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird.. Does it compile without it?

format!("{}:[{}]", src.index(), cols)
})
.collect::<Vec<_>>()
.join(" ∪ ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be ⋃, not ∪ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, sure

@benesch
Copy link
Contributor Author

benesch commented Nov 11, 2016

Yeah, @jonhoo, the original PR added a write_dot method to the flow::Graph type instead of implementing Display. Happy to revert to that (possibly in its own trait) if you prefer. It'd be nice if Rust allowed you to install your own custom format codes

format!("{:dot}", graph)

impl FormatDot for Graph {
   fn fmt() { /* ... */ }
}

but alas.

@ms705
Copy link
Member

ms705 commented Nov 11, 2016

I think using Display and Debug in the way we do in the latest version is fine.

Adding a legend in Dot is a bit of a pain although it can be done (http://stackoverflow.com/a/4752766). I prefer "B" for "Base" over "F"/"Fact", but merely due to consistency with the implementation names.

@jonhoo
Copy link
Contributor

jonhoo commented Nov 11, 2016

Yeah, the ability to write custom formatting traits would be really neat. Maybe we should start an RFC for it :p

Output DOT graphs without using petgraph's stock formatter so that we
can output each node's configuration using a dense, human-readable,
relational-algebra-like format.
@benesch
Copy link
Contributor Author

benesch commented Nov 12, 2016

Here's the updated screenshot:

example

@ms705
Copy link
Member

ms705 commented Nov 12, 2016

@benesch Thanks, looks good!

Let's merge this -- it's a big improvement, and we have bigger research fish to fry 😉

@benesch
Copy link
Contributor Author

benesch commented Nov 12, 2016

Yessir ;)

@benesch benesch merged commit 205616c into master Nov 12, 2016
@benesch benesch deleted the dot-graphs branch November 12, 2016 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants