Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the hydroflow graph API to simplify the surface API #58

Merged
merged 27 commits into from
Feb 4, 2022

Conversation

MingweiSamuel
Copy link
Member

@MingweiSamuel MingweiSamuel commented Jan 28, 2022

We inherited the idea that subgraphs are created first, then are linked up to each other. This works okay, but requires waiting for an input/output port pair to be connected before the handoff state can be created, resulting in awkward Rc<RefCell<>> passing around to link things together.

This change swaps the order subgraphs/handoffs are created. Handoffs are created first and their state is immediately inserted into the hydroflow struct. Then it is up to the user to provide the already-created input/output ports when adding a new subgraph.

  • This removes a lot of Rc<RefCell<>> indirection.
  • This improves type inference since types are provided as arguments rather than having to be inferred as the return type.
  • This matches the structure of the surface API (handoffs first), so we can simplify/remove the connecting layer of the surface API implementation.
  • This makes code reuse (pulling graph code into separate functions) easier, potentially, since the implementation is simpler.
  • Allows the handoffs to modify their state while still in the graph assembly phase - useful for teeing handoffs
  • Better error message?

Example

Before:

type MyInputList = tt!(VecHandoff<...>, VecHandoff<...>);
type MyOutputList = tt!(VecHandoff<...>);
let (tl!(input_a, input_b), tl!(output_c)) = hydroflow.add_subgraph::<MyInputList, MyOutputList, _>(
  |ctx, tl!(recv_a, recv_b), tl!(send_c)| {
    ...
  }
);

hydroflow.connect_edge(output_c, ...);
hydroflow.connect_edge(..., input_a);
hydroflow.connect_edge(..., input_b);

After:

let (input_a, ...) = hydroflow.make_handoff::<VecHandoff<...>>();
let (input_b, ...) = hydroflow.make_handoff::<VecHandoff<...>>();
let (..., output_c) = hydroflow.make_handoff::<VecHandoff<...>>();

hydroflow.add_subgraph(
  tl!(input_a, input_b), tl!(output_c),
  |ctx, tl!(recv_a, recv_b), tl!(send_c)| {
    ...
  }
);

@MingweiSamuel MingweiSamuel force-pushed the api-swap branch 17 times, most recently from edf5c3d to f6b5174 Compare February 3, 2022 19:23
@MingweiSamuel
Copy link
Member Author

Changes

Surface API Connector layer removed

Surface API handoff simplified

No more Rc<Cell<>>s

GraphExt functions are now created by a macro and are renamed to be more consistent:

add_subgraph_sink
add_subgraph_2sink
add_subgraph_source
add_subgraph_in_out
add_subgraph_in_2out
add_subgraph_2in_out
add_subgraph_2in_2out

Demux API is commented out for now, should be replaced with a special kind of handoff

HandoffList is not used anymore, instead there are PortList and PortListSplit

This is better for type inference in the new core API

Input/OutputPorts and Send/RecvCtxs are simplified

  • No more Rc<RefCell<>> junk.
  • Now use a common BasePort or BaseCtx to share implementation, keep code DRY.

add_subgraph_n_in_m_out renamed to add_subgraph_n_m

A: Extend<B>, <A as Extend<B>>::Extended: Spit<A, Suffix = B> trait bounds copied throughout

Adds a bunch of lines and is not DRY, but is necessary for better diagnostics when writing abstraction code

Improved teeing in query API

in

impl<T: Clone> Operator<T> {
    pub fn tee(self, n: usize) -> Vec<Operator<T>>

@MingweiSamuel
Copy link
Member Author

MingweiSamuel commented Feb 3, 2022

TODOs in the future

  • Network APIs and add_input need to accept send/recv ports as arguments instead of returning them. This is needed if you are connecting a pair of network ports (or input) directly together (forwarding)
  • Rename InputPort/OutputPort to SendPort/RecvPort (not necessarily respectively?)
  • Rename/update BasePort/BaseCtx/BasePortList, don't use <const S: bool>, instead use a more nicely named tag struct.

Comment on lines 163 to 184
let mut sends = Vec::with_capacity(n);
let mut recvs = Vec::with_capacity(n);
for _ in 0..n {
let (send_port, recv_port) = df.make_handoff::<VecHandoff<T>>();
sends.push(send_port);
recvs.push(Operator {
df: self.df.clone(),
output_port,
})
.collect()
recv_port,
});
}

df.add_subgraph_n_m(vec![self.recv_port], sends, move |_ctx, recvs, sends| {
let input = recvs.iter().next().unwrap().take_inner();
if let Some((&last_output, outputs)) = sends.split_last() {
for output in outputs {
output.give(Iter(input.iter().cloned()));
}
last_output.give(Iter(input.into_iter()));
}
});

recvs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improves the query API tee, only real change in the runtime code in this diff

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a big improvement to me as far as simplicity of the code goes 👍

}

pub fn add_edge<H>(&mut self, output_port: OutputPort<H>, input_port: InputPort<H>)
pub fn make_handoff<H>(&mut self) -> (InputPort<H>, OutputPort<H>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about renaming this to make_edge? We had some conversations get mildly derailed regarding this, it's not incorrect, and might appeal to the user's intuition a little more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. I'm always down to rename things better, up until we stabilize (which is probably not soon)

benches/benches/reachability.rs Show resolved Hide resolved
@@ -130,67 +129,63 @@ fn benchmark_hydroflow_scheduled(c: &mut Criterion) {
// A dataflow that represents graph reachability.
let mut df = Hydroflow::new();

let reachable_out = df.add_source(move |_ctx, send: &SendCtx<VecHandoff<usize>>| {
type Hoff = VecHandoff<usize>;
let (reachable_out, merge_lhs) = df.make_handoff::<Hoff>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should have an alias for make_handoff<VecHandoff<...>> given that (at least for now) it's almost always what we use?

impl<T: BasePortList<false>> RecvPortList for T {}

#[sealed]
pub trait BasePortList<const S: bool>: TypeList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way we can have something other than a bool here? I don't really understand what it's doing, but could it be possible to use a trait instead? Is there a reason having a trait with a bool method marked #[inline] is less expressive than a bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we have InputPort and OutputPort which in terms of implementation are pretty much identically, but I want to treat them as different types since they're not interchangeably semantically. So here we just use one struct to represent both with a bool to differentiate (true = send, false = recv).

But I want to change this to not be a bool, and be a different tag struct instead, because the bool isn't descriptive

pub mod graph;
pub mod graph_demux;
pub mod port;
// pub mod graph_demux;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray? either delete or add a comment explaining when this could be uncommented imo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill add comment

@MingweiSamuel MingweiSamuel merged commit 7bd4f89 into hydro-project:main Feb 4, 2022
@MingweiSamuel MingweiSamuel deleted the api-swap branch February 4, 2022 00:31
@MingweiSamuel
Copy link
Member Author

Basically let (a, b) = df.add_in_out(...) becomes df.add_subgraph_in_out(a, b, ...) and df.add_edge(x, y) at the end becomes let (x, y) = df.make_handoff::<...>() in the beginning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants