
cleanup: remove trailing whitespace

emanuelbutacu committed Mar 1, 2018
1 parent 37450db commit 92f36f63b10cf614779984fbb02239c4a9d1c41d
Showing with 261 additions and 261 deletions.
  1. +1 −1 CONTRIBUTING.md
  2. +1 −1 communication/src/allocator/thread.rs
  3. +1 −1 examples/capture_recv.rs
  4. +2 −2 examples/reachability.rs
  5. +1 −1 mdbook/src/SUMMARY.md
  6. +10 −10 mdbook/src/chapter_0/chapter_0_1.md
  7. +2 −2 mdbook/src/chapter_0/chapter_0_2.md
  8. +1 −1 mdbook/src/chapter_0/chapter_0_3.md
  9. +1 −1 mdbook/src/chapter_1/chapter_1.md
  10. +7 −7 mdbook/src/chapter_1/chapter_1_1.md
  11. +2 −2 mdbook/src/chapter_1/chapter_1_2.md
  12. +1 −1 mdbook/src/chapter_2/chapter_2.md
  13. +1 −1 mdbook/src/chapter_2/chapter_2_1.md
  14. +1 −1 mdbook/src/chapter_2/chapter_2_2.md
  15. +5 −5 mdbook/src/chapter_2/chapter_2_3.md
  16. +4 −4 mdbook/src/chapter_2/chapter_2_4.md
  17. +12 −12 mdbook/src/chapter_2/chapter_2_5.md
  18. +2 −2 mdbook/src/chapter_4/chapter_4_1.md
  19. +2 −2 mdbook/src/chapter_4/chapter_4_2.md
  20. +2 −2 mdbook/src/chapter_4/chapter_4_3.md
  21. +3 −3 mdbook/src/chapter_4/chapter_4_4.md
  22. +4 −4 mdbook/src/chapter_5/chapter_5_1.md
  23. +13 −13 mdbook/src/chapter_5/chapter_5_2.md
  24. +1 −1 mdbook/src/introduction.md
  25. +4 −4 sort/src/bin/profile.rs
  26. +10 −10 sort/src/lib.rs
  27. +1 −1 sort/src/lsb.rs
  28. +1 −1 sort/src/lsb_swc.rs
  29. +11 −11 sort/src/msb.rs
  30. +11 −11 sort/src/msb_swc.rs
  31. +4 −4 sort/src/swc_buffer.rs
  32. +1 −1 src/dataflow/channels/message.rs
  33. +2 −2 src/dataflow/channels/pact.rs
  34. +4 −4 src/dataflow/channels/pushers/buffer.rs
  35. +1 −1 src/dataflow/channels/pushers/counter.rs
  36. +2 −2 src/dataflow/channels/pushers/exchange.rs
  37. +4 −4 src/dataflow/operators/aggregation/mod.rs
  38. +2 −2 src/dataflow/operators/capture/capture.rs
  39. +3 −3 src/dataflow/operators/capture/event.rs
  40. +1 −1 src/dataflow/operators/capture/extract.rs
  41. +4 −4 src/dataflow/operators/capture/mod.rs
  42. +9 −9 src/dataflow/operators/capture/replay.rs
  43. +6 −6 src/dataflow/operators/delay.rs
  44. +1 −1 src/dataflow/operators/enterleave.rs
  45. +8 −8 src/dataflow/operators/generic/builder_raw.rs
  46. +6 −6 src/dataflow/operators/generic/builder_rc.rs
  47. +3 −3 src/dataflow/operators/input.rs
  48. +1 −1 src/dataflow/operators/map.rs
  49. +2 −2 src/dataflow/operators/probe.rs
  50. +1 −1 src/dataflow/operators/to_stream.rs
  51. +1 −1 src/dataflow/operators/unordered_input.rs
  52. +1 −1 src/dataflow/scopes/root.rs
  53. +2 −2 src/lib.rs
  54. +2 −2 src/order.rs
  55. +17 −17 src/progress/change_batch.rs
  56. +14 −14 src/progress/frontier.rs
  57. +3 −3 src/progress/nested/product.rs
  58. +18 −18 src/progress/nested/reachability.rs
  59. +8 −8 src/progress/nested/reachability_neu.rs
  60. +4 −4 src/progress/nested/summary.rs
  61. +8 −8 src/progress/timestamp.rs
@@ -1,4 +1,4 @@
Thank you for your interest in contributing!

Here is some legal stuff that will make you regret clicking on this link.

@@ -21,7 +21,7 @@ impl Thread {
pub fn new<T: 'static>() -> (Pusher<T>, Puller<T>) {
let shared = Rc::new(RefCell::new((VecDeque::<T>::new(), VecDeque::<T>::new())));
(Pusher { target: shared.clone() }, Puller { source: shared, current: None })
}
}


@@ -10,7 +10,7 @@ fn main() {
let source_peers = std::env::args().nth(1).unwrap().parse::<usize>().unwrap();

// create replayers from disjoint partition of source worker identifiers.
let replayers =
(0 .. source_peers)
.filter(|i| i % worker.peers() == worker.index())
.map(|i| TcpListener::bind(format!("127.0.0.1:{}", 8000 + i)).unwrap())
@@ -20,7 +20,7 @@ fn test_alt(nodes: usize, rounds: usize) {

// allocate a new empty topology builder.
let mut builder = Builder::<usize>::new();

// Each node with one input connected to one output.
for index in 1 .. nodes {
builder.add_node(index - 1, 1, 1, vec![vec![Antichain::from_elem(0)]]);
@@ -84,7 +84,7 @@ fn test_neu(nodes: usize, rounds: usize) {

// allocate a new empty topology builder.
let mut builder = Builder::<usize>::new();

// Each node with one input connected to one output.
for index in 1 .. nodes {
builder.add_node(index - 1, 1, 1, vec![vec![Antichain::from_elem(0)]]);
@@ -9,7 +9,7 @@

- [Core Concepts](./chapter_1/chapter_1.md)
- [Dataflow](./chapter_1/chapter_1_1.md)
- [Timestamps](./chapter_1/chapter_1_2.md)
- [Progress](./chapter_1/chapter_1_3.md)

- [Building Timely Dataflows](./chapter_2/chapter_2.md)
@@ -39,7 +39,7 @@ fn main() {
}
```

We can run this program in a variety of configurations: with just a single worker thread, with one process and multiple worker threads, and with multiple processes each with multiple worker threads.

To try this out yourself, first clone the timely dataflow repository using `git`

@@ -53,8 +53,8 @@ To try this out yourself, first clone the timely dataflow repository using `git`

Now `cd` into the directory and build timely dataflow by typing

Echidnatron% cd timely-dataflow
Echidnatron% cargo build
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling timely_sort v0.1.6
Compiling byteorder v0.4.2
@@ -75,7 +75,7 @@ Now we build the `hello` example

And finally we run the `hello` example

Echidnatron% cargo run --example hello
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/examples/hello`
worker 0: hello 0
@@ -88,7 +88,7 @@ And finally we run the `hello` example
worker 0: hello 7
worker 0: hello 8
worker 0: hello 9
Echidnatron%

Rust is relatively clever, and we could have skipped the `cargo build` and `cargo build --example hello` commands; just invoking `cargo run --example hello` will build (or rebuild) anything necessary.

@@ -107,15 +107,15 @@ Of course, we can run this with multiple workers using the `-w` or `--workers` f
worker 1: hello 7
worker 0: hello 8
worker 1: hello 9
Echidnatron%

Although you can't easily see this happening, timely dataflow has spun up *two* worker threads and together they have exchanged some data and printed the results as before. However, notice that the worker index is now varied; this is our only clue that different workers exist, and processed different pieces of data. Worker zero introduces all of the data (notice the guard in the code; without this *each* worker would introduce `0 .. 10`), and then it is shuffled between the workers. The only *guarantee* is that records that evaluate to the same integer in the exchange closure go to the same worker. In practice, we (currently) route records based on the remainder of the number when divided by the number of workers.
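The routing rule described above can be sketched in plain Rust, independent of timely's actual internals; the function name `route` is illustrative, but the modulus-based assignment is exactly what the text describes:

```rust
// Illustrative sketch of the exchange routing rule described above:
// a record's key is reduced modulo the number of workers to pick a target.
fn route(key: u64, peers: u64) -> u64 {
    key % peers
}

fn main() {
    let peers = 2;
    // With two workers, even keys go to worker 0 and odd keys to worker 1,
    // matching the alternating worker indices seen in the example output.
    for key in 0..10u64 {
        println!("record {} -> worker {}", key, route(key, peers));
    }
}
```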

Finally, let's run with multiple processes. To do this, you use the `-n` and `-p` arguments, which tell each process how many total processes to expect (the `-n` parameter) and which index this process should identify as (the `-p` parameter). You can also use `-h` to specify a host file with names and ports of each of the processes involved, but if you leave it off timely defaults to using the local host.

In one shell, I'm going to start a computation that expects multiple processes. It will hang out waiting for the other processes to start up.

Echidnatron% cargo run --example hello -- -n2 -p0
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/examples/hello -n2 -p0`

@@ -129,18 +129,18 @@ Now if we head over to another shell, we can type the same thing but with a diff
worker 1: hello 5
worker 1: hello 7
worker 1: hello 9
Echidnatron%

Wow, fast! And, we get to see some output too. Only the output for this worker, though. If we head back to the other shell we see the process got moving and produced the other half of the output.

Echidnatron% cargo run --example hello -- -n2 -p0
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/examples/hello -n2 -p0`
worker 0: hello 0
worker 0: hello 2
worker 0: hello 4
worker 0: hello 6
worker 0: hello 8
Echidnatron%

This may seem only slightly interesting so far, but we will progressively build up more interesting tools and more interesting computations, and see how timely dataflow can efficiently execute them for us.
@@ -1,6 +1,6 @@
# When to use Timely Dataflow

Timely dataflow may be a different programming model than you are used to, but if you can adapt your program to it there are several benefits.

* **Data Parallelism**: The operators in timely dataflow are largely "data-parallel", meaning they can operate on independent parts of the data concurrently. This allows the underlying system to distribute timely dataflow computations across multiple parallel workers. These can be threads on your computer, or even threads across computers in a cluster you have access to. This distribution typically improves the throughput of the system, and lets you scale to larger problems with access to more resources (computation, communication, and memory).

@@ -14,7 +14,7 @@ At the same time, dataflow computation is also another way of thinking about you

## Generality

Is timely dataflow always applicable? The intent of this research project is to remove layers of abstraction fat that prevent you from expressing anything your computer can do efficiently in parallel.

Under the covers, your computer (the one on which you are reading this text) is a dataflow processor. When your computer *reads memory* it doesn't actually wander off to find the memory, it introduces a read request into your memory controller, an independent component that will eventually return with the associated cache line. Your computer then gets back to work on whatever it was doing, hoping the responses from the controller return in a timely fashion.

@@ -1,6 +1,6 @@
# When not to use Timely Dataflow

There are several reasons not to use timely dataflow, though many of them are *friction* about how your problem is probably expressed, rather than fundamental technical limitations. There are fundamental technical limitations too, of course.

I've collected a few examples here, but the list may grow with input and feedback.

@@ -4,7 +4,7 @@ Timely dataflow relies on two fundamental concepts: **timestamps** and **dataflo

## Dataflow

Dataflow programming is fundamentally about describing your program as independent components, each of which operates in response to the availability of input data, as well as describing the connections between these components.

The most important part of dataflow programming is the *independence* of the components. When you write a dataflow program, you provide the computer with flexibility in how it executes your program. Rather than insisting on a specific sequence of instructions the computer should follow, the computer can work on each of the components as it sees fit, perhaps even sharing the work with other computers.

@@ -59,7 +59,7 @@ Importantly, we haven't imposed any constraints on how these operators need to r
worker 0: hello 4
worker 0: hello 6
worker 0: hello 8
Echidnatron%

What a mess. Nothing in our dataflow program requires that workers zero and one alternate printing to the screen, and you can even see that worker one is *done* before worker zero even gets to printing `hello 4`.

@@ -69,7 +69,7 @@ However, this is only a mess if we are concerned about the order, and in many ca

```rust,ignore
.inspect(|x| {
// we only need to test factors up to sqrt(x)
let limit = (*x as f64).sqrt() as u64;
if *x > 1 && (2 .. limit + 1).all(|i| x % i > 0) {
println!("{} is prime", x);
@@ -85,7 +85,7 @@ However, this is only a mess if we are concerned about the order, and in many ca
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/examples/hello -w1`
cargo run --example hello -- -w1 > output1.txt 59.84s user 0.10s system 99% cpu 1:00.01 total
Echidnatron%

And now again with two workers:

@@ -97,10 +97,10 @@ And now again with two workers:

The time is basically halved, from one minute to thirty seconds, which is a great result for those of us who like factoring small numbers. Furthermore, although the 1,262 lines of results of `output1.txt` and `output2.txt` are not in the same order, it takes a fraction of a second to make them so, and verify that they are identical:

Echidnatron% sort output1.txt > sorted1.txt
Echidnatron% sort output2.txt > sorted2.txt
Echidnatron% diff sorted1.txt sorted2.txt
Echidnatron%

---

@@ -43,13 +43,13 @@ The output we get with two workers is now:
worker 0: hello 8 @ (Root, 8)
worker 1: hello 7 @ (Root, 7)
worker 1: hello 9 @ (Root, 9)
Echidnatron%

The timestamps are the `(Root, i)` things for various values of `i`. These happen to correspond to the data themselves, but had we provided random input data rather than `i` itself we would still be able to make sense of the output and put it back "in order".

## Timestamps for dataflow operators

Timestamps are not only helpful for dataflow users, but for the operators themselves. With time we will start to write more interesting dataflow operators, and it may be important for them to understand which records should be thought to come before others.

Imagine, for example, a dataflow operator whose job is to report the "sum so far", where "so far" should be with respect to the timestamp (as opposed to whatever arbitrary order the operator receives the records). Such an operator can't simply take its input records, add them to a total, and produce the result. The input records may no longer be ordered by timestamp, and the produced summations may not reflect any partial sum of the input. Instead, the operator needs to look at the timestamps on the records, and incorporate the numbers in order of their timestamps.
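A minimal sketch of this idea (plain Rust over an in-memory batch, not timely's actual operator API) buffers the records, re-establishes timestamp order, and only then accumulates:

```rust
// Sketch: compute "sum so far" in timestamp order, not arrival order.
// Input is (timestamp, value) pairs that may arrive out of order.
fn sums_so_far(mut records: Vec<(u64, i64)>) -> Vec<(u64, i64)> {
    // Re-establish timestamp order before accumulating.
    records.sort_by_key(|&(time, _)| time);
    let mut total = 0;
    records
        .into_iter()
        .map(|(time, value)| {
            total += value;
            (time, total)
        })
        .collect()
}

fn main() {
    // Records arrive out of timestamp order; output is ordered partial sums.
    let out = sums_so_far(vec![(2, 30), (0, 10), (1, 20)]);
    println!("{:?}", out);
}
```

A real streaming operator cannot sort its whole input at once; it would use progress information to know when a timestamp's records are complete, but the ordering requirement is the same.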

@@ -4,7 +4,7 @@ Let's talk about how to create timely dataflows.

This section will be a bit of a tour through the dataflow construction process, ignoring for the moment details about the interesting ways in which you can get data in to and out of your dataflow; those will show up in the "Running Timely Dataflows" section. For now we are going to work with examples with fixed input data and no interactivity to speak of, focusing on what we can cause to happen to that data.

Here is a relatively simple example, taken from `examples/simple.rs`, that turns the numbers zero through nine into a stream, and then feeds them through an `inspect` operator printing them to the screen.

```rust,no_run
extern crate timely;
@@ -34,4 +34,4 @@ There will be more to do to get data in to `input`, and we aren't going to worry

## Other sources

There are other sources of input that are a bit more advanced. Once we learn how to create custom operators, the `source` method will allow us to create a custom operator with zero input streams and one output stream, which looks like a source of data (hence the name). There are also the `Capture` and `Replay` traits that allow us to exfiltrate the contents of a stream from one dataflow (using `capture_into`) and re-load it in another dataflow (using `replay_from`).
@@ -1,6 +1,6 @@
# Observing Outputs

Having constructed a minimal streaming computation, we might like to take a peek at the output. There are a few ways to do this, but the simplest by far is the `inspect` operator.

The `inspect` operator is called with a closure, and it ensures that the closure is run on each record that passes through the operator. This closure can do just about anything, from printing to the screen or writing to a file.

@@ -49,9 +49,9 @@ fn main() {

### Map variants

There are a few variants of `map` with different functionality.

For example, the `map_in_place` method takes a closure which receives a mutable reference and produces no output; instead, this method allows you to change the data *in-place*, which can be a valuable way to avoid duplication of resources.

```rust,no_run
extern crate timely;
@@ -92,7 +92,7 @@ fn main() {

## Filtering

Another fundamental operation is *filtering*, in which a predicate dictates a subset of the stream to retain.

```rust,no_run
extern crate timely;
@@ -115,7 +115,7 @@ Unlike `map`, the predicate passed to the `filter` operator does not receive *ow

## Logical Partitioning

There are two operators for splitting and combining streams, `partition` and `concat` respectively.

The `partition` operator takes two arguments, a number of resulting streams to produce, and a closure which takes each record to a pair of the target partition identifier and the output record. The output of `partition` is a list of streams, where each stream contains those elements mapped to the stream under the closure.
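The bucketing behavior described above can be sketched without timely itself; the standalone `partition` helper below is illustrative (timely's real operator works on streams, not vectors), but it applies the closure to each record and routes it the same way:

```rust
// Sketch of the partitioning logic: `parts` output streams, and a closure
// mapping each record to (target stream index, output record).
fn partition<T, R>(
    parts: usize,
    input: Vec<T>,
    route: impl Fn(T) -> (usize, R),
) -> Vec<Vec<R>> {
    let mut streams: Vec<Vec<R>> = (0..parts).map(|_| Vec::new()).collect();
    for record in input {
        let (index, output) = route(record);
        streams[index].push(output);
    }
    streams
}

fn main() {
    // Split 0 .. 10 into even and odd streams.
    let streams = partition(2, (0..10u64).collect(), |x| ((x % 2) as usize, x));
    println!("{:?}", streams);
}
```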

@@ -181,7 +181,7 @@ Both `concat` and `concatenate` are efficient operations that exchange no data b

To complement the logical partitioning of `partition`, timely also provides the physical partitioning operator `exchange` which routes records to a worker based on a supplied closure. The `exchange` operator does not change the contents of the stream, but rather the distribution of elements to the workers. This operation can be important if you would like to collect records before printing statistics to the screen, or otherwise do some work that requires a specific data distribution.

Operators that require a specific data distribution will ensure that this occurs as part of their definition. As the programmer, you should not need to invoke `exchange`.

There are times where `exchange` can be useful. For example, if a stream is used by two operators requiring the same distribution, simply using the stream twice will cause duplicate data exchange as each operator satisfies its requirements. Instead, it may make sense to invoke `exchange` to move the data once, at which point the two operators will no longer require serialization and communication to shuffle their inputs appropriately.
