# Requirements

In [43]:
import Pkg
try
    Pkg.generate("Pipelines")
catch(err)
end
Pkg.activate("Pipelines")
Pkg.instantiate()
Pkg.add("Transducers")
Pkg.add("BenchmarkTools")
using BenchmarkTools
using Transducers

  Activating project at `~/Projects/Julia_good_bad_ugly/source-code/semantics/Pipelines`
   Resolving package versions...
  No Changes to `~/Projects/Julia_good_bad_ugly/source-code/semantics/Pipelines/Project.toml`
  No Changes to `~/Projects/Julia_good_bad_ugly/source-code/semantics/Pipelines/Manifest.toml`
   Resolving package versions...
  No Changes to `~/Projects/Julia_good_bad_ugly/source-code/semantics/Pipelines/Project.toml`
  No Changes to `~/Projects/Julia_good_bad_ugly/source-code/semantics/Pipelines/Manifest.toml`


# Classic pipelines

The usual way of creating a data processing pipeline is by nesting function calls.

For example, suppose you want the sum of all prime numbers up to 100. You would apply a filter to a range and compute the sum of the result.

In [36]:
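function isprime(n::Integer)
    # 2 is the only prime that is not greater than 2
    n > 2 || return n == 2
    # trial division by every candidate up to the integer square root
    return all(i -> n % i != 0, 2:isqrt(n))
end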

isprime (generic function with 1 method)

In [37]:
sum(filter(isprime, 1:100))

1060

# Pipe operator

This, however, forces us to read the code from the inside out, which is somewhat hard to do. Julia's pipe operator `|>` passes the value on its left to the function on its right, so the code can be read from left to right.

In [38]:
filter(isprime, 1:100) |> sum

1060

However, you cannot easily compose functions such as `map`, `filter` and `reduce` in this way, because they take more than one argument, unless you use, e.g., the `Transducers` package.
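To make the limitation concrete, here is a minimal Base-only sketch. `|>` hands its left-hand side to a one-argument function, so two-argument functions such as `filter` and `map` have to be wrapped first. The helpers `keep_even` and `square_all`, the predicate `iseven` and the range `1:10` are chosen only for illustration.

```julia
# `x |> f` is just `f(x)`, so each stage must be a one-argument function.
keep_even(xs) = filter(iseven, xs)     # hypothetical helper wrapping `filter`
square_all(xs) = map(x -> x^2, xs)     # hypothetical helper wrapping `map`

# Piped form: reads left to right.
piped = 1:10 |> keep_even |> square_all |> sum

# Nested form: reads inside out.
nested = sum(map(x -> x^2, filter(iseven, 1:10)))

println(piped == nested)   # both forms compute the same value
```

Writing such wrappers for every stage quickly becomes tedious, which is the gap a dedicated package can fill.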

# Transducers

Consider, for instance, the task of computing the sum of the squares of all prime numbers up to 100. Using the classic approach, that gets pretty tedious.

In [39]:
sum(
    map(i -> i^2,
        filter(isprime,
            1:100
            )
        )
    )

65796

This is much easier to interpret using `Transducers`.

In [40]:
1:100 |> Filter(isprime) |> Map(i -> i^2) |> sum

65796
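A pipeline of this kind can also be built once, independently of the data, and reused. The sketch below assumes a recent Transducers.jl that exports `opcompose` and `foldxl`; it uses `isodd` and `abs2` from Base instead of `isprime` so that it is self-contained, and the variable names are illustrative.

```julia
using Transducers

# Compose the stages once, in left-to-right order, without binding them to data.
odd_squares = opcompose(Filter(isodd), Map(abs2))

# Apply the composed transducer to a range and reduce with `+`.
via_transducer = foldxl(+, odd_squares, 1:100)

# The nested Base version, for comparison.
via_base = sum(map(abs2, filter(isodd, 1:100)))

println(via_transducer == via_base)
```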

# Performance & laziness

An important question for such pipelines is whether they are lazy, i.e., do they stop as soon as the result is known? A good way to check this is to generate a long vector of positive numbers, replace its first element by a negative number, map each element to a `Bool` and feed the result to `all`. If the pipeline is lazy, `all` can return at the first `false`, so the run with the negative first element should be dramatically faster.

In [63]:
all_postive = collect(1:100_000_000);

In [64]:
@benchmark all(map(i -> i > 0, map(i -> i^3, all_postive)))

BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min … max):  165.243 ms … 299.222 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     184.554 ms               ┊ GC (median):    0.84%
 Time  (mean ± σ):   191.224 ms ±  29.029 ms  ┊ GC (mean ± σ):  1.13% ± 1.18%

In [65]:
one_negative = copy(all_postive);
one_negative[1] = -1;

In [66]:
@benchmark all(map(i -> i > 0, map(i -> i^3, one_negative)))

BenchmarkTools.Trial: 26 samples with 1 evaluation.
 Range (min … max):  166.015 ms … 299.337 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     186.737 ms               ┊ GC (median):    1.22%
 Time  (mean ± σ):   194.199 ms ±  33.917 ms  ┊ GC (mean ± σ):  1.19% ± 1.19%

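Both runs take roughly the same time (a median of about 185 ms), so the composition of these functions is not lazy: each `map` builds a complete array before `all` sees any element, even though `all` itself stops at the first `false`.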
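Timing is one way to see this; counting function calls is another. The following Base-only sketch contrasts the eager nested `map` with a lazy generator, which is one way to get short-circuiting without extra packages. The counter `calls` and the helper `counted_cube` are hypothetical names introduced for illustration, and the vector is kept short on purpose.

```julia
# Count how often the cubing step runs in an eager and in a lazy pipeline.
calls = Ref(0)
counted_cube(i) = (calls[] += 1; i^3)

xs = collect(1:1_000)
xs[1] = -1                      # the very first element fails the test

# Eager: `map` builds the whole array of cubes before `all` looks at it.
calls[] = 0
eager_result = all(map(i -> i > 0, map(counted_cube, xs)))
eager_calls = calls[]

# Lazy: generators produce one element at a time, so `all` can stop early.
calls[] = 0
lazy_result = all(c > 0 for c in (counted_cube(i) for i in xs))
lazy_calls = calls[]

println((eager_calls, lazy_calls))
```

Both pipelines return `false`, but the eager one cubes every element while the lazy one stops after the first.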

In [67]:
@benchmark all_postive |> Map(i -> i^3) |> Map(i -> i > 0) |> all

BenchmarkTools.Trial: 54 samples with 1 evaluation.
 Range (min … max):  56.797 ms … 430.671 ms  ┊ GC (min … max):  0.00% … 29.04%
 Time  (median):     65.566 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   95.734 ms ±  77.512 ms  ┊ GC (mean ± σ):  14.67% ± 17.46%

In [68]:
@benchmark one_negative |> Map(i -> i^3) |> Map(i -> i > 0) |> all

BenchmarkTools.Trial: 10000 samples with 800 evaluations.
 Range (min … max):  155.115 ns … 159.172 μs  ┊ GC (min … max):  0.00% … 99.82%
 Time  (median):     208.239 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   301.909 ns ±   2.248 μs  ┊ GC (mean ± σ):  10.44% ±  1.41%

In these benchmarks the `Transducers` pipeline is faster than its Julia `Base` counterpart, but more importantly, its composition is lazy: with a negative first element the median run time drops from about 66 ms to about 208 ns, as the last two benchmarks show.
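One caveat may be worth checking when reproducing these numbers. Assuming a 64-bit Julia where `Int` is `Int64`, `i^3` wraps around once `i` reaches `2^21`, so the cubes of a vector as long as `all_postive` are not all positive. The lazy pipeline may therefore also stop early on `all_postive`, and its timing need not reflect a full pass over the data. A minimal check, with illustrative variable names:

```julia
# Julia integer arithmetic wraps around silently on overflow.
first_overflow = Int64(2)^21            # 2_097_152, well below 100_000_000
cube_before = (first_overflow - 1)^3    # still fits in an Int64
cube_at = first_overflow^3              # 2^63 does not fit and wraps around

println(cube_before > 0)
println(cube_at == typemin(Int64))
```

Replacing `i^3` by an operation that cannot overflow for these inputs, or using a shorter vector, would avoid this effect.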