In [None]:
import $ivy.`com.typesafe.akka::akka-stream:2.6.4`
repl.pprinter() = repl.pprinter().copy(defaultHeight = 5 )

In [None]:
import java.time._
import scala.concurrent._, duration._
import akka._
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._

# A streaming DSL

[Akka Streams](https://doc.akka.io/api/akka/current/akka/stream/index.html) offers a DSL for programming reactive stream processors. These programs are made up of three basic components: sources, flows and sinks.

`Source`s represent data publishers. 

In [None]:
val source: Source[Int, NotUsed] = Source(List(1,2,3,4,5,6,7,8,9))

`Sink`s are data consumers. 

In [None]:
val sink: Sink[Any, Future[Done]] =
    Sink.foreach((a: Any) => println(a))

We can connect a source and sink in order to obtain a so-called _runnable graph_, i.e. a streaming processor that can be actually run. In DSL terminology, the `RunnableGraph` is the _program_.

In [None]:
val graph1: RunnableGraph[NotUsed] = source.to(sink)

In order to run a graph we need a _materializer_, i.e. the interpreter of the streaming program. The standard materializer offered by akka-stream is built on actors, so we need an actor system first.

In [None]:
implicit lazy val system: ActorSystem = ActorSystem("akka-stream-primer")

There is no need to instantiate the materializer explicitly: since Akka 2.6, one is derived implicitly from the actor system in scope:

In [None]:
implicitly[Materializer]

We can now run our graph:

In [None]:
graph1.run

Between sources and sinks we can attach _flows_, i.e. intermediate processing steps:

In [None]:
val graph2 = source.via(Flow[Int].map((i: Int) => i + 1)).to(sink)

In [None]:
graph2.run

### Logging

We can log the activity of each operator in a graph to properly understand the contribution of each step in the transformation pipeline.

In [None]:
import akka.event.Logging

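// Adds `logAll` to any Source: every element and the completion signal are logged
// at warning level, failures at debug level.
implicit class SourceOps[A, M](src: Source[A, M]){
    def logAll(name: String): Source[A, M] =
        src.log(name).withAttributes(Attributes.logLevels(
            onElement = Logging.WarningLevel,
            onFinish = Logging.WarningLevel,
            onFailure = Logging.DebugLevel))
}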

In [None]:
Source(List(1,2,3)).logAll("source")
    .via(Flow[Int].map(_ + 1)).logAll("flow")
    .to(Sink.ignore)
    .run

# Materialized values

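Running a graph yields a value, called the *materialized value*. Every source, flow and sink contributes one, and the combinator used to connect them decides which is kept: `to` keeps the value of the left-hand side (the source), whereas `toMat` takes a function that combines both.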

In [None]:
val source = Source(List(1,2,3))

In [None]:
val ignoreS = Sink.ignore
val toListS = Sink.collection[Int,List[Int]]
val toPrintlnS = Sink.foreach(println)
val foldS = Sink.fold[String, Int]("")((acc: String, e: Int) => acc + e)

In [None]:
source.to(ignoreS)
source.to(toListS)
source.to(foldS)

In [None]:
source.toMat(ignoreS)((mvl: NotUsed, mvr: Future[Done]) => mvr)
source.toMat(toListS)(Keep.left)
source.toMat(foldS)((mvl: NotUsed, mvr: Future[String]) => mvr)
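
The combiner functions above correspond to the predefined `Keep.left`, `Keep.right` and `Keep.both`. The following is a minimal sketch of how each choice affects what `run()` returns; the names `numbers` and `concatS` are illustrative, and the snippet creates its own actor system so that it can also run outside this notebook.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("materialized-values-sketch")

val numbers: Source[Int, NotUsed] = Source(List(1, 2, 3))
val concatS: Sink[Int, Future[String]] = Sink.fold[String, Int]("")((acc, e) => acc + e)

// Keep.left keeps the source's value, Keep.right the sink's, Keep.both pairs them.
val left: NotUsed = numbers.toMat(concatS)(Keep.left).run()
val right: Future[String] = numbers.toMat(concatS)(Keep.right).run()
val both: (NotUsed, Future[String]) = numbers.toMat(concatS)(Keep.both).run()

// Only the sink's Future tells us when the stream has finished and what it produced.
val concatenated: String = Await.result(both._2, 3.seconds)
println(concatenated) // 123

system.terminate() // shuts down the snippet's own actor system
```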

Common shortcuts:

In [None]:
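// Equivalent: each line runs the graph and returns the sink's Future[String]
source.toMat(foldS)(Keep.right).run
source.runWith(foldS)
source.runFold("")(_+_)

// `to` only builds the graph (keeping NotUsed);
// `runForeach` builds it, runs it and returns the sink's Future[Done]
source.to(Sink.foreach(println))
source.runForeach(println)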

# Async boundaries 

### Akka streams vs. iterators

What is the difference between the previous Akka Streams program and the following `Iterator` program?

In [None]:
List(1,2,3,4).iterator.map(_ + 1).foreach(println)

In both cases, we obtain a streaming processor. However, in Akka Streams the intermediate processing steps and the actions performed on the resulting data are first-class entities: flows and sinks. They can be defined independently, reused and combined as we wish. There is no such notion in the iterator realm. Moreover, Akka Streams graphs are executed by actors, so they can run asynchronously and concurrently, with back-pressure between stages. Iterator programs run sequentially and synchronously.
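
To illustrate the "first-class" point, here is a small sketch in which a flow and a sink are defined once and then reused in two different graphs. The names `increment` and `sum` are made up for this example, and the snippet sets up its own actor system so that it can be run on its own.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("reuse-sketch")

// A flow and a sink defined once, independently of any source.
val increment: Flow[Int, Int, NotUsed] = Flow[Int].map(_ + 1)
val sum: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

// The same blueprints are plugged into two different graphs.
val smallSum: Int =
  Await.result(Source(List(1, 2, 3, 4)).via(increment).runWith(sum), 3.seconds)
val largeSum: Int =
  Await.result(Source(1 to 100).via(increment).via(increment).runWith(sum), 3.seconds)

println(smallSum) // 14
println(largeSum) // 5250

system.terminate() // shuts down the snippet's own actor system
```

With an `Iterator`, the closest equivalent would be passing plain functions around; there is no value that represents "a consumer that folds and reports its result when done".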

### Exploiting parallelism 

In [None]:
implicit val ec: ExecutionContext = system.dispatcher

In [None]:
Source(1 to 3)
    .mapAsync(1) { i: Int =>
        println(s"A: $i"); Future(i)
    }
    .mapAsync(1) { i: Int =>
        println(s"B: $i"); Future(i)
    }
    // plain `map` is synchronous, so it returns the element itself rather than a Future
    .map { i =>
        println(s"C: $i"); i
    }
    .to(Sink.ignore)
    .run
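
By default, consecutive operators are fused and executed by a single actor; the `async` operator marks an asynchronous boundary, so that the operators before it run in a separate actor from those after it. The sketch below is one way to observe this: the helper `stage` is hypothetical and simply prints the thread on which each element is handled. Thread names and the interleaving of the printed lines depend on the dispatcher, so no particular output is assumed; only the element order is guaranteed.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("async-boundary-sketch")

// Hypothetical helper: reports which thread handles the element, then passes it on.
def stage(name: String, i: Int): Int = {
  println(s"$name: $i on ${Thread.currentThread.getName}")
  i
}

val done = Source(1 to 3)
  .map(i => stage("A", i))
  .async // operators above this line run in their own actor
  .map(i => stage("B", i))
  .runWith(Sink.seq)

// The boundary changes how the work is scheduled, not the order of the elements.
val result = Await.result(done, 3.seconds)

system.terminate() // shuts down the snippet's own actor system
```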

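Alternatively, we can turn each element into a substream. `flatMapMerge` runs up to `breadth` substreams concurrently and merges their output as it arrives, whereas `flatMapConcat` consumes the substreams one at a time, in order: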

In [None]:
Source(1 to 9)
    .flatMapMerge(3, a => {
        println(a); 
        Source(List(-a)).map{b => println(b); b}
    }).runWith(Sink.ignore)

In [None]:
Source(1 to 9)
    .flatMapConcat(a => {
        println(a); 
        Source(List(-a)).map{b => println(b); b}
    }).runWith(Sink.ignore)
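
The difference between the two operators is easier to see when the results are collected instead of printed. In this sketch (the helper `pair` is an assumption made for illustration), `flatMapConcat` yields the substreams strictly one after another, while `flatMapMerge` yields the same elements with no guarantee about how the substreams interleave.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("flatmap-sketch")

// Hypothetical helper: each element becomes a two-element substream.
def pair(a: Int): Source[Int, NotUsed] = Source(List(a, -a))

// Substreams are consumed one at a time, so the order is deterministic.
val concatenated =
  Await.result(Source(1 to 3).flatMapConcat(a => pair(a)).runWith(Sink.seq), 3.seconds)
println(concatenated) // Vector(1, -1, 2, -2, 3, -3)

// Up to 3 substreams are consumed at once; the interleaving may vary between runs.
val merged =
  Await.result(Source(1 to 3).flatMapMerge(3, a => pair(a)).runWith(Sink.seq), 3.seconds)

system.terminate() // shuts down the snippet's own actor system
```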

# Fan-in, fan-out & additional operators

In [None]:
val source1: Source[Int, NotUsed] = Source(List(1,2,3,4))
val source2: Source[String, NotUsed] = Source(List("hola", "adios"))

In [None]:
source1.map(i => s"num: $i")
    .merge(source2)
    .runForeach(println)

In [None]:
source1.zip(source2)
    .runWith(Sink.foreach(println))

In [None]:
source1.map(_.toString)
    .concat(source2)
    .runWith(Sink.foreach(println))
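
The three fan-in operators above differ in how they treat the two inputs. The following sketch, which uses the illustrative names `numbers` and `words` and its own actor system, collects the results with `Sink.seq` to make the differences visible: `zip` pairs elements and stops when the shorter source ends, `concat` exhausts the first source before starting the second, and `merge` emits every element from both sources in no guaranteed order.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("fan-in-sketch")

val numbers: Source[Int, NotUsed] = Source(List(1, 2, 3, 4))
val words: Source[String, NotUsed] = Source(List("hola", "adios"))

// zip: pairs elements and completes as soon as either side completes.
val zipped = Await.result(numbers.zip(words).runWith(Sink.seq), 3.seconds)
println(zipped) // Vector((1,hola), (2,adios))

// concat: all elements of the first source, then all elements of the second.
val concatenated =
  Await.result(numbers.map(_.toString).concat(words).runWith(Sink.seq), 3.seconds)
println(concatenated) // Vector(1, 2, 3, 4, hola, adios)

// merge: every element of both sources, interleaved in an unspecified order.
val merged =
  Await.result(numbers.map(_.toString).merge(words).runWith(Sink.seq), 3.seconds)

system.terminate() // shuts down the snippet's own actor system
```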

See https://doc.akka.io/docs/akka/current/stream/stream-substream.html for an explanation of substreams.

# File IO

In [None]:
import java.nio.file._, akka.util._

In [None]:
val file = Paths.get("Intro.ipynb")

In [None]:
FileIO.fromPath(file)
Framing.delimiter(
    ByteString("\n"), 
    maximumFrameLength = 1500, 
    allowTruncation = true)
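
`FileIO.fromPath` emits raw chunks of bytes whose boundaries have nothing to do with line breaks; `Framing.delimiter` is the flow that reassembles those chunks into one `ByteString` per line. The sketch below shows this on in-memory data rather than on a file: the two input chunks are made up so that a line is split across them. With `allowTruncation = true`, the last line is emitted even though it has no trailing delimiter.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Framing, Sink, Source}
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("framing-sketch")

// Two chunks, as a file source might deliver them: the line "bb" spans both.
val chunks = Source(List(ByteString("a\nb"), ByteString("b\nccc")))

val framed = chunks
  .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1500, allowTruncation = true))
  .map(_.utf8String)
  .runWith(Sink.seq)

val frames = Await.result(framed, 3.seconds)
println(frames) // Vector(a, bb, ccc)

system.terminate() // shuts down the snippet's own actor system
```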

In [None]:
val lines: Source[String, Future[IOResult]] = 
    FileIO.fromPath(file)
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1500, allowTruncation = true))
        .throttle(1, 10.millisecond)
        .map(_.utf8String)
    //    .runWith(Sink.takeLast(3))
    //    .runWith(Sink.foreach(println))

In [None]:
lines.runForeach(println)

In [None]:
FileIO.toPath(Paths.get("linecounts.txt"))

In [None]:
lines.map(_.length)
//    .map(_.toString+"\n")
    .map(i => ByteString(i.toString + "\n"))
    .runWith(FileIO.toPath(Paths.get("linecounts.txt")))
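
The materialized value of `FileIO.toPath` is a `Future[IOResult]`, which completes once writing has finished and reports the number of bytes written. The following sketch writes to a temporary file instead of `linecounts.txt` and uses two made-up input lines, so it can be run without the notebook file being present.

```scala
import akka.actor.ActorSystem
import akka.stream.IOResult
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString
import java.nio.file.{Files, Path}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Standalone setup; inside this notebook the implicit `system` defined earlier can be reused.
implicit val system: ActorSystem = ActorSystem("file-sink-sketch")

// Temporary target file, an assumption of this sketch.
val target: Path = Files.createTempFile("linecounts-sketch", ".txt")

// Same shape as the pipeline above: one line length per output line.
val written: Future[IOResult] = Source(List("hola", "adios"))
  .map(line => ByteString(line.length.toString + "\n"))
  .runWith(FileIO.toPath(target))

val ioResult: IOResult = Await.result(written, 3.seconds)
val contents: String = new String(Files.readAllBytes(target), "UTF-8")

println(ioResult.count) // 4 (bytes written: "4\n5\n")

system.terminate() // shuts down the snippet's own actor system
```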