# Chapter 10: Spark Streaming

In this chapter, we are going to investigate the streaming capabilities of Spark.

To perform the exercises in this notebook, messages must be sent to port 9999. A Python script (`send_messages.py`) does this automatically. To run it, open a terminal from the Notebook interface, change the working directory to ~/work/chapter10-spark-streaming, and type the following command: `nohup python send_messages.py &`. Take note of the process ID shown in the terminal. To stop this process later, type `kill -9 <process-id>`.

## Stateless Transformations

In this section, we will look at a simple example of stateless transformations, in which each batch is processed independently of the previous ones.

### Conventional Stateless Transformations

Most stateless transformations are almost the same as those available on conventional RDDs (`map()`, `flatMap()`, `filter()`, ...), except that they are applied to every batch of the stream. To show how to use them, we will work through an example that counts the occurrences of the different message types ("info", "notice", "error" and "unknown") in a streaming log input. The count is computed on batch intervals of 10 seconds.
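
To make clear what the stream computes in each 10-second batch, here is a Spark-free sketch that uses plain Scala collections. It mirrors the `map(...)` followed by `reduceByKey(_ + _)` steps on a single batch. The helper `countBatch` and the sample lines are hypothetical and exist only for illustration.

```scala
// Classify a log line by the tag it contains ("unknown" if no tag is found)
def getMessageType(line: String): String =
  if (line.contains("[info]")) "info"
  else if (line.contains("[notice]")) "notice"
  else if (line.contains("[error]")) "error"
  else "unknown"

// Hypothetical local stand-in for one batch: map each line to (type, 1),
// then sum the 1s per type, as map + reduceByKey do on a DStream batch
def countBatch(batch: Seq[String]): Map[String, Int] =
  batch
    .map(line => (getMessageType(line), 1))
    .groupBy(_._1)
    .map { case (kind, pairs) => kind -> pairs.map(_._2).sum }

// Made-up sample batch, not real log data
val batch = Seq(
  "[info] service started",
  "[error] disk full",
  "[info] request served",
  "plain line without a tag"
)
println(countBatch(batch))
```

Because the computation is stateless, the next batch starts again from empty counts.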

In [None]:
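import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
Returns the type of message of an input line from log data, depending on whether the line
contains the "[info]" (type "info"), "[notice]" (type "notice") or "[error]" (type "error")
keyword. If none of them is found, the returned type is "unknown".

@param line input line
@return message type
**/
def getMessageType(line: String): String = {
    if (line.contains("[info]")) "info"
    else if (line.contains("[notice]")) "notice"
    else if (line.contains("[error]")) "error"
    else "unknown"
}

// Reuse (or create) a session; a receiver-based stream needs at least 2 local threads
val spark = SparkSession.builder.master("local[2]").appName("StatelessStreaming").getOrCreate()
// Batch interval of 10 seconds
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
// One DStream element per text line received on port 9999
val lines = ssc.socketTextStream("localhost", 9999)
// Stateless: tag every line with its type and a count of 1
val codes = lines.map(line => (getMessageType(line), 1))
// Stateless: sum the counts per type within each batch only
val codesCount = codes.reduceByKey(_ + _)
codesCount.print()
ssc.start()
// Assumption: run for 60 seconds, then return control to the notebook
ssc.awaitTerminationOrTimeout(60000)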

In [None]:
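// Stop streaming but keep the SparkContext alive, since only one StreamingContext can be active at a time
ssc.stop(stopSparkContext = false, stopGracefully = true)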

## Stateful Transformations

Stateful transformations are those whose result for the current batch also depends on data or results from previous batches, not only on the current one.

### `AndWindow` type

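Many stateless transformations have an equivalent stateful counterpart whose name ends in "AndWindow" (for example, `reduceByKey()` and `reduceByKeyAndWindow()`). These operate on a sliding window that spans several batches. The window is defined by a window length and a slide interval, and both must be multiples of the batch interval. We will repeat the previous example using this approach.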
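
The following Spark-free sketch shows what a sliding window over per-batch counts produces. It uses a window of 3 batches that slides by 1 batch, which corresponds to 30 s and 10 s with 10-second batches; both sizes are assumed for illustration. It also compares two strategies. The naive one re-reduces every batch in the window. The incremental one adds the batch entering the window and subtracts the batch leaving it, which is the idea behind `reduceByKeyAndWindow()` with an inverse function (the variant that requires checkpointing). The helpers `combine`, `naiveWindows` and `incrementalWindows` and the sample counts are hypothetical.

```scala
// Made-up per-batch counts, e.g. what reduceByKey would emit every 10 s
val batches: Seq[Map[String, Int]] = Seq(
  Map("info" -> 2, "error" -> 1),
  Map("info" -> 1),
  Map("error" -> 2, "notice" -> 1),
  Map("info" -> 3)
)

// Merge two count maps key by key with the given operator (missing keys count as 0)
def combine(a: Map[String, Int], b: Map[String, Int], op: (Int, Int) => Int): Map[String, Int] =
  (a.keySet ++ b.keySet).map(k => k -> op(a.getOrElse(k, 0), b.getOrElse(k, 0))).toMap

// Naive window: re-reduce the last `windowSize` batches at every slide
def naiveWindows(bs: Seq[Map[String, Int]], windowSize: Int): Seq[Map[String, Int]] =
  bs.indices.map { i =>
    bs.slice(math.max(0, i - windowSize + 1), i + 1)
      .foldLeft(Map.empty[String, Int])((acc, m) => combine(acc, m, _ + _))
  }

// Incremental window: add the entering batch, subtract ("inverse reduce") the leaving one,
// and drop keys whose count fell to 0
def incrementalWindows(bs: Seq[Map[String, Int]], windowSize: Int): Seq[Map[String, Int]] =
  bs.indices.scanLeft(Map.empty[String, Int]) { (prev, i) =>
    val added = combine(prev, bs(i), _ + _)
    val shifted = if (i >= windowSize) combine(added, bs(i - windowSize), _ - _) else added
    shifted.filter { case (_, v) => v != 0 }
  }.tail

naiveWindows(batches, 3).foreach(println)
incrementalWindows(batches, 3).foreach(println)
```

Both strategies yield the same windows. The incremental one does a constant amount of work per slide regardless of the window length, at the cost of having to remember earlier state.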

In [None]:
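import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
Returns the type of message of an input line from log data, depending on whether the line
contains the "[info]" (type "info"), "[notice]" (type "notice") or "[error]" (type "error")
keyword. If none of them is found, the returned type is "unknown".

@param line input line
@return message type
**/
def getMessageType(line: String): String = {
    if (line.contains("[info]")) "info"
    else if (line.contains("[notice]")) "notice"
    else if (line.contains("[error]")) "error"
    else "unknown"
}

val spark = SparkSession.builder.master("local[2]").appName("WindowedStreaming").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
// The incremental window below (with an inverse function) requires checkpointing; directory name assumed
ssc.checkpoint("checkpoint")
val lines = ssc.socketTextStream("localhost", 9999)
val codes = lines.map(line => (getMessageType(line), 1))
// Counts over the last 30 s (window length), recomputed every 10 s (slide interval); assumed values
val codesCount = codes.reduceByKeyAndWindow(
    (a: Int, b: Int) => a + b,  // add counts of batches entering the window
    (a: Int, b: Int) => a - b,  // subtract counts of batches leaving the window
    Seconds(30),
    Seconds(10))

codesCount.print()
ssc.start()
// Assumption: run for 60 seconds, then return control to the notebook
ssc.awaitTerminationOrTimeout(60000)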

In [None]:
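// Stop streaming but keep the SparkContext alive, since only one StreamingContext can be active at a time
ssc.stop(stopSparkContext = false, stopGracefully = true)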

### `updateStateByKey` type

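The `updateStateByKey()` transformation keeps a state for each key that is carried over from batch to batch, which makes it possible to maintain accumulated values. For example, we will repeat the previous example, but this time maintain the total counts since the stream started. Note that `updateStateByKey()` requires checkpointing to be enabled.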
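
The following Spark-free sketch shows how the update function accumulates a running total across batches. It receives the values seen for a key in the current batch plus the previous state, which is `None` the first time the key appears. Following the documented behaviour of `updateStateByKey()`, the loop applies the function to every known key in every batch, even when the key has no new values. The `runBatches` helper and the sample counts are hypothetical.

```scala
// Same shape as the function passed to updateStateByKey:
// new values for a key in this batch, and the previous state (None if the key is new)
def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// Hypothetical local stand-in for the streaming loop: after each batch,
// update every key seen so far (returning None would drop the key)
def runBatches(batches: Seq[Map[String, Int]]): Seq[Map[String, Int]] =
  batches.scanLeft(Map.empty[String, Int]) { (state, batch) =>
    (state.keySet ++ batch.keySet).flatMap { k =>
      updateFunction(batch.get(k).toSeq, state.get(k)).map(total => k -> total)
    }.toMap
  }.tail

// Made-up per-batch counts
val batches = Seq(Map("info" -> 2, "error" -> 1), Map("info" -> 1), Map("notice" -> 1))
runBatches(batches).foreach(println)
```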

In [None]:
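import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
Returns the type of message of an input line from log data, depending on whether the line
contains the "[info]" (type "info"), "[notice]" (type "notice") or "[error]" (type "error")
keyword. If none of them is found, the returned type is "unknown".

@param line input line
@return message type
**/
def getMessageType(line: String): String = {
    if (line.contains("[info]")) "info"
    else if (line.contains("[notice]")) "notice"
    else if (line.contains("[error]")) "error"
    else "unknown"
}

/**
Accumulates an iterative counter: adds the counts received for a key in the current
batch to the running total kept from previous batches.

@param newValues counts received for the key in the current batch
@param runningCount total so far (None the first time the key appears)
@return updated total
**/
def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    Some(newValues.sum + runningCount.getOrElse(0))
}

val spark = SparkSession.builder.master("local[2]").appName("CumulativeStreaming").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
// updateStateByKey requires a checkpoint directory; directory name assumed
ssc.checkpoint("checkpoint")
val lines = ssc.socketTextStream("localhost", 9999)
val codes = lines.map(line => (getMessageType(line), 1))
val codesCount = codes.reduceByKey(_ + _)
// Running totals per message type since the stream started
val codesCountCumu = codesCount.updateStateByKey[Int](updateFunction _)
codesCountCumu.print()
ssc.start()
// Assumption: run for 60 seconds, then return control to the notebook
ssc.awaitTerminationOrTimeout(60000)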

In [None]:
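// Stop streaming but keep the SparkContext alive, since only one StreamingContext can be active at a time
ssc.stop(stopSparkContext = false, stopGracefully = true)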