## Overview of Output Modes

Let us get an overview of the output modes supported by Spark Structured Streaming.
* **Append mode (default)** - Only the new rows added to the Result Table since the last trigger are written to the sink. This is supported only for queries where rows added to the Result Table never change, so each row is output only once (assuming a fault-tolerant sink). For example, queries with only `select`, `where`, `map`, `flatMap`, `filter`, `join`, etc. support Append mode.
* **Complete mode** - The whole Result Table is written to the sink after every trigger. This is supported for aggregation queries.
* **Update mode** - (Available since Spark 2.1.1) Only the rows in the Result Table that were updated since the last trigger are written to the sink. If the query has no aggregations, this is equivalent to Append mode.
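
The difference between the three modes can be seen without a cluster. Below is a minimal plain-Python sketch (not Spark code) that simulates a running count over micro-batches and records what each mode would emit per trigger. The function names and the department names `fitness`, `golf`, and `apparel` are made up for illustration.

```python
from collections import Counter


def simulate_output_modes(batches):
    """Simulate what Complete and Update modes emit for a running count.

    batches: list of micro-batches, each a list of keys (e.g. department names).
    Returns a dict with one list of per-trigger outputs for each mode.
    """
    result_table = Counter()  # running counts; stands in for the Result Table
    emitted = {"complete": [], "update": []}
    for batch in batches:
        changed = Counter(batch)  # keys touched in this trigger
        result_table.update(changed)
        # Complete mode: the whole Result Table after every trigger.
        emitted["complete"].append(dict(result_table))
        # Update mode: only the rows whose value changed in this trigger.
        emitted["update"].append({key: result_table[key] for key in changed})
    return emitted


def simulate_append(batches):
    """Append mode for a non-aggregating query: each new row is emitted once."""
    return [list(batch) for batch in batches]


# Hypothetical input: three triggers' worth of department names.
batches = [["fitness", "golf"], ["golf"], ["apparel", "golf"]]
modes = simulate_output_modes(batches)
for trigger, (complete, update) in enumerate(zip(modes["complete"], modes["update"])):
    print(f"trigger {trigger}: complete={complete} update={update}")
```

In the second trigger only `golf` changes, so the simulated Update output contains just that row, while the Complete output still lists every department seen so far.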

Launch PySpark using one of the commands below, then run the Spark Structured Streaming code.

**Using Pyspark2**

```
export PYSPARK_PYTHON=python3

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark3**

```
export PYSPARK_PYTHON=python3

pyspark3 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```
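
The examples in this section read lines from a TCP socket on port 9000, so something must be listening on that port before a query starts. This section does not show how the data is produced. If no other feeder is available, a small Python server along the following lines could stand in for one. This is only a sketch: `serve_lines` is a hypothetical helper, it serves a single client, and it closes the connection once all lines are sent.

```python
import socket
import threading


def serve_lines(lines, host="localhost", port=9000):
    """Listen on host:port and send each line to the first client that connects.

    Returns the port actually bound (useful when port=0) and the server thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def handle():
        # Block until a client (e.g. the Spark socket source) connects.
        conn, _ = server.accept()
        with conn:
            for line in lines:
                # The socket source reads newline-delimited UTF-8 text.
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=handle, daemon=True)
    thread.start()
    return bound_port, thread
```

To match the examples, it would be called from a separate Python session as `serve_lines(sample_lines, host=socket.gethostname(), port=9000)`, where `sample_lines` is any list of strings, and that session must be kept alive until the streaming query has connected.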

* Example to read from a socket and print to the console every 5 seconds. Each run prints only the data received since the last trigger. Append mode is typically used for queries with `select`, `filter`, etc. It throws an error if the query contains an aggregation without a watermark.

```python
spark.conf.set('spark.sql.shuffle.partitions', '2')

import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

log_messages. \
    writeStream. \
    outputMode("append"). \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```

* Same as above. If we do not specify `outputMode`, then it will be **append** by default.

```python
spark.conf.set('spark.sql.shuffle.partitions', '2')

import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

log_messages. \
    writeStream. \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```

* Example with an aggregation in `append` mode. This query fails with an `AnalysisException`, because append mode is not supported for streaming aggregations without a watermark.

```python
spark.conf.set('spark.sql.shuffle.partitions', '2')

import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

from pyspark.sql.functions import split, count, lit

department_count = log_messages. \
    filter(split(split('value', ' ')[6], '/')[1] == 'department'). \
    select(split(split('value', ' ')[6], '/')[2].alias('department')). \
    groupBy('department'). \
    agg(count(lit(1)).alias('department_count'))

department_count. \
    writeStream. \
    outputMode("append"). \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```

* Same aggregation with `outputMode` set to `complete`. The full set of department counts is printed after every trigger.

```python
import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

from pyspark.sql.functions import split, count, lit

department_count = log_messages. \
    filter(split(split('value', ' ')[6], '/')[1] == 'department'). \
    select(split(split('value', ' ')[6], '/')[2].alias('department')). \
    groupBy('department'). \
    agg(count(lit(1)).alias('department_count'))

department_count. \
    writeStream. \
    outputMode("complete"). \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```

* Example for `outputMode` `update`. Only the departments whose counts changed since the last trigger are printed.

```python
import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

from pyspark.sql.functions import split, count, lit

department_count = log_messages. \
    filter(split(split('value', ' ')[6], '/')[1] == 'department'). \
    select(split(split('value', ' ')[6], '/')[2].alias('department')). \
    groupBy('department'). \
    agg(count(lit(1)).alias('department_count'))

department_count. \
    writeStream. \
    outputMode("update"). \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```
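
The aggregation examples above pick the department out of each line with `split(split('value', ' ')[6], '/')[2]`, after checking that element `[1]` of the path is `department`. The plain-Python sketch below mirrors that logic so the indexing is easier to follow. The sample log line is hypothetical, and `extract_department` is an illustrative helper, not part of PySpark. The explicit length checks are needed only in Python, where an out-of-range index raises an error.

```python
def extract_department(log_line):
    """Return the department name from a log line, or None if there is none.

    Mirrors the Spark expressions used in the aggregation examples:
    field 6 (space-delimited) is the request path, and the path is
    expected to look like /department/<name>/...
    """
    fields = log_line.split(" ")
    if len(fields) <= 6:
        return None
    # A leading "/" makes the first element of path_parts an empty string.
    path_parts = fields[6].split("/")
    if len(path_parts) > 2 and path_parts[1] == "department":
        return path_parts[2]
    return None


# Hypothetical access-log line; the real log format may differ.
sample = (
    '10.0.0.1 - - [01/Jan/2024:00:00:00 -0800] '
    '"GET /department/fitness/products HTTP/1.1" 200 1234'
)
print(extract_department(sample))  # prints "fitness"
```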