# Output Transformer (Advanced)

`OutputTransformer` is broadly similar to `Transformer`, and any `Transformer` can be used as an `OutputTransformer`. It is important to understand the difference between the `transform` and `out_transform` operations:

* `transform` is lazy: Fugue does not guarantee that the computation runs immediately. For example, with `SparkExecutionEngine`, the actual computation of `transform` happens only when an action, such as `save`, is reached.
* `out_transform` is an action: Fugue ensures the computation runs immediately, regardless of which execution engine is used.
* `transform` outputs a transformed dataframe for the following steps to use.
* `out_transform` is the last computation of a branch in the DAG and outputs nothing.
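
The lazy versus action distinction can be illustrated without Fugue at all. The following is only a plain-Python analogy, not Fugue's implementation: `lazy_transform` and `run_as_action` are hypothetical names, with a generator standing in for a lazy `transform` and a loop that drains it standing in for `out_transform`.

```python
from typing import Any, Dict, Iterable, Iterator, List

calls: List[str] = []  # records when the per-row work actually runs


def lazy_transform(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # A generator body does not run until something iterates over it,
    # which is analogous to a lazy transform waiting for an action.
    for row in rows:
        calls.append(f"processed {row}")
        yield row


def run_as_action(rows: Iterable[Dict[str, Any]]) -> None:
    # Drains the iterable right away and returns nothing,
    # which is analogous to out_transform ending a branch.
    for _ in lazy_transform(rows):
        pass


data = [{"a": 0, "b": 1}, {"a": 1, "b": 3}]

pending = lazy_transform(data)  # nothing is processed here
print(len(calls))  # 0

run_as_action(data)  # both rows are processed immediately
print(len(calls))  # 2
```

In this analogy `pending` is never consumed, so its work never happens; that is the situation `out_transform` is meant to rule out.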

You may find that `transform().persist()` can serve as an alternative to `out_transform`. That is generally fine, but note that the output dataframe of a transformation can be very large; if you persist or checkpoint it, it can take up a large portion of memory or disk space. In contrast, `out_transform` does not take any space. It is also a more explicit way to show what you want to do.

A typical use case of `out_transform` is to save the dataframe in a custom way, for example, pushing it to Redis.
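
As a rough sketch of what such a function might look like, the example below pushes rows into a store and returns nothing. `FakeRedis`, `client`, `push_rows` and the key layout are all hypothetical and used only for illustration; a real setup would use an actual Redis client instead of the in-memory stand-in.

```python
from typing import Any, Dict, Iterable


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.store[key] = value


client = FakeRedis()


def push_rows(df: Iterable[Dict[str, Any]]) -> None:
    # Same shape as the output functions in this tutorial: rows in, nothing out.
    for row in df:
        # The key layout "row:<a>:<b>" is an arbitrary choice for this sketch.
        client.set(f"row:{row['a']}:{row['b']}", str(row))


# Called directly here; the function is plain Python and needs no engine to test.
push_rows([{"a": 0, "b": 1}, {"a": 1, "b": 3}])
print(sorted(client.store))  # ['row:0:1', 'row:1:3']
```

On a distributed engine the function would run on workers, so a module-level client like the one above would probably not be shared; creating the client inside the function is one option to consider.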


## Native Approach

In [1]:
from typing import Iterable, Dict, Any, List
import pandas as pd

# schema: *
def push_to_redis_1(df:Iterable[Dict[str,Any]]) -> Iterable[Dict[str,Any]]:
    for row in df:
        print("pushing1", row)
        yield row

def push_to_redis_2(df:Iterable[Dict[str,Any]]) -> None:
    for row in df:
        print("pushing2", row)

In [2]:
from fugue import FugueWorkflow

with FugueWorkflow() as dag:
    df = dag.df([[0,1],[0,2],[1,3],[1,1]],"a:int,b:int")
    # push_to_redis_1 is a typical transformer, so it can be used directly
    # even though its output is an iterable, the entire iteration is guaranteed to be consumed
    # even if push_to_redis_1 had no schema hint, it could still be used by out_transform
    df.out_transform(push_to_redis_1)
    df.partition(by=["a"], presort="b").out_transform(push_to_redis_1)
    # push_to_redis_2 returns nothing, so you can use it directly without any additional hint
    df.out_transform(push_to_redis_2)
    df.partition(by=["a"], presort="b").out_transform(push_to_redis_2)

pushing1 {'a': 0, 'b': 1}
pushing1 {'a': 0, 'b': 2}
pushing1 {'a': 1, 'b': 3}
pushing1 {'a': 1, 'b': 1}
pushing1 {'a': 0, 'b': 1}
pushing1 {'a': 0, 'b': 2}
pushing1 {'a': 1, 'b': 1}
pushing1 {'a': 1, 'b': 3}
pushing2 {'a': 0, 'b': 1}
pushing2 {'a': 0, 'b': 2}
pushing2 {'a': 1, 'b': 3}
pushing2 {'a': 1, 'b': 1}
pushing2 {'a': 0, 'b': 1}
pushing2 {'a': 0, 'b': 2}
pushing2 {'a': 1, 'b': 1}
pushing2 {'a': 1, 'b': 3}
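
The row order in the partitioned runs above (rows grouped by `a`, sorted by `b` within each group) can be reproduced with a small plain-Python emulation. This is only a sketch of the visible effect of `partition(by=["a"], presort="b")` on this tiny dataset, not how Fugue implements partitioning; `partition_and_presort` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]

rows: List[Row] = [
    {"a": 0, "b": 1},
    {"a": 0, "b": 2},
    {"a": 1, "b": 3},
    {"a": 1, "b": 1},
]


def partition_and_presort(data: List[Row]) -> List[List[Row]]:
    # Sort by the partition key first, then by the presort column,
    # so that groupby sees each key as one contiguous, ordered run.
    ordered = sorted(data, key=itemgetter("a", "b"))
    return [list(group) for _, group in groupby(ordered, key=itemgetter("a"))]


parts = partition_and_presort(rows)
for part in parts:
    for row in part:
        print("pushing", row)
```

A distributed engine may process the partitions in parallel, so the order between partitions should not be relied on; the order inside a partition is what `presort` controls.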


## Decorator Approach

There is no obvious advantage to using the decorator for `OutputTransformer`.

In [3]:
from typing import Iterable, Dict, Any, List
import pandas as pd

from fugue.extensions import output_transformer, transformer
from fugue import FugueWorkflow

@transformer("*")
def push_to_redis_1(df:Iterable[Dict[str,Any]]) -> Iterable[Dict[str,Any]]:
    for row in df:
        print("pushing1", row)
        yield row

@output_transformer()
def push_to_redis_2(df:Iterable[Dict[str,Any]]) -> None:
    for row in df:
        print("pushing2", row)
        
with FugueWorkflow() as dag:
    df = dag.df([[0,1],[0,2],[1,3],[1,1]],"a:int,b:int")
    df.out_transform(push_to_redis_1)
    df.partition(by=["a"], presort="b").out_transform(push_to_redis_1)
    df.out_transform(push_to_redis_2)
    df.partition(by=["a"], presort="b").out_transform(push_to_redis_2)

pushing1 {'a': 0, 'b': 1}
pushing1 {'a': 0, 'b': 2}
pushing1 {'a': 1, 'b': 3}
pushing1 {'a': 1, 'b': 1}
pushing1 {'a': 0, 'b': 1}
pushing1 {'a': 0, 'b': 2}
pushing1 {'a': 1, 'b': 1}
pushing1 {'a': 1, 'b': 3}
pushing2 {'a': 0, 'b': 1}
pushing2 {'a': 0, 'b': 2}
pushing2 {'a': 1, 'b': 3}
pushing2 {'a': 1, 'b': 1}
pushing2 {'a': 0, 'b': 1}
pushing2 {'a': 0, 'b': 2}
pushing2 {'a': 1, 'b': 1}
pushing2 {'a': 1, 'b': 3}


## Interface Approach

Just like the interface approach of `Transformer`, this gives you full flexibility and control over your transformation.

In [4]:
from typing import Iterable, Dict, Any, List
import pandas as pd

from fugue.extensions import Transformer, OutputTransformer
from fugue import FugueWorkflow

class Push1(Transformer):
    def get_output_schema(self, df):
        return df.schema
    
    def transform(self, df):
        print("pushing1", self.cursor.key_value_dict)
        return df
    
    
class Push2(OutputTransformer):
    # Note that OutputTransformer has a different interface: implement process instead of transform
    def process(self, df):
        print("pushing2", self.cursor.key_value_dict)
        
with FugueWorkflow() as dag:
    df = dag.df([[0,1],[0,2],[1,3],[1,1]],"a:int,b:int")
    df.out_transform(Push1)
    df.partition(by=["a"], presort="b").out_transform(Push1)
    df.out_transform(Push2)
    df.partition(by=["a"], presort="b").out_transform(Push2)  

pushing1 {}
pushing1 {'a': 0}
pushing1 {'a': 1}
pushing2 {}
pushing2 {'a': 0}
pushing2 {'a': 1}
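
The output above has one line per call rather than one line per row, because the interface methods receive a whole partition at a time. The sketch below emulates which key dictionaries such calls would see in this local run; it is plain Python with a hypothetical helper `run_per_partition`, and it only mimics what `self.cursor.key_value_dict` printed above.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

rows: List[Row] = [
    {"a": 0, "b": 1},
    {"a": 0, "b": 2},
    {"a": 1, "b": 3},
    {"a": 1, "b": 1},
]


def run_per_partition(data: List[Row], by: Sequence[str]) -> List[Dict[str, Any]]:
    """Return the key dict seen by each emulated partition-level call."""
    if not by:
        # No partition keys: emulate a single call with an empty key dict.
        return [{}]

    def key(row: Row) -> tuple:
        return tuple(row[k] for k in by)

    seen: List[Dict[str, Any]] = []
    for values, _group in groupby(sorted(data, key=key), key=key):
        # One emulated call per distinct key combination.
        seen.append(dict(zip(by, values)))
    return seen


print(run_per_partition(rows, []))     # [{}]
print(run_per_partition(rows, ["a"]))  # [{'a': 0}, {'a': 1}]
```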
