# This is a demo with an example flow for the MINT transformation pipeline

This notebook presents an abstraction of the MINT Data Transformations architecture and walks through an example of the pipeline we are proposing: a simple CSV --(unit transformation)--> CSV flow.

## This is the semantic description of the interface (control plane) between all of the components in the pipeline (Reader/Writer/Transformation Adapters)


In [None]:
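from enum import Enum, auto
from typing import Dict, Optional

''' These are the general data types we want to support
in the interface between the different adapters. '''
class ArgType(Enum):
    FilePath = auto()    # parameterized by a Format
    Graph = auto()       # parameterized by a SemanticModel
    NumpyArray = auto()  # parameterized by a Schema
    String = auto()
    Number = auto()
    Boolean = auto()
    DateTime = auto()

''' Each transformation function represents a transformation
operation (or a transformation library) we would like to encode
(i.e. Unit-Transformation, GDAL, PIHM2Cycles, etc...) '''
class TransFunc:
    id: str
    description: str
    # Argument name -> expected type, for both directions of the interface
    inputs: Dict[str, Optional[ArgType]]
    outputs: Dict[str, Optional[ArgType]]

    def __init__(self, config, data):
        pass

    def validate(self) -> bool:
        # Check the supplied arguments against `inputs`; defined by each adapter
        raise NotImplementedError

    def exec(self) -> dict:
        # Run the transformation and return values keyed as in `outputs`
        raise NotImplementedError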


''' This class inherits the general transformation function class
and specifies the input and output interfaces of this 'Adapter'
(its specification over the 'control plane') '''
class UnitTransformation(TransFunc):
    id: str = "unit_transformation"
    description: str = ""
    inputs = {
        "conf": ArgType.FilePath,
        "source_unit": ArgType.String,
        "target_unit": ArgType.String,  # required by __init__ below
        "data": ArgType.Graph
    }
    outputs = {
        "data": ArgType.Graph
    }

    def __init__(self, config, data, target_unit, source_unit):
        pass

    def validate(self) -> bool:
        pass

    def exec(self) -> dict:
        pass

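''' The pipeline instance is a collection of TransFunc instances
which eventually would be concatenated (in the 'control plane') '''
pipeline = [
    ReaderPDF,  # a reader adapter, not defined in this notebook
    UnitTransformation
]
# Argument keys follow the pattern "<id>_<index>_<arg_name>"
args = {
    "unit_transformation_1_arg_name": None
}
# Intended call; this assumes a pipeline object that exposes exec(),
# which a plain Python list does not
# pipeline.exec(args)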
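
The argument-naming convention above can be made concrete with a small runnable sketch. Everything in it is an assumption for illustration: `Adapter`, `Pipeline`, `Source` and `Scale` are hypothetical names, the index is taken to be the zero-based position of the step in the pipeline, and each step is assumed to receive the previous step's outputs as keyword arguments.

```python
from typing import Any, Dict, List, Type


class Adapter:
    """Hypothetical stand-in for TransFunc: an id plus an exec() method."""
    id: str = "adapter"

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def exec(self) -> Dict[str, Any]:
        return {}


class Pipeline:
    """Hypothetical wrapper that runs adapter classes in order."""

    def __init__(self, steps: List[Type[Adapter]]) -> None:
        self.steps = steps

    def exec(self, args: Dict[str, Any]) -> Dict[str, Any]:
        outputs: Dict[str, Any] = {}
        for index, step in enumerate(self.steps):
            prefix = f"{step.id}_{index}_"
            # Pick the arguments addressed to this step: "<id>_<index>_<arg_name>"
            own_args = {
                key[len(prefix):]: value
                for key, value in args.items()
                if key.startswith(prefix)
            }
            # Outputs of the previous step are passed along; explicit args win
            outputs = step(**{**outputs, **own_args}).exec()
        return outputs


class Source(Adapter):
    id = "source"

    def exec(self) -> Dict[str, Any]:
        return {"data": [1.0, 2.0]}


class Scale(Adapter):
    id = "scale"

    def exec(self) -> Dict[str, Any]:
        factor = self.kwargs["factor"]
        return {"data": [value * factor for value in self.kwargs["data"]]}


result = Pipeline([Source, Scale]).exec({"scale_1_factor": 10})
print(result)
```

Keying arguments by id and index keeps two steps of the same adapter type distinguishable within one flat `args` dictionary.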

## Here's an example flow:

## Component 1: Reader Adapter

An instance of a reader adapter can be used as an entry point to the pipeline. It reads an input file `input.csv` and an `input.yaml` file describing the D-REPR layout of that file. The data are represented in a general way as a Python object (graph or NumPy array) that is used in the next steps of the pipeline.

In [1]:
# {input.csv, input.yaml} --> python_graph_obj
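
As a rough illustration of the reader step, the sketch below parses CSV text into a list of dictionaries using only the standard library. It does not use D-REPR: the `numeric_columns` argument is a hypothetical stand-in for what the `input.yaml` layout would describe, and the sample data and the `read_records` name are made up for this example.

```python
import csv
import io
from typing import Any, Dict, List


def read_records(csv_text: str, numeric_columns: List[str]) -> List[Dict[str, Any]]:
    """Parse CSV text into a list of row dictionaries."""
    records: List[Dict[str, Any]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record: Dict[str, Any] = dict(row)
        # Columns named by the (hypothetical) layout are converted to floats
        for column in numeric_columns:
            record[column] = float(record[column])
        records.append(record)
    return records


# In-memory sample standing in for input.csv
sample_csv = "station,rainfall\nA,1.5\nB,0.25\n"
python_graph_obj = read_records(sample_csv, numeric_columns=["rainfall"])
print(python_graph_obj)
```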

## Component 2: Transformation Adapter

An instance of a transformation adapter does not materialize the data into an output file; it reproduces the data, transforming its content (the actual data) and applying the needed configuration through the 'control plane' parameters.
Given the Python object (graph) representing the data and the remaining configuration passed through the calling API ('control plane' parameters), it reconstructs the data as a new Python object (graph) and prepares it for the next steps in the pipeline.

In [None]:
# python_graph_obj --> python_graph_obj*
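
A minimal sketch of such a transformation, assuming the data are held as a list of dictionaries rather than a real graph object. The `convert_units` function and the `CONVERSION_FACTORS` table are hypothetical; an actual adapter would presumably take its factors from a units library or from the configuration file.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical lookup table: (source_unit, target_unit) -> multiplicative factor
CONVERSION_FACTORS: Dict[Tuple[str, str], float] = {
    ("m", "mm"): 1000.0,
    ("mm", "m"): 0.001,
}


def convert_units(
    records: List[Dict[str, Any]],
    column: str,
    source_unit: str,
    target_unit: str,
) -> List[Dict[str, Any]]:
    """Return new records with `column` converted; the input is left untouched."""
    if source_unit == target_unit:
        factor = 1.0
    else:
        factor = CONVERSION_FACTORS[(source_unit, target_unit)]
    return [{**record, column: record[column] * factor} for record in records]


python_graph_obj = [
    {"station": "A", "rainfall": 1.5},
    {"station": "B", "rainfall": 0.25},
]
python_graph_obj_star = convert_units(python_graph_obj, "rainfall", "m", "mm")
print(python_graph_obj_star)
```

Building new dictionaries instead of mutating the input mirrors the idea that the adapter produces a new object and leaves the original available to other steps.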

## Component 3: Writer Adapter

An instance of a writer adapter can be used as an exit point of the pipeline. It writes an output file `output.csv` based on a given `output.yaml` (D-REPR layout) and an additional configuration file `output.config`.

In [1]:
# {python_graph_obj*, output.yaml, output.config} --> {output.csv}
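
The writer step can be sketched in the same spirit. This example serializes a list of dictionaries to CSV text with the standard library; the `columns` argument is a hypothetical stand-in for the information `output.yaml` and `output.config` would provide, and the result is returned as a string rather than written to `output.csv` so that the example has no side effects.

```python
import csv
import io
from typing import Any, Dict, List


def write_records(records: List[Dict[str, Any]], columns: List[str]) -> str:
    """Serialize records to CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


python_graph_obj_star = [
    {"station": "A", "rainfall": 1500.0},
    {"station": "B", "rainfall": 250.0},
]
output_csv = write_records(python_graph_obj_star, columns=["station", "rainfall"])
print(output_csv)
```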