# Core 4.1 Batch Pipelines - Setup Pipeline

In this section, we will show how to write a batch pipeline so that it can be executed via an MLRun Project.

---

### References

Much of the following content is derived from the official documentation:
- [Project workflows and automation](https://docs.mlrun.org/en/latest/projects/workflows.html)
- [Projects](https://docs.mlrun.org/en/latest/projects/project.html)
- [Functions](https://docs.mlrun.org/en/latest/runtimes/functions.html)

---

### Bringing It All Together

In previous sections, we covered MLRun Projects and Functions. Now, we will bring them together by creating a batch pipeline. This lets us use the MLRun Project to execute several Functions as a directed acyclic graph (DAG) from either the Python SDK or the CLI.

---

### Example Overview

In this example, we will create a project with 3 MLRun functions and a single pipeline that orchestrates them.

---

### Setup

The pipeline runs the following steps in order:
- `get-data` - Get iris data from sklearn
- `train-model` - Train model via sklearn
- `deploy-model` - Deploy model to HTTP endpoint
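
The three steps form a simple linear DAG: each step depends on the one before it. As a rough illustration only (this is not how MLRun or Kubeflow Pipelines schedule work internally), the ordering can be sketched with Python's standard-library `graphlib` module (Python 3.9+). The `dependencies` mapping is a hypothetical structure written for this example:

```python
from graphlib import TopologicalSorter

# Hypothetical mapping of each step to the steps it depends on.
dependencies = {
    "get-data": set(),
    "train-model": {"get-data"},
    "deploy-model": {"train-model"},
}

# static_order() yields a step only after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['get-data', 'train-model', 'deploy-model']
```

In the real pipeline these dependencies are never listed by hand. They follow from how one step's outputs are passed to the next.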

In [1]:
import mlrun

project = mlrun.get_or_create_project("iguazio-academy", context="./")

> 2022-06-23 19:45:53,179 [info] loaded project iguazio-academy from MLRun DB


---

### Add Functions to Project

We have prepared the three pipeline steps outlined above: `get-data`, `train-model`, and `deploy-model`. As we saw in [1.3 MLRun Projects](../1_mlrun_basics/1.3_mlrun_projects.ipynb), we can add the functions to a project like so:

In [22]:
project.set_function(name='get-data', func='functions/get_data.py', kind='job', image='mlrun/mlrun')
project.set_function(name='train-model', func='functions/train.py', kind='job', image='mlrun/mlrun')
project.set_function(name='deploy-model', func='hub://v2_model_server')

<mlrun.runtimes.serving.ServingRuntime at 0x7fa3ff06ed10>

---

### Write Pipeline

Next, we will define the pipeline that orchestrates the three components. This pipeline is very simple, but you can create much more complex pipelines with branches, conditions, and more.

In [23]:
%%writefile pipelines/training_pipeline.py
from kfp import dsl
import mlrun

@dsl.pipeline(
    name="batch-pipeline-academy",
    description="Example of batch pipeline for Iguazio Academy"
)
def pipeline(label_column: str, test_size=0.2):

    # Ingest the data set
    ingest = mlrun.run_function(
        "get-data",
        handler="prep_data",
        params={"label_column": label_column},
        outputs=["iris_dataset"],
    )

    # Train a model on the ingested data set
    train = mlrun.run_function(
        "train-model",
        handler="train_model",
        inputs={"dataset": ingest.outputs["iris_dataset"]},
        params={
            "label_column": label_column,
            "test_size": test_size,
        },
        outputs=["model"],
    )

    # Deploy the trained model as a serverless function
    deploy = mlrun.deploy_function(
        "deploy-model",
        models=[{"key": "model", "model_path": train.outputs["model"]}],
    )


Overwriting pipelines/training_pipeline.py
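
Notice that the pipeline never states the step order explicitly. Passing `ingest.outputs["iris_dataset"]` as an input to `train-model` is what signals that training must wait for ingestion. The toy classes below are a minimal sketch of that idea in plain Python. `Step` and `OutputRef` are hypothetical names invented for this illustration and are not part of MLRun or Kubeflow Pipelines:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutputRef:
    """Placeholder for a value that a step will produce at run time."""
    producer: str
    key: str


@dataclass
class Step:
    """Toy stand-in for a pipeline step (not an MLRun or KFP class)."""
    name: str
    output_keys: list = field(default_factory=list)
    inputs: dict = field(default_factory=dict)

    @property
    def outputs(self):
        # Each declared output becomes a reference that remembers its producer.
        return {key: OutputRef(self.name, key) for key in self.output_keys}

    @property
    def upstream(self):
        # A step depends on every step whose output it consumes.
        return {
            value.producer
            for value in self.inputs.values()
            if isinstance(value, OutputRef)
        }


ingest = Step("get-data", output_keys=["iris_dataset"])
train = Step(
    "train-model",
    output_keys=["model"],
    inputs={"dataset": ingest.outputs["iris_dataset"]},
)
deploy = Step("deploy-model", inputs={"model_path": train.outputs["model"]})

for step in (ingest, train, deploy):
    print(step.name, "depends on", sorted(step.upstream))
```

The sketch only mirrors the wiring pattern. In the actual pipeline, the objects returned by `mlrun.run_function` and `mlrun.deploy_function` play the role of `Step`, and the workflow engine resolves the references when the pipeline runs.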


---

### Add Pipeline to Project

Once our pipeline is defined, we can add it to our project just like we saw in [1.3 MLRun Projects](../1_mlrun_basics/1.3_mlrun_projects.ipynb):

In [24]:
project.set_workflow(name='train', workflow_path="pipelines/training_pipeline.py")
project.save()

In the next sections, we will execute the pipeline via Kubeflow Pipelines and in a local context.

---