# Create and Use Functions

Functions are the basic building blocks of MLRun. Each one is essentially a Python object that knows how to run locally or on a Kubernetes cluster. This section covers how to create and customize an MLRun function, as well as the parameters common to all functions.

**In this section:**
- [Functions Overview](#functions-overview)
- [Functions and Projects](#functions-and-projects)
- [Creating Functions](#creating-functions)
- [Customizing Functions](#customizing-functions)

<a id="functions-overview"></a>
## Functions Overview

MLRun Functions are used to run jobs, deploy models, create pipelines, and more. There are various kinds of MLRun functions with different capabilities, but all functions share some commonalities. In general, an MLRun function looks like the following:

![MLRun Function](../_static/images/mlrun_function_diagram.png)

You can read more about MLRun Functions [here](https://docs.mlrun.org/en/latest/runtimes/functions.html). Each parameter and capability will be explained in more detail in the following sections [Creating Functions](#creating-functions) and [Customizing Functions](#customizing-functions).

<a id="functions-and-projects"></a>
## Functions and Projects

The recommended way to create and run a function is via an [MLRun Project](https://docs.mlrun.org/en/latest/projects/project.html). Once you register a function within a project, you can execute it in your local environment or at scale on a Kubernetes cluster.

The relationship between [Functions](https://docs.mlrun.org/en/latest/runtimes/functions.html), [Workflows](https://docs.mlrun.org/en/latest/projects/build-run-workflows-pipelines.html), and [Projects](https://docs.mlrun.org/en/latest/projects/project.html) is as follows:

![MLRun Concepts Architecture](../_static/images/mlrun_concepts_architecture.png)

Once MLRun Functions and Workflows are created and registered into the Project, they are invoked using the Project object. This workflow pairs especially well with [Git](https://docs.mlrun.org/en/latest/projects/use-git-manage-projects.html) and [CI/CD](https://docs.mlrun.org/en/latest/projects/ci-integration.html) integration.

<a id="creating-functions"></a>
## Creating Functions

The recommended way to create an MLRun function is using an MLRun project (discussed in more detail in the [Using set_function](#using-set_function) section). The general flow looks like the following:

```python
import mlrun

project = mlrun.get_or_create_project(...)

fn = project.set_function(...)
```

When creating a function, there are 3 main scenarios:
1. [Single Source File](#single-source-file) - when your code can be contained in a single file
2. [Multiple Source Files](#multiple-source-files) - when your code requires additional files or dependencies
3. [Import Existing Function](#import-existing-function) - when your function already exists elsewhere and you just want to import it

> **Note:** The `set_function` method of an MLRun project handles each of these scenarios transparently. Depending on the source passed in, the project registers the function using one of several lower-level functions. For specific use cases, you also have direct access to these lower-level functions: [new_function](https://docs.mlrun.org/en/latest/api/mlrun.run.html?highlight=new_function#mlrun.run.new_function), [code_to_function](https://docs.mlrun.org/en/latest/api/mlrun.run.html?highlight=new_function#mlrun.run.code_to_function), and [import_function](https://docs.mlrun.org/en/latest/api/mlrun.run.html?highlight=set_function#mlrun.run.import_function).

<a id="using-set_function"></a>
### Using set_function

The MLRun project object has a method called `set_function` which is a one-size-fits-all way of creating an MLRun function. This method accepts a variety of sources including Python files, Jupyter notebooks, Git repos, and more.
> **Note:** The return value of `set_function` is your MLRun function. You can immediately run it or apply additional configuration such as resources and scaling. See [Customizing Functions](#customizing-functions) for more details.

When using `set_function`, there are a number of common parameters across all function types and creation scenarios. Consider the following example:

```python
fn = project.set_function(
    name="my-function",
    tag="latest",
    func="my_function.py",
    image="mlrun/mlrun",
    kind="job",
    handler="train_model",
    requirements=["pandas==1.3.5", "numpy==1.21.6"],
    with_repo=True
)
```

#### Name
`name` specifies the name of your MLRun function within the given project. This will be displayed in the MLRun UI as well as the Kubernetes pod.

#### Tag
`tag` specifies a tag for your function (much like a Docker image). Omitting this parameter defaults the tag to `latest`. This parameter can only be used for `.py/.ipynb` files.

#### Func
`func` specifies what to run with the MLRun function. This can be a number of things including:
- Files (`.py`, `.ipynb`, `.yaml`, etc.)
- URIs (`hub://` prefixed function marketplace URI, `db://` prefixed MLRun DB URI)
- Existing MLRun Function objects
- `None` (for current `.ipynb` file)

#### Image
`image` specifies the Docker image to use when containerizing the piece of code. If you also specify the `requirements` parameter to build a new Docker image, the `image` parameter is used as the base image.

The standard MLRun images are:
- `mlrun/mlrun`: Suits most lightweight components (includes `sklearn`, `pandas`, `numpy` and more)
- `mlrun/ml-models`: Suits most CPU ML/DL workloads (includes `Tensorflow`, `Keras`, `PyTorch` and more)
- `mlrun/ml-models-gpu`: Suits most GPU ML/DL workloads (includes GPU `Tensorflow`, `Keras`, `PyTorch` and more)

Dockerfiles for the MLRun images can be found [here](https://github.com/mlrun/mlrun/tree/development/dockerfiles).

#### Kind
`kind` specifies which runtime the MLRun function uses. This can be a number of things including:
- Batch runtimes (`job`, `spark`, `remote-spark`, `dask`, `mpijob`)
- Real-time runtimes (`nuclio`, and `serving`)

See more details on kinds of functions [here](https://docs.mlrun.org/en/latest/concepts/functions-overview.html).

#### Handler
`handler` specifies the default function handler to invoke (e.g. a Python function within your script). This parameter can only be used for `.py/.ipynb` files.
> **Note:** The handler can also be overridden when executing the function.

#### Requirements
`requirements` specifies any additional Python dependencies needed for the function to run. Using this parameter will result in a new Docker image (using the `image` parameter as a base image). This can be:
- A list of Python dependencies
- Path to a `requirements.txt` file

#### With Repo
`with_repo` specifies whether a function requires additional files or dependencies within a Git repo or archive file. This Git repo or archive file is specified on a project level via `project.set_source(...)`, which the function consumes. If this parameter is omitted, the default is `False`.
> **Note:** When using `with_repo`, the contents of the Git repo or archive are available in the current working directory of your MLRun function during runtime.

<a id="single-source-file"></a>
### Single Source File

The simplest way to create a function is to use a single file as the source. The code itself is embedded into the MLRun function object. This makes the function quite portable as it does not depend on any external files. You can use any source file supported by MLRun such as Python or Jupyter notebook.

#### Python

This is the simplest way to create a function out of a given piece of code - simply pass in the path to the Python file *relative to your project context directory*.

```python
fn = project.set_function(
    name="python",
    func="job.py", 
    kind="job",
    image="mlrun/mlrun",
    handler="handler"
)
```
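
For reference, a `job.py` matching the call above might look like the following minimal sketch. The handler name `handler` comes from the example; the parameter `p1`, the result key `doubled`, and the computation itself are hypothetical. The sketch assumes MLRun passes a run context as the first argument when the handler declares a `context` parameter, and that this context exposes `log_result`.

```python
# job.py (hypothetical contents, for illustration only)


def handler(context, p1: int = 1):
    """Double the input parameter and log it as a run result."""
    doubled = p1 * 2
    # Assumes `context` is the MLRun run context, which exposes log_result(key, value)
    context.log_result("doubled", doubled)
```

Because the handler only touches the `context` object it is given, it can be exercised locally with a simple stand-in object before it is registered with the project.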

#### Jupyter Notebook

This is a great way to create a function out of a Jupyter notebook - simply pass in the path to the Jupyter notebook *relative to your project context directory*. You can use [MLRun cell tags](https://docs.mlrun.org/en/latest/runtimes/mlrun_code_annotations.html) to specify which parts of the notebook should be included in the function.
> **Note:** To ensure that the latest changes are included, make sure you save your notebook before creating/updating the function.

You can also create an MLRun function out of the current Jupyter notebook you are running in. To do this, simply omit the `func` parameter in `set_function`.
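
Both notebook options can be sketched as follows. The helper name `register_notebook_functions`, the file name `notebook.ipynb`, and the function names are hypothetical; `project` is assumed to be an MLRun project object, and only `set_function` parameters described earlier in this section are used.

```python
def register_notebook_functions(project):
    """Register one function from a notebook file and one from the current notebook."""
    # From a notebook path relative to the project context directory
    from_file = project.set_function(
        name="notebook",
        func="notebook.ipynb",  # hypothetical file name
        kind="job",
        image="mlrun/mlrun",
        handler="handler",
    )
    # From the notebook currently being run: `func` is omitted
    from_current = project.set_function(
        name="current-notebook",
        kind="job",
        image="mlrun/mlrun",
        handler="handler",
    )
    return from_file, from_current
```

The second call only makes sense when executed from inside a notebook, since there is otherwise no current notebook to pick up.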

<a id="multiple-source-files"></a>
### Multiple Source Files

If your code requires additional files or external libraries, you will need to use a source that supports multiple files such as Git, an archive (zip/tar/etc.), or a V3IO file share. This approach (especially using a Git repo) pairs well with MLRun Projects.

To do this, you must:
- Provide `with_repo=True` when creating your function via `project.set_function(...)`
- Set project source via `project.set_source(source=...)`

This instructs MLRun to load source code from the Git repo, archive, or file share associated with the project. There are two ways to load these additional files:

#### Static
The function is built once. *This is the preferred approach for production workloads*:
```python
project.set_source(source="git://github.com/mlrun/project-archive.git")

fn = project.set_function(
    name="myjob", handler="job_func.job_handler",
    image="mlrun/mlrun", kind="job", with_repo=True,
)

project.build_function(fn)
```

#### Runtime
The function pulls the source code at runtime. *This is a simpler approach during development that allows for making code changes without re-building the image each time*:

```python
project.set_source(
    source="https://s3.us-east-1.wasabisys.com/iguazio/project-archive/project-archive.zip",
    pull_at_runtime=True
)

fn = project.set_function(
    name="nuclio", handler="nuclio_func:nuclio_handler",
    image="mlrun/mlrun", kind="nuclio", with_repo=True,
)
```
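
The handler string `nuclio_func:nuclio_handler` above points at a `nuclio_handler` function in a `nuclio_func` module within the project source. A minimal sketch of such a module is shown below. Its contents are hypothetical; it assumes the standard Nuclio Python handler signature of `(context, event)` and that the request payload is available as `event.body`.

```python
# nuclio_func.py (hypothetical contents, for illustration only)


def nuclio_handler(context, event):
    """Echo the request body back to the caller."""
    body = event.body
    # The body may arrive as bytes; decode it so the response is readable text
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return f"received: {body}"
```

Since the source is pulled at runtime, edits to this file take effect without rebuilding the image.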

<a id="import-existing-function"></a>
### Import Existing Function

If you already have an MLRun function that you would like to import, you can do so from multiple locations such as YAML, function marketplace, and MLRun DB.

#### YAML

MLRun functions can be exported to YAML files via `fn.export()`. These YAML files can then be imported via the following:

```python
fn = project.set_function(name="import", func="function.yaml")
```
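
The export and import steps can be combined into a round trip. The sketch below assumes `fn` is an existing MLRun function and `project` is an MLRun project object. The helper name `export_and_reimport` is hypothetical, and passing the target path as the first argument to `export` is an assumption about its signature.

```python
def export_and_reimport(project, fn, target="function.yaml"):
    """Export a function to YAML, then register it in the project from that file."""
    # Write the function definition to a YAML file (target path is an assumption)
    fn.export(target)
    # Register the exported definition under the name used in the example above
    return project.set_function(name="import", func=target)
```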

#### Function Marketplace

Functions can also be imported from the [MLRun Function Marketplace](https://www.mlrun.org/marketplace) - simply import via the name of the function and the `hub://` prefix:
> **Note:** By default, the `hub://` prefix points to the official marketplace. However, you can also substitute your own repo to create your own marketplace.

```python
fn = project.set_function(name="describe", func="hub://describe")
```

#### MLRun DB

Finally, you can also import functions directly from the MLRun DB. These might be functions that have not been pushed to a git repo, archive, or function marketplace. Import via the name of the function and the `db://` prefix:

```python
fn = project.set_function(name="db", func="db://import")
```

<a id="customizing-functions"></a>
## Customizing Functions

Once you have created your MLRun function, there are many customizations you can add. Some potential customizations include:

#### Environment Variables
Environment variables can be added individually, from a Python dictionary, or from a file.

```python
# Single variable
fn.set_env(name="MY_ENV", value="MY_VAL")

# Multiple variables
fn.set_envs(env_vars={"MY_ENV" : "MY_VAL", "SECOND_ENV" : "SECOND_VAL"})

# Multiple variables from file
fn.set_envs(file_path="env.txt")
```
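
The file passed to `file_path` is assumed here to contain one `KEY=VALUE` pair per line; check the MLRun API reference for the exact format it accepts. The helper `read_env_file` below is a hypothetical stand-in that only illustrates that layout and is not part of MLRun.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env_vars = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env_vars[key.strip()] = value.strip()
    return env_vars


# Write a sample env file to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / "env.txt"
    env_path.write_text("# sample variables\nMY_ENV=MY_VAL\nSECOND_ENV=SECOND_VAL\n")
    parsed = read_env_file(env_path)

print(parsed)
```

The resulting dictionary has the same shape as the `env_vars` argument in the example above.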

#### Memory, CPU, GPU Resources
Adding requests and limits to your function specifies what compute resources it requires. It is best practice to define these for each MLRun function.

```python
# Requests - lower bound
fn.with_requests(mem="1G", cpu=1)

# Limits - upper bound
fn.with_limits(mem="2G", cpu=2, gpus=1)
```

Additional information can be found [here](https://docs.mlrun.org/en/latest/runtimes/configuring-job-resources.html#cpu-gpu-and-memory-limits-for-user-jobs).

#### Scaling and Auto-Scaling
Scaling behavior can be added to real-time and distributed runtimes including `nuclio`, `serving`, `spark`, `dask`, and `mpijob`.

```python
# Nuclio/serving scaling
fn.spec.replicas = 2
fn.spec.min_replicas = 1
fn.spec.max_replicas = 4
```

Additional information can be found [here](https://docs.mlrun.org/en/latest/runtimes/configuring-job-resources.html#replicas).

#### Mount Persistent Storage
In some instances, you may need to mount a file system to your container to persist data. This can be done with native Kubernetes PVCs or the V3IO data layer for Iguazio clusters.

```python
# Mount persistent storage - V3IO
fn.apply(mlrun.mount_v3io())

# Mount persistent storage - PVC
fn.apply(mlrun.platforms.mount_pvc(pvc_name="data-claim", volume_name="data", volume_mount_path="/data"))
```

Additional information can be found [here](https://docs.mlrun.org/en/latest/runtimes/function-storage.html).

#### Node Selection
Node selection can be used to specify where to run workloads (e.g. specific node groups, instance types, etc.).

```python
fn.with_node_selection(node_selector={"app.iguazio.com/lifecycle" : "non-preemptible"})
```

Additional information can be found [here](https://docs.mlrun.org/en/latest/concepts/node-affinity.html).