# Create and Use Functions

## About Functions

Functions are the basic building blocks of MLRun. They are used to run jobs, deploy models, create pipelines, and more. You can read more about MLRun Functions [here](https://docs.mlrun.org/en/latest/runtimes/functions.html).

### Common Parameters

The MLRun function is essentially a Python object that knows how to run locally or on a Kubernetes cluster. There are various kinds of MLRun functions with different capabilities, but they all share some commonalities.

In general, an MLRun function looks like the following:

![MLRun Function](../_static/images/mlrun_function_diagram.png)

Additionally, the following parameters are common across all functions:
- **Name:** Name of the function (displayed in the UI and Kubernetes pod)
- **Tag:** Version of the function (can be manual or automatic)
- **Code:** What is actually being run (Python file, Jupyter notebook, Git repo, etc.)
- **Runtime:** How the code is being run (Job, Spark, Dask, Nuclio, etc.)
- **Image:** Docker image (or list of commands for building a new image)
- **Resources:** Memory, CPU, GPU resources. Also scaling/auto-scaling settings for supported runtimes (e.g. Nuclio, Serving, Horovod, etc.)
- **With Repo:** Whether the function depends on the MLRun project for additional files (e.g. Git repo, archive file, etc.)
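
As a rough illustration, most of the parameters above correspond to keyword arguments of the project's `set_function` method, which is described in the following sections. The values below (`trainer`, `trainer.py`, `v1`) are hypothetical placeholders, and the mapping in the comments is a sketch rather than a complete reference:

```python
# Hypothetical values; only the keyword names mirror the parameters listed above.
function_params = {
    "name": "trainer",       # Name: shown in the UI and in the Kubernetes pod
    "tag": "v1",             # Tag: a manually chosen version label
    "func": "trainer.py",    # Code: path to the source file
    "kind": "job",           # Runtime: how the code is run
    "image": "mlrun/mlrun",  # Image: Docker image to run in
    "with_repo": False,      # With Repo: no additional project files needed
}

# With an existing project object, these could be passed as:
#     fn = project.set_function(**function_params)
# Resources are configured on the returned function object, not here.
for key, value in function_params.items():
    print(f"{key:>10}: {value!r}")
```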

### Functions and Projects

The recommended way to create and run a function in MLRun is via a project. Once you register a function within a project, you can execute it in your local environment or at scale on a Kubernetes cluster.

![MLRun Function](../_static/images/mlrun_concepts_architecture.png)

An MLRun project object has a method called `set_function` which is a one-size-fits-all way of creating an MLRun function. The `set_function` method accepts a variety of sources including Python files, Jupyter notebooks, Git repos, and more.

Additionally, the `set_function` method has a parameter called `with_repo`, which denotes whether a function requires additional files or dependencies from a Git repo or archive file. The Git repo or archive file is specified at the project level, and the function consumes it from there.
> Note: When using `with_repo`, the contents of the Git repo or archive file will be available in the current working directory of your MLRun function at runtime.
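
Because the project files land in the working directory, function code can refer to them with plain relative paths. The following is a minimal sketch of that idea; the file name `settings.json`, its `threshold` key, and the helper `load_settings` are hypothetical and not part of MLRun:

```python
import json
from pathlib import Path


def load_settings(relative_path="settings.json"):
    """Read a JSON file that ships with the project source (hypothetical file)."""
    # A relative path is resolved against the current working directory,
    # which is where the repo or archive contents are expected at runtime.
    return json.loads(Path(relative_path).read_text())


def handler(context=None, settings_path="settings.json"):
    # `context` is unused here so the sketch can also be tried outside MLRun.
    settings = load_settings(settings_path)
    return settings.get("threshold")
```

Running `handler()` from a directory that contains `settings.json` returns the stored `threshold` value, or `None` if the key is missing.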

## Creating a Function

There are 3 main scenarios for creating a function:
1. [Single Source File](#single-source-file) - when your code can be contained in a single file
2. [Multiple Source Files](#multiple-source-files) - when your code requires additional files or dependencies
3. [Import Existing Function](#import-existing-function) - when your function already exists elsewhere and you just want to import it

The `set_function` method of an MLRun project handles each of these scenarios transparently. Depending on the source passed in, the project registers the function using lower-level functions.
> Note: For specific use cases, you also have access to these lower-level functions. These will be discussed in more detail at a later point.

When creating a new function, consider the following flow:

![Create function flow](../_static/images/mlrun_function_create_flow.png)

<a id="single-source-file"></a>
### Single Source File (Python/Notebook)

The simplest way to create a function is to use a single file as the source. The code itself will be embedded into the MLRun function object. This makes the function quite portable, as it does not depend on any external files. You can use any source file supported by MLRun, such as a Python file or Jupyter notebook.

> Note: The return value of `set_function` will be your MLRun function. You can immediately run it or apply additional configuration like resources, scaling, etc.
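
As one example of such additional configuration, the sketch below sets resource requests and limits on a function object. It assumes a Kubernetes-based runtime that provides the `with_requests` and `with_limits` methods; the helper name `apply_resources` and the values are placeholders:

```python
def apply_resources(fn, mem="2G", cpu=1):
    """Set resource requests and limits on a function returned by set_function."""
    # The same placeholder values are used for both requests and limits;
    # adjust them to match the workload.
    fn.with_requests(mem=mem, cpu=cpu)
    fn.with_limits(mem=mem, cpu=cpu)
    return fn


# Usage with any function object returned by set_function:
#     fn = apply_resources(fn, mem="4G", cpu=2)
```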

```python
import mlrun

single_source_project = mlrun.new_project(name="single-source-project", context="./single_source_project", overwrite=True)
```

#### Python

This is the simplest way to create a function out of a given piece of code. Pass in the path to the Python file *relative to your project context directory*.

```python
python_fn = single_source_project.set_function(
    name="python",
    func="job.py", 
    kind="job",
    image="mlrun/mlrun",
    handler="handler"
)

run = single_source_project.run_function(python_fn)
```

#### Jupyter Notebook

This is a great way to create a function out of a Jupyter notebook. Pass in the path to the Jupyter notebook *relative to your project context directory*. You can use [MLRun cell tags](#) to specify which parts of the notebook should be included in the function.
> Note: To ensure that the latest changes are included, make sure you save your notebook before creating/updating the function.

```python
notebook_fn = single_source_project.set_function(
    name="notebook", 
    func="nb.ipynb", 
    kind="job", 
    image="mlrun/mlrun", 
    handler="handler"
)

run = single_source_project.run_function(notebook_fn)
```

#### Current Notebook

You can also create an MLRun function out of the current Jupyter notebook you are running in. To do this, simply omit the `func` parameter in `set_function`. You can use [MLRun cell tags](#) to specify which parts of the notebook should be included in the function.
> Note: To ensure that the latest changes are included, make sure you save your notebook before creating/updating the function.

```python
notebook_fn = single_source_project.set_function(
    name="notebook", 
    kind="job", 
    image="mlrun/mlrun", 
    handler="handler"
)

run = single_source_project.run_function(notebook_fn)
```

<a id="multiple-source-files"></a>
### Multiple Source Files (Git/Archive)

If your code requires additional files or external libraries, you will need to use a source that supports multiple files such as Git, an archive (zip/tar/etc.), or V3IO file share. This approach (especially using a Git repo) pairs well with MLRun Projects.

With this approach, you must pass `with_repo=True` to `set_function`. This instructs MLRun to load source code from the Git repo/archive/file share associated with the project.

There are two options to load the code:
1. **Static**: The function is built once. *This is the preferred approach for production workloads*
> Build the function once via `mlrun.build_function(fn)`
2. **Runtime**: The function will pull the source code at runtime. *This is a simpler approach during development that allows for making code changes without re-building the image each time*
> Enable this feature via `project.set_source(source=None, pull_at_runtime=True)`
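
The two options can be sketched as a small helper that switches between them. The helper name `configure_code_loading` is hypothetical, and it uses the project's `build_function` method, which is assumed here to be equivalent to the `mlrun.build_function(fn)` call mentioned above:

```python
def configure_code_loading(project, fn, static: bool):
    """Apply one of the two code-loading options to a function."""
    if static:
        # Static: build the function once, ahead of time.
        project.build_function(fn)
    else:
        # Runtime: pull the source code each time the function runs.
        project.set_source(source=None, pull_at_runtime=True)


# Usage with an existing project and function:
#     configure_code_loading(project, fn, static=False)  # during development
#     configure_code_loading(project, fn, static=True)   # for production
```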

#### Git Repo

The Git repo is provided when creating the project like so:

```python
git_project = mlrun.load_project(
    name="git-project",
    context="./git_project",
    url="git://github.com/mlrun/project-archive",
    clone=True
)

git_project.set_source(source=None, pull_at_runtime=True)
```

Then, you can use `set_function` with `with_repo=True` to create the function. In this case, there is a Python file called `job_func.py` in the root directory of the Git repo:

```python
git_fn = git_project.set_function(
    name="git-function",
    func="./job_func.py",
    kind="job",
    image="mlrun/mlrun",
    handler="job_handler",
    with_repo=True
)

run = git_project.run_function(git_fn)
```

#### Archive (.zip/.tar)

Similar to the Git-based approach, the archive is supplied when creating the project:

```python
archive_project = mlrun.load_project(
    name="archive-project",
    context="./archive_project",
    url="https://s3.us-east-1.wasabisys.com/iguazio/project-archive/project-archive.zip",
    clone=True
)

archive_project.set_source(source=None, pull_at_runtime=True)
```

Then, you can use `set_function` with `with_repo=True` to create the function.
> Note: In this case, there is a Python file called `job_func.py` in the root directory of the archive.

```python
archive_fn = archive_project.set_function(
    name="archive-function",
    func="./job_func.py",
    kind="job",
    image="mlrun/mlrun",
    handler="job_handler",
    with_repo=True
)

run = archive_project.run_function(archive_fn)
```
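
To prepare a similar archive of your own, the layout matters: `job_func.py` has to sit at the archive root for `func="./job_func.py"` to resolve. The sketch below builds such a zip file with the standard library. The body of `job_func.py` here is a hypothetical placeholder, not the contents of the real `project-archive.zip`:

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical source for the handler referenced by handler="job_handler".
JOB_SOURCE = '''\
def job_handler(context):
    context.logger.info("running from the archive")
'''


def build_archive(target_dir):
    """Create project-archive.zip with job_func.py at the archive root."""
    archive_path = Path(target_dir) / "project-archive.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        # An archive name without a directory prefix keeps the file at the root.
        archive.writestr("job_func.py", JOB_SOURCE)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    path = build_archive(tmp)
    with zipfile.ZipFile(path) as archive:
        print(archive.namelist())
```

The resulting file would then need to be uploaded to a location that the project can reach by URL.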

<a id="import-existing-function"></a>
### Import Existing Function (YAML, Function Marketplace, MLRun DB)

If you already have an MLRun function that you would like to import, you can do so from multiple locations, such as a YAML file, the function marketplace, or the MLRun DB.

```python
import_project = mlrun.new_project(name="import-project", context="./import_project", overwrite=True)
```

#### YAML

MLRun functions can be exported to YAML files via `fn.export()`. These YAML files can then be imported via the following:

```python
yaml_fn = import_project.set_function(name="import", func="function.yaml")

run = import_project.run_function(yaml_fn)
```
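
The export and import steps can be combined into one round trip. The sketch below assumes that `fn.export()` accepts a target path and that the YAML path given to `set_function` is resolved relative to the project context directory, in the same way as the source files earlier in this guide. The helper name `export_and_register` and the `context_dir` argument are hypothetical:

```python
import os


def export_and_register(fn, project, context_dir, file_name="function.yaml"):
    """Export `fn` to YAML inside the project context, then register it."""
    target = os.path.join(context_dir, file_name)
    fn.export(target)  # write the function definition as YAML
    # Pass only the file name, since it is resolved against the context directory.
    return project.set_function(name="import", func=file_name)


# Usage with an existing function and the project created above:
#     yaml_fn = export_and_register(fn, import_project, "./import_project")
```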

#### Function Marketplace

Functions can also be imported from the [MLRun Function Marketplace](https://www.mlrun.org/marketplace). Import them via the name of the function and the `hub://` prefix:
> Note: By default, the `hub://` prefix points to the official marketplace. However, you can also substitute your own repo to create your own marketplace.

```python
marketplace_fn = import_project.set_function(name="describe", func="hub://describe")

run = import_project.run_function(
    function=marketplace_fn,
    inputs={
        "table" : "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_dataset.csv"
    }
)
```

#### MLRun DB

Finally, you can also import functions directly from the MLRun DB. These might be functions that have not been pushed to a git repo, archive, or function marketplace. Import via the name of the function and the `db://` prefix:

```python
mlrun_db_fn = import_project.set_function(name="db", func="db://import")

run = import_project.run_function(mlrun_db_fn)
```