# Passing & logging objects using hints

This tutorial shows how to pass data to a handler and log its outputs without using the MLRun decorator. This means that you can take 
working code written outside the MLRun paradigm and simply create an MLRun function for it. 

You can pass log hints that indicate how to log the values returned from a handler. Log hints are passed 
via the `returns` parameter of the `run` method. A log hint can be passed as a string or a dictionary.

You can also pass type hints in the `inputs` parameter of the `run` method, using the structure `<key> : <type_hint>`.
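
As a rough illustration of the shapes described above, the following sketch normalizes string and dictionary log hints into one form and splits an `inputs` key written as `<key> : <type_hint>`. The helpers `normalize_log_hint` and `split_input_key` are hypothetical names invented for this example and are not part of MLRun; the dictionary field names `key` and `artifact_type` are likewise an assumption about how a dictionary log hint is spelled.

```python
from typing import Dict, Optional, Tuple, Union


def normalize_log_hint(hint: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: turn a string or dict log hint into a dict."""
    if isinstance(hint, dict):
        return dict(hint)  # already in dictionary form, so just copy it
    key, sep, artifact_type = hint.partition(":")
    normalized = {"key": key.strip()}
    if sep:  # an artifact type was given after the colon
        normalized["artifact_type"] = artifact_type.strip()
    return normalized


def split_input_key(input_key: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split '<key> : <type_hint>' into its two parts."""
    key, sep, type_hint = input_key.partition(":")
    return key.strip(), (type_hint.strip() if sep else None)


# The same log hint written as a string and as a dictionary (assumed field names).
string_hint = normalize_log_hint("returns_data: dataset")
dict_hint = normalize_log_hint({"key": "returns_data", "artifact_type": "dataset"})

# A hint without ':' carries only a key, so the artifact type is left to the default.
bare_hint = normalize_log_hint("returns_result")

# An input key that carries a type hint after the colon.
input_parts = split_input_key("df : pandas.DataFrame")
```

In this sketch the string and dictionary forms end up identical, which is the point: both spellings carry the same two pieces of information.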

When you run a handler of the function, pass the `returns` parameter to the `run` method. 
Your data can be stored anywhere (S3, Google Cloud, etc.).

The typical flow, presented in this section, includes:

1. [Generate data for the demo](#section1)
2. [Write the demo functions](#section2)
3. [Set the MLRun project and function](#section3)
4. [Pass the data to the MLRun function](#section4)
5. [Log the data and result artifacts to MLRun](#section5)

___
<a id="section1"></a>
## Generate data for the demo

In [1]:
import os

# Set the location of the generated parquet data file (make sure you have write permissions):
DATA_PATH = os.path.abspath("./data.parquet")

Some very basic code for generating a `pd.DataFrame` and saving it to a parquet file:

In [2]:
import numpy as np
import pandas as pd


def generate_data(n_features: int, n_samples: int, output_path: str):
    data = np.random.random(size=(n_samples, n_features))
    columns = [f"feature_{i}" for i in np.arange(n_features)]
    
    df = pd.DataFrame(data=data, columns=columns)
    
    df.to_parquet(path=output_path)

### Generate data

Use the function above to generate data with 10 features and 1000 samples:

In [3]:
generate_data(
    n_features=10, 
    n_samples=1000,
    output_path=DATA_PATH,
)
assert os.path.exists(DATA_PATH)

<a id="section2"></a>
## Write the demo functions

This section shows one MLRun function with multiple handlers. The code can also be located in a separate `.py` file.

In [4]:
# mlrun: start-code

In [5]:
from typing import Tuple

import numpy as np
import pandas as pd

import mlrun


def pass_as_param(data_path: str):
    assert isinstance(data_path, str)
    
    df = pd.read_parquet(data_path)
    
    print(f"Sum: {df.sum().sum()}")

    
def pass_as_input(data_path: mlrun.DataItem):
    assert isinstance(data_path, mlrun.DataItem)
    
    df = data_path.as_df()
    
    print(f"Sum: {df.sum().sum()}")
    

def pass_as_input_using_type_hint(df: pd.DataFrame):
    assert isinstance(df, pd.DataFrame)
    
    print(f"Sum: {df.sum().sum()}")


def log_with_context(context: mlrun.MLClientCtx, df: pd.DataFrame):
    df = df + np.ones(shape=df.shape)
    s = df.sum().sum()
    
    context.log_dataset(key="context_data", df=df)
    context.log_result(key="context_result", value=s)
    

@mlrun.handler(outputs=["decorator_data: dataset", "decorator_result: result"])
def log_with_decorator(df: pd.DataFrame) -> Tuple[pd.DataFrame, float]:
    df = df + np.ones(shape=df.shape)
    s = df.sum().sum()
    
    return df, s


def log_with_returns(df: pd.DataFrame) -> Tuple[pd.DataFrame, float]:
    df = df + np.ones(shape=df.shape)
    s = df.sum().sum()
    
    return df, s

In [6]:
# mlrun: end-code

___
<a id="section3"></a>
## Set the MLRun project and function

In [7]:
import mlrun

### Create the project

This loads the project if it already exists; otherwise, it creates the project:

In [8]:
project = mlrun.get_or_create_project(name="passing-and-logging-with-mlrun", user_project=True)

> 2023-03-07 13:00:12,235 [info] loaded project passing-and-logging-with-mlrun from MLRun DB


### Create the MLRun function

Now, create an MLRun function using the project's `set_function` method:

```{admonition} Note
`set_function` is intentionally not given the `func` keyword argument. As a result, it looks in the current notebook and parses it. Only the code between the cells with the comments `# mlrun: start-code` and `# mlrun: end-code` is parsed.
```
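
To make the marker behavior in the note concrete, here is a conceptual sketch of selecting only the code between the two marker cells. It is not MLRun's notebook parser: the helper `extract_marked_code` is hypothetical, and representing the notebook as a plain list of cell source strings is an assumption made for this example.

```python
from typing import List

START_MARKER = "# mlrun: start-code"
END_MARKER = "# mlrun: end-code"


def extract_marked_code(cells: List[str]) -> str:
    """Hypothetical helper: keep only the cells between the two marker cells."""
    collected = []
    inside = False
    for source in cells:
        stripped = source.strip()
        if stripped == START_MARKER:
            inside = True
        elif stripped == END_MARKER:
            inside = False
        elif inside:
            collected.append(source)
    return "\n\n".join(collected)


cells = [
    "DATA_PATH = './data.parquet'",  # before the start marker: ignored
    "# mlrun: start-code",
    "def pass_as_param(data_path: str):\n    print(data_path)",
    "# mlrun: end-code",
    "project = 'demo'",  # after the end marker: ignored
]
function_code = extract_marked_code(cells)
```

Under these assumptions, the data generation and project setup cells never become part of the function's code; only the handlers do.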

In [9]:
notebook_functions = project.set_function(name="notebook_functions", kind="job", image="mlrun/mlrun")

<a id="section4"></a>
## Pass the data as input using `type_hint`

Data is passed using `inputs` (and not `params`) in the `run` method of the MLRun function. 
`inputs` are used for getting local or remote data as an `mlrun.DataItem`. Using data items reduces the glue logic required 
for getting data from remote files (like files located in S3) and local files, as well as logged artifacts. A data item has many 
methods, such as `local` (download the data item's file to a local temp directory) and `as_df` (parse the data to a 
`pd.DataFrame`). You can read more [here](../store/data-items.html).

You can see the inputs that were passed under **inputs** in the run summary table.

```{admonition} Note
Using type hints in the handler's code like in `pass_as_input_using_type_hint` automatically makes MLRun parse the 
`mlrun.DataItem` to the type hinted in the function's header. If the type is not supported in MLRun, the data item 
remains as is, by default.
```
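
The behavior described in the note can be pictured with a small, self-contained sketch. It is not MLRun's implementation: `FakeDataItem`, `CONVERTERS`, and `call_with_parsed_inputs` are hypothetical names, and plain `list` and `tuple` stand in for the types a real system would support. It only illustrates the idea of reading a handler's type hints and converting each input when the hinted type is supported.

```python
import inspect
from typing import Any, Callable, Dict


class FakeDataItem:
    """Stand-in for a data item; it only wraps a list of numbers."""

    def __init__(self, values):
        self.values = values


# Assumed registry of supported types and how to build them from a data item.
CONVERTERS: Dict[type, Callable[[FakeDataItem], Any]] = {
    list: lambda item: list(item.values),
    tuple: lambda item: tuple(item.values),
}


def call_with_parsed_inputs(handler: Callable, inputs: Dict[str, FakeDataItem]):
    """Convert each input to the handler's hinted type when it is supported."""
    parameters = inspect.signature(handler).parameters
    parsed = {}
    for name, item in inputs.items():
        hint = parameters[name].annotation
        converter = CONVERTERS.get(hint)
        # Unsupported or missing hints leave the data item as is.
        parsed[name] = converter(item) if converter else item
    return handler(**parsed)


def total(values: list) -> int:
    return sum(values)


def passthrough(item: FakeDataItem):
    return item


data_item = FakeDataItem([1, 2, 3])
total_result = call_with_parsed_inputs(total, {"values": data_item})
same_item = call_with_parsed_inputs(passthrough, {"item": data_item})
```

Here `total` receives a real `list` because its hint is in the registry, while `passthrough` receives the untouched data item, mirroring the fallback described above.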

```{admonition} Tip
Recommended means for passing data to the MLRun function:
1. Using `inputs` with type hints for most use cases.
2. If the type is not supported in MLRun, use `mlrun.DataItem`. 
3. If MLRun does not support the remote file location, use a regular parameter. 
```

Pass the data as input using `type_hint`:

```python
def pass_as_input_using_type_hint(df: pd.DataFrame):
```

Because the parameter **has a type hint**, MLRun automatically parses the input to the hinted type. If the hinted type is not known to MLRun, an `mlrun.DataItem` is passed instead.

In [12]:
pass_as_input_using_type_hint_run = notebook_functions.run(
    handler="pass_as_input_using_type_hint",
    inputs={"df": DATA_PATH},
    local=True,
)

> 2023-03-07 13:00:34,580 [info] starting run notebook-functions-pass_as_input_using_type_hint uid=ece9cda8156f40889a9d5bd7c906d50b DB=http://mlrun-api:8080
Sum: 5020.160244750246


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passing-and-logging-with-mlrun-guyl | ...c906d50b | 0 | Mar 07 13:00:34 | completed | notebook-functions-pass_as_input_using_type_hint | v3io_user=guyl, kind=, owner=guyl, host=jupyter-guyl-65dfdbf79-tm98j | df | | | |

> 2023-03-07 13:00:34,822 [info] run executed, status=completed


<a id="section5"></a>
## Log data and result artifacts to MLRun

Log data using the `returns` keyword argument in `run`.

`returns` is a keyword argument of the `run` method of an MLRun function. It makes the logging mechanism generic and 
dynamic: your code does not even need to import MLRun in order to log its returned values to MLRun. 

The `returns` argument uses the decorator under the hood; its value is expected to be the same as the decorator's `outputs`: 
a list of log hints.

You can see all the logged artifacts and results under **artifacts** and **results** in the run summary table.

```{admonition} Note
The length of the log-hints list must be equal to the number of returned values. If the function returns 3 objects `a, b, 
c`, then the log-hints list must contain 3 log hints (for example: `["a", "b: dataset", "c: result"]`).
```
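
The pairing rule in the note can be sketched in plain Python. This is a simplified illustration, not how MLRun or its decorator is implemented: `run_and_collect` is a hypothetical helper that calls an unmodified handler, checks that the number of returned values matches the number of log hints, and maps each hint key to its value.

```python
from typing import Any, Callable, Dict, List


def run_and_collect(handler: Callable, returns: List[str], **kwargs) -> Dict[str, Any]:
    """Hypothetical helper: pair each returned value with its log hint key."""
    values = handler(**kwargs)
    if not isinstance(values, tuple):
        values = (values,)  # a single return value counts as one object
    if len(values) != len(returns):
        raise ValueError(
            f"got {len(values)} returned values but {len(returns)} log hints"
        )
    # Use the part before ':' as the key; the rest would be the artifact type.
    return {hint.split(":")[0].strip(): value for hint, value in zip(returns, values)}


def add_one(numbers):
    # Plain user code: nothing here knows about logging.
    shifted = [n + 1 for n in numbers]
    return shifted, sum(shifted)


logged = run_and_collect(
    add_one, returns=["returns_data", "returns_result"], numbers=[1, 2, 3]
)

# One hint for two returned values violates the rule and is rejected.
try:
    run_and_collect(add_one, returns=["only_one_hint"], numbers=[1, 2, 3])
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)
```

Note that `add_one` contains no logging code at all; in this sketch everything about naming the outputs lives in the `returns` list supplied by the caller.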

```{admonition} Tip
Recommended means for logging data and result artifacts:
1. Use `returns`: it is the most generic option and doesn't involve changing your code.
2. Use the decorator for additional functionality (debugging, registering labels, and more).
3. Use the context if the configurations set in the default loggers of the decorator are not enough.
```

The code does not need any editing. Notice the `returns` argument used in the next cell:

```python
log_with_returns_run = notebook_functions.run(
    ...
    returns=["returns_data", "returns_result"],
)
```

Because the default artifact types are suitable here, the log hints do not specify an artifact type (there is no `:` in them).

In [15]:
log_with_returns_run = notebook_functions.run(
    handler="log_with_returns",
    inputs={"df": DATA_PATH},
    returns=["returns_data", "returns_result"],
    local=True,
)

> 2023-03-07 13:00:35,771 [info] starting run notebook-functions-log_with_returns uid=d2afb7eabb3c4d20a6671a1078b158db DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passing-and-logging-with-mlrun-guyl | ...78b158db | 0 | Mar 07 13:00:35 | completed | notebook-functions-log_with_returns | v3io_user=guyl, kind=, owner=guyl, host=jupyter-guyl-65dfdbf79-tm98j | df | | returns_result=15020.160244750246 | returns_data |

> 2023-03-07 13:00:36,173 [info] run executed, status=completed
