
[Package] Implement mlrun.package #3333

Merged: 36 commits merged into mlrun:development on Jun 14, 2023

Conversation

@guy1992l (Member) commented Mar 26, 2023

ML-3474

Summary

Implementation of mlrun.package - a sub-package in MLRun for packing returned outputs and logging them to MLRun, and for unpacking inputs, parsing data items to their required types.

Motivation

  • Full auto experiment tracking and reproducibility.
    • Each function and step in a workflow is logged.
    • Pipelines can be rerun from a chosen step with full reproducibility, as each step is logged automatically.
  • A better and faster learning curve for using MLRun.
    • No need to know about Artifacts, DataItems and context (MLClientCtx).
    • MLRun is integrated into existing code without changing it, making passing objects between runtimes a transparent, pythonic experience.
  • Uniformity of artifacts between users and projects.
    • Users can pass artifacts and use them as inputs to any function without editing it.
    • Artifacts can be imported and exported between projects.

Main Features

Passing objects between runtimes

To log artifacts and results, users simply need to return them from the handler. Since most function-based code already does this, using MLRun won't require further changes to your code; you only specify log hints in the returns argument of run, and the outputs will be logged according to them.

Parsing inputs won't require any special handling to get an artifact back into its original type. Users simply need to add type hints to their inputs, and MLRun will take care of the rest.

Moreover, if a packaged input is passed, a type hint is not required, as each package is labeled by the manager with its original type, artifact type and packager.
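For illustration, a minimal sketch of the intended experience (the handler, keys and store path are hypothetical, and exact signatures may differ):

# prep.py - plain Python, no MLRun objects in the handler itself
import pandas as pd

def prep_data(dataset: pd.DataFrame, factor: int = 2):
    # the type hint lets MLRun parse the incoming DataItem into a DataFrame
    return dataset * factor, factor

# driver (sketch): log hints in `returns` tell MLRun how to package each output
import mlrun

func = mlrun.code_to_function("prep", filename="prep.py", kind="job", image="mlrun/mlrun")
run = func.run(
    handler="prep_data",
    inputs={"dataset": "store://artifacts/my-project/my-dataset"},
    returns=["processed_data:dataset", "factor:result"],
)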

Implementation of a DefaultPackager

Users are welcome to implement the Packager abstract class, but a much more convenient and flexible implementation is added via the DefaultPackager. This class has two advantages (a sketch of a custom packager follows the list):

  • It implements all the required abstract methods of Packager and adds simple logic, so users can implement their own packagers with an easier and more automatic experience.
  • It is used as the default packager (hence its name) and will pickle objects that do not have a builtin packager in MLRun, making the mechanism more robust.
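For illustration, a sketch of a custom packager built on DefaultPackager, using the class attributes (PACKABLE_OBJECT_TYPE, DEFAULT_ARTIFACT_TYPE, PRIORITY) and the pack_<artifact type> classmethod pattern that appear elsewhere in this PR; the ImageSet type and the method body are hypothetical:

from mlrun.package import DefaultPackager

class ImageSet:
    # hypothetical user type, stands in for any custom object
    pass

class ImageSetPackager(DefaultPackager):
    PACKABLE_OBJECT_TYPE = ImageSet  # the Python type this packager handles
    DEFAULT_ARTIFACT_TYPE = "object"  # fall back to pickling when no hint is given
    PRIORITY = 3  # custom packagers default to 3 (1 is highest, 10 is lowest)

    @classmethod
    def pack_imageset(cls, obj: ImageSet, key: str):
        # handles the custom "imageset" artifact type, e.g. write the images
        # to a temporary directory and log it as a directory artifact
        pass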

Log arbitrary number of objects

Users can prefix the log hint key with a Python unpacking operator:

  • "*" for a list
  • "**" for a dictionary

This tells MLRun that the returned object is in fact an arbitrary number of objects, each of which should be logged separately as its own artifact. See the sketch below.
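For illustration, a sketch of both prefixes, following the key-naming behavior exercised by the tests in this PR:

def train():
    return [0.91, 0.88]  # run with returns=["*metric_"] -> logged as metric_0, metric_1

def evaluate():
    return {"accuracy": 0.91, "f1": 0.88}  # run with returns=["**score_"] -> score_accuracy, score_f1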

Clearing mechanism

Each packager can specify which outputs to clean up at the end of the run, reducing the garbage produced by each run.
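A minimal sketch of the idea, mirroring the add_future_clearing_path / get_future_clearing_path_list methods discussed later in this review:

import os
import tempfile
from typing import List

class MyPackager:  # stands in for a Packager subclass (sketch)
    _CLEARING_PATH_LIST: List[str] = []

    @classmethod
    def add_future_clearing_path(cls, path: str):
        # register a path to delete once the run finishes logging
        cls._CLEARING_PATH_LIST.append(path)

    @classmethod
    def get_future_clearing_path_list(cls) -> List[str]:
        return cls._CLEARING_PATH_LIST

    @classmethod
    def pack_file(cls, obj: str, key: str) -> str:
        fd, temp_path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(obj)
        cls.add_future_clearing_path(temp_path)  # cleaned up after the run
        return temp_path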

Custom packagers

As described in the DefaultPackager section, users can write their own packagers and add them to their project, and MLRun will use them for logging outputs and parsing inputs of their special Python objects. Each custom packager is added to a project with an is_mandatory flag that specifies whether the packager must be collected for a run. A mandatory packager that fails to be collected will raise an error.
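For illustration, a sketch of registering a custom packager on a project; the add_custom_packager method name and the module path are assumptions here:

import mlrun

project = mlrun.get_or_create_project("my-project", context="./")
project.add_custom_packager(
    "my_packagers.ImageSetPackager",  # module path, collected on each run
    is_mandatory=True,  # raise an error if this packager cannot be collected
)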

Priority

Packagers have a priority from 1 (most important) to 10 (least important), so new custom packagers for the same types can be added on top of the current ones.

Added Packagers

  • builtins
    • int
    • float
    • bool
    • str
    • dict
    • list
    • tuple
    • set
    • frozenset
    • bytes
    • bytearray
  • pathlib
    • Path (and all inheriting classes)
  • numpy
    • numpy.ndarray
    • numpy.number (and all inheriting classes)
    • list[numpy.ndarray]
    • dict[str, numpy.ndarray]
  • pandas
    • pandas.Series
    • pandas.DataFrame

@theSaarco (Member) left a comment:

Looks like a very good start. I have comments, but in general the logical flow seems fine and makes sense. I think that even at this stage in the process you must provide some tests, or at least a testing framework that will have more tests added later; otherwise we have a huge chunk of code that is practically not verified in any way.

# See the License for the specific language governing permissions and
# limitations under the License.
#
class ArtifactTypes:
Member:

Suggested change
class ArtifactTypes:
class ArtifactType(Enum):

Better to have these classes defined as Enum. Same for LogHintKeys.

@guy1992l (Member Author) commented Mar 26, 2023:

Enums require calling x.value on them to get the string value, and because users may define their own types (for example: pack_imageset for the artifact type "imageset"), I decided to remove the Enum and leave it as a collection of commonly used options. Same for log hints.

Member:

Well, you don't have to use the string representation for comparison; you can do, for example, assert my_value == ArtifactType.SOME_TYPE (the Enum class actually creates a singleton per included value). You only need string comparison if you serialize the value.
Also, what do you mean by "their own types"? I thought these types are only for artifacts, so users are not expected to just add a new artifact type out of the blue.
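For illustration (not code from the PR), Enum members compare by identity without touching .value:

from enum import Enum

class ArtifactType(Enum):
    DATASET = "dataset"
    RESULT = "result"

value = ArtifactType.DATASET
assert value == ArtifactType.DATASET  # members are singletons, no .value needed
assert value.value == "dataset"  # .value is only needed when serializing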

Member Author:

  • The value is serialized in the function YAML and the runtime.
  • We have Artifact, DirArtifact, DatasetArtifact and ModelArtifact (the plots are basically artifacts with a body), so a user can decide to add a pack_image_dataset to specially handle how image datasets are logged, and could then use a log hint like "my_imageset: image_dataset". It's not really an artifact type by the technical MLRun definition, but it can seem like one to an MLRun user.

@@ -20,7 +20,7 @@
"code_to_function",
"import_function",
"handler",
"ArtifactType",
"ArtifactTypes",
Member:

Don't agree with this. See https://stackoverflow.com/questions/1335959/singular-or-plural-for-enumerations - when using an enum you should use singular terms. Only if referring to bit-fields or values that can be or-ed together should you use plural.
Having said that, we don't really follow this everywhere, but that's a different story - we should go over the code-base and correct where things are not done properly.

Member Author:

I'll change that 👍

@@ -83,6 +83,7 @@ class ArtifactSpec(ModelObj):
"size",
"db_key",
"extra_data",
"package_instructions",
Member:

Suggested change
"package_instructions",
"packaging_instructions",

Sounds more correct to me, I think...

Member Author:

works for me, changed

mlrun/package/context_handler.py (resolved)
)
)
continue
parsed_args.append(argument)
Member:

I prefer using an else: here, rather than a continue, but that may be just my inner C++ talking...

Member Author:

changed.

mlrun/package/packagers_manager.py (outdated, resolved)
mlrun/package/packagers_manager.py (outdated, resolved)
self._packagers: typing.List[Packager] = []

# Collect the builtin standard packagers:
self._collect_packagers()
Member:

Suggest logging the list of available packagers here after the collection; it will ease debugging etc.

Member Author:

will do

results: dict,
) -> typing.Union[Artifact, str, int, float, None]:
"""
Look for an extra data item (artifact of result) by given key. If not found, None is returned.
Member:

Suggested change
Look for an extra data item (artifact of result) by given key. If not found, None is returned.
Look for an extra data item (artifact or result) by given key. If not found, None is returned.

@Tankilevitch (Contributor) left a comment:

Very good and detailed PR @guy1992l !
Well done with the test coverage!
Added some comments and suggestions

mlrun/artifacts/base.py (outdated, resolved)
mlrun/model.py (outdated, resolved)
mlrun/package/common.py (outdated, resolved)
mlrun/package/common.py (outdated, resolved)
mlrun/package/common.py (outdated, resolved)
Comment on lines 577 to 581
if missing_arguments:
logger.warn(
f"Missing {arguments_type} for {cls.__name__}: {', '.join(missing_arguments)}. The packager won't "
f"handle the object."
)
Contributor:

Worth adding a call to action for what the user should do, like adding the missing arguments?

Member Author:

Great idea! adding it 👍

Contributor:

so you decided to fail instead?

Member Author:

It logs a warning message and skips the object, as mandatory arguments are missing. The context handler will ignore the exception so as not to drop the run.

Comment on lines 46 to 55
class StrPackager(DefaultPackager):
PACKABLE_OBJECT_TYPE = str
DEFAULT_ARTIFACT_TYPE = ArtifactType.RESULT

class ArchiveFormats(Enum):
ZIP = "zip"
TAR_GZ = "tar.gz"

@classmethod
def pack_file(cls, obj: str, key: str):
Contributor:

IIUC it is a best effort where the StrPackager will try to resolve whether the returned string is a file, and if so it will zip it; if not, it will fall back to the DefaultPackager and just pickle it? Or am I wrong? Either way, I think it would be nice to add some more description to this packager.

Member Author:

Basically the user can pass the artifact type as 'file', 'directory', 'result' or 'object' for strings. But same as above: all of the packagers are yet to be implemented. The current ones are just to convey the idea; I waited for a full review before proceeding to implement them.

mlrun/package/packagers_manager.py (outdated, resolved)
Comment on lines 98 to 100
Collect the provided packagers. Packagers passed as module paths will be imported and validated to be of type
`Packager`. If needed to import all packagers from a module, use the module path with a "*" at the end.

Contributor:

Might be worth stating what order the packagers collected via "*" will be added in.

Member Author:

The order in which they are collected is printed in [debug]. I think that's enough for now. If a user would like to specify the order, they can use the specific module paths and not the "*".

Contributor:

But the order will be the order in which they are defined at the module level, right?

Member Author:

yes

@Tankilevitch (Contributor) left a comment:

Looks good overall, minor stuff left, mainly on the tests side.

Comment on lines 297 to 346

@pytest.mark.parametrize(
"test_tuple",
[
(
"*list_",
[0.12111, 0.56111],
{"list_0": 0.12, "list_1": 0.56},
),
(
"*set_",
{0.12111, 0.56111},
{"set_0": 0.12, "set_1": 0.56},
),
(
"*",
(0.12111, 0.56111),
{"0": 0.12, "1": 0.56},
),
(
"*error",
0.12111,
"The log hint key '*error' has an iterable unpacking prefix ('*')",
),
(
"**dict_",
{"a": 0.12111, "b": 0.56111},
{"dict_a": 0.12, "dict_b": 0.56},
),
("**", {"a": 0.12111, "b": 0.56111}, {"a": 0.12, "b": 0.56}),
(
"**error",
0.12111,
"The log hint key '**error' has a dictionary unpacking prefix ('**')",
),
],
)
def test_arbitrary_log_hint(
test_tuple: Tuple[
str, Union[List[float], Dict[str, float]], Union[Dict[str, float], str]
]
):
"""
Test the arbitrary log hint key prefixes "*" and "**".

:param test_tuple: A tuple of the key to use in the log hint, the object to pack and the expected results that
should be packed. A string means an error should be raised.
"""
# Prepare the test:
key, obj, expected_results = test_tuple
Contributor:

Suggested change (replacing the block quoted above with):
@pytest.mark.parametrize(
"key,obj,expected_results,expected_error",
[
(
"*list_",
[0.12111, 0.56111],
{"list_0": 0.12, "list_1": 0.56},
),
(
"*set_",
{0.12111, 0.56111},
{"set_0": 0.12, "set_1": 0.56},
),
(
"*",
(0.12111, 0.56111),
{"0": 0.12, "1": 0.56},
),
(
"*error",
0.12111,
"The log hint key '*error' has an iterable unpacking prefix ('*')",
),
(
"**dict_",
{"a": 0.12111, "b": 0.56111},
{"dict_a": 0.12, "dict_b": 0.56},
),
("**", {"a": 0.12111, "b": 0.56111}, {"a": 0.12, "b": 0.56}),
(
"**error",
0.12111,
"The log hint key '**error' has a dictionary unpacking prefix ('**')",
),
],
)
def test_arbitrary_log_hint(
key: str, object: ... , expected_results: Dict[str, float] , expected_error: str = None
):
"""
Test the arbitrary log hint key prefixes "*" and "**".
:param key: key to use in the log hint
:param object: the object to pack
:param expected_results: expected results that should be packed. empty if error is expected
:param expected_error: error message if expected error, empty otherwise
"""

Comment on lines 23 to 26
@pytest.mark.parametrize(
"test_tuple",
[
("some_key", {LogHintKey.KEY: "some_key"}),
Contributor:

same

Comment on lines 27 to 29
@pytest.mark.parametrize(
"test_tuple",
[
Contributor:

same

@@ -29,13 +30,14 @@ class SomeClass:


@pytest.mark.parametrize(
"typing_type_test",
"test_tuple",
Contributor:

same

@theSaarco (Member) left a comment:

Very impressive framework, with a lot of options.
I do have some comments and thoughts around naming of some items, and one functionality concern around the usage when handling Unions.

pass


class MLRunPackagePackagerCollectionError(MLRunPackageError):
Member:

Suggested change
class MLRunPackagePackagerCollectionError(MLRunPackageError):
class MLRunPackageCollectionError(MLRunPackageError):

Package-packager seems like a redundancy here...

if is_mandatory:
raise error
else:
self._context.logger.debug(
Member:

I think there's a difference between the extended packagers, which are not mandatory but are there by default due to MLRun code, and packagers that the user added to the project and declared as non-mandatory. In the former case I would indeed keep this as debug, but if the user actually went through the effort of registering the packagers in the project and they're missing, then I would change this to a warning at least.

Member Author:

I see what you meant, I'll edit it

mlrun/package/packager.py (resolved)
pass

@classmethod
def future_clear(cls, path: str):
Member:

Suggested change
def future_clear(cls, path: str):
def add_future_clearing_path(cls, path: str):

Indicates better that you're just adding a path to clear, not replacing the existing list.

cls._CLEARING_PATH_LIST.append(path)

@classmethod
def get_clearing_path_list(cls) -> List[str]:
Member:

Suggested change
def get_clearing_path_list(cls) -> List[str]:
def get_future_clearing_path_list(cls) -> List[str]:

To align with everywhere else where we reference them as "future" paths.


@classmethod
def pack_dataset(cls, obj: pd.Series, key: str, fmt: str = "parquet"):
pass
Member:

Is this a TODO?

Member Author:

Yes, it's just to get the layout done and ready for the next commits.

Comment on lines 166 to 168
return set(
list(
itertools.chain(
Member:

Do you have to go through list and then set over the chained results? As chain() returns an iterator, can't you directly create a set from it?

Member Author:

completely right!
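For illustration (a sketch, not code from the PR):

import itertools

# set() consumes the chained iterator directly; no intermediate list needed
merged = set(itertools.chain(["a", "b"], ["b", "c"]))
assert merged == {"a", "b", "c"}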

Comment on lines 439 to 448
while len(possible_type_hints) > 0:
for type_hint in possible_type_hints:
for packager in self._packagers:
if packager.is_packable(
object_type=type_hint, artifact_type=artifact_type
):
return packager
# Reduce the type hint list and continue:
possible_type_hints = TypeHintUtils.reduce_type_hint(
type_hint=possible_type_hints
Member:

This loop got me a little concerned about how this really works with Union objects, and how to really support them. The issue is - let's say I have an object whose declared type is Union[X, Y] and I have a packager that supports only the X type. Given the current implementation, this packager will be chosen to handle this object since X is one of the types in the hint, but at runtime the actual object may very well be of type Y, which is not supported.
So, I think there are some options here:

  1. Don't support Union (probably not a good thing)
  2. Support packaging a Union only if the packager supports all the types that the type hint translates to (instead of the current any) - of course, a packager that supports ... will work as well since it supports all types

WDYT?

@guy1992l (Member Author) commented Apr 18, 2023:

As we discussed, your concern is totally justified, but the suggested solutions are not my favorite:

  1. Agree it won't be a good thing.
  2. IMO a packager should focus on a single type, not a list of types. If needed, a user can write a specific packager with the exact Union[X, Y] as the packing type.

The more I thought about it, the more I wanted to go back to my previous solution: we will try to unpack as each of the hinted types in the union, and only if all unpacking attempts failed will we raise an error. There are two scenarios where this would apply:

  1. The user is using an old artifact or a path as the input, so it was not packaged and we don't have the original type - it makes sense to simply try all options until one works (the user's code should support whatever works, as the type hint suggested is a union...).
  2. The user did use a packaged artifact, but used a type hint that the original object type does not equal or belong to (in the case of a union). In this case, we warn the user and try again to match whatever works.

WDYT?
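A sketch of that fallback, with unpack_as standing in (hypothetically) for the real per-type unpacking logic:

import typing

def unpack_as(data_item, hint):
    # hypothetical stand-in for the real per-packager unpacking logic
    return hint(data_item)

def unpack_union(data_item, type_hint):
    # try every hinted type in the union; raise only if all of them fail
    hints = typing.get_args(type_hint) or (type_hint,)
    errors = []
    for hint in hints:
        try:
            return unpack_as(data_item, hint)
        except Exception as error:
            errors.append(error)
    raise ValueError(f"Could not unpack as any of {hints}: {errors}")

assert unpack_union("42", typing.Union[int, float]) == 42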

Comment on lines 49 to 51
@classmethod
def is_packable(cls, object_type: Type, artifact_type: str = None) -> bool:
return object_type is cls.PACKABLE_OBJECT_TYPE and artifact_type == "result"
Member:

Can't this (or a similar) implementation be included in the base class (Packager)? I would expect the default implementation to be:

return object_type is cls.PACKABLE_OBJECT_TYPE and artifact_type in cls.get_supported_artifact_types()

It doesn't support ... as supported types, but we don't expect most packagers to support it, and if you want to support it then override the implementation (like the default-packager does). Having this in Packager will remove the need to implement it per packager.

Member Author:

added :)

pass


class PackagerB(DefaultPackager):
Member:

At the end of the day, I'm a bit confused as to when I should inherit from Packager and when I need to extend DefaultPackager. Should I extend DefaultPackager when I want to support all types and implement a new packaging format? Or is there another reasoning?

Member Author:

DefaultPackager gives you quality-of-life features, like already implementing logging as a result and as an object (pickling), and it also implements all the abstract methods for you automatically. Preferably a user would inherit from DefaultPackager, but we don't force them to; they can start from scratch by inheriting from Packager.

@quaark (Member) left a comment:

RunDB Mock Changes look good :)

@theSaarco (Member) left a comment:

Looks very good and very comprehensive. I still have some minor issues with some SDKs and function signatures and such - please take a look.

# Whether to treat returned tuples from functions as a tuple and not as multiple returned items. If True, all
# returned values will be packaged together as the tuple they are returned in. Default is False to enable
# logging multiple returned items.
"pack_tuples": False,
Member:

Won't this possibly be confusing if someone set it to True? I mean, this has to be in-sync with your returns
(or outputs in the decorator) configuration for handlers, and if you modify this then existing code will break.
I'm just wondering why this configuration is needed, is there a use-case for it?

Member Author:

It is synced. If you mark it as True you must provide one log hint, otherwise the others will be ignored... it makes sense. The use-case is someone who returns a tuple and wants to log it. Not common, as usually you return lists, but I made it a config option to cover all possibilities. I can delete it as well if you think it is redundant :)

@@ -467,6 +470,19 @@ def show(self, format=None):
else:
logger.error(f"unsupported show() format {suffix} for {self.url}")

def is_artifact(self) -> Union[str, bool]:
Member:

Suggested change
def is_artifact(self) -> Union[str, bool]:
def get_artifact_type(self) -> Union[str, None]:

I would expect a is_XXX function to return bool. This signature is more coherent - if the item is not an artifact then return None.

Member Author:

Done


* `str` - A string in the format of '{key}:{artifact_type}'. If a string was given without ':' it
will indicate the key and the artifact type will be according to the returned value type. The
artifact types can be one of: "dataset", "directory", "file", "object", "plot" and "result".
Member:

Please add a little more explanation of how result differs from the other types of artifacts. Also - what happens if I mark more than one output as result - is this allowed? And if not, is it checked?

Member Author:

Of course it is allowed. I need to rewrite this docstring; I'll make it clearer and more up to date.

* None - Do not log the output.

The list length must be equal to the total amount of returned values from the function. Default is
None - meaning no outputs will be logged.
Member:

This is one place where changing the pack_tuples config will break existing decorators, since the list length changes due to it...

Member Author:

It won't break; non-matching lengths between returned values and log hints print a warning, and whatever cannot be logged is simply ignored.


import mlrun

@mlrun.handler(outputs=["my_array", None, "my_multiplier"])
Member:

Maybe give another example with dict outputs, or at least one with key:type annotation.

Member Author:

Done

:param kwargs: The keyword arguments dictionary passed to the function.
:param type_hints: An ordered dictionary of the expected types of arguments.

:returns: The parsed args (kwargs are parsed inplace).
Member:

Why are args returned, while kwargs are parsed inplace?

Member Author:

Because kwargs is a dictionary and is passed by reference, changing it changes the original; args is a tuple, so it cannot be edited in place.
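For illustration:

def parse(args: tuple, kwargs: dict) -> tuple:
    kwargs["x"] = "parsed"  # mutates the caller's dict in place
    return tuple("parsed" for _ in args)  # tuples are immutable, so build a new one

args, kwargs = ("raw",), {"x": "raw"}
parsed_args = parse(args, kwargs)
print(args, kwargs, parsed_args)  # ('raw',) {'x': 'parsed'} ('parsed',)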

``is_packable`` and ``repr`` methods. An ellipses (`...`) means any type.
* ``PRIORITY`` - The priority of this packager among the rest of the packagers. Should be an integer between 1-10
where 1 is the highest priority and 10 is the lowest. If not set, a default priority of 5 is set for MLRun
builtin packagers and 3 for user custom packagers.
Member:

But there's no validation whatsoever, right? I can set my PRIORITY to 42, -200, 3.1415926535 or whatever else, right?

Member Author:

Well, yeah. It won't do anything wrong... I can validate it if you think it's important.


mlrun.mlconf.packagers.pack_tuple = True

Or more correctly, cast your returned tuple to a ``list`` like so::
Member:

I would recommend this way more than using the one above, for reasons I mentioned in other comments...
Could also do return [my_tuple].

Member Author:

Again, it's for the extreme user who really wants to log tuples for some reason... I can remove it in a minute.

@guy1992l guy1992l closed this Jun 13, 2023
@guy1992l guy1992l reopened this Jun 13, 2023
@liranbg liranbg merged commit 9050b38 into mlrun:development Jun 14, 2023
13 of 14 checks passed
yaelgen pushed a commit to yaelgen/mlrun that referenced this pull request Jun 14, 2023