[Package] Implement mlrun.package
#3333
Conversation
Looks like a very good start. I have comments, but in general the logical flow seems fine and makes sense. I think even at this stage in the process you must provide some tests, or at least a testing framework that will have more tests added later - otherwise we have a huge chunk of code that is practically not verified in any way.
mlrun/package/constants.py
Outdated
# See the License for the specific language governing permissions and
# limitations under the License.
#
class ArtifactTypes:
Suggested change:
- class ArtifactTypes:
+ class ArtifactType(Enum):

Better to have these classes defined as `Enum`. Same for `LogHintKeys`.
Enums require doing `x.value` on them to get the string value, and because users may define their own types (for example: `pack_imageset` for artifact type `"imageset"`), I decided to remove the `Enum` and leave it as a collection of commonly used options. Same for log hints.
Well, you don't have to use the string representation for comparison. For example, you can do `assert my_value == ArtifactType.SOME_TYPE` (the `Enum` class actually creates a singleton per included value). You only need string comparison if you serialize the value.
Also, what do you mean by "their own types"? I thought these types are only for artifacts, so users are not expected to just add a new artifact type out of the blue.
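The comparison point can be illustrated with a minimal sketch (the member names here are illustrative, not MLRun's actual ones):

```python
from enum import Enum


# Illustrative Enum (hypothetical member names, not MLRun's actual ones).
class ArtifactType(Enum):
    RESULT = "result"
    FILE = "file"


my_value = ArtifactType.RESULT
# Members are singletons, so comparison works without .value:
assert my_value == ArtifactType.RESULT
assert my_value is ArtifactType.RESULT
# .value is only needed when serializing:
assert my_value.value == "result"
```

Deserializing works in the other direction as well: `ArtifactType("result")` returns the same singleton member.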
- The value is serialized in the function yaml and the runtime.
- We have `Artifact`, `DirArtifact`, `DatasetArtifact` and `ModelArtifact` (the plots are basically artifacts with a body), so the user can decide to add a `pack_image_dataset` method to specially handle how to log image datasets, and could use a log hint like `"my_imageset: image_dataset"`, for example. It's not really an artifact type by the technical MLRun definition, but it can be seen as such by a user of MLRun.
mlrun/__init__.py
Outdated
@@ -20,7 +20,7 @@
     "code_to_function",
     "import_function",
     "handler",
-    "ArtifactType",
+    "ArtifactTypes",
Don't agree with this. See https://stackoverflow.com/questions/1335959/singular-or-plural-for-enumerations - when using an enum you should use singular terms. Only when referring to bit-fields or values that can be OR-ed together should you use plural.
Having said that, we don't really follow this everywhere, but that's a different story - we should go over the code-base and correct where things are not done properly.
I'll change that 👍
mlrun/artifacts/base.py
Outdated
@@ -83,6 +83,7 @@ class ArtifactSpec(ModelObj):
     "size",
     "db_key",
     "extra_data",
+    "package_instructions",
Suggested change:
- "package_instructions",
+ "packaging_instructions",

Sounds more correct to me, I think...
works for me, changed
mlrun/package/context_handler.py
Outdated
        )
    )
    continue
parsed_args.append(argument)
I prefer using an `else:` here, rather than a `continue`, but that may be just my inner C++ talking...
changed.
mlrun/package/packagers_manager.py
Outdated
self._packagers: typing.List[Packager] = []

# Collect the builtin standard packagers:
self._collect_packagers()
Suggest logging the list of available packagers here after the collection - it will ease debugging etc.
will do
mlrun/package/packagers_manager.py
Outdated
    results: dict,
) -> typing.Union[Artifact, str, int, float, None]:
    """
    Look for an extra data item (artifact of result) by given key. If not found, None is returned.
Suggested change:
- Look for an extra data item (artifact of result) by given key. If not found, None is returned.
+ Look for an extra data item (artifact or result) by given key. If not found, None is returned.
Very good and detailed PR @guy1992l !
Well done with the test coverage!
Added some comments and suggestions
if missing_arguments:
    logger.warn(
        f"Missing {arguments_type} for {cls.__name__}: {', '.join(missing_arguments)}. The packager won't "
        f"handle the object."
    )
Worth adding a call to action for what the user should do, like adding the missing arguments?
Great idea! adding it 👍
so you decided to fail instead?
It logs a warning message and skips the object since mandatory arguments are missing. The context handler will ignore the exception so as not to drop the run.
class StrPackager(DefaultPackager):
    PACKABLE_OBJECT_TYPE = str
    DEFAULT_ARTIFACT_TYPE = ArtifactType.RESULT

    class ArchiveFormats(Enum):
        ZIP = "zip"
        TAR_GZ = "tar.gz"

    @classmethod
    def pack_file(cls, obj: str, key: str):
IIUC it is like a best effort, where the `StrPackager` will try to resolve whether the returned string is a file, and if so it will zip it, and if not it will fall back to the `DefaultPackager` and just pickle it? Or am I wrong? Either way, I think it would be nice to add some more description to this packager.
Basically the user can pass the artifact type to be 'file', 'directory', 'result' or 'object' for strings. But same as above: all of the packagers are yet to be implemented. The current ones are just to convey the idea. I waited for a full review before proceeding to implement them.
mlrun/package/packagers_manager.py
Outdated
Collect the provided packagers. Packagers passed as module paths will be imported and validated to be of type
`Packager`. If needed to import all packagers from a module, use the module path with a "*" at the end.
Might be worth stating the order in which the packagers from `*` will be added.
The order they are collected in is printed in debug. I think for now this is enough. If a user would like to specify the order, he can use the specific module path and not the `*`.
But they will be collected in the order they are defined at the module level, right?
yes
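A quick sketch of why that holds - a module's `__dict__` preserves insertion order (Python 3.7+), so classes are collected in the order they appear in the module body (throwaway demo module, not MLRun code):

```python
import inspect
import types

# Build a throwaway module with three classes defined in a known order.
mod = types.ModuleType("demo_packagers")
exec("class A: pass\nclass B: pass\nclass C: pass", mod.__dict__)

# Collecting classes via vars() yields them in definition order, because
# a module's namespace dict keeps insertion order.
collected = [obj for obj in vars(mod).values() if inspect.isclass(obj)]
assert [c.__name__ for c in collected] == ["A", "B", "C"]
```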
Looks good overall; minor stuff left, mainly on the tests side.
@pytest.mark.parametrize(
    "test_tuple",
    [
        (
            "*list_",
            [0.12111, 0.56111],
            {"list_0": 0.12, "list_1": 0.56},
        ),
        (
            "*set_",
            {0.12111, 0.56111},
            {"set_0": 0.12, "set_1": 0.56},
        ),
        (
            "*",
            (0.12111, 0.56111),
            {"0": 0.12, "1": 0.56},
        ),
        (
            "*error",
            0.12111,
            "The log hint key '*error' has an iterable unpacking prefix ('*')",
        ),
        (
            "**dict_",
            {"a": 0.12111, "b": 0.56111},
            {"dict_a": 0.12, "dict_b": 0.56},
        ),
        ("**", {"a": 0.12111, "b": 0.56111}, {"a": 0.12, "b": 0.56}),
        (
            "**error",
            0.12111,
            "The log hint key '**error' has a dictionary unpacking prefix ('**')",
        ),
    ],
)
def test_arbitrary_log_hint(
    test_tuple: Tuple[
        str, Union[List[float], Dict[str, float]], Union[Dict[str, float], str]
    ]
):
    """
    Test the arbitrary log hint key prefixes "*" and "**".

    :param test_tuple: A tuple of the key to use in the log hint, the object to pack and the expected results that
                       should be packed. A string means an error should be raised.
    """
    # Prepare the test:
    key, obj, expected_results = test_tuple
Suggested change (parametrize over separate fields instead of a single tuple):

@pytest.mark.parametrize(
    "key,obj,expected_results,expected_error",
    [
        (
            "*list_",
            [0.12111, 0.56111],
            {"list_0": 0.12, "list_1": 0.56},
        ),
        (
            "*set_",
            {0.12111, 0.56111},
            {"set_0": 0.12, "set_1": 0.56},
        ),
        (
            "*",
            (0.12111, 0.56111),
            {"0": 0.12, "1": 0.56},
        ),
        (
            "*error",
            0.12111,
            "The log hint key '*error' has an iterable unpacking prefix ('*')",
        ),
        (
            "**dict_",
            {"a": 0.12111, "b": 0.56111},
            {"dict_a": 0.12, "dict_b": 0.56},
        ),
        ("**", {"a": 0.12111, "b": 0.56111}, {"a": 0.12, "b": 0.56}),
        (
            "**error",
            0.12111,
            "The log hint key '**error' has a dictionary unpacking prefix ('**')",
        ),
    ],
)
def test_arbitrary_log_hint(
    key: str, object: ..., expected_results: Dict[str, float], expected_error: str = None
):
    """
    Test the arbitrary log hint key prefixes "*" and "**".

    :param key:              The key to use in the log hint.
    :param object:           The object to pack.
    :param expected_results: The expected results that should be packed; empty if an error is expected.
    :param expected_error:   The error message if an error is expected, empty otherwise.
    """
@pytest.mark.parametrize(
    "test_tuple",
    [
        ("some_key", {LogHintKey.KEY: "some_key"}),
same
tests/package/utils/test_pickler.py
Outdated
@pytest.mark.parametrize(
    "test_tuple",
    [
same
@@ -29,13 +30,14 @@ class SomeClass:


@pytest.mark.parametrize(
-    "typing_type_test",
+    "test_tuple",
same
Very impressive framework, with a lot of options.
I do have some comments and thoughts around naming of some items, and one functionality concern around the usage when handling Unions.
mlrun/package/errors.py
Outdated
pass


class MLRunPackagePackagerCollectionError(MLRunPackageError):
Suggested change:
- class MLRunPackagePackagerCollectionError(MLRunPackageError):
+ class MLRunPackageCollectionError(MLRunPackageError):

Package-packager seems like a redundancy here...
mlrun/package/context_handler.py
Outdated
if is_mandatory:
    raise error
else:
    self._context.logger.debug(
I think there's a difference between the extended packagers which are not mandatory, but are there by default due to MLRun code, and packagers that the user added to the project and declared as non-mandatory. In the former case I would indeed keep this as debug, but if the user actually went through the effort of registering the packagers in the project and they're missing, then I would change this to a warning at least.
I see what you meant, I'll edit it
mlrun/package/packager.py
Outdated
pass

@classmethod
def future_clear(cls, path: str):
Suggested change:
- def future_clear(cls, path: str):
+ def add_future_clearing_path(cls, path: str):

Indicates better that you're just adding a path to clear, not replacing the existing list.
mlrun/package/packager.py
Outdated
cls._CLEARING_PATH_LIST.append(path)

@classmethod
def get_clearing_path_list(cls) -> List[str]:
Suggested change:
- def get_clearing_path_list(cls) -> List[str]:
+ def get_future_clearing_path_list(cls) -> List[str]:

To align with everywhere else where we reference them as "future" paths.
@classmethod
def pack_dataset(cls, obj: pd.Series, key: str, fmt: str = "parquet"):
    pass
Is this TODO?
Yes, it's just to get the layout done and ready for the next commits.
return set(
    list(
        itertools.chain(
Do you have to go through `list` and then `set` over the chained results? As `chain()` returns an iterator, can't you directly create a set from it?
completely right!
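The simplification being agreed on here is easy to verify - `chain()` returns an iterator, so a set can be built from it directly, with no intermediate list:

```python
import itertools

# set() consumes the chain iterator directly; no list() needed.
merged = set(itertools.chain([1, 2], [2, 3]))
assert merged == {1, 2, 3}
```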
mlrun/package/packagers_manager.py
Outdated
while len(possible_type_hints) > 0:
    for type_hint in possible_type_hints:
        for packager in self._packagers:
            if packager.is_packable(
                object_type=type_hint, artifact_type=artifact_type
            ):
                return packager
    # Reduce the type hint list and continue:
    possible_type_hints = TypeHintUtils.reduce_type_hint(
        type_hint=possible_type_hints
This loop got me a little concerned about how this really works with `Union` objects, and how to really support them. The issue is - let's say I have an object whose declared type is `Union[X, Y]` and I have a packager that supports only the `X` type. Given the current implementation, this packager will be chosen to handle this object since `X` is one of the types in the hint, but at runtime the actual object may very well be of type `Y`, which is not supported.
So, I think there are some options here:

- Don't support `Union` (probably not a good thing)
- Support packaging a `Union` only if the packager supports all the types that the type hint translates to (instead of the current any) - of course, a packager that supports `...` will work as well since it supports all types

WDYT?
As we discussed, your concern is totally justified, but the solutions suggested are not my favorite:

1. Agree it won't be a good thing.
2. IMO a packager should focus on a single type, not a list of types. If needed, a user can write a specific packager with the exact `Union[X, Y]` as the packing type.

The more I thought about it, the more I wanted to go back to my previous solution: we will try to unpack as each of the hinted types in the union, and only if all unpacking attempts fail will we raise an error. There are two scenarios where this would apply:

1. The user is using an old artifact or a path as input, so it was not packaged and we don't have the original type - it makes sense to simply try all options until we hit one that works (the user's code should support whatever worked, as the type hint suggested is a union...).
2. The user did use a packaged artifact, but used a type hint that the original object type is not equal to or part of (in case of a union). In this case, we warn the user and try again to match whatever works.

WDYT?
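The "try each hinted type" strategy can be sketched roughly as follows - `unpack_as` is a purely hypothetical stand-in for a packager's unpack call, not MLRun's actual API:

```python
from typing import Union, get_args, get_origin


def unpack_union(data, type_hint, unpack_as):
    """Try to unpack `data` as each type in a Union hint; first success wins."""
    hints = get_args(type_hint) if get_origin(type_hint) is Union else (type_hint,)
    errors = {}
    for hint in hints:
        try:
            return unpack_as(data, hint)
        except Exception as exc:  # collect the failure and try the next hint
            errors[hint] = exc
    # Only raise if *all* hinted types failed, as discussed above:
    raise TypeError(f"Could not unpack as any of {hints}: {errors}")


# A toy unpacker that only knows how to produce ints:
def unpack_as(data, hint):
    if hint is int:
        return int(data)
    raise ValueError(f"unsupported hint: {hint}")


# float is tried first and fails, then int succeeds:
assert unpack_union("3", Union[float, int], unpack_as) == 3
```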
@classmethod
def is_packable(cls, object_type: Type, artifact_type: str = None) -> bool:
    return object_type is cls.PACKABLE_OBJECT_TYPE and artifact_type == "result"
Can't this (or a similar) implementation be included in the base class (`Packager`)? I would expect the default implementation to be:

return object_type is cls.PACKABLE_OBJECT_TYPE and artifact_type in cls.get_supported_artifact_types()

It doesn't support `...` as supported types, but we don't expect most packagers to support it, and if you want to support it then override the implementation (like the default packager does). Having this in `Packager` will remove the need to implement it per packager.
added :)
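A minimal sketch of the suggested default, assuming the attribute and method names used in the discussion (`PACKABLE_OBJECT_TYPE`, `get_supported_artifact_types`); this is not the actual MLRun implementation:

```python
from typing import List, Type


class Packager:
    # Subclasses set the exact type they can pack (hypothetical base class).
    PACKABLE_OBJECT_TYPE: Type = None

    @classmethod
    def get_supported_artifact_types(cls) -> List[str]:
        return ["result", "object"]

    @classmethod
    def is_packable(cls, object_type: Type, artifact_type: str = None) -> bool:
        # Default: exact type match plus a supported artifact type.
        return (
            object_type is cls.PACKABLE_OBJECT_TYPE
            and artifact_type in cls.get_supported_artifact_types()
        )


class StrPackager(Packager):
    PACKABLE_OBJECT_TYPE = str


assert StrPackager.is_packable(str, "result")
assert not StrPackager.is_packable(int, "result")
assert not StrPackager.is_packable(str, "dataset")
```

A packager that wants `...` (any type) semantics would override `is_packable` instead of relying on this default.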
pass


class PackagerB(DefaultPackager):
At the end of the day, I'm a bit confused as to when I should inherit from `Packager` and when I need to extend `DefaultPackager`. Should I extend `DefaultPackager` when I want to support all types and implement a new packaging format? Or is there another reasoning?
The default packager gives you quality-of-life features, like already implementing logging as a result and as an object (pickling), and it also implements all the abstract methods for you automatically. Preferably a user would inherit from the `DefaultPackager`, but we don't force him to - he can start from scratch by inheriting `Packager`.
RunDB Mock Changes look good :)
Looks very good and very comprehensive. I still have some minor issues with some SDKs, function signatures and such - please take a look.
# Whether to treat returned tuples from functions as a tuple and not as multiple returned items. If True, all
# returned values will be packaged together as the tuple they are returned in. Default is False to enable
# logging multiple returned items.
"pack_tuples": False,
Won't this possibly be confusing if someone sets it to `True`? I mean, this has to be in sync with your `returns` (or `outputs` in the decorator) configuration for handlers, and if you modify this then existing code will break.
I'm just wondering why this configuration is needed - is there a use-case for it?
It is synced. If you mark it as `True` you must provide one log hint, otherwise the others would be ignored... It makes sense. The use-case is someone who returns a tuple and wants to log it. Not common, as usually you return lists, but I made it a configuration option to cover all possibilities. I can delete it as well if you think it is redundant :)
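The distinction being debated can be shown in plain Python (a sketch of the described semantics only - the config name comes from the discussion above and nothing here goes through MLRun):

```python
# A handler returning two values actually returns one tuple object:
def handler():
    return 1, 2


outputs = handler()

# pack_tuples=False (default): each element is its own logged item,
# so two log hints are expected.
items = list(outputs)
assert items == [1, 2]

# pack_tuples=True: the whole tuple is a single item, so one log hint
# is expected.
items = [outputs]
assert items == [(1, 2)]
```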
mlrun/datastore/base.py
Outdated
@@ -467,6 +470,19 @@ def show(self, format=None):
    else:
        logger.error(f"unsupported show() format {suffix} for {self.url}")

+   def is_artifact(self) -> Union[str, bool]:
Suggested change:
- def is_artifact(self) -> Union[str, bool]:
+ def get_artifact_type(self) -> Union[str, None]:

I would expect an `is_XXX` function to return `bool`. This signature is more coherent - if the item is not an artifact, then return `None`.
Done
mlrun/package/__init__.py
Outdated
* `str` - A string in the format of '{key}:{artifact_type}'. If a string was given without ':' it
  will indicate the key and the artifact type will be according to the returned value type. The
  artifact types can be one of: "dataset", "directory", "file", "object", "plot" and "result".
Please add a little more explanation of how `result` is different from the other types of artifacts. Also - what happens if I mark more than one output as `result` - is this allowed? And if not, is it checked?
Of course it is allowed. I need to rewrite this docstring - I'll make it clearer and more up to date.
mlrun/package/__init__.py
Outdated
* None - Do not log the output.

The list length must be equal to the total amount of returned values from the function. Default is
None - meaning no outputs will be logged.
This is one place where changing the `pack_tuples` config will break existing decorators, since the list length changes due to it...
It won't break - non-matching lengths between returned values and log hints are printed as a warning, and whatever cannot be logged is simply ignored.
mlrun/package/__init__.py
Outdated
import mlrun

@mlrun.handler(outputs=["my_array", None, "my_multiplier"])
Maybe give another example with dict `outputs`, or at least one with a `key:type` annotation.
Done
:param kwargs:     The keyword arguments dictionary passed to the function.
:param type_hints: An ordered dictionary of the expected types of arguments.

:returns: The parsed args (kwargs are parsed inplace).
Why are `args` returned, while `kwargs` are parsed inplace?
Because `kwargs` is a dictionary and is passed by reference - changing it will change the original - whereas `args` is a tuple, so it cannot be edited.
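The mutability point can be demonstrated in isolation (toy parsing function, not the actual MLRun parser):

```python
def parse(args: tuple, kwargs: dict) -> list:
    # The dict is mutated in place - the caller sees the change:
    for name in kwargs:
        kwargs[name] = str(kwargs[name])
    # The tuple is immutable, so a new (parsed) sequence must be returned:
    return [str(a) for a in args]


kwargs = {"x": 1}
parsed = parse((1, 2), kwargs)
assert kwargs == {"x": "1"}   # changed in place
assert parsed == ["1", "2"]   # returned; the original tuple is untouched
```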
``is_packable`` and ``repr`` methods. An ellipsis (`...`) means any type.
* ``PRIORITY`` - The priority of this packager among the rest of the packagers. Should be an integer between 1-10
  where 1 is the highest priority and 10 is the lowest. If not set, a default priority of 5 is set for MLRun
  builtin packagers and 3 for user custom packagers.
But there's no validation whatsoever, right? I can set my `PRIORITY` to `42`, `-200`, `3.1415926535` or whatever else, right?
Well, yeah - it won't do anything wrong... I can verify it if you think it's important.
mlrun.mlconf.packagers.pack_tuple = True

Or more correctly, cast your returned tuple to a ``list`` like so::
I would recommend this way more than using the one above, for reasons I mentioned in other comments... Could also do `return [my_tuple]`.
Again, it's for the extreme user who really wants to log tuples for some reason... I can remove it in a minute.
ML-3474

Summary

Implementation of `mlrun.package` - a sub-package in MLRun for packing returned outputs, logging them to MLRun, and unpacking inputs, parsing data items to their required type.

Motivation

Remove the need for users to directly handle MLRun objects like `Artifact`s, `DataItem`s and the context (`MLClientCtx`).

Main Features

Passing objects between runtimes

In order to log artifacts and results, users simply return them from the handler. As most function-based code already does so, using MLRun won't require further changes to your code - only specifying log hints in the `returns` argument of `run`, and the outputs will be logged according to them.

Parsing inputs won't require any special handling for getting the artifact into its original type. Users simply need to add type hints to their inputs and MLRun will take care of the rest. Moreover, if a packaged input is passed, then a type hint is not required, as each package is labeled by the manager with its original type, artifact type and packager.

Implementation of a DefaultPackager

Users are welcome to implement the `Packager` abstract class, but a much more convenient and flexible implementation is added via the `DefaultPackager`. It implements `Packager` and adds simple logic that should be used by users to implement their own packagers with an easier and more automatic experience.

Log arbitrary number of objects

Users can specify a Python unpacking operator prefix in the log hint key:

* `"*"` for a list
* `"**"` for a dictionary

This notes to MLRun that the returned object is in fact an arbitrary number of objects that need to be logged separately, each as its own artifact.

Clearing mechanism

Each packager can specify what outputs to clean at the end of the run, to reduce the garbage produced in each run.

Custom packagers

As written in the `DefaultPackager` section, users can write their own packagers and add them to their project for MLRun to use for logging outputs and parsing inputs of their special Python objects. Each custom packager is added to a project with an `is_mandatory` flag to specify whether the packager must be collected for a run. A mandatory packager will raise an error in case it fails to be collected.

Priority

Packagers have a priority from 1 (most important) to 10 (least important), so new custom packagers for the same types can be added on top of current ones.

Added Packagers

* `builtins`: `int`, `float`, `bool`, `str`, `dict`, `list`, `tuple`, `set`, `frozenset`, `bytes`, `bytearray`
* `pathlib`: `Path` (and all inheriting classes)
* `numpy`: `numpy.ndarray`, `numpy.number` (and all inheriting classes), `list[numpy.ndarray]`, `dict[str, numpy.ndarray]`
* `pandas`: `pandas.Series`, `pandas.DataFrame`
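The unpacking log-hint prefixes described under "Log arbitrary number of objects" mirror Python's own unpacking. A sketch of the key-expansion semantics only (the helper name is hypothetical and nothing here calls MLRun itself; the example keys follow the test cases discussed in the review):

```python
def expand_log_hint(key: str, value):
    """Expand a "*"/"**"-prefixed log hint key into per-item keys."""
    if key.startswith("**"):
        # Dictionary unpacking: one key per dict entry.
        prefix = key[2:]
        return {f"{prefix}{k}": v for k, v in value.items()}
    if key.startswith("*"):
        # Iterable unpacking: one key per element, indexed.
        prefix = key[1:]
        return {f"{prefix}{i}": v for i, v in enumerate(value)}
    # No prefix: a single logged item.
    return {key: value}


assert expand_log_hint("*list_", [0.12, 0.56]) == {"list_0": 0.12, "list_1": 0.56}
assert expand_log_hint("**dict_", {"a": 1, "b": 2}) == {"dict_a": 1, "dict_b": 2}
assert expand_log_hint("my_key", 3) == {"my_key": 3}
```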