Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Project] Retain producers of exported artifacts #5283

Merged
merged 8 commits into from Mar 19, 2024

Conversation

TomerShor
Copy link
Member

Previously, when artifacts where imported either via load_project or import_artifact, their producer property (+tree) were overridden by the registering project information.
In case artifacts were created by runs, overriding their producer causes them to lose some information.

To fix this, we retain the registered artifacts' producers according to the following logic:

  • If artifact was originally created by a project, we can re-assign it to the new producer, since they are going to belong to this new project anyway.
  • If the artifact was originally registered by a run (producer type is run), we retain the producer, and check if the artifact already exists in the system:
    • If it does - then there's no need to re-register it.
    • If not - register it with the original producer.

Resolves https://iguazio.atlassian.net/browse/ML-4134

Copy link
Member

@alonmr alonmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, also missing test for when we change the producer to the new project producer

mlrun/projects/project.py Outdated Show resolved Hide resolved
mlrun/projects/project.py Show resolved Hide resolved
mlrun/projects/project.py Outdated Show resolved Hide resolved
mlrun/projects/project.py Outdated Show resolved Hide resolved
mlrun/projects/project.py Outdated Show resolved Hide resolved
mlrun/projects/project.py Outdated Show resolved Hide resolved
producer = self._resolve_artifact_producer(item)
if producer.name != self.metadata.name:
# the artifact producer is retained, so log it only if it doesn't already exist
if existing_artifact := self._resolve_existing_artifact(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the artifact changed and we want to update it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question actually.
As per @theSaarco we shouldn't re-register artifacts if we retain their producer, but indeed the spec etc can change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question indeed.
Of course this is only expected if we're re-importing on the same env, since it's a run-id that already exists in the system. Even then, I tend to think it's a strange situation in which the same run-id had generated a different version of the same artifact. It seems more likely that the same artifact was modified after the run was done, in which case it would be probably better to maintain the manual changes done to the artifact.
I guess if the user wants to "reset" the artifact, it will need to be deleted before the import.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that makes sense. I suggest adding a log that says the artifact is already registered so we're skipping it.

tests/projects/test_project.py Outdated Show resolved Hide resolved
Copy link
Member

@theSaarco theSaarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some minor comments, mostly on style.

mlrun/projects/project.py Outdated Show resolved Hide resolved
mlrun/projects/project.py Show resolved Hide resolved
@@ -88,9 +88,10 @@ class ArtifactSpec(ModelObj):
"db_key",
"extra_data",
"unpackaging_instructions",
"producer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change needed? This means the producer will now be in the result of base_dict() for artifacts - is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting an artifact on the project it is using the base_dict(), so the producer property was not exported because it was a part of the _extra_fields.
This is a crucial part to maintain the producers when loading a project - to actually have it on the exported spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means that this fix will not work for any project exported prior to this PR, right?
Also, need to make sure there are no BC issues there (I don't think there are, but needs to be verified).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It obviously won't fix projects that were exported before this fix, so I'm not sure what BC issues we should check here..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that projects exported before the fix won't be fixed - it's the situation because the export doesn't contain the information needed for the fix.
As for BC, just to make sure that now that artifact export/import contains the producer field in the base export fields, that artifacts that were exported before won't fail importing due to it (again, I'm pretty sure they won't fail, just something to pay attention to).

mlrun/projects/project.py Outdated Show resolved Hide resolved
@TomerShor TomerShor marked this pull request as ready for review March 14, 2024 06:31
Copy link
Member

@alonmr alonmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple of logging requests

mlrun/projects/project.py Show resolved Hide resolved
producer = self._resolve_artifact_producer(item)
if producer.name != self.metadata.name:
# the artifact producer is retained, so log it only if it doesn't already exist
if existing_artifact := self._resolve_existing_artifact(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that makes sense. I suggest adding a log that says the artifact is already registered so we're skipping it.

Copy link
Member

@liranbg liranbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really minor

mlrun/projects/project.py Outdated Show resolved Hide resolved
@liranbg liranbg merged commit 3a28f9d into mlrun:development Mar 19, 2024
10 checks passed
@TomerShor TomerShor deleted the bugfix/ML-3081-1 branch March 20, 2024 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants