I really like the idea and structure of metaflow. For my use case, it looks like it could simultaneously solve a lot of different problems. That said, is there any way to disable versioning and archiving of specific artifacts? If I can guarantee that my upstream data source is versioned and archived appropriately, then I don't necessarily want duplication of all artifacts (because of the storage overhead).
I could just remove certain artifacts after a time, but this would require the cleaning tool to know what should and shouldn't be archived. It would be nicer if there was some syntax to declare an artifact as transient, or at the very least a call we can make at the end of a flow to dispose of artifacts that shouldn't be versioned.
One way to achieve this (if your data is stored in S3) is to use the metaflow.S3 client to access your data within steps, and to pass pointers to the S3 locations across steps through self assignments instead of the data itself.
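from metaflow import S3, step

@step
def step_a(self):
    # Read the input directly from S3 rather than storing it as an artifact.
    with S3(s3root=self.path_to_data_in_s3) as s3:
        res = s3.get('object')
        ...
        # put() returns the S3 URL of the uploaded object.
        url = s3.put('example_object', message)
        # Only this pointer (a short string) is persisted as an artifact.
        self.path_to_example_url_in_s3 = url
    ...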
Currently, we store the artifacts so that you can replicate the state of your workflow at any time in the future. Making artifacts transient would change that behaviour.
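To get a rough feel for the storage difference between persisting the data and persisting only a pointer, here is a small sketch. It assumes artifacts are persisted roughly as pickled Python objects; the list of integers and the bucket URL are hypothetical stand-ins, and the real on-disk size may differ because of compression and deduplication.

```python
import pickle

# Hypothetical stand-in for a large dataset loaded inside a step.
data = list(range(1_000_000))

# Hypothetical S3 URL, standing in for the value returned by s3.put(...).
url = "s3://example-bucket/example_object"


def artifact_size(obj):
    """Return the size in bytes of the pickled object."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


data_bytes = artifact_size(data)
url_bytes = artifact_size(url)

print(f"data artifact: {data_bytes} bytes")
print(f"url artifact:  {url_bytes} bytes")
```

The pointer costs a few dozen bytes per run, while the data itself costs megabytes. The trade-off is that reproducibility then depends on the object behind the URL staying unchanged.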