Skip to content

Conversation

@rgferrari
Copy link

@rgferrari rgferrari commented Dec 1, 2021

add the flag append_only to the pipeline to avoid unprotecting the files while persist flag is true.

Fixes #6562.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Link to the dvc.org PR: iterative/dvc.org#3056

add the flag append_only to the pipeline to avoid unprotecting the files while persist flag is true.

Fixes iterative#6562.
@rgferrari rgferrari requested a review from a team as a code owner December 1, 2021 16:08
@rgferrari rgferrari requested a review from skshetry December 1, 2021 16:08
rgferrari added a commit to rgferrari/dvc.org that referenced this pull request Dec 2, 2021
Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rgferrari ! Thanks for the contribution.

Could you please include some tests for this new flag? You can use existing tests for the persist flag https://github.com/iterative/dvc/search?p=1&q=persist

The new flag should be also added to some additional places (i.e. dvc/stage/serialize.py, dvc/stage/utils.py, dvc/command/stage.py . . .). Similarly to tests, you can use the search URL above to look for persist usages.

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another general question I had is regarding whether or not it should actually be named append-only? The current naming convention seems to only apply to a persistent directory (where "append" is referring to adding new files to that directory)

If I have a single file persistent output, and I want to append to it (meaning write data to the end of that single file), this flag would prevent me from doing that

for out in self.outs:
if (out.persist or out.checkpoint or out.live) and not force:
if (
(out.persist and not out.append_only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would make it so that instead of being unprotected, the output is removed entirely (meaning it would be treated as a non-persistent output)

Copy link
Collaborator

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @rgferrari for the contribution. The title of your PR is a bit misleading as it's not adding a new flag to the dvc repro, but to the dvc.yaml and .dvc files.

The discussion on #6562 seems to focus on making this the default or adding a new CLI flag to the dvc repro, so I'd really like it if we move the discussion there ahead and then only commit to the new flag vs the new options in dvc.yaml, as we usually don't break compatibility on dvc.yaml.

Could you please share your workflow, the issues that you are having, and your proposal there?

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Dec 6, 2021
@rgferrari
Copy link
Author

Hi everyone, I appreciate all the feedbacks.

Regarding @pmrowla 's question about the name, yes, it really doesn't make sense if you apply it to files. Do you think a name like keep_protect would make more sense?

About the 'if' change, you are right, it deletes the folder and write it again, I didn't notice that. Nevertheless I didn't find any other part of the code linking the persist option to unprotecting it, gotta search more about it.

Now, about @skshetry 's comment. The PR's title is indeed wrong, sorry, I'll change it latter to "a new option to dvc.yaml...". My proposal was to create a dvc.yaml option because, as @dberenbaum said, it makes more sense to set a dvc.yaml option only one time rather than have to write it again on the command line at every execution.

@rgferrari rgferrari changed the title pipeline: add a new flag to dvc repro to avoid unprotecting files dvc.yaml: added a new option to dvc.yaml to avoid unprotecting files Dec 20, 2021
@shcheklein
Copy link
Member

@iterative/dvc what is the progress here?

rgferrari and others added 3 commits January 10, 2022 17:12
add the flag keep_protect to the dvc.yaml options to avoid unprotecting the files while persist flag is true.
@dberenbaum
Copy link
Contributor

@iterative/dvc Pinging about this PR. Could anyone take a look at this?

@dberenbaum dberenbaum removed the awaiting response we are waiting for your reply, please respond! :) label Apr 18, 2022
Copy link
Collaborator

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, for such a long wait on this PR. We were expecting a lot of data management stuff to be clear after our refactoring with dvc-data.

Unfortunately, I have to decline this PR and would like to continue more discussion on this in #6562. The reason for this is that we already have a few options for modifying the data-management behaviors like persist and cache which themselves are not in a good state (for example, cache: false still caches). This option feels like one more flag to modify behavior, and we may need more to change the behavior that dvc-data provides.

Another reason is that after we commit to this PR, as we usually don't break dvc.yaml file format, we have to support these flags for a long time. We'd like to encourage discussion on #6562, and see if we can even make them the default.

Thanks for contributing to DVC, and again sorry for the very long silence in your PR. πŸ™

@daavoo
Copy link
Contributor

daavoo commented Aug 9, 2022

Closing per #7072 (review)

@daavoo daavoo closed this Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Keep persistent outputs protected by default?

7 participants