Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more git hooks (dvc pull/push/status) #1551

Open
AlJohri opened this issue Jan 30, 2019 · 11 comments

Comments

@AlJohri
Copy link
Contributor

commented Jan 30, 2019

can we add more git hooks for dvc pull/push/status when git pull/push/status are run? ideally, most git commands would have a hook. this will make it much less likely for someone to forget to dvc push after running a pipeline

@shcheklein

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

@shcheklein

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

Another question:

is there anyway to have git hooks automatically install on pull or clone?
just thinking of first time developer experience, right now its quite a few steps

@efiop efiop added the p1-important label Mar 30, 2019

@efiop

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

Good point @shcheklein ! Added p1 priority, we'll try to take a look at it next week.

@efiop efiop added the c8-full-day label Apr 25, 2019

@efiop efiop added this to To do in Weekly tasks via automation Apr 25, 2019

@pared

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2019

After some quick research I have following conclusions:

  1. git pull - we can use post-merge hook to do dvc pull
  2. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.
  3. git status - it seems that we will be unable to implement this one, none of hooks is triggered upon status
  4. git clone - we can use post-checkout hook to trigger dvc checkout
  5. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file
@efiop

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

@pared

  1. git pull - we can use post-merge hook to do dvc pull

But not every pull involves merge. It might do rebase instead.

  1. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.

pre-push should be good enough 👍

  1. git clone - we can use post-checkout hook to trigger dvc checkout
  2. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file

We already have post-checkot hook that simply runs dvc checkout(without arguments). Would it be possible to make it handle specific stage files as well?

@pared

This comment has been minimized.

Copy link
Collaborator

commented Apr 29, 2019

@efiop

But not every pull involves merge. It might do rebase instead.

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

Would it be possible to make it handle specific stage files as well?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

@efiop

This comment has been minimized.

Copy link
Member

commented Apr 29, 2019

@pared

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out? I imagine you git checkout-ing specific file and then our dvc checkout(without args) gets triggered and even though it has confirmation prompts, you might accidentally checkout some file that you didn't want.

So for now, only git push hook is the most straightforward that we could easily add, right?

Looks like even though some git hooks do suffice for some of our purposes, the support would still be far from perfect. Maybe having simple aliases like git push -> git push + dvc push would feel better? I don't quite like aliases on dvc's side(e.g. dvc push, or some other command, that would call git push) because they would bring a lot of hustle caused by passing through git options. On the other hand, things like aliases and/or git plugins would have to be handled and installed explicitly by a user, so it would add even more complication. I've been opposing this a lot since the early days, but considering everything (including requests for --autogitadd/commit feature), maybe dvc indeed should handle those. E.g.
dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on? What do you guys think?

@pared

This comment has been minimized.

Copy link
Collaborator

commented Apr 29, 2019

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out?

I think we could do that, but then there is one problem left: what if checked out file was stage? In such case we should probably validate our stages against their outputs checksums.

So for now, only git push hook is the most straightforward that we could easily add, right?

I agree.

dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on?

I think that this idea is the best in terms of versatility. We tried the other way around (git hooks) and we already have problems with covering our commands with avialable hooks. I think that handling and debugginig this on dvc side with autoscm flag will be easier than handling issues about different OS-s that will come after introducing aliases.

@efiop

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

For the record: agreed privately that we would implement git push hook for now to provide at least some quick aid, but then we would think about autoscm and stuff like that.

@efiop

This comment has been minimized.

Copy link
Member

commented May 12, 2019

@AlJohri pre-push hook was released as a part of 0.40.1. Please feel free to give it a try 🙂

@efiop efiop added p3-nice-to-have and removed p1-important labels Jun 18, 2019

@AlJohri

This comment has been minimized.

Copy link
Contributor Author

commented Aug 25, 2019

I'm still interested in adding a git pull hook.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

It seems like we can at least just implement it for the post-merge hook? For post-rewrite, I understand we can't easily identify between actual rebase vs pull --rebase but this only affects users who are running git pull --rebase, right?

I think the majority of users just use git pull without --rebase so can we can have it automatically dvc pull for at least these users?

@efiop efiop unassigned pared Aug 26, 2019

@efiop efiop removed this from In progress in Weekly tasks Aug 26, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.