Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more git hooks (dvc pull/push/status) #1551

Closed
AlJohri opened this issue Jan 30, 2019 · 13 comments
Closed

more git hooks (dvc pull/push/status) #1551

AlJohri opened this issue Jan 30, 2019 · 13 comments
Labels
enhancement Enhances DVC feature request Requesting a new feature good first issue p3-nice-to-have It should be done this or next sprint

Comments

@AlJohri
Copy link

AlJohri commented Jan 30, 2019

can we add more git hooks for dvc pull/push/status when git pull/push/status are run? ideally, most git commands would have a hook. this will make it much less likely for someone to forget to dvc push after running a pipeline

@shcheklein
Copy link
Member

A new reason to prioritize this https://discordapp.com/channels/485586884165107732/485586884165107734/561380967311212554

@shcheklein
Copy link
Member

Another question:

is there anyway to have git hooks automatically install on pull or clone?
just thinking of first time developer experience, right now its quite a few steps

@efiop efiop added the p1-important Important, aka current backlog of things to do label Mar 30, 2019
@efiop
Copy link
Member

efiop commented Mar 30, 2019

Good point @shcheklein ! Added p1 priority, we'll try to take a look at it next week.

@pared
Copy link
Contributor

pared commented Apr 26, 2019

After some quick research I have following conclusions:

  1. git pull - we can use post-merge hook to do dvc pull
  2. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.
  3. git status - it seems that we will be unable to implement this one, none of hooks is triggered upon status
  4. git clone - we can use post-checkout hook to trigger dvc checkout
  5. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file

@efiop
Copy link
Member

efiop commented Apr 27, 2019

@pared

  1. git pull - we can use post-merge hook to do dvc pull

But not every pull involves merge. It might do rebase instead.

  1. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.

pre-push should be good enough 👍

  1. git clone - we can use post-checkout hook to trigger dvc checkout
  2. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file

We already have post-checkot hook that simply runs dvc checkout(without arguments). Would it be possible to make it handle specific stage files as well?

@pared
Copy link
Contributor

pared commented Apr 29, 2019

@efiop

But not every pull involves merge. It might do rebase instead.

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

Would it be possible to make it handle specific stage files as well?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

@efiop
Copy link
Member

efiop commented Apr 29, 2019

@pared

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out? I imagine you git checkout-ing specific file and then our dvc checkout(without args) gets triggered and even though it has confirmation prompts, you might accidentally checkout some file that you didn't want.

So for now, only git push hook is the most straightforward that we could easily add, right?

Looks like even though some git hooks do suffice for some of our purposes, the support would still be far from perfect. Maybe having simple aliases like git push -> git push + dvc push would feel better? I don't quite like aliases on dvc's side(e.g. dvc push, or some other command, that would call git push) because they would bring a lot of hustle caused by passing through git options. On the other hand, things like aliases and/or git plugins would have to be handled and installed explicitly by a user, so it would add even more complication. I've been opposing this a lot since the early days, but considering everything (including requests for --autogitadd/commit feature), maybe dvc indeed should handle those. E.g.
dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on? What do you guys think?

@pared
Copy link
Contributor

pared commented Apr 29, 2019

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out?

I think we could do that, but then there is one problem left: what if checked out file was stage? In such case we should probably validate our stages against their outputs checksums.

So for now, only git push hook is the most straightforward that we could easily add, right?

I agree.

dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on?

I think that this idea is the best in terms of versatility. We tried the other way around (git hooks) and we already have problems with covering our commands with avialable hooks. I think that handling and debugginig this on dvc side with autoscm flag will be easier than handling issues about different OS-s that will come after introducing aliases.

@efiop
Copy link
Member

efiop commented Apr 30, 2019

For the record: agreed privately that we would implement git push hook for now to provide at least some quick aid, but then we would think about autoscm and stuff like that.

@efiop
Copy link
Member

efiop commented May 12, 2019

@AlJohri pre-push hook was released as a part of 0.40.1. Please feel free to give it a try 🙂

@efiop efiop added p3-nice-to-have It should be done this or next sprint and removed p1-important Important, aka current backlog of things to do labels Jun 18, 2019
@AlJohri
Copy link
Author

AlJohri commented Aug 25, 2019

I'm still interested in adding a git pull hook.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

It seems like we can at least just implement it for the post-merge hook? For post-rewrite, I understand we can't easily identify between actual rebase vs pull --rebase but this only affects users who are running git pull --rebase, right?

I think the majority of users just use git pull without --rebase so can we can have it automatically dvc pull for at least these users?

@efiop efiop unassigned pared Aug 26, 2019
@efiop efiop added research p2-medium Medium priority, should be done, but less important and removed c8-full-day p3-nice-to-have It should be done this or next sprint labels Aug 26, 2019
@amin-nejad
Copy link

+1 for adding a git pull hook. (Just commenting to show there's still interest in this)

@casperdcl
Copy link
Contributor

Related: fds https://dagshub.com/blog/fds-fast-data-science-with-git-and-dvc/

@daavoo daavoo removed the research label Aug 3, 2022
@dberenbaum dberenbaum added p3-nice-to-have It should be done this or next sprint and removed p2-medium Medium priority, should be done, but less important labels Sep 27, 2023
@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC feature request Requesting a new feature good first issue p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

No branches or pull requests

9 participants