Brancher refactor #1709

Merged
merged 8 commits into from
Mar 21, 2019

Contributor

@ei-grad ei-grad commented Mar 11, 2019

This pull request implements access to Stage objects through the Git interface, so they can be used directly without running git checkout.

Fixes #1688, fixes #1552 and fixes #1009.

Todo:

  • Resolve XXX: comments
  • In metrics.show move except Exception back
  • Use oserr.CODEs in raised IOError's
  • Pass branch to Stage.load in collect, write the test for it
  • Clarify the dvc/stage.py:495 comment about checks priority
  • Relative path in dvc/repo/metrics/show.py:135 logging
  • Remove the checkout argument in brancher()
  • Should the Stage.is_stage_file(fname) be checked for Git?
  • Fix tests.test_install.TestInstall on py27
  • Rebase

@ei-grad ei-grad force-pushed the brancher-refactor branch 5 times, most recently from 1e12751 to 986e72a Compare March 12, 2019 11:03
@ei-grad ei-grad changed the title [WIP] Brancher refactor Brancher refactor Mar 12, 2019
@ei-grad ei-grad force-pushed the brancher-refactor branch 2 times, most recently from 955f6aa to fb293c4 Compare March 12, 2019 11:39
Member

@dmpetrov dmpetrov left a comment

Great change 🎉
Looks good to me but please wait for @efiop feedback.

Member

@shcheklein shcheklein left a comment

Good stuff! 👍 I left a few comments and questions. The first two are important; the last two are less important and simple:

  1. Check the metrics behavior change (see the comment). This is a pretty big one as far as I understand.
  2. Maybe I'm missing something, but I don't see that the brancher includes a "working tree" implementation. It should include one, right? We need it if we want to read the current workspace content.
  3. I would move brancher into a separate file, close to repo itself. It operates with repo, it's not a simple "util". It probably should be part of the repo since it actively changes its state.
  4. The same for tree - I think they deserve to be split into two files and moved to the relevant locations.

Overall, good progress! I like that we almost don't touch any external code, just introducing a layer of abstraction. Really good stuff.

@ei-grad
Contributor Author

ei-grad commented Mar 21, 2019

oops, got some unrelated commits...

@ei-grad ei-grad force-pushed the brancher-refactor branch 2 times, most recently from 9cf43e1 to 7d7ed42 Compare March 21, 2019 03:30
Member

@efiop efiop left a comment

Outstanding work! 🎉 A few small comments down below:

@ei-grad ei-grad force-pushed the brancher-refactor branch 2 times, most recently from 253a302 to ed56623 Compare March 21, 2019 15:20
from dvc.repo.tree import WorkingTree

self.tree = WorkingTree()
yield "Working Tree"
Member

This breaks tests that expect previous behaviour of yielding "" if it is not from any branch. Could we continue doing that?

Contributor Author

Oops. Not this one actually, but the same thing in line 16. I did run the tests many times, but not after the line 16 change which I added at the last moment :-/.

I think it is better to clarify in the docstring that what the brancher yields is not a branch name (it is just the abstract name of the currently selected tree), and to adjust the tests for this behavior. What do you think? @efiop @shcheklein

Contributor Author

@ei-grad ei-grad Mar 21, 2019

AFAIR it is currently used only to display the name for metrics in different branches in dvc metrics show -a output.

Contributor Author

Oh... and in dvc push output, where it is redundant for sure...

Contributor Author

@ei-grad ei-grad Mar 21, 2019

So I think it is still OK to keep the "Working Tree" string for iteration with --all-branches, but yield an empty string when there is nothing to iterate over.
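
A rough sketch of that behavior, under stated assumptions: the function signature and the `known_branches` parameter below are hypothetical and are not the code from this pull request. It only illustrates yielding "" when there is nothing to iterate over and "Working Tree" when iterating over branches.

```python
from typing import Iterable, Iterator, Optional


def brancher(
    branches: Optional[Iterable[str]] = None,
    all_branches: bool = False,
    known_branches: Iterable[str] = (),  # hypothetical stand-in for the SCM branch list
) -> Iterator[str]:
    """Yield the abstract name of each selected tree.

    The yielded value is not necessarily a branch name.
    """
    revs = list(branches or [])
    if all_branches:
        revs = list(known_branches)

    if not revs:
        # Nothing to iterate over: only the workspace, reported as "".
        yield ""
        return

    # Iterating over revisions: label the workspace explicitly first.
    yield "Working Tree"
    for rev in revs:
        yield rev
```

With this shape, callers that never pass branches keep seeing the empty string, while `--all-branches` style iteration gets a printable label for the workspace.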

Member

@ei-grad "index" is even more confusing IMHO :) Empty string feels nice, but maybe I'm just too used to it 🙂

Contributor Author

Updated the docstring to describe what the brancher can yield.

Member

So, this is about printing a tree name when we produce some output, right?

First, I would make the tree itself return the name. Then there is no need to yield anything and pass it around; we will be able to access it from the repo, the same way we access the current object. Is that possible?

Second, in the FsTree we can return "Working Tree" or "*" or "workspace", or whatever :) Btw, I'm more or less fine with the current solution as well. It's not ideal, but it's not hard to fix anyway.

As for the empty string: @efiop, does it mean we will have to do something like if not branch: name "Working Tree" further along, in multiple places? I would try to avoid that.
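
To make the first suggestion concrete, here is a hedged sketch in which each tree carries its own display name. The classes `WorkingTree` and `GitTree` as written here, the `name` attribute and the `describe` helper are all illustrative assumptions, not the classes in this pull request.

```python
class WorkingTree:
    """Hypothetical tree over the current workspace."""

    # The label is a placeholder; "Working Tree" or "*" would work the same way.
    name = "workspace"


class GitTree:
    """Hypothetical tree over a single git revision."""

    def __init__(self, rev: str) -> None:
        self.rev = rev

    @property
    def name(self) -> str:
        # For a revision-backed tree, the revision is the display name.
        return self.rev


def describe(tree) -> str:
    # Output code asks the tree for its name instead of checking `if not branch`.
    return "metrics for {}".format(tree.name)
```

The point of the design is that the "is this the workspace?" decision lives in one place, so printing code does not need a special case for an empty branch name.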

Member

@shcheklein No, I don't see why we even need "Working Tree" anywhere.

Member

In dvc metrics show -a, when we print output, we need a way to distinguish "workspace" from an actual branch. Maybe there are some other places; @ei-grad should know better than me by now :)

Abstract access to the filesystem, and implement it so that files, and
particularly Stage objects, in different git branches can be accessed
without running `git checkout`.
Uncommitted files wouldn't be accessible as git objects.

It needs `git clean -fd` to fix the "Working Tree" metrics collection,
which takes part when the git tree is dirty.
readlines() returns entries with \n on the end
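
As a small illustration of the commit message above (the file names are made up for the example), Python's `readlines()` keeps the trailing newline on each entry, so callers usually strip it before comparing paths:

```python
import io

# An in-memory file stands in for a file object read from a tree.
buf = io.StringIO("foo.dvc\nbar.dvc\n")

raw = buf.readlines()
# Each entry still ends with "\n".

stripped = [line.rstrip("\n") for line in raw]
# The newline is removed, so entries can be compared with plain file names.
```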
Member

@shcheklein shcheklein left a comment

Minor issue with a branch name to yield. Nice to fix. Otherwise it looks great!!!

Member

@efiop efiop left a comment

👍

5 participants