Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCM: allow multiple DVC repos inside single SCM repo #3257

Merged
merged 1 commit into from
Feb 25, 2020

Conversation

pared
Copy link
Contributor

@pared pared commented Jan 30, 2020

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Fixes #2349

dvc/config.py Outdated Show resolved Hide resolved
@pared
Copy link
Contributor Author

pared commented Feb 3, 2020

@dmpetrov I noticed that in #2706 you mentioned that managing subrepos from "master" repo might be a viable use case. In this issue, I introduced DvcIgnore that would ignore subrepos when working from master repo. Do you think we should allow dvc pull and other "data sync" commands to work for subrepos?

@pared pared changed the title [WIP] SCM: allow multiple DVC repos inside single SCM repo SCM: allow multiple DVC repos inside single SCM repo Feb 4, 2020
@pared pared marked this pull request as ready for review February 4, 2020 11:43
dvc/command/init.py Outdated Show resolved Hide resolved
dvc/repo/__init__.py Outdated Show resolved Hide resolved
dvc/repo/init.py Outdated Show resolved Hide resolved
dvc/repo/init.py Outdated Show resolved Hide resolved
tests/func/test_add.py Outdated Show resolved Hide resolved
dvc/ignore.py Show resolved Hide resolved
dvc/scm/__init__.py Outdated Show resolved Hide resolved
@skshetry
Copy link
Member

skshetry commented Feb 5, 2020

get and import won't work for the dvc subrepo, right? (as it's Git-repo based, not dvc-repo based, see: #3277).

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested it a bit - looks good to me.
A couple comments are inline. The major one - if we need "subrepo" in config.

dvc/command/init.py Outdated Show resolved Hide resolved
dvc/config.py Outdated Show resolved Hide resolved
@dmpetrov
Copy link
Member

dmpetrov commented Feb 10, 2020

Do you think we should allow dvc pull and other "data sync" commands to work for subrepos?

@pared No, commands should not go to nested repos.

PS: we can even think about not supporting nested repos if it simplifies implementation or docs.

dvc/repo/__init__.py Outdated Show resolved Hide resolved
@pared
Copy link
Contributor Author

pared commented Feb 12, 2020

@dmpetrov ok, so in the end some kind of flag will have to be introduced to config (take note of this discussion: #3257 (comment))

I think we should go with the second option I mentioned there, which is introducing no-scm config option when --no-scm is used. That will break backward compatibility for existing no-scm repos
but the fix will be relatively easy (writing no-scm=True in config). Basing on user activity on discord and in issues, I would not expect too much of such cases.
Having subrepo as default behavior makes more sense to me, since this subject is recurring.

@efiop
Copy link
Contributor

efiop commented Feb 14, 2020

@pared Also something to note about no-scm, is that it will require changing our tests, because base dvc fixture doesn't have git in it by default, it is init-ed with no_scm=True and then when git is applied on top.

@pared pared force-pushed the 2349_many_dvc_roots branch 2 times, most recently from f3e99bc to 0928109 Compare February 21, 2020 12:19
dvc/scm/__init__.py Outdated Show resolved Hide resolved
dvc/config.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good from my end. Just a question above ^ Thanks

@efiop
Copy link
Contributor

efiop commented Feb 24, 2020

@pared Looks good! How about docs though? Could you please create a doc PR before we merge this?

dvc/repo/__init__.py Outdated Show resolved Hide resolved
dvc/repo/__init__.py Outdated Show resolved Hide resolved
dvc/repo/__init__.py Outdated Show resolved Hide resolved
@efiop efiop merged commit 8d04d6d into iterative:master Feb 25, 2020
@pared
Copy link
Contributor Author

pared commented Feb 25, 2020

PR for docs: iterative/dvc.org#1022

class DvcIgnoreFilter(object):
def __init__(self, tree):
self.tree = tree
self.ignores = {DvcIgnoreDirs([".git", ".hg", ".dvc"])}
self.ignores = {
DvcIgnoreDirs([".git", ".hg", ".dvc"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.hg ? Do we support Mercurial somehow now??? Or is this legacy stuff? Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to traverse through .git or .hg folders. People might use hg for code versioning and use --no-scm with DVC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... Why only do this for .hg? There's many many possible files or dirs the user may want to gitignore and they can just add it to .gitignore/ or .git/info/exclude

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel add .hg to .gitignore? Maybe I'm missing something.

An alternative would be to add .hg to .dvcignore file, yes, but since we know about .hg, we are skipping it automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add .hg to .gitignore?

Haha sorry I wasn't thinking right πŸ˜‹

to add .hg to .dvcignore

Yeah, this! Since we don't support Mercurial in general why treat it specially? I'd let the user handle it, like he could want dvcignore any other files or directories for a bunch of reasons.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVC has --no-scm, so they could be using any version control out there

I don't think this is the case @skshetry. It's just kind of a legacy option name AFAIK. Refer to #2901.

Do people really use .hg folders for their work? Unless they do (which they shouldn't)

You'd be surprised.
Why shouldn't they? Will the directory name police pul them over? πŸ˜›

I realized that there might be indeed a stronger motivation to include more defaults

@shcheklein I don't even see this. DVC ignores .git/ automatically because it integrates with Git. I don't see a reason to treat any other file/dir name specially.
Your questions about .gitignore vs. .dvcignore are interesting though; I think there may be some issues already about that.


In summary, I vote to remove the 7 characters we're talking about, but I realize this is probably a moot point since no one even uses Mercurial anymore haha.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel, I think something good thing to do is in middle. If people have a certain directories such as __pycache__ and venv or similar kinds of directories, it'd be good to make them aware that .dvcignore exists, at least a warning that traversing will be slow.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree about improving .dvcignore awareness. (I think the warning may exist BTW?)

For docs, I opened iterative/dvc.org/issues/1033.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least a warning that traversing will be slow.

it's a really good idea. Please create an issue for this in the DVC core repo.

I still don't see a reason to include .hg though. To my mind if we follow this logic it should be done properly like you described for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #3436

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multiple .dvc roots in a single git repo
6 participants