Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce index #6300

Merged
merged 6 commits into from Aug 5, 2021
Merged

introduce index #6300

merged 6 commits into from Aug 5, 2021

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jul 9, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

@skshetry skshetry requested review from efiop and pared July 9, 2021 16:25
@skshetry skshetry requested a review from a team as a code owner July 9, 2021 16:25
@efiop efiop changed the title [wip] introduce index [WIP] introduce index Jul 9, 2021
@pared
Copy link
Contributor

pared commented Jul 13, 2021

Related #6039

dvc/repo/index.py Outdated Show resolved Hide resolved
dvc/output.py Show resolved Hide resolved
dvc/repo/index.py Outdated Show resolved Hide resolved
dvc/repo/index.py Outdated Show resolved Hide resolved
Copy link
Member

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Let's clean it up, review and merge it πŸ™‚

@pared
Copy link
Contributor

pared commented Jul 19, 2021

I just noted that I didn't wrote: LGTM so far.

@skshetry skshetry self-assigned this Jul 20, 2021
@skshetry skshetry added this to In progress in DVC 13 July - 26 July 2021 via automation Jul 20, 2021
@skshetry skshetry added the refactoring Factoring and re-factoring label Jul 20, 2021
@skshetry skshetry moved this from In progress to Review in progress in DVC 13 July - 26 July 2021 Jul 20, 2021
@efiop efiop added this to In progress in DVC 27 Jul - 10 Aug via automation Jul 27, 2021
@efiop efiop moved this from Review in progress to Done in DVC 13 July - 26 July 2021 Jul 27, 2021
dvc/repo/index.py Outdated Show resolved Hide resolved
dvc/repo/index.py Outdated Show resolved Hide resolved
@skshetry skshetry force-pushed the introduce-index branch 2 times, most recently from 42ae5ac to cb5b342 Compare August 4, 2021 08:41
@skshetry skshetry requested review from efiop and pared August 4, 2021 13:21
@skshetry skshetry changed the title [WIP] introduce index introduce index Aug 4, 2021
@skshetry skshetry moved this from In progress to Review in progress in DVC 27 Jul - 10 Aug Aug 4, 2021
Comment on lines +265 to +285


if __name__ == "__main__":
from funcy import log_durations

from dvc.repo import Repo

repo = Repo()
index = Index(repo, repo.fs)
print(index)
with log_durations(print, "collecting stages"):
# pylint: disable=pointless-statement
print("no of stages", len(index.stages))
with log_durations(print, "building graph"):
index.build_graph()
with log_durations(print, "calculating hash"):
print(index.identifier)
with log_durations(print, "updating"):
index2 = index.update(index.stages)
with log_durations(print, "calculating hash"):
print(index2.identifier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if __name__ == "__main__":
from funcy import log_durations
from dvc.repo import Repo
repo = Repo()
index = Index(repo, repo.fs)
print(index)
with log_durations(print, "collecting stages"):
# pylint: disable=pointless-statement
print("no of stages", len(index.stages))
with log_durations(print, "building graph"):
index.build_graph()
with log_durations(print, "calculating hash"):
print(index.identifier)
with log_durations(print, "updating"):
index2 = index.update(index.stages)
with log_durations(print, "calculating hash"):
print(index2.identifier)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll keep these as it is quite helpful in checking bugs/performance issues quickly, unless there is some strong reason against this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry This is not suitable for the core codebase, just create a test instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Looks like we also still have these in parsing and in ui. They really should not be there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I cannot run test on a real project, with this, I can do that by just python dvc/repo/index.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, because it's not merged yet. Like I said above, it acts as a quick litmus test, especially for UI where I can quickly visually inspect if there's any issues.

The functionality makes sense to me - to quickly check if our subsystems work correctly. I only have a problem with code snippets scattered around the codebase, that no one even knows exist and could be ran. How about we create a command (dvc test? dvc doctor? dvc debug?), where we will collect all such self-checks, so that there is one place to test if dvc subsystems are operating correctly?

Copy link
Member Author

@skshetry skshetry Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For having a central way of running all these, it requires certain pre-conditions/post-conditions (eg: this needs to always be run inside a DVC repo). And, having a central way is not the point of this, tests: end-to-end or unit is there for that.

but it still looks to me as a code snippet that someone forgot to remove.

It is very much intentional. They use it for showing example usages and demonstration, not to add quick visual inspection.

no one even knows exist and could be ran

It's okay to not know they exist, as they are very local to the subsystem. Whenever someone happens to work in this module, they'll take a note of if __name__ == "main". When refactoring, they'll be able to grep it or find usages of it to fix. And, as it's not critical to us, it's okay if it breaks occasionally. Though the parsing/UI has never been broken, they still work.

one place to test

The parsing one is a test and uses doctest, so we could run it together with the tests. But there is not much benefit on running others, as they serve different purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Are there any other examples besides rich? I feel like it is a very specific library, I would love to see some other examples in serious projects (cli tools would be great). Just trying to understand when and how this is used.

CC @iterative/dvc , guys, wdyt about this practice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd completely agree with moving this to tests/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so looks like this was indeed a practice that wasn't aware of. Thanks to @isidentical and @skshetry for pointing out that this is used in things like cpython, doctests, rich. However, I wasn't able to find a single case in any of the projects that I've looked through (e.g. numpy, pyinstaller, duliwch, boto, moto, scipy, spacy, etc), so this still feels like a pretty cryptic practice to me, so I'm not personally 100% sold on it, it feels like these snippets belong somewhere else. But there doesn't seem to be much opposition to it and we already have a few of these in the codebase, so let's keep it for now, and get back to discussing this practice in general on retro or elsewhere.

@efiop efiop merged commit 6c33fbb into iterative:master Aug 5, 2021
DVC 27 Jul - 10 Aug automation moved this from Review in progress to Done Aug 5, 2021
@skshetry skshetry deleted the introduce-index branch August 5, 2021 11:25
@skshetry skshetry mentioned this pull request Aug 17, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Factoring and re-factoring
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants