Skip to content

Conversation

@karajan1001
Copy link
Contributor

wait for iterative/scmrepo#145 to merge first.
fix: #8451

nearly five times faster in my local workspace for an exp show, no primary time costing method in exp show for now.

image

  1. Change the call of scm.describe from individual revision to a collection of them. Accelerate the run speed of exp show

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@karajan1001 karajan1001 added performance improvement over resource / time consuming tasks A: experiments Related to dvc exp labels Oct 19, 2022
@karajan1001 karajan1001 requested a review from skshetry October 19, 2022 09:11
@karajan1001 karajan1001 self-assigned this Oct 19, 2022
@karajan1001 karajan1001 marked this pull request as draft October 19, 2022 09:11
@karajan1001 karajan1001 requested a review from pmrowla October 19, 2022 10:30
@karajan1001 karajan1001 changed the title [WIP] exp show: Use batch call on scm.describe exp show: Use batch call on scm.describe Oct 20, 2022
@karajan1001 karajan1001 marked this pull request as ready for review October 20, 2022 06:34
@karajan1001 karajan1001 removed the request for review from skshetry October 20, 2022 06:41
Comment on lines 153 to 167
rev_tag = repo.scm.describe(rev_set, base="refs/tags")
rev_head = repo.scm.describe(rev_set, base="refs/heads")
exact_name = repo.experiments.get_exact_name(rev_set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev_set should remove matches after each call (once we hit a tag for a revision, we no longer need to check branches, and so on) so that we can skip additional describe/get_exact_name calls if we already have matches for all of our revs

so maybe something like

names = {}
for base in ("refs/tags/", "refs/heads/"):
    if rev_set:
        names.update(
            (rev, ref[len(base):])
            for rev, ref in scm.describe(rev_set, base=base)).items()
            if ref is not None
        )
        rev_set.difference_update(names.keys())
# update names for exps
...

if rev == "workspace":
continue
if rev_tag[rev]:
name = rev_tag[rev].replace("refs/tags/", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is addressed in the other suggested change, but this should really be

name = rev_tag[rev][10:]  # 10 == len('refs/tags/')

replace() is slower and it would also remove ocurrences later in the tag. (technically you could have a ref that looks like refs/tags/my-tag-refs/tags where the tag is named my-tag-refs/tags)

@karajan1001 karajan1001 requested a review from pmrowla October 20, 2022 08:25
branch = self.repo.scm.describe(baseline, base="refs/heads")
branch = branches[tag]
if branch:
tag = branch.split("/")[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This split should be removed (same bug as #7933)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will make the output like:

refs/heads/main:
	exp-f88ff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will make the output like:

refs/heads/main:
	exp-f88ff

We should only keep / for names from tags or branches.

Copy link
Contributor

@pmrowla pmrowla Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you call describe to get the name you have to remove the refs/.../ prefix (using len(base)) and not rely on split.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output should be

main:
	exp-f88ff

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than including the additional split bugfix

fix: iterative#8451

1. Change the call of scm.describe from individual revision to a
   collection of them. Accelerate the run speed of `exp show`
2. Bump scmrepo to 0.1.2
@karajan1001 karajan1001 enabled auto-merge (rebase) October 20, 2022 08:58
@karajan1001 karajan1001 merged commit d56376c into iterative:main Oct 20, 2022
@karajan1001 karajan1001 deleted the fix8451 branch October 20, 2022 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: experiments Related to dvc exp performance improvement over resource / time consuming tasks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

exp show: dvc experiment show slow due to the duplicated call on scm.describe

2 participants