Stage: fix used_cache warning on no checksum #3365

Merged
merged 1 commit into from
Feb 21, 2020

@pared pared commented Feb 19, 2020

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

# TODO it will always call get, need to write it somewhere
checksum = self.info.get(self.remote.PARAM_CHECKSUM)
if not checksum:
    checksum = self.remote.get_checksum(self.path_info)
Contributor

This property should only return what is already computed; it shouldn't start computing it all over again.

See dvc/output/base.py:get_used_cache, where the issue arose. We check `if not self.info` there, while we should probably check `if not self.checksum`. Push shouldn't push uncommitted changes, as it doesn't even have the cache for them in .dvc/cache.
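
The distinction drawn in this comment can be sketched in isolation. The snippet below is a minimal, hypothetical model and not DVC's actual code: the `Output` class, the module-level `PARAM_CHECKSUM`, and the list returned by `get_used_cache` are simplified stand-ins. It assumes only what the comment describes: the `checksum` property reads the stored value without recomputing it, and `get_used_cache` guards on `self.checksum` rather than on `self.info`.

```python
import logging

logger = logging.getLogger("sketch")

PARAM_CHECKSUM = "md5"  # assumed key name, taken from the test in this PR


class Output:
    """Hypothetical, simplified stand-in for a DVC output."""

    def __init__(self, path, info):
        self.path = path
        self.info = info  # e.g. {"md5": "abc123"} as loaded from a stage file

    @property
    def checksum(self):
        # Only report what is already recorded; never recompute here.
        return self.info.get(PARAM_CHECKSUM)

    def get_used_cache(self):
        # Guard on the checksum itself: an info dict such as {"md5": None}
        # is truthy, so `if not self.info` would let it through.
        if not self.checksum:
            logger.warning("Output '%s' is missing version info.", self.path)
            return []
        return [self.checksum]
```

With this shape, an output whose stage file carries `md5: null` is skipped with a warning instead of being treated as if it had cache to collect.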

@pared pared force-pushed the 3359_md5_problem branch 3 times, most recently from 90cebdc to e4b4506 Compare February 20, 2020 11:18
@pared pared changed the title [WIP] 3359 md5 problem Stage: fix used_cache warning on no checksum Feb 20, 2020
@pared pared marked this pull request as ready for review February 20, 2020 11:24
@pared pared requested a review from efiop February 20, 2020 11:24
dvc/output/base.py (outdated, resolved)
"\n"
"You can also use `dvc commit {stage}` to associate "
"existing '{out}' with `{stage}`.".format(
out=self, stage=self.stage.relpath
Contributor

Suggested change
- out=self, stage=self.stage.relpath
+ out=self, stage=self.stage

same as above 🙂

Contributor Author

AFAIK it will result in dvc commit 'Stage file.dvc' ;)
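
A small sketch of why the two spellings differ. The `Stage` class here is a hypothetical stand-in, not DVC's real class; its string form is assumed to look like `Stage: 'file.dvc'`, which is how the stage appears in the warning text asserted by this PR's test.

```python
class Stage:
    """Hypothetical stand-in; not DVC's real Stage class."""

    def __init__(self, relpath):
        self.relpath = relpath

    def __str__(self):
        # Assumed string form, matching the "(Stage: 'file.dvc')" text
        # in the warning asserted by the test in this PR.
        return "Stage: '{}'".format(self.relpath)


stage = Stage("file.dvc")
hint = "You can also use `dvc commit {stage}`"

# Passing the relative path yields a command the user can copy and run.
with_relpath = hint.format(stage=stage.relpath)

# Passing the stage object lets its descriptive form leak into the command.
with_stage = hint.format(stage=stage)

print(with_relpath)
print(with_stage)
```

Under that assumption, only the `relpath` variant produces a usable `dvc commit` command line.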

Contributor

@pared Oh, but then the upper warning is broken too, right?

Contributor Author

@pared pared Feb 21, 2020

@efiop
I wouldn't say so: the upper message points the user to the stage that a particular output is associated with, while the lower one suggests a command that can fix the situation. So in the upper one, it's not crucial for the stage to be printed as just its relpath.

Comment on lines 186 to 209
def test_warn_on_no_out_checksum(tmp_dir, dvc, tmp_path_factory, caplog):
    tmp_dir.dvc_gen("file", "file content")

    local_remote = tmp_path_factory.mktemp("local_remote")
    with dvc.config.edit() as conf:
        from dvc.utils import fspath

        conf["remote"]["local_remote"] = {"url": fspath(local_remote)}
        conf["core"]["remote"] = "local_remote"

    stage_content = load_stage_file("file.dvc")
    first(stage_content["outs"])["md5"] = None
    dump_stage_file("file.dvc", stage_content)

    with caplog.at_level(logging.WARNING, logger="dvc"):
        dvc.push()

    assert first(caplog.messages) == (
        "Output 'file'(Stage: 'file.dvc') is missing version info. "
        "Cache for it will not be collected. "
        "Use `dvc repro` to get your pipeline up to date.\n"
        "You can also use `dvc commit file.dvc` to associate existing 'file' "
        "with `file.dvc`."
    )
Contributor

Maybe let's narrow it down to a unit test of get_used_cache for the output?

Contributor Author

makes sense
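
For illustration, a narrowed-down test of this kind might look roughly like the sketch below. It is not the test that landed in the PR: `get_used_cache` is a hypothetical free function standing in for the output method, the logger name and message are shortened placeholders, and the standard library's `unittest.assertLogs` is used in place of pytest's `caplog` so the example stays self-contained.

```python
import logging
import unittest

logger = logging.getLogger("sketch.output")


def get_used_cache(path, checksum):
    """Hypothetical stand-in for the output-level method under test."""
    if not checksum:
        logger.warning("Output '%s' is missing version info.", path)
        return []
    return [checksum]


class TestGetUsedCache(unittest.TestCase):
    def test_warns_when_checksum_missing(self):
        # No repo, remote, or push involved: call the unit directly.
        with self.assertLogs("sketch.output", level="WARNING") as captured:
            used = get_used_cache("file", None)

        self.assertEqual(used, [])
        self.assertEqual(
            captured.records[0].getMessage(),
            "Output 'file' is missing version info.",
        )
```

Compared with the push-based test quoted above, this exercises only the code path that emits the warning, so it needs no remote configuration or stage file rewriting.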

Contributor

@efiop efiop left a comment

Thanks!
