dvc exp run: Unnecessary "Collecting and computing hashes" after changing cmd in dvc.yaml #10305

Open
aljeshishe opened this issue Feb 16, 2024 · 0 comments

Bug Report

Description

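dvc exp run goes through an unnecessary "Collecting and computing hashes" step after the stage's dependency in dvc.yaml is changed, even though the new dependency was already hashed by dvc add.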

Reproduce

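mkdir repro
cd repro
git init
dvc init
mkdir 1
mkdir 2
cd 1
dd if=/dev/urandom of=./data bs=1MB count=10000   # 10 GB of random data, written in 1 MB blocks
dvc add data
cd ../2
dd if=/dev/urandom of=./data bs=1MB count=10000
dvc add data
cd ..
git add -A   # stage .dvc/, .dvcignore and the new data.dvc and .gitignore files
git commit -m init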

cat <<EOF > dvc.yaml
stages:
    target:
        cmd:
            - sleep 1
        deps:
            - 1/data

EOF
dvc exp run   # first run: dvc.lock records 1/data as the stage's dependency

cat <<EOF > dvc.yaml
stages:
    target:
        cmd:
            - sleep 1
        deps:
            - 2/data

EOF
dvc exp run   # second run: shows "Collecting and computing hashes" even though 2/data was hashed by dvc add
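The second dvc.yaml declares 2/data as the stage's only dependency, while dvc.lock still describes the first run. A minimal sketch of that mismatch, with plain dicts standing in for the parsed files: the lock layout (path, hash and md5 keys) is an assumption modeled on DVC 3 lock files, the md5 value is a placeholder, and deps_missing_from_lock is a hypothetical helper, not a DVC function.

```python
from typing import Dict, List


def deps_missing_from_lock(stage_yaml: Dict, stage_lock: Dict) -> List[str]:
    """Return dep paths declared in dvc.yaml that have no entry in dvc.lock."""
    locked = {entry["path"] for entry in stage_lock.get("deps", [])}
    return [path for path in stage_yaml.get("deps", []) if path not in locked]


# Stage "target" from the second dvc.yaml: it now depends on 2/data.
stage_yaml = {"cmd": ["sleep 1"], "deps": ["2/data"]}

# Stage "target" as recorded in dvc.lock by the first run (assumed layout,
# placeholder hash value): only 1/data is known.
stage_lock = {"deps": [{"path": "1/data", "hash": "md5", "md5": "<placeholder>"}]}

print(deps_missing_from_lock(stage_yaml, stage_lock))  # ['2/data']
```

Any dependency returned here has no lock entry to take hash information from, which is the situation described in the debugging notes below.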

Expected

I expect the hashes of all files to be known already (both were computed by dvc add), so the second dvc exp run should not go through a "Collecting and computing hashes" step.
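The expectation rests on two facts: dvc add already hashed both files, and an unchanged file can be recognized from its metadata without rereading it. A minimal sketch of such a metadata-keyed cache, assuming (inode, size, mtime) as the key; HashCache is a hypothetical class, only similar in spirit to the state that hash_file consults, not its actual schema.

```python
import hashlib
import os
import tempfile


class HashCache:
    """Hypothetical metadata-keyed hash cache (not DVC's actual state schema)."""

    def __init__(self) -> None:
        self._entries: dict = {}
        self.computed = 0  # number of times a file was actually read and hashed

    def md5(self, path: str) -> str:
        st = os.stat(path)
        # An unchanged file keeps its inode, size and mtime, so it need not be reread.
        key = (st.st_ino, st.st_size, st.st_mtime_ns)
        if key not in self._entries:
            digest = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                    digest.update(chunk)
            self._entries[key] = digest.hexdigest()
            self.computed += 1
        return self._entries[key]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    with open(path, "wb") as f:
        f.write(os.urandom(1024))
    cache = HashCache()
    first = cache.md5(path)   # like dvc add: the file is read and hashed
    second = cache.md5(path)  # like a later run: served from the cache
    print(first == second, cache.computed)  # True 1
```

With 10 GB files, the difference between one and two full reads is what makes the extra hashing step noticeable.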

Environment information

Output of dvc doctor:

dvc doctor
DVC version: 3.44.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.0-1053-aws-x86_64-with-glibc2.31
Subprojects:
	dvc_data = 3.11.0
	dvc_objects = 5.0.0
	dvc_render = 1.0.1
	dvc_task = 0.3.0
	scmrepo = 3.1.0
Supports:
	http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3)
Config:
	Global: /home/ubuntu/.config/dvc
	System: /etc/xdg/dvc

Additional Information (if any):
Some debugging showed the following:

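  1. On the 2nd dvc exp run, StageLoader.fill_from_lock cannot find 2/data among the deps recorded in dvc.lock (the lock file from the first run only lists 1/data).
  2. As a result, _compute_hash_info_from_meta(None) is called.
  3. _compute_hash_info_from_meta then returns the legacy md5-dos2unix hash name.
  4. That is why DVC doesn't reuse the precomputed hash in the following code: the hash cached in the state was computed as md5 by dvc add, so the hash_info.name == name check fails and the file is hashed again.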
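from typing import Optional, Tuple

def hash_file(
    path: "AnyFSPath",
    fs: "FileSystem",
    name: str,
    state: Optional["StateBase"] = None,
    callback: Optional["Callback"] = None,
    info: Optional[dict] = None,
) -> Tuple["Meta", "HashInfo"]:
    if state:
        # Look up a previously computed hash for this path in the state.
        meta, hash_info = state.get(path, fs, info=info)
        # The cached hash is reused only if its name matches the requested one,
        # so a cached "md5" entry is ignored when "md5-dos2unix" is requested.
        if hash_info and hash_info.name == name:
            return meta, hash_info
    # ... (excerpt ends here; otherwise the hash is computed from the file)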
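The chain of events above can be reduced to a toy model. This is a simplified sketch, not DVC's code: HashInfo here is a hypothetical two-field dataclass, FakeState a dict-backed stand-in for the state, lock_hash_name a hypothetical helper that mimics the reported fallback to md5-dos2unix when dvc.lock has no entry for the dependency, and the hash_file below is a toy version of the excerpt with a pluggable compute function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class HashInfo:
    name: str   # hash name, e.g. "md5" or "md5-dos2unix"
    value: str


class FakeState:
    """Dict-backed stand-in for the state: one HashInfo per path."""

    def __init__(self) -> None:
        self._hashes: Dict[str, HashInfo] = {}

    def get(self, path: str) -> Optional[HashInfo]:
        return self._hashes.get(path)

    def save(self, path: str, hash_info: HashInfo) -> None:
        self._hashes[path] = hash_info


def lock_hash_name(lock_entry: Optional[dict]) -> str:
    """Hypothetical: fall back to the legacy name when dvc.lock has no entry."""
    if lock_entry is None:
        return "md5-dos2unix"
    return lock_entry["hash"]


def hash_file(
    path: str, name: str, state: FakeState, compute: Callable[[str], str]
) -> Tuple[HashInfo, bool]:
    """Return (hash_info, reused); reused is True when the cached hash was returned."""
    cached = state.get(path)
    # Same rule as the hash_file excerpt above: a cached hash counts only if its name matches.
    if cached and cached.name == name:
        return cached, True
    hash_info = HashInfo(name, compute(path))  # stands in for rereading the whole file
    state.save(path, hash_info)
    return hash_info, False


def rehash(path: str) -> str:
    return "recomputed"


state = FakeState()
# dvc add already stored an md5 hash for 2/data (placeholder value).
state.save("2/data", HashInfo("md5", "<placeholder>"))

# With a lock entry naming md5, the cached hash is reused...
_, reused_with_lock = hash_file("2/data", lock_hash_name({"hash": "md5"}), state, rehash)
# ...without one, md5-dos2unix is requested, the names differ and the file is rehashed.
_, reused_without_lock = hash_file("2/data", lock_hash_name(None), state, rehash)
print(reused_with_lock, reused_without_lock)  # True False
```

In this model the file content never changes; only the requested hash name does, and that alone is enough to force a recomputation.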