exp run: data gets re-imported on every call #6490

@daavoo

Bug Report

Description

When a pipeline depends on data imported with dvc import, the source repository appears to be cloned again and the data re-hashed every time dvc exp run is called.

Reproduce

  1. dvc import git@github.com:iterative/dataset-registry.git use-cases/cats-dogs
  2. dvc stage add -n foo -d cats-dogs echo foo
  3. dvc exp run

Expected

When using dvc repro the imported data doesn't get re-hashed. I would expect dvc exp run to behave the same.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.6.3 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-10.16-x86_64-i386-64bit
Supports:
        gdrive (pydrive2 = 1.9.1),
        http (requests = 2.26.0),
        https (requests = 2.26.0)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

Additional Information (if any):

$ dvc repro -v
2021-08-25 11:11:55,186 DEBUG: Computed stage: 'cats-dogs.dvc' md5: '5a135b297ee3c96465ce4b320f44fb8b'
'cats-dogs.dvc' didn't change, skipping
Stage 'foo' didn't change, skipping
Data and pipelines are up to date.
$ dvc exp run -v
2021-08-25 11:12:15,672 DEBUG: Detaching HEAD at 'HEAD'               
2021-08-25 11:12:15,690 DEBUG: Stashing workspace
2021-08-25 11:12:15,700 DEBUG: No changes to stash
2021-08-25 11:12:15,749 DEBUG: Creating external repo git@github.com:iterative/dataset-registry.git@ca140591a21c6d75a7057d1e2eb3f51d3115c5f5
2021-08-25 11:12:15,749 DEBUG: erepo: git clone 'git@github.com:iterative/dataset-registry.git' to a temporary dir
Computing file/dir hashes (only done once)  
. . .                                                                         
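The difference between the two runs shows up in the debug line `erepo: git clone`, which is present in the dvc exp run output and absent from the dvc repro output. A minimal sketch for checking a captured verbose log for that line, assuming the log format shown in this report; the function name `find_erepo_clones` and the pattern are illustrative and not part of DVC:

```python
import re
from typing import List

# Pattern modelled on the debug line in this report (an assumption about
# the log format, not a documented DVC interface).
CLONE_PATTERN = re.compile(r"DEBUG: erepo: git clone '(?P<url>[^']+)'")


def find_erepo_clones(verbose_log: str) -> List[str]:
    """Return the URLs of external repos that the log reports as cloned."""
    return [m.group("url") for m in CLONE_PATTERN.finditer(verbose_log)]


# Excerpts copied from the logs above.
repro_log = (
    "2021-08-25 11:11:55,186 DEBUG: Computed stage: 'cats-dogs.dvc' "
    "md5: '5a135b297ee3c96465ce4b320f44fb8b'\n"
    "'cats-dogs.dvc' didn't change, skipping\n"
)
exp_log = (
    "2021-08-25 11:12:15,749 DEBUG: erepo: git clone "
    "'git@github.com:iterative/dataset-registry.git' to a temporary dir\n"
)

print(find_erepo_clones(repro_log))
print(find_erepo_clones(exp_log))
```

An empty list for a dvc exp run log would indicate that the import source was not cloned again during that run.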

Labels

A: experiments (Related to dvc exp), p2-medium (Medium priority, should be done, but less important), research

Projects

Status

Done
