Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc exp run --queue: deleting file 'params.yaml' for each command call #7842

Closed
Tracked by #8972
alimagadovk opened this issue Jun 3, 2022 · 6 comments · Fixed by #9305
Closed
Tracked by #8972

dvc exp run --queue: deleting file 'params.yaml' for each command call #7842

alimagadovk opened this issue Jun 3, 2022 · 6 comments · Fixed by #9305
Assignees
Labels
A: experiments Related to dvc exp bug Did we break something? p1-important Important, aka current backlog of things to do

Comments

@alimagadovk
Copy link

Bug Report

dvc exp run --queue: deleting file 'params.yaml' for each command call

Description

dvc exp run --queue deletes file 'params.yaml' for each trial to add a new experiment and the experiment don't put into queue. It works correctly only if command git add params.yaml was called before.

Reproduce

Example files:
test.py

import numpy as np
import json
import yaml

params = yaml.safe_load(open('params.yaml'))["test"]

precision = np.random.random()
recall = params['value']
accuracy = np.random.random()

rows = {'precision': precision,
        'recall': recall,
        'accuracy': accuracy}

with open(params['metrics_path'], 'w') as outfile:
    json.dump(rows, outfile)

fpr = 10*np.random.random((1,10)).tolist()
tpr = 10*np.random.random((1,10)).tolist()

with open('plot.json', 'w') as outfile2:
    json.dump(
      {
        "roc": [ {"fpr": f, "tpr": t} for f, t in zip(fpr, tpr) ]
      }, 
      outfile2
      )

params.yaml

test:
  metrics_path: scores.json
  value: 2

dvc.yaml

stages:
  test:
    cmd: python test.py
    deps:
    - test.py
    params:
    - test.metrics_path
    - test.value
    metrics:
    - scores.json:
        cache: false
    plots:
    - plot.json:
        cache: false
        x: fpr
        y: tpr
  1. dvc exp run --queue -S test.value=101

Expected

New experiments adds to queue without 'params.yaml' deleting.

Environment information

Ubuntu 20.04

Output of dvc doctor:

$ dvc doctor

DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.13.0-40-generic-x86_64-with-glibc2.31
Supports:
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p6
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1p6
Repo: dvc, git

Additional Information (if any):

@pmrowla
Copy link
Contributor

pmrowla commented Jun 3, 2022

probably related to #6930 (untracked files being removed on post-run apply)

@karajan1001 karajan1001 added awaiting response we are waiting for your reply, please respond! :) A: experiments Related to dvc exp labels Jun 4, 2022
@karajan1001 karajan1001 removed the awaiting response we are waiting for your reply, please respond! :) label Jun 12, 2022
@daavoo daavoo added the bug label Aug 2, 2022
@daavoo
Copy link
Contributor

daavoo commented Aug 2, 2022

Only occurs if --set-param modifies an untracked params file and --temp or --queue flag is passed.

The untracked params file gets added scm.add on _update_params and stashed to stash_rev.

When using WorkspaceExecutor, init_git applies a merge on the current git repo, "restoring" the params file in the user workspace.

When using TmpDirExecutor, the merge is applied on the temp dir git repo, so the params file is never "restored" in the user workspace.

@dberenbaum
Copy link
Contributor

It seems like the behavior is expected then. However, maybe we can clarify in the docs that experiment files should be tracked by either Git or DVC?

@behrica
Copy link

behrica commented Sep 8, 2022

I stumbled over this today and found it at least "unexpected" and did not know how to fix it, until I came here.

@dberenbaum
Copy link
Contributor

Yes, I think I was incorrect to say it is expected behavior. In fact, I ran into it myself and was confused recently (see #8256). Should we be applying the merge on top of both the tmp dir and the workspace @pmrowla?

@pmrowla
Copy link
Contributor

pmrowla commented Sep 9, 2022

Applying the merge on the workspace isn't the correct behavior, that would put -S modified params into the workspace and not the original params file.

I think the fix for this should just be to use stash_workspace(include_untracked=True) here

with self.scm.stash_workspace() as workspace:

When we leave that context it is supposed to restore the workspace to the state before any DVC exp related changes/operations were done. include_untracked was omitted because we aren't supposed to be touching untracked files at all in exps, but the bug is we actually can modify untracked files in the params file case.

@omesser omesser added bug Did we break something? and removed bug labels Dec 14, 2022
@daavoo daavoo self-assigned this Mar 13, 2023
@daavoo daavoo linked a pull request Mar 13, 2023 that will close this issue
@dberenbaum dberenbaum added the p1-important Important, aka current backlog of things to do label Mar 13, 2023
daavoo added a commit that referenced this issue Apr 4, 2023
Only if they are currently untracked.

Closes #7842
daavoo added a commit that referenced this issue Apr 4, 2023
Only if they are currently untracked and `--temp` or `--queue` is passed.

Closes #7842
daavoo added a commit that referenced this issue Apr 5, 2023
Only if they are currently untracked and `--temp` or `--queue` is passed.

Closes #7842
daavoo added a commit that referenced this issue Apr 5, 2023
Only if they are currently untracked and `--temp` or `--queue` is passed.

Closes #7842
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp bug Did we break something? p1-important Important, aka current backlog of things to do
Projects
No open projects
Archived in project
7 participants