exp list: does not show new experiments #9260

Open
mstrupp opened this issue Mar 28, 2023 · 6 comments

Labels: A: experiments (Related to dvc exp), bug (Did we break something?), p2-medium (Medium priority, should be done, but less important)

Comments

mstrupp commented Mar 28, 2023

Bug Report

Description

When the terminal is killed while dvc exp run is executing, the ref .git/refs/exps/exec/EXEC_BASELINE is not removed.
Then when a git commit is made, git might pack the references to optimize performance.
From then on, dvc exp list is stuck showing the experiments from before the commit and does not update when new experiments are run.

This also affects the experiments table in the VS Code extension.
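Git stores a ref either as a loose file under .git/refs/ or as a line in .git/packed-refs, and a loose file takes precedence when both exist. The following Python sketch checks both places for refs/exps/exec/EXEC_BASELINE, which can help confirm whether a stale ref was left behind by the killed run. The find_ref helper is hypothetical (not part of DVC or Git), and the sketch assumes a non-bare repository using Git's default files ref backend, with the git directory at .git.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_ref(git_dir: str, refname: str) -> Optional[Tuple[str, str]]:
    """Return (value, storage) for a ref, or None if it does not exist.

    storage is "loose" when the ref is a file under git_dir, or "packed"
    when it only appears in git_dir/packed-refs. A loose file wins over a
    packed entry, matching Git's lookup order. Symbolic refs are returned
    unresolved (e.g. "ref: refs/heads/main").
    """
    root = Path(git_dir)

    loose = root / refname
    if loose.is_file():
        # A loose ref file holds the object id followed by a newline.
        return loose.read_text().strip(), "loose"

    packed = root / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip blank lines, the "# pack-refs with:" header, and "^<oid>"
            # peeled-tag lines.
            if not line or line.startswith(("#", "^")):
                continue
            oid, _, name = line.partition(" ")
            if name == refname:
                return oid, "packed"
    return None


if __name__ == "__main__":
    # Run from the repository root of a non-bare repo.
    found = find_ref(".git", "refs/exps/exec/EXEC_BASELINE")
    if found is None:
        print("no EXEC_BASELINE ref")
    else:
        oid, storage = found
        print(f"EXEC_BASELINE still present ({storage}): {oid}")
```

For a quick manual check, git show-ref --verify refs/exps/exec/EXEC_BASELINE gives the same answer without reading the files directly, since Git consults both storage locations.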

Reproduce

  1. git init
  2. dvc init
  3. dvc stage add -n prepare -d prepare.py python prepare.py
  4. create prepare.py containing a program that takes a while to run (e.g. time.sleep(10))
  5. git add .
  6. git commit -m "commit 1"
  7. dvc exp run
  8. while it is running, kill the terminal (by closing the terminal window, not with Ctrl+C)
  9. edit prepare.py (so that dvc exp run executes the pipeline again)
  10. git add .
  11. git commit -m "commit 2"
  12. git pack-refs --all: when committing, git sometimes packs refs automatically as an optimization, and that can happen right at this point. Running git pack-refs --all simulates this automatic packing.
  13. dvc exp run
  14. dvc exp list
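Step 12 changes where the leftover ref lives: git pack-refs --all moves loose refs into .git/packed-refs and deletes the loose files, so .git/refs/exps/exec/EXEC_BASELINE disappears from the file tree while the ref itself still exists. The Python model below illustrates that on-disk effect. It is deliberately simplified and hypothetical (pack_refs_simplified and read_packed_refs are illustrative names): it ignores locking, peeled tags, and the reftable backend, and is not a substitute for git pack-refs.

```python
from pathlib import Path
from typing import Dict


def read_packed_refs(packed_path: Path) -> Dict[str, str]:
    """Parse "<oid> <refname>" lines, skipping the header and peeled lines."""
    refs: Dict[str, str] = {}
    if packed_path.is_file():
        for line in packed_path.read_text().splitlines():
            if not line or line.startswith(("#", "^")):
                continue
            oid, _, name = line.partition(" ")
            refs[name] = oid
    return refs


def pack_refs_simplified(git_dir: str) -> None:
    """Rough model of `git pack-refs --all` for the files ref backend.

    Moves every loose ref under refs/ into packed-refs and deletes the
    loose file. Locking, peeled tags, and empty-directory pruning are
    deliberately omitted.
    """
    root = Path(git_dir)
    packed_path = root / "packed-refs"
    refs = read_packed_refs(packed_path)

    refs_dir = root / "refs"
    loose_files = []
    if refs_dir.is_dir():
        # Materialize the list first so deleting files does not disturb the walk.
        loose_files = sorted(p for p in refs_dir.rglob("*") if p.is_file())

    for path in loose_files:
        content = path.read_text().strip()
        if content.startswith("ref:"):
            continue  # symbolic refs stay loose in this sketch
        name = path.relative_to(root).as_posix()
        refs[name] = content  # a loose ref overrides an older packed value
        path.unlink()  # the loose file is gone, but the ref is not

    lines = ["# pack-refs with: sorted "]
    lines += [f"{oid} {name}" for name, oid in sorted(refs.items())]
    packed_path.write_text("\n".join(lines) + "\n")
```

After this runs, looking only for the loose file suggests the ref was removed, even though Git still resolves it from packed-refs.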

Expected

dvc exp list should show the experiment from step 13. Instead, it returns nothing.
The experiment is only listed with dvc exp list -A.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.38.1 (exe)
---------------------------------
Platform: Python 3.10.9 on Windows-10-10.0.19045-SP0
Subprojects:

Supports:
        azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
        gdrive (pydrive2 = 1.15.0),
        gs (gcsfs = 2022.11.0),
        hdfs (fsspec = 2022.11.0, pyarrow = 10.0.1),
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        oss (ossfs = 2023.1.0),
        s3 (s3fs = 2022.11.0, boto3 = 1.24.59),
        ssh (sshfs = 2022.6.0),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8),
        webhdfs (fsspec = 2022.11.0)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git

daavoo (Contributor) commented Mar 28, 2023

Hi @mstrupp , could you try upgrading to the latest DVC version?

daavoo added the "awaiting response" label Mar 28, 2023

mstrupp (Author) commented Mar 28, 2023

Hi @daavoo, thank you for the response. I upgraded dvc but the problem still exists.

$ dvc doctor
DVC version: 2.51.0 (pip)
-------------------------
Platform: Python 3.10.8 on Windows-10-10.0.19045-SP0
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.0
        scmrepo = 0.1.17
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\5db899e06b13bbca5a630f6ac0c2cbfd

daavoo added the "A: experiments" and "research" labels and removed the "awaiting response" label Mar 28, 2023

pmrowla (Contributor) commented Apr 4, 2023

The workaround here would be to remove the exec ref with

git update-ref -d refs/exps/exec/EXEC_BASELINE

The issue is that we have logic to handle HEAD moving during experiment execution: in that case, exp show shows experiments derived from EXEC_BASELINE instead of HEAD. We could consider updating the logic to also check whether there is an active workspace run (and clean up the ref when there is not), but this would also introduce additional overhead into every dvc command that uses resolve_rev.
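The check proposed here can be summarized as a three-way decision. This is a hypothetical Python sketch, not DVC's resolve_rev: choose_baseline, BaselineChoice, and the workspace_run_active flag are illustrative names, and detecting whether a workspace run is actually active (the source of the extra overhead) is left to the caller. When cleanup is flagged, the deletion itself could be done the same way as the manual workaround, git update-ref -d refs/exps/exec/EXEC_BASELINE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaselineChoice:
    rev: str               # revision to treat as the experiments baseline
    remove_exec_ref: bool  # whether the EXEC_BASELINE ref looks stale


def choose_baseline(head_rev: str,
                    exec_baseline_rev: Optional[str],
                    workspace_run_active: bool) -> BaselineChoice:
    """Hypothetical sketch of the proposed check (not DVC's code).

    - No EXEC_BASELINE ref: use HEAD.
    - EXEC_BASELINE present and a workspace run is active: HEAD may have
      been moved by the run, so keep using EXEC_BASELINE.
    - EXEC_BASELINE present but no run is active: the ref was left behind
      (e.g. by a killed terminal), so use HEAD and flag the ref for cleanup.
    """
    if exec_baseline_rev is None:
        return BaselineChoice(head_rev, False)
    if workspace_run_active:
        return BaselineChoice(exec_baseline_rev, False)
    return BaselineChoice(head_rev, True)
```

Keeping the decision separate from run detection makes it easy to test; the open question in this thread is how cheaply workspace_run_active can be determined for every command that resolves HEAD.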

dberenbaum (Contributor) commented

@pmrowla Is it needed for anything besides exp list and exp show? Can we do it only in those commands?

daavoo added the "bug" label and removed the "research" label Apr 4, 2023

pmrowla (Contributor) commented Apr 5, 2023

@dberenbaum it's needed for every DVC command that has any kind of parameter that can be set to (or defaults to) HEAD (so any diff/show command)

Should also note that if we drop checkpoints support we could also consider just dropping this behavior as well. HEAD is still moved for regular experiments but we restore it shortly afterwards when the experiment run ends. The main issue here is that for checkpoints, HEAD is moved to the most recently generated checkpoint commit. (We may not actually be able to drop this entirely though since tools like vscode could still try to run DVC commands before HEAD is restored at the end of a regular exp run)

mstrupp (Author) commented Apr 14, 2023

Thanks for the suggested workaround @pmrowla.

Unfortunately, the user doesn't notice when the problem occurs, so they don't know when the workaround needs to be applied. DVC silently shows the experiments derived from the stale EXEC_BASELINE. The user expects to see the new experiments but has no indication of why they are missing.

dberenbaum added the "p2-medium" label Apr 21, 2023