Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] Flag commit hash as "dirty" in UI if git repo has local changes #1912

Closed
krasch opened this issue Oct 8, 2019 · 4 comments
Closed

[FR] Flag commit hash as "dirty" in UI if git repo has local changes #1912

krasch opened this issue Oct 8, 2019 · 4 comments
Labels
area/tracking Tracking service, tracking client APIs, autologging enhancement New feature or request priority/backlog We believe it is useful, but don’t see it being prioritized in the next few months.

Comments

@krasch
Copy link

krasch commented Oct 8, 2019

Describe the proposal

Add a flag in UI (and database, if necessary) that identifies a commit hash as "dirty" if the git repository has local changes.

Motivation

Version: 1.3.0 (installed via pip)

Steps:

  1. have a clean git repository, no un-commited local changes
  2. mlflow run .
  3. change a file, do not commit
  4. mlflow run .

In this situation, the first run is reproducible, the second one is not. In the UI both runs are displayed with the same (i.e. the most recent) commit hash. This gives the impression that both experiments where run with exactly the same code, which is not true. As far as I could find, there seems to be no way to identify the "dirty" second run. Please let me know if I missed something.

Proposed Changes

Other tools such as omniboard for sacred or neptune show a flag that identifies the "dirty" run as such. This does not make the run reproducible, but it does at least alert the user that this is the case.

@krasch krasch added the enhancement New feature or request label Oct 8, 2019
@ssYkse
Copy link

ssYkse commented Oct 10, 2019

I agree with the issue, but not necessarily with the proposed solution. The reason for this is, that the second run is still not reproducible - we know its 'dirty', but we still don't know what changed.
I suggest that running in a 'dirty' environment, mlflow shall refuse to run:

mlflow run .
>>> dirty git, aborting.

If you insist on running this, maybe run

mlflow run . --ignore-git-warning

or something similar.

@krasch
Copy link
Author

krasch commented Oct 10, 2019

I agree, it does not solve the actual reproducibility issues. I like your proposal. I see two possible ways that one could go with the flag

  1. --ignore-git-warning (as you proposed)
  2. --enforce-git-clean (for example sacred does this)

I do prefer (1), however I do see a risk that this might annoy some users, because of perceived overhead.

@smurching smurching added the area/tracking Tracking service, tracking client APIs, autologging label Jul 1, 2020
@smurching smurching added the priority/backlog We believe it is useful, but don’t see it being prioritized in the next few months. label Jul 13, 2020
@ryan-duve
Copy link
Contributor

ryan-duve commented Aug 16, 2021

@smurching Would the project maintainers still be receptive to an attempt to solve the problem as stated in the original post? Having an indicator that an MLflow run was started with "working tree clean" would be helpful when attempting to reproduce results.

Edit: Adding @harupy in case there's another way to answer this question. I am willing to take a crack at this issue if it's something that would be considered to merge in.

@chenmoneygithub
Copy link
Collaborator

I don't know if we need this, as a comparison, we frequently start a python console in a local branch with changed file for quick testing, and there is no prompt that "this console is based on uncommitted changed", but everything works fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tracking Tracking service, tracking client APIs, autologging enhancement New feature or request priority/backlog We believe it is useful, but don’t see it being prioritized in the next few months.
Projects
None yet
Development

No branches or pull requests

5 participants