Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make MLFlow version detection more robust and handles mlflow-skinny #29957

Merged
merged 3 commits into from Apr 8, 2024

Conversation

helloworld1
Copy link
Contributor

@helloworld1 helloworld1 commented Mar 29, 2024

What does this PR do?

MLFlow can be provided from either "mlflow" or "mlflow-skinny" package. See https://pypi.org/project/mlflow/.
This PR updated the version check of mlflow to include "mlflow-skinny" package.
Also make the version check not crashing and fallback to synchronous metrics logging if version is not detected.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @pacman100 @cchen-dialpad

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey This seems to have been fixed by #29918

@helloworld1
Copy link
Contributor Author

Thanks @ArthurZucker for reviewing. The fix #29917 indeed will address the mlflow-skinny problem. In my PR, I made the code more robust in case MLFlow version detection failed and the job will not crash and instead default to not use async metric logging.
I have rebased the code on main branch.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Not sure we want a fallback here, the MLflowCallback should raise an error if mlflow is not available no?

        if not is_mlflow_available():
            raise RuntimeError("MLflowCallback requires mlflow to be installed. Run `pip install mlflow`.")
        import mlflow

will raise an error. We can update the is_mlflow_available to take into account mlflow-skinny and that will allow us to do something like if importlib.metadata.version(self._ml_flow) instead of using the get_mlflow_version!

@helloworld1
Copy link
Contributor Author

@ArthurZucker The is_mlflow_available() is used to check if "mlflow" can be imported regardless which version is install. The version check is used to decide if the async logging is available. I refactor the code to make it is_mlflow_async_log_available() function so it will be very clear on its purpose.

@ArthurZucker
Copy link
Collaborator

Thanks! Again, I think we should also fix the is_mlflow_availabl as it does not take into account skinny. By doing this, it will be easier to then test the version of self._ml_flow which will directly be either mlfow or skinny .
Does that make sense?

I am fine with the current changes, but we should go for minimal!

@helloworld1
Copy link
Contributor Author

helloworld1 commented Apr 2, 2024

Thanks! Again, I think we should also fix the is_mlflow_availabl as it does not take into account skinny. By doing this, it will be easier to then test the version of self._ml_flow which will directly be either mlfow or skinny . Does that make sense?

I am fine with the current changes, but we should go for minimal!

Thanks for reviewing again. The current is_mlflow_available() is already taking account of mlflow or mlflow-skinny without change needed. I have tested this function with both mlflow or skinny. And it works as long as import mlflow can work.

def is_mlflow_available():
    if os.getenv("DISABLE_MLFLOW_INTEGRATION", "FALSE").upper() == "TRUE":
        return False
    return importlib.util.find_spec("mlflow") is not None

The issue addressing here is the version check is the "Async logging" requires mlflow or mlfow-skinny 2.8.0 or higher. And the version checking requires checking mlflow or mlflow-skinny package version. The function is_mlflow_async_log_available() is a refactor of the current logic of version check. It is already pretty minimal I think and I don't think it is able to be combined with is_mlflow_available() since they serves different purposes.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I am still a bit confused because at this point you already have access to self._mlflow, which is as you said either mlflow or mlflow-skinny's mlflow.
Thus you should only have to do self._mlflow.__version__.
Again I don't understand how yoyu could not determine the mlflow version it's installed if you can go here. A bad install maybe?
Anyways, we should not have to re-import

# https://github.com/mlflow/mlflow/releases/tag/v2.8.0
if packaging.version.parse(get_mlflow_version()) >= packaging.version.parse("2.8.0"):
self._async_log = True
self._async_log = is_mlflow_async_log_available()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._async_log = is_mlflow_async_log_available()
self._async_log = packaging.version.parse(self._mlflow.__version__) >= packaging.version.parse("2.8.0"):

should be the only thing you have to do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArthurZucker Got it. Thanks for the suggestion. It is definitely much simpler and cleaner. I have tested working correctly for both mlfow and mlflow-skinny and updated PR.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much much cleaner thanks

@ArthurZucker
Copy link
Collaborator

Unrelated failing tests, merging

@ArthurZucker ArthurZucker merged commit 836e88c into huggingface:main Apr 8, 2024
16 of 20 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 12, 2024
…uggingface#29957)

* Make MLFlow version detection more robust and handles mlflow-skinny

* Make function name more clear and refactor the logic

* Further refactor
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
…29957)

* Make MLFlow version detection more robust and handles mlflow-skinny

* Make function name more clear and refactor the logic

* Further refactor
itazap pushed a commit that referenced this pull request May 14, 2024
…29957)

* Make MLFlow version detection more robust and handles mlflow-skinny

* Make function name more clear and refactor the logic

* Further refactor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants