Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with passing capture_* args to neptune callback #29041

Merged

Conversation

AleksanderWWW
Copy link
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@AleksanderWWW
Copy link
Contributor Author

AleksanderWWW commented Feb 15, 2024

What does this PR do

This PR aims to fix a bug that appears when using NeptuneCallback with run=None and at least one of the capture_* params set.

Cause of the problem

This is due to the fact, that in one of the methods those params have hardcoded values, but the kwargs passed to the constructor are also passed to that method. Therefore a TypeError like this occurs during evaluation:
TypeError: neptune.metadata_containers.run.Run() got multiple values for keyword argument 'capture_stdout'

Solution

Before invoking the problematic method we want to make sure that the capture_* params are not present in the callback's state by temporarily removing them. Once the method returns, the original kwargs state is restored.

@amyeroberts
Copy link
Collaborator

Thanks for opening this PR @AleksanderWWW! Let us know when you want a review 🤗

@AleksanderWWW
Copy link
Contributor Author

AleksanderWWW commented Feb 15, 2024

Thank you @amyeroberts !

Let me wait for my colleagues at Neptune to share their thoughts on my proposal 😃

capture_traceback=False,
)
self._is_monitoring_run = False
with preserve_neptune_kwargs(callback=self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to override the params as a part of def _initialize_run:

    def _initialize_run(self, **additional_neptune_kwargs):
        ...

        self._stop_run_if_exists()

        try:
            run_params = additional_neptune_kwargs.copy()
            run_params.update(self._init_run_kwargs)
            self._run = init_run(**run_params)
            self._run_id = self._run["sys/id"].fetch()
        except (NeptuneMissingProjectNameException, NeptuneMissingApiTokenException) as e:
            raise NeptuneMissingConfiguration() from e

I'm not sure if I haven't done any typo etc. but I don't see a need to "preserve" these params.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually by use case mentioned here: neptune-ai/neptune-client#1663 let's override with the user-provided parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But setting those capture_* to False is only required in one invocation of _initialize_run. I'm not sure I understand

@@ -1187,6 +1186,9 @@ class NeptuneCallback(TrainerCallback):
trial_params_key = "trial_params"
trainer_parameters_key = "trainer_parameters"
flat_metrics = {"train/epoch"}
params_to_preserve = frozenset(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this, right?

@AleksanderWWW
Copy link
Contributor Author

Hey @amyeroberts
It seems that the integration_utils.py module has become very long and rather hard to navigate.
It's difficult to find what you're looking for and making changes in more than one place in one PR is a nightmare.
Would it be possible to split it into multiple modules separating different integrations?

@AleksanderWWW AleksanderWWW marked this pull request as ready for review February 28, 2024 16:27
@amyeroberts
Copy link
Collaborator

Hi @AleksanderWWW, what kind of structure would you propose for the files?

@AleksanderWWW
Copy link
Contributor Author

AleksanderWWW commented Mar 1, 2024

@amyeroberts
I thought of something similar to what pytorch_lightning has: https://github.com/Lightning-AI/pytorch-lightning/tree/master/src/lightning/pytorch/loggers

The root could be the already existing integrations package, but each provider (e.g. Neptune, CometML, etc.) has its own module.
Does it make sense?


And btw. the PR is ready for review from the code owners' side 🚀

@amyeroberts
Copy link
Collaborator

@AleksanderWWW OK, sounds like a good idea. We'd have to make all objects importable from the integrations module for backwards compatibility. If you'd like to open a PR I'd be very happy to review :)

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on a fix!

Just a question about how this works

self._run = init_run(**self._init_run_kwargs, **additional_neptune_kwargs)
run_params = additional_neptune_kwargs.copy()
run_params.update(self._init_run_kwargs)
self._run = init_run(**self._init_run_kwargs, **run_params)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me like it won't work. Because we're updating run_params with self._init_run_kwargs we're guaranteeing that all the keys in self._init_run_kwargs are in run_params i.e. we're passing the same argument twice to init_run, which I'd expect to trigger an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are absolutely correct!. In fact, it should be just **run_params in the init_run call. My bad! Fixing rn :) Thank you for noticing

@AleksanderWWW
Copy link
Contributor Author

If you'd like to open a PR I'd be very happy to review :)

Will do :)

@AleksanderWWW
Copy link
Contributor Author

The failing CI test ModelUtilsTest.test_use_safetensors doesn't seem to have anything to do with my changes. Is it possible to maybe rerun it?

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

@AleksanderWWW
Copy link
Contributor Author

AleksanderWWW commented Mar 4, 2024

@amyeroberts will you merge it once the checks pass? 🚀

@amyeroberts
Copy link
Collaborator

@AleksanderWWW A fix has been merged into main which should resolve this. Could you try rebasing to include this and trigger an new CI run?

@AleksanderWWW
Copy link
Contributor Author

@amyeroberts Done 😃

@amyeroberts amyeroberts merged commit 8f3f8e6 into huggingface:main Mar 5, 2024
20 checks passed
@AleksanderWWW AleksanderWWW deleted the neptune/fix-passing-capture-args branch March 5, 2024 15:18
damithsenanayake pushed a commit to damithsenanayake/transformers that referenced this pull request Mar 7, 2024
…29041)

* Fix bug with passing capture_* args to neptune callback

* ruff happy?

* instantiate (frozen)set only once

* code review

* code review 2

* ruff happy?

* code review
itazap pushed a commit that referenced this pull request May 14, 2024
* Fix bug with passing capture_* args to neptune callback

* ruff happy?

* instantiate (frozen)set only once

* code review

* code review 2

* ruff happy?

* code review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants