Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/on schedule job callback #78

Merged

Conversation

Gabriel2409
Copy link
Contributor

Hi @marrrcin
Following issue #76 and your comment on PR #77,
here is a proposed implementation for the --on-job-scheduled parameter.

To use it, you could for example create the following function:

#src/mymodule/myfuncs.py
def mycallback(job):
    click.echo(job.name)

and then run kedro azureml run --on-job-scheduled mymodule.myfuncs:mycallback

I took inspiration from the way uvicorn allows you to launch an application (uvicorn main:app) so the function is specified after a colon.

Note that I am not really happy with the way I implemented the unit tests so if you have better ideas, that would be great.

Copy link
Contributor

@marrrcin marrrcin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, I like it!

Added my comments in the review.

ml_client.jobs.stream.side_effect = ValueError()

runner = CliRunner()
result = runner.invoke(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify this test so that the callback is also mocked?
Pseudo-code:

with #... other patches,
    patch("dummy.module:callback") as cb:
        # ...

and then assert that cb.assert_called_once()?

CHANGELOG.md Outdated
@@ -2,6 +2,8 @@

## [Unreleased]

- Added possibility to use custom callbacks with the azureml pipeline job as an argument by [@Gabriel2409](https://github.com/Gabriel2409)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change to:

- [Feature 🚀] Added `--on-job-scheduled` argument to `kedro azureml run` to plug-in custom behaviour after Azure ML job is scheduled [@Gabriel2409](https://github.com/Gabriel2409)

@Gabriel2409
Copy link
Contributor Author

I added the changes to the tests.
After further consideration, I removed the check on the number of arguments because using inspect to check the signature seems to lead to many edge cases where it fails when it should not (functions with default parameters for ex) and it was also hard to test because pytest Mock signature contains two arguments (args and kwargs) so I had to overwrite the signature and it made things too complex.
In fact, I completely removed the use of inspect and I instead use callable to check if the function is callable (inspect.isfunction returns False on Mock objects)

Still regarding the testing, having a module in a helpers folder seems to be the easiest solution: I just need to patch the existing function to ensure it is called exactly once. Moreover, it allows me to test all the cases in dynamic_import_job_schedule_func_from_str` which is what the last case is about.

Indeed, using a dummy module meant that I had to patch the importlib.import_module function which made the run fail (I suspect it might be used under the hood elsewhere), which is why I abandoned the idea (It made the testing code a lot more complex)

Please tell me if this is acceptable for you.
Thank you!

# extra checks to make sure the tests actually checks the usecase
# and that we are not always in the case of an invalid string
if ":" in on_job_scheduled:
module_str, attr_str = on_job_scheduled.split(":")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not test the code for dynamic_import_job_schedule_func_from_str at all 🤔 You're effectively validating the arguments passed to the test, not to the downstream invocation (runner.invoke).
If you want to test this properly, the I would suggest:

  1. Leave the cases as they are right now:
    "on_job_scheduled",
    (
        "bad_str_format",
        "nonexistant_module:func",
        "tests.helpers.on_job_scheduled_helper:nonexistant_attr",
        "tests.helpers.on_job_scheduled_helper:existing_attr",
    ),
  1. Run the runner.invoke(...)
  2. Check the result.exception against one of the cases that you've specified:
assert result.exit_code != 0
assert result.exception, "Exception should have been raised"
# Rest of the checks
if module_str == "nonexistant_module":
    # ... rest of the checks based on the result.execption

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this is indeed a better solution.

My reasonning was that the only reason the test could fail was because of the --on-job-scheduled arg and I made sure to validate the arguments before the test. Also when checking the code coverage, the function was 100% covered so I figured it was enough.

But I agree with you it is not as robust as checking the error message after invoking the runner. Failure could happen because of something else. I don't know why I didn't think of that as it is the way it is done in another test 🤦‍♂️

- ``--load-versions`` specifies a particular dataset version (timestamp) for loading (similar behavior as Kedro)
- ``--on-job-scheduled path.to.module:my_function`` specifies a callback function to be called on the azureml pipeline job start (example below)

.. code:: python
Copy link
Contributor

@marrrcin marrrcin Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not display in the docs after rendering (see the screen) 👇🏻

You're probably missing a blank line after .. code:: python, see the previous usages in the same file above.

image

https://kedro-azureml--78.org.readthedocs.build/en/78/source/03_quickstart.html

@Gabriel2409
Copy link
Contributor Author

I updated the code based on your remarks.

@marrrcin marrrcin merged commit 1a7f8dd into getindata:develop Sep 29, 2023
9 of 10 checks passed
@marrrcin
Copy link
Contributor

Merged, thanks for this feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants