Feature/on schedule job callback #78

Gabriel2409 · 2023-09-27T09:36:34Z

Hi @marrrcin
Following issue #76 and your comment on PR #77,
here is a proposed implementation for the --on-job-scheduled parameter.

To use it, you could for example create the following function:

#src/mymodule/myfuncs.py
def mycallback(job):
    click.echo(job.name)

and then run kedro azureml run --on-job-scheduled mymodule.myfuncs:mycallback

I took inspiration from the way uvicorn allows you to launch an application (uvicorn main:app) so the function is specified after a colon.

Note that I am not really happy with the way I implemented the unit tests so if you have better ideas, that would be great.

marrrcin

Thanks for your contribution, I like it!

Added my comments in the review.

marrrcin · 2023-09-27T09:54:48Z

tests/test_cli.py

+        ml_client.jobs.stream.side_effect = ValueError()
+
+        runner = CliRunner()
+        result = runner.invoke(


Can you modify this test so that the callback is also mocked?
Pseudo-code:

with #... other patches, patch("dummy.module:callback") as cb: # ...

and then assert that cb.assert_called_once()?

marrrcin · 2023-09-27T09:58:03Z

CHANGELOG.md

@@ -2,6 +2,8 @@

 ## [Unreleased]

+- Added possibility to use custom callbacks with the azureml pipeline job as an argument by [@Gabriel2409](https://github.com/Gabriel2409)


Please change to:

- [Feature 🚀] Added `--on-job-scheduled` argument to `kedro azureml run` to plug-in custom behaviour after Azure ML job is scheduled [@Gabriel2409](https://github.com/Gabriel2409)

Gabriel2409 · 2023-09-28T09:14:34Z

I added the changes to the tests.
After further consideration, I removed the check on the number of arguments because using inspect to check the signature seems to lead to many edge cases where it fails when it should not (functions with default parameters for ex) and it was also hard to test because pytest Mock signature contains two arguments (args and kwargs) so I had to overwrite the signature and it made things too complex.
In fact, I completely removed the use of inspect and I instead use callable to check if the function is callable (inspect.isfunction returns False on Mock objects)

Still regarding the testing, having a module in a helpers folder seems to be the easiest solution: I just need to patch the existing function to ensure it is called exactly once. Moreover, it allows me to test all the cases in dynamic_import_job_schedule_func_from_str` which is what the last case is about.

Indeed, using a dummy module meant that I had to patch the importlib.import_module function which made the run fail (I suspect it might be used under the hood elsewhere), which is why I abandoned the idea (It made the testing code a lot more complex)

Please tell me if this is acceptable for you.
Thank you!

marrrcin · 2023-09-28T10:16:20Z

tests/test_cli.py

+        # extra checks to make sure the tests actually checks the usecase
+        # and that we are not always in the case of an invalid string
+        if ":" in on_job_scheduled:
+            module_str, attr_str = on_job_scheduled.split(":")


This does not test the code for dynamic_import_job_schedule_func_from_str at all 🤔 You're effectively validating the arguments passed to the test, not to the downstream invocation (runner.invoke).
If you want to test this properly, the I would suggest:

Leave the cases as they are right now:

"on_job_scheduled", ( "bad_str_format", "nonexistant_module:func", "tests.helpers.on_job_scheduled_helper:nonexistant_attr", "tests.helpers.on_job_scheduled_helper:existing_attr", ),

Run the runner.invoke(...)

Check the result.exception against one of the cases that you've specified:

assert result.exit_code != 0 assert result.exception, "Exception should have been raised" # Rest of the checks if module_str == "nonexistant_module": # ... rest of the checks based on the result.execption

You're right, this is indeed a better solution.

My reasonning was that the only reason the test could fail was because of the --on-job-scheduled arg and I made sure to validate the arguments before the test. Also when checking the code coverage, the function was 100% covered so I figured it was enough.

But I agree with you it is not as robust as checking the error message after invoking the runner. Failure could happen because of something else. I don't know why I didn't think of that as it is the way it is done in another test 🤦‍♂️

marrrcin · 2023-09-28T10:21:18Z

docs/source/03_quickstart.rst

+- ``--load-versions`` specifies a particular dataset version (timestamp) for loading (similar behavior as Kedro)
+- ``--on-job-scheduled  path.to.module:my_function`` specifies a callback function to be called on the azureml pipeline job start (example below)
+
+.. code:: python


This does not display in the docs after rendering (see the screen) 👇🏻

You're probably missing a blank line after .. code:: python, see the previous usages in the same file above.

https://kedro-azureml--78.org.readthedocs.build/en/78/source/03_quickstart.html

…nvalid args

Gabriel2409 · 2023-09-28T11:42:05Z

I updated the code based on your remarks.

marrrcin · 2023-09-29T07:01:18Z

Merged, thanks for this feature!

Gabriel2409 added 7 commits September 27, 2023 10:13

added possibility to provide own callback function on job scheduled

6f50913

fixed default value of --on-job-scheduled

8079665

added unit tests

96571d7

updated changelog

1cc1a3e

removed unused import

3255d1a

fixed formatting/flake8

6c61c5e

fixed sort

9c7cf8c

marrrcin reviewed Sep 27, 2023

View reviewed changes

Gabriel2409 added 4 commits September 27, 2023 13:03

updated documentation

fb943de

updated changelog

41ec1a7

Removed nb parameters check in dynamic_import_job_schedule_func_from_str

76643ed

improved unit tests for on-job-scheduled arg

9d3a00b

marrrcin suggested changes Sep 28, 2023

View reviewed changes

marrrcin reviewed Sep 28, 2023

View reviewed changes

Gabriel2409 added 2 commits September 28, 2023 13:29

added space to documentation

da0e2f0

modified test related to failure when using --on-job-scheduled with i…

d2d0906

…nvalid args

marrrcin approved these changes Sep 29, 2023

View reviewed changes

marrrcin merged commit 1a7f8dd into getindata:develop Sep 29, 2023
9 of 10 checks passed

Gabriel2409 mentioned this pull request Sep 29, 2023

Retrieving pipeline id in cicd pipelines #76

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/on schedule job callback #78

Feature/on schedule job callback #78

Gabriel2409 commented Sep 27, 2023

marrrcin left a comment

marrrcin Sep 27, 2023

marrrcin Sep 27, 2023

Gabriel2409 commented Sep 28, 2023

marrrcin Sep 28, 2023

Gabriel2409 Sep 28, 2023

marrrcin Sep 28, 2023 •

edited

Loading

Gabriel2409 commented Sep 28, 2023

marrrcin commented Sep 29, 2023

		@@ -2,6 +2,8 @@

		## [Unreleased]

		- Added possibility to use custom callbacks with the azureml pipeline job as an argument by [@Gabriel2409](https://github.com/Gabriel2409)

Feature/on schedule job callback #78

Feature/on schedule job callback #78

Conversation

Gabriel2409 commented Sep 27, 2023

marrrcin left a comment

Choose a reason for hiding this comment

marrrcin Sep 27, 2023

Choose a reason for hiding this comment

marrrcin Sep 27, 2023

Choose a reason for hiding this comment

Gabriel2409 commented Sep 28, 2023

marrrcin Sep 28, 2023

Choose a reason for hiding this comment

Gabriel2409 Sep 28, 2023

Choose a reason for hiding this comment

marrrcin Sep 28, 2023 • edited Loading

Choose a reason for hiding this comment

Gabriel2409 commented Sep 28, 2023

marrrcin commented Sep 29, 2023

marrrcin Sep 28, 2023 •

edited

Loading