Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nuclio] Delete Nuclio function when removing function of Nuclio runtime #5462

Merged
merged 13 commits into from
May 1, 2024

Conversation

rokatyy
Copy link
Member

@rokatyy rokatyy commented Apr 25, 2024

This PR addresses a bug where deleting an MLRun function from the Nuclio runtime only removed the function from the database, leaving the associated Nuclio pod running.

Since an MLRun function can be mapped to multiple Nuclio functions with different tags, the deletion process first checks if the function is associated with Nuclio. Then, it retrieves all tags belonging to this function, generates Nuclio function names, and sends requests to the Nuclio API to delete each Nuclio function mapped to the given MLRun function.

Jira - https://iguazio.atlassian.net/browse/ML-3432
https://iguazio.atlassian.net/browse/ML-6293

@rokatyy rokatyy marked this pull request as ready for review April 25, 2024 09:03
Copy link
Member

@alonmr alonmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I agree we can start with this and discuss if we want to tackle to monitoring flow as well.
Also, please add a couple of unit tests.

server/api/api/endpoints/functions.py Outdated Show resolved Hide resolved
server/api/api/endpoints/functions.py Outdated Show resolved Hide resolved
server/api/api/endpoints/functions.py Outdated Show resolved Hide resolved
server/api/api/endpoints/functions.py Show resolved Hide resolved
Comment on lines 195 to 208
if len(functions) > 0:
if functions[0].get("kind") in RuntimeKinds.nuclio_runtimes():
async with server.api.utils.clients.async_nuclio.Client(
auth_info
) as client:
tasks = []
for function in functions:
nuclio_name = mlrun.runtimes.nuclio.function.get_fullname(
name, project, function.get("metadata", {}).get("tag")
)
tasks.append(
client.delete_function(name=nuclio_name, project_name=project)
)
await asyncio.gather(*tasks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few things here are not very clear

  1. the heuristic if checking the first function is nuclio and assuming rest are nuclio - perhaps worth a comment why do oyu assume it
  2. gather on tasks - if you have 500 functions, do you want to send 500 requests in parallel? why not group those requests by 10ens and send them all (remember not to fail on first but send all requests, log what fails and if one failed, then at the end - fail the operation
  3. will you wait for functions deletion? Id assume that is a long-process - thus, needs to be a background job.
  4. as stated on (2) - gather will fail if one deletion has failed. I suggest to wait for all before bailing

since (3) requires a change on response, I suggest moving this operation to functions_v2.py and have this function return background job
then you update
httpdb (sdk) to expect background job (and use v2)
open ticket for ui to use /v2/... endpoint and expect background job

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liranbg reworked

server/api/utils/background_tasks/kinds.py Outdated Show resolved Hide resolved
server/api/api/endpoints/functions.py Outdated Show resolved Hide resolved
server/api/api/utils.py Outdated Show resolved Hide resolved
server/api/api/utils.py Outdated Show resolved Hide resolved
server/api/api/utils.py Outdated Show resolved Hide resolved
tests/api/api/test_functions.py Outdated Show resolved Hide resolved
tests/api/api/test_functions.py Outdated Show resolved Hide resolved
Copy link
Member

@quaark quaark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
One comment though I suggest we use the other Background Task handler

server/api/api/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@quaark quaark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! One minor fix for SDK side

mlrun/db/httpdb.py Outdated Show resolved Hide resolved
Copy link
Member

@quaark quaark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

mlrun/db/httpdb.py Outdated Show resolved Hide resolved
server/api/api/utils.py Outdated Show resolved Hide resolved
mlrun/db/httpdb.py Outdated Show resolved Hide resolved
server/api/api/utils.py Outdated Show resolved Hide resolved
@liranbg liranbg enabled auto-merge (squash) May 1, 2024 21:17
@liranbg liranbg disabled auto-merge May 1, 2024 21:17
@liranbg liranbg merged commit a0e7738 into mlrun:development May 1, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants