-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Nuclio] Delete Nuclio function when removing function of Nuclio runtime #5462
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! I agree we can start with this and discuss if we want to tackle to monitoring flow as well.
Also, please add a couple of unit tests.
if len(functions) > 0: | ||
if functions[0].get("kind") in RuntimeKinds.nuclio_runtimes(): | ||
async with server.api.utils.clients.async_nuclio.Client( | ||
auth_info | ||
) as client: | ||
tasks = [] | ||
for function in functions: | ||
nuclio_name = mlrun.runtimes.nuclio.function.get_fullname( | ||
name, project, function.get("metadata", {}).get("tag") | ||
) | ||
tasks.append( | ||
client.delete_function(name=nuclio_name, project_name=project) | ||
) | ||
await asyncio.gather(*tasks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few things here are not very clear
- the heuristic if checking the first function is nuclio and assuming rest are nuclio - perhaps worth a comment why do oyu assume it
- gather on tasks - if you have 500 functions, do you want to send 500 requests in parallel? why not group those requests by 10ens and send them all (remember not to fail on first but send all requests, log what fails and if one failed, then at the end - fail the operation
- will you wait for functions deletion? Id assume that is a long-process - thus, needs to be a background job.
- as stated on (2) - gather will fail if one deletion has failed. I suggest to wait for all before bailing
since (3) requires a change on response, I suggest moving this operation to functions_v2.py and have this function return background job
then you update
httpdb (sdk) to expect background job (and use v2)
open ticket for ui to use /v2/... endpoint and expect background job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liranbg reworked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
One comment though I suggest we use the other Background Task handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One minor fix for SDK side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
This PR addresses a bug where deleting an MLRun function from the Nuclio runtime only removed the function from the database, leaving the associated Nuclio pod running.
Since an MLRun function can be mapped to multiple Nuclio functions with different tags, the deletion process first checks if the function is associated with Nuclio. Then, it retrieves all tags belonging to this function, generates Nuclio function names, and sends requests to the Nuclio API to delete each Nuclio function mapped to the given MLRun function.
Jira - https://iguazio.atlassian.net/browse/ML-3432
https://iguazio.atlassian.net/browse/ML-6293