feature: add some exceptions when schedule computing session #1887
.feature.md -> 1887.feature.md
Thanks! I left some comments
if requested_architectures is None:
    raise GenericBadRequest("Requested session has no architecture information")
This cannot happen because requested_architectures is always set, not None.
if candidate_agents is None:
    raise InstanceNotAvailable(extra_msg=("No agents are registered with the manager"))
Same here. candidate_agents is always a list, not None.
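A minimal sketch of the point made in these comments, assuming (as the reviewer states) that the value is always a container such as a list or set. The helper names are hypothetical and are not part of the scheduler code. It shows that an `is None` guard never fires on an empty container, while a truthiness check does.

```python
def none_guard_fires(value) -> bool:
    """Return True if an `is None` guard would raise for this value."""
    return value is None


def empty_guard_fires(value) -> bool:
    """Return True if a `not value` guard would raise for this value."""
    return not value


# Hypothetical stand-ins: no agents found, no architectures collected.
candidate_agents = []
requested_architectures = set()

for name, value in [
    ("candidate_agents", candidate_agents),
    ("requested_architectures", requested_architectures),
]:
    print(name, none_guard_fires(value), empty_guard_fires(value))
```

Under that assumption, only the `not value` form can ever reach the `raise`.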
if candidate_agents is None:
    raise InstanceNotAvailable(
        extra_msg=("No agents are registered with the manager")
    )
Same here. candidate_agents cannot be None.
@@ -997,7 +1001,7 @@ async def _schedule_multi_node_session(
     AgentRow.occupied_slots,
 ]).where(AgentRow.id == agent.id)
 )
-).fecthall()[0]
+).fetchall()[0]
👍
Could you test whether this fix works well?
if not requested_architectures:
    raise GenericBadRequest("Requested session has no architecture information")
requested_architectures can be empty when the computing session does not have any sibling computing kernels.
How about checking the session's sibling kernels instead of requested_architectures? Like this:
if not sess_ctx.kernels:
    raise GenericBadRequest(f"Given session does not have any sibling kernel. (id: {sess_ctx.id})")
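A self-contained sketch of the check suggested above, for trying it outside the manager. `GenericBadRequest` and `SessionStub` below are local stand-ins, and representing kernels as dicts with an `"architecture"` key is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


class GenericBadRequest(Exception):
    """Local stand-in for the manager's exception of the same name."""


@dataclass
class SessionStub:
    """Hypothetical minimal session: an id plus its sibling kernels."""
    id: str
    kernels: list = field(default_factory=list)


def requested_architectures_of(sess_ctx: SessionStub) -> set:
    # Fail early when there are no kernels to derive architectures from.
    if not sess_ctx.kernels:
        raise GenericBadRequest(
            f"Given session does not have any sibling kernel. (id: {sess_ctx.id})"
        )
    # With at least one kernel present, the resulting set is never empty.
    return {kernel["architecture"] for kernel in sess_ctx.kernels}
```

Checking the kernels directly names the actual cause (no sibling kernels) instead of its symptom (an empty architecture set).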
if not candidate_agents:
    raise InstanceNotAvailable(extra_msg="No agents are registered with the manager")
👍
An error message such as "No agents are available for scheduling" would be easier to understand.
Could you check the error message again? The phrase "registered with the manager" can confuse clients.
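A short sketch of the guard with the reviewer's suggested wording. `InstanceNotAvailable` here is a local stand-in that assumes only that the real class accepts an `extra_msg` keyword, as the diff shows; `ensure_candidates` is a hypothetical helper name.

```python
class InstanceNotAvailable(Exception):
    """Local stand-in; assumes only that the real class accepts `extra_msg`."""

    def __init__(self, extra_msg=None):
        super().__init__(extra_msg or "No instance is available")
        self.extra_msg = extra_msg


def ensure_candidates(candidate_agents):
    # An empty list means nothing can be scheduled; say so in scheduling terms.
    if not candidate_agents:
        raise InstanceNotAvailable(extra_msg="No agents are available for scheduling")
    return candidate_agents
```

The message describes the situation from the client's point of view and does not mention how agents relate to the manager.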
if not candidate_agents:
    raise InstanceNotAvailable(
        extra_msg="No agents are registered with the manager"
    )
same here!
same here
Looks good!
changes/1887.feature.md
Outdated
@@ -0,0 +1 @@
+add some exceptions when schedule computing session
This will appear in the release notes, so we should consider how users will read and understand it.
Something like "Eager error handling when scheduling compute session" would be more readable. We could use ChatGPT or ask other senior colleagues for suggestions!
The new change log is good! I left some comments
if not candidate_agents:
    raise InstanceNotAvailable(extra_msg="No agents are registered with the manager")
Could you check the error message again? The phrase "registered with the manager" can confuse clients.
if not candidate_agents:
    raise InstanceNotAvailable(
        extra_msg="No agents are registered with the manager"
    )
same here
This PR resolves #1814
Add some exceptions when scheduling a computing session, so that error situations can be expressed in more detail.
Checklist: (if applicable)
ai.backend.test
docs
directory