Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add some exceptions when schedule computing session #1887

Merged
merged 9 commits into from
Feb 22, 2024

Conversation

minseokey
Copy link
Contributor

@minseokey minseokey commented Feb 5, 2024

This PR resolves #1814
add some exceptions when schedule computing session

  1. When candidates_agent itself is empty.
  2. When requested_architecture is empty.
  3. When the agent's architecture and the requested architecture are different

it is possible to express the error situation in detail

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@github-actions github-actions bot added the comp:manager Related to Manager component label Feb 5, 2024
@github-actions github-actions bot added size:S 10~30 LoC effort:easy Need to understand only a specific region of codes (good first issue, easy). labels Feb 5, 2024
@minseokey minseokey changed the title Feature/better log no agent schedule session feature: add some exceptions when schedule computing session Feb 5, 2024
@minseokey minseokey added size:S 10~30 LoC and removed size:S 10~30 LoC labels Feb 5, 2024
Copy link
Member

@fregataa fregataa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I left some comments

Comment on lines 717 to 718
if requested_architectures is None:
raise GenericBadRequest("Requested session has no architecture information")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot happen because requested_architectures is always set, not None

Comment on lines 725 to 726
if candidate_agents is None:
raise InstanceNotAvailable(extra_msg=("No agents are registered with the manager"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. candidate_agents is always a list, not None

Comment on lines 1031 to 1034
if candidate_agents is None:
raise InstanceNotAvailable(
extra_msg=("No agents are registered with the manager")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. candidate_agents cannot be None

@@ -997,7 +1001,7 @@ async def _schedule_multi_node_session(
AgentRow.occupied_slots,
]).where(AgentRow.id == agent.id)
)
).fecthall()[0]
).fetchall()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Could you test whether this fix works well?

Comment on lines 717 to 718
if not requested_architectures:
raise GenericBadRequest("Requested session has no architecture information")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requested_architectures can be empty when the computing session does not have any sibling computing kernels.
How about checking session's sibling kernels instead of requested_architectures?
like this.

if not sess_ctx.kernels:
    raise GenericBadRequest(f"Given session does not have any sibling kernel. (id: {sess_ctx.id}")

Comment on lines 725 to 726
if not candidate_agents:
raise InstanceNotAvailable(extra_msg="No agents are registered with the manager")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
"No agents are available for scheduling" error msg would be more comprehensible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check the error message again? The sentence "Registered with the manager" can confuse clients

Comment on lines 1031 to 1034
if not candidate_agents:
raise InstanceNotAvailable(
extra_msg="No agents are registered with the manager"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member

@fregataa fregataa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -0,0 +1 @@
add some exceptions when schedule computing session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be logged on release note, so we should consider how users will read this and understand.
Something like "Eager error handling when scheduling compute session" would be readable for readers. We could use chatGPT or ask to other seniors for it !

Copy link
Member

@fregataa fregataa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new change log is good! I left some comments

Comment on lines 725 to 726
if not candidate_agents:
raise InstanceNotAvailable(extra_msg="No agents are registered with the manager")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check the error message again? The sentence "Registered with the manager" can confuse clients

Comment on lines 1031 to 1034
if not candidate_agents:
raise InstanceNotAvailable(
extra_msg="No agents are registered with the manager"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@achimnol achimnol added this pull request to the merge queue Feb 22, 2024
Merged via the queue into main with commit 7831270 Feb 22, 2024
26 checks passed
@achimnol achimnol deleted the feature/better-log-no-agent-schedule-session branch February 22, 2024 16:04
@achimnol achimnol added this to the 23.09 milestone Feb 23, 2024
achimnol added a commit that referenced this pull request Feb 23, 2024
…1887)

Backported-from: main (24.03)
Backported-to: 23.09
Co-authored-by: Joongi Kim <joongi@lablup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:manager Related to Manager component effort:easy Need to understand only a specific region of codes (good first issue, easy). size:S 10~30 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better log when no agent is available for schedule Computing session
3 participants