[core] add pdb controller in ome#283

Merged
slin1237 merged 3 commits into main from yifeliu/pdb
Sep 25, 2025

@pallasathena92
Collaborator

What type of PR is this?

/kind feature

What this PR does / why we need it:

A Pod Disruption Budget (PDB) in Kubernetes is a resource that specifies the minimum number or percentage of pods that must remain available for an application during voluntary disruptions.
Add it to our service to ensure availability during voluntary disruptions.
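
For reviewers less familiar with PDBs, here is a rough sketch of how a budget translates into permitted evictions. It is a simplified model, not the Kubernetes implementation and not code from this PR: it handles integer values only (no percentages), and `allowedDisruptions` is a hypothetical helper name.

```go
package main

import "fmt"

// allowedDisruptions is a hypothetical helper (not part of this PR).
// expected is the desired replica count, healthy is the number of ready pods.
// At most one of minAvailable / maxUnavailable is expected to be non-nil.
func allowedDisruptions(expected, healthy int, minAvailable, maxUnavailable *int) int {
	// Work out how many pods must stay healthy under the budget.
	desiredHealthy := 0
	switch {
	case minAvailable != nil:
		desiredHealthy = *minAvailable
	case maxUnavailable != nil:
		desiredHealthy = expected - *maxUnavailable
	}
	if desiredHealthy < 0 {
		desiredHealthy = 0
	}

	// Anything above that floor may be evicted voluntarily.
	allowed := healthy - desiredHealthy
	if allowed < 0 {
		return 0
	}
	return allowed
}

func main() {
	two, one := 2, 1
	fmt.Println(allowedDisruptions(3, 3, &two, nil)) // 3 healthy, 2 must stay: prints 1
	fmt.Println(allowedDisruptions(3, 2, nil, &one)) // one pod already down: prints 0
}
```

Under this model a single-replica service with maxUnavailable set to 1 still permits one eviction, which is worth keeping in mind when choosing a default.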

Which issue(s) this PR fixes:

There are two main parts to this PR:

  1. Add the PDB controller.
  2. Remove the modification to the deployment name referenced by the HPA, which we previously checked in for the migration.
    During the migration, we added "-new" as a suffix to the deployment name. With the current inference service behavior, the deployment names are "xxx-engine" and "xxx-decoder".
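
To illustrate the naming change in part 2, a minimal sketch follows. The helper names `deploymentName` and `legacyDeploymentName` are hypothetical and only mirror the convention described above; they are not the functions in the repository.

```go
package main

import "fmt"

// deploymentName is a hypothetical helper showing the current convention:
// "<service>-<component>", e.g. "xxx-engine" or "xxx-decoder".
func deploymentName(service, component string) string {
	return service + "-" + component
}

// legacyDeploymentName shows the migration-era name with the "-new" suffix,
// which the HPA reference no longer uses after this PR.
func legacyDeploymentName(service, component string) string {
	return deploymentName(service, component) + "-new"
}

func main() {
	fmt.Println(deploymentName("xxx", "engine"))        // xxx-engine
	fmt.Println(deploymentName("xxx", "decoder"))       // xxx-decoder
	fmt.Println(legacyDeploymentName("xxx", "engine"))  // xxx-engine-new
}
```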

Does this PR introduce a user-facing change?

None. There is no impact on the user experience.

@gemini-code-assist
Contributor

Summary of Changes

Hello @pallasathena92, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Pod Disruption Budget (PDB) controller to improve the resilience and availability of services by allowing users to define minimum availability or maximum unavailability constraints for pods during voluntary disruptions. Additionally, it refines the deployment naming convention by removing a legacy modification related to HPA references, ensuring more consistent resource identification. These changes collectively aim to bolster the stability of deployed applications.

Highlights

  • Pod Disruption Budget (PDB) Support: Introduced a new PDB controller to manage Pod Disruption Budgets for service components, enhancing application availability during voluntary disruptions. This includes new MinAvailable and MaxUnavailable fields in component specifications.
  • HPA Deployment Name Logic Removed: The logic that appended a '-new' suffix to deployment names for HPA references during migration has been removed, streamlining deployment naming conventions.
  • Dependency Cleanup: Performed a significant cleanup of unused Go module dependencies in go.sum, reducing the overall project footprint and improving build efficiency.
  • API Schema Updates: Updated the API schema (openapi_generated.go and swagger.json) to include the new minAvailable and maxUnavailable fields for component extension specifications.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a Pod Disruption Budget (PDB) controller to improve service availability during voluntary disruptions, which is a great addition. The implementation is well-integrated into the existing reconciliation logic. The PR also includes a cleanup of some legacy code related to HPA deployment naming from a previous migration.

My review includes a high-severity comment regarding the default PDB configuration, which could inadvertently cause downtime for single-replica services. I've also pointed out a couple of minor typos in the code comments that affect the generated API documentation. Overall, the changes are solid, and addressing the default PDB behavior would make this feature safer for all use cases.

Comment on lines +56 to +62
if componentExt.MinAvailable == nil && componentExt.MaxUnavailable == nil {
	// Set maxUnavailable = 1 as default
	maxUnavailable = &intstr.IntOrString{
		Type:   intstr.Int,
		IntVal: 1,
	}
}
high

The default behavior of creating a PodDisruptionBudget with maxUnavailable: 1 can be risky for single-replica services, as it would allow the single pod to be voluntarily disrupted, causing downtime. To better ensure availability, consider not creating a PDB by default if neither minAvailable nor maxUnavailable is specified by the user. This would make PDB creation an explicit opt-in, which is safer. An alternative could be to default to maxUnavailable: 0 for single-replica deployments.
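
The two options suggested here could be sketched roughly as follows. This is an illustration only, not the code merged in this PR: `budget` is a simplified stand-in for the `*intstr.IntOrString` fields used by the controller, and `resolveBudget` and its `optInOnly` flag are hypothetical names.

```go
package main

import "fmt"

// budget is a simplified stand-in for the PDB spec fields. The real
// controller uses *intstr.IntOrString; plain ints keep the sketch stdlib-only.
type budget struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// resolveBudget is a hypothetical helper. It returns the budget to apply and
// false when no PDB should be created at all.
func resolveBudget(user budget, replicas int, optInOnly bool) (budget, bool) {
	// An explicit user setting always wins.
	if user.MinAvailable != nil || user.MaxUnavailable != nil {
		return user, true
	}
	// Option 1: PDB creation is an explicit opt-in.
	if optInOnly {
		return budget{}, false
	}
	// Option 2: keep a default, but never allow the only replica to be evicted.
	n := 1
	if replicas <= 1 {
		n = 0
	}
	return budget{MaxUnavailable: &n}, true
}

func main() {
	b, ok := resolveBudget(budget{}, 1, false)
	fmt.Println(ok, *b.MaxUnavailable) // true 0

	b, ok = resolveBudget(budget{}, 3, false)
	fmt.Println(ok, *b.MaxUnavailable) // true 1

	_, ok = resolveBudget(budget{}, 3, true)
	fmt.Println(ok) // false
}
```

One trade-off to weigh: a PDB with maxUnavailable: 0 blocks voluntary evictions of that pod entirely, so node drains will wait until the budget is changed or the pod is removed by other means.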

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@slin1237 slin1237 merged commit 78f436b into main Sep 25, 2025
24 checks passed
@zhyncs zhyncs deleted the yifeliu/pdb branch October 4, 2025 03:50
slin1237 pushed a commit that referenced this pull request Dec 22, 2025
* add pdb controller

* [Core] add pdb controller in ome

* Update pkg/apis/ome/v1beta1/component.go

---------

Co-authored-by: yifeliu <yifeng.liu@oracle.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>