
fix: increase memory limits for Azure RHEL-AI provisioner task #750

Merged
adrianriobo merged 1 commit into redhat-developer:main from XiyangDong:bump-azure-memory on Mar 25, 2026



@XiyangDong
Contributor

The mapt provisioner container running the Azure RHEL-AI Tekton task
is being OOMKilled (exit code 137) when provisioning large Azure VM
types. The previous limits (request: 200Mi, limit: 600Mi) are
insufficient for the Terraform operations required.

Increase memory resources for the infra-azure-rhel-ai Tekton task:

| | Before | After |
|---|---|---|
| request | 200Mi | 1Gi |
| limit | 600Mi | 2Gi |

The request is raised to 1Gi so the scheduler places the pod on a node
that can actually sustain the workload. The 2Gi limit provides headroom
for spikes during large Azure VM provisioning.

  • tkn/template/infra-azure-rhel-ai.yaml — source template updated
  • tkn/infra-azure-rhel-ai.yaml — regenerated via make tkn-update
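
For readers unfamiliar with how these values are expressed, the fragment below is a minimal sketch of what the new memory settings might look like on a Tekton step. It is not copied from the repository: the step name `provisioner` and the image reference are placeholders, and it assumes the task uses the `tekton.dev/v1` API, where the field is `computeResources` (on `tekton.dev/v1beta1` the equivalent field is `resources`). Only the task name and the four memory values come from the description above.

```yaml
# Illustrative fragment only. The step name and image are hypothetical
# placeholders; the memory values are the ones stated in this PR.
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: infra-azure-rhel-ai
spec:
  steps:
    - name: provisioner                     # hypothetical step name
      image: example.invalid/mapt:latest    # placeholder image reference
      computeResources:                     # `resources` on tekton.dev/v1beta1
        requests:
          memory: 1Gi                       # previously 200Mi
        limits:
          memory: 2Gi                       # previously 600Mi
```

The request is what the scheduler uses when picking a node, while the limit is the point at which the container is killed for exceeding its memory, which is the exit code 137 behaviour reported here.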

Collaborator

@adrianriobo adrianriobo left a comment


LGTM

  The mapt provisioner container running the Azure RHEL-AI Tekton task
  is being OOMKilled (exit code 137) when provisioning large Azure VM
  types. The previous limits (request: 200Mi, limit: 600Mi) are
  insufficient for the Terraform operations required.

  Increase memory resources for the `infra-azure-rhel-ai` Tekton task:

  | | Before | After |
  |---|---|---|
  | request | 200Mi | 1Gi |
  | limit | 600Mi | 2Gi |

  The request is raised to 1Gi so the scheduler places the pod on a node
  that can actually sustain the workload. The 2Gi limit provides headroom
  for spikes during large Azure VM provisioning.

  - `tkn/template/infra-azure-rhel-ai.yaml` — source template updated
  - `tkn/infra-azure-rhel-ai.yaml` — regenerated via `make tkn-update`
  - `tkn/template/infra-azure-rhel.yaml` — source template updated
  - `tkn/infra-azure-rhel.yaml` — regenerated via `make tkn-update`

Signed-off-by: Xiyang Dong <xdong@redhat.com>
@XiyangDong
Contributor Author

Tested Azure provisioning with the change; it now completes without being OOMKilled.

@adrianriobo adrianriobo merged commit ca76047 into redhat-developer:main Mar 25, 2026
7 checks passed

2 participants