-
Notifications
You must be signed in to change notification settings - Fork 4
[PWCI] "[v1] net/mlx5: fix job leak on indirect meter creation failure" #534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Indirect meter_mark action needs to allocate a job to track asynchronous HW operation to create the meter object. When meter_mark creation failed, the job may have been leaked because there was no job cleanup code for sync API. Add necessary code to check if meter_mark creation failed before or after HW operation is enqueued and call job_put accordingly. Fixes: 4359d9d ("net/mlx5: fix sync meter processing in HWS") Cc: getelson@nvidia.com Cc: stable@dpdk.org Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Reviewer's GuideRefactors meter mark allocation to return detailed error codes instead of NULL, propagates those errors through indirect meter creation paths, and ensures hardware queue jobs are properly released only when appropriate to avoid leaks on ASO meter failures. Sequence diagram for meter mark allocation and job lifecycle error handlingsequenceDiagram
participant F as flow_hw_action_handle_create
participant C as flow_hw_meter_mark_compile
participant A as flow_hw_meter_mark_alloc
participant P as mlx5_ipool_malloc
participant U as mlx5_aso_meter_update_by_wqe
participant W as mlx5_aso_mtr_wait
participant J as flow_hw_job_put
participant I as mlx5_ipool_free
rect rgb(235,235,255)
F->>F: allocate job (flow_hw_action_job_init)
alt action_type is METER_MARK (indirect handle)
F->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, push, &aso_mtr, error)
else indirect meter creation via compile
C->>C: allocate job (flow_hw_action_job_init)
C->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, true, &aso_mtr, error)
end
end
rect rgb(245,245,245)
alt shared_host is true
A-->>F: return ENOTSUP (<0)
else meter_mark profile is NULL
A-->>F: return EINVAL (<0)
else normal allocation path
A->>P: aso_mtr = mlx5_ipool_malloc(idx_pool, &mtr_id)
alt allocation fails
A->>I: mlx5_ipool_free(idx_pool, mtr_id) if mtr_id != 0
A-->>F: return ENOMEM (<0)
else allocation succeeds
A->>A: initialize aso_mtr fields
A->>U: rc = mlx5_aso_meter_update_by_wqe(..., aso_mtr, job, push)
alt update enqueue fails
A->>I: mlx5_ipool_free(idx_pool, mtr_id)
A-->>F: return EBUSY (<0)
else update enqueue succeeds
alt queue == MLX5_HW_INV_QUEUE
A->>W: rc = mlx5_aso_mtr_wait(priv, aso_mtr, true)
alt wait succeeds
A-->>F: return 0
else wait fails
A->>I: mlx5_ipool_free(idx_pool, mtr_id)
A-->>F: return -EIO
end
else async queue
A-->>F: return 0
end
end
end
end
end
rect rgb(235,255,235)
alt caller is flow_hw_action_handle_create or flow_hw_meter_mark_compile
alt ret == 0
F->>F: handle = job->action
F->>F: normal finalize path
else ret != 0
alt ret == -EIO (wait failure)
note over F: job already used by hardware, keep it to avoid leak
F-->>F: do not call flow_hw_job_put
else ret != -EIO (allocation/enqueue/config error)
alt queue == MLX5_HW_INV_QUEUE
F->>F: queue = CTRL_QUEUE_ID(priv)
end
F->>J: flow_hw_job_put(priv, job, queue)
end
end
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
WalkthroughThe Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Areas requiring attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The special handling of
-EIOinflow_hw_meter_mark_alloccallers is a bit opaque; consider adding a named constant or comment to explain why this error is treated differently from other non-zero returns and where the job is expected to be released in that path. - On the
mlx5_aso_mtr_wait()failure path you return-EIOwithout populatingrte_flow_error, which makes diagnostics inconsistent compared to the other error branches inflow_hw_meter_mark_alloc; consider setting a descriptiverte_flow_errorthere as well.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The special handling of `-EIO` in `flow_hw_meter_mark_alloc` callers is a bit opaque; consider adding a named constant or comment to explain why this error is treated differently from other non-zero returns and where the job is expected to be released in that path.
- On the `mlx5_aso_mtr_wait()` failure path you return `-EIO` without populating `rte_flow_error`, which makes diagnostics inconsistent compared to the other error branches in `flow_hw_meter_mark_alloc`; consider setting a descriptive `rte_flow_error` there as well.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
drivers/net/mlx5/mlx5_flow_hw.c (1)
1257-1275: Indirect meter create uses new alloc API; behaviour looks good but host/validate mismatch is worth a lookIn
flow_hw_action_handle_create():
The meter‑mark branch now uses
flow_hw_meter_mark_alloc()via an out‑param, and:
- sets
aso = true,- frees the job on any error except
-EIO(with theMLX5_HW_INV_QUEUE→CTRL_QUEUE_ID()normalize),- derives the handle directly from
job->action, and- still invokes
flow_hw_action_finalize()only whenhandleis non‑NULL (so failed creates don’t enqueue jobs).This aligns with the new alloc semantics and closes the job‑leak for indirect meter creation.
Two small points to sanity‑check:
As above, the
ret != -EIOfilter assumes that the-EIOpath fromflow_hw_meter_mark_alloc()has fully taken care of the job; otherwise we’d leak themlx5_hw_q_jobhere.
flow_hw_meter_mark_alloc()now unconditionally rejectspriv->shared_hostwithENOTSUP, whileflow_hw_validate_action_meter_mark()still skips theshared_hostcheck forindirect == true. That makes validation slightly more permissive than creation for indirect actions on shared‑host ports. If that’s not intentional, consider mirroring the shared‑host restriction in the validate path as well so applications don’t see validate‑OK / create‑fail solely due to host/guest placement.Functionally the changes look sound and targeted at the reported job leak.
Also applies to: 12719-12747
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
drivers/net/mlx5/mlx5_flow_hw.c(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
drivers/net/mlx5/mlx5_flow_hw.c (2)
drivers/net/mlx5/mlx5_flow.c (1)
int(205-213)drivers/net/mlx5/mlx5_utils.c (1)
mlx5_ipool_malloc(667-732)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Sourcery review
- GitHub Check: Sourcery review
🔇 Additional comments (2)
drivers/net/mlx5/mlx5_flow_hw.c (2)
1873-1934: Job‑safe ASO meter alloc; confirm -EIO/job ownership contractThe new
flow_hw_meter_mark_alloc()looks correct in terms of:
- enforcing host‑port usage on
shared_host,- validating
meter_mark->profile,- consistently freeing
mtr_idin all failure paths, and- returning an
intstatus with the allocatedmlx5_aso_mtr *viaaso_mtr.The special
-EIOreturn for themlx5_aso_mtr_wait()failure path is relied on by callers to not callflow_hw_job_put()(they assume the wait path has already consumed or never posted the job), whereas all other error codes cause the caller to free the job.Please double‑check that
mlx5_aso_mtr_wait()’s contract is indeed:
- on failure, either the WQE was never posted, or any posted job is already fully handled (including recycling the
mlx5_hw_q_job), and- no further completion will later hit
__flow_hw_pull_indir_action_comp()for this samejob.If
mlx5_aso_mtr_wait()can return non‑zero while a still‑owned job might later complete, we could either leak or double‑free the job depending on the path. Otherwise this change is fine and fixes the original job‑leak on enqueue failure.
1947-1964: Compile‑time meter error path now releases jobs correctly; keep -EIO semantics aligned
flow_hw_meter_mark_compile()now:
- captures the
intstatus fromflow_hw_meter_mark_alloc(),- calls
flow_hw_job_put()for all non‑zero returns except-EIO, and- still maps any failure to
return -1for the caller.This correctly plugs the previous leak for enqueue‑time failures (e.g.
EBUSY,ENOMEM), since the job is explicitly returned to the pool.As with the alloc helper, this relies on
-EIOmeaning “sync wait path has already dealt with the job”; otherwise we would leak it here. Please verify thatflow_hw_meter_mark_alloc()only returns-EIOin those cases and that no queued job remains outstanding then.
NOTE: This is an auto submission for "[v1] net/mlx5: fix job leak on indirect meter creation failure".
See "http://patchwork.dpdk.org/project/dpdk/list/?series=36860" for details.
Summary by Sourcery
Fix handling of meter mark indirect action creation failures to avoid leaking hardware queue jobs and improve error propagation.
Bug Fixes:
Enhancements:
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.