[PWCI] "[v1] net/mlx5: fix job leak on indirect meter creation failure" #534

ovsrobot · 2025-12-08T11:04:30Z

NOTE: This is an auto submission for "[v1] net/mlx5: fix job leak on indirect meter creation failure".

See "http://patchwork.dpdk.org/project/dpdk/list/?series=36860" for details.

Summary by Sourcery

Fix handling of meter mark indirect action creation failures to avoid leaking hardware queue jobs and improve error propagation.

Bug Fixes:

Ensure hardware queue jobs are properly released when ASO meter mark allocation or update fails, preventing job leaks on indirect meter creation failure.
Return appropriate error codes and avoid finalizing actions when meter mark creation does not produce a valid handle.

Enhancements:

Tighten validation for meter mark actions, including shared host restrictions and missing profiles, and propagate detailed errors through the meter allocation helper.

Summary by CodeRabbit

Refactor
- Updated MLX5 network driver's meter allocation and error handling with refined validation checks, improved resource cleanup procedures, and enhanced failure scenario management to ensure more consistent and robust operation across allocation workflows.


Indirect meter_mark action needs to allocate a job to track asynchronous HW operation to create the meter object. When meter_mark creation failed, the job may have been leaked because there was no job cleanup code for sync API. Add necessary code to check if meter_mark creation failed before or after HW operation is enqueued and call job_put accordingly.

Fixes: 4359d9d ("net/mlx5: fix sync meter processing in HWS")
Cc: getelson@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
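The leak described in the commit message can be pictured with a small counter-based model. This is a minimal sketch under stated assumptions, not the driver code: `struct job_pool`, `pool_job_get`, `pool_job_put` and `meter_create_sync` are hypothetical names, and a job is reduced to a slot in a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-queue job pool; not the mlx5 structures. */
struct job_pool {
    unsigned int free_jobs; /* jobs currently available */
};

/* Take one job from the pool; returns false when the pool is empty. */
bool pool_job_get(struct job_pool *p)
{
    assert(p != NULL);
    if (p->free_jobs == 0)
        return false;
    p->free_jobs--;
    return true;
}

/* Return one job to the pool. */
void pool_job_put(struct job_pool *p)
{
    assert(p != NULL);
    p->free_jobs++;
}

/*
 * Model of a synchronous create. fail_before_enqueue simulates an error that
 * occurs before the hardware operation is posted, so no completion will ever
 * release the job and this function has to do it.
 */
int meter_create_sync(struct job_pool *p, bool fail_before_enqueue)
{
    if (!pool_job_get(p))
        return -ENOMEM;
    if (fail_before_enqueue) {
        pool_job_put(p); /* the cleanup step the sync path needs */
        return -EBUSY;
    }
    /* Success: completion handling is modelled as an immediate release. */
    pool_job_put(p);
    return 0;
}
```

If the `pool_job_put` call in the failure branch is removed, `free_jobs` drops by one on every failed create, which is the leak pattern the patch targets.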

sourcery-ai · 2025-12-08T11:04:36Z

Reviewer's Guide

Refactors meter mark allocation to return detailed error codes instead of NULL, propagates those errors through indirect meter creation paths, and ensures hardware queue jobs are properly released only when appropriate to avoid leaks on ASO meter failures.

Sequence diagram for meter mark allocation and job lifecycle error handling

sequenceDiagram
    participant F as flow_hw_action_handle_create
    participant C as flow_hw_meter_mark_compile
    participant A as flow_hw_meter_mark_alloc
    participant P as mlx5_ipool_malloc
    participant U as mlx5_aso_meter_update_by_wqe
    participant W as mlx5_aso_mtr_wait
    participant J as flow_hw_job_put
    participant I as mlx5_ipool_free

    rect rgb(235,235,255)
        F->>F: allocate job (flow_hw_action_job_init)
        alt action_type is METER_MARK (indirect handle)
            F->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, push, &aso_mtr, error)
        else indirect meter creation via compile
            C->>C: allocate job (flow_hw_action_job_init)
            C->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, true, &aso_mtr, error)
        end
    end

    rect rgb(245,245,245)
        alt shared_host is true
            A-->>F: return ENOTSUP (<0)
        else meter_mark profile is NULL
            A-->>F: return EINVAL (<0)
        else normal allocation path
            A->>P: aso_mtr = mlx5_ipool_malloc(idx_pool, &mtr_id)
            alt allocation fails
                A->>I: mlx5_ipool_free(idx_pool, mtr_id) if mtr_id != 0
                A-->>F: return ENOMEM (<0)
            else allocation succeeds
                A->>A: initialize aso_mtr fields
                A->>U: rc = mlx5_aso_meter_update_by_wqe(..., aso_mtr, job, push)
                alt update enqueue fails
                    A->>I: mlx5_ipool_free(idx_pool, mtr_id)
                    A-->>F: return EBUSY (<0)
                else update enqueue succeeds
                    alt queue == MLX5_HW_INV_QUEUE
                        A->>W: rc = mlx5_aso_mtr_wait(priv, aso_mtr, true)
                        alt wait succeeds
                            A-->>F: return 0
                        else wait fails
                            A->>I: mlx5_ipool_free(idx_pool, mtr_id)
                            A-->>F: return -EIO
                        end
                    else async queue
                        A-->>F: return 0
                    end
                end
            end
        end
    end

    rect rgb(235,255,235)
        alt caller is flow_hw_action_handle_create or flow_hw_meter_mark_compile
            alt ret == 0
                F->>F: handle = job->action
                F->>F: normal finalize path
            else ret != 0
                alt ret == -EIO (wait failure)
                    note over F: job already used by hardware, keep it to avoid leak
                    F-->>F: do not call flow_hw_job_put
                else ret != -EIO (allocation/enqueue/config error)
                    alt queue == MLX5_HW_INV_QUEUE
                        F->>F: queue = CTRL_QUEUE_ID(priv)
                    end
                    F->>J: flow_hw_job_put(priv, job, queue)
                end
            end
        end
    end
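The caller-side branch at the bottom of the diagram can be condensed into one decision helper. The sketch below is an illustration only: `job_put_target`, `INVALID_QUEUE` and `CTRL_QUEUE` are hypothetical names and values standing in for the driver's own queue identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the driver defines its own queue identifiers. */
#define INVALID_QUEUE UINT32_MAX
#define CTRL_QUEUE    0u

/*
 * Decide what the caller does with its job after the allocator returns.
 * Returns true when the job must be put back and stores the queue to use
 * in *put_queue. Returns false on success and on -EIO, leaving *put_queue
 * untouched.
 */
bool job_put_target(int ret, uint32_t queue, uint32_t *put_queue)
{
    assert(put_queue != NULL);
    if (ret == 0 || ret == -EIO)
        return false;
    /* Synchronous calls carry no real queue, so use the control queue. */
    *put_queue = (queue == INVALID_QUEUE) ? CTRL_QUEUE : queue;
    return true;
}
```

Keeping the decision in one place would let both callers shown in the diagram share the same rule instead of repeating the `-EIO` comparison.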

File-Level Changes

Change	Details	Files
Refactor flow_hw_meter_mark_alloc to return error codes and take an output aso_mtr pointer, tightening validation and error handling for meter mark creation.	Change flow_hw_meter_mark_alloc return type from mlx5_aso_mtr* to int and add an output parameter for the allocated aso_mtr pointer. Return rte_flow_error_set codes (ENOTSUP, EINVAL, ENOMEM, EBUSY) and -EIO instead of NULL for various failure modes. Add explicit check for missing meter_mark profile and treat it as EINVAL. Ensure allocated ASO meter is initialized through the output pointer, including type, state, offset, pool, and initial color. On ASO meter update or wait failures, free the pool entry and propagate specific error codes rather than silently failing.	`drivers/net/mlx5/mlx5_flow_hw.c`
Update call sites to the new meter mark allocator API and ensure jobs are only released when appropriate, preventing leaks on ASO failures.	In flow_hw_meter_mark_compile, call the new allocator, handle non-zero return codes, and avoid releasing the job when failure occurs after enqueue/wait (-EIO case). In flow_hw_action_handle_create, use the new allocator, propagate errors similarly, and only put the job back on the queue when errors occur before ASO completion. Adjust final job handling so that flow_hw_action_finalize is only called when a valid handle is produced, preventing use of incomplete actions.	`drivers/net/mlx5/mlx5_flow_hw.c`


coderabbitai · 2025-12-08T11:04:52Z

Walkthrough

The flow_hw_meter_mark_alloc function in the MLX5 network driver has been refactored to return an integer error code instead of a pointer, with the allocated ASO meter passed via an output parameter. All call sites have been updated to handle the new return type and dereference the output parameter accordingly.
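The shape of this refactor (status code as the return value, object handed back through an output parameter) can be sketched in isolation. The types and the function `meter_alloc` below are simplified, hypothetical stand-ins and do not reproduce the driver's signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified types for illustration only. */
struct meter_profile { int id; };
struct meter { const struct meter_profile *profile; };

/*
 * Returns 0 and stores the new object in *out on success.
 * Returns a negative errno value on failure and leaves *out set to NULL.
 */
int meter_alloc(const struct meter_profile *profile, struct meter **out)
{
    struct meter *m;

    assert(out != NULL); /* the caller must supply the output pointer */
    *out = NULL;
    if (profile == NULL)
        return -EINVAL;  /* missing profile */
    m = calloc(1, sizeof(*m));
    if (m == NULL)
        return -ENOMEM;  /* allocation failure */
    m->profile = profile;
    *out = m;
    return 0;
}
```

Compared with returning a bare pointer, this form lets the caller tell a configuration error from an allocation failure, at the cost of one extra parameter at every call site.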

Changes

Cohort / File(s)	Summary
ASO Meter Allocation Refactoring `drivers/net/mlx5/mlx5_flow_hw.c`	Changed `flow_hw_meter_mark_alloc` return type from `struct mlx5_aso_mtr *` to `int`, added `struct mlx5_aso_mtr **aso_mtr` output parameter. Updated host-port validation to return `ENOTSUP` on shared_host. Modified allocation path to assign meter entry via output parameter and return error codes (ENOMEM, EINVAL, EBUSY, -EIO). Updated all call sites (`flow_hw_meter_mark_compile`, `flow_hw_action_handle_create`, etc.) to handle int return values and dereference output pointer for meter field assignments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Areas requiring attention:

Verify all call sites of flow_hw_meter_mark_alloc correctly pass &aso_mtr and handle the returned error code
Check that output parameter dereferences (e.g., (*aso_mtr)->offset, (*aso_mtr)->pool) are applied consistently across the function body
Validate error handling paths, particularly -EIO semantics and cleanup logic when allocation fails
Confirm assertion that output pointer is provided is appropriate for all call contexts

Poem

🐰 A meter's birth now takes a different road,
No pointer passed, but error codes bestowed,
With output params holding truth so neat,
The refactor makes the error handling sweet! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: fixing a job leak during indirect meter creation failure by refactoring the flow_hw_meter_mark_alloc function to return error codes and use an output parameter.


sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

The special handling of -EIO in flow_hw_meter_mark_alloc callers is a bit opaque; consider adding a named constant or comment to explain why this error is treated differently from other non-zero returns and where the job is expected to be released in that path.
On the mlx5_aso_mtr_wait() failure path you return -EIO without populating rte_flow_error, which makes diagnostics inconsistent compared to the other error branches in flow_hw_meter_mark_alloc; consider setting a descriptive rte_flow_error there as well.
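Both suggestions can be sketched together. This is a hedged illustration, not the DPDK API: `struct flow_err`, `flow_err_set`, `METER_ERR_JOB_ENQUEUED`, `meter_wait_failed` and `job_still_owned_by_caller` are hypothetical names, with the error record loosely modelled on a code-plus-message pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error record holding an errno value and a message. */
struct flow_err {
    int code;            /* positive errno value */
    const char *message; /* human-readable cause */
};

/* Named status: the job was handed to hardware before the failure. */
#define METER_ERR_JOB_ENQUEUED (-EIO)

/* Fill the error record when one is provided; return the negative errno. */
int flow_err_set(struct flow_err *err, int code, const char *message)
{
    assert(code > 0);
    if (err != NULL) {
        err->code = code;
        err->message = message;
    }
    return -code;
}

/* Model of the wait-failure branch: record the cause, then return. */
int meter_wait_failed(struct flow_err *err)
{
    return flow_err_set(err, EIO, "meter update did not complete");
}

/* Callers test ownership through the name instead of a bare errno value. */
bool job_still_owned_by_caller(int ret)
{
    return ret != 0 && ret != METER_ERR_JOB_ENQUEUED;
}
```

With a named status, the comparison at each call site documents its own intent, and the wait-failure path reports a cause in the same way as the other error branches.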


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

drivers/net/mlx5/mlx5_flow_hw.c (1)

1257-1275: Indirect meter create uses new alloc API; behaviour looks good but host/validate mismatch is worth a look

In flow_hw_action_handle_create(), the meter‑mark branch now uses flow_hw_meter_mark_alloc() via an out‑param, and:

- sets aso = true,
- frees the job on any error except -EIO (with the MLX5_HW_INV_QUEUE → CTRL_QUEUE_ID() normalize),
- derives the handle directly from job->action, and
- still invokes flow_hw_action_finalize() only when handle is non‑NULL (so failed creates don’t enqueue jobs).

This aligns with the new alloc semantics and closes the job‑leak for indirect meter creation.

Two small points to sanity‑check:

As above, the ret != -EIO filter assumes that the -EIO path from flow_hw_meter_mark_alloc() has fully taken care of the job; otherwise we’d leak the mlx5_hw_q_job here.

flow_hw_meter_mark_alloc() now unconditionally rejects priv->shared_host with ENOTSUP, while flow_hw_validate_action_meter_mark() still skips the shared_host check for indirect == true. That makes validation slightly more permissive than creation for indirect actions on shared‑host ports. If that’s not intentional, consider mirroring the shared‑host restriction in the validate path as well so applications don’t see validate‑OK / create‑fail solely due to host/guest placement.
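One way to keep validation and creation from drifting apart is to route both through a single predicate. The sketch below assumes hypothetical names (`struct port`, `meter_mark_check_port`, `meter_mark_validate`, `meter_mark_create`) and models only the shared-host restriction. Whether the restriction should apply to indirect actions at validation time is a design question for the driver maintainers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port state; only the field relevant to the check is kept. */
struct port { bool shared_host; };

/* Single source of truth for the restriction. */
int meter_mark_check_port(const struct port *p)
{
    assert(p != NULL);
    return p->shared_host ? -ENOTSUP : 0;
}

/* Validation path: applies the check regardless of the indirect flag. */
int meter_mark_validate(const struct port *p, bool indirect)
{
    (void)indirect; /* same rule for direct and indirect in this sketch */
    return meter_mark_check_port(p);
}

/* Creation path: same check first; the allocation work is elided. */
int meter_mark_create(const struct port *p)
{
    int ret = meter_mark_check_port(p);

    if (ret != 0)
        return ret;
    return 0;
}
```

With this layout an application cannot see validation succeed and creation fail purely because of the shared-host setting.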

Functionally the changes look sound and targeted at the reported job leak.

Also applies to: 12719-12747

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd60dcd and 6eb45f2.

📒 Files selected for processing (1)

drivers/net/mlx5/mlx5_flow_hw.c (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

drivers/net/mlx5/mlx5_flow_hw.c (2)

drivers/net/mlx5/mlx5_flow.c (1)

int (205-213)

drivers/net/mlx5/mlx5_utils.c (1)

mlx5_ipool_malloc (667-732)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Sourcery review
GitHub Check: Sourcery review

🔇 Additional comments (2)

drivers/net/mlx5/mlx5_flow_hw.c (2)

1873-1934: Job‑safe ASO meter alloc; confirm -EIO/job ownership contract

The new flow_hw_meter_mark_alloc() looks correct in terms of:

- enforcing host‑port usage on shared_host,
- validating meter_mark->profile,
- consistently freeing mtr_id in all failure paths, and
- returning an int status with the allocated mlx5_aso_mtr * via aso_mtr.

The special -EIO return for the mlx5_aso_mtr_wait() failure path is relied on by callers to not call flow_hw_job_put() (they assume the wait path has already consumed or never posted the job), whereas all other error codes cause the caller to free the job.

Please double‑check that mlx5_aso_mtr_wait()’s contract is indeed:

- on failure, either the WQE was never posted, or any posted job is already fully handled (including recycling the mlx5_hw_q_job), and
- no further completion will later hit __flow_hw_pull_indir_action_comp() for this same job.

If mlx5_aso_mtr_wait() can return non‑zero while a still‑owned job might later complete, we could either leak or double‑free the job depending on the path. Otherwise this change is fine and fixes the original job‑leak on enqueue failure.
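The ownership question raised here can be reasoned about with a tiny state machine in which every release must come from the current owner. This is a sketch under stated assumptions: `struct tracked_job`, the `job_owner` states and the helper functions are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Who is responsible for releasing the job at this moment. */
enum job_owner { JOB_FREE, JOB_CALLER, JOB_HW };

struct tracked_job { enum job_owner owner; };

/* Move the job between owners; refuse any transition from the wrong state. */
int job_move(struct tracked_job *j, enum job_owner from, enum job_owner to)
{
    assert(j != NULL);
    if (j->owner != from)
        return -EINVAL;
    j->owner = to;
    return 0;
}

/* Caller takes a free job. */
int job_take(struct tracked_job *j) { return job_move(j, JOB_FREE, JOB_CALLER); }

/* Caller posts the operation; hardware completion now owns the release. */
int job_enqueue(struct tracked_job *j) { return job_move(j, JOB_CALLER, JOB_HW); }

/* Caller-side release: only legal before the job was enqueued. */
int job_release(struct tracked_job *j) { return job_move(j, JOB_CALLER, JOB_FREE); }

/* Completion-side release: only legal after the job was enqueued. */
int job_complete(struct tracked_job *j) { return job_move(j, JOB_HW, JOB_FREE); }
```

In this model a caller-side release after enqueue is rejected, and so is a second completion. Those rejected transitions correspond to the double free the review asks about, while a job left in `JOB_CALLER` or `JOB_HW` with no later transition corresponds to the leak.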

1947-1964: Compile‑time meter error path now releases jobs correctly; keep -EIO semantics aligned

flow_hw_meter_mark_compile() now:

- captures the int status from flow_hw_meter_mark_alloc(),
- calls flow_hw_job_put() for all non‑zero returns except -EIO, and
- still maps any failure to return -1 for the caller.

This correctly plugs the previous leak for enqueue‑time failures (e.g. EBUSY, ENOMEM), since the job is explicitly returned to the pool.

As with the alloc helper, this relies on -EIO meaning “sync wait path has already dealt with the job”; otherwise we would leak it here. Please verify that flow_hw_meter_mark_alloc() only returns -EIO in those cases and that no queued job remains outstanding then.
