@ovsrobot ovsrobot commented Dec 8, 2025

NOTE: This is an auto submission for "[v1] net/mlx5: fix job leak on indirect meter creation failure".

See "http://patchwork.dpdk.org/project/dpdk/list/?series=36860" for details.

Summary by Sourcery

Fix handling of meter mark indirect action creation failures to avoid leaking hardware queue jobs and improve error propagation.

Bug Fixes:

  • Ensure hardware queue jobs are properly released when ASO meter mark allocation or update fails, preventing job leaks on indirect meter creation failure.
  • Return appropriate error codes and avoid finalizing actions when meter mark creation does not produce a valid handle.

Enhancements:

  • Tighten validation for meter mark actions, including shared host restrictions and missing profiles, and propagate detailed errors through the meter allocation helper.

Summary by CodeRabbit

  • Refactor
    • Updated MLX5 network driver's meter allocation and error handling with refined validation checks, improved resource cleanup procedures, and enhanced failure scenario management to ensure more consistent and robust operation across allocation workflows.


Indirect meter_mark action needs to allocate a job to track
asynchronous HW operation to create the meter object.

When meter_mark creation failed, the job may have been leaked
because there was no job cleanup code for sync API.

Add necessary code to check if meter_mark creation failed before or
after HW operation is enqueued and call job_put accordingly.

Fixes: 4359d9d ("net/mlx5: fix sync meter processing in HWS")
Cc: getelson@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
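
The leak described in the commit message can be illustrated with a toy job pool. This is a minimal sketch, not the mlx5 code: `struct job_pool`, `job_get`, `job_put`, and `create_sync` are hypothetical stand-ins, and the model only covers a failure that happens before the operation reaches hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a driver job pool; not the mlx5 types. */
#define POOL_SIZE 4

struct job { int id; };

struct job_pool {
	struct job slots[POOL_SIZE];
	struct job *free_list[POOL_SIZE];
	size_t n_free;
};

static void pool_init(struct job_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->slots[i].id = (int)i;
		p->free_list[i] = &p->slots[i];
	}
	p->n_free = POOL_SIZE;
}

/* Take a job from the pool, or NULL when the pool is exhausted. */
static struct job *job_get(struct job_pool *p)
{
	return p->n_free ? p->free_list[--p->n_free] : NULL;
}

/* Return a job to the pool. */
static void job_put(struct job_pool *p, struct job *j)
{
	assert(p->n_free < POOL_SIZE);
	p->free_list[p->n_free++] = j;
}

/*
 * Simulated synchronous create. Returns 0 on success and -1 on failure.
 * When the enqueue step fails, the job is returned to the pool only if
 * release_on_failure is set; leaving it unset reproduces the leak.
 */
static int create_sync(struct job_pool *p, bool enqueue_ok,
		       bool release_on_failure)
{
	struct job *j = job_get(p);

	if (j == NULL)
		return -1; /* pool exhausted */
	if (!enqueue_ok) {
		if (release_on_failure)
			job_put(p, j);
		return -1;
	}
	/* Success: this toy model completes and recycles the job at once. */
	job_put(p, j);
	return 0;
}
```

With `release_on_failure` unset, `POOL_SIZE` failed creations drain the pool and every later call fails even when the enqueue would succeed. With it set, the pool stays full no matter how many creations fail.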

sourcery-ai bot commented Dec 8, 2025

Reviewer's Guide

Refactors meter mark allocation to return detailed error codes instead of NULL, propagates those errors through indirect meter creation paths, and ensures hardware queue jobs are properly released only when appropriate to avoid leaks on ASO meter failures.
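
The change of signature summarized above, from "return a pointer or NULL" to "return a status and hand the object back through an output parameter", might look roughly like the following. This is a sketch under stated assumptions: `struct meter`, `struct alloc_env`, and `meter_alloc` are hypothetical simplifications, heap allocation stands in for the indexed pool, and the failure conditions are simulated by flags rather than real hardware calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the mlx5 structures. */
struct meter {
	int profile_id;
	bool ready;
};

struct alloc_env {
	bool shared_host;   /* port is not allowed to create the meter */
	bool enqueue_fails; /* simulate a failed hardware enqueue */
	bool wait_fails;    /* simulate a failed synchronous wait */
	bool sync;          /* caller uses the synchronous API */
};

/*
 * Returns 0 and stores the new meter in *out on success. On failure
 * returns a negative errno value and leaves *out set to NULL.
 */
static int meter_alloc(const struct alloc_env *env, const int *profile_id,
		       struct meter **out)
{
	struct meter *m;

	assert(out != NULL);
	*out = NULL;
	if (env->shared_host)
		return -ENOTSUP;
	if (profile_id == NULL)
		return -EINVAL;
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return -ENOMEM;
	m->profile_id = *profile_id;
	if (env->enqueue_fails) {
		free(m); /* nothing reached hardware */
		return -EBUSY;
	}
	if (env->sync && env->wait_fails) {
		free(m); /* failure after the enqueue step */
		return -EIO;
	}
	m->ready = env->sync;
	*out = m;
	return 0;
}
```

A pointer return can only say "failed"; the integer return lets the caller tell a failure before the enqueue from one after it, which is the distinction the job cleanup depends on.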

Sequence diagram for meter mark allocation and job lifecycle error handling

sequenceDiagram
    participant F as flow_hw_action_handle_create
    participant C as flow_hw_meter_mark_compile
    participant A as flow_hw_meter_mark_alloc
    participant P as mlx5_ipool_malloc
    participant U as mlx5_aso_meter_update_by_wqe
    participant W as mlx5_aso_mtr_wait
    participant J as flow_hw_job_put
    participant I as mlx5_ipool_free

    rect rgb(235,235,255)
        F->>F: allocate job (flow_hw_action_job_init)
        alt action_type is METER_MARK (indirect handle)
            F->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, push, &aso_mtr, error)
        else indirect meter creation via compile
            C->>C: allocate job (flow_hw_action_job_init)
            C->>A: ret = flow_hw_meter_mark_alloc(dev, queue, action, job, true, &aso_mtr, error)
        end
    end

    rect rgb(245,245,245)
        alt shared_host is true
            A-->>F: return ENOTSUP (<0)
        else meter_mark profile is NULL
            A-->>F: return EINVAL (<0)
        else normal allocation path
            A->>P: aso_mtr = mlx5_ipool_malloc(idx_pool, &mtr_id)
            alt allocation fails
                A->>I: mlx5_ipool_free(idx_pool, mtr_id) if mtr_id != 0
                A-->>F: return ENOMEM (<0)
            else allocation succeeds
                A->>A: initialize aso_mtr fields
                A->>U: rc = mlx5_aso_meter_update_by_wqe(..., aso_mtr, job, push)
                alt update enqueue fails
                    A->>I: mlx5_ipool_free(idx_pool, mtr_id)
                    A-->>F: return EBUSY (<0)
                else update enqueue succeeds
                    alt queue == MLX5_HW_INV_QUEUE
                        A->>W: rc = mlx5_aso_mtr_wait(priv, aso_mtr, true)
                        alt wait succeeds
                            A-->>F: return 0
                        else wait fails
                            A->>I: mlx5_ipool_free(idx_pool, mtr_id)
                            A-->>F: return -EIO
                        end
                    else async queue
                        A-->>F: return 0
                    end
                end
            end
        end
    end

    rect rgb(235,255,235)
        alt caller is flow_hw_action_handle_create or flow_hw_meter_mark_compile
            alt ret == 0
                F->>F: handle = job->action
                F->>F: normal finalize path
            else ret != 0
                alt ret == -EIO (wait failure)
                    note over F: job already used by hardware, keep it to avoid leak
                    F-->>F: do not call flow_hw_job_put
                else ret != -EIO (allocation/enqueue/config error)
                    alt queue == MLX5_HW_INV_QUEUE
                        F->>F: queue = CTRL_QUEUE_ID(priv)
                    end
                    F->>J: flow_hw_job_put(priv, job, queue)
                end
            end
        end
    end

File-Level Changes

Change: Refactor flow_hw_meter_mark_alloc to return error codes and take an output aso_mtr pointer, tightening validation and error handling for meter mark creation.
File: drivers/net/mlx5/mlx5_flow_hw.c
Details:
  • Change flow_hw_meter_mark_alloc return type from mlx5_aso_mtr* to int and add an output parameter for the allocated aso_mtr pointer
  • Return rte_flow_error_set codes (ENOTSUP, EINVAL, ENOMEM, EBUSY) and -EIO instead of NULL for various failure modes
  • Add explicit check for missing meter_mark profile and treat it as EINVAL
  • Ensure allocated ASO meter is initialized through the output pointer, including type, state, offset, pool, and initial color
  • On ASO meter update or wait failures, free the pool entry and propagate specific error codes rather than silently failing

Change: Update call sites to the new meter mark allocator API and ensure jobs are only released when appropriate, preventing leaks on ASO failures.
File: drivers/net/mlx5/mlx5_flow_hw.c
Details:
  • In flow_hw_meter_mark_compile, call the new allocator, handle non-zero return codes, and avoid releasing the job when failure occurs after enqueue/wait (-EIO case)
  • In flow_hw_action_handle_create, use the new allocator, propagate errors similarly, and only put the job back on the queue when errors occur before ASO completion
  • Adjust final job handling so that flow_hw_action_finalize is only called when a valid handle is produced, preventing use of incomplete actions
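
The caller-side rule described for both call sites can be condensed into one decision function. The sketch below is illustrative only: `INV_QUEUE`, `CTRL_QUEUE`, `struct put_log`, `job_put`, and `handle_alloc_result` are hypothetical names, and the queue values are arbitrary placeholders rather than the driver's constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define INV_QUEUE  0xffffffffu /* hypothetical "no queue" marker (sync API) */
#define CTRL_QUEUE 0u          /* hypothetical control queue index */

/* Records job releases so the behaviour can be observed. */
struct put_log {
	int calls;
	unsigned int queue;
};

static void job_put(struct put_log *log, unsigned int queue)
{
	assert(log != NULL);
	log->calls++;
	log->queue = queue;
}

/*
 * Decide what to do with the job after the allocator returned ret.
 * Returns true when the caller may go on to finalize the action.
 */
static bool handle_alloc_result(int ret, unsigned int queue,
				struct put_log *log)
{
	if (ret == 0)
		return true;
	if (ret != -EIO) {
		/* Failure before completion: the caller still owns the job. */
		if (queue == INV_QUEUE)
			queue = CTRL_QUEUE; /* sync API releases to the control queue */
		job_put(log, queue);
	}
	/* For -EIO the job is deliberately left alone. */
	return false;
}
```

Keeping the rule in one helper would let both call sites share the `-EIO` special case instead of repeating it.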


coderabbitai bot commented Dec 8, 2025

Walkthrough

The flow_hw_meter_mark_alloc function in the MLX5 network driver has been refactored to return an integer error code instead of a pointer, with the allocated ASO meter passed via an output parameter. All call sites have been updated to handle the new return type and dereference the output parameter accordingly.

Changes

Cohort / File(s): ASO Meter Allocation Refactoring (drivers/net/mlx5/mlx5_flow_hw.c)
Summary: Changed flow_hw_meter_mark_alloc return type from struct mlx5_aso_mtr * to int, added struct mlx5_aso_mtr **aso_mtr output parameter. Updated host-port validation to return ENOTSUP on shared_host. Modified allocation path to assign meter entry via output parameter and return error codes (ENOMEM, EINVAL, EBUSY, -EIO). Updated all call sites (flow_hw_meter_mark_compile, flow_hw_action_handle_create, etc.) to handle int return values and dereference output pointer for meter field assignments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Areas requiring attention:

  • Verify all call sites of flow_hw_meter_mark_alloc correctly pass &aso_mtr and handle the returned error code
  • Check that output parameter dereferences (e.g., (*aso_mtr)->offset, (*aso_mtr)->pool) are applied consistently across the function body
  • Validate error handling paths, particularly -EIO semantics and cleanup logic when allocation fails
  • Confirm assertion that output pointer is provided is appropriate for all call contexts

Poem

🐰 A meter's birth now takes a different road,
No pointer passed, but error codes bestowed,
With output params holding truth so neat,
The refactor makes the error handling sweet! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly addresses the main change: fixing a job leak during indirect meter creation failure by refactoring the flow_hw_meter_mark_alloc function to return error codes and use an output parameter.


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The special handling of -EIO in flow_hw_meter_mark_alloc callers is a bit opaque; consider adding a named constant or comment to explain why this error is treated differently from other non-zero returns and where the job is expected to be released in that path.
  • On the mlx5_aso_mtr_wait() failure path you return -EIO without populating rte_flow_error, which makes diagnostics inconsistent compared to the other error branches in flow_hw_meter_mark_alloc; consider setting a descriptive rte_flow_error there as well.
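
Both suggestions could be addressed along the following lines. This is only a sketch: `METER_MARK_ERR_AFTER_ENQUEUE`, `struct flow_error`, `error_set`, `wait_failed`, and `caller_must_put_job` are hypothetical names, and `struct flow_error` is a simplified stand-in for the real error structure, not its actual layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical named constant: the allocator failed after the hardware
 * operation was enqueued, so the caller must not release the job.
 */
#define METER_MARK_ERR_AFTER_ENQUEUE (-EIO)

/* Simplified stand-in for an error descriptor. */
struct flow_error {
	int code;
	const char *message;
};

/* Fill the error descriptor (if any) and return the negative code. */
static int error_set(struct flow_error *err, int code, const char *msg)
{
	assert(msg != NULL);
	if (err != NULL) {
		err->code = code;
		err->message = msg;
	}
	return -code;
}

/* Wait-failure path: report a descriptive error like the other branches. */
static int wait_failed(struct flow_error *err)
{
	return error_set(err, EIO, "meter mark: synchronous wait failed");
}

/* True when the caller still owns the job and has to release it. */
static bool caller_must_put_job(int ret)
{
	return ret != 0 && ret != METER_MARK_ERR_AFTER_ENQUEUE;
}
```

The named constant documents the intent at every call site, and routing the wait failure through the same error helper keeps diagnostics uniform across branches.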

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
drivers/net/mlx5/mlx5_flow_hw.c (1)

1257-1275: Indirect meter create uses new alloc API; behaviour looks good but host/validate mismatch is worth a look

In flow_hw_action_handle_create():

  • The meter‑mark branch now uses flow_hw_meter_mark_alloc() via an out‑param, and:

    • sets aso = true,
    • frees the job on any error except -EIO (after normalizing MLX5_HW_INV_QUEUE to CTRL_QUEUE_ID()),
    • derives the handle directly from job->action, and
    • still invokes flow_hw_action_finalize() only when handle is non‑NULL (so failed creates don’t enqueue jobs).
  • This aligns with the new alloc semantics and closes the job‑leak for indirect meter creation.

Two small points to sanity‑check:

  1. As above, the ret != -EIO filter assumes that the -EIO path from flow_hw_meter_mark_alloc() has fully taken care of the job; otherwise we’d leak the mlx5_hw_q_job here.

  2. flow_hw_meter_mark_alloc() now unconditionally rejects priv->shared_host with ENOTSUP, while flow_hw_validate_action_meter_mark() still skips the shared_host check for indirect == true. That makes validation slightly more permissive than creation for indirect actions on shared‑host ports. If that’s not intentional, consider mirroring the shared‑host restriction in the validate path as well so applications don’t see validate‑OK / create‑fail solely due to host/guest placement.

Functionally the changes look sound and targeted at the reported job leak.
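
One way to keep validation and creation from drifting apart, as raised in point 2, is to route both through a single predicate. The sketch below assumes hypothetical names throughout (`struct port`, `meter_mark_check_host`, `validate_meter_mark`, `create_meter_mark`) and deliberately ignores everything except the shared-host restriction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port state. */
struct port {
	bool shared_host;
};

/* Single source of truth for the shared-host restriction. */
static int meter_mark_check_host(const struct port *port)
{
	assert(port != NULL);
	return port->shared_host ? -ENOTSUP : 0;
}

/* Validation path: applies the restriction to direct and indirect alike. */
static int validate_meter_mark(const struct port *port, bool indirect)
{
	(void)indirect; /* no special case for indirect actions in this sketch */
	return meter_mark_check_host(port);
}

/* Creation path: same predicate, so the two results cannot disagree. */
static int create_meter_mark(const struct port *port)
{
	int ret = meter_mark_check_host(port);

	if (ret != 0)
		return ret;
	/* Allocation and hardware programming would follow here. */
	return 0;
}
```

With a shared predicate, an application never sees validation succeed and creation fail purely because of the shared-host setting.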

Also applies to: 12719-12747

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd60dcd and 6eb45f2.

📒 Files selected for processing (1)
  • drivers/net/mlx5/mlx5_flow_hw.c (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
drivers/net/mlx5/mlx5_flow_hw.c (2)
drivers/net/mlx5/mlx5_flow.c (1)
  • int (205-213)
drivers/net/mlx5/mlx5_utils.c (1)
  • mlx5_ipool_malloc (667-732)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: Sourcery review
🔇 Additional comments (2)
drivers/net/mlx5/mlx5_flow_hw.c (2)

1873-1934: Job‑safe ASO meter alloc; confirm -EIO/job ownership contract

The new flow_hw_meter_mark_alloc() looks correct in terms of:

  • enforcing host‑port usage on shared_host,
  • validating meter_mark->profile,
  • consistently freeing mtr_id in all failure paths, and
  • returning an int status with the allocated mlx5_aso_mtr * via aso_mtr.

The special -EIO return for the mlx5_aso_mtr_wait() failure path is relied on by callers to not call flow_hw_job_put() (they assume the wait path has already consumed or never posted the job), whereas all other error codes cause the caller to free the job.

Please double‑check that mlx5_aso_mtr_wait()’s contract is indeed:

  • on failure, either the WQE was never posted, or any posted job is already fully handled (including recycling the mlx5_hw_q_job), and
  • no further completion will later hit __flow_hw_pull_indir_action_comp() for this same job.

If mlx5_aso_mtr_wait() can return non‑zero while a still‑owned job might later complete, we could either leak or double‑free the job depending on the path. Otherwise this change is fine and fixes the original job‑leak on enqueue failure.
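
The ownership question can be made concrete with a small state machine in which each release is legal from exactly one state. This is a generic sketch of the idea, not the driver's mechanism: `enum job_state` and the `job_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Who owns the job at each point in its life. */
enum job_state {
	JOB_FREE,   /* in the pool */
	JOB_HELD,   /* owned by the caller, not yet posted */
	JOB_POSTED, /* handed to hardware, owned by the completion path */
};

struct job {
	enum job_state state;
};

static void job_init(struct job *j)
{
	assert(j != NULL);
	j->state = JOB_FREE;
}

/* Each transition returns false when attempted from the wrong state. */
static bool job_acquire(struct job *j)
{
	if (j->state != JOB_FREE)
		return false;
	j->state = JOB_HELD;
	return true;
}

static bool job_post(struct job *j)
{
	if (j->state != JOB_HELD)
		return false;
	j->state = JOB_POSTED;
	return true;
}

/* Caller-side release: only legal while the caller still owns the job. */
static bool job_put_by_caller(struct job *j)
{
	if (j->state != JOB_HELD)
		return false;
	j->state = JOB_FREE;
	return true;
}

/* Completion-side release: only legal for a posted job. */
static bool job_complete(struct job *j)
{
	if (j->state != JOB_POSTED)
		return false;
	j->state = JOB_FREE;
	return true;
}
```

In this model a job released by the caller after posting, or completed twice, is rejected instead of being freed twice, and a job that never leaves `JOB_HELD` or `JOB_POSTED` is visibly leaked. Whether the `-EIO` path leaves the job posted or already recycled is exactly the contract the review asks to confirm.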


1947-1964: Compile‑time meter error path now releases jobs correctly; keep -EIO semantics aligned

flow_hw_meter_mark_compile() now:

  • captures the int status from flow_hw_meter_mark_alloc(),
  • calls flow_hw_job_put() for all non‑zero returns except -EIO, and
  • still maps any failure to return -1 for the caller.

This correctly plugs the previous leak for enqueue‑time failures (e.g. EBUSY, ENOMEM), since the job is explicitly returned to the pool.

As with the alloc helper, this relies on -EIO meaning “sync wait path has already dealt with the job”; otherwise we would leak it here. Please verify that flow_hw_meter_mark_alloc() only returns -EIO in those cases and that no queued job remains outstanding then.
