
[Efficiency Improver] Fix O(n²) hash collisions in DuplicateDataRowAnalyzer #7796

@github-actions

Description


🤖 This is an automated PR from Daily Efficiency Improver, an AI assistant focused on reducing the energy consumption and computational footprint of this repository.


Goal and Rationale

DuplicateDataRowAnalyzer uses a Dictionary<ImmutableArray<TypedConstant>, int> to detect duplicate [DataRow] attributes on a test method. The custom TypedConstantArrayComparer.GetHashCode only hashed Kind and Type — but not the actual Value.

This meant that for any test method with multiple [DataRow(int)] attributes (e.g. [DataRow(1)], [DataRow(2)], [DataRow(3)]...), every argument array produced the same hash, turning every dictionary operation into an O(n) linear scan and making the overall duplicate-detection algorithm O(n²).
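The failure mode can be reproduced outside Roslyn. The following Python sketch is illustrative only (the `Key` class and counters are hypothetical stand-ins, not the analyzer's code): a hash that ignores the value forces the hash table to fall back on equality checks against every previously inserted entry.

```python
eq_calls = 0

class Key:
    """Stand-in for a [DataRow] argument array: equality depends on the
    value, but the hash may or may not (that is the bug being modeled)."""
    def __init__(self, value, hash_value):
        self.value = value
        self._hash = hash_value

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        global eq_calls
        eq_calls += 1  # models one call to the comparer's Equals
        return self.value == other.value

def count_equality_checks(keys):
    global eq_calls
    eq_calls = 0
    seen = set()
    for key in keys:
        seen.add(key)  # colliding keys must be compared against all prior ones
    return eq_calls

n = 100
# Value-blind hash (like hashing only Kind/Type): every key shares one bucket.
collisions = count_equality_checks([Key(i, hash_value=0) for i in range(n)])
# Value-aware hash: distinct values land in distinct buckets.
no_collisions = count_equality_checks([Key(i, hash_value=hash(i)) for i in range(n)])
print(collisions, no_collisions)
```

With the constant hash, the k-th insert compares against all k-1 earlier keys, so the total grows quadratically; with the value-aware hash, no equality checks are needed at all.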

Focus Area

Code-Level Efficiency — Algorithmic complexity: O(n²) where O(n) suffices.

Approach

Include the TypedConstant.Value in the hash function, matching the recursive comparison logic already present in Equals. For array-typed arguments, recursively hash the Values.

Before

public int GetHashCode(ImmutableArray<TypedConstant> obj)
{
    var hashCode = default(RoslynHashCode);
    foreach (TypedConstant typedConstant in obj)
    {
        hashCode.Add(typedConstant.Kind);
        hashCode.Add(SymbolEqualityComparer.Default.GetHashCode(typedConstant.Type));
    }
    return hashCode.ToHashCode();
}

After

public int GetHashCode(ImmutableArray<TypedConstant> obj)
{
    var hashCode = default(RoslynHashCode);
    foreach (TypedConstant typedConstant in obj)
    {
        hashCode.Add(typedConstant.Kind);
        hashCode.Add(SymbolEqualityComparer.Default.GetHashCode(typedConstant.Type));

        if (!typedConstant.IsNull)
        {
            if (typedConstant.Kind == TypedConstantKind.Array)
            {
                hashCode.Add(GetHashCode(typedConstant.Values));
            }
            else
            {
                hashCode.Add(typedConstant.Value);
            }
        }
    }
    return hashCode.ToHashCode();
}

Energy Efficiency Evidence

Proxy metric: CPU cycles / instruction count — fewer dictionary comparisons means fewer instructions executed by the Roslyn analyzer host on every build.

Scenario                                     | Before                  | After
n DataRow attributes, same parameter type    | O(n²) hash comparisons  | O(n) — one per entry
[DataRow(1)]..[DataRow(100)] on one method   | ~5050 equality checks   | ~100 equality checks
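The rough counts above follow from the triangular-number sum: when every key collides, the k-th insert scans all earlier entries. A quick arithmetic check in Python:

```python
n = 100
# Worst case under total collision: the k-th insert scans k earlier entries.
quadratic_checks = n * (n + 1) // 2  # 1 + 2 + ... + n
linear_checks = n                    # one hash lookup per entry
print(quadratic_checks, linear_checks)  # 5050 100
```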

MSTest projects routinely have data-driven test methods with dozens of [DataRow] attributes. The analyzer runs on every build — the fix directly reduces the number of CPU operations per compilation.

GSF principle: Hardware Efficiency — reducing unnecessary computation lowers energy draw per build on CI and developer machines.

Edge Cases

  • Null values: skipped (no change — IsNull is already checked before accessing Value).
  • Array arguments: recursively hashed to match the recursive Equals path.
  • 0.0f vs -0.0f: Equals treats these as different (it compares the raw bits via BitConverter). Their hash codes may coincide (in .NET, float.GetHashCode() returns 0 for both), but that is only a collision, which the hash contract allows: equal values must hash equally; unequal values are permitted to share a hash.
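The negative-zero point can be sanity-checked in any language with IEEE 754 floats. This Python sketch is analogous to, not identical with, .NET's BitConverter comparison and GetHashCode behavior:

```python
import struct

# 0.0 and -0.0 compare equal numerically but differ at the bit level,
# which is what a BitConverter-style byte comparison observes.
bits_pos = struct.pack('<f', 0.0)
bits_neg = struct.pack('<f', -0.0)
assert 0.0 == -0.0            # numeric equality
assert bits_pos != bits_neg   # bitwise inequality (the sign bit differs)

# Coinciding hashes are fine: the contract only requires that *equal*
# values share a hash; unequal values sharing one is an ordinary collision.
print(bits_pos.hex(), bits_neg.hex(), hash(0.0) == hash(-0.0))
```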

Trade-offs

  • Minimal added complexity: one if/else block.
  • Slightly more work per element in GetHashCode, but vastly fewer dictionary collisions compensate.

Reproducibility

To observe the improvement, benchmark a method with 100+ DataRow attributes of the same primitive type and compare Roslyn analysis time before/after.

Test Status

All 11 existing DuplicateDataRowAnalyzerTests pass, including the float/double negative-zero edge cases.

Note

🔒 Integrity filter blocked 6 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #2340 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #757 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #6335 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #3759 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #7787 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #6611 list_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Daily Efficiency Improver


Note

This was originally intended as a pull request, but GitHub Actions is not permitted to create or approve pull requests in this repository.
The changes have been pushed to branch efficiency/duplicate-datarow-hashcode-fix-4a6e49ff096d33a2.


To fix the permissions issue, go to Settings > Actions > General and enable "Allow GitHub Actions to create and approve pull requests". See also: gh-aw FAQ

Patch preview (53 lines):
From 80902e574a6df2008c412dbd20ff62caafda5568 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <github-actions[bot]@users.noreply.github.com>
Date: Fri, 24 Apr 2026 04:35:19 +0000
Subject: [PATCH] perf(analyzers): fix O(n²) hash collisions in DuplicateDataRowAnalyzer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The TypedConstantArrayComparer.GetHashCode only hashed Kind and Type,
ignoring the actual value. This caused all DataRow attributes of the
same type (e.g., [DataRow(1)], [DataRow(2)], ...) to hash to the same
bucket, turning dictionary lookups into O(n) scans and the overall
duplicate-detection into O(n²).

Fix: include the TypedConstant Value (and recursive Values for arrays)
in the hash, so distinct DataRow arguments produce distinct buckets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../MSTest.Analyzers/DuplicateDataRowAnalyzer.cs  | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/Analyzers/MSTest.Analyzers/DuplicateDataRowAnalyzer.cs b/src/Analyzers/MSTest.Analyzers/DuplicateDataRowAnalyzer.cs
index 6d560c7..99154d5 100644
--- a/src/Analyzers/MSTest.Analyzers/DuplicateDataRowAnalyzer.cs
+++ b/src/Analyzers/MSTest.Analyzers/DuplicateDataRowAnalyzer.cs
@@ -153,6 +153,21 @@ public int GetHashCode(ImmutableArray<TypedConstant> obj)
             {
                 hashCode.Add(typedConstant.Kind);
                 hashCode.Add(SymbolEqualityComparer.Default.GetHashCode(typedConstant.Type));
+
+                if (!typedConstant.IsNull)
+                {
+                    if (typedConstant.Kind == TypedConstantKind.Array)
+                    {
+                        // Recursively hash array elements to match recursive equality check.
+                        hashCode.Add(GetHashCode(typedConstant.Values));
+                    }
+                    else
+                    {
+                        // In
... (truncated)
