@MathiasVP MathiasVP commented Nov 18, 2025

I was writing a Microsoft query and was once again bitten by the fact that, when you look at a field access such as:

x.f

The read step from x to f is probably a step that reads a DataFlow::FieldContent ... but it may also be reading a DataFlow::UnionContent. And in order to know which one to use you have to check the type of x.
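
To make the ambiguity concrete, here is a minimal C++ sketch. The types `Point` and `Bits` and the two functions are hypothetical names made up for illustration and do not come from the PR. Both functions contain the same syntactic access `x.f`, yet only the declared type of `x` says whether the field belongs to a struct or to a union:

```cpp
#include <cassert>

// Hypothetical types, used only for illustration.
struct Point { int f; int g; };       // non-union: each field has its own storage
union Bits { int f; unsigned u; };    // union: all fields share the same storage

// Same syntax `x.f` in both functions; only the type of `x` differs.
int readFromStruct(const Point& x) { return x.f; }  // field of a struct
int readFromUnion(const Bits& x) { return x.f; }    // field of a union

// Small self-check showing that both accesses behave as ordinary field reads.
void demo() {
  Point p{42, 0};
  Bits b;
  b.f = 7;  // `f` is now the active member of the union
  assert(readFromStruct(p) == 42);
  assert(readFromUnion(b) == 7);
}
```

Before this change, a query author modelling the read in `readFromStruct` would reach for `DataFlow::FieldContent`, while the read in `readFromUnion` would need `DataFlow::UnionContent`.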

This PR improves this experience by making FieldContent work for both union and non-union fields. It even simplifies some of the internal dataflow code as well 🎉

There should be no observable changes as a result of this, other than FieldContent now working for unions as well.

Commit-by-commit review recommended.

@github-actions github-actions bot added the C++ label Nov 18, 2025
```
Evaluated relational algebra for predicate DataFlowPrivate::storeStepImpl/4#b2c79f9a@13be12rc with tuple counts:
           9   ~0%    {3} r1 = JOIN `FlowSummaryImpl::Private::Steps::summaryStoreStep/3#5c2d4899` WITH DataFlowUtil::TFlowSummaryNode#40da8361 ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Rhs.1
           9   ~0%    {4}    | JOIN WITH DataFlowUtil::TFlowSummaryNode#40da8361 ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Rhs.1, _
           9  ~12%    {4}    | REWRITE WITH Out.3 := true

     1853420   ~0%    {3} r2 = SCAN `DataFlowPrivate::nodeHasInstruction/3#f469bb06` OUTPUT In.1, In.0, In.2
      100282   ~0%    {3}    | JOIN WITH `Instruction::StoreInstruction.getDestinationAddressOperand/0#dispred#596a4aba` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
      127910   ~0%    {6}    | JOIN WITH `DataFlowPrivate::numberOfLoadsFromOperand/4#7e555666_1023#join_rhs` ON FIRST 1 OUTPUT _, Lhs.1, Rhs.1, Rhs.3, Lhs.2, Rhs.2
      127910   ~0%    {4}    | REWRITE WITH Tmp.0 := 1, Out.0 := (Tmp.0 + In.4 + In.5) KEEPING 4
  4178182721   ~1%    {4}    | JOIN WITH `DataFlowUtil::FieldContent.getIndirectionIndex/0#dispred#cc69866f_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3
  4290552803   ~0%    {5}    | JOIN WITH `DataFlowUtil::FieldContent.getAField/0#dispred#ba1c91e5` ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.0, Rhs.1
  3033745816   ~5%    {7}    | JOIN WITH DataFlowUtil::PostFieldUpdateNode#b86f3a84_1023#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Rhs.2, Rhs.3
  3033745816   ~3%    {9}    | JOIN WITH DataFlowUtil::TPostUpdateNodeImpl#f5e76b7a_21#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.0, Lhs.5, Lhs.6, Rhs.1, _
                      {8}    | REWRITE WITH Tmp.8 := 1, TEST InOut.7 = Tmp.8 KEEPING 8
  1516872908   ~0%    {7}    | SCAN OUTPUT In.4, In.5, In.6, In.0, In.1, In.2, In.3
  2409090286   ~1%    {6}    | JOIN WITH DataFlowUtil::PostFieldUpdateNode#b86f3a84_0231#join_rhs ON FIRST 3 OUTPUT Rhs.3, Lhs.6, Lhs.3, Lhs.4, Lhs.5, Lhs.0
       66016  ~45%    {4}    | JOIN WITH `DataFlowUtil::FieldAddress.getField/0#dispred#bdd01c1a` ON FIRST 2 OUTPUT Lhs.2, Lhs.4, Lhs.5, Lhs.3

       66025  ~45%    {4} r3 = r1 UNION r2
                      return r3
```
@MathiasVP MathiasVP marked this pull request as ready for review November 19, 2025 17:17
@MathiasVP MathiasVP requested a review from a team as a code owner November 19, 2025 17:17
Copilot AI review requested due to automatic review settings November 19, 2025 17:17
Copilot finished reviewing on behalf of MathiasVP November 19, 2025 17:20

Copilot AI left a comment

Pull Request Overview

This PR refactors the dataflow content hierarchy by introducing a common base class FieldContent that unifies handling of both union and non-union field accesses. This simplifies querying for field accesses since callers no longer need to check whether they're dealing with a union or struct field.

Key changes:

  • Introduced FieldContent as a common base class with getAField() and getField() methods
  • Renamed the original FieldContent to NonUnionFieldContent
  • Made UnionContent extend the new FieldContent base class
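
The shape of the hierarchy described above can be sketched as a toy model in C++. This is not the QL library's API: the class names mirror the PR, but the string-based fields, the `mayReadField` helper, and the assumption that a union content covers every field of its union are all illustrative guesses.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model only: names mirror the PR, the implementation is an assumption.
class FieldContent {
public:
  virtual ~FieldContent() = default;
  // Every field this content may refer to (assumed meaning of getAField()).
  virtual std::vector<std::string> getAField() const = 0;
};

// Content for a field of a struct or class: exactly one field.
class NonUnionFieldContent : public FieldContent {
  std::string field_;
public:
  explicit NonUnionFieldContent(std::string field) : field_(std::move(field)) {}
  std::vector<std::string> getAField() const override { return {field_}; }
};

// Content for a union: assumed here to cover all fields sharing the storage.
class UnionContent : public FieldContent {
  std::vector<std::string> fields_;
public:
  explicit UnionContent(std::vector<std::string> fields) : fields_(std::move(fields)) {}
  std::vector<std::string> getAField() const override { return fields_; }
};

// Hypothetical caller: no branching on struct versus union is needed.
bool mayReadField(const FieldContent& c, const std::string& name) {
  for (const auto& f : c.getAField())
    if (f == name) return true;
  return false;
}

// Small self-check of the sketch.
void demoHierarchy() {
  NonUnionFieldContent s("f");
  UnionContent u(std::vector<std::string>{"f", "g"});
  assert(mayReadField(s, "f"));
  assert(mayReadField(u, "g"));
  assert(!mayReadField(s, "g"));
}
```

The point of the common base class is visible in `mayReadField`: it accepts any `FieldContent` and never inspects whether the underlying aggregate is a union.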

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll | Introduces the new type hierarchy with FieldContent as base class, renames FieldContent to NonUnionFieldContent, and makes UnionContent extend FieldContent |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll | Updates store and read steps to use unified FieldContent type, simplifying the logic by using getAField() method; adds helper function for late-binding indirection index |
| cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll | Updates references to use NonUnionFieldContent where specific type is needed and simplifies isField predicate to use the new base class |

@MathiasVP MathiasVP force-pushed the union-content-field-content-common-base-class branch from a50db84 to 6c4def1 Compare November 19, 2025 17:24

@jketema jketema left a comment

LGTM

@MathiasVP MathiasVP merged commit 14f9997 into github:main Nov 20, 2025
18 checks passed