Skip to content

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Apr 13, 2024

For performance reasons dataflow "compresses" a sequence of local flow steps into a larger "big step" relation as part of each subsequent dataflow traversal. This means the expensive parts of dataflow has to deal with smaller relations, but it also makes the computed paths less intelligible. This PR ensures that the sequence is always broken when it reaches a right-hand side of any Store instruction.

For example, if we have the following flow:

int x = source();
int y = x;
sink(y);

On main we'd get a simple step from source() to sink(y) in the path graph. After this PR, we'd get two steps: one from source() to x on the second line, and one from x to sink(y).

This matches what Ruby does here.

@MathiasVP MathiasVP requested a review from a team as a code owner April 13, 2024 00:06
@github-actions github-actions bot added the C++ label Apr 13, 2024
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Apr 13, 2024
@MathiasVP MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Apr 13, 2024
Comment on lines 10 to 11
| test.cpp:21:13:21:18 | call to malloc | test.cpp:21:13:21:18 | call to malloc | provenance | |
| test.cpp:22:5:22:7 | *arr [p] | test.cpp:24:12:24:14 | arr [p] | provenance | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do what we want it to do? It seems to create a cycle, but not an edge that actually show the flow from the malloc into arr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this PR creates a single then there's a cycle in dataflow (which is very possible!) since this change simply unhides nodes in the final dataflow path. I'll be happy to investigate this as a follow-up.

I pushed a new commit to this PR that slightly changes which nodes are being shown to prevent a long chain of use-use from showing in the graph. This means the .expected file here is slightly changed, although there's still a possibility of an unexpected cycle on main

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Apr 16, 2024

@jketema I pushed a change here that modified which nodes are shown in the path. With 2cbc59b we'd show a long sequence of use-use flow in examples such as:

int x = source();
int y1 = x;
int y2 = x;
...
int yN = x;
sink(yN);

since the store instruction's right-hand side would be x. In 8630630 I changed this so that the StoreInstruction is the one that's shown in the path graph. This means we'd still only get a final step from source() to sink(yN) since all the intermediate nodes never flow to the Store instructions. However, the final Store produced by:

int yN = x;

will be in the path.

When I was looking at paths earlier this gives a better UX, and this is also more closely aligned with the strategy Ruby/C# uses.

@jketema
Copy link
Contributor

jketema commented Apr 16, 2024

It seems that a lot of (implicit or explicit?) c-style casts now also become explicit in the paths?

@MathiasVP
Copy link
Contributor Author

It seems that a lot of (implicit or explicit?) c-style casts now also become explicit in the paths?

Yeah, I think this actually exposes a problem with asDefinition as that predicate should only return unconverted expressions 🤔

@jketema
Copy link
Contributor

jketema commented Apr 16, 2024

It seems that a lot of (implicit or explicit?) c-style casts now also become explicit in the paths?

Yeah, I think this actually exposes a problem with asDefinition as that predicate should only return unconverted expressions 🤔

Do you want to investigate this as part of this PR, or should we merge this and consider that a follow-up?

@MathiasVP
Copy link
Contributor Author

I'd prefer doing that as follow-up, if you're okay with it. I think there's value in these slightly more comprehensible path explanations even though they also show some casts that we didn't intend.

@MathiasVP MathiasVP merged commit 1847a6d into github:main Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants