About the construction of DFG #88

wangdeze18 · 2021-11-29T12:08:05Z

Thank you for your great work! The code is very clear and concise to read.

I would like to ask about the logic behind each function in DFG.py. I would like to implement a CFG using it as a reference, because I think a CFG can sometimes be useful for understanding code as well.

guoday · 2021-11-30T02:02:09Z

We first keep a state table for all variables that records the position of each variable's last assignment. Then we enumerate each variable in the AST to decide whether its value changes. If its value changes (e.g. "a" in a = b + 1), the state table updates the position of "a" and we record the value flow of "a" (i.e. "a" is computed from "b" + 1). If its value doesn't change (e.g. "b" in a = b + 1), we just need to record the value flow of "b" (i.e. "b" comes from the position of "b" in the state table).
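The state-table idea described above can be sketched for straight-line assignments. This is a minimal illustration, not the code in DFG.py: it assumes Python's built-in `ast` module rather than the repository's parser, the function name `straight_line_dfg` and the relation labels are hypothetical, and positions are (line, character column) pairs rather than token indices.

```python
import ast


def straight_line_dfg(source):
    """Build data-flow edges for a sequence of simple `name = expr` lines.

    Hypothetical helper for illustration only.
    """
    states = {}  # variable name -> (line, col) of its last assignment
    edges = []   # (name, position, relation, list of source positions)

    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue  # this sketch ignores everything except assignments

        # Right-hand side: these values do not change, so each variable
        # use comes from the position stored in the state table (if any).
        rhs_positions = []
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name):
                pos = (node.lineno - 1, node.col_offset)
                origin = [states[node.id]] if node.id in states else []
                edges.append((node.id, pos, "comesFrom", origin))
                rhs_positions.append(pos)
            elif isinstance(node, ast.Constant):
                rhs_positions.append((node.lineno - 1, node.col_offset))

        # Left-hand side: the value changes, so record where it is
        # computed from and update the state table.
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                pos = (target.lineno - 1, target.col_offset)
                edges.append((target.id, pos, "computedFrom", sorted(rhs_positions)))
                states[target.id] = pos

    return edges, states


edges, states = straight_line_dfg("a = b + 1\nc = a * 2\n")
for edge in edges:
    print(edge)
```

In the second line of the usage example, the use of "a" is linked back to the assignment recorded in the state table, which is the "comes from" case described above.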

wangdeze18 · 2021-11-30T06:36:54Z

Thanks for your reply! This seems to have errors in extreme cases, for example, with unused statements such as:

a = b + 1
a = c * 2

guoday · 2021-11-30T06:45:05Z

For this example, I don't find any problem according to my reply.
The first "a" (0,0) will come from "b"(0,2) and "1"(0,4).
The second "a" (1,0) will come from "c"(1,2) and "2"(1,4).

wangdeze18 · 2021-11-30T07:11:20Z

Thanks for the quick reply. The first statement is overwritten by the second statement and is therefore invalid, so the two data-dependent edges introduced according to the first statement are also meaningless. Of course, this is a relatively rare case (but extracting the graph features as accurately as possible is of great importance for the subsequent processing).

guoday · 2021-11-30T07:18:04Z

The two data-dependent edges introduced by the first statement are very important. This is also one of our motivations for leveraging data flow: it can help find dead code. As shown in Figure 1 of the paper, from the data flow we can easily see that x=0 is dead code, which can help the model ignore the statement.

wangdeze18 · 2021-11-30T07:32:52Z

From Figure 2, variable-alignment (dfg-to-code) considers x = 0. And, for data flow edge prediction (dfg-to-dfg), edge 7 and edge 9 will also consider the association with edge 3 (x = 0). Is it a better choice if edge 3 (x = 0) is not considered directly?

guoday · 2021-11-30T07:41:33Z

No, I don't think so. Considering x=0 in the data flow can help the model know that x=0 doesn't contribute to return x, since there's no path between x^3 and x^11 in the data flow. Therefore, the model can know that x=0 is dead code and ignore it. The model will not be easily affected by dead code and will be more robust.
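The "no path in the data flow" argument can be illustrated with a small backward-reachability check. This is only a sketch under stated assumptions: the three-line program, the node names, and the helper `contributing_nodes` are hypothetical and are not taken from the paper's figure or from DFG.py.

```python
def contributing_nodes(comes_from, sink):
    """Return every node that has a data-flow path to `sink` (inclusive)."""
    seen = set()
    stack = [sink]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow value-flow edges backwards, from a node to its sources.
        stack.extend(comes_from.get(node, []))
    return seen


# Hypothetical program:
#   line 1: x = 0
#   line 2: x = a + b
#   line 3: return x
# Each key maps a variable occurrence to the occurrences its value flows from.
comes_from = {
    "x@line1": ["0@line1"],
    "x@line2": ["a@line2", "b@line2"],
    "x@line3": ["x@line2"],
}

live = contributing_nodes(comes_from, "x@line3")
dead_candidates = sorted(node for node in comes_from if node not in live)
print(dead_candidates)
```

Here the assignment on line 1 has no path to the returned value, so it is the only dead-code candidate, while the edges themselves stay in the graph.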

wangdeze18 · 2021-11-30T07:47:46Z

I think perhaps this could be artificially screened out during the preprocessing phase to focus on the more important program statements. Additionally, are there any guidance suggestions for CFG construction?

guoday · 2021-11-30T07:54:06Z

"I think perhaps this could be artificially screened out during the preprocessing phase to focus on the more important program statements. "

Yes. You are indeed right and I totally agree. However, filtering out this meaningless code does not seem easy in the preprocessing phase. Therefore, we hope the model can learn this feature in the pre-training phase. Thank you for this great idea.

"Additionally, are there any guidance suggestions for CFG construction?"

Actually, I am an NLP researcher and I don't know much about CFGs. Therefore, I don't know if there are any tools that can do this.

wangdeze18 · 2021-11-30T08:03:08Z

Thank you for your time!

guody5 closed this as completed Nov 30, 2021
