About the construction of DFG #88
Comments
We first keep a state table for all variables that records the position of each variable's last assignment. We then enumerate each variable in the AST to decide whether its value changes. If its value changes (e.g. "a" in a = b + 1), the state table updates the position of "a" and we record the value flow of "a" (i.e. "a" comes from "b" + 1). If its value doesn't change (e.g. "b" in a = b + 1), we only need to record the value flow of "b" (i.e. "b" comes from the position of "b" in the state table).
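The state-table procedure described above can be sketched roughly as follows. This is a minimal illustration, not the repository's DFG.py: it uses Python's built-in `ast` module rather than the parser the project uses, it handles only straight-line assignments to plain names, and the function name `build_dfg`, the edge tuple layout, and the relation labels are assumptions made for this sketch.

```python
import ast

def build_dfg(source):
    """Sketch of a state-table data-flow pass over straight-line assignments.

    Returns (edges, states). Each edge is a hypothetical tuple
    (variable, line, relation, source_variable, source_line).
    """
    states = {}  # variable name -> line of its last assignment
    edges = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue  # sketch only: other statement kinds are ignored
        # Variables read on the right-hand side keep their value, so each
        # one "comes from" its last recorded position in the state table.
        reads = sorted({n.id for n in ast.walk(stmt.value)
                        if isinstance(n, ast.Name)})
        for name in reads:
            if name in states:
                edges.append((name, stmt.lineno, "comesFrom",
                              name, states[name]))
        # Variables written on the left-hand side change value: record what
        # they are computed from, then update the state table.
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                for name in reads:
                    edges.append((target.id, stmt.lineno, "computedFrom",
                                  name, stmt.lineno))
                states[target.id] = stmt.lineno
    return edges, states

edges, states = build_dfg("b = 1\na = b + 1\n")
for edge in edges:
    print(edge)
```

For `a = b + 1`, the sketch records that "b" on line 2 comes from its assignment on line 1, and that "a" on line 2 is computed from "b"; the state table then points "a" at line 2.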
Thanks for your reply! This seems to have errors in extreme cases, for example, unused statements that
For this example, I don't see any problem with the approach described in my reply.
Thanks for the quick reply. The first statement is overwritten by the second statement and therefore has no effect, so the two data-dependent edges introduced by the first statement are also meaningless. Of course, this is a relatively rare case (but extracting the graph features as accurately as possible is of great importance for the subsequent processing).
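To make the overwritten-statement case concrete, here is a rough sketch of how such assignments could be detected. It is an illustration only: the snippet it is run on is hypothetical, the function name `overwritten_assignments` is invented for this sketch, and the check is meant for straight-line code (any non-assignment statement is conservatively treated as reading every name it mentions).

```python
import ast

def overwritten_assignments(source):
    """Return line numbers of assignments overwritten before being read."""
    pending = {}  # variable name -> line of an assignment not yet read
    dead = []
    for stmt in ast.parse(source).body:
        # For an assignment, only the right-hand side is read; for any other
        # statement, conservatively treat every name in it as a read.
        read_root = stmt.value if isinstance(stmt, ast.Assign) else stmt
        for node in ast.walk(read_root):
            if isinstance(node, ast.Name):
                pending.pop(node.id, None)  # the stored value was used
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if target.id in pending:
                        # Previous value was never read before this write.
                        dead.append(pending[target.id])
                    pending[target.id] = stmt.lineno
    return dead

# Hypothetical snippet: line 1 is overwritten by line 2 before x is read.
print(overwritten_assignments("x = a + b\nx = 0\ny = x\n"))
```

In the hypothetical snippet, the edges from "a" and "b" into "x" on line 1 are the kind of edges this comment calls meaningless, since "x" is reassigned on line 2 before anything reads it.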
The two data-dependent edges introduced by the first statement are very important. This is also one of
In Figure 2, variable alignment (dfg-to-code) considers x = 0. For data flow edge prediction (dfg-to-dfg), edge 7 and edge 9 also consider their association with edge 3 (x = 0). Would it be a better choice not to consider edge 3 (x = 0) directly?
No, I don't think so. Considering
I think such statements could perhaps be screened out manually during the preprocessing phase, so that the model focuses on the more important program statements. Additionally, do you have any suggestions for CFG construction?
Yes. You are indeed right and I totally agree. However, filtering out such meaningless code does not seem easy in the preprocessing phase. Therefore, we hope the model can learn this feature in the pre-training phase. Thank you for this great idea.
Actually, I am an NLP researcher and I don't know much about CFG, so I don't know whether there are any tools that can do this.
Thank you for your time!
Thank you for your great work! The code is very clear and concise to read.
I would like to ask about the logic behind each function in DFG.py. I would really like to implement CFG construction using it as a reference, because I think there are times when a CFG might also be useful for understanding the code.