Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotator for local variables and function paramters #128

Merged
merged 10 commits into from Aug 2, 2020
Merged

Annotator for local variables and function paramters #128

merged 10 commits into from Aug 2, 2020

Conversation

NirmalManoj
Copy link
Member

@NirmalManoj NirmalManoj commented Jul 30, 2020

Detailed description

This PR implements an annotator for annotating local variables and function parameters. The annotator uses the color attribute in the XML for classifying variables as local variables and function parameters.

...

Test plan

  1. Fetch PR #17375 from radare2 and compile it.
  2. Load any function (test functions with local variables and functions without local variables).
  3. Pipe the output of pdgj~{} to a file and open it in an editor of your choice.
  4. Because of the order of annotations, syntax highlight annotator with come right after local variable anntoation.
  5. Search for "syntax_highlight": "function_parameter" and "syntax_highlight": "local_variable". Make sure that you are seeing the exact local variable that the syntax highlight points in the annotation above it. For example, it should like this.
    {
      "start": 551,
      "end": 558,
      "type": "function_parameter",
      "name": "arg_10h"
    },
    {
      "start": 551,
      "end": 558,
      "type": "syntax_highlight",
      "syntax_highlight": "function_parameter"
    }, 
  1. Then search for "type": "function_parameter" and "type": "local_variable" and make sure that you can see the corresponding syntax highlight annotation right after that.

...

Closing issues

...

@NirmalManoj NirmalManoj changed the base branch from master to decompiler-refactoring Jul 30, 2020
@NirmalManoj NirmalManoj self-assigned this Jul 30, 2020
@NirmalManoj NirmalManoj changed the title Local variables Annotator for local variables and function paramters Jul 30, 2020
if (color == "param")
annotation.type = R_CODE_ANNOTATION_TYPE_FUNCTION_PARAMETER;
else if (color == "var")
annotation.type = R_CODE_ANNOTATION_TYPE_LOCAL_VARIABLE;
out->push_back(annotation);
Copy link
Member

@karliss karliss Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not make decisions based on color.

Copy link
Member Author

@NirmalManoj NirmalManoj Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karliss @thestr4ng3r I had seen the decision that leads to color. For PR #124 , I have used info from here. https://github.com/thestr4ng3r/ghidra/blob/6c10f36f06468f866188cccf960c019779fb9028/Ghidra/Features/Decompiler/src/decompile/cpp/variable.cc#L452-L470

In it (variable.cc), the same logic (if I didn't miss out anything) is used to classify local variables and parameters. This is the reason why I used color to classify a variable as a parameter and local variable. Have you seen examples where color is mistakenly annotated?

Copy link
Member Author

@NirmalManoj NirmalManoj Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that in some cases, the XML has just the color of the variable e.g. <variable color="param">param_1</variable>
In other cases where variable has an attribute varref, it is first checked to make sure that it's not a global variable or a constant variable.

Copy link
Member Author

@NirmalManoj NirmalManoj Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier, I forgot to add the following check, I have added it now.

	else if (!varnode->getHigh()->isPersist() && (varnode->getHigh()->getSymbol() != (Symbol *)0))
		AnnotateLocalVariable(node, out);

Copy link
Member

@thestr4ng3r thestr4ng3r Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know the place in printc.cc where these cases without varref like <variable color="param">param_1</variable> are emitted?

Copy link
Member Author

@NirmalManoj NirmalManoj Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestr4ng3r Do you agree that the current code will work in cases where we have varref? From what I understand, in cases where we have varref, we are doing exactly what's done in variable.cc(#L452-L470) when we have the check

	else if (!varnode->getHigh()->isPersist() && (varnode->getHigh()->getSymbol() != (Symbol *)0))
		AnnotateLocalVariable(node, out);

Copy link
Member

@thestr4ng3r thestr4ng3r Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yes it would work but using color is still conceptually wrong. Moreover it currently uses the printed text node.child_value() as the variable name which is only marginally better than just "grepping".
What you do get however in the case of the declaration is the following:

<vardecl symref="0x400001000000000d">
        <type color="type" id="0x80feab8523c497cf">void</type>
        <syntax></syntax>
        <op>*</op>
        <syntax></syntax>
        <variable color="var">pvVar2</variable>
</vardecl>

So you can get the true info about the symbol itself from the parent's symref:

<symbol name="pvVar2" id="0x400001000000000d" cat="-1">
...

which also has the cat field indicating whether it is a parameter or regular variable.

Copy link
Member Author

@NirmalManoj NirmalManoj Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this is really helpful. Here's what I am planning to do now.
Make a Symbol_ID -> Symbol map.

  1. If varref exists, get Symbol from the varnode by varnode->getHigh()->getSymbol().
  2. Otherwise, get symbol from the map.

In order to get the symref, I will do node.parent() which will give the vardecl node and then get the attribute symref.

Copy link
Member

@thestr4ng3r thestr4ng3r Aug 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that approach sounds good!

Copy link
Member Author

@NirmalManoj NirmalManoj Aug 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestr4ng3r @karliss I have pushed changes now.

Copy link
Member

@thestr4ng3r thestr4ng3r left a comment

Also second the request not to decide on color.

src/CodeXMLParse.cpp Outdated Show resolved Hide resolved
@thestr4ng3r
Copy link
Member

thestr4ng3r commented Jul 30, 2020

@NirmalManoj NirmalManoj requested review from thestr4ng3r and karliss Aug 2, 2020
Copy link
Member

@thestr4ng3r thestr4ng3r left a comment

Please fix the tests and add one where a "local_variable" can be seen in the json. Then looks good to me.

@NirmalManoj
Copy link
Member Author

NirmalManoj commented Aug 2, 2020

@thestr4ng3r I have added tests.

karliss
karliss approved these changes Aug 2, 2020
@thestr4ng3r thestr4ng3r merged commit 6104ebf into rizinorg:decompiler-refactoring Aug 2, 2020
3 checks passed
@NirmalManoj NirmalManoj moved this from In progress to Done in Improving Decompiler Widget (GSoC) Aug 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants