Confusion in inst2vec_preprocess.py when reading code #28

Closed
island255 opened this issue Aug 31, 2020 · 6 comments

Comments

@island255
Contributor

While reading inst2vec_preprocess.py, I found that the assertion on line 865,
`assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + func_name`
may need to be changed to
`assert check is None`.
But I'm confused and don't know whether I should change it.


```python
import re

# `rgx` (the module's regex helpers) and `find_outer_most_last_parenthesis`
# are defined elsewhere in the inst2vec code base.


def get_num_args_func(line, func_name=None):
    """
    Get the number of arguments in a line containing a function
    :param line: LLVM IR line
    :param func_name: function name
    :return num_args: number of arguments
            arg_list: list of arguments
    """
    modif_line = re.sub(r'<[^<>]+>', '', line)  # commas in vectors/arrays should not be counted as argument-separators
    arg_list_ = find_outer_most_last_parenthesis(modif_line)  # get last parenthesis
    if arg_list_ is None:
        # Make sure that this is the case because the function has no arguments
        # and not because there was an error in regex matching
        check = re.match(rgx.func_call_pattern + r'\(\)', modif_line)
        # Line 865 of inst2vec_preprocess.py: the assertion this issue asks about.
        # str() avoids a TypeError when func_name is left at its default of None.
        assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + str(func_name)
        num_args = 0
        arg_list = ''
    elif arg_list_ == '()':
        # Make sure that this is the case because the function has no arguments
        # and not because there was an error in regex matching
        check = re.match(rgx.func_call_pattern + r'\(\)', modif_line)
        if check is None:
            check = re.search(r' asm (?:sideeffect )?(\".*\")\(\)', modif_line)
        if check is None:
            check = re.search(rgx.local_id + r'\(\)', modif_line)
        if check is None:
            okay = line[-2:] == '()'
            if not okay:
                check = None
            else:
                check = True
        assert check is not None, "Could not match argument list in:\n" + line + "\nFunction:\n" + str(func_name)
        num_args = 0
        arg_list = ''
    else:
        arg_list = arg_list_[1:-1]
        arg_list = re.sub(r'<[^<>]+>', '', arg_list)
        # Strip (possibly nested) parenthesized groups, innermost first, four levels deep
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\([^\(\)]+\)', '', arg_list_modif)
        arg_list_modif = re.sub(r'\"[^\"]*\"', '', arg_list_modif)  # string literals
        arg_list_modif = re.sub(r'{.*}', '', arg_list_modif)  # brace-enclosed aggregates
        num_args = len(re.findall(',', arg_list_modif)) + 1

    return num_args, arg_list
```
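The `else` branch counts arguments by deleting every construct that can hide a comma (vector types, nested parentheses, strings, braces) and then counting the commas that remain. A minimal standalone sketch of that idea follows. `count_args` is a hypothetical name that takes the text between the call's outer parentheses, and instead of applying the parenthesis-stripping regex exactly four times as the original does, it repeats it until nothing changes (a small swap that also covers deeper nesting).

```python
import re


def count_args(arg_list):
    """Count the comma-separated arguments in the text between a call's outer parentheses."""
    if not arg_list.strip():
        return 0  # an empty list "()" has no arguments
    # Drop vector/array types and constants such as <2 x i32> or <i32 1, i32 2>
    modif = re.sub(r'<[^<>]+>', '', arg_list)
    # Strip parenthesized groups innermost-first until none are left, so commas
    # inside constant expressions like getelementptr (...) are not counted
    while True:
        stripped = re.sub(r'\([^()]+\)', '', modif)
        if stripped == modif:
            break
        modif = stripped
    # Drop string literals and brace-enclosed aggregates, which may contain commas
    modif = re.sub(r'"[^"]*"', '', modif)
    modif = re.sub(r'\{.*\}', '', modif)
    return modif.count(',') + 1


print(count_args("i32 %a, <2 x i32> <i32 1, i32 2>, i8* getelementptr (i8, i8* @s, i64 0)"))  # 3
```

The nested case `i32 (i32, i32)* bitcast (i64 (i64, i64)* @g to i32 (i32, i32)*)` needs two stripping passes before no commas remain, which is why the original repeats the substitution.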
@tbennun
Collaborator

tbennun commented Aug 31, 2020

The assertion is correct. The code tries several regular expressions in turn, and if one does not match, it moves on to the next. If the last check also fails, then check is None and the assertion should fail. In every other case, one of the checks succeeded and execution should continue.

Why do you think it should be the opposite?
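The logic described above for the `elif arg_list_ == '()'` branch is a fallback chain: each pattern is tried only if the previous one returned None, and the assertion fires only when all of them failed. A minimal sketch of that control flow, assuming hypothetical, simplified stand-ins `FUNC_CALL_PATTERN` and `LOCAL_ID` for `rgx.func_call_pattern` and `rgx.local_id` (the real patterns are more detailed), and applying the final `[-2:] == '()'` test to the same string instead of the unmodified line:

```python
import re

# Hypothetical, simplified stand-ins for rgx.func_call_pattern and rgx.local_id
FUNC_CALL_PATTERN = r'.*call [^@%]*@[-\w.$]+'
LOCAL_ID = r'%[-\w.$]+'


def match_empty_arg_list(line):
    """Try each way of recognising an empty argument list "()", in order."""
    attempts = [
        lambda s: re.match(FUNC_CALL_PATTERN + r'\(\)', s),           # direct call: @f()
        lambda s: re.search(r' asm (?:sideeffect )?(".*")\(\)', s),  # inline assembly
        lambda s: re.search(LOCAL_ID + r'\(\)', s),                  # indirect call: %fp()
        lambda s: True if s[-2:] == '()' else None,                  # line simply ends in ()
    ]
    check = None
    for attempt in attempts:
        check = attempt(line)
        if check is not None:  # the first success ends the chain
            break
    # check is still None here only if every attempt failed
    assert check is not None, "Could not match argument list in:\n" + line
    return check
```

Reversing the assertion to `check is None` would make it fail for exactly the lines that one of these patterns recognises.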

@island255
Copy link
Contributor Author

I think that if execution reaches the `if arg_list_ is None:` branch, check must be None (since no "()" was found), so `check is not None` is False and the assertion fails.

But then the assertion in this branch will always fail. Does that mean this branch should simply never be executed?
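This reasoning can be checked with a small experiment. The sketch below uses a hypothetical re-implementation of `find_outer_most_last_parenthesis` (the real helper lives in the inst2vec code and may differ in details) that returns the last outermost parenthesized group, or None. For a line with balanced parentheses it returns None only when the line has no ")" at all, so a pattern ending in `\(\)` cannot match and check is None. The `\(\)` search is a simplified stand-in for the `rgx.func_call_pattern` check.

```python
import re


def find_outer_most_last_parenthesis(line):
    """Hypothetical stand-in: return the last outermost "(...)" group in line, or None."""
    end = line.rfind(')')
    if end == -1:
        return None  # no closing parenthesis anywhere
    depth = 0
    for i in range(end, -1, -1):  # walk left from the last ')'
        if line[i] == ')':
            depth += 1
        elif line[i] == '(':
            depth -= 1
            if depth == 0:
                return line[i:end + 1]
    return None  # unbalanced parentheses


def none_branch_outcome(line):
    """Return (branch_taken, check) for the `if arg_list_ is None:` branch."""
    if find_outer_most_last_parenthesis(line) is not None:
        return False, None
    check = re.search(r'\(\)', line)  # simplified stand-in for the func_call_pattern check
    return True, check


print(none_branch_outcome("ret void"))  # (True, None): the branch is taken and the assertion would fail
```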

@island255
Contributor Author

Or is it debug code left over from testing the function find_outer_most_last_parenthesis? If so, I understand it and the comments there. When I first saw it, I was confused and didn't realize what this code was for.

@tbennun
Collaborator

tbennun commented Sep 1, 2020

Yes, the assertion and regular expression in that branch are a sanity check meant to make sure that find_outer_most_last_parenthesis didn't miss an empty argument list somewhere.
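To see what this sanity check protects against, here is a sketch with a deliberately faulty, hypothetical helper, `buggy_last_parenthesis`, that only finds non-empty groups. With it, the `arg_list_ is None` branch is reached even though the line contains "()"; the regex then matches and the function safely reports zero arguments. For any other line the assertion stops execution instead of silently returning a wrong count. The regex is a simplified stand-in for `rgx.func_call_pattern + r'\(\)'`, and `num_args_none_branch` is a hypothetical name.

```python
import re


def buggy_last_parenthesis(line):
    """Hypothetical faulty helper: only finds non-empty '(...)' groups, so it misses '()'."""
    groups = re.findall(r'\([^()]+\)', line)
    return groups[-1] if groups else None


def num_args_none_branch(line, find_parens):
    """Mimic the `arg_list_ is None` branch of get_num_args_func with a pluggable helper."""
    if find_parens(line) is not None:
        raise ValueError("branch not taken in this sketch")
    # Simplified stand-in for re.match(rgx.func_call_pattern + r'\(\)', modif_line)
    check = re.match(r'.*@[-\w.$]+\(\)', line)
    assert check is not None, "Could not match argument list in:\n" + line
    return 0  # the helper only missed an empty argument list, so 0 arguments is correct
```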

@island255
Contributor Author

Thanks for your reply, and thanks for your outstanding work! I have really learned a lot from it.

@tbennun tbennun closed this as completed Sep 2, 2020
@tbennun
Collaborator

tbennun commented Sep 2, 2020

Happy to help :)


2 participants