parsing(c_parser): added full support for Binary Operators and any type of complicated Assignment in C parser #19029

Merged
merged 1 commit into from Apr 16, 2020

smitgajjar
Contributor

@smitgajjar smitgajjar commented Mar 30, 2020

References to other Issues or PRs

Brief description of what is fixed or changed

  • Added support for parsing binary operators +, -, *, /, %, ==, !=, <, <=, >, >=, && and ||
  • Added support for parsing assignment(=) from one variable to another
  • Added support for assignment of integer as well as floating point literal to a variable
  • Added support for an rhs expression comprising of any combination of the above operators
  • Any form of parenthesised rhs expression will also be parsed successfully
  • Added support for all forms of corresponding variable declaration as well as assignment
  • Added support for recognizing boolean literals(true and false) in variable declaration as well as assignment
  • Added support for boolean data type
  • Corresponding tests are added

Other comments

The support for bitwise operators is left to be added.

Release Notes

  • parsing
    • added support for parsing binary operators +, -, *, /, %, =, ==, !=, <, <=, >, >=, && and || in C parser
    • added support for parsing variable declarations and assignments, where one variable or a literal or any combination of them using binary operators is assigned to another variable in C parser
    • added support for variable declaration and assignment of boolean literal (true and false) as well as declaration of boolean data type in C parser

@sympy-bot

sympy-bot commented Mar 30, 2020

Hi, I am the SymPy bot (v158). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • parsing
    • added support for parsing binary operators +, -, *, /, %, =, ==, !=, <, <=, >, >=, && and || in C parser (#19029 by @smitgajjar)

    • added support for parsing variable declarations and assignments, where one variable or a literal or any combination of them using binary operators is assigned to another variable in C parser (#19029 by @smitgajjar)

    • added support for variable declaration and assignment of boolean literal (true and false) as well as declaration of boolean data type in C parser (#19029 by @smitgajjar)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.6.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Update

The release notes on the wiki have been updated.

@smitgajjar smitgajjar force-pushed the binary_op branch 3 times, most recently from 11f23b2 to 7051814 Compare March 30, 2020 20:18
@codecov

codecov bot commented Mar 30, 2020

Codecov Report

Merging #19029 into master will decrease coverage by 0.100%.
The diff coverage is 0.000%.

@@              Coverage Diff              @@
##            master    #19029       +/-   ##
=============================================
- Coverage   75.784%   75.684%   -0.101%     
=============================================
  Files          647       650        +3     
  Lines       168657    169128      +471     
  Branches     39745     39891      +146     
=============================================
+ Hits        127816    128003      +187     
- Misses       35279     35529      +250     
- Partials      5562      5596       +34     

@smitgajjar
Contributor Author

smitgajjar commented Mar 31, 2020

@Sc0rpi0n101 @certik Please review

@oscarbenjamin
Contributor

Just a passing comment but why does sympy have its own code for parsing C when there are other libraries for this sort of thing?

The code here seems to be using the shunting yard algorithm but I'm sure there are better ways to implement parsers in general. For example a quick google shows:
https://pypi.org/project/pycparser/
which has the description

pycparser is a complete parser of the C language, written in pure Python using the
PLY parsing library. It parses C code into an AST and can serve as a front-end for
C compilers or analysis tools.
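
For readers unfamiliar with it, the shunting-yard algorithm mentioned above turns a flat list of infix tokens into postfix order using an operator stack. The following is only a minimal sketch of that idea, not the code in this PR: the `PRECEDENCE` table and the `to_postfix` name are illustrative assumptions, and only left-associative binary operators and parentheses are handled.

```python
# Precedence for a small subset of C binary operators (higher binds tighter).
# Illustrative table, not taken from the PR.
PRECEDENCE = {
    '||': 1, '&&': 2,
    '==': 3, '!=': 3,
    '<': 4, '<=': 4, '>': 4, '>=': 4,
    '+': 5, '-': 5,
    '*': 6, '/': 6, '%': 6,
}


def to_postfix(tokens):
    """Convert a list of infix token spellings to postfix (RPN) order."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Every operator here is left-associative, so pop while the
            # operator on the stack binds at least as tightly.
            while (stack and stack[-1] != '('
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            # Flush operators back to the matching '('.
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()  # discard the '(' itself
        else:
            output.append(tok)  # operand: identifier or literal
    while stack:
        output.append(stack.pop())
    return output


print(to_postfix(['1', '*', '2', '+', '3']))
```

A postfix list like this can then be folded into an expression tree with a second stack, which is roughly what any token-based approach has to do by hand when the AST is not used.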

@smitgajjar
Contributor Author

@oscarbenjamin I understand what you are trying to say. In fact, an issue #18968 has been opened for the same. Also, a discussion has been carried out here, where you can find the reason behind this!

Do comment with whatever seems more appropriate from your point of view.

Member

@Sc0rpi0n101 Sc0rpi0n101 left a comment

I think it might be a better idea to store the literal nodes in the combined_variable stack instead of a list of their spelling and type. The nodes can be used to extract the type and spelling later.

@Sc0rpi0n101
Member

Just a passing comment but why does sympy have its own code for parsing C when there are other libraries for this sort of thing?

Other libraries such as pycparser can parse source code and generate ASTs for their respective languages; for C, we use Clang for that.

But we have to build SymPy's AST from the C and other language's AST ourselves, which is what this parser is for.

@Sc0rpi0n101
Member

Sc0rpi0n101 commented Apr 2, 2020

Although it is a good workaround with tokens to find the operators, due to the current state of the python bindings, I don't like using Tokens for the rest of the expression when we have children nodes with all the required data.

We use external libraries like Clang so that we don't have to do the hard parsing work: the external library provides the AST, and we just walk that AST, extract the information, copy the nodes into our own AST, and are done.

This is the Clang AST for var1 = ((bad + cat) - (( a - b) * c)) * d ;

[image: Clang AST for the statement above]

It contains all the information we need. We should be taking the info from the Child nodes and creating our AST, similar to what's been done in the Fortran parser

def visit_BinOp(self, node):

What's the point of having the Clang AST if we have to manually tokenize the expression and build the expressions from those tokens?

@smitgajjar
Contributor Author

smitgajjar commented Apr 2, 2020

Yes, I understand that pretty well. Clang becomes useless if we use tokens and manually code everything! I was first surprised why I could not obtain binary operators, so I used clang -cc1 -ast-dump file.cpp to obtain the same AST. That's how I found that only python bindings do not support it. Then, google confirmed it! And that's why I had to use this scratch level parsing for binary operators.

So, are you suggesting to obtain something from the AST in this PR? Please excuse me if I got it wrong.

@smitgajjar
Contributor Author

test_integrals failed, how is it even possible?

@Sc0rpi0n101
Member

So, are you suggesting to obtain something from the AST in this PR? Please excuse me if I got it wrong.

What I meant to say was that you should only be using tokens to get the operation, as the bindings do not provide an interface to do that at the moment.

But everything else should still be done using the AST Nodes.

Also, I found this patch to Clang for binary operations support, but I don't think it's been merged yet.
https://reviews.llvm.org/D10833?id=39158

@certik
Member

certik commented Apr 2, 2020

@oscarbenjamin good question, and I commented here: #18968 (comment). I agree SymPy should not have a C parser and we do not have it. We use Clang to do the parsing, this module is simply to convert from Clang into SymPy. Unfortunately, the Python wrappers in Clang are not as useful and we have to do some workarounds. As I mentioned in the other issue, we should stick with Clang. It's a production compiler that works with C and C++, and the only issue with it seems to be that the Python wrappers are not as developed. But that we can fix --- we can simply submit PRs to Clang to improve the wrappers, or alternatively create a separate project that provides usable Python wrappers to Clang that we can use.

In the meantime, this PR seems fine to me.

@smitgajjar
Contributor Author

smitgajjar commented Apr 2, 2020

@Sc0rpi0n101 I understand that I should have used tokens only for determining the type of binary operators, but the fact is, if you cannot determine the type of binary operator, you cannot even determine which binary operator we have to consider in a series of tokens.

Let me explain with an example:
a = 1 * 2 + 3;
Now, tree similar to the following will be obtained:

1 BINARY_OPERATOR // for =
2 -DECL_REF_EXPR // for a
3 -BINARY_OPERATOR  // for +
4 --BINARY_OPERATOR // for *
5 ---INTEGER_LITERAL // for '1'
6 ---INTEGER_LITERAL // for '2'
7 --INTEGER_LITERAL // for '3'

Now, if you will extract tokens from node obtained from line 3, 1 * 2 + 3 will be obtained.
As we know, the type of binary operator cannot be determined.
Indeed, it is supposed to be '+' and we will definitely obtain it if we are not working with python bindings, but since we don't know where to find it in the set of tokens(i.e explicit index of '+' token, which is 3), we won't be able to determine anything further!

Hope, you understood!
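
To make the index problem above concrete: one possible way to locate the top-level operator is to count the tokens of the left operand and read the token that follows. This is a hedged sketch rather than what the PR does; `find_operator` is a hypothetical helper, it works on plain lists of token spellings, and it assumes the left operand's tokens form an exact prefix of the parent expression's tokens. With the Python bindings, such lists would presumably come from `Cursor.get_tokens()` and `Token.spelling`.

```python
def find_operator(parent_tokens, left_child_tokens):
    """Return the spelling of the operator joining the two operands.

    Assumes the left operand's tokens are an exact prefix of the parent's
    tokens, so the operator is the first token after that prefix.
    """
    n = len(left_child_tokens)
    if parent_tokens[:n] != left_child_tokens:
        raise ValueError("left operand is not a prefix of the parent expression")
    return parent_tokens[n]


# Node on line 3 of the tree above spans `1 * 2 + 3`; its left child
# (line 4) spans `1 * 2`, so the operator sits at index 3.
print(find_operator(['1', '*', '2', '+', '3'], ['1', '*', '2']))
```

The same lookup applied to the node on line 4 (left child `1`) yields `*`, so the tree can still drive the traversal while tokens are consulted only for the operator spelling.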

@smitgajjar
Contributor Author

ping @Sc0rpi0n101 @certik

@certik
Member

certik commented Apr 4, 2020 via email

@smitgajjar
Contributor Author

As I said, it's fine for now, but in the long run we have to fix Clang's Python wrappers to give us the information we need. Do you agree?

I agree, since we should focus more on conversion to codegen AST rather than this, it would be better if we can fix Python wrappers!

Member

@Sc0rpi0n101 Sc0rpi0n101 left a comment

If we plan to update it later, LGTM

But I have one question. What if the input is something like int a = b + c * 4; or something like that, how will that be handled?
Here, token[1] will not be =, since it's a VarDecl and not a BinOp.

@smitgajjar
Contributor Author

That's a good question. I am working on it now and will try to make changes to accommodate that!

@smitgajjar
Contributor Author

But I have one question. What if the input is something like int a = b + c * 4; or something like that, how will that be handled?

This has been handled properly, please review
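
The thread does not show how the declaration case was handled, so the following is only an illustration of the concern raised above: in a declaration such as `int a = b + c * 4;` the `=` is no longer the second token, so its position has to be searched for. `rhs_tokens` is a hypothetical helper operating on plain token spellings.

```python
def rhs_tokens(tokens):
    """Return the tokens between the first '=' and a trailing ';'.

    Returns an empty list for a plain declaration such as `int a;`.
    The comparison operator '==' is a separate token and is not matched.
    """
    if '=' not in tokens:
        return []
    start = tokens.index('=') + 1
    end = len(tokens) - 1 if tokens[-1] == ';' else len(tokens)
    return tokens[start:end]


# Declaration: '=' is at index 2, not 1.
print(rhs_tokens(['int', 'a', '=', 'b', '+', 'c', '*', '4', ';']))
# Plain assignment: '=' is at index 1.
print(rhs_tokens(['a', '=', 'b', '+', 'c', '*', '4', ';']))
```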

'{' + '\n' +
'int b;' + '\n' +
'int c;' + '\n' +
'int a = b + c*4;' + '\n' +
Member

Can you add a couple more examples of these VarDecl statements

Member

Also, it might be a good idea to add these to test_var_decl instead of this one

Contributor Author

Okay, but there is no function called test_var_decl, am I supposed to create a new one?

Member

Sure

Contributor Author

Okay...

Member

@Sc0rpi0n101 Sc0rpi0n101 left a comment

I've left a few comments, otherwise looks good.

elif (node.type.kind == cin.TypeKind.FLOAT):
type = FloatBaseType(String('real'))
value = Float(val)
# when only one unexposed_expr is assigned
Member

You don't need to repeat the comments.
Once should be enough
You can just leave the eg if you want

Contributor Author

Ok

type = type,
value = value
)
raise NotImplementedError("Only int" \
Member

Suggested change
raise NotImplementedError("Only int" \
raise NotImplementedError("Only int " \

elif (node.type.kind == cin.TypeKind.FLOAT):
type = FloatBaseType(String('real'))
else:
raise NotImplementedError("Only int" \
Member

Suggested change
raise NotImplementedError("Only int" \
raise NotImplementedError("Only int " \

type = type,
value = value
)
raise NotImplementedError("Only int" \
Member

Suggested change
raise NotImplementedError("Only int" \
raise NotImplementedError("Only int " \

@smitgajjar
Contributor Author

smitgajjar commented Apr 6, 2020

Why does make html of sphinx build fail in travis tests? Someone who is familiar with it, please go through it. This happened for the fourth time, I guess.

I think this is related to issue #19084. It has been fixed.

@smitgajjar
Contributor Author

ping @Sc0rpi0n101

@smitgajjar
Contributor Author

smitgajjar commented Apr 8, 2020

I think, all binary operators except relational, shift, logical and bitwise operators have been covered.

I have a query. I was trying to implement relational operators. I found these discrepancies for C and C++ files (though it was expected in a way since there is no boolean data type in C)

Example 1:

Content of template.cpp and template.c:

bool a=1==2;

Tree of template.cpp:

-TRANSLATION_UNIT template.cpp TypeKind.INVALID
--VAR_DECL a TypeKind.BOOL
---BINARY_OPERATOR  TypeKind.BOOL
----INTEGER_LITERAL  TypeKind.INT
----INTEGER_LITERAL  TypeKind.INT

Tree of template.c (or template.h):

-TRANSLATION_UNIT template.c TypeKind.INVALID
--VAR_DECL a TypeKind.INT

Example 2:

Content of template.cpp and template.c:

bool a=1==2;
void func() {
    bool a=1==2;
}

Tree of template.cpp:

-TRANSLATION_UNIT template.cpp TypeKind.INVALID
--VAR_DECL a TypeKind.BOOL
---BINARY_OPERATOR  TypeKind.BOOL
----INTEGER_LITERAL  TypeKind.INT
----INTEGER_LITERAL  TypeKind.INT
--FUNCTION_DECL func TypeKind.FUNCTIONPROTO
---COMPOUND_STMT  TypeKind.INVALID
----DECL_STMT  TypeKind.INVALID
-----VAR_DECL a TypeKind.BOOL
------BINARY_OPERATOR  TypeKind.BOOL
-------INTEGER_LITERAL  TypeKind.INT
-------INTEGER_LITERAL  TypeKind.INT

Tree of template.c (or template.h):

-TRANSLATION_UNIT template.c TypeKind.INVALID
--VAR_DECL a TypeKind.INT
--FUNCTION_DECL func TypeKind.FUNCTIONNOPROTO
---COMPOUND_STMT  TypeKind.INVALID

Considering these, how should I implement relational operators? If we will consider C++ too, we have to change temp file extension to .cpp.

I believe, if we are trying to implement only C here, we should keep .h, but then we will not be able to obtain its children as seen in the above egs.

All suggestions regarding this are most welcome! :)

@Sc0rpi0n101
Member

It's showing invalid in C as boolean is not a primary data type in C. It is only available from C99 onward: you have to either include stdbool.h or refer to it as _Bool to use it. But it's a primary data type in C++. If you declare the variables as any other type, it works just fine.

If we will consider C++ too, we have to change temp file extension to .cpp.

If you want to test features that are only supported in C++, you should use .cpp for those files.

I believe, if we are trying to implement only C here, we should keep .h, but then we will not be able to obtain its children as seen in the above egs.

Actually, we do plan to support both C and C++. So, there are no limitations like that.

Also, should we be doing div by 0 checks for div and mod operations?

@smitgajjar
Contributor Author

smitgajjar commented Apr 8, 2020

It's showing invalid in C as boolean is not a primary data type in C. It is only available from C99 onward: you have to either include stdbool.h or refer to it as _Bool to use it. But it's a primary data type in C++. If you declare the variables as any other type, it works just fine.

Just tried it. After importing this header, the tree remains the same. But... _Bool worked! Thanks for this suggestion. So, should I use _Bool(without changing temp file to .cpp) or bool(after changing temp file to .cpp) in tests?

If we will consider C++ too, we have to change temp file extension to .cpp.

If you want to test features that are only supported in C++, you should use .cpp for those files.

Actually, I was talking about parse_str() function, which creates a temporary .h file. I want to change that to .cpp

Also, should we be doing div by 0 checks for div and mod operations?

No, because I guess we are only parsing C to expr. We should rely on SymPy for further checks, e.g. Pow(0, -1) will return zoo :)

@smitgajjar
Contributor Author

smitgajjar commented Apr 8, 2020

One more discrepancy found:

Here, .cpp tree seems to check whether the function definition exists or not. If not, the function call is not recognized, which is good in fact. Also, function cannot be called from outside a function, i.e. from global scope, hence .cpp tree seems to be better here as well since it doesn't recognize that function call too...

Whereas, .h tree doesn't check these two necessary conditions!

But both give a good (and identical) resulting tree if the function is called properly (i.e. if it is not called from global scope and the correct function definition exists)!

Looking at these circumstances, we have to update function call parsing too! (I found this when I ran tests for parsing relational operators, but unfortunately, the test for the function call failed)

@smitgajjar
Contributor Author

ping @Sc0rpi0n101

@smitgajjar
Contributor Author

smitgajjar commented Apr 10, 2020

IMO, if everything looks good, it is ready to be merged now. Bitwise operators are the only binary operators, which are left, but I am unable to find any corresponding SymPy class in core. If you find any, do suggest... :)

if combined_variable[1] == 'expr':
return combined_variable[0]
if combined_variable[1] == 'boolean':
return true if combined_variable[0] == 'true' else false
Member

wouldn't return combined_variable[0] work?

Contributor Author

No, because 'true' is a string and true is boolean true of sympy (S.true). Similar will be the case for false.

elif (token.kind == cin.TokenKind.KEYWORD
and token.spelling in ['true', 'false']):
combined_variables_stack.append(
[token.spelling, 'boolean'])
Member

Shouldn't there be an else condition too with NotImplementedError maybe?

Contributor Author

Absolutely!

@@ -150,7 +153,7 @@ def parse_str(self, source, flags):
A list of sympy AST nodes

"""
- file = tempfile.NamedTemporaryFile(mode = 'w+', suffix = '.h')
+ file = tempfile.NamedTemporaryFile(mode = 'w+', suffix = '.cpp')
Member

How about creating two functions parse_c_str and parse_cpp_str, one with a .c extension and the other with a .cpp extension?
But that can be done better in another PR, as we can also change the API a little accordingly.

Contributor Author

Sure that will be better. Will discuss it in a new PR...
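
As a rough illustration of the split suggested above (deferred to a later PR, so nothing here reflects an actual SymPy API): the only difference between the two entry points would be the suffix of the temporary file, which the trees earlier in this thread suggest is what decides whether the source is treated as C or C++. The helper names below are hypothetical, and the sketch stops at writing the file and does not call Clang.

```python
import os
import tempfile


def write_temp_source(source, suffix):
    """Write `source` to a named temporary file ending in `suffix`.

    Returns the file path; the caller is responsible for deleting it.
    """
    with tempfile.NamedTemporaryFile(mode='w', suffix=suffix,
                                     delete=False) as f:
        f.write(source)
        return f.name


def temp_c_source(source):
    # Hypothetical counterpart of a C-only entry point.
    return write_temp_source(source, '.c')


def temp_cpp_source(source):
    # Hypothetical counterpart of a C++ entry point.
    return write_temp_source(source, '.cpp')


path = temp_cpp_source('bool a = 1 == 2;\n')
print(os.path.splitext(path)[1])
os.remove(path)
```

Keeping the suffix as the single varying parameter would let both entry points share the rest of the parsing code.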

@Sc0rpi0n101
Member

IMO, if everything looks good, it is ready to be merged now. Bitwise operators are the only binary operators, which are left, but I am unable to find any corresponding SymPy class in core. If you find any, do suggest... :)

Yeah, sure
After this is finalized, you can rebase it.

Other improvements can be done in another pull request.

@smitgajjar
Contributor Author

Please review soon if possible so that I can start implementing unary operators(because that will be somewhat similar) in next PR! Thanks :)

@certik
Member

certik commented Apr 15, 2020

@Sc0rpi0n101 do you have any further comments?

I noticed the tests are getting really large. I wonder if there is something that can be done with it. We can do that later also after this is merged. I think it would make sense to finalize this PR, merge it, and go from there.

Member

@Sc0rpi0n101 Sc0rpi0n101 left a comment

LGTM

I think a few things can be done to reduce the redundancy in the tests. But that will be better done in another PR. It would be better now for @smitgajjar to rebase it and we can merge this.

@smitgajjar
Contributor Author

smitgajjar commented Apr 15, 2020

I noticed the tests are getting really large. I wonder if there is something that can be done with it.

Agreed, those will grow larger as we proceed. IMO we have to refactor them and move each test according to the parsing functionality provided by each.

For instance, test_c_var_decl.py for variable declarations, test_c_function_call.py for function calls, etc.

It would be better now for @smitgajjar to rebase it and we can merge this.

Alright!

@Sc0rpi0n101
Member

The commit message would look much better if you can do it like the current PR description with bullet points for the changes.

- Added support for parsing binary operators `+`, `-`, `*`, `/`, `%`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `&&` and `||`
- Added support for parsing assignment(`=`) from one variable to another
- Added support for assignment of integer as well as floating point literal to a variable
- Added support for an rhs expression comprising of any combination of the above operators
- Any form of parenthesised rhs expression will also be parsed successfully
- Added support for all forms of corresponding variable declaration as well as assignment
- Added support for recognizing boolean literals(`true` and `false`) in variable declaration as well as assignment
- Added support for boolean data type
- Corresponding tests are added
@smitgajjar
Contributor Author

ping

@Sc0rpi0n101
Member

Looks good. Merging it now.

Thank you for your contribution.

@Sc0rpi0n101 Sc0rpi0n101 merged commit 5db74b0 into sympy:master Apr 16, 2020
@smitgajjar smitgajjar deleted the binary_op branch April 16, 2020 16:58