New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsing(c_parser): added full support for Binary Operators and any type of complicated Assignment in C parser #19029
Conversation
✅ Hi, I am the SymPy bot (v158). I'm here to help you write a release notes entry. Please read the guide on how to write release notes. Your release notes are in good order. Here is what the release notes will look like:
This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.6. Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it. Click here to see the pull request description that was parsed.
Update The release notes on the wiki have been updated. |
11f23b2
to
7051814
Compare
Codecov Report
@@ Coverage Diff @@
## master #19029 +/- ##
=============================================
- Coverage 75.784% 75.684% -0.101%
=============================================
Files 647 650 +3
Lines 168657 169128 +471
Branches 39745 39891 +146
=============================================
+ Hits 127816 128003 +187
- Misses 35279 35529 +250
- Partials 5562 5596 +34 |
@Sc0rpi0n101 @certik Please review |
Just a passing comment but why does sympy have its own code for parsing C when there are other libraries for this sort of thing? The code here seems to be using the shunting yard algorithm but I'm sure there are better ways to implement parsers in general. For example a quick google shows:
|
@oscarbenjamin I understand what you are trying to say. In fact, an issue #18968 has been opened for the same. Also, a discussion has been carried out here, where you can find the reason behind this! Do comment with whatever seems more appropriate from your point of view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be a better idea to store the literal nodes in the combined_variable stack instead of a list of their spelling and type. The nodes can be used to extract the type and spelling later.
The other libraries like pycparser, can parse and generate the ASTs for their respective languages, for which we are using Clang for C. But we have to build SymPy's AST from the C and other language's AST ourselves, which is what this parser is for. |
Although it is a good workaround with tokens to find the operators, due to the current state of the python bindings, I don't like using Tokens for the rest of the expression when we have children nodes with all the required data. We are using external libraries like Clang was so that we don't have to do the hard parsing work, the external libraries provide us the AST and we just go through that AST, get the info, make a copy of the nodes in our AST and be done with it. This is the Clang AST for It contains all the information we need. We should be taking the info from the Child nodes and creating our AST, similar to what's been done in the Fortran parser sympy/sympy/parsing/fortran/fortran_parser.py Line 128 in d3ae627
What's the point of having the Clang AST if we have to manually tokenize the expression and build the Expressions from those. |
Yes, I understand that pretty well. Clang becomes useless if we use tokens and manually code everything! I was first surprised why I could not obtain binary operators, so I used So, are you suggesting to obtain something from the AST in this PR? Please excuse me if I got it wrong. |
test_integrals failed, how is it even possible? |
What I meant to say was that you should only be using tokens to get the operation, as the bindings do not provide an interface to do that at the moment. But everything else should still be done using the AST Nodes. Also, I found this patch to Clang for binary operations support, but I don't think it's been merged yet. |
@oscarbenjamin good question, and I commented here: #18968 (comment). I agree SymPy should not have a C parser and we do not have it. We use Clang to do the parsing, this module is simply to convert from Clang into SymPy. Unfortunately, the Python wrappers in Clang are not as useful and we have to do some workarounds. As I mentioned in the other issue, we should stick with Clang. It's a production compiler that works with C and C++, and the only issue with it seems to be that the Python wrappers are not as developed. But that we can fix --- we can simply submit PRs to Clang to improve the wrappers, or alternatively create a separate project that provides usable Python wrappers to Clang that we can use. In the meantime, this PR seems fine to me. |
@Sc0rpi0n101 I understand that I should have used tokens only for determining the type of binary operators, but the fact is, if you cannot determine the type of binary operator, you cannot even determine which binary operator we have to consider in a series of tokens. Let me explain with an example: 1 BINARY_OPERATOR // for = 2 -DECL_REF_EXPR // for a 3 -BINARY_OPERATOR // for + 4 --BINARY_OPERATOR // for * 5 ---INTEGER_LITERAL // for '1' 6 ---INTEGER_LITERAL // for '2' 7 --INTEGER_LITERAL // for '3' Now, if you will extract tokens from node obtained from line 3, Hope, you understood! |
ping @Sc0rpi0n101 @certik |
As I said, it's fine for now, but in the long run we have to fix Clang's Python wrappers to give us the information we need. Do you agree?
…On Sat, Apr 4, 2020, at 3:33 AM, Smit Gajjar wrote:
ping @Sc0rpi0n101 <https://github.com/Sc0rpi0n101> @certik
<https://github.com/certik>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19029 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAFAWAYOMRPT5YOEOXP2CDRK35GHANCNFSM4LW32MGA>.
|
I agree, since we should focus more on conversion to codegen AST rather than this, it would be better if we can fix Python wrappers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we plan to update it later, LGTM
But I have one question. What if the input is something like int a = b + c * 4;
or something like that, how will that be handled?
since here, token[1] will not be =
as it's a VarDecl and not a Binop
That's a good question. I am working on it now and will try to make changes to accommodate that! |
This has been handled properly, please review |
sympy/parsing/tests/test_c_parser.py
Outdated
'{' + '\n' + | ||
'int b;' + '\n' + | ||
'int c;' + '\n' + | ||
'int a = b + c*4;' + '\n' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a couple more examples of these VarDecl statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, might be a good idea to add these to test_var_del
instead of this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but there is no function called test_var_decl
, am I supposed to create a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few comments, otherwise looks good.
sympy/parsing/c/c_parser.py
Outdated
elif (node.type.kind == cin.TypeKind.FLOAT): | ||
type = FloatBaseType(String('real')) | ||
value = Float(val) | ||
# when only one unexposed_expr is assigned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to repeat the comments.
Once should be enough
You can just leave the eg if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
sympy/parsing/c/c_parser.py
Outdated
type = type, | ||
value = value | ||
) | ||
raise NotImplementedError("Only int" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError("Only int" \ | |
raise NotImplementedError("Only int " \ |
sympy/parsing/c/c_parser.py
Outdated
elif (node.type.kind == cin.TypeKind.FLOAT): | ||
type = FloatBaseType(String('real')) | ||
else: | ||
raise NotImplementedError("Only int" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError("Only int" \ | |
raise NotImplementedError("Only int " \ |
sympy/parsing/c/c_parser.py
Outdated
type = type, | ||
value = value | ||
) | ||
raise NotImplementedError("Only int" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError("Only int" \ | |
raise NotImplementedError("Only int " \ |
|
ping @Sc0rpi0n101 |
I think, all binary operators except relational, shift, logical and bitwise operators have been covered. I have a query. I was trying to implement relational operators. I found these discrepancies for C and C++ files (though it was expected in a way since there is no boolean data type in C) Example 1:Content of
|
It's showing invalid in C as boolean is not a primary data type in C. It is only available after C99. You have to either import
If you want to test features that are only supported in C++, you should use
Actually, we do plan to support both C and C++. So, there are no limitations like that. Also, should we be doing div by 0 checks for div and mod operations? |
Just tried it. After importing this header, the tree remains the same. But...
Actually, I was talking about
No, because I guess we are only parsing C to expr. We should rely on SymPy for further checks.e.g.; Pow(0, -1) will return |
One more discrepancy found: Here, Whereas, But, both gives good(and same) resultant tree, if function is called properly(i.e if not called from global scope and if the correct function definition exists)! Looking at these circumstances, we have to update function call parsing too! (I found this when I ran tests for parsing relational operators, but unfortunately, the test for the function call failed) |
ping @Sc0rpi0n101 |
IMO, if everything looks good, it is ready to be merged now. Bitwise operators are the only binary operators, which are left, but I am unable to find any corresponding SymPy class in core. If you find any, do suggest... :) |
if combined_variable[1] == 'expr': | ||
return combined_variable[0] | ||
if combined_variable[1] == 'boolean': | ||
return true if combined_variable[0] == 'true' else false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't return combined_variable[0]
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because 'true'
is a string and true
is boolean true of sympy (S.true). Similar will be the case for false.
elif (token.kind == cin.TokenKind.KEYWORD | ||
and token.spelling in ['true', 'false']): | ||
combined_variables_stack.append( | ||
[token.spelling, 'boolean']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be an else condition too with NotImplementedError
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!
@@ -150,7 +153,7 @@ def parse_str(self, source, flags): | |||
A list of sympy AST nodes | |||
|
|||
""" | |||
file = tempfile.NamedTemporaryFile(mode = 'w+', suffix = '.h') | |||
file = tempfile.NamedTemporaryFile(mode = 'w+', suffix = '.cpp') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about creating two functions parse_c_str
and parse_cpp_str
, one with a .c
extenstion and the other with a .cpp
extension.
But that can be done better in another PR, as we can also change the API a little accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that will be better. Will discuss it in a new PR...
Yeah, sure Other improvements can be done in another pull request. |
Please review soon if possible so that I can start implementing unary operators(because that will be somewhat similar) in next PR! Thanks :) |
@Sc0rpi0n101 do you have any further comments? I noticed the tests are getting really large. I wonder if there is something that can be done with it. We can do that later also after this is merged. I think it would make sense to finalize this PR, merge it, and go from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think a few things can be done to reduce the redundancy in the tests. But that will be better done in another PR. It would be better now for @smitgajjar to rebase it and we can merge this.
Agreed, those will grow larger as we proceed. IMO we have to refactor them and move each test according to the parsing functionality provided by each. For instance,
Alright! |
The commit message would look much better if you can do it like the current PR description with bullet points for the changes. |
- Added support for parsing binary operators `+`, `-`, `*`, `/`, `%`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `&&` and `||` - Added support for parsing assignment(`=`) from one variable to another - Added support for assignment of integer as well as floating point literal to a variable - Added support for an rhs expression comprising of any combination of the above operators - Any form of parenthesised rhs expression will also be parsed successfully - Added support for all forms of corresponding variable declaration as well as assignment - Added support for recognizing boolean literals(`true` and `false`) in variable declaration as well as assignment - Added support for boolean data type - Corresponding tests are added
ping |
Looks good. Merging it now. Thank you for your contribution. |
References to other Issues or PRs
Brief description of what is fixed or changed
+
,-
,*
,/
,%
,==
,!=
,<
,<=
,>
,>=
,&&
and||
=
) from one variable to anothertrue
andfalse
) in variable declaration as well as assignmentOther comments
The support for bitwise operators is left to be added.
Release Notes
+
,-
,*
,/
,%
,=
,==
,!=
,<
,<=
,>
,>=
,&&
and||
in C parsertrue
andfalse
) as well as declaration of boolean data type in C parser