-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser Improvements and Additions #1298
Conversation
This currently gives an error as:
|
Looks good to me. It'll also be useful to be able to parse strings like |
@isuruf Can you review? I'm not sure this is how it is supposed to be implemented. |
std::map<std::string, | ||
std::function<RCP<const Boolean>(const RCP<const Basic> &, | ||
const RCP<const Basic> &)>> | ||
double_arg_boolean_functions = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to boolean_functions
?
Edit : nevermind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compilation error is possibly due to the overloaded Eq
function. See how I have done it for the overloaded log
function above. You have to cast it to a specific function type (in this case the double argument variant)
@@ -119,7 +131,7 @@ class ExpressionParser | |||
// the string to be parsed, obtained after removing all spaces from input | |||
// string | |||
std::string s; | |||
// it's length | |||
// its length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nazi! 😛
symengine/parser.cpp
Outdated
parse_string(iter + 1, operator_end[iter])); | ||
iter = operator_end[iter] - 1; | ||
|
||
} else if (s[iter] == '<' and s[iter + 1] == '=') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can iter
be the last index? s[iter+1]
may segfault
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have my doubts about that as well. A hacky alternative that occurred to me was to use other symbols, such as #
or @
for replacing all the instances of <=
and >=
, just like it's being done for **
to ^
during preprocessing. What would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that isn't a good solution. For now, will checking iter + 1 < end
work? (or <=
I don't remember exactly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. Each time the terminal returns Operator Inconsistency!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try and look into this further? Why does adding && iter + 1 <= end
cause it to throw everytime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Tried for <
as well as <=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srajangarg Though I'm not sure, probably the error occurs in parse_expr()
where x >= y
get split into x >
and the rest. I think it is during the simplification of this expression that the error is thrown up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, ideally to fix this we should move from a character based approach to a "token" based approach, where each token is one or more characters. Everything proceeds the same way but instead of iterating over characters we iterate over tokens. Tokens are generated in the parse_expr
stage. You can think of **
being converted to ^
a tokenization itself (right now our tokens are only single characters, and we tokenize the multiple characters to a single one).
Do you think you can implement this? As we add more and more operators, using only single characters as tokens will become a problem (and we can soon run out of symbols). If you don't want to tackle this now, go ahead with the special symbol hacky approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had tried changing the set<char> OPERATORS
to set<std::string> OPERATORS
and subsequently std::map<char, int> op_precedence
to std::map<std::string, int> op_precedence
. But these changes would require an overhaul of the current iterative algorithm. I'd like to open an issue, for now, and tackle it later.
I would like to see some more complicated test cases, and cases which will not be parsed correctly (ie will throw) using these new symbols |
what does parsing |
That gives out an
|
Should this be an error while parsing (ie. you cant have this string), or error while actually constructing the symbolic tree (ie. a relational inside a I feel it should parse correctly. I'm not sure. |
I think it should be handled in the respective classes. More recently, like it is handled in |
So then our parser throwing |
@srajangarg Can you review? The build failure is unrelated to the changes, probably that needs to be restarted. |
Is the operator precedence set properly? How is |
@srajangarg Can you restart the failing build? Also, should I add more tests? |
res = parse(s); | ||
CHECK(eq(*res, *Le(mul(x, y), add(x, y)))); | ||
|
||
s = "x - y = x/y"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=
should not be used for equality. Use ==
res = parse(s); | ||
CHECK(eq(*res, *Le(sub(x, y), div(x, y)))); | ||
|
||
s = "x = y < 2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test and all the tests below should be removed. They don't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were implemented to check for operator precedence. I'll remove them.
Ping @isuruf @srajangarg |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor issue. Also can you try and see if And(x < y, w >= z)
works ?
@@ -269,6 +362,7 @@ TEST_CASE("Parsing: constants", "[parser]") | |||
|
|||
s = "E*pi"; | |||
res = parse(s); | |||
s = "2*(x+1)**10 + 3*(x+2)**5"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is accidental.
@isuruf @srajangarg I've added support for some additional |
symengine/parser.cpp
Outdated
@@ -374,11 +485,27 @@ class ExpressionParser | |||
s.clear(); | |||
s.reserve(in.length()); | |||
|
|||
// Replacing ** with ^ | |||
// TODO: Implement multi-character operator parsing support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this hack. Would it take a long time to implement this?
symengine/parser.cpp
Outdated
for (unsigned int i = 0; i < in.length(); ++i) { | ||
if (in[i] == '*' and i + 1 < in.length() and in[i + 1] == '*') { | ||
// Replacing ** with ^ | ||
s += '^'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isuruf @srajangarg Should this be removed? Is there a need to parse &
, |
or ^
? Also, can you please review the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should. If you've implemented multi character operator support.
First look this looks good. Give me some time to go through it fully. But then again, this is just still a hacky solution. We need to switch to a proper lexer/parser based approach for this to be scalable in the long run. |
2c4d09b
to
700c5a0
Compare
@isuruf What would be your take on this? I don't have a clear idea on implementing "tokenization" of operators, and as such, I'd like to open an issue for that instead. |
Parser Improvements and Additions
No description provided.