Parser Improvements and Additions #1298

Merged (6 commits) on Jul 19, 2017


ShikharJ
Member

No description provided.

@ShikharJ
Member Author

This currently gives the following error:

/home/shikhar/symengine/symengine/parser.cpp:116:47: error: could not convert ‘{{"Eq", SymEngine::Eq}, {"Ne", SymEngine::Ne}, {"Ge", SymEngine::Ge}, {"Gt", SymEngine::Gt}, {"Le", SymEngine::Le}, {"Lt", SymEngine::Lt}}’ from ‘<brace-enclosed initializer list>’ to ‘std::map<std::__cxx11::basic_string<char>, std::function<Teuchos::RCP<const SymEngine::Boolean>(const Teuchos::RCP<const SymEngine::Basic>&, const Teuchos::RCP<const SymEngine::Basic>&)> >’
             {"Gt", Gt}, {"Le", Le}, {"Lt", Lt}};

@isuruf
Member

isuruf commented Jun 21, 2017

Looks good to me. It would also be useful to be able to parse strings like x < y.

@ShikharJ
Member Author

@isuruf Can you review? I'm not sure this is how it is supposed to be implemented.

@isuruf isuruf requested a review from srajangarg June 22, 2017 02:09
std::map<std::string,
std::function<RCP<const Boolean>(const RCP<const Basic> &,
const RCP<const Basic> &)>>
double_arg_boolean_functions = {
Contributor

@srajangarg srajangarg Jun 22, 2017

Can you rename this to boolean_functions?
Edit: never mind.

Contributor

@srajangarg srajangarg Jun 22, 2017

The compilation error is probably caused by the overloaded Eq function. See how I handled the overloaded log function above: you have to cast it to a specific function type (in this case, the two-argument variant).
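A minimal sketch of the suggested cast. It uses hypothetical stand-in types (Basic, BasicPtr, make_node and toy Eq/Lt functions) in place of SymEngine's RCP<const Basic>, because only the overload-resolution mechanics matter here:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for SymEngine's RCP<const Basic>; a real relational
// would be a Boolean, but one node type is enough to show the overload issue.
struct Basic {
    std::string repr;
};
using BasicPtr = std::shared_ptr<const Basic>;

BasicPtr make_node(std::string repr)
{
    return std::make_shared<Basic>(Basic{std::move(repr)});
}

// Two overloads sharing one name, like the one- and two-argument Eq.
BasicPtr Eq(const BasicPtr &lhs)
{
    return make_node("Eq(" + lhs->repr + ", 0)");
}
BasicPtr Eq(const BasicPtr &lhs, const BasicPtr &rhs)
{
    return make_node("Eq(" + lhs->repr + ", " + rhs->repr + ")");
}
// Not overloaded, so it can be stored without a cast.
BasicPtr Lt(const BasicPtr &lhs, const BasicPtr &rhs)
{
    return make_node("Lt(" + lhs->repr + ", " + rhs->repr + ")");
}

using BinaryFn = std::function<BasicPtr(const BasicPtr &, const BasicPtr &)>;
using BinaryFnPtr = BasicPtr (*)(const BasicPtr &, const BasicPtr &);

// {"Eq", Eq} would not compile: Eq names an overload set, so std::function's
// templated constructor cannot deduce which overload to wrap. Casting to a
// concrete function-pointer type selects the two-argument version.
const std::map<std::string, BinaryFn> double_arg_boolean_functions = {
    {"Eq", static_cast<BinaryFnPtr>(Eq)},
    {"Lt", Lt},
};
```

The static_cast selects one member of the overload set before std::function sees it. Functions that are not overloaded, such as Lt here, can be listed directly.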

@@ -119,7 +131,7 @@ class ExpressionParser
// the string to be parsed, obtained after removing all spaces from input
// string
std::string s;
// it's length
// its length
Contributor

Grammar nazi! 😛

parse_string(iter + 1, operator_end[iter]));
iter = operator_end[iter] - 1;

} else if (s[iter] == '<' and s[iter + 1] == '=') {
Contributor

@srajangarg srajangarg Jun 22, 2017

Can iter be the last index? If so, s[iter + 1] may segfault.

Member Author

I have my doubts about that as well. A hacky alternative that occurred to me is to replace every instance of <= and >= with other symbols, such as # or @, during preprocessing, just as ** is replaced with ^. What would you suggest?
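A rough sketch of this placeholder preprocessing. It assumes <= maps to '#' and >= maps to '@'. collapse_operators is a hypothetical name, and its ** to ^ branch mirrors the existing preprocessing loop:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the placeholder hack: collapse each two-character operator into a
// single character before the character-based parser runs, the same way **
// becomes ^. Mapping <= to '#' and >= to '@' is an assumption; whatever
// characters are chosen can no longer appear in ordinary input.
std::string collapse_operators(const std::string &in)
{
    std::string s;
    s.reserve(in.length());
    for (std::size_t i = 0; i < in.length(); ++i) {
        const bool has_next = i + 1 < in.length();
        if (has_next && in[i] == '*' && in[i + 1] == '*') {
            s += '^';
            ++i; // skip the second character of the pair
        } else if (has_next && in[i] == '<' && in[i + 1] == '=') {
            s += '#';
            ++i;
        } else if (has_next && in[i] == '>' && in[i + 1] == '=') {
            s += '@';
            ++i;
        } else {
            s += in[i];
        }
    }
    return s;
}
```

Every new multi-character operator uses up another reserved character, so this approach does not scale well as more operators are added.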

Contributor

@srajangarg srajangarg Jun 22, 2017

No, that isn't a good solution. For now, would checking iter + 1 < end work? (Or <=; I don't remember exactly.)
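A minimal sketch of the bounds check. It assumes end is exclusive (one past the last character of the current sub-expression), which makes the condition iter + 1 < end. matches_two_char_op is a hypothetical helper and is not part of the parser:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: does the sub-expression ending (exclusively) at `end`
// contain the two-character operator first+second starting at index i?
// The second character exists only when i + 1 < end.
bool matches_two_char_op(const std::string &s, std::size_t i, std::size_t end,
                         char first, char second)
{
    // Check both bounds before reading s[i] and s[i + 1].
    return i + 1 < end && end <= s.size() && s[i] == first
           && s[i + 1] == second;
}
```

In the '<' branch, this would replace s[iter] == '<' and s[iter + 1] == '=' with something like matches_two_char_op(s, iter, end, '<', '='). If end is inclusive instead, the comparison becomes iter + 1 <= end.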

Member Author

It doesn't. Every time, the terminal shows the error "Operator Inconsistency!".

Contributor

@srajangarg srajangarg Jun 25, 2017

Did you try to look into this further? Why does adding && iter + 1 <= end make it throw every time?

Member Author

Yes. I tried both < and <=.

Member Author

@srajangarg I'm not sure, but the error probably occurs in parse_expr(), where x >= y gets split into x > and the rest. I think the error is thrown while this expression is being simplified.

Contributor

@srajangarg srajangarg Jun 27, 2017

So, ideally, to fix this we should move from a character-based approach to a "token"-based approach, where each token is one or more characters. Everything proceeds the same way, but we iterate over tokens instead of characters. Tokens are generated in the parse_expr stage. You can think of converting ** to ^ as a form of tokenization: right now our tokens are only single characters, so we collapse multi-character operators into one character.

Do you think you can implement this? As we add more operators, using only single characters as tokens will become a problem, because we will soon run out of symbols. If you don't want to tackle this now, go ahead with the hacky special-symbol approach.
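A minimal sketch of such a tokenizer. It assumes tokens are identifiers, numbers, parentheses, commas and a fixed list of operators. tokenize and its operator list are hypothetical and do not reproduce SymEngine's parse_expr. Operators are tried longest-first, so x>=y produces x, >=, y and never splits >= into two tokens:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split the input into tokens of one or more characters ("maximal munch").
std::vector<std::string> tokenize(const std::string &in)
{
    // Two-character operators come first so "**" wins over "*", "<=" over "<".
    static const std::vector<std::string> operators = {
        "**", "<=", ">=", "==", "+", "-", "*", "/", "^", "<", ">", "(", ")", ","};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        if (std::isalnum(c) || c == '_' || c == '.') {
            // Identifier or number: consume alphanumerics, '_' and '.'.
            std::size_t j = i;
            while (j < in.size()
                   && (std::isalnum(static_cast<unsigned char>(in[j]))
                       || in[j] == '_' || in[j] == '.'))
                ++j;
            tokens.push_back(in.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const std::string &op : operators) {
            // compare() clips at the end of the string, so no out-of-range read.
            if (in.compare(i, op.size(), op) == 0) {
                tokens.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched)
            throw std::invalid_argument("unexpected character: "
                                        + std::string(1, in[i]));
    }
    return tokens;
}
```

The rest of the parser would then compare whole strings such as "<=" in place of single characters, which matches the move to set<std::string> and std::map<std::string, int> described in the reply below.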

Member Author

I tried changing set<char> OPERATORS to set<std::string> OPERATORS, and accordingly std::map<char, int> op_precedence to std::map<std::string, int> op_precedence. But these changes would require an overhaul of the current iterative algorithm. For now, I'd like to open an issue and tackle it later.

@srajangarg
Contributor

srajangarg commented Jun 22, 2017

I would like to see some more complicated test cases, as well as cases with these new symbols that will not be parsed correctly (i.e., that will throw).

@srajangarg
Contributor

srajangarg commented Jun 27, 2017

What does parsing sin(x < y) generate?

@ShikharJ
Member Author

ShikharJ commented Jun 27, 2017

That gives an Operator Inconsistency error as well.
Edit: This turns out to give an error in SymPy as well (I've cut out most of the traceback):

In [1]: sin(x < y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-e00934bdc3bd> in <module>()
----> 1 sin(x < y)
TypeError: cannot determine truth value of Relational

@srajangarg
Contributor

Should this be an error while parsing (i.e., you can't have this string), or an error while actually constructing the symbolic tree (i.e., a relational inside a sin should not be allowed)?

I feel it should parse correctly. I'm not sure.

@ShikharJ
Member Author

I think it should be handled in the respective classes, as was recently done for Floor and Ceiling. SymPy handles this in the classes as well.
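A sketch of what handling this in the class could look like. It uses hypothetical, heavily simplified node types (Node, Symbol, LessThan, Sin), not SymEngine's real classes. The parser is free to build sin(x < y), and the Sin constructor decides whether to accept the argument:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified expression nodes.
struct Node {
    virtual ~Node() = default;
    virtual bool is_boolean() const { return false; }
};

struct Symbol : Node {
    std::string name;
    explicit Symbol(std::string n) : name(std::move(n)) {}
};

struct LessThan : Node {
    std::shared_ptr<const Node> lhs, rhs;
    LessThan(std::shared_ptr<const Node> l, std::shared_ptr<const Node> r)
        : lhs(std::move(l)), rhs(std::move(r))
    {
    }
    bool is_boolean() const override { return true; }
};

struct Sin : Node {
    std::shared_ptr<const Node> arg;
    explicit Sin(std::shared_ptr<const Node> a) : arg(std::move(a))
    {
        // The check lives in the class, so the parser needs no special case.
        if (arg->is_boolean())
            throw std::invalid_argument("sin: argument must not be a Boolean");
    }
};
```

With this split, parsing sin(x < y) succeeds syntactically, and any rejection comes from constructing the Sin node.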

@srajangarg
Contributor

So our parser throwing Operator Inconsistency for sin(x < y) is wrong, right? Figure out what's going wrong and try to fix it.

@ShikharJ
Member Author

@srajangarg Can you review? The build failure is unrelated to these changes; the build probably needs to be restarted. sin(x < y) currently returns sin(Lt(x, y)), though that can be fixed in another PR.

@srajangarg
Contributor

Is the operator precedence set properly? How is x + y < 2 parsed? Please add more extensive test cases covering brackets, operators, functions, etc.
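One way to reason about the precedence question is a small precedence-climbing sketch over input that has already been tokenized. It assumes relationals bind more loosely than + and -. The op_precedence values and the Climber class are illustrative and are not the parser's actual table. Each binary operation is wrapped in parentheses so the grouping is visible:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Illustrative precedence table keyed by std::string, so "<=" fits naturally.
const std::map<std::string, int> op_precedence = {
    {"<", 1}, {"<=", 1}, {">", 1}, {">=", 1}, {"==", 1},
    {"+", 2}, {"-", 2}, {"*", 3}, {"/", 3}, {"^", 4}};

class Climber
{
public:
    explicit Climber(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    std::string parse()
    {
        std::string result = parse_expr(0);
        if (pos_ != toks_.size())
            throw std::runtime_error("trailing tokens");
        return result;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // A symbol, a number, or a parenthesized sub-expression.
    std::string parse_primary()
    {
        if (pos_ >= toks_.size())
            throw std::runtime_error("unexpected end of input");
        std::string tok = toks_[pos_++];
        if (tok == "(") {
            std::string inner = parse_expr(0);
            if (pos_ >= toks_.size() || toks_[pos_] != ")")
                throw std::runtime_error("missing ')'");
            ++pos_;
            return inner;
        }
        return tok;
    }

    // Consume operators whose precedence is at least min_prec.
    std::string parse_expr(int min_prec)
    {
        std::string lhs = parse_primary();
        while (pos_ < toks_.size()) {
            auto it = op_precedence.find(toks_[pos_]);
            if (it == op_precedence.end() || it->second < min_prec)
                break;
            const std::string op = it->first;
            const int prec = it->second;
            ++pos_;
            // "^" is right-associative; the others are left-associative.
            const int next_min = (op == "^") ? prec : prec + 1;
            std::string rhs = parse_expr(next_min);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }
};
```

With this table, x + y < 2 groups as ((x + y) < 2). Unary minus and function calls are left out to keep the sketch short.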

@ShikharJ
Member Author

@srajangarg Can you restart the failing build? Also, should I add more tests?

res = parse(s);
CHECK(eq(*res, *Le(mul(x, y), add(x, y))));

s = "x - y = x/y";
Member

= should not be used for equality. Use == instead.

res = parse(s);
CHECK(eq(*res, *Le(sub(x, y), div(x, y))));

s = "x = y < 2";
Member

This test and all the tests below should be removed. They don't make sense.

Member Author

These were implemented to check for operator precedence. I'll remove them.

@ShikharJ
Member Author

Ping @isuruf @srajangarg

@srajangarg srajangarg requested a review from isuruf June 29, 2017 11:01
@srajangarg
Contributor

LGTM

Member

@isuruf isuruf left a comment

One minor issue. Also, can you check whether And(x < y, w >= z) works?

@@ -269,6 +362,7 @@ TEST_CASE("Parsing: constants", "[parser]")

s = "E*pi";
res = parse(s);
s = "2*(x+1)**10 + 3*(x+2)**5";
Member

Why this change?

Member Author

Sorry, this was accidental.

@ShikharJ
Member Author

@isuruf @srajangarg I've added support for some additional Boolean functions. Please review the last commit.

@@ -374,11 +485,27 @@ class ExpressionParser
s.clear();
s.reserve(in.length());

// Replacing ** with ^
// TODO: Implement multi-character operator parsing support
Member

I don't really like this hack. Would it take a long time to implement this?

for (unsigned int i = 0; i < in.length(); ++i) {
if (in[i] == '*' and i + 1 < in.length() and in[i + 1] == '*') {
// Replacing ** with ^
s += '^';
Member Author

@ShikharJ ShikharJ Jul 6, 2017

@isuruf @srajangarg Should this be removed? Is there a need to parse &, | or ^? Also, can you please review the PR?

Contributor

@srajangarg srajangarg Jul 10, 2017

Yes, it should, if you've implemented multi-character operator support.

@srajangarg
Contributor

srajangarg commented Jul 10, 2017

At first glance, this looks good. Give me some time to go through it fully.

But then again, this is still a hacky solution. We need to switch to a proper lexer/parser-based approach for this to scale in the long run.

@ShikharJ ShikharJ force-pushed the Parser branch 4 times, most recently from 2c4d09b to 700c5a0, July 15, 2017 03:52
@ShikharJ
Member Author

@isuruf What is your take on this? I don't have a clear idea of how to implement "tokenization" of operators, so I'd like to open an issue for that instead.

@srajangarg srajangarg merged commit 2695a9d into symengine:master Jul 19, 2017
@ShikharJ ShikharJ deleted the Parser branch July 19, 2017 09:20
isuruf pushed a commit to isuruf/symengine that referenced this pull request Aug 4, 2018
Parser Improvements and Additions