New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSA feature extractor #1811
base: master
Are you sure you want to change the base?
SSA feature extractor #1811
Conversation
Co-authored-by: intrigus-lgtm <60750685+intrigus-lgtm@users.noreply.github.com>
8941a20
to
38d25e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks like a good idea which I do approve of. But I have a few comments, mostly about being too conservative.
{"symex-trace", NULL, "print instructions during symbolic execution"}, | ||
{"ssa-trace", NULL, "print SSA during SMT encoding"}, | ||
{"ssa-smt-trace", NULL, "print generated SMT during SMT encoding"}, | ||
{"ssa-features-dump", NULL, "print features in the SSA (just before conversion)"}, | ||
{"symex-ssa-trace", NULL, "print generated SSA during symbolic execution"}, | ||
{"goto2c", NULL, "translate the GOTO program to C"}, | ||
{"show-goto-value-sets", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird formatting. Not your PR. Just wondering whether clang-format is having a bad time with this options.cpp file and we should just do it in one go...
case expr2t::modulus_id: | ||
if (!(is_constant_expr(dynamic_cast<const arith_2ops &>(*e).side_1) || | ||
is_constant_expr(dynamic_cast<const arith_2ops &>(*e).side_2))) | ||
features.insert(SSA_FEATURES::NON_LINEAR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questionable choice: Is (unsigned short)23 * 42
(i.e. a typecast) really a non-linear expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify? You commented on modulus. Also, shouldn't be simplified by the constant folding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was referring to mul_id
, which is also handled by this switch case, Github just defaults to 4 lines of context. Constant folding should do it, yes. Is it a necessary preprocessing step to this algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was referring to
mul_id
, which is also handled by this switch case, Github just defaults to 4 lines of context. Constant folding should do it, yes. Is it a necessary preprocessing step to this algorithm?
Ow I see, you are right. Well, I could add a contains_symbolic(const expr2tc &e)
into irep utils and use it then. Guess that is safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the question is, what "SSA contains non-linear features" really means. Consider for instance a := b / 0
. This cannot be simplified because it's undefined. But if b
was a constant, contains_symbolic() would return false. How would that correspond to "contains non-linear features"? Maybe an easier approach is to call the simplification directly and ask whether the result is "entirely constant" meaning, that it positively only contains constant_*2t expressions and defined functions (such as typecasts and no signed overflows). This predicate could be extended later, no need to get it perfect right now in this PR.
As discussed in the local meetings, we could start checking other types of encoding that could be used. But first we should check how many benchmarks could be affected by this. So this PR basically prints features found in the SSA trace, we can then use benchexec to collect statistics about how many benchmarks are going to be affected.