Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate better .path files #473

Closed
wants to merge 1 commit into from
Closed

Conversation

jirislaby
Copy link
Contributor

We need witnesses for the upcoming SV-COMP. For that purpose we need to know the path through the source file. I.e. sequence of code lines how to get to an erroneous location. I implemented it though the generated .path files. Instead of emitting only a number 0/1 (false branch/true branch), I also put there file name and line number.

I guess this won't be merged as-is. Rather I would like to hear some feedback how you prefer to do it?

@ccadar
Copy link
Contributor

ccadar commented Sep 29, 2016

Hi @jirislaby , I'm happy to see this feature being tackled -- in fact I have already opened an issue about this a long time ago (see #55) but never managed to do it. I won't have time to review it just now, but I'll take a look as soon as I can, as I'd like to get the details right.

@jirislaby
Copy link
Contributor Author

On 09/29/2016, 01:42 PM, Cristian Cadar wrote:

Hi @jirislaby https://github.com/jirislaby , I'm happy to see this
feature being tackled -- in fact I have already opened an issue about
this a long time ago (see #55 #55)

Haha, it just reminds me I opened #273 one year ago :).

BTW, in the current state, it breaks all the travis builds. It's
obviously because of the change of .path files format. This can be
easily fixed by introducing a command-line switch to turn the generation
on only optionally.

P.S. we successfully used the output in the last year SV_COMP (converted
from .path to XML by an external tool).

js

@andreamattavelli
Copy link
Contributor

andreamattavelli commented Sep 30, 2016

BTW, in the current state, it breaks all the travis builds. It's obviously because of the change of .path files format. This can be easily fixed by introducing a command-line switch to turn the generation on only optionally.

@jirislaby It is less straightforward than that. The current test fails because the replay functionality of KLEE does not work anymore with the new encoding of the .path files.
I would personally push to modify the loading of the .path files (main.cpp:551) to support the new encoding rather then add a new option to KLEE.

@jirislaby jirislaby force-pushed the better-paths branch 6 times, most recently from 8ae9dd8 to 01c4a0a Compare October 6, 2016 08:48
@andreamattavelli
Copy link
Contributor

@jirislaby Thanks for your improved patch! I have only a couple of questions/comments:

  1. I personally don't like the term way, and I would prefer branch
  2. I wonder if the while in TreeStreamWriter::readStream:176 can loop forever, since if tellg fails it returns -1 (although we should have already performed checks on the input stream)

@jirislaby
Copy link
Contributor Author

On 10/07/2016, 02:53 PM, Andrea Mattavelli wrote:

@jirislaby https://github.com/jirislaby Thanks for your improved
patch! I have only a couple of questions/comments:

  1. I personally don't like the term |way|, and I would prefer |branch|

Fixed.

  1. I wonder if the |while| in |TreeStreamWriter::readStream:176| can
    loop forever, since if |tellg| fails it returns -1 (although we
    should have already performed checks on the input stream)

Theoretically, it can loop infinitely, of course. If we want to avoid
tellg, I would switch from PathLocation constructor to a function
returning size of read data. (Instead of checking tellq retval.)

But if there is some problem anywhere (like one which would have caused
tellq to return -1), unsigned 'size' won't hit 0, but will overflow
anyway. We would need to switch to signed 'size'.

So whatever you prefer, I will rewrite the change accordingly :).

thanks,

js

@andreamattavelli
Copy link
Contributor

But if there is some problem anywhere (like one which would have caused tellq to return -1), unsigned 'size' won't hit 0, but will overflow anyway. We would need to switch to signed 'size'.

Right, but I would just check for the correctness of the returned value, and terminate. If something went wrong, there's nothing we can do, right? You might want to consider using klee_error, which displays the error message and then terminates the execution.

@jirislaby
Copy link
Contributor Author

I eliminated tellg completely :).

@andreamattavelli
Copy link
Contributor

I eliminated tellg completely :).

@jirislaby Great! LGTM, but I give @ccadar the opportunity to comment.

@@ -18,6 +18,19 @@ namespace klee {
typedef unsigned TreeStreamID;
class TreeOStream;

struct PathLocation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jirislaby This is a good idea but implementation wise you'll have a problem according to memory usage and performance.

Before you had one char per decision to make,
now you have: 1 char, a string potentially quite long, an int.
Normally, you'll generate a lot of entries per state.

Instead, I recommend you to use a reference to KInstruction and during printing the test case access that information.

Beside that, idea is great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowack It's a good point, and some time ago I implemented this feature using KInstruction. The main issue (but maybe is due to my limited development skills) is the replay feature: do you want to instantiate (reading from the file!) the KInstruction? I don't think it is a good idea, but maybe you already have a better option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we discard everything from the .path file during replay except for the branch decision. So we're paying the price for extra memory consumption, without getting any benefits. As I mentioned before, we should indeed read the location back and use it to double-check that the replay proceeds correctly. However, we could leave this for later and just change PathLocation to have a reference to a KInstruction. @jirislaby , would you be happy with this decision?

@ccadar
Copy link
Contributor

ccadar commented Oct 12, 2016

Thanks again, @jirislaby .

I agree with the current decision, argued by @andreamattavelli , of breaking backwards-compatibility with the old .path format, as I don't think it was used much.

I agree with @MartinNowack regarding memory consumption. It could be addressed as suggested, although there are advantages to be able to just load a vector of PathLocation when reading back files. Right now this doesn't happen, but it would be useful to check during replay that the expected branches are followed. As a solution, I guess we could search for a KInstruction with a certain location when loading the path files.

@andreamattavelli
Copy link
Contributor

@jirislaby Your code works just fine (but I guess you already know it). Unfortunately, on large runs the cost of dumping paths is tremendous: 30x-50x.
Something we could think about is to compress the .path files since most of the strings there are simply replicated. I might look into it.

@jirislaby
Copy link
Contributor Author

I appreciate any help to move this forward :). I haven't had enough time to look into it yet.

@jirislaby
Copy link
Contributor Author

jirislaby commented Jan 21, 2017

Looking into this again, I don't quite understand how do you want me to proceed. Compressing suggested by @andreamattavelli can be clearly done, there should be no issue with that. But adding a KInstruction reference to PathLocation does not make much sense, because PathLocation has to be serialized to TreeStream and deserialized later from the stream to the .path file. How would I de/serialize KInstruction in PathLocation?

@andreamattavelli
Copy link
Contributor

Hi @jirislaby, sorry for the late response.

Looking into this again, I don't quite understand how do you want me to proceed. Compressing suggested by @andreamattavelli can be clearly done, there should be no issue with that.

Yes, I suppose it is doable (based on infinite effort :) ).

But adding a KInstruction reference to PathLocation does not make much sense, because PathLocation has to be serialized to TreeStream and deserialized later from the stream to the .path file. How would I de/serialize KInstruction in PathLocation?

As I mentioned in my comment before, this is tricky. What I would attempt to do now is to just focus on serializing the PathLocation with the KInstruction, which is easy: the instruction has a pointer to its info (id, file, line and assemblyLine).
The deserialization is the main issue, but as @ccadar mentioned the instruction is not currently used to check that the execution is following the same instructions as before. We can therefore deserialize just the branch condition (thus throwing away the remaining part of the string).
Does it sound good?

@gladtbx
Copy link
Contributor

gladtbx commented Feb 13, 2017

I do want to point out that using only '0' and '1' as branch direction is not enough.
Current klee implementation does not record switchInst correctly. I don't use the replay functionality so I don't know how it is handled when doing replays.

I had similar need as the @jirislaby had, track down the execution path (source code line by line) for both correct and erroneous executions.

I extended klee with my own function, that processes the bytecode to form a graph, and reads in the path info at the same time .ktest files are generated. Using both path info and the bytecode graph I got what I needed.

My question is why do we have to stay with using '0' and '1' to record the branch decisions.

@ccadar
Copy link
Contributor

ccadar commented Feb 13, 2017

Hi @gladtbx , this is what this PR is trying to accomplish.

@jirislaby , I have some free cycles this week, and I could try to amend this along the lines we discussed, but only if it's not already on your todo list for the week :)

@gladtbx
Copy link
Contributor

gladtbx commented Feb 13, 2017

Do we have to dump the branch info to files along the execution? And do we have to dump the info as string?

I wonder because I would like to use some of the existing path info when picking new executionstate to advance. Dumping and reading from file is kind of painful to me. However I am OK with whatever decision you make. Thanks for your work @jirislaby .

@gladtbx
Copy link
Contributor

gladtbx commented Feb 13, 2017

As I mentioned in my comment before, this is tricky. What I would attempt to do now is to just focus on serializing the PathLocation with the KInstruction, which is easy: the instruction has a pointer to its info (id, file, line and assemblyLine).
The deserialization is the main issue, but as @ccadar mentioned the instruction is not currently used to check that the execution is following the same instructions as before. We can therefore deserialize just the branch condition (thus throwing away the remaining part of the string).
Does it sound good?

Another option is only record the branch decisions when doing serialization. Path info can be added at deserialization time. I have my code implemented this way. It avoids the overhead of recording KInstructions mutliple times, at a cost of additional processing time for each test case.

@jirislaby
Copy link
Contributor Author

@jirislaby , I have some free cycles this week, and I could try to amend this along the lines we discussed, but only if it's not already on your todo list for the week :)

Definitely not this week :). But I can test whatever you come up with.

@jirislaby
Copy link
Contributor Author

I have my code implemented this way

@gladtbx is the code available somewhere?

@ccadar
Copy link
Contributor

ccadar commented Feb 22, 2017

Sorry, it looks like I overestimated my free time. I might still do this at some point, but cannot promise a due date. But I'm happy to help with a review, I think we have now agreed on how to implement this.

We need witnesses for the upcoming SV-COMP. For that purpose we need
to know the path through the source file. I.e. sequence of code lines
how to get to an erroneous location. I implemented it though the
generated .path files. Instead of emitting only a number 0/1 (false
branch/true branch), I also put there file name and line number.

This implements issue klee#55.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
@ccadar ccadar changed the title generate better .path files Generate better .path files Jun 19, 2020
@ccadar ccadar added this to Extension in KLEE Extensions Jun 19, 2020
@ccadar
Copy link
Contributor

ccadar commented Jun 19, 2020

Given the age of the PR, I'll close it and move it to the KLEE Extensions project https://github.com/klee/klee/projects/4

@ccadar ccadar closed this Jun 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants