Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly update symbolic variables that have been changed externally #1407

Merged
merged 6 commits into from
Feb 29, 2024

Conversation

MartinNowack
Copy link
Contributor

Before, external changes to symbolic variables have not been propagated
back to their internal representation.

Do a byte-by-byte comparison and potential update if required.

The test case is based on the example provided by Mingyi Liu from the
mailing list.

Summary:

Checklist:

  • The PR addresses a single issue. If it can be divided into multiple independent PRs, please do so.
  • The PR is divided into a logical sequence of commits OR a single commit is sufficient.
  • There are no unnecessary commits (e.g. commits fixing issues in a previous commit in the same PR).
  • Each commit has a meaningful message documenting what it does.
  • All messages added to the codebase, all comments, as well as commit messages are spellchecked.
  • The code is commented OR not applicable/necessary.
  • The patch is formatted via clang-format OR not applicable (if explicitly overridden leave unchecked and explain).
  • There are test cases for the code you added or modified OR no such test cases are required.

@codecov
Copy link

codecov bot commented May 11, 2021

Codecov Report

Merging #1407 (7021a89) into master (a01f46c) will increase coverage by 0.01%.
The diff coverage is 70.96%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1407      +/-   ##
==========================================
+ Coverage   70.19%   70.21%   +0.01%     
==========================================
  Files         162      162              
  Lines       19395    19405      +10     
  Branches     4630     4634       +4     
==========================================
+ Hits        13615    13625      +10     
- Misses       3741     3743       +2     
+ Partials     2039     2037       -2     
Files Coverage Δ
lib/Core/AddressSpace.h 100.00% <ø> (ø)
lib/Core/Executor.h 77.27% <ø> (ø)
lib/Core/Memory.cpp 79.54% <100.00%> (+1.34%) ⬆️
lib/Core/Memory.h 84.90% <ø> (ø)
lib/Core/Executor.cpp 76.96% <88.88%> (+0.03%) ⬆️
lib/Core/AddressSpace.cpp 74.61% <52.94%> (-1.35%) ⬇️

@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch from 92d93b8 to 2e3ebe0 Compare May 12, 2021 00:43
@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch 3 times, most recently from f3550f4 to 24df55c Compare June 7, 2021 16:19
@ccadar
Copy link
Contributor

ccadar commented Nov 3, 2021

@MartinNowack thanks for this. I have finally looked at this patch, which is a bit tricky (in particular, I find the ObjectState class to be poorly documented, I'm currently doing a pass over it).

Here's how I would fix this: if the object was modified by the external call (if (memcmp(address, os->concreteStore, mo->size) != 0)) then we just need to mark it as fully concrete from then on and copy all the bytes. So essentially I would add before the existing memcpy(wos->concreteStore, address, mo->size); a call to wos->initializeToZero();.

Does this make sense?

@MartinNowack
Copy link
Contributor Author

@ccadar Thanks for the review and suggestion. One goal of the implementation was to keep as much symbolic information as possible.
I was thinking about this, the downside from this suggestion is if larger memory objects are used for the external functions call, i.e. pointers into larger pre-allocated buffers.
Of course, with external functions anyway, we loose some of the symbolic information in the first place.

@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch from 24df55c to dd3f8b7 Compare January 5, 2022 16:33
@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch 2 times, most recently from b8defb8 to 8133382 Compare February 5, 2024 10:10
@MartinNowack
Copy link
Contributor Author

That one is ready to be merged.

@MartinNowack
Copy link
Contributor Author

@ccadar Please merge this.

@ccadar
Copy link
Contributor

ccadar commented Feb 17, 2024

Hi @MartinNowack, thanks for this patch and sorry for the delay. However, this patch is not straightforward at all, despite its small size.

As I said, to me the straightforward solution I suggested in #1407 (comment) makes most sense, but you seem to suggest that a more complex one is needed, and I want to understand why.

Let me start by adding some review comments.

// functions. Do a byte-by-byte comparison and update if needed.
for (size_t i = 0, ie = mo->size; i < ie; ++i) {
u_int8_t updated_value = *(address + i);
if (updated_value != wos->concreteStore[i]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why this if statement is needed. Can you explain? You cannot rely at this point that wos->concreteStore[i] is valid, can you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where all the magic happens:

wos->concreteStore[i] contains the concrete object-state representation before the call and is copied to the external address before the call. This includes the concrete values representing symbolic expressions.
update_value represents the current external value at location i after the call. If update_value and wos->concreteStore[i] differ, we know that the external function has changed the object, so we do need to propagate this back to the object. This is especially important, if the external function has changed the value that has been previously represented by a symbolic value. In this case we need to update the object state accordingly, therefore we call the wos->write8() function. This has been handled incorrectly by KLEE before, i.e. not updating the information that a value is not symbolic anymore, only updating the concreteStore representation. This lead to subsequent reads of that object still using the symbolic values instead of the concrete one.

In summary, we have to update the object state's meta-data one way or another.

The major drawback of your solution is that always the full object will be concretised even if only one byte has changed externally.

This PR preserves all the symbolic information of an object that has not been modified by external functions but updates everything else accordingly.
This shouldn't have a higher performance impact for smaller objects.

The performance impact of your suggestion depends on the original state of the object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinNowack thanks for the detailed explanation. I would add a (shorter) comment along those lines in the text -- the key part would be the first sentence, I think.

This being said, while your solution is more performant in some cases, it over-approximates things. That's because it can happen that the external call happened to write the same values for some bytes, and keeping those symbolic would lead to inconsistencies. It is easy to construct such an example, and might not be so unlikely in practice given that you do this at the byte granularity (and some values, such as 0, might be quite common).

One solution would be to do this only under the over-approx policy I introduced in #1696, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ccadar Indeed this is a problem. The only upside is that it's less likely to have an impact as the concretisation (i.e. using the results of the solver call) still has the same result. Therefore, recurring queries with the same constraint set would lead to the same results.

I like the over-approx solution, that would make a cleaner design and we could switch between both implementations easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, then let's guard this by the --over-approx flag once #1696 is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan!

if (os->readOnly) {
return false;
} else {
ObjectState *wos = getWriteable(mo, os);
memcpy(wos->concreteStore, address, mo->size);
if (wos->unflushedMask) {
// Handle the case, that objects have been modified by external
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Handle the case, that objects" -> "Handle the case when symbolic objects"

// RUN: rm -rf %t.klee-out
// RUN: %klee --output-dir=%t.klee-out -external-calls=all %t.bc > %t.log
// RUN: FileCheck -input-file=%t.log %s
// REQUIRES: not-darwin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Darwin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the specific printf feature is not supported under Darwin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then can we choose something more portable, maybe sscanf?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder about this

int b;
int c;

// Symbolize argument that gets constified by the external call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constified -> concretized

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder about this

@@ -199,7 +199,7 @@ const UpdateList &ObjectState::getUpdates() const {
void ObjectState::flushToConcreteStore(TimingSolver *solver,
const ExecutionState &state) const {
for (unsigned i = 0; i < size; i++) {
if (isByteKnownSymbolic(i)) {
if (!isByteConcrete(i)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This seems like it could have affected other types of external calls, right?

@MartinNowack
Copy link
Contributor Author

@ccadar Thanks for the review, I added a comment to the location of interest.

@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch 2 times, most recently from 1d93749 to 9b73c9c Compare February 27, 2024 21:22
/// in the concrete store.
///
/// \param executor
/// \param state containt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a misspelling

///
/// \param executor
/// \param state containt
/// \param concretize provide value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a more clear explanation? E.g., reusing the text from toConstant

// RUN: rm -rf %t.klee-out
// RUN: %klee --output-dir=%t.klee-out -external-calls=all %t.bc > %t.log
// RUN: FileCheck -input-file=%t.log %s
// REQUIRES: not-darwin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder about this

int b;
int c;

// Symbolize argument that gets constified by the external call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder about this

state.addressSpace.resolveOne(cvalue, op) && !op.second->readOnly) {
auto *os = state.addressSpace.getWriteable(op.first, op.second);
os->flushToConcreteStore(*this, state,
ExternalCalls == ExternalCallPolicy::All);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be ExternalCalls != ExternalCallPolicy::OverApprox?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to write a test case demonstrating the bug you fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First comment: This is already guarded in the earlier lines by:

if (ExternalCalls == ExternalCallPolicy::All ||
        ExternalCalls == ExternalCallPolicy::OverApprox) {

I'm using the same approach as in line 4020 to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, nothing to do for this one then.

lib/Core/AddressSpace.h Show resolved Hide resolved
@ccadar
Copy link
Contributor

ccadar commented Feb 28, 2024

Thanks, @MartinNowack , great changes. Only a few minor issues left, see comments. The only bigger request, if feasible, would be for a test case demonstrating the bug you fixed.

…ymbolic variables

The test case is based on the example provided by Mingyi Liu from the KLEE
mailing list.
Before, external changes to symbolic variables have not been propagated
back to their internal representation.

Do a byte-by-byte comparison and update object state if required.
…bolic

Before, only partially symbolic variables have been concretized.
Now, every object that is not fully concrete is concretized correctly
this includes fully symbolic objects.
Use existing `Executor::toConstant()` function to transform a symbolic
byte of an `ObjectState` to its concrete representation.

This will also add constraints if required.
Provide an additional argument to select the concretisation policy.

Fix a bug where the concretisation of a shared memory object was visible
across different states by retrieving a writable object state first.
Propagate ExternalCallPolicy to allow user-based selection.
@MartinNowack MartinNowack force-pushed the fix_concretizing_external_symbols branch 2 times, most recently from 6d92472 to 7021a89 Compare February 29, 2024 17:52
@ccadar
Copy link
Contributor

ccadar commented Feb 29, 2024

Thanks, @MartinNowack, let's merge this one for v3.1, and keep in mind my remaining comments for future work involving external calls.

@ccadar ccadar merged commit 76a2ade into klee:master Feb 29, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants