Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/cross language #348

Merged
merged 9 commits into from Aug 17, 2018
Merged

Conversation

jjgarzella
Copy link
Contributor

Fixes #332

Major reorganization of top.py, allowing for files of mutliple languages to be used together.

Now, the frontend() function calls individual frontends, which return bitcodes. The frontend() function finally calls link_bc_files().

There are two places that now are easily customizable by language:

  • in the frontend function (the compile command, switching the entry point, etc.)
  • in a customizable build_libs() function (see the examples for C++ and FORTRAN).

Also adds modeling for C++ new and delete operators, fixing #338.

Note: this currently breaks the boogie frontend's concatenating functionality, so that SMACK only accepts one boogie file. I didn't think this was very important (concatenating files is better done via shell), but I can put it back in if we think it's worth it.

@shaobo-he
Copy link
Contributor

Can we split top.py into multiple files? This monolithic file seems cumbersome to maintain and modify.

@zvonimir zvonimir requested a review from shaobo-he July 25, 2018 20:07
add_libs(lang)
frontend = frontends()[lang]
for input_file in args.input_files:
bitcodes.append(frontend(input_file,lang))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should maybe be implemented the same as below:

bitcode = frontends()[lang](input_file,args) 

bitcodes.append(bitcode)

if len(bitcodes) == 0: # for specialized single-file frontends
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is wrong.

add_libs(lang)
bitcode = frontends()[lang](input_file,args)
if bitcode is not None:
bitcodes.append(bitcode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is wrong.

#ifdef __cplusplus

void * operator new(size_t size) {
return malloc(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

#include <cstddef>
#include <new>

extern void * malloc(size_t size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove space after *.

extern void free(void * p);

void * operator new(size_t size) throw(std::bad_alloc) {
return malloc(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

}

void operator delete(void * p) throw() {
free(p);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

extern void * malloc(size_t size);
extern void free(void * p);

void * operator new(size_t size) throw(std::bad_alloc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove space after *.

top.py is now structured so that frontend()
calls [language]_frontend(), which returns
a bitcode, and then frontend() calls
link_ll_files().

So far, only the C, C++, FORTRAN, D, and OBJC,
and LLVM frontends have been modified. BOOGIE,
SVCOMP, and JSON will not work in this commit.
Adds modeling for the C++ new and delete
operators. This requires a specialized
build_libs for C++.

Done in this branch because the new structure
makes it much easier than hacking these
changes into an older branch.
Fixes the Boogie, Json, and SVCOMP
frontends so that they only accept a
single file. For these frontends, taking
more than one file doesn't make sense.

This breaks the behavior where SMACK
would concatenate boogie files and verify them
all. Currently, SMACK will only verify one
boogie file at a time.
Made frontend logic more explicit, showing
which frontends are "noreturning" - meaning
they only accept one file and don't return
bitcode for the rest of top.py to compile.

Minor whitespace fixes
@jjgarzella
Copy link
Contributor Author

@shaobo-he @zvonimir

Originally, I thought a way to do this would be to extract out all of the custom frontends/build_libs into some sort of a frontends.py/build_libs.py. In order to do this, though, we would need to move everything in the first half of the file, making that name misleading.

I recommend one of two things:

  1. We split top.py up into two files, frontend.py and backend.py. Essentially, everything beneath link_bc_files() (line 542) would go into backend, and everything above it into frontend.

  2. We extract most of the top half of the current top.py into a frontend.py, but keep the frontend() and argument parsing functions. Then, we would have top.py and frontend.py

The only difference between these two is whether or not we want to factor out the argument parsing code and basic frontend logic as well - it's more of an attitude decision than anything.

@zvonimir
Copy link
Member

I think I like option (2) better.

For frontends with a non-"main" entry-point,
it no longer makes sense to replace "main"
with this entry point. Fixes a bug where
if any FORTRAN file is included, SMACK assumes
that the FORTRAN is the entry point for the program.

Minor whitespace fix
Move all the vast majority of frontend
logic, including anything language-specific,
to a new file called frontend.py.

Also, fix a (presumed) bug where smack_headers()
uses a missing "args" parameter. I have no idea
how this wasn't caught in daily SMACK usage.
#ifdef __cplusplus

void * operator new(size_t size) {
return malloc(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation here. I think you maybe used tab instead of spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny thing. This code should not be here - either I forgot to delete it or I botched something with git.

As smack.c was not being compiled into C++ anyways,
this has never been used. It is completely useless
now that smack.cpp exists.
Copy link
Contributor

@michael-emmi michael-emmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions and ideas.

if lang in ['BOOGIE', 'SVCOMP', 'JSON']:
noreturning_frontend = True

add_libs(lang)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you guys have thought a bit about mixing smack modules meant for different languages, e.g., C and C++. Is it clear how to prevent naming conflicts in the generated bitcodes, e.g., when both languages want to define the same symbol, such as __VERIFIER_nondet_int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't think this problem is possible in the current implementation. If both C and C++ define and implement a function with the same name, then the whole program wouldn't compile normally anyways. If both C and C++ want to use __VERIFIER_nondet_int, or some other such symbol in smack.h, then they both import smack.h, and when compiled, each source file is compiled and will have a declare void .... external declaration in it's IR. Then, smack.c is compiled seperately, contains this function's implementation, and when all the files are linked everyone is happy. The key point is that the libraries are not necessarily the same language as the source. smack.c is never compiled as c++, and never should be. As another example, when I verify kotlin examples I need to use a C++ file and link it against the kotlin source in LLVM.

There is a problem here, though, maybe this is what you were getting at, the data structure of libs is a Python list and not a set. Duplicate libraries could be added, which would break things. I will fix this promptly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not worried about the particulars of your Python code — though it’s a reasonable concern! — but about the following question: should we permit the case in which some function(s) have per-language implementations in cross-language projects?

I am not sure what the right answer is; my original thought was that each language can have its own header for a given function, but will ultimately link to the same implementation. But now I’m not sure that functions with the same name (e.g., exit) in different languages are supposed to have the same behavior.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I understand now. In my opinion, the difference is really between symbols that are declared by SMACK, and symbols that are language-specific but are modeled by SMACK.

For language-specific symbols, I don't see how we can get around different implementations. For example, array and string implementations vary wildly between languages, and basically anything else in a standard library must be rewritten for every language (though maybe there is a way to double-dip for OBJC and swift, because they both use foundation?).

For SMACK-specifics like assert, assume, and the nondet_* family of functions, we are free to require the same implementation across languages, and I believe we should do so, because the basic functionality of verifier primitives will be the same across all languages.

Maybe this is the problem we're having in #359. smack.h and smack.c should contain only smack-declared symbols. All other functions (e.g. exit, calloc) should be in seperate modeling files like stdlib.c, string.c, etc. When we generalize our share/smack/lib/ folder, these modeling files can be demoted to only the C programming language. But smack.c contains the unique, cross-language implementations of basic verifier primitives.

Of course, like you said, we can have language-specific headers without much trouble. My pet example is changing the header so that __VERIFIER_assert takes BOOL as it's parameter for objc, uses bool for C++, and has _Bool for everything else (e.g. with _Bool swift will bridge to its boolean type and not its Int type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I mostly agree with this point of view. Although I still wonder whether the ObjectiveC-Swift situation that you mention also surfaces, e.g., between C and C++.

from utils import temporary_file, try_command
from svcomp.utils import svcomp_frontend

def languages():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One wonders whether a more modular approach, in which the extension for each language is contained in its own module, might be clearer and more easily extensible.

Essentially, there would be some frontends/common.py module, which contains definitions for common things like smack_root and default_libs. But then each language-specific frontend would show up in its own module, e.g., frontends/cxx.py and frontends/fortran.py. Each such module could define a functions according to some standard interface, e.g., file_extensions returns the list of file extensions on which it works, build_libs defines the extra libraries, compile generates byte code from the given input file, etc.

This design might make things a bit clearer, and also facilitate adding new languages easily. In principle, you could even add a language without modifying the aggregator of front-ends, e.g., if it globbed frontends/*.py and imported each one, then just called their respective file_extensions and compile functions.

Maybe you guys won’t be a fan of this approach, but I think anything that helps modularity is worth considering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually like this approach, and I think we should do it. However, in light of a few things I think we should kick this can down the road, and not include it in this PR.

First, I think you've described a good way to generalize the frontend code. But it will probably be good to also generalize the file structure of /share/smack/include and /share/smack/lib/, and we haven't thought a lot about this yet. Certainly it wouldn't be too hard, but this PR is starting to drag on.

Second, at 276 LOC, frontends.py hasn't fully achieved "giant monolith" status yet, although I imagine it will not take very long.

Third, we have an entire frontend, Rust, that's stuck in a branch and hasn't merged yet. I think waiting until that's in would make that merge easier, and would make the following changes more informed (what if rust has special considerations?).

So, to recap, I think we should start by getting these changes in, then expand and test the functionality for a short time (in particular, until rust is merged in, and we've worked on C++ issues, e.g. #361). Then we should make the changes you're talking about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me 👍

Semicolons are used in names of a few Kotlin
examples.
Make the data structure in top.py which keeps
track of language-specific libraries be a set,
and not a list. Without this, using multiple
non-c files would cause a library to be compiled
and linked twice, probably breaking things.
@zvonimir zvonimir merged commit d4e4ee8 into smackers:develop Aug 17, 2018
bjohannesmeyer added a commit to PLSysSec/FaCT that referenced this pull request Sep 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants