New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/cross language #348
Conversation
Can we split top.py into multiple files? This monolithic file seems cumbersome to maintain and modify. |
share/smack/top.py
Outdated
add_libs(lang) | ||
frontend = frontends()[lang] | ||
for input_file in args.input_files: | ||
bitcodes.append(frontend(input_file,lang)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should maybe be implemented the same as below:
bitcode = frontends()[lang](input_file,args)
share/smack/top.py
Outdated
bitcodes.append(bitcode) | ||
|
||
if len(bitcodes) == 0: # for specialized single-file frontends | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is wrong.
share/smack/top.py
Outdated
add_libs(lang) | ||
bitcode = frontends()[lang](input_file,args) | ||
if bitcode is not None: | ||
bitcodes.append(bitcode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is wrong.
share/smack/lib/smack.c
Outdated
#ifdef __cplusplus | ||
|
||
void * operator new(size_t size) { | ||
return malloc(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation.
share/smack/lib/smack.cpp
Outdated
#include <cstddef> | ||
#include <new> | ||
|
||
extern void * malloc(size_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove space after *
.
share/smack/lib/smack.cpp
Outdated
extern void free(void * p); | ||
|
||
void * operator new(size_t size) throw(std::bad_alloc) { | ||
return malloc(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation.
share/smack/lib/smack.cpp
Outdated
} | ||
|
||
void operator delete(void * p) throw() { | ||
free(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation.
share/smack/lib/smack.cpp
Outdated
extern void * malloc(size_t size); | ||
extern void free(void * p); | ||
|
||
void * operator new(size_t size) throw(std::bad_alloc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove space after *
.
top.py is now structured so that frontend() calls [language]_frontend(), which returns a bitcode, and then frontend() calls link_ll_files(). So far, only the C, C++, FORTRAN, D, and OBJC, and LLVM frontends have been modified. BOOGIE, SVCOMP, and JSON will not work in this commit.
Adds modeling for the C++ new and delete operators. This requires a specialized build_libs for C++. Done in this branch because the new structure makes it much easier than hacking these changes into an older branch.
Fixes the Boogie, Json, and SVCOMP frontends so that they only accept a single file. For these frontends, taking more than one file doesn't make sense. This breaks the behavior where SMACK would concatenate boogie files and verify them all. Currently, SMACK will only verify one boogie file at a time.
Made frontend logic more explicit, showing which frontends are "noreturning" - meaning they only accept one file and don't return bitcode for the rest of top.py to compile. Minor whitespace fixes
Originally, I thought a way to do this would be to extract out all of the custom frontends/build_libs into some sort of a frontends.py/build_libs.py. In order to do this, though, we would need to move everything in the first half of the file, making that name misleading. I recommend one of two things:
The only difference between these two is whether or not we want to factor out the argument parsing code and basic frontend logic as well - it's more of an attitude decision than anything. |
c0f76c9
to
ed2a161
Compare
I think I like option (2) better. |
For frontends with a non-"main" entry-point, it no longer makes sense to replace "main" with this entry point. Fixes a bug where if any FORTRAN file is included, SMACK assumes that the FORTRAN is the entry point for the program. Minor whitespace fix
Move all the vast majority of frontend logic, including anything language-specific, to a new file called frontend.py. Also, fix a (presumed) bug where smack_headers() uses a missing "args" parameter. I have no idea how this wasn't caught in daily SMACK usage.
share/smack/lib/smack.c
Outdated
#ifdef __cplusplus | ||
|
||
void * operator new(size_t size) { | ||
return malloc(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation here. I think you maybe used tab instead of spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny thing. This code should not be here - either I forgot to delete it or I botched something with git.
As smack.c was not being compiled into C++ anyways, this has never been used. It is completely useless now that smack.cpp exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of questions and ideas.
if lang in ['BOOGIE', 'SVCOMP', 'JSON']: | ||
noreturning_frontend = True | ||
|
||
add_libs(lang) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you guys have thought a bit about mixing smack modules meant for different languages, e.g., C and C++. Is it clear how to prevent naming conflicts in the generated bitcodes, e.g., when both languages want to define the same symbol, such as __VERIFIER_nondet_int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I don't think this problem is possible in the current implementation. If both C and C++ define and implement a function with the same name, then the whole program wouldn't compile normally anyways. If both C and C++ want to use __VERIFIER_nondet_int, or some other such symbol in smack.h, then they both import smack.h, and when compiled, each source file is compiled and will have a declare void ....
external declaration in it's IR. Then, smack.c is compiled seperately, contains this function's implementation, and when all the files are linked everyone is happy. The key point is that the libraries are not necessarily the same language as the source. smack.c is never compiled as c++, and never should be. As another example, when I verify kotlin examples I need to use a C++ file and link it against the kotlin source in LLVM.
There is a problem here, though, maybe this is what you were getting at, the data structure of libs
is a Python list and not a set. Duplicate libraries could be added, which would break things. I will fix this promptly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not worried about the particulars of your Python code — though it’s a reasonable concern! — but about the following question: should we permit the case in which some function(s) have per-language implementations in cross-language projects?
I am not sure what the right answer is; my original thought was that each language can have its own header for a given function, but will ultimately link to the same implementation. But now I’m not sure that functions with the same name (e.g., exit
) in different languages are supposed to have the same behavior.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I understand now. In my opinion, the difference is really between symbols that are declared by SMACK, and symbols that are language-specific but are modeled by SMACK.
For language-specific symbols, I don't see how we can get around different implementations. For example, array and string implementations vary wildly between languages, and basically anything else in a standard library must be rewritten for every language (though maybe there is a way to double-dip for OBJC and swift, because they both use foundation?).
For SMACK-specifics like assert
, assume
, and the nondet_*
family of functions, we are free to require the same implementation across languages, and I believe we should do so, because the basic functionality of verifier primitives will be the same across all languages.
Maybe this is the problem we're having in #359. smack.h and smack.c should contain only smack-declared symbols. All other functions (e.g. exit, calloc) should be in seperate modeling files like stdlib.c
, string.c
, etc. When we generalize our share/smack/lib/
folder, these modeling files can be demoted to only the C programming language. But smack.c contains the unique, cross-language implementations of basic verifier primitives.
Of course, like you said, we can have language-specific headers without much trouble. My pet example is changing the header so that __VERIFIER_assert
takes BOOL
as it's parameter for objc, uses bool
for C++, and has _Bool
for everything else (e.g. with _Bool
swift will bridge to its boolean type and not its Int type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I mostly agree with this point of view. Although I still wonder whether the ObjectiveC-Swift situation that you mention also surfaces, e.g., between C and C++.
from utils import temporary_file, try_command | ||
from svcomp.utils import svcomp_frontend | ||
|
||
def languages(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One wonders whether a more modular approach, in which the extension for each language is contained in its own module, might be clearer and more easily extensible.
Essentially, there would be some frontends/common.py
module, which contains definitions for common things like smack_root
and default_libs
. But then each language-specific frontend would show up in its own module, e.g., frontends/cxx.py
and frontends/fortran.py
. Each such module could define a functions according to some standard interface, e.g., file_extensions
returns the list of file extensions on which it works, build_libs
defines the extra libraries, compile
generates byte code from the given input file, etc.
This design might make things a bit clearer, and also facilitate adding new languages easily. In principle, you could even add a language without modifying the aggregator of front-ends, e.g., if it globbed frontends/*.py
and imported each one, then just called their respective file_extensions
and compile
functions.
Maybe you guys won’t be a fan of this approach, but I think anything that helps modularity is worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like this approach, and I think we should do it. However, in light of a few things I think we should kick this can down the road, and not include it in this PR.
First, I think you've described a good way to generalize the frontend code. But it will probably be good to also generalize the file structure of /share/smack/include and /share/smack/lib/, and we haven't thought a lot about this yet. Certainly it wouldn't be too hard, but this PR is starting to drag on.
Second, at 276 LOC, frontends.py hasn't fully achieved "giant monolith" status yet, although I imagine it will not take very long.
Third, we have an entire frontend, Rust, that's stuck in a branch and hasn't merged yet. I think waiting until that's in would make that merge easier, and would make the following changes more informed (what if rust has special considerations?).
So, to recap, I think we should start by getting these changes in, then expand and test the functionality for a short time (in particular, until rust is merged in, and we've worked on C++ issues, e.g. #361). Then we should make the changes you're talking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me 👍
Semicolons are used in names of a few Kotlin examples.
Make the data structure in top.py which keeps track of language-specific libraries be a set, and not a list. Without this, using multiple non-c files would cause a library to be compiled and linked twice, probably breaking things.
Fixes #332
Major reorganization of top.py, allowing for files of mutliple languages to be used together.
Now, the frontend() function calls individual frontends, which return bitcodes. The frontend() function finally calls link_bc_files().
There are two places that now are easily customizable by language:
Also adds modeling for C++ new and delete operators, fixing #338.
Note: this currently breaks the boogie frontend's concatenating functionality, so that SMACK only accepts one boogie file. I didn't think this was very important (concatenating files is better done via shell), but I can put it back in if we think it's worth it.