RFC for improvement: (1) add support for GCC compilation. (2) This may be a red herring, but we may want to eliminate lines from the end, NOT from the beginning. #109
Comments
Hey! First, thank you very much for your issue and the GCC bug report.
Well, taking 4 days is too much; on a reasonably fast machine, my experience is that it should finish within a couple of hours. Anyway, please attach the original test case and the interestingness test here.
Speaking about the fastest reduction (at the beginning):
This is a nice hint, but it's not typically necessary, as the binary passes begin with bigger chunks that are later reduced to smaller ones.
You are very welcome, and thank you for sharing and improving this useful code.
Yes, four days was a bit tough on a PC that is shared between work and hobby.
Original preprocessed file: I copied the file to a local file named "target.cpp".
My interestingness test script: using this, I tried cvise with my check shell script like this:
cvise --debug --print-diff THE-CURRENT-DIRECTORY/reduce-test.sh target.cpp
If you can spot obvious issues in my interestingness test, I would love to hear about them. Actually, for the first three days, I occasionally interrupted the cvise session and tinkered with invoking some pass via the "--start-with-pass" option several times because of the slowness.
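For reference, a minimal interestingness test for this kind of failure could look like the sketch below. The file name target.cpp comes from this thread; the compiler flags and the "internal compiler error" pattern are assumptions and should be replaced with the actual diagnostic from the unreduced failure. cvise copies the candidate file into a fresh temporary directory and runs the script there, treating exit status 0 as "still interesting".

```shell
#!/bin/sh
# Hypothetical reconstruction of reduce-test.sh: exit 0 (interesting) only
# while target.cpp still reproduces the compiler failure. The -std flag and
# the grep pattern are assumptions; use the exact diagnostic from the
# original unreduced failure to avoid accepting unrelated crashes.
if ! command -v g++ >/dev/null 2>&1 || ! [ -f target.cpp ]; then
  # demo fallback so the sketch is runnable anywhere: nothing to check
  echo "skipped: g++ or target.cpp not available"
else
  g++ -w -std=gnu++17 -c target.cpp -o /dev/null 2>compile.log
  grep -q 'internal compiler error' compile.log
fi
```

Anchoring the test on a specific diagnostic (rather than "compilation failed") matters: during reduction, cvise will produce many candidates that fail for boring reasons, and those must be rejected.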
Maybe then I will benefit from a later cvise that uses LLVM 16.
Right, for binary passes with a linked list of information, it may not be that important. However, for LinesPass::0 and friends, which have no guidance from semantic information (aside from the compilation status reported by the interestingness test), eliminating from the end of the file is a poor man's educated guess about what is more likely to be removable without an issue; it also benefits us later, because we can then remove earlier declarations that are no longer referenced by the now-eliminated later lines. Thank you again for sharing this great package.
Well, you can easily run it with:
I would recommend using a reasonable
Then, I would recommend switching to openSUSE Tumbleweed, where I push to the official package the latest
It might help, but as already explained, the pass first starts with the biggest chunks (entire file, first half, second half, first quarter, and so forth), and that works very quickly for a reasonable number of lines of code.
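The halving schedule described above can be illustrated with a quick loop; this prints only the chunk sizes and is not cvise's actual code:

```shell
#!/bin/sh
# Illustration only: the chunk sizes a delta-style lines pass tries for a
# 160000-line file -- the whole file first, then halves, quarters, and so
# on, down to single lines. Only 18 size levels are needed to get from
# 160000 lines to 1.
n=160000
steps=0
while [ "$n" -ge 1 ]; do
  echo "try removing chunks of $n lines"
  steps=$((steps + 1))
  n=$((n / 2))
done
echo "levels: $steps"   # prints "levels: 18"
```

This is why the early, coarse stages of a binary pass are cheap even on a huge file; the expense comes later, once the pass degrades into removing small chunks one at a time.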
I think I had better wait for llvm-16 and g++-13 to hit the Debian repositories, because as of now, if I invoke g++-12 (with llvm-15 installed), I get the following error; that is why I was talking about the possible speedup from the backward elimination direction.
The error is inside libclang.a.
It is possible that I do not have enough RAM to create the large AST for the input file, but I doubt it. Thank you again for your detailed comments.
Yep, that's exactly the error I faced multiple times in the past with LLVM 15 and older. 16 GB is plenty of RAM, so for now, I would recommend using Podman with an openSUSE container:
I see. Thank you.
I am a complete newbie to cvise, but I used it more or less successfully against a very large Mozilla source file.
The preprocessed file at one point was like 160K lines of code.
It took me four days on 7 cores to reduce it.
See the problem originally reported at Mozilla Bugzilla,
https://bugzilla.mozilla.org/show_bug.cgi?id=1825516
and at GCC Bugzilla for the reduced file:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109480
I said "more or less successfully" because I could not get the semantic analyzer of cvise to work initially, until the code was reduced to approximately 65K lines of code.
I am NOT sure if this is due to the sheer size of the code, OR if cvise does not invoke the clang program with the g++ language extensions enabled.
The preprocessed source code was originally meant for the GNU g++ compiler, so it contained a few builtin functions specific to g++.
That is, I wonder if adding "g++98, g++11, g++14, g++17", and maybe "g++20", to the set of options that cvise tries, one at a time, when invoking the clang analysis tool would help. Maybe with these added options, my reduction could use semantic information at an earlier phase.
The error I noticed returned by cvise_delta while using cvise was (-11). This was under Debian GNU/Linux.
It seems to be EAGAIN (errno 11), although a negative return code from a child process usually means it was killed by that signal number, i.e. signal 11, SIGSEGV. Could that indicate a memory issue or something? I have enough swap space, though.
(2) Start elimination of lines during LinesPass::0 from the END of the file (or, for that matter, even for ClangBinarySearchPass::replace-function-def-with-decl, ClangBinarySearchPass::remove-unused-function, ClangPass::remove-unused-function, etc.)
Since the reduction of 160K lines of code when the semantic analysis does not work was so slow,
I added --print-diff to see what kind of progress, if any, was being made.
I noticed that LinesPass::0 eventually degrades into eliminating a few lines at a time consecutively, but it starts from the beginning of the file.
Well, that is natural; we count from 0, 1, ....
However, for eliminating a small number of consecutive lines from a C source file, and to a lesser degree from a C++ source file,
I think starting the elimination at the end of the file is more productive. That is my gut feeling.
The reason is quite simple.
If you eliminate lines from the beginning, you are likely to remove typedefs, variable declarations, and the like, which are quite likely to cause compilation errors later in the file.
On the other hand, starting the elimination from the end of the file is likely to cause fewer compilation errors, and is likely to help us remove the declarations that are no longer referenced by the later eliminated code.
That is the guess.
It may not matter when the semantic analysis works like a charm, but for my 160K lines of code, lacking the semantic help and
watching the repeated eliminations from the start of the file made me think very hard about how I could make the process even an iota faster, and I concluded that elimination from the end of the file would be a win.
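As a thought experiment, the proposed end-first elimination can be sketched as a greedy loop. Everything here (the toy 10-line input, the toy interestingness test that stays "interesting" while at least 6 lines remain, the fixed chunk size) is demo scaffolding and not cvise code:

```shell
#!/bin/sh
# Demo scaffolding (assumptions, not from the thread): a 10-line stand-in
# for target.cpp and a toy interestingness test.
seq 1 10 > target.cpp
cat > interesting.sh <<'EOF'
#!/bin/sh
test "$(wc -l < target.cpp)" -ge 6
EOF
chmod +x interesting.sh

# Sketch of the proposed end-first elimination: repeatedly drop the LAST
# $chunk lines, keeping each cut only while the interestingness test passes.
f=target.cpp
chunk=4
while :; do
  total=$(wc -l < "$f")
  [ "$total" -le "$chunk" ] && break
  cp "$f" "$f.bak"
  head -n "$((total - chunk))" "$f.bak" > "$f"
  if ./interesting.sh; then
    rm "$f.bak"            # still interesting: keep the cut
  else
    mv "$f.bak" "$f"       # no longer interesting: restore and stop
    break                  # (a real pass would shrink the chunk instead)
  fi
done
echo "remaining lines: $(wc -l < target.cpp)"
```

The hoped-for payoff is the one argued above: each successful cut at the end removes the last uses of earlier declarations, making those declarations removable on a later sweep, whereas cuts at the beginning tend to break references that only appear further down.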
These two points are just thoughts after my struggle to reduce 160K lines of code.
Thank you for sharing the great software with the wide developer community.
EDIT: I wrote 65 lines of code, which should read 65K lines of code. Minor fixes for expressions, etc.