It is not possible to use JavaSymbolSolver in a multithreading context #2671
Comments
I see that the goal is to deprecate and remove More than happy to have improvements made until that happens though :) Modules and multi-threading are two of my weakest areas, so some extra support from you on these would be very gratefully received. Personally I'm a fan of the idea of supporting multi-threading, I'm just not sure what is needed to make it happen! Hopefully then we can also run the test suites in parallel and thus much faster!! :)
A reply I got on chat when I asked for comment about this issue is that we would usually advise not using JP across threads whenever similar questions are asked (somewhat paraphrasing here -- note that I'm still quite new and @matozoid has been around a lot longer than I have, so I'm inclined to trust him more than I trust my own views here!). My presumption is that it would take a substantial amount of work, done well, before "thread-safe" could actually be used to describe JavaParser. Or maybe this is an easy/trivial thing to do for someone who knows what they're doing? At the risk of being annoyingly open (and repetitive 😉) about pointing this out, personally I don't know enough to say one way or the other. Given that you have clearly gone further down this route than I have, and it sounds like you are already using a modified/patched/extended version that can be used across threads (pending #2668), would you mind sharing some details about this and what would be involved if we were to actively pursue a thread-safe JavaParser? Perhaps there is a distinction to be made between making JP thread-safe and removing whatever actively blocks/breaks other people trying to add a thread-safe wrapper/extension to JP?
There are some "philosophical observations" to consider:
1) If you must process 30,000 or 100,000 sources (my project) and you have a 64-core+HT machine (or a 10-core+HT one) and ONLY 32 GB of RAM, it is not smart to leave the available computing power unused -> sequential analysis is not a solution!
1.1) It is possible, and simple, to create the AST (parsing the code) in parallel.
2) to resolve a symbol you need:
2.1) the source code
2.2) the external libraries
2.3) the rules to follow for the resolution
Suppose you remove every "secondary caching" mechanism used ONLY to improve speed. 2.1) and 2.2) are READ-ONLY objects, which means that they can be used in parallel without problems. If they have NOT been analyzed yet, this can be done in a synchronized way, with the result put in a "primary cache".
3) The analysis of each source file needs a TypeSolver tree. This tree is the same for each source (this is not strictly correct, but for now it is enough), and each solver tree (assigned to each source file) refers to the SAME source code and external libraries.
This means that it is NOT the responsibility of the type solver to maintain the dictionary of symbols (otherwise you must keep 64*2 copies of the same information in memory). Instead, it must use a reference to a centralized repository that contains the definitions of 2.1) and 2.2) (a single copy used by ALL solvers).
4) Using this approach, you can analyze each source in parallel. To speed up the analysis, it is possible to add some extra "caching" mechanisms.
There are two contexts:
4.1) at global level, shared by ALL threads
4.2) at thread level (single source)
But this is only a "speedup" solution.
5) It is also necessary to introduce another minimal piece of infrastructure. More or less every "medium/big" project is composed of 2 or more "modules". Each module contains a list of sources and a list of dependencies on other modules and external libraries. This means that to solve the symbols in sources from different modules you must use different type solver trees. It also means that it is not the responsibility of the parser to assign a type solver to the compilation unit, because the parser is the same. OK, it is possible to use multiple parsers configured in different ways, BUT the place where the source is parsed may not have the information needed to create the solver tree.
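A minimal sketch of the two cache levels from 4.1) and 4.2), using only JDK classes. `GlobalCache`, `SourceCache` and `ParallelAnalysis` are hypothetical names that do not exist in JavaParser, and symbol "resolution" is reduced to a plain function from a name to a string; it only illustrates the sharing pattern, not the real resolution logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical global cache (4.1): a single instance shared by ALL threads. */
final class GlobalCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> resolver; // assumption: never returns null

    GlobalCache(Function<K, V> resolver) {
        this.resolver = resolver;
    }

    V resolve(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        // Compute outside the map so a resolver that calls resolve() again is safe.
        // Two threads may compute the same key; the first stored value wins.
        V computed = resolver.apply(key);
        V previous = entries.putIfAbsent(key, computed);
        return previous != null ? previous : computed;
    }
}

/** Hypothetical per-source cache (4.2): created for one source, never shared across threads. */
final class SourceCache<K, V> {
    private final GlobalCache<K, V> global;
    private final Map<K, V> local = new HashMap<>();

    SourceCache(GlobalCache<K, V> global) {
        this.global = global;
    }

    V resolve(K key) {
        return local.computeIfAbsent(key, global::resolve);
    }
}

final class ParallelAnalysis {
    /** Resolves the symbols of every source in parallel; each source gets its own SourceCache. */
    static Map<String, List<String>> run(Map<String, List<String>> symbolsBySource,
                                         GlobalCache<String, String> global) {
        return symbolsBySource.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        e -> e.getKey(),
                        e -> {
                            SourceCache<String, String> local = new SourceCache<>(global);
                            return e.getValue().stream()
                                    .map(local::resolve)
                                    .collect(Collectors.toList());
                        }));
    }
}
```

Removing both caches would change only the speed, not the results, which matches the point that they are purely a "speedup" layer.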
Now you have two objects:
1) a type solver tree
2) the AST to analyze
You can assign 1) to 2) (the current library implementation), or assign 2) to 1) (my current implementation). Or you can use another object that contains a reference to the AST and a reference to the solver tree. In this way you can use the same tree multiple times.
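The third option (a separate object holding both references) could look roughly like the sketch below. `ResolutionContext` is a hypothetical name, and the AST and solver tree are left as type parameters so the example depends only on the JDK.

```java
import java.util.function.BiFunction;

/**
 * Hypothetical pairing of one AST with one solver tree.
 * Neither the AST nor the solver tree holds a reference to the other.
 */
final class ResolutionContext<A, S> {
    private final A ast;
    private final S solverTree;

    ResolutionContext(A ast, S solverTree) {
        this.ast = ast;
        this.solverTree = solverTree;
    }

    A ast() {
        return ast;
    }

    S solverTree() {
        return solverTree;
    }

    /** Runs a resolution step that needs both the AST and the solver tree. */
    <R> R resolveWith(BiFunction<? super A, ? super S, ? extends R> step) {
        return step.apply(ast, solverTree);
    }
}
```

Because the solver tree is only referenced, the same tree instance can be passed to any number of contexts, one per AST.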
This is the list of rules I followed to create a "parallelizable" version of JavaSymbolSolver. I use a "custom" version of JavaParserTypeSolver and JarTypeSolvers, plus some "module" type solvers:
1) I use a "ByteCodeAnalzer" that caches the content of ALL jars, with a ByteCodeTypeSolver and a ByteCodeCollectionTypeSolver as type solvers.
2) I split JavaParserTypeSolver into two parts: a JavaParserPool, used to compile the sources of a single module (it is responsible for caching the ASTs and collecting the list of source roots), and the "new" JavaParserTypeSolver, which contains a link to the pool.
You can do ALL this WITHOUT touching the library.
BUT the library USES "JavaParserFacade.get(typeSolver)" (in several places), which relies on a WeakHashMap that is not thread-safe. A simple "synchronized" on the "get" method resolves 99% of the problems. It has no impact on the current implementation or on single-threaded applications.
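As an illustration of the "synchronized get" idea, here is a small stand-in written only with JDK classes. `SketchFacade` is a hypothetical class, not the real `JavaParserFacade`; it only mirrors the pattern of a static per-solver cache backed by a WeakHashMap.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a facade that is cached once per type solver instance. */
final class SketchFacade {
    // WeakHashMap is not thread-safe, so every access goes through the synchronized method below.
    private static final Map<Object, SketchFacade> INSTANCES = new WeakHashMap<>();

    private SketchFacade() {
    }

    /** Returns the facade for the given solver, creating it on first use. */
    static synchronized SketchFacade get(Object typeSolver) {
        return INSTANCES.computeIfAbsent(typeSolver, key -> new SketchFacade());
    }
}
```

The lock is uncontended in a single-threaded program, so the cost there is small; with many threads it becomes a shared synchronization point on every call.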
I am not sure that I solved ALL problems, but I am sure that this is the correct direction.
And it is the only direction that I can take.
At the moment, "JavaParserFacade.get()" IS "the problem": I tried to remove it, but I have not implemented this part.
The call JavaParserFacade.get(typeSolver) (I repeat, it is used inside the library implementation) would be converted into:

    public static JavaParserFacade get(TypeSolver typeSolver) {
        return typeSolver.getRoot().getFactory().get(typeSolver);
    }

In this way, the root typeSolver contains the "thread-local" cache of type solvers, with minimal impact on the current code. The problem is that this changes the current TypeSolver interface. But I am not sure.
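One way the root-held, thread-local cache could be sketched is shown below. Everything here is an assumption for illustration: `SketchTypeSolver`, `FacadeFactory`, `ThreadFacade` and `Facades` are hypothetical stand-ins, and `getFactory()` is the proposed addition rather than an existing method.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical facade; one instance per (thread, solver) pair. */
final class ThreadFacade {
    final SketchTypeSolver solver;

    ThreadFacade(SketchTypeSolver solver) {
        this.solver = solver;
    }
}

/** Hypothetical factory owned by the root solver; each thread sees its own map. */
final class FacadeFactory {
    private final ThreadLocal<Map<SketchTypeSolver, ThreadFacade>> cache =
            ThreadLocal.withInitial(() -> new IdentityHashMap<SketchTypeSolver, ThreadFacade>());

    ThreadFacade get(SketchTypeSolver solver) {
        // No locking needed: the map is only ever touched by the current thread.
        return cache.get().computeIfAbsent(solver, ThreadFacade::new);
    }
}

/** Hypothetical solver node with a parent link, standing in for a type solver tree. */
final class SketchTypeSolver {
    private final SketchTypeSolver parent;
    private final FacadeFactory factory = new FacadeFactory(); // only the root's factory is used

    SketchTypeSolver(SketchTypeSolver parent) {
        this.parent = parent;
    }

    SketchTypeSolver getRoot() {
        SketchTypeSolver current = this;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    FacadeFactory getFactory() {
        return factory;
    }
}

final class Facades {
    /** Same shape as the proposed static get: delegate to the factory of the root solver. */
    static ThreadFacade get(SketchTypeSolver typeSolver) {
        return typeSolver.getRoot().getFactory().get(typeSolver);
    }
}
```

The trade-off in this sketch is memory for contention: each thread builds its own facades, so no lock is taken, but facades are not shared between threads and stay referenced for as long as the thread lives.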
It'll take me a few more read-throughs before I can properly understand, and then intelligently reply to, what you have written... First though, I wanted to drop a note sooner rather than later to say that I really appreciate the detailed and clearly well thought out comments -- thank you!
What is the problem you encounter in a multithreaded environment? Maarten
I have already described the problems in the previous posts.

    public V get(Object key) {
        ...
        Entry<K,V> e = tab[index];
        while (e != null) {
            if (e.hash == h && eq(k, e.get()))
                return e.value;
            e = e.next; <========
        }
        return null;
    }

JavaParser/SymbolSolver 3.15.21 & 3.15.22. Obviously, I have no idea whether previous versions of the solver have the same problem. I am using 8, 20 or 64 threads (depending on the hardware used).
The current implementation of JavaSymbolSolver CANNOT be used in a multithreaded application. It is necessary to reimplement some classes and to control access to some internal data structures.
The main problem is that it uses too much memory: a type solver is a dictionary, and it is necessary to create the hierarchy of type solvers for each source file analyzed. BUT the content of this dictionary is the same for ALL source files analyzed in parallel.
It is necessary to separate the dictionary from the type solver: multiple type solvers, same dictionary. For example, the JDK is the same for ALL.
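To make "multiple type solvers, same dictionary" concrete, here is a rough JDK-only sketch. `SymbolDictionary` and `DictionarySolver` are hypothetical names, and a declaration is simplified to a string; the point is only that solvers hold references to shared read-only dictionaries instead of copies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical read-only dictionary: built once, then only read, so it can be shared by all threads. */
final class SymbolDictionary {
    private final Map<String, String> declarations;

    SymbolDictionary(Map<String, String> declarations) {
        // Defensive copy so later changes to the argument cannot leak in.
        this.declarations = Collections.unmodifiableMap(new HashMap<>(declarations));
    }

    Optional<String> lookup(String qualifiedName) {
        return Optional.ofNullable(declarations.get(qualifiedName));
    }
}

/** Hypothetical lightweight solver: holds only references to dictionaries, in lookup order. */
final class DictionarySolver {
    private final List<SymbolDictionary> dictionaries;

    DictionarySolver(List<SymbolDictionary> dictionaries) {
        this.dictionaries = Collections.unmodifiableList(new ArrayList<>(dictionaries));
    }

    Optional<String> solve(String qualifiedName) {
        for (SymbolDictionary dictionary : dictionaries) {
            Optional<String> hit = dictionary.lookup(qualifiedName);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }
}
```

With this split, creating one solver per source file (or per module) costs a small list of references, while the JDK dictionary exists once in memory.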
Another problem is JavaParserFacade.get(): by synchronizing this method we introduce a point of synchronization that could be avoided.
This is only a list of the major problems found.